Ai2 Drops MolmoWeb: An Open-Weight AI That Browses by Sight
Ai2's MolmoWeb navigates the web using screenshots instead of HTML parsing, available in 4B and 8B parameter sizes.
The Allen Institute for AI (Ai2) just launched MolmoWeb, an open-weight visual web agent that sees the web the way you do: as rendered pages, captured in screenshots.
Instead of parsing raw HTML like most browser agents, MolmoWeb looks at rendered browser screenshots to understand and interact with web pages. It ships in two sizes: 4B and 8B parameters.
The timing is sharp. Developers building browser agents right now are stuck choosing between closed APIs they can't peek inside and open-weight frameworks that lack a properly trained model. MolmoWeb aims to fill that gap with open weights, a trained model, and a visual approach.
The screenshot-based method is the notable part. Because the agent skips HTML parsing, it does not depend on how messy or dynamic a page's underlying code is. The idea is simple: if you can see it in a browser, MolmoWeb can work with it.
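To make the screenshot-driven idea concrete, here is a minimal sketch of the general observe-and-act loop such an agent could run. It is not MolmoWeb's actual interface: `Action`, `Browser`, `FakeBrowser`, `run_agent`, and `scripted_policy` are all hypothetical names invented for illustration, and the "model" is a hard-coded script standing in for a vision model.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0         # pixel coordinates, used by "click"
    y: int = 0
    text: str = ""     # text to enter, used by "type"


class Browser(Protocol):
    """Hypothetical minimal browser surface: pixels out, input events in."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


class FakeBrowser:
    """In-memory stand-in so the sketch runs without a real browser."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real driver would return image bytes of the rendered page.
        return f"frame-{len(self.log)}".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text})")


# A policy maps (task, screenshot) to the next action. No HTML is involved.
Policy = Callable[[str, bytes], Action]


def run_agent(task: str, browser: Browser, policy: Policy, max_steps: int = 10) -> int:
    """Capture pixels, ask the policy for an action, apply it. Returns steps taken."""
    for step in range(max_steps):
        action = policy(task, browser.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action: {action.kind}")
    return max_steps


def scripted_policy(task: str, screenshot: bytes) -> Action:
    """Hypothetical placeholder for a vision model; keys off the fake frame id."""
    script = {
        b"frame-0": Action("click", x=120, y=40),
        b"frame-1": Action("type", text=task),
    }
    return script.get(screenshot, Action("done"))


if __name__ == "__main__":
    browser = FakeBrowser()
    run_agent("open-weight web agents", browser, scripted_policy)
    print(browser.log)
```

The point of the design is that the policy only ever receives the task and an image, so nothing in the loop depends on the page's markup. Swapping in a real browser driver and a real model would, under these assumptions, mean replacing `FakeBrowser` and `scripted_policy` while leaving `run_agent` unchanged.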