Ai2 Drops MolmoWeb: An Open-Weight AI That Browses by Sight

Ai2's MolmoWeb navigates the web using screenshots instead of HTML parsing, available in 4B and 8B parameter sizes.

The Allen Institute for AI (Ai2) just launched MolmoWeb, an open-weight visual web agent that sees the internet the way you do: through screenshots.

Instead of parsing raw HTML like most browser agents, MolmoWeb looks at rendered browser screenshots to understand and interact with web pages. It ships in two sizes: 4B and 8B parameters.
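To make the screenshot-driven idea concrete, here is a minimal Python sketch of an observe-and-act loop. It is not MolmoWeb's actual interface: `Action`, `run_agent`, `FakeBrowser`, and `scripted_policy` are all hypothetical names invented for illustration, and the policy here is a hard-coded script standing in for a real model. The only point it shows is that the agent's sole input from the page is pixels, never the DOM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the policy (hypothetical schema, not MolmoWeb's)."""
    kind: str      # "click", "type", or "done"
    x: int = 0     # pixel coordinates, used by "click"
    y: int = 0
    text: str = ""  # text to enter, used by "type"


# A policy maps (task, screenshot bytes, history so far) to the next action.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(browser, policy: Policy, task: str, max_steps: int = 10) -> List[Action]:
    """Screenshot, ask the policy for an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        pixels = browser.screenshot()           # the only view of the page
        action = policy(task, pixels, history)
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind!r}")
    return history


class FakeBrowser:
    """Stand-in for a real browser driver so the sketch runs offline."""

    def __init__(self) -> None:
        self.log = []

    def screenshot(self) -> bytes:
        # A real driver would return PNG bytes of the rendered page.
        return b"fake-png-%d" % len(self.log)

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))


def scripted_policy(task: str, pixels: bytes, history: List[Action]) -> Action:
    """Placeholder for a vision model: replays a fixed three-step script."""
    script = [Action("click", 120, 48), Action("type", text=task), Action("done")]
    return script[len(history)]


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent(browser, scripted_policy, "open-weight web agents"):
        print(step)
```

In a real setup, the `FakeBrowser` role could be filled by a browser automation library (Playwright, for example, exposes `page.screenshot()` and `page.mouse.click(x, y)`), and the policy would be a call into the model. How MolmoWeb actually formats its inputs and actions is not covered here.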

The timing is apt. Developers building browser agents right now are stuck choosing between closed APIs they can't peek inside and open-weight frameworks that lack a properly trained model. MolmoWeb aims to fill that gap with open weights, a trained model, and a visual approach.

The screenshot-based method is the notable part. Because the agent never parses HTML, it does not depend on how messy or dynamic a page's underlying code is. If you can see it in a browser, MolmoWeb can work with it.