Ai2 releases MolmoWeb, an open-weight visual web agent with 30K human task trajectories and a full training stack
Source: venturebeat.com
Engineers building browser agents today face a choice between closed APIs they cannot inspect and open-weight frameworks with no trained model underneath them. Ai2 is now offering a third option.

The Seattle-based nonprofit behind the open-source OLMo language models and the Molmo vision-language family is today releasing MolmoWeb, an open-weight visual web agent available in 4 billion and 8 billion parameter sizes. Until now, no open-weight visual web agent shipped with the training data and pipeline needed to audit or reproduce it. MolmoWeb does. MolmoWebMix, the accompanying dataset, includes 30,000 human task trajectories across more than 1,100 websites, 590,000 individual subtask demonstrations and 2.2 million screenshot question-answer pairs, which Ai2 describes as the largest publicly released collection of human web-task execution ever assembled.

"Can you go from just passively understanding images, describing them and captioning them, to actually making them take action in som