Blog link: https://tinyurl.com/4xp5ms5s
By Zengyi Qin from MIT. 01/23/2025.
Author's Twitter: https://x.com/qinzytech
Author's homepage: https://www.qinzy.tech
Background: Our MIT team has developed an internal benchmark for computer-use agents. We tested OpenAI Operator on it and show 5 cases here. We did not cherry-pick; Operator simply failed all 5 tasks. See below for details.
BTW: Our MIT team is collaborating with data vendors to collect pre-training data for computer use at the hundred-billion-token scale. If you are interested in what we are doing, feel free to contact us.
Get an image from Google. Open the image, then apply a 20% decrease in brightness and a 15% increase in contrast.
Failure reason: entered the wrong number
Operator screen recording (the video may fail to play on mobile; use a computer instead):
https://operator.chatgpt.com/v/6792f1f5e18c8190879571cd580ce717
Create a new solid color layer with #0000FF, then apply the Outer Glow effect with a 10px size.
Failure reason: does not know how to use online tools
Operator screen recording:
https://operator.chatgpt.com/v/6792f1ffc6248190b0e2d5e257f1369c