COMPUTER USE

Screenshot. Click. Type. Automate any app.

Most of the tools you rely on live behind apps with no API. Your agent needs eyes, hands, and the patience to wait for the screen to settle.

ARCHITECTURE

HOW IT WORKS.

See, decide, act, verify. Every computer-use action completes that loop before the next one starts.

SCREENSHOT → VISION → CLICK / TYPE → WAIT FOR CHANGE
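The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: `run_loop`, `Action`, and the four injected callables (`capture`, `decide`, `act`, `verify`) are hypothetical names standing in for the screenshot, vision/planning, click/type, and wait-for-change stages.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    # Hypothetical action record; the product's real action schema isn't shown.
    kind: str                                    # e.g. "click" or "type"
    payload: dict = field(default_factory=dict)


def run_loop(
    capture: Callable[[], bytes],                  # SEE: grab a screenshot
    decide: Callable[[bytes], Optional[Action]],   # DECIDE: None means "task done"
    act: Callable[[Action], None],                 # ACT: perform the click/keystrokes
    verify: Callable[[bytes, Action], bool],       # VERIFY: check the new screen
    max_steps: int = 20,
) -> list:
    """Run see -> decide -> act -> verify; each action is verified before the next starts."""
    completed = []
    for _ in range(max_steps):
        action = decide(capture())
        if action is None:
            return completed
        act(action)
        # The next screenshot is only taken once this action has been verified.
        if not verify(capture(), action):
            raise RuntimeError(f"verification failed after {action.kind!r}")
        completed.append(action)
    raise RuntimeError("step budget exhausted before the task finished")
```

Passing the stages in as plain callables keeps the loop testable against fake screens. The `max_steps` budget is an assumed safeguard against a planner that never declares the task done.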

DEEP DIVE.

Four building blocks for driving a desktop app without an API.

Screenshot: full-screen or bounded-region captures, compressed to PNG and passed to downstream tools. The Planner can request a snapshot at any point in a conversation and feed it to the Vision module for analysis.
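A bounded-region capture comes down to cropping a framebuffer and encoding the pixels. The sketch below assumes the capture backend returns a packed, row-major, 8-bit RGB buffer (the real capture API isn't described here) and writes the PNG using only `zlib` and `struct`. `crop_rgb` and `encode_png` are hypothetical helpers.

```python
import struct
import zlib


def crop_rgb(frame: bytes, width: int, height: int,
             left: int, top: int, w: int, h: int) -> bytes:
    """Copy a w x h region out of a packed, row-major 8-bit RGB frame (assumed format)."""
    if len(frame) != width * height * 3:
        raise ValueError("frame size does not match width x height x 3")
    if left < 0 or top < 0 or left + w > width or top + h > height:
        raise ValueError("region falls outside the frame")
    rows = []
    for y in range(top, top + h):
        start = (y * width + left) * 3
        rows.append(frame[start:start + w * 3])
    return b"".join(rows)


def _chunk(kind: bytes, data: bytes) -> bytes:
    # PNG chunk layout: length, type, data, CRC-32 over type + data.
    body = kind + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def encode_png(rgb: bytes, w: int, h: int) -> bytes:
    """Encode packed 8-bit RGB pixels as PNG, using filter type 0 (none) on every row."""
    stride = w * 3
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(h))
    ihdr = struct.pack(">IIBBBBB", w, h, 8, 2, 0, 0, 0)  # 8-bit depth, color type 2 = RGB
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9))
            + _chunk(b"IEND", b""))
```

A full-screen snapshot is simply `encode_png(frame, width, height)`. A bounded region passes through `crop_rgb` first, which also shrinks what the Vision module has to read.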

LIVE DEMO.

A vision-guided click: screenshot, locate, gate, execute, wait for the screen to change, confirm.

computer-use.click
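The demo's six steps can be sketched as one function, assuming "per zone" approval means every click target must fall inside a screen zone the user has approved. `Zone`, `ApprovalGate`, `vision_click`, and the injected `locate`, `click`, `wait_for_change`, and `confirm` callables are all hypothetical. In the real pipeline, `locate` would be the vision model's job.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class Zone:
    # Hypothetical: a named screen rectangle covering [left, right) x [top, bottom).
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


class ApprovalRequired(Exception):
    """Raised when a click would land in a zone the user has not approved."""


@dataclass
class ApprovalGate:
    zones: list
    approved: set = field(default_factory=set)

    def check(self, x: int, y: int) -> None:
        for zone in self.zones:
            if zone.contains(x, y):
                if zone.name not in self.approved:
                    raise ApprovalRequired(zone.name)
                return
        # Assumption: points outside every defined zone are refused (fail closed).
        raise ApprovalRequired("<unzoned>")


def vision_click(
    target: str,
    capture: Callable[[], bytes],
    locate: Callable[[bytes, str], Optional[Point]],
    gate: ApprovalGate,
    click: Callable[[int, int], None],
    wait_for_change: Callable[[bytes], bytes],
    confirm: Callable[[bytes, str], bool],
) -> Point:
    before = capture()                      # 1. screenshot
    point = locate(before, target)          # 2. locate the target on screen
    if point is None:
        raise LookupError(f"could not find {target!r} on screen")
    x, y = point
    gate.check(x, y)                        # 3. gate: nothing happens in unapproved zones
    click(x, y)                             # 4. execute
    after = wait_for_change(before)         # 5. wait for the screen to change
    if not confirm(after, target):          # 6. confirm the intended effect
        raise RuntimeError(f"click on {target!r} had no confirmed effect")
    return x, y
```

Checking the gate after locating but before clicking means a refused action never reaches the input device. Refusing clicks outside every defined zone is a design choice of this sketch, not something the page states.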

RECEIPTS.

ACTIONS: 8 (phase 2)
VISION MODEL: llava (local)
PACING: screen diff (no sleep())
APPROVAL: per zone (gate-enforced)
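Screen-diff pacing replaces fixed `sleep()` calls with "poll until the pixels change, then until they stop changing." The sketch below is an assumption about how that could work, not the shipped code. `diff_ratio` and `wait_for_change` are hypothetical, frames are raw bytes of equal size, and the loop is paced by the capture call itself rather than a timer.

```python
import time
from typing import Callable


def diff_ratio(a: bytes, b: bytes) -> float:
    """Fraction of bytes that differ; frames of different size count as fully changed."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def wait_for_change(
    capture: Callable[[], bytes],
    before: bytes,
    threshold: float = 0.01,
    settle: int = 2,
    timeout: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Poll until the screen differs from `before`, then until it stops changing.

    No fixed sleep: each capture() call paces the loop. Returns the settled frame.
    """
    deadline = clock() + timeout
    prev = None
    stable = 0
    while clock() < deadline:
        frame = capture()
        if diff_ratio(before, frame) < threshold:
            prev, stable = None, 0          # nothing has changed yet
            continue
        if prev is not None and diff_ratio(prev, frame) < threshold:
            stable += 1                     # matches the previous changed frame
            if stable >= settle:
                return frame
        else:
            stable = 0                      # still animating or redrawing
        prev = frame
    raise TimeoutError("screen did not change and settle before the deadline")
```

The `threshold` ignores small changes, and `settle` sets how many consecutive matching frames count as "settled." Both defaults are illustrative, not measured values.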
CROSS-LINKS

WHERE NEXT.