Adjust coordinate handling #1560
🦋 Changeset detected
Latest commit: 18fcf17
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
No issues found across 14 files
Greptile Summary
This PR refactors coordinate handling for Google's computer use models to respect user-configured viewport dimensions. Previously, coordinate normalization used hardcoded 1288x711 dimensions, causing misalignment when users specified custom viewport sizes or when
Key Changes:
Architecture:
Both paths now use the configured viewport dimensions, ensuring clicks land at the correct coordinates regardless of user viewport settings.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant V3
participant GoogleCUA
participant Handler
participant Tools
participant Browser
Note over User,Browser: Pure CUA Mode (Google returns click_at actions)
User->>GoogleCUA: execute(instruction)
GoogleCUA->>GoogleCUA: Get screenshot
GoogleCUA->>Handler: updateClientViewport()
Handler->>V3: isAdvancedStealth?
V3-->>Handler: true/false
Handler->>V3: configuredViewport
V3-->>Handler: {width, height}
Handler->>GoogleCUA: setViewport(width, height)
GoogleCUA->>GoogleCUA: Model returns click_at(x: 0-1000, y: 0-1000)
GoogleCUA->>GoogleCUA: normalizeCoordinates(x, y)
Note over GoogleCUA: Uses currentViewport<br/>(from configured or stealth)
GoogleCUA->>Handler: executeAction({type: "click", x, y})
Handler->>Browser: click(x, y)
Note over User,Browser: Hybrid Mode (Google calls click tool as function)
User->>GoogleCUA: execute(instruction) with tools
GoogleCUA->>GoogleCUA: Model calls click tool function
GoogleCUA->>Tools: click.execute({coordinates: [x, y]})
Tools->>Tools: processCoordinates(x, y, provider, v3)
Tools->>V3: isAdvancedStealth?
V3-->>Tools: true/false
Tools->>V3: configuredViewport
V3-->>Tools: {width, height}
Tools->>Tools: normalizeGoogleCoordinates(x, y, viewport)
Tools->>Browser: click(normalized_x, normalized_y)
public get isAdvancedStealth(): boolean {
  return (
    this.opts.browserbaseSessionCreateParams?.browserSettings
      ?.advancedStealth === true
this might not return an accurate value whenever the session is created with the bb sdk and the session id is passed into stagehand (just fyi not blocking)
ah shit good point
Why
Google's computer use models return coordinates in a normalized 0-1000 range, which must be converted to actual pixel coordinates for browser interactions. Previously, this normalization used hardcoded viewport dimensions that didn't account for:
- Viewport dimensions configured in `localBrowserLaunchOptions` or `browserbaseSessionCreateParams`

This caused coordinate misalignment issues where clicks would land in the wrong location, especially when using non-default viewport sizes or advancedStealth mode.
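
To make the misalignment concrete, here is a minimal sketch of the 0-1000 to pixel conversion described above. The names `Viewport` and `normalizeSketch` are hypothetical and the use of `Math.floor` is an assumption; this is not the code from the PR.

```typescript
// Illustrative sketch only: names and rounding are assumptions, not the PR's code.
interface Viewport {
  width: number;
  height: number;
}

// Google's computer use models report x and y on a 0-1000 grid.
const GOOGLE_COORDINATE_RANGE = 1000;

// Scale a model coordinate into pixel space for the given viewport.
function normalizeSketch(
  x: number,
  y: number,
  viewport: Viewport,
): { x: number; y: number } {
  return {
    x: Math.floor((x / GOOGLE_COORDINATE_RANGE) * viewport.width),
    y: Math.floor((y / GOOGLE_COORDINATE_RANGE) * viewport.height),
  };
}

// The same model output maps to different pixels depending on the viewport.
const hardcoded = normalizeSketch(500, 500, { width: 1288, height: 711 });
const custom = normalizeSketch(500, 500, { width: 1920, height: 1080 });
console.log(hardcoded, custom);
```

With a 1920x1080 viewport, scaling against the hardcoded 1288x711 size would place the click at (644, 355) instead of the intended (960, 540).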
What Changed
Core Changes
`v3.ts`
- Added `isAdvancedStealth` getter to check if advancedStealth is enabled in Browserbase settings
- Added `configuredViewport` getter to retrieve viewport dimensions from user config (with 1288x711 fallback)

`coordinateNormalization.ts`
- Accepts `v3` instance to access configuration
- Uses `v3.configuredViewport` for normal mode (respects user's viewport settings)
- Uses `1288x711` for advancedStealth mode (matches what stealth produces)

`v3CuaAgentHandler.ts`
- Updated `updateClientViewport()` to set correct dimensions for Google CUA:
- Removed `getPNGDimensions` function and `setScreenshotSize` calls

`GoogleCUAClient.ts`
- Updated `normalizeCoordinates()` to use just `currentViewport` (no more screenshot size tracking)
- Removed `actualScreenshotSize` property and `setScreenshotSize` method

Cleanup
`OpenAICUAClient.ts` & `MicrosoftCUAClient.ts`
- Removed `actualScreenshotSize` property and `setScreenshotSize` method

Tool Updates
Updated all coordinate-using tools to pass `v3` to `processCoordinates()`:
- `click.ts`
- `type.ts`
- `scroll.ts`
- `clickAndHold.ts`
- `dragAndDrop.ts`
- `fillFormVision.ts`

Test Plan
Manual Testing
Summary by cubic
Fixes misaligned clicks by converting Google CUA’s 0–1000 coordinates to the correct viewport: user-configured in normal mode and fixed 1288×711 in advancedStealth. Simplifies the path from model coordinates to browser actions for consistent interactions.
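
The viewport selection summarized above can be sketched roughly as follows. `ViewportSource` is a hypothetical, simplified stand-in for the two getters the PR adds to the V3 instance, and `resolveViewportSketch` is an illustrative name; the actual implementation may differ.

```typescript
// Illustrative sketch only: a simplified stand-in, not the PR's implementation.
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical subset of the V3 instance: just the two getters described in the PR.
interface ViewportSource {
  isAdvancedStealth: boolean;
  configuredViewport: Viewport;
}

// Per the PR description, advancedStealth sessions produce a 1288x711 viewport.
const STEALTH_VIEWPORT: Viewport = { width: 1288, height: 711 };

// Choose the viewport to scale model coordinates against.
function resolveViewportSketch(source: ViewportSource): Viewport {
  return source.isAdvancedStealth ? STEALTH_VIEWPORT : source.configuredViewport;
}

const normalMode = resolveViewportSketch({
  isAdvancedStealth: false,
  configuredViewport: { width: 1920, height: 1080 },
});
const stealthMode = resolveViewportSketch({
  isAdvancedStealth: true,
  configuredViewport: { width: 1920, height: 1080 },
});
console.log(normalMode, stealthMode);
```

In this sketch the configured viewport is ignored under advancedStealth, mirroring the behavior described in the summary.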
Bug Fixes
Refactors
Written for commit 18fcf17. Summary will update on new commits.