@tkattkat tkattkat commented Jan 16, 2026

Why

Google's computer use models return coordinates in a normalized 0-1000 range, which must be converted to actual pixel coordinates for browser interactions. Previously, this normalization used hardcoded viewport dimensions that didn't account for:

  1. User-configured viewports - Users can specify custom viewport sizes in localBrowserLaunchOptions or browserbaseSessionCreateParams
  2. advancedStealth mode - Browserbase sessions with advancedStealth enabled produce a fixed 1288x711 viewport

This caused coordinate misalignment issues where clicks would land in the wrong location, especially when using non-default viewport sizes or advancedStealth mode.
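To make the misalignment concrete, here is a minimal sketch of the 0-1000 to pixel conversion. `scaleFromNormalized` is a hypothetical helper, not the function in this PR, and rounding to the nearest pixel is an assumption of the sketch.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Map a point on the model's 0-1000 grid onto a pixel viewport.
// Rounding to the nearest pixel is an assumption of this sketch.
function scaleFromNormalized(x: number, y: number, viewport: Viewport): { x: number; y: number } {
  return {
    x: Math.round((x / 1000) * viewport.width),
    y: Math.round((y / 1000) * viewport.height),
  };
}

// The model asks for the center of the screen.
const withHardcodedSize = scaleFromNormalized(500, 500, { width: 1288, height: 711 });
const withActualSize = scaleFromNormalized(500, 500, { width: 1920, height: 1080 });

console.log(withHardcodedSize); // { x: 644, y: 356 }
console.log(withActualSize); // { x: 960, y: 540 }
```

On a 1920x1080 page, scaling against 1288x711 puts a "center" click 316px left of and 184px above the real center.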


What Changed

Core Changes

v3.ts

  • Added isAdvancedStealth getter to check if advancedStealth is enabled in Browserbase settings
  • Added configuredViewport getter to retrieve viewport dimensions from user config (with 1288x711 fallback)
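A rough sketch of what the two getters could look like, for readers skimming without the diff open. `SketchOptions` and `SketchV3` are hypothetical, trimmed-down stand-ins, and the precedence between the local and Browserbase viewport settings is an assumption, not something this description confirms.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical, trimmed-down option shape; the real options type has more fields.
interface SketchOptions {
  localBrowserLaunchOptions?: { viewport?: Viewport };
  browserbaseSessionCreateParams?: {
    browserSettings?: { advancedStealth?: boolean; viewport?: Viewport };
  };
}

const DEFAULT_VIEWPORT: Viewport = { width: 1288, height: 711 };

class SketchV3 {
  private readonly opts: SketchOptions;

  constructor(opts: SketchOptions) {
    this.opts = opts;
  }

  // True only when advancedStealth was explicitly enabled in the session params.
  get isAdvancedStealth(): boolean {
    return this.opts.browserbaseSessionCreateParams?.browserSettings?.advancedStealth === true;
  }

  // Assumption: local settings win over Browserbase settings, then the fallback applies.
  get configuredViewport(): Viewport {
    return (
      this.opts.localBrowserLaunchOptions?.viewport ??
      this.opts.browserbaseSessionCreateParams?.browserSettings?.viewport ??
      DEFAULT_VIEWPORT
    );
  }
}

const defaults = new SketchV3({});
const custom = new SketchV3({ localBrowserLaunchOptions: { viewport: { width: 1920, height: 1080 } } });
const stealth = new SketchV3({
  browserbaseSessionCreateParams: { browserSettings: { advancedStealth: true } },
});
```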

coordinateNormalization.ts

  • Now accepts v3 instance to access configuration
  • Uses v3.configuredViewport for normal mode (respects user's viewport settings)
  • Uses hardcoded 1288x711 for advancedStealth mode (matches what stealth produces)
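The mode selection described above can be sketched roughly as follows. `processCoordinatesSketch` and `ViewportSource` are hypothetical names; the provider check and the rounding are assumptions, and only the "stealth uses 1288x711, otherwise use the configured viewport" rule comes from this PR.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// The two values the helper reads from the v3 instance.
interface ViewportSource {
  isAdvancedStealth: boolean;
  configuredViewport: Viewport;
}

const STEALTH_VIEWPORT: Viewport = { width: 1288, height: 711 };

function processCoordinatesSketch(
  x: number,
  y: number,
  provider: string,
  v3: ViewportSource,
): { x: number; y: number } {
  // Assumption: only Google returns 0-1000 coordinates; others pass through untouched.
  if (provider !== "google") return { x, y };

  const viewport = v3.isAdvancedStealth ? STEALTH_VIEWPORT : v3.configuredViewport;
  return {
    x: Math.round((x / 1000) * viewport.width),
    y: Math.round((y / 1000) * viewport.height),
  };
}

const configured = { width: 1920, height: 1080 };
const normalMode = processCoordinatesSketch(250, 750, "google", {
  isAdvancedStealth: false,
  configuredViewport: configured,
});
const stealthMode = processCoordinatesSketch(250, 750, "google", {
  isAdvancedStealth: true,
  configuredViewport: configured,
});

console.log(normalMode); // { x: 480, y: 810 }
console.log(stealthMode); // { x: 322, y: 533 }
```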

v3CuaAgentHandler.ts

  • Updated updateClientViewport() to set correct dimensions for Google CUA:
    • advancedStealth → 1288x711
    • normal → configured viewport
  • Removed unnecessary getPNGDimensions function and setScreenshotSize calls

GoogleCUAClient.ts

  • Simplified normalizeCoordinates() to use just currentViewport (no more screenshot size tracking)
  • Removed actualScreenshotSize property and setScreenshotSize method

Cleanup

OpenAICUAClient.ts & MicrosoftCUAClient.ts

  • Removed actualScreenshotSize property and setScreenshotSize method
  • Reverted coordinate scaling to use original viewport-based logic

Tool Updates

Updated all coordinate-using tools to pass v3 to processCoordinates():

  • click.ts
  • type.ts
  • scroll.ts
  • clickAndHold.ts
  • dragAndDrop.ts
  • fillFormVision.ts
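For a sense of what the tool-side change amounts to, here is a sketch of a drag tool converting both endpoints before touching the page. `DragPage`, `dragAndDropSketch`, and the stand-in body of `processCoordinates` are hypothetical; only the `processCoordinates(x, y, provider, v3)` call shape is taken from this PR.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface ViewportSource {
  isAdvancedStealth: boolean;
  configuredViewport: Viewport;
}

interface Point {
  x: number;
  y: number;
}

// Hypothetical page abstraction, used only to record what the tool would do.
interface DragPage {
  drag(from: Point, to: Point): void;
}

// Stand-in for the shared helper; the rounding and provider check are assumptions.
function processCoordinates(x: number, y: number, provider: string, v3: ViewportSource): Point {
  if (provider !== "google") return { x, y };
  const vp = v3.isAdvancedStealth ? { width: 1288, height: 711 } : v3.configuredViewport;
  return { x: Math.round((x / 1000) * vp.width), y: Math.round((y / 1000) * vp.height) };
}

function dragAndDropSketch(
  page: DragPage,
  provider: string,
  v3: ViewportSource,
  start: [number, number],
  end: [number, number],
): void {
  // Both endpoints go through the same conversion, so the drag stays consistent.
  const from = processCoordinates(start[0], start[1], provider, v3);
  const to = processCoordinates(end[0], end[1], provider, v3);
  page.drag(from, to);
}

const recorded: Array<{ from: Point; to: Point }> = [];
const fakePage: DragPage = {
  drag: (from, to) => {
    recorded.push({ from, to });
  },
};

dragAndDropSketch(
  fakePage,
  "google",
  { isAdvancedStealth: false, configuredViewport: { width: 1920, height: 1080 } },
  [100, 200],
  [900, 800],
);

console.log(recorded[0]); // { from: { x: 192, y: 216 }, to: { x: 1728, y: 864 } }
```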

Test Plan

Manual Testing

  • Local with default viewport: Run agent task, verify clicks land correctly
  • Local with custom viewport (e.g., 1920x1080): Verify coordinate normalization uses configured size
  • Browserbase without advancedStealth: Verify uses configured viewport
  • Browserbase with advancedStealth: Verify uses 1288x711 for normalization

Summary by cubic

Fixes misaligned clicks by converting Google CUA’s 0–1000 coordinates to the correct viewport: user-configured in normal mode and fixed 1288×711 in advancedStealth. Simplifies the path from model coordinates to browser actions for consistent interactions.

  • Bug Fixes

    • Use the configured viewport (local or Browserbase) for normalization.
    • Use 1288×711 when advancedStealth is enabled to match stealth behavior.
    • Apply the correct viewport to Google CUA so clicks, scrolls, and drags land accurately.
  • Refactors

    • Removed screenshot-size tracking and PNG parsing; clients now scale from currentViewport only.
    • Added v3.isAdvancedStealth and v3.configuredViewport; coordinate utils now consume v3.
    • All coordinate-using tools pass v3 to processCoordinates; handler sets Google CUA viewport based on mode.

Written for commit 18fcf17. Summary will update on new commits.

changeset-bot bot commented Jan 16, 2026

🦋 Changeset detected

Latest commit: 18fcf17

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name | Type
@browserbasehq/stagehand | Patch
@browserbasehq/stagehand-evals | Patch
@browserbasehq/stagehand-server | Patch

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 14 files

greptile-apps bot commented Jan 16, 2026

Greptile Summary

This PR refactors coordinate handling for Google's computer use models to respect user-configured viewport dimensions. Previously, coordinate normalization used hardcoded 1288x711 dimensions, causing misalignment when users specified custom viewport sizes or when advancedStealth mode was enabled.

Key Changes:

  • Added configuredViewport and isAdvancedStealth getters to V3 class to expose viewport configuration from user settings
  • Updated coordinateNormalization.ts to accept v3 instance and use configured viewport (normal mode) or hardcoded 1288x711 (advancedStealth mode)
  • Simplified GoogleCUAClient.normalizeCoordinates() to directly map 0-1000 → viewport dimensions, removing the complex screenshot size tracking
  • Removed actualScreenshotSize property and setScreenshotSize() method from all CUA clients (Google, OpenAI, Microsoft)
  • Updated all coordinate-using tools (click, type, scroll, dragAndDrop, clickAndHold, fillFormVision) to pass v3 instance to processCoordinates()
  • Removed getPNGDimensions() helper function from v3CuaAgentHandler.ts

Architecture:
The refactor correctly handles two coordinate normalization paths:

  1. Pure CUA mode: Google model returns click_at actions → GoogleCUAClient.normalizeCoordinates() → handler execution
  2. Hybrid mode: Google model calls tool functions → tool's processCoordinates() → browser action

Both paths now properly use the configured viewport dimensions, ensuring clicks land at the correct coordinates regardless of user viewport settings.
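A small sketch of why the two paths should agree: both resolve the viewport the same way, so the same model output lands on the same pixel. `GoogleClientSketch`, `resolveViewport`, and the function bodies are hypothetical; the names `setViewport`, `updateClientViewport`, `normalizeCoordinates`, and `normalizeGoogleCoordinates` mirror the diagram in this summary, and the rounding is an assumption.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface ViewportSource {
  isAdvancedStealth: boolean;
  configuredViewport: Viewport;
}

const STEALTH_VIEWPORT: Viewport = { width: 1288, height: 711 };

// Shared rule: stealth gets the fixed size, everything else the configured one.
function resolveViewport(v3: ViewportSource): Viewport {
  return v3.isAdvancedStealth ? STEALTH_VIEWPORT : v3.configuredViewport;
}

// Hybrid path: tools scale with an explicitly passed viewport.
function normalizeGoogleCoordinates(x: number, y: number, viewport: Viewport): { x: number; y: number } {
  return {
    x: Math.round((x / 1000) * viewport.width),
    y: Math.round((y / 1000) * viewport.height),
  };
}

// Pure CUA path: the client scales with whatever viewport the handler set.
class GoogleClientSketch {
  private currentViewport: Viewport = STEALTH_VIEWPORT;

  setViewport(width: number, height: number): void {
    this.currentViewport = { width, height };
  }

  normalizeCoordinates(x: number, y: number): { x: number; y: number } {
    return normalizeGoogleCoordinates(x, y, this.currentViewport);
  }
}

function updateClientViewport(client: GoogleClientSketch, v3: ViewportSource): void {
  const { width, height } = resolveViewport(v3);
  client.setViewport(width, height);
}

const v3: ViewportSource = {
  isAdvancedStealth: false,
  configuredViewport: { width: 1600, height: 900 },
};

const client = new GoogleClientSketch();
updateClientViewport(client, v3);

const viaClient = client.normalizeCoordinates(400, 600);
const viaTool = normalizeGoogleCoordinates(400, 600, resolveViewport(v3));

console.log(viaClient, viaTool); // both { x: 640, y: 540 }
```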

Confidence Score: 5/5

  • This PR is safe to merge - it's a well-structured refactoring that simplifies coordinate handling while fixing viewport configuration issues
  • The refactor removes complexity by eliminating screenshot dimension tracking and instead uses configured viewport directly. The logic is sound: advancedStealth gets hardcoded 1288x711, normal mode gets user-configured viewport. All coordinate paths (CUA mode and hybrid mode) are properly updated. No breaking changes or edge case issues detected.
  • No files require special attention

Important Files Changed

Filename | Overview
packages/core/lib/v3/v3.ts | Added isAdvancedStealth and configuredViewport getters to expose viewport configuration from user settings, enabling proper coordinate normalization
packages/core/lib/v3/agent/utils/coordinateNormalization.ts | Updated to accept v3 instance and use configured viewport (or stealth viewport) for Google coordinate normalization instead of hardcoded dimensions
packages/core/lib/v3/handlers/v3CuaAgentHandler.ts | Removed complex screenshot dimension tracking, simplified to use configured viewport for GoogleCUA coordinate normalization, removed unnecessary getPNGDimensions function
packages/core/lib/v3/agent/GoogleCUAClient.ts | Simplified coordinate normalization to directly map from Google's 0-1000 range to viewport dimensions, removed actualScreenshotSize tracking and setScreenshotSize method
packages/core/lib/v3/agent/OpenAICUAClient.ts | Removed screenshot size tracking and coordinate scaling logic, reverted to simpler approach that spreads action properties directly
packages/core/lib/v3/agent/MicrosoftCUAClient.ts | Removed actualScreenshotSize tracking, now uses resizedViewport consistently for coordinate transformation

Sequence Diagram

sequenceDiagram
    participant User
    participant V3
    participant GoogleCUA
    participant Handler
    participant Tools
    participant Browser

    Note over User,Browser: Pure CUA Mode (Google returns click_at actions)
    User->>GoogleCUA: execute(instruction)
    GoogleCUA->>GoogleCUA: Get screenshot
    GoogleCUA->>Handler: updateClientViewport()
    Handler->>V3: isAdvancedStealth?
    V3-->>Handler: true/false
    Handler->>V3: configuredViewport
    V3-->>Handler: {width, height}
    Handler->>GoogleCUA: setViewport(width, height)
    GoogleCUA->>GoogleCUA: Model returns click_at(x: 0-1000, y: 0-1000)
    GoogleCUA->>GoogleCUA: normalizeCoordinates(x, y)
    Note over GoogleCUA: Uses currentViewport<br/>(from configured or stealth)
    GoogleCUA->>Handler: executeAction({type: "click", x, y})
    Handler->>Browser: click(x, y)

    Note over User,Browser: Hybrid Mode (Google calls click tool as function)
    User->>GoogleCUA: execute(instruction) with tools
    GoogleCUA->>GoogleCUA: Model calls click tool function
    GoogleCUA->>Tools: click.execute({coordinates: [x, y]})
    Tools->>Tools: processCoordinates(x, y, provider, v3)
    Tools->>V3: isAdvancedStealth?
    V3-->>Tools: true/false
    Tools->>V3: configuredViewport
    V3-->>Tools: {width, height}
    Tools->>Tools: normalizeGoogleCoordinates(x, y, viewport)
    Tools->>Browser: click(normalized_x, normalized_y)

public get isAdvancedStealth(): boolean {
  return (
    this.opts.browserbaseSessionCreateParams?.browserSettings
      ?.advancedStealth === true
  );
}
Collaborator

this might not return an accurate value whenever the session is created with the bb sdk and the session id is passed into stagehand (just fyi not blocking)

Collaborator Author

ah shit good point
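A sketch of the case raised in this thread, assuming the getter only inspects the options handed to Stagehand. `SketchOptions` is a hypothetical, trimmed-down shape, and `browserbaseSessionID` stands in here for "attach to a session that was created elsewhere"; whether the real getter has any other source of truth is not something this thread confirms.

```typescript
// Hypothetical, trimmed-down options for illustration only.
interface SketchOptions {
  browserbaseSessionID?: string;
  browserbaseSessionCreateParams?: {
    browserSettings?: { advancedStealth?: boolean };
  };
}

// Same check as the getter under review, written as a free function.
function isAdvancedStealth(opts: SketchOptions): boolean {
  return opts.browserbaseSessionCreateParams?.browserSettings?.advancedStealth === true;
}

// Session created through Stagehand: the flag is visible in the options.
const createdByStagehand: SketchOptions = {
  browserbaseSessionCreateParams: { browserSettings: { advancedStealth: true } },
};

// Session created elsewhere with advancedStealth on, then attached by ID:
// the create params never reach Stagehand, so the check has nothing to read.
const attachedById: SketchOptions = {
  browserbaseSessionID: "session-created-elsewhere",
};

const seenWhenCreated = isAdvancedStealth(createdByStagehand);
const seenWhenAttached = isAdvancedStealth(attachedById);

console.log(seenWhenCreated); // true
console.log(seenWhenAttached); // false, regardless of how the session was actually created
```

In the attached case, normalization would fall back to the configured viewport even though the stealth session renders at 1288x711.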
