Prompt: Open Tinder and perform 3 swipes for me.
SwipeRightToTinderTask.mp4
Other single tasks: https://github.com/user-attachments/assets/39a507cc-439e-4b4d-b8cc-17e8b57d4365
This is an Android port of the iOS PhoneAgent that uses OpenAI models to control an Android phone through natural-language commands. It can interact with apps, tap buttons, fill forms, swipe, scroll, and perform complex multi-app workflows.
- Cross-app automation: Control any Android app using natural language
- Voice commands: Use speech-to-text for hands-free operation
- AI-powered: Uses OpenAI GPT models to understand and execute commands
- Accessibility service: Leverages Android's accessibility framework for system-wide control
- Gesture support: Tap, swipe, scroll, and type across apps
- Real-time UI analysis: Reads current screen content to make intelligent decisions
The Android PhoneAgent uses:
- Android Accessibility Service (instead of iOS UITest) for system-wide UI access
- OpenAI GPT-4 for command understanding and action planning
- Gesture API for performing taps, swipes, and scrolls
- UI Automator concepts for cross-app navigation
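To make the gesture layer concrete, here is a minimal sketch of dispatching a tap from the accessibility service (API 24+). The class name matches this project's service, but the `tap` helper itself is an illustrative assumption, not confirmed code:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.accessibilityservice.GestureDescription
import android.graphics.Path
import android.view.accessibility.AccessibilityEvent

// Minimal sketch: dispatching a tap through the accessibility framework.
// Requires android:canPerformGestures="true" in the service configuration.
class PhoneAgentAccessibilityService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        // UI hierarchy capture would happen here.
    }

    override fun onInterrupt() {}

    // Tap at absolute screen coordinates by replaying a 50 ms stroke.
    fun tap(x: Float, y: Float) {
        val path = Path().apply { moveTo(x, y) }
        val gesture = GestureDescription.Builder()
            .addStroke(GestureDescription.StrokeDescription(path, 0L, 50L))
            .build()
        dispatchGesture(gesture, /* callback = */ null, /* handler = */ null)
    }
}
```

Swipes and scrolls use the same API with a multi-point `Path`.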
Requirements:

- Android device running API 24+ (Android 7.0+)
- OpenAI API key
- Development setup with Android Studio
Installation:

1. Clone and build:

   ```bash
   git clone <your-repo>
   cd AndroidPhoneAgent
   ./gradlew assembleDebug
   ```

2. Install on device:

   ```bash
   adb install app/build/outputs/apk/debug/app-debug.apk
   ```

3. Enable Accessibility Service (the service declaration behind this step is sketched after these instructions):
   - Open Android Settings
   - Go to Accessibility → PhoneAgent
   - Enable the service
   - Grant permissions when prompted

4. Configure API Key:
   - Open the PhoneAgent app
   - Enter your OpenAI API key when prompted
   - The key is stored securely on your device
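For the Accessibility step above: the service only shows up in Settings because the app declares it with the right capabilities. An illustrative `res/xml/accessibility_service_config.xml` (the file name and exact flag set here are assumptions, not the project's confirmed configuration):

```xml
<!-- Illustrative configuration; canPerformGestures is what allows the
     service to inject taps and swipes, canRetrieveWindowContent is what
     allows it to read the UI hierarchy. -->
<accessibility-service xmlns:android="http://schemas.android.com/apk/res/android"
    android:accessibilityEventTypes="typeAllMask"
    android:accessibilityFlags="flagDefault|flagRetrieveInteractiveWindows"
    android:canRetrieveWindowContent="true"
    android:canPerformGestures="true" />
```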
Basic Actions:
- "Open Settings"
- "Scroll down"
- "Tap the WiFi option"
- "Type 'Hello World' in the text field"
App Management:
- "Open Instagram and like the first post"
- "Download Spotify from the Play Store"
- "Send a message to John saying 'Running late'"
Complex Workflows:
- "Take a screenshot and share it via email"
- "Turn on airplane mode and then turn it off"
- "Open Google Maps and search for coffee shops nearby"
Voice Commands:

- Tap the microphone icon in the app
- Grant microphone permission if prompted
- Speak your command clearly
- The agent will transcribe and execute it
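A minimal sketch of how this flow could be wired up with Android's built-in `SpeechRecognizer`; the `onCommand` hand-off to the agent is a hypothetical entry point, not this project's confirmed API:

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

// Transcribe one utterance and hand the text to the agent.
// Assumes the RECORD_AUDIO permission has already been granted.
fun listenForCommand(context: Context, onCommand: (String) -> Unit) {
    val recognizer = SpeechRecognizer.createSpeechRecognizer(context)
    recognizer.setRecognitionListener(object : RecognitionListener {
        override fun onResults(results: Bundle) {
            results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                ?.firstOrNull()
                ?.let(onCommand) // e.g. onCommand = { agent.execute(it) }
        }
        override fun onError(error: Int) { /* surface the error in the UI */ }
        // The remaining callbacks are not needed for this sketch.
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    })
    recognizer.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH))
}
```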
Architecture:

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ MainActivity │─────│  PhoneAgent  │─────│  OpenAI API  │
│  (UI Layer)  │     │  (AI Brain)  │     │   (GPT-4)    │
└──────────────┘     └──────────────┘     └──────────────┘
        │                    │
        │         ┌──────────────────────┐
        └─────────│ AccessibilityService │
                  │   (UI Automation)    │
                  └──────────────────────┘
```
- PhoneAgentAccessibilityService: Captures UI hierarchy and performs gestures
- PhoneAgent: Communicates with OpenAI and orchestrates actions
- MainActivity: User interface for commands and status
- OpenAIService: Handles API communication with function calling
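Put together, PhoneAgent amounts to an observe→plan→act loop over these components. A minimal sketch of the idea, where every name is an illustrative assumption about the internals rather than the project's confirmed API:

```kotlin
// All names below are illustrative assumptions, not confirmed internals.
sealed class AutomationAction {
    object Done : AutomationAction()
    data class Tap(val x: Float, val y: Float) : AutomationAction()
}

interface ScreenDriver {   // backed by PhoneAgentAccessibilityService
    fun describeScreen(): String
    fun tap(x: Float, y: Float)
}

interface Planner {        // backed by OpenAIService (function calling)
    suspend fun planNextAction(command: String, uiState: String): AutomationAction
}

// Observe the screen, ask the model for the next step, act, repeat.
suspend fun runCommand(command: String, driver: ScreenDriver, planner: Planner) {
    while (true) {
        val uiState = driver.describeScreen()                          // observe
        when (val action = planner.planNextAction(command, uiState)) { // plan
            is AutomationAction.Done -> return                         // finished
            is AutomationAction.Tap  -> driver.tap(action.x, action.y) // act
        }
    }
}
```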
| Feature | iOS (UITest) | Android (Accessibility) |
|---|---|---|
| UI Access | XCTest framework | Accessibility Service |
| Gesture API | XCTest gestures | Android Gesture API |
| App Launching | URL schemes | Package manager |
| Permissions | Developer certificate | Accessibility permission |
| Background Mode | Limited | Full system access |
Supported Actions:

- ✅ Get current UI state
- ✅ Tap coordinates
- ✅ Click UI elements
- ✅ Type text
- ✅ Swipe gestures
- ✅ Scroll in any direction
- ✅ Open apps by package name
- ✅ Wait/delay actions
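Two of the non-gesture actions above map directly onto framework APIs. A sketch, assuming the service exposes helpers like these:

```kotlin
import android.accessibilityservice.AccessibilityService
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.view.accessibility.AccessibilityNodeInfo

// Type into the currently focused input field via the node tree,
// without needing screen coordinates.
fun typeText(service: AccessibilityService, text: String) {
    val node = service.rootInActiveWindow
        ?.findFocus(AccessibilityNodeInfo.FOCUS_INPUT) ?: return
    val args = Bundle().apply {
        putCharSequence(
            AccessibilityNodeInfo.ACTION_ARGUMENT_SET_TEXT_CHARSEQUENCE, text
        )
    }
    node.performAction(AccessibilityNodeInfo.ACTION_SET_TEXT, args)
}

// Launch an app by package name through the package manager.
fun openApp(context: Context, packageName: String) {
    context.packageManager.getLaunchIntentForPackage(packageName)
        ?.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        ?.let(context::startActivity)
}
```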
Limitations:

- Requires accessibility service permission (security consideration)
- Some apps may block accessibility interactions
- Performance depends on OpenAI API response time
- Limited to apps that properly implement accessibility
- May not work with games or heavily custom UIs
Security & Privacy:

- API Key: Stored locally using Android Keystore
- UI Data: Only sent to OpenAI when commands are executed
- Permissions: Only requests necessary accessibility permissions
- Network: All communication with OpenAI is encrypted (HTTPS)
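A common way to get Keystore-backed storage on Android is androidx.security's `EncryptedSharedPreferences` (artifact `androidx.security:security-crypto`). A sketch of how the key could be persisted; whether this project uses exactly this mechanism is an assumption:

```kotlin
import android.content.Context
import androidx.security.crypto.EncryptedSharedPreferences
import androidx.security.crypto.MasterKey

// Persist the API key encrypted under a Keystore-backed master key.
fun saveApiKey(context: Context, apiKey: String) {
    val masterKey = MasterKey.Builder(context)
        .setKeyScheme(MasterKey.KeyScheme.AES256_GCM)
        .build()

    val prefs = EncryptedSharedPreferences.create(
        context,
        "phone_agent_secrets",   // hypothetical file name
        masterKey,
        EncryptedSharedPreferences.PrefKeyEncryptionScheme.AES256_SIV,
        EncryptedSharedPreferences.PrefValueEncryptionScheme.AES256_GCM
    )
    prefs.edit().putString("openai_api_key", apiKey).apply()
}
```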
Development:

1. Setup:

   ```bash
   git clone <repo>
   cd AndroidPhoneAgent
   ```

2. Build:

   ```bash
   ./gradlew assembleDebug
   ```

3. Test:

   ```bash
   ./gradlew test
   ```
Tech Stack:

- MVVM Pattern: Clean separation of UI and business logic
- Hilt DI: Dependency injection for testability
- Coroutines: Async operations and OpenAI API calls
- Compose UI: Modern Android UI framework
- Retrofit: HTTP client for OpenAI API
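As an illustration of the Retrofit piece, here is what a minimal interface for the Chat Completions endpoint could look like; the request/response models are simplified placeholders, not this project's actual types:

```kotlin
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.Body
import retrofit2.http.Header
import retrofit2.http.POST

// Simplified placeholder models for illustration only.
data class ChatMessage(val role: String, val content: String)
data class ChatRequest(val model: String, val messages: List<ChatMessage>)
data class ChatChoice(val message: ChatMessage)
data class ChatResponse(val choices: List<ChatChoice>)

interface OpenAIApi {
    @POST("v1/chat/completions")
    suspend fun chatCompletion(
        @Header("Authorization") bearerToken: String, // "Bearer $apiKey"
        @Body request: ChatRequest
    ): ChatResponse
}

val api: OpenAIApi = Retrofit.Builder()
    .baseUrl("https://api.openai.com/")
    .addConverterFactory(GsonConverterFactory.create())
    .build()
    .create(OpenAIApi::class.java)
```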
To add a new automation action (sketched below):

- Add the action to the `AutomationAction` sealed class
- Implement its execution in `PhoneAgent.executeAction()`
- Add the OpenAI function definition in `createFunctionDefinitions()`
- Update the accessibility service if needed
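A sketch of what those steps might look like for a hypothetical long-press action; all signatures are assumptions about the project's internals:

```kotlin
// 1. Extend the sealed class with the new action.
sealed class AutomationAction {
    data class Tap(val x: Float, val y: Float) : AutomationAction()
    data class LongPress(val x: Float, val y: Float) : AutomationAction() // new
}

// 2. Handle it where actions are executed (PhoneAgent.executeAction()).
//    `stroke` stands in for the service's gesture helper: (x, y, durationMs).
fun executeAction(action: AutomationAction, stroke: (Float, Float, Long) -> Unit) =
    when (action) {
        is AutomationAction.Tap       -> stroke(action.x, action.y, 50L)
        is AutomationAction.LongPress -> stroke(action.x, action.y, 600L)
    }

// 3. Describe the new action in createFunctionDefinitions() (e.g. a
//    "long_press" function with x/y parameters) so GPT-4 can call it.
```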
Troubleshooting:

Accessibility Service not working:
- Ensure it's enabled in Settings → Accessibility (you can verify from a shell, as shown below)
- Check that the app has permission to access other apps
- Restart the app after enabling the service
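You can confirm from a shell which accessibility services the system considers enabled:

```bash
# The output should list the PhoneAgent service component if it is enabled.
adb shell settings get secure enabled_accessibility_services
```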
Commands not working:
- Verify OpenAI API key is valid and has credits
- Check internet connection
- Ensure target app supports accessibility
- Try simpler commands first
Contributing:

- Fork the repository
- Create a feature branch
- Follow Android development best practices
- Add tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details
Disclaimer:

- This is experimental software for educational purposes
- Use responsibly and respect app terms of service
- The agent can perform actions on your behalf - use with caution
- Not affiliated with OpenAI or Google
Acknowledgments:

- Original iOS PhoneAgent by https://github.com/rounak
- OpenAI for the GPT API
- Android Accessibility Service framework