Show HN: I built an AI Agent that uses the iPhone

1 day ago 6

This is an iPhone using Agent that uses OpenAI models to get things done on a phone, spanning across multiple apps, very similar to a human user. It was built during an OpenAI hackathon last year.

https://www.youtube.com/shorts/4rnv6dN-2Lg

Example prompts:

Click a new selfie and send it to {Contact name} with a haiku about the weekend
Download {app name} from the App Store
Send a message to {Contact name}: my flight is DL 1715 and Call an Uber X to SFO
Open Control Center and enable the torch

Clone the repo
Open the Xcode project
Open PhoneAgentUITests.swift and run the testLoop function
Paste your OpenAI API key, and input your command (text or voice)

The model can see an app's accessibilty tree
It can tap, swipe, scroll, type and open apps
You can follow up on a task by replying to the completion notification
You can talk to the agent using the microphone button
There is an optional Always On mode that listens for prompts starting with a wake word (Agent by default) even when the app is backgroundded. So you can say something like "Agent, open Settings"
The app persists your OpenAI API key securely on your device's keychain

iOS apps are sandboxed, so this project uses Xcode's UI testing harness to inspect and interact with apps and the system. (no jailbreak required).

The agent is powered by OpenAI's gpt-4.1 model. It is surprisingly good at using the iPhone just with the accessibility contents of an app. It access to these tools:

getting the contents of the current app
tapping on a UI element
typing in a text field
opening an app

The host app communicates with the UI test via a TCP Server to trigger prompts.

Keyboard input can be improved
Capturing the view hierarchy while an animation is inflight confuses the model
The model doesn't wait for long running tasks to complete, so it might give up prematurely.