How to Automate Any Android App With Natural Language

A complete step-by-step guide to automating Android apps by describing the goal in plain language — no locators, no scripts. Includes prompt tips, handling logins and popups, and troubleshooting.

Auten Team

May 31, 2026 · 8 min read

Traditional automation makes you spell out every step. With an AI agent you describe the outcome and let it work out the steps. This guide walks through the entire workflow with Auten — from connecting a device to a reliable, repeatable automation — plus the prompt techniques and troubleshooting that separate a one-off demo from something you run every day.

Step 1 — Connect a device

Pair your own Android phone with the Auten APK over USB, or provision a hosted virtual device in the dashboard. Either way, the device registers against your API key and appears in your device list with a live screen you can watch and even tap yourself.

Step 2 — Write the task as a goal, not steps

Skip the step-by-step. State the end result you want, and name the success condition so the agent can verify it. "Open Gmail and tell me the subject of the latest unread email" is better than a list of taps, because it survives layout changes and lets the agent confirm it actually succeeded.

bash
curl -X POST https://relay.auten.ai/v1/tasks \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "device_serial": "<serial>",
    "prompt": "Open Gmail and tell me the subject of the latest unread email"
  }'
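
When the prompt comes from a variable instead of a hard-coded string, the JSON body needs escaping, and the key is better read from the environment than pasted inline. The sketch below is one way to do that in plain shell. The endpoint and the two body fields come from the example above; the function names and the AUTEN_API_KEY variable are assumptions made for illustration, and the escaping only covers backslashes and double quotes in single-line text.

```bash
#!/bin/sh
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Limitation: assumes single-line text with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body with the two fields used in the curl example above.
# $1 = device serial, $2 = prompt
build_task_payload() {
  printf '{"device_serial":"%s","prompt":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send one task. AUTEN_API_KEY is an assumed variable name, not an official one;
# the call aborts with a message if it is unset.
send_task() {
  curl -sS -X POST https://relay.auten.ai/v1/tasks \
    -H "Authorization: Bearer ${AUTEN_API_KEY:?set AUTEN_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(build_task_payload "$1" "$2")"
}
```

With the key exported, a call would look like send_task "<serial>" 'Open Gmail and tell me the subject of the latest unread email'. Running build_task_payload on its own prints the body so you can inspect it before anything is sent.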

Step 3 — Watch it work

In the dashboard you can watch the live screen as the agent opens the app, reads it, and acts. Each action is logged, so when something is off you can see exactly what it did and why. This visibility is invaluable while you are tuning a new automation.

Writing prompts that work

Four techniques improve reliability:

  • Name the end state. "...and save the list in Notes" gives the verifier something concrete to check.
  • Resolve ambiguity up front. If there are two "Settings" buttons, say which one ("the gear icon in the top right").
  • One goal per task. Chain multiple tasks rather than packing five unrelated goals into one prompt.
  • State the data shape you want back. "Return title and price for the top 5 results" yields structured output.
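
The tips above can be combined into a small prompt template. The sketch below is only an illustration: compose_prompt is a hypothetical helper, and the "Done when" and "Return" wording is one possible phrasing, not a syntax the agent requires. Each call produces one prompt, meant to be sent as its own task.

```bash
#!/bin/sh
# Hypothetical helper: assemble a prompt from a goal, an end state the
# agent can verify, and the shape of the data to return.
# $1 = goal, $2 = end state, $3 = requested return shape
compose_prompt() {
  printf '%s. Done when: %s. Return: %s.\n' "$1" "$2" "$3"
}

# One goal per task: two separate prompts instead of one packed prompt.
compose_prompt \
  "Search the store app for wireless earbuds" \
  "the results list is visible" \
  "title and price for the top 5 results"

compose_prompt \
  "Open Gmail and find the latest unread email" \
  "the email is open on screen" \
  "the subject line"
```

Keeping the end state and the return shape as separate arguments makes it harder to forget either one when writing a new task.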

Real apps throw interruptions. The agent treats popups, cookie banners, permission dialogs, and rating prompts as ordinary screens to pass through, not roadblocks. For tasks that need credentials, store them encrypted per-device so they are only revealed at the moment of use — never put passwords in the prompt itself.

Security tip

Per-device credentials are encrypted at rest and only decrypted at the instant the agent needs to type them. Keep secrets out of prompts and logs by storing them once and referencing them by service name.

Step 4 — Let it learn

After a successful run, Auten extracts a clean plan — the minimal sequence that achieved the goal. The next similar task replays that plan deterministically: instant, and with no AI cost. Your routine automations get faster and cheaper the more you run them.

Troubleshooting common issues

  • "It tapped the wrong thing." Add a disambiguating detail to the prompt, or name the target by its visible label.
  • "It stopped early." Make the success condition explicit so the verifier knows what "done" means.
  • "It cannot log in." Confirm credentials are stored for that device and referenced by the right service name.
  • "An app blocks it." Some hardened apps detect automation; test your specific target before committing to it.

Good first automations

  • Check a number or status inside an app (unread count, balance, order state)
  • Fill and submit a form
  • Navigate to a screen and extract a value
  • Post or send a message
  • Run a daily check and webhook the result somewhere
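
The last item can be sketched with cron and two curl calls. This is a minimal sketch under stated assumptions: AUTEN_API_KEY, DEVICE_SERIAL, and WEBHOOK_URL are variable names chosen here, the file path in the crontab comment is made up, and the response is forwarded unparsed because this guide does not describe its schema. If the endpoint returns only a task identifier rather than the result, a polling step not covered here would be needed.

```bash
#!/bin/sh
# Run one check and forward the raw response body to a webhook.
# $1 = prompt; it is embedded as-is, so it must not contain double quotes
# or backslashes.
run_daily_check() {
  # Send the task; stop if the request fails.
  response=$(curl -sS --fail -X POST https://relay.auten.ai/v1/tasks \
    -H "Authorization: Bearer $AUTEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"device_serial\": \"$DEVICE_SERIAL\", \"prompt\": \"$1\"}") || return 1

  # Forward whatever came back, unparsed, to the webhook.
  curl -sS --fail -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "$response"
}

# Example crontab entry (08:00 every day), assuming this file is saved as
# /opt/auten/daily-check.sh and ends with a call to run_daily_check:
# 0 8 * * * /opt/auten/daily-check.sh
```

A single-purpose prompt such as "Check the unread count in Gmail and return the number" fits this pattern well, since the end state is easy to verify.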

Ready to do it from code instead of curl? See getting started with the @autenai/sdk.

Try Auten

Grab an API key at auten.ai, connect a phone or spin up a hosted virtual device, and send your first natural-language task in minutes. The free tier needs no credit card.
