Clickclickclick: Framework to enable autonomous Android and computer use using any LLM



A framework to enable autonomous Android and computer use using any LLM (local or remote).

click3

create a draft gmail to [email protected] and ask them if they are free for lunch on coming saturday at 1PM. Congratulate on the baby - write one para.

(Demo videos: drafting the Gmail message above; browsing Google Maps to find bus stops.)

start a 3+2 game on lichess

(Demo video: starting a 3+2 game on Lichess.)

Currently supports local models via Ollama (Llama 3.2-Vision), Gemini, and GPT-4o. The code is highly experimental and will evolve in future commits; use at your own risk.

The best results currently come from using GPT-4o/4o-mini as the planner and Gemini Pro/Flash as the finder.

See the model recommendations for other supported combinations.

  1. This project needs adb installed on the local machine where the code is executed.
  2. Enable USB debugging on the Android phone.
  3. Python >= 3.11.
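
As a quick sanity check before running anything, the hedged sketch below verifies that adb is on the PATH, that a device with USB debugging enabled is attached, and that the Python version is at least 3.11. It is only an illustration and not part of the project's code.

```python
# sanity_check.py -- illustrative only, not part of clickclickclick
import shutil
import subprocess
import sys

def check_prerequisites() -> None:
    # The project requires Python >= 3.11
    if sys.version_info < (3, 11):
        raise SystemExit(f"Python 3.11+ required, found {sys.version.split()[0]}")

    # adb must be installed and on PATH
    if shutil.which("adb") is None:
        raise SystemExit("adb not found; install the Android platform-tools first")

    # 'adb devices' should list at least one device with USB debugging enabled
    out = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True).stdout
    devices = [line for line in out.splitlines()[1:] if line.strip().endswith("device")]
    if not devices:
        raise SystemExit("No Android device detected; enable USB debugging and reconnect")
    print(f"OK: {len(devices)} device(s) connected")

if __name__ == "__main__":
    check_prerequisites()
```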

For Raspberry Pi, refer to raspberry-pi.md.

Put your model-specific settings in config/models.yaml and export the keys specified in the YAML file.

(Ensure the OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY environment variable is set.)
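
As a small illustration (not part of the project), you can verify that at least one of these keys is exported before launching a task:

```python
import os

# At least one provider key must be exported before running a task.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

available = [name for name in PROVIDER_KEYS if os.environ.get(name)]
if not available:
    raise SystemExit("Export at least one of: " + ", ".join(PROVIDER_KEYS))
print("Found keys for:", ", ".join(available))
```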

pip install git+https://github.com/BandarLabs/clickclickclick.git

Alternatively, clone the repo and install it:

git clone https://github.com/BandarLabs/clickclickclick
cd clickclickclick
pip install .

Use from the web interface (Gradio):

click3 gradio

(Screenshot: Gradio interface)

By default, the planner is openai and the finder is gemini.

You can change the default configuration in config/models.yaml.
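
To see which defaults are currently in effect, a minimal sketch is shown below (assuming PyYAML is installed and you run it from the repository root; the exact schema of the file is defined by the repository, so the snippet just dumps whatever is there):

```python
# Inspect the current model configuration (requires PyYAML: pip install pyyaml).
import yaml

with open("config/models.yaml") as f:
    config = yaml.safe_load(f)

# Print the configuration as-is so you can see the active planner/finder defaults.
print(yaml.safe_dump(config, sort_keys=False))
```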

Before running any tasks, set the respective keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, and GEMINI_API_KEY) in the environment.

Gemini Flash gives 15 free API calls - https://aistudio.google.com/apikey

To execute a task, use the run command. The basic usage is:

python main.py run <task-prompt>
  • --platform: Specifies the platform to use, either android or osx. Default is android.

    python main.py run "example task" --platform=osx
  • --planner-model: Specifies the planner model to use, either openai, gemini, anthropic or ollama. Default is openai.

    python main.py run "example task" --planner-model=gemini
  • --finder-model: Specifies the finder model to use, either openai, gemini, anthropic or ollama. Default is gemini.

    python main.py run "example task" --finder-model=ollama

A full example command might look like:

python main.py run "Open Google news" --platform=android --planner-model=openai --finder-model=gemini

The POST /execute endpoint executes a task based on the provided task prompt, platform, planner model, and finder model.

Request body:

  • task_prompt (string): The prompt for the task that needs to be executed.
  • platform (string, optional): The platform on which the task is to be executed. Default is "android". Supported platforms: "android", "osx".
  • planner_model (string, optional): The planner model to be used for planning the task. Default is "openai". Supported models: "openai", "gemini", "ollama".
  • finder_model (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".

Responses:

  • 200 OK:
    • result (object): The result of the task execution.
  • 400 Bad Request:
    • detail (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
  • 500 Internal Server Error:
    • detail (string): Description of the error that occurred during task execution.
Example request:

curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"task_prompt": "Open uber app", "platform": "android", "planner_model": "openai", "finder_model": "gemini"}'
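
The same request can be made from Python. The sketch below is illustrative and assumes the API server is already running on localhost:8000 and that the requests library is installed:

```python
# call_execute.py -- illustrative client for the /execute endpoint
import requests

payload = {
    "task_prompt": "Open uber app",
    "platform": "android",      # "android" or "osx"
    "planner_model": "openai",  # "openai", "gemini" or "ollama"
    "finder_model": "gemini",   # "gemini", "openai" or "ollama"
}

response = requests.post("http://localhost:8000/execute", json=payload, timeout=600)
response.raise_for_status()

# A 200 OK response contains a "result" object describing the task execution.
data = response.json()
print(data.get("result", data))
```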

Contributions are welcome! Please begin by opening an issue to discuss your ideas. Once the issue is reviewed and assigned, you can proceed with submitting a pull request.

Roadmap:

  • Enable local models via Ollama on Android
  • Make computer use fully functional

This project is licensed under the MIT License. See the LICENSE file for details.
