Show HN: Open-source SDK to generate UI-based CUA tools or RPA scripts on macOS


Sisypho is an automation SDK for macOS that records workflows, executes skills, and automates tasks through desktop accessibility APIs and browser integration. It now includes an MCP (Model Context Protocol) server for exposing custom UI-based tools to computer use agents.

Features
Architecture
Quick Start
Installation
Usage
API Reference
Chrome Extension
Examples
Contributing
License
Contact

Native macOS accessibility API integration
System-wide UI element interaction
Window management and application control
Event recording and playback

Playwright integration with user's Chrome installation
Robust Chrome profile management
Cross-browser compatibility

Workflow Recording & Playback

Real-time action recording for both desktop and browser
JSON-based workflow serialization
Intelligent skill generation from recorded actions
Automated workflow optimization
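
To make "JSON-based workflow serialization" concrete, here is a minimal sketch of a recording round-tripped through JSON. The `RecordedAction` and `RecordedWorkflow` classes and all of their field names are hypothetical and chosen for illustration; Sisypho's actual on-disk schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RecordedAction:
    """One recorded UI event. Field names are hypothetical, not Sisypho's schema."""
    source: str      # "desktop" or "browser"
    kind: str        # e.g. "click" or "type"
    target: str      # element descriptor or CSS selector
    text: str = ""   # text payload for "type" actions


@dataclass
class RecordedWorkflow:
    """A task prompt plus the ordered actions recorded for it (illustrative only)."""
    task_prompt: str
    actions: list[RecordedAction] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses to plain dicts
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, payload: str) -> "RecordedWorkflow":
        data = json.loads(payload)
        actions = [RecordedAction(**item) for item in data["actions"]]
        return cls(task_prompt=data["task_prompt"], actions=actions)
```

Keeping the task prompt next to the raw actions mirrors the `Workflow(recording, task_prompt)` constructor used throughout this README.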

Built-in MCP (Model Context Protocol) server for computer use agents
Automatic workflow-to-tool conversion for AI integration
Custom UI automation tools accessible via MCP protocol
LLM-powered task execution and workflow generation
Adaptive skill execution with error handling (WIP)

Clean Python API with type hints
Comprehensive CLI interface
Modular architecture for easy extension
Rich debugging and logging capabilities

Architecture

Sisypho SDK follows a modular architecture designed for flexibility and extensibility:

    sisypho/
    ├── corelib/              # Core automation utilities
    │   ├── browser.py        # Playwright browser management
    │   ├── os_utils.py       # macOS system utilities
    │   └── user.py           # User interaction helpers
    ├── execution/            # Workflow execution engine
    │   ├── recording.py      # Action recording system
    │   └── skill.py          # Skill execution framework
    ├── integrations/         # Platform-specific integrations
    │   ├── macos/            # macOS accessibility servers
    │   ├── chrome/           # Chrome extension bridge
    │   └── windows/          # Windows support (WIP)
    ├── agentic/              # AI-powered automation
    │   ├── generator.py      # Workflow generation
    │   └── tools.py          # MCP tools and verification
    ├── mcp_server.py         # MCP server for computer use agents
    └── cli.py                # Command-line interface

macOS 11.0+ (required)
Python 3.10+ (required)
Google Chrome (for browser automation)
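
The requirements above can be checked with a short preflight script. This is a sketch using only the standard library; `check_requirements`, `parse_version`, and the Chrome path are assumptions for illustration (the path is the usual default install location, and Chrome may live elsewhere on your machine).

```python
import os
import platform
import sys

# Assumed default install location; adjust if Chrome is installed elsewhere.
CHROME_APP_PATH = "/Applications/Google Chrome.app"


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '12.6.1' into (12, 6, 1)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_requirements() -> dict:
    """Report which of the three documented requirements this machine meets."""
    mac_version = platform.mac_ver()[0]  # empty string on non-macOS systems
    return {
        "macos_11_plus": bool(mac_version) and parse_version(mac_version) >= (11, 0),
        "python_3_10_plus": sys.version_info >= (3, 10),
        "chrome_found": os.path.exists(CHROME_APP_PATH),
    }
```

Calling `check_requirements()` before installing gives a quick yes/no per requirement without importing Sisypho itself.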

For detailed installation instructions and troubleshooting, see INSTALL.md.

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def main():
        task_prompt = "Open Chrome and navigate to GitHub"

        # Record a workflow
        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save()

    # Run the async function
    asyncio.run(main())

Sisypho provides a comprehensive command-line interface for workflow creation and execution:

    # Create a workflow from natural language description
    python -m sisypho create --task "open chrome and type hello"

    # Create with recording enabled
    python -m sisypho create --task "open chrome and type hello" --record

    # Specify output file
    python -m sisypho create --task "download file from website" --output workflow.json

    # Run a saved workflow
    python -m sisypho run --workflow workflow.json

    # Run in interactive mode
    python -m sisypho run --interactive

    # Launch MCP server with workflows from current directory
    python -m sisypho mcp

    # Launch MCP server with workflows from specific directory
    python -m sisypho mcp --workflow-directory ./my-workflows
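
If you want to drive the CLI from a script, one option is to build the argument lists in Python and hand them to `subprocess.run`. The helper names below are hypothetical; the subcommands and flags are only the ones shown above.

```python
import sys


def build_create_command(task, record=False, output=None):
    """Argument list for `python -m sisypho create` (helper name is hypothetical)."""
    command = [sys.executable, "-m", "sisypho", "create", "--task", task]
    if record:
        command.append("--record")
    if output is not None:
        command.extend(["--output", output])
    return command


def build_run_command(workflow_path):
    """Argument list for `python -m sisypho run --workflow <file>`."""
    return [sys.executable, "-m", "sisypho", "run", "--workflow", workflow_path]


def build_mcp_command(workflow_directory=None):
    """Argument list for `python -m sisypho mcp`, optionally with a directory."""
    command = [sys.executable, "-m", "sisypho", "mcp"]
    if workflow_directory is not None:
        command.extend(["--workflow-directory", workflow_directory])
    return command
```

Passing a list such as `build_run_command("workflow.json")` to `subprocess.run(..., check=True)` avoids shell quoting problems with task descriptions that contain spaces or quotes.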

Desktop automation leverages macOS accessibility APIs through natural language workflows:

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def desktop_automation_example():
        task_prompt = "Open TextEdit, create a new document, and type 'Hello, World!'"

        # Record the desktop actions
        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("desktop_automation.json")

    asyncio.run(desktop_automation_example())

Browser automation uses Playwright with your existing Chrome installation through workflow recording:

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def browser_automation_example():
        task_prompt = "Open Chrome, navigate to GitHub, and search for 'sisypho'"

        # Record the browser actions
        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("browser_automation.json")

    asyncio.run(browser_automation_example())

Record user actions for later playback and analysis:

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def recording_example():
        task_prompt = "Record actions for email automation"

        with RecorderContext() as recorder:
            # Perform manual actions - they will be recorded
            await_task_completion()

        # Save the recording
        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        workflow.save("my_workflow.json")

    asyncio.run(recording_example())
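
For readers unfamiliar with the context-manager pattern used here, the following toy recorder shows the general shape: recording is active inside the `with` block and the captured events are read afterwards. `ToyRecorder` is a hypothetical stand-in written for this explanation, not Sisypho's `RecorderContext`, which captures real accessibility and browser events.

```python
import time


class ToyRecorder:
    """Illustrative stand-in for a recording context manager (not Sisypho's code)."""

    def __init__(self):
        self._events = []
        self._active = False

    def __enter__(self):
        self._active = True   # recording starts when the with-block is entered
        return self

    def __exit__(self, exc_type, exc, tb):
        self._active = False  # recording stops when the block exits
        return False          # do not swallow exceptions raised inside the block

    def record(self, kind, target, **details):
        """Append one event; only allowed while the recorder is active."""
        if not self._active:
            raise RuntimeError("recorder is not active")
        self._events.append({"time": time.time(), "kind": kind, "target": target, **details})

    def get_recording(self):
        """Return a copy so callers cannot mutate the recorder's internal list."""
        return list(self._events)
```

In this sketch events are added by calling `record()` by hand; a real recorder would be fed by system event hooks instead.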

Integrate with MCP servers for enhanced AI capabilities:

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def mcp_integration_example():
        # Use natural language to describe complex workflows
        task_prompt = """Open Excel from Downloads, calculate total sales column,
        and paste the result into a new Apple Notes entry."""

        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)

        # Generate optimized code using MCP servers
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("mcp_workflow.json")

    asyncio.run(mcp_integration_example())

MCP Server Setup and Usage

Launch Sisypho as an MCP server to expose your workflows as tools for computer use agents:

    # Start the MCP server
    python -m sisypho mcp --workflow-directory ./workflows

Connect from your computer use agent or MCP client:

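    # Example MCP client, written against the official `mcp` Python SDK
    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def use_sisypho_tools():
        # Launch the Sisypho MCP server as a subprocess and connect over stdio
        server = StdioServerParameters(command="python", args=["-m", "sisypho", "mcp"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # List available workflow tools
                listing = await session.list_tools()
                print(f"Available automation tools: {[tool.name for tool in listing.tools]}")

                # Execute a workflow tool
                result = await session.call_tool("run_workflow_0", {})
                print(f"Workflow result: {result}")

    asyncio.run(use_sisypho_tools())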

Each workflow in your directory becomes an executable tool that computer use agents can call to perform UI automation tasks.

Custom UI Tools for Computer Use Agents

Sisypho SDK lets you create reusable UI automation tools that computer use agents can call over MCP. Here is how workflows become tools:

Workflow-to-Tool Conversion

When you launch the MCP server, Sisypho automatically:

Scans your workflow directory for .json workflow files
Registers each workflow as an MCP tool with its task description
Exposes the tools via the MCP protocol for agent consumption

    # Directory structure
    ./automation-tools/
    ├── gmail_automation.json       # Becomes "run_workflow_0" tool
    ├── slack_notifications.json    # Becomes "run_workflow_1" tool
    └── data_entry.json             # Becomes "run_workflow_2" tool

    # Launch MCP server
    python -m sisypho mcp --workflow-directory ./automation-tools
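
The scan-and-register steps can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Sisypho's implementation: the function name is hypothetical, the `task_prompt` key is an assumed field of the workflow file, and the numbering here follows sorted filename order, which may not match the order Sisypho actually uses.

```python
import json
from pathlib import Path


def discover_workflow_tools(directory):
    """Map each .json file in `directory` to a tool entry (illustrative sketch)."""
    tools = []
    # Assumption: tools are numbered in sorted filename order.
    for index, path in enumerate(sorted(Path(directory).glob("*.json"))):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            data = {}  # unreadable or malformed file: fall back to the file name
        if isinstance(data, dict):
            # Assumption: the task description is stored under "task_prompt".
            description = data.get("task_prompt", path.stem)
        else:
            description = path.stem
        tools.append({
            "name": f"run_workflow_{index}",
            "description": description,
            "path": str(path),
        })
    return tools
```

Because the tool names are positional, adding or removing a workflow file can shift the numbering, so agents should select tools by description rather than by a hard-coded index.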

Agent Integration Examples

Computer use agents can now call your custom UI tools:

    # Agent discovers available tools
    tools = await mcp_client.list_tools()
    # Returns: [
    #   {"name": "run_workflow_0", "description": "Automate Gmail inbox management"},
    #   {"name": "run_workflow_1", "description": "Send Slack status updates"},
    #   {"name": "run_workflow_2", "description": "Fill customer data forms"}
    # ]

    # Agent executes UI automation
    result = await mcp_client.call_tool("run_workflow_0", {})
    # Sisypho performs the recorded Gmail automation workflow
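
Since tool names are generic (`run_workflow_N`), an agent has to choose a tool from its description. In practice the agent's language model makes that choice; the sketch below substitutes a naive word-overlap score just to show the selection step. `pick_tool` is a hypothetical helper, and the tool list reuses the example descriptions above.

```python
def pick_tool(tools, request):
    """Return the tool whose description shares the most words with the request.

    Naive stand-in for an agent's tool selection; returns None if nothing overlaps.
    """
    request_words = set(request.lower().split())
    best, best_score = None, 0
    for tool in tools:
        description_words = set(tool["description"].lower().split())
        score = len(request_words & description_words)
        if score > best_score:
            best, best_score = tool, score
    return best


EXAMPLE_TOOLS = [
    {"name": "run_workflow_0", "description": "Automate Gmail inbox management"},
    {"name": "run_workflow_1", "description": "Send Slack status updates"},
    {"name": "run_workflow_2", "description": "Fill customer data forms"},
]
```

The name of the selected entry is what would then be passed to `call_tool`.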

Create specialized tool libraries for different use cases:

    # Business productivity tools
    ./business-tools/
    ├── calendar_management.json
    ├── email_templates.json
    └── report_generation.json

    # Development workflow tools
    ./dev-tools/
    ├── github_pr_workflow.json
    ├── deployment_checks.json
    └── code_review_automation.json

    # Customer service tools
    ./support-tools/
    ├── ticket_routing.json
    ├── customer_onboarding.json
    └── feedback_collection.json

Each directory becomes a specialized MCP server providing domain-specific UI automation capabilities to computer use agents.
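
One way to wire several tool directories into an agent is to generate a client configuration with one server entry per directory. The sketch below assumes the `mcpServers` layout (`command` plus `args` per server) that some MCP clients, such as Claude Desktop, read from their config file; `build_mcp_config` and the server names are hypothetical, so check your client's documentation for the exact format it expects.

```python
import json


def build_mcp_config(tool_directories):
    """Build an `mcpServers`-style config with one Sisypho server per directory.

    `tool_directories` maps a server name of your choosing to a workflow directory.
    """
    servers = {}
    for name, directory in tool_directories.items():
        servers[name] = {
            "command": "python",
            "args": ["-m", "sisypho", "mcp", "--workflow-directory", directory],
        }
    return {"mcpServers": servers}


EXAMPLE_CONFIG = build_mcp_config({
    "business-tools": "./business-tools",
    "dev-tools": "./dev-tools",
    "support-tools": "./support-tools",
})
```

`json.dumps(EXAMPLE_CONFIG, indent=2)` produces text that can be pasted into the client's configuration file.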

API Reference

RecorderContext - Context manager for recording actions
await_task_completion() - Wait for user to complete manual actions during recording
Workflow(recording, task_prompt) - Main workflow orchestrator
- generate_code() - Generate automation code from recording and prompt
- run_workflow() - Execute the generated workflow
- save(path) - Save workflow to file
- load(path) - Load workflow from file

SkillExecutor - Main class for skill execution
- load_skill_from_file(path) - Load skill from Python file
- execute_skill_code(code) - Execute skill code directly

navigate(url) - Navigate to a URL
click_element(selector) - Click element by CSS selector
type_text(selector, text) - Type text into form field
getContent(rootNode) - Extract content from page

click(app_name, element_descriptor, is_right_click, is_double_click, duration) - Click UI element
type(app_name, text) - Type text in application
command(app_name, element_descriptor, modifier_keys, key) - Press keyboard command
open_app(app_name) - Open application on macOS

Chrome Extension

Install the Sisypho Chrome Extension to enable browser action recording:

Install from Chrome Web Store

The extension enables:

Real-time browser action recording
Seamless integration with Sisypho workflows
Visual feedback during recording sessions
Automatic workflow generation from browser interactions

Example 1: Automated Web Form Filling

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def web_form_automation():
        task_prompt = """Navigate to example.com contact form,
        fill in name as 'John Doe', email as '[email protected]',
        message as 'Hello from Sisypho!', and submit the form."""

        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("web_form_automation.json")

    asyncio.run(web_form_automation())

Example 2: Desktop Application Automation

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def desktop_app_automation():
        task_prompt = """Open TextEdit, create a new document,
        type 'Automated document creation with Sisypho!',
        save the document as 'automated_document.txt'."""

        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("desktop_automation.json")

    asyncio.run(desktop_app_automation())

Example 3: Complex Multi-App Workflow

    import asyncio
    from sisypho.utils import RecorderContext, await_task_completion, Workflow

    async def multi_app_workflow():
        task_prompt = """Open Excel from Downloads, calculate the total sales column,
        and paste the result into a new Apple Notes entry."""

        with RecorderContext() as recorder:
            await_task_completion()

        recording = recorder.get_recording()
        workflow = Workflow(recording, task_prompt)
        await workflow.generate_code()
        result = workflow.run_workflow()
        workflow.save("multi_app_workflow.json")
        print("Multi-app workflow executed successfully!")

    asyncio.run(multi_app_workflow())

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.