Apple-on-device-OpenAI: OpenAI-compatible API server for Apple on-device models


A SwiftUI application that creates an OpenAI-compatible API server using Apple's on-device Foundation Models. This allows you to use Apple Intelligence models locally through familiar OpenAI API endpoints.

Use it in any OpenAI-compatible app. It offers:

OpenAI-Compatible API: Drop-in replacement for the OpenAI API with a chat completions endpoint
Streaming Support: Real-time streaming responses compatible with OpenAI's streaming format
On-Device Processing: Uses Apple's Foundation Models for completely local AI processing
Model Availability Check: Automatically checks Apple Intelligence availability on startup
🚧 Tool Use (WIP): Function calling capabilities for extended AI functionality

Why a GUI App Instead of Command Line?

This project is implemented as a GUI application rather than a command-line tool due to Apple's rate limiting policies for Foundation Models:

"An app that has UI and runs in the foreground doesn't have a rate limit when using the models; a macOS command line tool, which doesn't have UI, does."

— Apple DTS Engineer (Source)

Command-line tools hit rate limits very quickly (after around 150 requests), while GUI applications running in the foreground are not subject to that limit. This makes the GUI approach essential for any serious use of Apple's on-device models.

⚠️ Important Note: You may still encounter rate limits due to current limitations in Apple FoundationModels. If you experience rate limiting, please restart the server.


macOS: 26.0+ (macOS 26 beta required)
Apple Intelligence: Must be enabled in Settings > Apple Intelligence & Siri
Xcode: 26.0+ (for building)

Clone the repository:

git clone https://github.com/yourusername/AppleOnDeviceOpenAI.git
cd AppleOnDeviceOpenAI

Open the project in Xcode:

open AppleOnDeviceOpenAI.xcodeproj

Build and run the project in Xcode

Launch the app
Configure server settings (default: 127.0.0.1:11535)
Click "Start Server"
Server will be available at the configured address

Once the server is running, you can access these OpenAI-compatible endpoints:

GET /health - Health check
GET /status - Model availability and status
GET /v1/models - List available models
POST /v1/chat/completions - Chat completions (streaming and non-streaming)
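The three GET endpoints above can be checked from a script before sending any chat requests. The following is a minimal sketch using only the Python standard library. The helper names (`endpoint_url`, `probe`, `probe_all`) are illustrative and not part of the project. The response bodies are printed raw because their exact JSON shape is not specified here.

```python
import urllib.error
import urllib.request

# Default address from the app's server settings; change it if you reconfigured the server.
BASE_URL = "http://127.0.0.1:11535"
GET_ENDPOINTS = ["/health", "/status", "/v1/models"]


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def probe(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> str:
    """Send a GET request and return a one-line summary; network errors are reported, not raised."""
    url = endpoint_url(path, base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return f"{path}: HTTP {resp.status} {body[:200]}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return f"{path}: HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening at the address, or the connection timed out.
        return f"{path}: unreachable ({exc})"


def probe_all(base_url: str = BASE_URL) -> list:
    """Probe every GET endpoint listed above and return the summaries."""
    return [probe(path, base_url) for path in GET_ENDPOINTS]
```

With the server started, `print("\n".join(probe_all()))` prints one summary line per endpoint. If every line reports "unreachable", the server is probably not running or is listening on a different address.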

curl -X POST http://127.0.0.1:11535/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple-on-device",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "stream": false
  }'

Using OpenAI Python client:

from openai import OpenAI

# Point to your local server
client = OpenAI(
    base_url="http://127.0.0.1:11535/v1",
    api_key="not-needed"  # API key not required for local server
)

response = client.chat.completions.create(
    model="apple-on-device",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    temperature=0.7,
    stream=True  # Enable streaming
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
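Clients that do not use the OpenAI SDK have to decode the stream themselves. In OpenAI's streaming format, each chunk arrives as a server-sent-event line of the form `data: {json}`, and the stream ends with `data: [DONE]`. The sketch below assumes this server follows that format exactly, which has not been verified against its source. The function name `iter_content_deltas` and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # "delta" may omit "content" (e.g. the first chunk only carries the role).
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # prints: Hello
```

The function accepts any iterable of text lines, so it can be fed a decoded HTTP response line by line as well as a list like the sample.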

You can use the included test script to verify that the server is working correctly and to see example usage patterns.

The test script will:

✅ Check server health and connectivity
✅ Verify model availability and status
✅ Test OpenAI SDK compatibility
✅ Run multi-turn conversations
✅ Test multilingual support (Chinese)
✅ Demonstrate streaming functionality

Make sure the server is running before executing the test script. The script provides comprehensive examples of how to interact with the API using both direct HTTP requests and the OpenAI Python SDK.
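For readers who want to try a direct HTTP request without the SDK, here is a minimal sketch using only the Python standard library. It is not the included test script. The helper names are illustrative, and the response is assumed to have the standard OpenAI shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Default address from the app's server settings; adjust if you changed it.
BASE_URL = "http://127.0.0.1:11535"


def build_chat_request(messages, base_url=BASE_URL, model="apple-on-device", temperature=0.7):
    """Build a non-streaming POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": False,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(messages, **kwargs) -> str:
    """Send the conversation so far and return the assistant's reply (needs a running server)."""
    request = build_chat_request(messages, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_reply(json.load(resp))
```

For a multi-turn conversation, keep a `history` list, call `reply = chat(history)`, then append `{"role": "assistant", "content": reply}` and the next user message before calling `chat` again. Under the OpenAI convention, the client resends the full message list with every request.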

This server implements the OpenAI Chat Completions API with the following supported parameters: