Swama is a high-performance machine learning runtime written in pure Swift, designed specifically for macOS and built on Apple's MLX framework. It provides a powerful and easy-to-use solution for local LLM (Large Language Model) and VLM (Vision Language Model) inference.
- 🚀 High Performance: Built on Apple MLX framework, optimized for Apple Silicon
- 🔌 OpenAI Compatible API: Standard /v1/chat/completions endpoint support
- 📱 Menu Bar App: Elegant macOS native menu bar integration
- 💻 Command Line Tools: Complete CLI support for model management and inference
- 🖼️ Multimodal Support: Handles both text and image inputs
- 📦 Smart Model Management: Automatic downloading, caching, and version management
- 🔄 Streaming Responses: Real-time streaming text generation support
- 🌍 HuggingFace Integration: Direct model downloads from HuggingFace Hub
Swama features a modular architecture:
- SwamaKit: Core framework library containing all business logic
- Swama CLI: Command-line tool providing complete model management and inference functionality
- Swama.app: macOS menu bar application with graphical interface and background services
- macOS 14.0 or later
- Apple Silicon (M1/M2/M3)
- Xcode 15.0+ (for compilation)
- Swift 6.1+
1. Download the latest release
   - Go to Releases
   - Download Swama.zip from the latest release
   - Extract the zip file
2. Install the app

   ```bash
   # Move to Applications folder
   mv Swama.app /Applications/

   # Launch the app
   open /Applications/Swama.app
   ```

   Note: On first launch, macOS may show a security warning. If this happens:
   - Go to System Settings > Privacy & Security
   - Click "Open Anyway" next to the Swama app message
   - Or right-click the app and select "Open" from the context menu
3. Install CLI tools
   - Open Swama from the menu bar
   - Click "Install Command Line Tool…" to add the `swama` command to your PATH
For developers who want to build from source:
After installing Swama.app, you can use either the menu bar app or command line:
✨ Smart Features:
- Model Aliases: Use friendly names like qwen3, llama3.2, deepseek-r1 instead of long URLs
- Auto-Download: Models are automatically downloaded on first use - no need to pull first!
- Cache Management: Downloaded models are cached for future use
| Alias | Model | Description |
|-------|-------|-------------|
| qwen3 | mlx-community/Qwen3-8B-4bit | Qwen3 8B (default) |
| qwen3-1.7b | mlx-community/Qwen3-1.7B-4bit | Qwen3 1.7B (lightweight) |
| llama3.2 | mlx-community/Llama-3.2-3B-Instruct-4bit | Llama 3.2 3B (default) |
| llama3.2-1b | mlx-community/Llama-3.2-1B-Instruct-4bit | Llama 3.2 1B (fastest) |
| deepseek-r1 | mlx-community/DeepSeek-R1-0528-4bit | DeepSeek R1 (reasoning) |
| deepseek-coder | mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx | DeepSeek Coder |
| qwen2.5 | mlx-community/Qwen2.5-7B-Instruct-4bit | Qwen 2.5 7B |
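To make the alias-to-repository mapping concrete, here is a small shell sketch that mirrors a few rows of the table above. The real resolver lives inside SwamaKit; this function is purely illustrative.

```shell
# Sketch: resolve a friendly alias to its HuggingFace repository.
# The mapping mirrors the table above; unknown names pass through unchanged,
# so full repository paths keep working alongside aliases.
resolve_model() {
  case "$1" in
    qwen3)       echo "mlx-community/Qwen3-8B-4bit" ;;
    qwen3-1.7b)  echo "mlx-community/Qwen3-1.7B-4bit" ;;
    llama3.2)    echo "mlx-community/Llama-3.2-3B-Instruct-4bit" ;;
    llama3.2-1b) echo "mlx-community/Llama-3.2-1B-Instruct-4bit" ;;
    deepseek-r1) echo "mlx-community/DeepSeek-R1-0528-4bit" ;;
    *)           echo "$1" ;;
  esac
}

resolve_model qwen3
```

Pass-through for unknown names is a common design choice here: it lets one code path accept both short aliases and full `mlx-community/...` identifiers.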
Swama provides a fully OpenAI-compatible API endpoint, allowing you to use it with existing tools and integrations:
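A minimal request against the `/v1/chat/completions` endpoint might look like the sketch below. The server address and port are assumptions; substitute whatever address your Swama instance listens on.

```shell
# Build a minimal OpenAI-style chat completion request.
cat > /tmp/swama_request.json <<'EOF'
{
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "Hello, Swama!" }
  ],
  "stream": false
}
EOF

# Send it to the running server (address/port are assumptions):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/swama_request.json
```

Setting `"stream": true` instead requests real-time streaming output, matching the streaming support listed in the features above.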
Since Swama provides OpenAI-compatible endpoints, you can easily integrate it with popular community tools:
🤖 AI Coding Assistants:
💬 Chat Interfaces:
🔧 Development Tools:
📊 Popular Integrations:
- Langchain/LlamaIndex: Use OpenAI provider with custom base URL
- AutoGen: Configure as OpenAI endpoint for multi-agent conversations
- Semantic Kernel: Add as OpenAI chat completion service
- Flowise/Langflow: Connect via OpenAI node with custom endpoint
- Anything: Any tool that supports the OpenAI API can connect to Swama!
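Many of the tools above need only two environment variables to target a local OpenAI-compatible server instead of api.openai.com. A sketch, where the base URL and port are assumptions and the key is a dummy value (local inference needs no real credentials):

```shell
# Redirect OpenAI-compatible tools to the local Swama server.
export OPENAI_BASE_URL="http://localhost:8080/v1"   # assumed address/port
export OPENAI_API_KEY="swama-local"                 # placeholder; not validated locally
```

Tools that don't read these variables usually expose equivalent "base URL" and "API key" fields in their own configuration.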
Swama supports convenient aliases for popular models. Use these short names instead of full model URLs:
- --temperature <value>: Sampling temperature (0.0-2.0)
- --top-p <value>: Nucleus sampling parameter (0.0-1.0)
- --max-tokens <number>: Maximum number of tokens to generate
- --repetition-penalty <value>: Repetition penalty factor
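These CLI flags correspond to standard sampling fields in a chat-completions payload. A sketch, where the field names follow the common OpenAI schema and the CLI invocation shown in comments is an assumption about the subcommand name:

```shell
# Build a request that sets the sampling parameters listed above.
cat > /tmp/swama_sampling.json <<'EOF'
{
  "model": "qwen3",
  "messages": [{ "role": "user", "content": "Write a haiku about Swift." }],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 128
}
EOF

# Hypothetical equivalent CLI invocation:
# swama run qwen3 --temperature 0.7 --top-p 0.9 --max-tokens 128 \
#   "Write a haiku about Swift."
```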
Swama supports vision language models and can process image inputs:
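Image inputs can be sent using the OpenAI `image_url` content-part format, with the image embedded as a base64 data URL. In this sketch the model alias `your-vlm-alias`, the server address, and the port are all hypothetical placeholders; substitute a vision-capable model you have installed and a real image file.

```shell
# Encode an image as base64 (placeholder bytes here; use a real image file
# in practice, e.g. IMG_B64=$(base64 < photo.png)).
IMG_B64=$(printf 'placeholder-image-bytes' | base64)

# Build a multimodal request mixing a text part and an image part.
cat > /tmp/swama_vision.json <<EOF
{
  "model": "your-vlm-alias",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "What is in this image?" },
      { "type": "image_url",
        "image_url": { "url": "data:image/png;base64,${IMG_B64}" } }
    ]
  }]
}
EOF

# Send to the local server (address/port are assumptions):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" -d @/tmp/swama_vision.json
```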
- swift-nio - High-performance networking framework
- swift-argument-parser - Command-line argument parsing
- mlx-swift - Apple MLX Swift bindings
- mlx-swift-examples - MLX Swift examples and models
We welcome community contributions! Please follow these steps:
- Fork this repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow Swift coding style guidelines
- Add tests for new features
- Update relevant documentation
- Ensure all tests pass
This project is licensed under the MIT License - see the LICENSE file for details.
- Apple MLX team for the excellent machine learning framework
- Swift NIO for high-performance networking support
- All contributors and community members
- TODO
Swama - Bringing the best local AI experience to macOS users 🚀