This project provides a reverse proxy for GitHub Copilot, exposing OpenAI-compatible endpoints for use with tools and clients that expect the OpenAI API. It follows the authentication and token management approach used by OpenCode.
- OAuth Device Flow Authentication: Secure authentication with GitHub Copilot using the same flow as OpenCode
- Advanced Token Management:
- Proactive token refresh (refreshes when 20% of the token lifetime remains, at least 5 minutes before expiry)
- Exponential backoff retry logic for failed token refreshes
- Automatic fallback to full re-authentication when needed
- Detailed token status monitoring
- Robust Request Handling:
- Automatic retry with exponential backoff for chat completions (3 attempts)
- Network error recovery and rate limiting handling
- 30-second request timeout protection
- OpenAI-Compatible API: Exposes /v1/chat/completions and /v1/models endpoints
- Request/Response Transformation: Handles model name mapping and ensures OpenAI compatibility
- Configurable Port: Default port 8081, configurable via CLI or config file
- Health Monitoring: /health endpoint for service monitoring
- Graceful Shutdown: Proper signal handling and graceful server shutdown
- Comprehensive Logging: Request/response logging for debugging and monitoring
- Enhanced CLI Commands: Status monitoring, manual token refresh, and detailed configuration display
- Production-Ready Performance: HTTP connection pooling, circuit breaker, request coalescing, and memory optimization
- Monitoring & Profiling: Built-in pprof endpoints for memory, CPU, and goroutine analysis
Pre-built binaries are available for each release on the Releases page.
Available platforms:
- Linux: AMD64, ARM64
- macOS: AMD64 (Intel), ARM64 (Apple Silicon)
- Windows: AMD64, ARM64
Releases are automatically created when code is merged to the main branch:
- Version numbers follow semantic versioning (starting from v0.0.1)
- Cross-platform binaries are built and attached to each release
- Release notes include download links for all supported platforms
This service includes enterprise-grade performance optimizations:
- Connection Pooling: Shared HTTP client with configurable connection limits (100 max idle, 20 per host)
- Configurable Timeouts: Fully customizable timeout settings via config.json for all server operations
- Streaming Support: Read (30s), Write (300s), and Idle (120s) timeouts optimized for AI chat streaming
- Long Response Handling: HTTP client and proxy context timeouts support up to 300s (5 minutes) for extended AI conversations
- Request Limits: 5MB request body size limit to prevent memory exhaustion
- Advanced Transport: Configurable dial timeout (10s), TLS handshake timeout (10s), keep-alive (30s)
- Circuit Breaker: Automatic failure detection and recovery (5 failure threshold, 30s timeout)
- Context Propagation: Request contexts with 25s timeout and proper cancellation
- Request Coalescing: Deduplicates identical concurrent requests to models endpoint
- Exponential Backoff: Enhanced retry logic with circuit breaker integration
- Worker Pool: Concurrent request processing with dedicated worker goroutines (CPU*2 workers)
- Buffer Pooling: sync.Pool for request/response buffer reuse to reduce GC pressure (see the sketch after this list)
- Memory Optimization: Streaming support with 32KB buffers for large responses
- Graceful Shutdown: Proper resource cleanup and coordinated shutdown with worker pool termination
- Shared Clients: Centralized HTTP client eliminates resource duplication
- Worker Pool Management: Automatic worker lifecycle management with graceful termination
- Profiling Endpoints: /debug/pprof/* for memory, CPU, and goroutine analysis
- Enhanced Logging: Circuit breaker state, request coalescing, worker pool metrics, and performance data
- Health Monitoring: Detailed /health endpoint for load balancer integration
- Production Metrics: Built-in support for operational monitoring and worker pool status
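The buffer pooling and 32KB streaming buffers mentioned above can be sketched in a few lines of Go. This is illustrative only; the package and function names are assumptions, not the service's actual internals:

```go
package stream

import (
	"io"
	"sync"
)

// bufPool hands out reusable 32KB buffers so large streaming responses do not
// allocate a fresh buffer per request, which keeps GC pressure low.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 32*1024)
		return &b
	},
}

// copyStream relays an upstream response body to the client using a pooled buffer.
func copyStream(dst io.Writer, src io.Reader) (int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	return io.CopyBuffer(dst, src, *bp)
}
```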
If you have make installed, you can build, run, and test the project easily:
| Command | Description |
|---------|-------------|
| run | Start the proxy server |
| auth | Authenticate with GitHub Copilot using device flow |
| status | Show detailed authentication and token status |
| config | Display current configuration details |
| models | List all available AI models |
| refresh | Manually force token refresh |
| version | Show version information |
| help | Show usage information |
The status command provides detailed authentication and token status.
Status indicators:
- ✅ Token is healthy: Token has plenty of time remaining
- ⚠️ Token will be refreshed soon: Token is approaching refresh threshold
- ❌ Token needs refresh: Token has expired or will expire very soon
Once running, the proxy exposes these OpenAI-compatible endpoints:
- POST /v1/chat/completions — chat completions (streaming and non-streaming)
- GET /v1/models — list available models
- GET /health — health check for service monitoring
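For example, a minimal Go client can post an OpenAI-style chat completion request to the local proxy. The default port (8081) and model name come from this README; any OpenAI-compatible client or SDK can be pointed at the same base URL:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Build a minimal OpenAI-style chat completion request.
	body, _ := json.Marshal(map[string]any{
		"model": "gpt-4o",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello from the Copilot proxy!"},
		},
	})

	// Send it to the local proxy on the default port.
	resp, err := http.Post("http://localhost:8081/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print the raw OpenAI-compatible response.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```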
The proxy implements proactive token management to minimize authentication interruptions:
- Proactive Refresh: Tokens are refreshed when 20% of their lifetime remains (typically 5-6 minutes before expiration for 25-minute tokens)
- Retry Logic: Failed token refreshes are retried up to 3 times with exponential backoff (2s, 8s, 18s delays)
- Fallback Authentication: If token refresh fails completely, the system falls back to full device flow re-authentication
- Background Monitoring: Token status is continuously monitored during API requests
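A rough Go sketch of the refresh decision described above. The function and field names are illustrative, not the service's actual code, and the 5-minute floor reflects one reading of the "minimum 5 minutes" rule:

```go
package token

import "time"

// needsRefresh reports whether the Copilot token should be proactively
// refreshed: once 20% of its original lifetime remains, but never later than
// 5 minutes before expiry. For a typical 25-minute token the threshold works
// out to 5 minutes before expiration.
func needsRefresh(issuedAt, expiresAt, now time.Time) bool {
	lifetime := expiresAt.Sub(issuedAt)
	threshold := lifetime / 5 // 20% of the lifetime
	if threshold < 5*time.Minute {
		threshold = 5 * time.Minute
	}
	return now.After(expiresAt.Add(-threshold))
}
```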
Chat completion requests are automatically retried to handle transient failures:
- Automatic Retries: Up to 3 attempts for failed requests
- Smart Retry Logic: Only retries on network errors, server errors (5xx), rate limiting (429), and timeouts (408)
- Exponential Backoff: Retry delays of 1s, 4s, 9s to avoid overwhelming the API
- Timeout Protection: 30-second timeout per request attempt
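A compilable sketch of this retry behaviour. The helper names are illustrative and the actual implementation may differ, but the retry conditions, quadratic backoff, and per-attempt timeout follow the list above:

```go
package retry

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// retryable reports whether an attempt is worth repeating: transport errors,
// 5xx server errors, 429 rate limiting, or 408 timeouts.
func retryable(resp *http.Response, err error) bool {
	if err != nil {
		return true
	}
	return resp.StatusCode >= 500 ||
		resp.StatusCode == http.StatusTooManyRequests ||
		resp.StatusCode == http.StatusRequestTimeout
}

// postWithRetry makes up to 3 attempts with quadratic backoff (1s, 4s, 9s, ...)
// and a 30-second timeout per attempt.
func postWithRetry(url string, body []byte) (*http.Response, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	var lastErr error
	for attempt := 1; attempt <= 3; attempt++ {
		resp, err := client.Post(url, "application/json", bytes.NewReader(body))
		if !retryable(resp, err) {
			return resp, err // success or a non-retryable status
		}
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("upstream returned %s", resp.Status)
			resp.Body.Close()
		}
		if attempt < 3 {
			time.Sleep(time.Duration(attempt*attempt) * time.Second)
		}
	}
	return nil, fmt.Errorf("request failed after 3 attempts: %w", lastErr)
}
```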
The configuration is stored in ~/.local/share/github-copilot-svcs/config.json:
- port: Server port (default: 8081)
- github_token: GitHub OAuth token for Copilot access
- copilot_token: GitHub Copilot API token
- expires_at: Unix timestamp when the Copilot token expires
- refresh_in: Seconds until token should be refreshed (typically 1500 = 25 minutes)
All timeout values are specified in seconds and have sensible defaults:
| Timeout | Default (seconds) | Description |
|---------|-------------------|-------------|
| http_client | 300 | HTTP client timeout for outbound requests to GitHub Copilot API |
| server_read | 30 | Server timeout for reading incoming requests |
| server_write | 300 | Server timeout for writing responses (increased for streaming) |
| server_idle | 120 | Server timeout for idle connections |
| proxy_context | 300 | Request context timeout for proxy operations |
| circuit_breaker | 30 | Circuit breaker recovery timeout when API is failing |
| keep_alive | 30 | TCP keep-alive timeout for HTTP connections |
| tls_handshake | 10 | TLS handshake timeout |
| dial_timeout | 10 | Connection dial timeout |
| idle_conn_timeout | 90 | Idle connection timeout in connection pool |
Streaming Support: The service is optimized for long-running streaming chat completions with timeouts up to 300 seconds (5 minutes) to support extended AI conversations.
Custom Configuration: You can copy config.example.json as a starting point and modify the timeout values to suit your environment.
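For reference, here is a Go sketch of the shape of config.json implied by the fields and timeouts above. The struct names and the nested "timeouts" object are assumptions; the service's actual types may differ:

```go
package config

// Config mirrors ~/.local/share/github-copilot-svcs/config.json.
type Config struct {
	Port         int      `json:"port"`          // server port, default 8081
	GitHubToken  string   `json:"github_token"`  // GitHub OAuth token for Copilot access
	CopilotToken string   `json:"copilot_token"` // GitHub Copilot API token
	ExpiresAt    int64    `json:"expires_at"`    // Unix timestamp of Copilot token expiry
	RefreshIn    int64    `json:"refresh_in"`    // seconds until refresh (typically 1500)
	Timeouts     Timeouts `json:"timeouts"`      // assumed nesting; see the timeout table above
}

// Timeouts holds the per-operation timeout settings, all in seconds.
type Timeouts struct {
	HTTPClient      int `json:"http_client"`
	ServerRead      int `json:"server_read"`
	ServerWrite     int `json:"server_write"`
	ServerIdle      int `json:"server_idle"`
	ProxyContext    int `json:"proxy_context"`
	CircuitBreaker  int `json:"circuit_breaker"`
	KeepAlive       int `json:"keep_alive"`
	TLSHandshake    int `json:"tls_handshake"`
	DialTimeout     int `json:"dial_timeout"`
	IdleConnTimeout int `json:"idle_conn_timeout"`
}
```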
The authentication follows GitHub Copilot's OAuth device flow:
- Device Authorization: Generates a device code and user code
- User Authorization: User visits GitHub and enters the user code
- Token Exchange: Polls for GitHub OAuth token
- Copilot Token: Exchanges GitHub token for Copilot API token
- Automatic Refresh: Refreshes Copilot token as needed
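A condensed Go sketch of steps 1–3 using GitHub's documented device-flow endpoints. The client ID is a placeholder, and the Copilot token exchange (step 4) and refresh (step 5) are omitted because they use Copilot-internal APIs:

```go
package auth

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// postForm posts form data to a GitHub OAuth endpoint and decodes the JSON reply.
func postForm(endpoint string, form url.Values, out any) error {
	req, err := http.NewRequest("POST", endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("Accept", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return json.NewDecoder(resp.Body).Decode(out)
}

// DeviceFlow requests a device/user code, asks the user to authorize on
// github.com, and polls until GitHub issues an OAuth token.
func DeviceFlow(clientID string) (string, error) {
	var dc struct {
		DeviceCode      string `json:"device_code"`
		UserCode        string `json:"user_code"`
		VerificationURI string `json:"verification_uri"`
		Interval        int    `json:"interval"`
	}
	if err := postForm("https://github.com/login/device/code",
		url.Values{"client_id": {clientID}}, &dc); err != nil {
		return "", err
	}
	fmt.Printf("Visit %s and enter code %s\n", dc.VerificationURI, dc.UserCode)

	for {
		time.Sleep(time.Duration(dc.Interval) * time.Second)
		var tok struct {
			AccessToken string `json:"access_token"`
			Error       string `json:"error"`
		}
		if err := postForm("https://github.com/login/oauth/access_token", url.Values{
			"client_id":   {clientID},
			"device_code": {dc.DeviceCode},
			"grant_type":  {"urn:ietf:params:oauth:grant-type:device_code"},
		}, &tok); err != nil {
			return "", err
		}
		if tok.AccessToken != "" {
			return tok.AccessToken, nil
		}
		if tok.Error != "authorization_pending" && tok.Error != "slow_down" {
			return "", fmt.Errorf("device flow failed: %s", tok.Error)
		}
	}
}
```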
The proxy automatically maps common model names to GitHub Copilot models:
| Requested model(s) | Mapped to | Provider |
|--------------------|-----------|----------|
| gpt-4o, gpt-4.1 | As specified | OpenAI |
| o3, o3-mini, o4-mini | As specified | OpenAI |
| claude-3.5-sonnet, claude-3.7-sonnet, claude-3.7-sonnet-thought | As specified | Anthropic |
| claude-opus-4, claude-sonnet-4 | As specified | Anthropic |
| gemini-2.5-pro, gemini-2.0-flash-001 | As specified | Google |
Supported Model Categories:
- OpenAI GPT Models: GPT-4o, GPT-4.1, O3/O4 reasoning models
- Anthropic Claude Models: Claude 3.5/3.7 Sonnet variants, Claude Opus/Sonnet 4
- Google Gemini Models: Gemini 2.0/2.5 Pro and Flash models
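As a sketch, the supported names from the table can be checked before a request is forwarded. This is illustrative only; the proxy's actual mapping logic may differ:

```go
package models

// copilotModels lists the model IDs from the table above; requests using one
// of these names are forwarded to GitHub Copilot as specified.
var copilotModels = map[string]bool{
	"gpt-4o": true, "gpt-4.1": true,
	"o3": true, "o3-mini": true, "o4-mini": true,
	"claude-3.5-sonnet": true, "claude-3.7-sonnet": true, "claude-3.7-sonnet-thought": true,
	"claude-opus-4": true, "claude-sonnet-4": true,
	"gemini-2.5-pro": true, "gemini-2.0-flash-001": true,
}

// Supported reports whether a requested model name can be passed through.
func Supported(name string) bool { return copilotModels[name] }
```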
- Tokens are stored securely in the user's home directory with restricted permissions (0700)
- All communication with GitHub Copilot uses HTTPS
- No sensitive data is logged
- Automatic token refresh prevents long-lived token exposure
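As a sketch of the first point, the token directory can be created with owner-only permissions before anything is written to it. The path comes from this README; the helper below is illustrative:

```go
package storage

import (
	"os"
	"path/filepath"
)

// configDir returns ~/.local/share/github-copilot-svcs, creating it with 0700
// so only the owning user can read the stored tokens.
func configDir() (string, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	dir := filepath.Join(home, ".local", "share", "github-copilot-svcs")
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", err
	}
	return dir, nil
}
```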
Apache License 2.0 - see LICENSE file for details.
This is free software: you are free to change and redistribute it under the terms of the Apache 2.0 license.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review the logs for error messages
- Open an issue with detailed information about your setup and the problem