Superclass is a powerful document analysis tool that combines advanced text extraction with AI-powered classification. It supports multiple document formats and provides both a CLI and an HTTP server interface.

Supported document formats:
- PDF documents
- Microsoft Office (DOCX, XLSX, PPTX)
- OpenDocument (ODT)
- Images (with OCR)
- SVG files (with text extraction)
- HTML files
- Markdown files
- EPUB ebooks
- RTF documents
- Plain text files

Key features:
- Multiple AI providers supported:
  - OpenAI (GPT-4, GPT-3.5)
  - Anthropic (Claude)
  - Azure OpenAI
- Classification features:
  - Category detection
  - Predefined categories support
  - Confidence scoring
  - Content summarization
  - Keyword extraction
- Model comparison capabilities
- Command-line interface
- HTTP server mode
- Docker support
The image is available on GitHub Container Registry:
```bash
# Basic usage
docker pull ghcr.io/adaptive-scale/superclass:latest

# Run with minimal configuration
docker run -p 8083:8083 \
  -e OPENAI_API_KEY=your_openai_key \
  ghcr.io/adaptive-scale/superclass:latest

# Run with common configuration
docker run -p 8083:8083 \
  -e PORT=8083 \
  -e LOG_LEVEL=debug \
  -e MODEL_TYPE=gpt-4 \
  -e MODEL_PROVIDER=openai \
  -e MAX_COST=0.1 \
  -e MAX_LATENCY=30 \
  -e OPENAI_API_KEY=your_openai_key \
  -v /path/to/local/uploads:/tmp/superclass-uploads \
  ghcr.io/adaptive-scale/superclass:latest

# Using environment file
docker run -p 8083:8083 \
  --env-file .env \
  ghcr.io/adaptive-scale/superclass:latest
```
For all available environment variables and their descriptions, see the Configuration section.
Supported architectures:
- linux/amd64 (x86_64)
- linux/arm64 (Apple Silicon, AWS Graviton)

To build from source you will need:
- Go 1.19 or later
- Docker with buildx support (for multi-arch builds)
- Make
- Tesseract OCR (for image support)
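
As a rough sketch, on a Debian or Ubuntu host the prerequisites can be installed along these lines (the package names are assumptions for that distribution; if the packaged Go is older than 1.19, install it from go.dev instead):

```bash
# Example prerequisite setup on Debian/Ubuntu (package names are assumptions
# for that distribution; adjust for your platform)
sudo apt-get update
sudo apt-get install -y golang-go make tesseract-ocr

# Docker Engine plus the buildx plugin is only needed for multi-arch image
# builds; see https://docs.docker.com/engine/install/ for installation steps
```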
Build and release commands:

```bash
# Build local binary
make build

# Run tests
make test

# Build and push multi-arch Docker image
export GITHUB_TOKEN=your_github_token
export GITHUB_USER=your_github_username
make docker-login
make docker-buildx

# Create a release
VERSION=v1.0.0 make release
```
Available make targets:

```bash
make help   # Show all available targets
```
Common targets:
- make build: Build local binary
- make test: Run tests
- make docker-build: Build Docker image for local architecture
- make docker-buildx: Build and push multi-arch Docker images
- make release VERSION=v1.0.0: Create and push a new release
Environment variables:
- REGISTRY: Container registry (default: ghcr.io)
- REPOSITORY: Image repository (default: adaptive-scale/superclass)
- TAG: Image tag (default: latest)
- PLATFORMS: Target platforms (default: linux/amd64,linux/arm64)
- GITHUB_TOKEN: GitHub personal access token
- GITHUB_USER: GitHub username
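
As a sketch of how the variables and targets combine, the following builds and pushes a multi-arch image to a custom repository. The values are placeholders, and it assumes the Makefile reads these variables from the environment, as the VERSION=v1.0.0 make release example above suggests:

```bash
# Hypothetical values: push a multi-arch image to your own repository
export GITHUB_TOKEN=your_github_token
export GITHUB_USER=your_github_username

REGISTRY=ghcr.io \
REPOSITORY=your-org/superclass \
TAG=v1.2.3 \
PLATFORMS=linux/amd64,linux/arm64 \
make docker-login docker-buildx
```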
Classify a document:

```bash
# Basic classification
curl -X POST -F "file=@/path/to/document.pdf" http://localhost:8083/classify
```

Response:

```json
{
  "category": "Technical Documentation",
  "confidence": 0.95,
  "summary": "This document describes...",
  "keywords": ["keyword1", "keyword2"],
  "raw_text": "Optional extracted text..."
}
```
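
The endpoint is straightforward to script against. The sketch below classifies every PDF in a directory and prints the detected category; it assumes a server running locally on the default port 8083 and the jq CLI for JSON parsing:

```bash
# Classify each PDF in ./documents and print "file: category (confidence)".
# Assumes a local Superclass server on the default port and that jq is installed.
for f in ./documents/*.pdf; do
  result=$(curl -s -X POST -F "file=@${f}" http://localhost:8083/classify)
  category=$(echo "$result" | jq -r '.category')
  confidence=$(echo "$result" | jq -r '.confidence')
  echo "${f}: ${category} (${confidence})"
done
```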

Health check endpoint:

```bash
curl http://localhost:8083/health
```

Server configuration:
- PORT: Server port (default: 8083)
- UPLOAD_DIR: Directory for temporary file uploads (default: /tmp/superclass-uploads)
- LOG_LEVEL: Logging level (default: debug)

Model configuration:
- MODEL_TYPE: AI model to use (default: gpt-4)
- MODEL_PROVIDER: AI provider to use (default: openai)
- MAX_COST: Maximum cost per request (default: 0.1)
- MAX_LATENCY: Maximum latency in seconds (default: 30)

Classification configuration:
- PREDEFINED_CATEGORIES: Comma-separated list of allowed categories (e.g., "Technology,Business,Science")
- ENFORCE_CATEGORIES: Whether to strictly enforce predefined categories (default: false)

API keys:
- OPENAI_API_KEY: OpenAI API key for GPT models
- ANTHROPIC_API_KEY: Anthropic API key for Claude models
- AZURE_OPENAI_API_KEY: Azure OpenAI API key for Azure deployments

Build and release configuration:
- REGISTRY: Container registry (default: ghcr.io)
- REPOSITORY: Image repository (default: adaptive-scale/superclass)
- TAG: Image tag (default: latest)
- PLATFORMS: Target platforms for multi-arch builds (default: linux/amd64,linux/arm64)
- GITHUB_TOKEN: GitHub personal access token for GHCR authentication
- GITHUB_USER: GitHub username for GHCR authentication
- VERSION: Version tag for releases (e.g., v1.0.0)

Example .env file:

```bash
# Server Configuration
PORT=8083
LOG_LEVEL=debug
UPLOAD_DIR=/tmp/superclass-uploads

# Model Configuration
MODEL_TYPE=gpt-4
MODEL_PROVIDER=openai
MAX_COST=0.1
MAX_LATENCY=30

# Classification Configuration
PREDEFINED_CATEGORIES=Technology,Business,Science,Health,Entertainment
ENFORCE_CATEGORIES=true

# API Keys
OPENAI_API_KEY=your_openai_key
# ANTHROPIC_API_KEY=your_anthropic_key
# AZURE_OPENAI_API_KEY=your_azure_key
```
Example Docker Compose environment:

```yaml
services:
  superclass:
    environment:
      # Server Configuration
      - PORT=8083
      - LOG_LEVEL=debug
      # Model Configuration
      - MODEL_TYPE=gpt-4
      - MODEL_PROVIDER=openai
      - MAX_COST=0.1
      - MAX_LATENCY=30
      # Classification Configuration
      - PREDEFINED_CATEGORIES=Technology,Business,Science
      - ENFORCE_CATEGORIES=true
      # API Keys
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      # - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      # - AZURE_OPENAI_API_KEY=${AZURE_OPENAI_API_KEY}
```

Available models:
- OpenAI:
  - gpt-4
  - gpt-4-turbo
  - gpt-3.5-turbo
- Anthropic:
  - claude-3-opus
  - claude-3-sonnet
  - claude-3-haiku
- Azure OpenAI (depends on your deployment)
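
For example, switching from OpenAI to Anthropic is just a matter of environment variables. The snippet below assumes the provider string is the lowercase anthropic and that the model identifier is accepted exactly as listed above:

```bash
# Run against Anthropic Claude instead of OpenAI (provider/model strings are
# assumed to match the lists above; adjust if your release expects different names)
docker run -p 8083:8083 \
  -e MODEL_PROVIDER=anthropic \
  -e MODEL_TYPE=claude-3-sonnet \
  -e ANTHROPIC_API_KEY=your_anthropic_key \
  ghcr.io/adaptive-scale/superclass:latest
```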
When using predefined categories:
- Set PREDEFINED_CATEGORIES to a comma-separated list of categories
- Optionally set ENFORCE_CATEGORIES=true to ensure only predefined categories are returned
- Categories can also be specified per-request in the API call (see the sketch after the examples below)

Example using predefined categories:

```bash
# Using environment variables
export PREDEFINED_CATEGORIES="Technology,Business,Science,Health,Entertainment"
export ENFORCE_CATEGORIES=true
docker-compose up
```

Or in docker-compose.yml:

```yaml
services:
  superclass:
    environment:
      - PREDEFINED_CATEGORIES=Technology,Business,Science
      - ENFORCE_CATEGORIES=true
```
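
Per-request categories are mentioned above, but the exact request field is not documented in this section. The snippet below uses a hypothetical categories form field purely as an illustration; check the API reference before relying on it:

```bash
# Hypothetical sketch only: the real field name for per-request categories may differ
curl -X POST \
  -F "file=@/path/to/document.pdf" \
  -F "categories=Technology,Business,Science" \
  http://localhost:8083/classify
```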

Requirements:
- Go 1.19 or later
- Tesseract OCR (for image support)
- Required dependencies:
Build options:
- --dev: Development build
- --race: Enable race condition detection
- --debug: Include debug information

Contributions are welcome:
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.