We trained an SLM assistant for automatic Python documentation: a 0.6B-parameter Qwen3 model that generates complete, properly formatted Google-style docstrings for your code. Run it locally and keep your proprietary code secure!
First, install Ollama, following the instructions on their website.
Then set up the virtual environment:
Available models are hosted on Hugging Face:
Finally, download the models from Hugging Face and build them locally:
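A minimal sketch of the download step using the huggingface_hub client; the repo id and local directory below are placeholders for the models listed above:

```python
# Sketch: fetch tuned model weights from Hugging Face.
# "your-org/qwen3-0.6b-docstring" is a placeholder; use the repo listed above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/qwen3-0.6b-docstring",
    local_dir="models/qwen3-0.6b-docstring",
)
```

Building the model locally then typically means registering the downloaded weights with Ollama via a Modelfile (`ollama create <name> -f Modelfile`).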
Next, we load the model and your Python file. By default, we load the downloaded Qwen3 0.6B model and generate Google-style docstrings.
The tool generates an updated file with a _documented suffix (e.g., your_script_documented.py).
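Under the hood, generation amounts to prompting the locally served model with a function's source. Here is a minimal sketch using the ollama Python client; the model tag and prompt wording are assumptions, not the tool's exact internals:

```python
# Sketch: ask the locally served model for a Google-style docstring.
# "qwen3-docstring" is a placeholder for the name you gave the model in Ollama.
import ollama

function_source = """
def clamp(value, low, high):
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))
"""

response = ollama.chat(
    model="qwen3-docstring",
    messages=[{
        "role": "user",
        "content": "Write a Google-style docstring for this function:\n" + function_source,
    }],
)
print(response["message"]["content"])
```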
The assistant can generate docstrings for:
- Functions: Complete parameter descriptions, return values, and raised exceptions
- Methods: Instance and class method documentation with proper formatting. The tool skips double underscore (dunder: __xxx) methods.
Feel free to run the examples yourself using the files in examples.
Simply provide any Python file with functions or classes that need documentation:
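For example, a small undocumented file like this is a valid input (the code here is illustrative):

```python
# example_input.py: a typical undocumented file the tool can process.
class RateLimiter:
    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = []

    def allow(self, timestamp):
        self.calls = [t for t in self.calls if timestamp - t < self.period]
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(timestamp)
        return True
```

Running the tool on it would produce example_input_documented.py, with a Google-style docstring (Args, Returns, and Raises sections where applicable) added to allow, while the dunder __init__ is skipped.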
The tool will:
- Parse your Python file using AST (see the sketch after this list)
- Identify all functions and methods without docstrings
- Generate appropriate docstrings based on the code structure
- Preserve all original code and existing docstrings
- Output a new file with _documented suffix
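A rough sketch of the detection step (not the tool's exact code): walk the AST and collect every function or method that has no docstring, skipping dunders.

```python
# Sketch: find functions/methods missing docstrings with the ast module.
import ast

def missing_docstrings(source: str) -> list[str]:
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("__") and node.name.endswith("__"):
                continue  # dunder methods are skipped, as noted above
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing

with open("your_script.py") as f:
    print(missing_docstrings(f.read()))
```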
Note: The tool only adds docstrings where they're missing. Existing docstrings are never modified or overwritten.
The tuned models were trained using knowledge distillation, with GPT-OSS-120B as the teacher model. The data, config, and script used for finetuning can be found in finetuning. We used 28 Python functions and classes as seed data and supplemented them with 10,000 synthetic examples covering various domains (data science, web development, utilities, algorithms).
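As a rough illustration of a distillation pair (the exact schema used in finetuning may differ), each training example maps undocumented source code to a teacher-written docstring:

```python
# Hypothetical shape of one training example; see finetuning/ for the real schema.
example = {
    "prompt": (
        "Write a Google-style docstring for:\n"
        "def mean(values):\n"
        "    return sum(values) / len(values)\n"
    ),
    "completion": (
        '"""Compute the arithmetic mean of a sequence of numbers.\n\n'
        "Args:\n"
        "    values: Non-empty sequence of numbers.\n\n"
        "Returns:\n"
        "    The arithmetic mean of the values.\n"
        '"""'
    ),
}
```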
We compare the teacher model and the student model on 250 held-out test examples using LLM-as-a-judge evaluation:
| Model | Parameters | Judge score |
|---|---|---|
| GPT-OSS (thinking) | 120B | 0.81 ± 0.02 |
| Qwen3 0.6B (tuned) | 0.6B | 0.76 ± 0.01 |
| Qwen3 0.6B (base) | 0.6B | 0.55 ± 0.04 |
Evaluation Criteria:
- LLM-as-a-judge: each generated docstring is scored by a judge model (a minimal sketch follows this list). The training config file and train/test data splits are available under data/.
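A minimal sketch of how such a judging loop can work, assuming the judge is served locally through Ollama; the judge model tag, prompt, and score parsing below are illustrative, not the exact setup behind the numbers above:

```python
# Sketch: score a generated docstring with an LLM judge on a 0-1 scale.
import re
import statistics
import ollama

def judge_score(function_source: str, docstring: str) -> float:
    prompt = (
        "Rate the following docstring for the given function on a scale from 0 to 1. "
        "Reply with a single number.\n\n"
        f"Function:\n{function_source}\n\nDocstring:\n{docstring}\n"
    )
    reply = ollama.chat(
        model="gpt-oss:120b",  # assumed judge model; substitute your own
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
    match = re.search(r"\d*\.?\d+", reply)
    return float(match.group()) if match else 0.0

# Average over held-out examples (a single illustrative pair shown here).
scores = [judge_score("def add(a, b):\n    return a + b", '"""Add two numbers."""')]
print(statistics.mean(scores))
```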
Privacy & Security: Proprietary codebases contain intellectual property, trade secrets, and sensitive business logic. Sending your code to cloud APIs for documentation creates:
- IP exposure risks
- Compliance violations (GDPR, SOC 2, etc.)
- Security audit failures
- Dependency on external services
Speed & Cost: Process entire codebases in minutes without API rate limits or per-token charges.
Q: Why don't we just use GPT-4/Claude API for this?
A: Because your proprietary code shouldn't leave your infrastructure. Cloud APIs create security risks, compliance issues, and ongoing costs. Our models run locally with comparable quality.
Q: Can I update or regenerate existing docstrings?
A: Currently, the tool only adds missing docstrings. Updating existing documentation is planned for a future release. For now, you can manually remove any docstrings you want regenerated.
Q: Which docstring style can I use?
A: Google style, which is the most readable and a great fit for general Python projects.
Q: What should I do if the model does not work as expected?
A: Tool calling on our platform is in active development! Follow us on LinkedIn for updates, or join our community. You can also manually refine any generated docstrings.
Q: Can you train a model for my company's documentation standards?
A: Visit our website and reach out to us; we offer custom solutions tailored to your coding standards and domain-specific requirements.
Q: Does this support type hints or other Python documentation tools?
A: Type hints are parsed and incorporated into docstrings. Integration with tools like pydoc, Sphinx, and MkDocs is on our roadmap.
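For instance, a type-hinted function (illustrative, not taken from the examples directory) can yield a docstring whose Args and Returns sections carry the annotated types:

```python
# Illustrative only: type hints flow into the generated Args/Returns sections.
def top_k(scores: dict[str, float], k: int = 5) -> list[str]:
    """Return the keys with the highest scores.

    Args:
        scores (dict[str, float]): Mapping from item name to score.
        k (int, optional): Number of items to return. Defaults to 5.

    Returns:
        list[str]: The k item names with the highest scores, best first.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]
```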
Next Steps: We're working on git integration to automatically document all modified functions in a commit, making documentation truly seamless in your development workflow.