Show HN: Distil-localdoc.py – SLM assistant for writing Python documentation

1 hour ago 1

drawing

We trained an SLM assistant for automatic Python documentation - a Qwen3 0.6B parameter model that generates complete, properly formatted docstrings for your code in Google style. Run it locally, keeping your proprietary code secure!

First, install Ollama, following the instructions on their website.

Then set up the virtual environment:

python -m venv .venv . .venv/bin/activate pip install huggingface_hub openai

Available models hosted on huggingface:

Finally, download the models from huggingface and build them locally:

hf download distil-labs/Distil-Localdoc-Qwen3-0.6B --local-dir distil-model cd distil-model ollama create localdoc_qwen3 -f Modelfile

Next, we load the model and your Python file. By default we load the downloaded Qwen3 0.6B model and generate Google-style docstrings.

python localdoc.py --file your_script.py

The tool will generate an updated file with _documented suffix (e.g., your_script_documented.py).

The assistant can generate docstrings for:

  • Functions: Complete parameter descriptions, return values, and raised exceptions
  • Methods: Instance and class method documentation with proper formatting. The tool skips double underscore (dunder: __xxx) methods.

Feel free to run them yourself using the files in examples

def calculate_total(items, tax_rate=0.08, discount=None): subtotal = sum(item['price'] * item['quantity'] for item in items) if discount: subtotal *= (1 - discount) return subtotal * (1 + tax_rate)
def calculate_total(items, tax_rate=0.08, discount=None): """ Calculate the total cost of items, applying a tax rate and optionally a discount. Args: items: List of item objects with price and quantity tax_rate: Tax rate expressed as a decimal (default 0.08) discount: Discount rate expressed as a decimal; if provided, the subtotal is multiplied by (1 - discount) Returns: Total amount after applying the tax Example: >>> items = [{'price': 10, 'quantity': 2}, {'price': 5, 'quantity': 1}] >>> calculate_total(items, tax_rate=0.1, discount=0.05) 22.5 """ subtotal = sum(item['price'] * item['quantity'] for item in items) if discount: subtotal *= (1 - discount) return subtotal * (1 + tax_rate)
class DataProcessor: def __init__(self, config): self.config = config self.data = [] def process(self, raw_data): cleaned = [x for x in raw_data if x is not None] return [self.transform(x) for x in cleaned]
class DataProcessor: def __init__(self, config): self.config = config self.data = [] def process(self, raw_data): """ Calculate and return the transformed values from a list of raw data. Args: raw_data: List of strings to process None Returns: List of transformed strings Example: >>> data = ['apple', None, 'banana', 'cherry'] >>> process(data) ['a', 'b', 'c'] """ cleaned = [x for x in raw_data if x is not None] return [self.transform(x) for x in cleaned]
async def fetch_user_data(user_id, session, timeout=30): url = f"https://api.example.com/users/{user_id}" async with session.get(url, timeout=timeout) as response: if response.status != 200: raise ValueError(f"Failed to fetch user {user_id}") return await response.json()
async def fetch_user_data(user_id, session, timeout=30): """ Calculate user data from a user ID using an HTTP GET request. Args: user_id: The ID of the user to retrieve session: An aiohttp client session used to perform the request timeout: Number of seconds to wait for a response before timing out (default 30) Returns: The JSON‑decoded user data as a dictionary Raises: ValueError: If the HTTP response status is not 200 Example: >>> import aiohttp, asyncio >>> async def main(): ... async with aiohttp.ClientSession() as session: ... data = await fetch_user_data(123, session) ... print(data) >>> asyncio.run(main()) """ url = f"https://api.example.com/users/{user_id}" async with session.get(url, timeout=timeout) as response: if response.status != 200: raise ValueError(f"Failed to fetch user {user_id}") return await response.json()

Simply provide any Python file with functions or classes that need documentation:

python localdoc.py --file /path/to/your/file.py

The tool will:

  1. Parse your Python file using AST
  2. Identify all functions and methods without docstrings
  3. Generate appropriate docstrings based on the code structure
  4. Preserve all original code and existing docstrings
  5. Output a new file with _documented suffix

Note: The tool only adds docstrings where they're missing. Existing docstrings are never modified or overwritten.

The tuned models were trained using knowledge distillation, leveraging the teacher model GPT-OSS-120B. The data+config+script used for finetuning can be found in finetuning. We used 28 Python functions and classes as seed data and supplemented them with 10,000 synthetic examples covering various domains (data science, web development, utilities, algorithms).

We compare the teacher model and the student model on 250 held-out test examples using LLM-as-a-judge evaluation:

Model Size Accuracy
GPT-OSS (thinking) 120B 0.81 +/- 0.02
Qwen3 0.6B (tuned) 0.6B 0.76 +/- 0.01
Qwen3 0.6B (base) 0.6B 0.55 +/- 0.04

Evaluation Criteria:

  • LLM-as-a-judge: The training config file and train/test data splits are available under data/.

Privacy & Security: Proprietary codebases contain intellectual property, trade secrets, and sensitive business logic. Sending your code to cloud APIs for documentation creates:

  • IP exposure risks
  • Compliance violations (GDPR, SOC 2, etc.)
  • Security audit failures
  • Dependency on external services

Speed & Cost: Process entire codebases in minutes without API rate limits or per-token charges.

Q: Why don't we just use GPT-4/Claude API for this?

Because your proprietary code shouldn't leave your infrastructure. Cloud APIs create security risks, compliance issues, and ongoing costs. Our models run locally with comparable quality.

Q: Can I document existing docstrings or update them?

Currently, the tool only adds missing docstrings. Updating existing documentation is planned for future releases. For now, you can manually remove docstrings you want regenerated.

Q: Which docstring style can I use?

  • Google: Most readable, great for general Python projects

Q: The model does not work as expected

A: The tool calling on our platform is in active development! Follow us on LinkedIn for updates, or join our community. You can also manually refine any generated docstrings.

Q: Can you train a model for my company's documentation standards?

A: Visit our website and reach out to us, we offer custom solutions tailored to your coding standards and domain-specific requirements.

Q: Does this support type hints or other Python documentation tools?

A: Type hints are parsed and incorporated into docstrings. Integration with tools like pydoc, Sphinx, and MkDocs is on our roadmap.


Next Steps: We're working on git integration to automatically document all modified functions in a commit, making documentation truly seamless in your development workflow.

Read Entire Article