Show HN: Databroom – Clean data with GUI or CLI and generate Python/R script
A DataFrame cleaning tool with a command-line interface, an interactive GUI, and a programmatic API. It automatically generates reproducible Python/pandas code, R/tidyverse code, and CLI commands.
🌐 Try the Demo App – Interactive GUI in the browser, no install needed.
# Generate R script for tidyverse users
databroom clean research_data.csv \
--clean-all \
--output-code tidyverse_pipeline.R \
--lang r
# Process multiple files with consistent operations
for file in data/*.csv; do
databroom clean "$file" \
--clean-columns \
--output-file "clean_$(basename "$file")" \
--quiet
done
Databroom follows a modular architecture designed for extensibility and maintainability:
databroom/
├── cli/                      # Command line interface (Typer + Rich)
│   ├── main.py               # Entry point and app configuration
│   ├── commands.py           # CLI commands (clean, gui, list)
│   ├── operations.py         # Operation parsing and execution
│   └── utils.py              # File handling and code generation
├── core/                     # Core cleaning engine
│   ├── broom.py              # Main API with method chaining
│   ├── pipeline.py           # Operation coordination and state management
│   ├── cleaning_ops.py       # Individual cleaning operations
│   └── history_tracker.py    # Automatic operation tracking
├── generators/               # Code generation system
│   ├── base.py               # Template-based code generator
│   └── templates/            # Jinja2 templates for Python/R
├── gui/                      # Modular Streamlit web interface
│   ├── app.py                # Main orchestrator (83 lines)
│   ├── components/           # Reusable UI components
│   │   ├── file_upload.py    # File upload and processing
│   │   ├── operations.py     # Data cleaning operations
│   │   ├── controls.py       # Step back, reset, reload controls
│   │   └── tabs.py           # Data display and export tabs
│   └── utils/                # GUI utilities
│       ├── session.py        # Session state management
│       └── styles.py         # CSS styling and theming
└── tests/                    # Comprehensive test suite
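The layout above pairs a chainable API (core/broom.py) with automatic operation tracking (core/history_tracker.py). As a rough illustration of how those two ideas can fit together, here is a minimal sketch. It is not Databroom's actual code: the `MiniBroom` class, the `tracked` decorator, and both method names are hypothetical, and to stay dependency-free it cleans a plain list of column names rather than a DataFrame.

```python
import re
import unicodedata
from functools import wraps


def tracked(method):
    """Record each call's name and keyword arguments, then run it."""
    @wraps(method)
    def wrapper(self, **kwargs):
        self.history.append({"op": method.__name__, "params": kwargs})
        method(self, **kwargs)
        return self  # returning self is what makes chaining possible
    return wrapper


class MiniBroom:
    """Hypothetical stand-in for a chainable cleaner (column names only)."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.history = []  # ordered log of applied operations

    @tracked
    def strip_accents(self):
        # Decompose accented characters, then drop the non-ASCII marks.
        self.columns = [
            unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii")
            for c in self.columns
        ]

    @tracked
    def to_snake_case(self):
        # Collapse runs of non-alphanumerics to "_", trim, and lowercase.
        self.columns = [
            re.sub(r"[^0-9a-zA-Z]+", "_", c).strip("_").lower()
            for c in self.columns
        ]


broom = MiniBroom(["Café Name", " Total Sales "]).strip_accents().to_snake_case()
print(broom.columns)
print(broom.history)
```

Because every call lands in `history` in order, a code generator could in principle walk that list and render one line of pandas or tidyverse code per entry, which is presumably the role the templates directory plays.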
# Clone repository
git clone https://github.com/onlozanoo/databroom.git
cd databroom
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev,cli,all]"
# Run tests
pytest
# Run CLI locally
python -m databroom.cli.main --help
# Run full test suite
pytest
# Run with coverage
pytest --cov=databroom
# Run specific test categories
pytest -m "not slow" # Skip slow tests
pytest tests/cli/ # Test CLI only
pytest tests/core/ # Test core functionality
# Format code
black databroom/
isort databroom/
# Lint
flake8 databroom/
# Type check
mypy databroom/
Current Version: v0.4 – Portable Pipelines Across GUI, CLI, and API
Design a cleaning pipeline once — apply it anywhere.
Create a cleaning workflow visually in the GUI
Export it as a JSON pipeline
Run it headlessly via CLI or integrate into scripts and APIs
Re-import it into the GUI for review or extension
This update makes your data prep workflows reusable, versionable, and automatable across any environment — without code duplication or switching tools.
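To make the "design once, apply anywhere" idea concrete, here is a minimal sketch of replaying a JSON-described pipeline. The JSON schema (`version`, `steps`, `op`, `params`), the `OPERATIONS` registry, and `run_pipeline` are all assumptions made for illustration; Databroom's real pipeline format and loader may look quite different. As a simplification, the operations act on a list of column names instead of a DataFrame.

```python
import json

# Hypothetical registry: operation name -> function over a list of column names.
OPERATIONS = {
    "strip": lambda cols: [c.strip() for c in cols],
    "lowercase": lambda cols: [c.lower() for c in cols],
    "replace": lambda cols, old, new: [c.replace(old, new) for c in cols],
}


def run_pipeline(pipeline_json, columns):
    """Apply each step of a JSON pipeline, in order, to the column names."""
    pipeline = json.loads(pipeline_json)
    for step in pipeline["steps"]:
        operation = OPERATIONS[step["op"]]  # KeyError on an unknown operation
        columns = operation(columns, **step.get("params", {}))
    return columns


# A pipeline as it might be exported from one front end (assumed schema).
pipeline_json = json.dumps({
    "version": 1,
    "steps": [
        {"op": "strip"},
        {"op": "lowercase"},
        {"op": "replace", "params": {"old": " ", "new": "_"}},
    ],
})

result = run_pipeline(pipeline_json, [" Order ID ", "Unit Price"])
print(result)
```

Since the pipeline is plain JSON, the same file can be committed to version control and consumed by any front end that shares the operation registry, which is what makes this kind of workflow portable.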