Show HN: MLFCrafter – Modular ML pipeline automation framework in Python

3 months ago 4

ML Pipeline Automation Framework - Chain together data processing, model training, and deployment with minimal code

⭐ If you find MLFCrafter useful, please consider starring this repository!

Your support helps us continue developing and improving MLFCrafter for the ML community.

MLFCrafter is a Python framework that simplifies machine learning pipeline creation through chainable "crafter" components. Build, train, and deploy ML models with minimal code and maximum flexibility.

🔗 Chainable Architecture - Connect multiple processing steps seamlessly
📊 Smart Data Handling - Automatic data ingestion from CSV, Excel, JSON
🧹 Intelligent Cleaning - Multiple strategies for missing value handling
📏 Flexible Scaling - MinMax, Standard, and Robust scaling options
🤖 Multiple Models - Random Forest, XGBoost, Logistic Regression support
📈 Comprehensive Metrics - Accuracy, Precision, Recall, F1-Score
💾 Easy Deployment - One-click model saving with metadata
🔄 Context-Based - Seamless data flow between pipeline steps
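The context-based design can be pictured as a list of steps. Each step receives a shared dictionary, reads what earlier steps stored in it, and returns it with its own outputs added. The sketch below illustrates that pattern using only the standard library. `MiniChain`, `add_step`, `load` and `clean` are hypothetical names invented for the example, not MLFCrafter's internals.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types: a step takes the shared context and returns it updated
Context = Dict[str, Any]
Step = Callable[[Context], Context]


class MiniChain:
    """Hypothetical minimal chain: runs steps in order over one shared context."""

    def __init__(self, *steps: Step) -> None:
        self.steps: List[Step] = list(steps)

    def add_step(self, step: Step) -> "MiniChain":
        # Returning self allows fluent chaining: chain.add_step(a).add_step(b)
        self.steps.append(step)
        return self

    def run(self, **initial: Any) -> Context:
        # Keyword arguments (e.g. target_column) seed the context
        context: Context = dict(initial)
        for step in self.steps:
            context = step(context)
        return context


def load(ctx: Context) -> Context:
    # Stand-in for data ingestion: store raw values in the context
    ctx["data"] = [1.0, None, 3.0]
    return ctx


def clean(ctx: Context) -> Context:
    # Stand-in for cleaning: drop missing values left by the previous step
    ctx["data"] = [x for x in ctx["data"] if x is not None]
    return ctx


result = MiniChain(load, clean).run(target_column="species")
print(result)  # {'target_column': 'species', 'data': [1.0, 3.0]}
```

Because every step shares one signature, steps can be reordered or swapped without changing the runner.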

from mlfcrafter import MLFChain, DataIngestCrafter, CleanerCrafter, ScalerCrafter, ModelCrafter, ScorerCrafter, DeployCrafter

# Create the ML pipeline in a single statement
chain = MLFChain(
    DataIngestCrafter(data_path="data/iris.csv"),
    CleanerCrafter(strategy="auto"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter()
)

# Run the entire pipeline
results = chain.run(target_column="species")
print(f"Test Score: {results['test_score']:.4f}")

A customized pipeline, with explicit options for each step:

chain = MLFChain(
    DataIngestCrafter(data_path="data/titanic.csv", source_type="csv"),
    CleanerCrafter(strategy="mean", str_fill="Unknown"),
    ScalerCrafter(scaler_type="minmax", columns=["age", "fare"]),
    ModelCrafter(
        model_name="xgboost",
        model_params={"n_estimators": 200, "max_depth": 6},
        test_size=0.25
    ),
    ScorerCrafter(),
    DeployCrafter(model_path="models/titanic_model.joblib")
)
results = chain.run(target_column="survived")

DataIngestCrafter loads data from CSV, Excel, or JSON files:

DataIngestCrafter(
    data_path="path/to/data.csv",
    source_type="auto"  # auto, csv, excel, json
)
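With `source_type="auto"`, the format presumably has to be inferred from the file path. Here is a minimal sketch that assumes detection by file extension. `detect_source_type` and the extension table are hypothetical. A real loader would then dispatch to readers such as pandas' `read_csv`, `read_excel` or `read_json`.

```python
from pathlib import Path

# Hypothetical extension-to-format table; MLFCrafter's actual rules may differ
EXTENSION_TO_SOURCE = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
}


def detect_source_type(data_path: str, source_type: str = "auto") -> str:
    """Resolve source_type="auto" from the file extension; pass explicit types through."""
    if source_type != "auto":
        return source_type
    suffix = Path(data_path).suffix.lower()  # case-insensitive: ".XLSX" -> ".xlsx"
    try:
        return EXTENSION_TO_SOURCE[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer source type from extension {suffix!r}") from None


print(detect_source_type("data/iris.csv"))    # csv
print(detect_source_type("reports/Q1.XLSX"))  # excel
```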

CleanerCrafter fills or drops missing values using a configurable strategy:

CleanerCrafter(
    strategy="auto",     # auto, mean, median, mode, drop, constant
    str_fill="missing",  # Fill value for strings
    int_fill=0.0         # Fill value for numbers
)
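Most of these strategies are standard imputation rules. The stdlib sketch below applies them to a single column stored as a list, with `None` marking missing entries. `fill_missing` is a hypothetical helper. The `"auto"` strategy is not modeled because its exact rules are not described here.

```python
import statistics
from typing import List, Optional, Union

Value = Union[float, str]


def fill_missing(values: List[Optional[Value]], strategy: str = "mean",
                 fill_value: Value = 0.0) -> List[Value]:
    """Fill None entries in one column with the named strategy (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        # Discard missing entries instead of filling them
        return present
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        # Most frequent value; also works for string columns
        fill = statistics.mode(present)
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


ages = [22.0, None, 38.0, 26.0, None]
print(fill_missing(ages, "median"))  # [22.0, 26.0, 38.0, 26.0, 26.0]
print(fill_missing(["S", None, "C", "S"], "mode"))  # ['S', 'S', 'C', 'S']
```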

ScalerCrafter scales numerical features:

ScalerCrafter(
    scaler_type="standard",    # standard, minmax, robust
    columns=["age", "income"]  # Specific columns or None for all numeric
)
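The three scaler types follow well-known formulas:

- Min-max scaling maps values onto [0, 1].
- Standard scaling subtracts the mean and divides by the standard deviation.
- Robust scaling subtracts the median and divides by the interquartile range.

Here is a sketch for one column. It assumes the same conventions as scikit-learn's `MinMaxScaler`, `StandardScaler` and `RobustScaler`: population standard deviation and linearly interpolated quartiles. `scale` is a hypothetical helper. ScalerCrafter's `columns` option would apply such a transform column by column.

```python
import statistics
from typing import List


def scale(values: List[float], scaler_type: str = "standard") -> List[float]:
    """Scale one numeric column as (x - center) / spread (hypothetical helper)."""
    if scaler_type == "minmax":
        # Map [min, max] onto [0, 1]
        center, spread = min(values), max(values) - min(values)
    elif scaler_type == "standard":
        # Zero mean, unit population standard deviation
        center, spread = statistics.mean(values), statistics.pstdev(values)
    elif scaler_type == "robust":
        # Center on the median, divide by the interquartile range (Q3 - Q1)
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        center, spread = statistics.median(values), q3 - q1
    else:
        raise ValueError(f"Unknown scaler_type: {scaler_type}")
    if spread == 0:
        # Constant column: every value equals the center, so return zeros
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


print(scale([1.0, 2.0, 3.0], "minmax"))            # [0.0, 0.5, 1.0]
print(scale([1.0, 2.0, 3.0, 4.0, 5.0], "robust"))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Robust scaling is the usual choice when outliers would distort the mean and standard deviation.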

ModelCrafter trains the model on a training split and holds out a test split:

ModelCrafter(
    model_name="random_forest",  # random_forest, xgboost, logistic_regression
    model_params={"n_estimators": 100},
    test_size=0.2,
    stratify=True
)
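`test_size` and `stratify=True` presumably mean a held-out test set whose class proportions match the full dataset. scikit-learn's `train_test_split(X, y, test_size=..., stratify=y)` behaves this way. The hypothetical helper below shows the idea with the standard library only: it shuffles each class separately and moves the same fraction of every class into the test set.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def stratified_split(labels: List[Hashable], test_size: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10
train, test = stratified_split(labels, test_size=0.2)
print(len(train), len(test))  # 24 6
```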

ScorerCrafter calculates performance metrics:

ScorerCrafter(
    metrics=["accuracy", "precision", "recall", "f1"]  # Default: all metrics
)
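These metrics come from confusion-matrix counts:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 is the harmonic mean of precision and recall.

Here is a stdlib sketch for a single positive class. `binary_metrics` is a hypothetical helper. For multiclass targets such as the iris species, scikit-learn's `precision_score`, `recall_score` and `f1_score` need an `average` argument, for example `"macro"` or `"weighted"`. Which averaging ScorerCrafter uses is not stated here.

```python
from typing import Dict, Hashable, List


def binary_metrics(y_true: List[Hashable], y_pred: List[Hashable],
                   positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    # Report 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```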

DeployCrafter saves the trained model, optionally with its scaler and metadata:

DeployCrafter(
    model_path="model.joblib",
    save_format="joblib",  # joblib or pickle
    include_scaler=True,
    include_metadata=True
)
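Saving the model, its scaler and its metadata in one file lets a later process restore the full preprocessing state. The minimal sketch below uses `pickle`; `joblib.dump` and `joblib.load` follow the same dump/load pattern. `save_bundle`, `load_bundle` and the `scaler` key are assumptions. The `"model"` and `"metadata"` keys mirror the `DeployCrafter.load_model` usage under Alternative Usage Patterns.

```python
import os
import pickle
import tempfile
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def save_bundle(path: str, model: Any, scaler: Optional[Any] = None,
                metadata: Optional[Dict[str, Any]] = None) -> None:
    """Persist model, optional scaler and metadata together (hypothetical helper)."""
    bundle = {
        "model": model,
        "scaler": scaler,
        # Record a UTC timestamp alongside any caller-supplied metadata
        "metadata": {"saved_at": datetime.now(timezone.utc).isoformat(), **(metadata or {})},
    }
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path: str) -> Dict[str, Any]:
    # Only load files you trust: unpickling can execute arbitrary code
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Usage with a stand-in "model" (any picklable object works)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_bundle(path, model={"weights": [0.1, 0.2]}, metadata={"target_column": "species"})
artifacts = load_bundle(path)
model, metadata = artifacts["model"], artifacts["metadata"]
```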

Alternative Usage Patterns

Build a chain incrementally with add_crafter:

chain = MLFChain()
chain.add_crafter(DataIngestCrafter(data_path="data.csv"))
chain.add_crafter(CleanerCrafter(strategy="median"))
chain.add_crafter(ModelCrafter(model_name="xgboost"))
results = chain.run(target_column="target")

Load a saved model and its metadata:

artifacts = DeployCrafter.load_model("model.joblib")
model = artifacts["model"]
metadata = artifacts["metadata"]

Python: 3.8 or higher
Core Dependencies: pandas, scikit-learn, numpy, xgboost, joblib

Setup Development Environment

git clone https://github.com/brkcvlk/mlfcrafter.git
cd mlfcrafter
pip install -r requirements-dev.txt
pip install -e .

# Run all tests
python -m pytest tests/ -v

# Run tests with coverage
python -m pytest tests/ -v --cov=mlfcrafter --cov-report=html

# Check code quality
ruff check .

# Auto-fix code issues
ruff check --fix .

# Format code
ruff format .

Complete documentation is available at MLFCrafter Docs

We welcome contributions! Please see our Contributing Guidelines for details.

This project is licensed under the MIT License - see the LICENSE file for details.

📖 Documentation: MLFCrafter Docs
🐛 Bug Reports: GitHub Issues
💬 Discussions: GitHub Discussions

Made for the ML Community
