Show HN: MLFCrafter – Modular ML pipeline automation framework in Python


ML Pipeline Automation Framework - Chain together data processing, model training, and deployment with minimal code

If you find MLFCrafter useful, please consider starring this repository!


Your support helps us continue developing and improving MLFCrafter for the ML community.


MLFCrafter is a Python framework that simplifies machine learning pipeline creation through chainable "crafter" components. Build, train, and deploy ML models with minimal code and maximum flexibility.

  • 🔗 Chainable Architecture - Connect multiple processing steps seamlessly
  • 📊 Smart Data Handling - Automatic data ingestion from CSV, Excel, JSON
  • 🧹 Intelligent Cleaning - Multiple strategies for missing value handling
  • 📏 Flexible Scaling - MinMax, Standard, and Robust scaling options
  • 🤖 Multiple Models - Random Forest, XGBoost, Logistic Regression support
  • 📈 Comprehensive Metrics - Accuracy, Precision, Recall, F1-Score
  • 💾 Easy Deployment - One-step model saving with metadata
  • 🔄 Context-Based - Seamless data flow between pipeline steps

    from mlfcrafter import (
        MLFChain,
        DataIngestCrafter,
        CleanerCrafter,
        ScalerCrafter,
        ModelCrafter,
        ScorerCrafter,
        DeployCrafter,
    )

    # Create ML pipeline in one line
    chain = MLFChain(
        DataIngestCrafter(data_path="data/iris.csv"),
        CleanerCrafter(strategy="auto"),
        ScalerCrafter(scaler_type="standard"),
        ModelCrafter(model_name="random_forest"),
        ScorerCrafter(),
        DeployCrafter()
    )

    # Run entire pipeline
    results = chain.run(target_column="species")
    print(f"Test Score: {results['test_score']:.4f}")

    chain = MLFChain(
        DataIngestCrafter(data_path="data/titanic.csv", source_type="csv"),
        CleanerCrafter(strategy="mean", str_fill="Unknown"),
        ScalerCrafter(scaler_type="minmax", columns=["age", "fare"]),
        ModelCrafter(
            model_name="xgboost",
            model_params={"n_estimators": 200, "max_depth": 6},
            test_size=0.25
        ),
        ScorerCrafter(),
        DeployCrafter(model_path="models/titanic_model.joblib")
    )

    results = chain.run(target_column="survived")
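
To make the chainable, context-based idea from the feature list concrete, here is a minimal sketch in plain Python. It is not MLFCrafter's implementation: `TinyChain`, `add_step`, and the three step functions are hypothetical names invented for this illustration, and the assumption is simply that each step receives a dictionary, adds to it, and hands it to the next step.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class TinyChain:
    """Hypothetical stand-in: runs steps in order, threading one context dict."""

    def __init__(self, *steps: Step) -> None:
        self.steps: List[Step] = list(steps)

    def add_step(self, step: Step) -> None:
        self.steps.append(step)

    def run(self, **initial: Any) -> Context:
        # Keyword arguments seed the context, then every step extends it.
        context: Context = dict(initial)
        for step in self.steps:
            context = step(context)
        return context


def ingest(context: Context) -> Context:
    # Stand-in for reading a file: place raw values into the context.
    return {**context, "rows": [1.0, None, 3.0]}


def clean(context: Context) -> Context:
    # Drop the missing entries left by the previous step.
    rows = [r for r in context["rows"] if r is not None]
    return {**context, "rows": rows}


def summarize(context: Context) -> Context:
    # Add a derived value without discarding earlier keys.
    rows = context["rows"]
    return {**context, "mean": sum(rows) / len(rows)}


chain = TinyChain(ingest, clean, summarize)
results = chain.run(target_column="species")
print(results)
```

Each step returns a new dictionary rather than mutating the one it received, which keeps steps easy to test in isolation.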

Loads data from various file formats:

    DataIngestCrafter(
        data_path="path/to/data.csv",
        source_type="auto"  # auto, csv, excel, json
    )

Handles missing values intelligently:

    CleanerCrafter(
        strategy="auto",     # auto, mean, median, mode, drop, constant
        str_fill="missing",  # Fill value for strings
        int_fill=0.0         # Fill value for numbers
    )
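
As a rough illustration of what the named strategies usually mean for a single numeric column, the sketch below uses only the standard library. `fill_missing` and its `fill_value` parameter are hypothetical and not part of MLFCrafter; the `auto` strategy is left out because this README does not say how it chooses.

```python
from statistics import mean, median, mode
from typing import List, Optional


def fill_missing(
    values: List[Optional[float]],
    strategy: str = "mean",
    fill_value: float = 0.0,
) -> List[float]:
    """Hypothetical helper: handle None entries in one numeric column."""
    present = [v for v in values if v is not None]

    if strategy == "drop":
        # Remove the missing entries instead of replacing them.
        return present
    if strategy == "constant":
        replacement = fill_value
    elif strategy == "mean":
        replacement = mean(present)
    elif strategy == "median":
        replacement = median(present)
    elif strategy == "mode":
        replacement = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return [replacement if v is None else v for v in values]


ages = [22.0, None, 30.0, 30.0, None, 50.0]
for name in ("mean", "median", "mode", "constant", "drop"):
    print(name, fill_missing(ages, strategy=name))
```

The mean is pulled toward outliers, so the median is often the safer choice for skewed columns such as income or fare.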

Scales numerical features:

    ScalerCrafter(
        scaler_type="standard",    # standard, minmax, robust
        columns=["age", "income"]  # Specific columns or None for all numeric
    )
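
The three scaler names correspond to well-known transformations. The sketch below writes out their conventional definitions with the standard library so the difference is visible on a column with one outlier. The function names are hypothetical, and whether MLFCrafter computes the scaling exactly this way is an assumption.

```python
from statistics import mean, median, pstdev, quantiles
from typing import List


def standard_scale(xs: List[float]) -> List[float]:
    # Zero mean, unit variance (population standard deviation).
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def minmax_scale(xs: List[float]) -> List[float]:
    # Map the smallest value to 0 and the largest to 1.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def robust_scale(xs: List[float]) -> List[float]:
    # Center on the median and divide by the interquartile range,
    # so a single extreme value has little effect on the other points.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]


income = [1.0, 2.0, 3.0, 4.0, 100.0]  # one outlier
print("standard:", standard_scale(income))
print("minmax:  ", minmax_scale(income))
print("robust:  ", robust_scale(income))
```

All three helpers divide by a spread measure, so a constant column would raise `ZeroDivisionError` here; a real scaler needs a guard for that case.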

Trains ML models:

    ModelCrafter(
        model_name="random_forest",  # random_forest, xgboost, logistic_regression
        model_params={"n_estimators": 100},
        test_size=0.2,
        stratify=True
    )
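
The `test_size` and `stratify` options describe a train/test split that preserves class proportions. The sketch below shows the idea with the standard library only. `stratified_split` is a hypothetical helper, not MLFCrafter code, and production implementations handle rounding and very small classes more carefully.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    test_size: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_size=0.25)
print("train:", train_idx)
print("test: ", test_idx)
```

Without stratification, a small test set drawn from imbalanced data can end up with few or no examples of the minority class, which makes the reported scores unreliable.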

Calculates performance metrics:

    ScorerCrafter(
        metrics=["accuracy", "precision", "recall", "f1"]  # Default: all metrics
    )
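
For reference, the four metrics can be written out by hand for the binary case. This is a sketch of the standard definitions, not of MLFCrafter's internals: `binary_metrics` is a hypothetical name, and how the framework averages these metrics for multi-class targets is not stated in this README.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = 1,
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Precision and recall answer different questions (how many predicted positives were right, versus how many actual positives were found), so looking at both is more informative than accuracy alone on imbalanced data.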

Saves trained models:

    DeployCrafter(
        model_path="model.joblib",
        save_format="joblib",  # joblib or pickle
        include_scaler=True,
        include_metadata=True
    )
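
Saving a model together with its metadata typically means serializing one dictionary that holds both. The sketch below illustrates that pattern with the standard-library `pickle` module. `save_artifacts`, `load_artifacts`, and the `saved_at` field are hypothetical names for illustration, and the file layout MLFCrafter actually writes may differ.

```python
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def save_artifacts(
    model: Any,
    path: Path,
    metadata: Optional[Dict[str, Any]] = None,
) -> None:
    """Write the model and descriptive metadata to a single file."""
    artifacts = {
        "model": model,
        "metadata": {
            "saved_at": datetime.now(timezone.utc).isoformat(),
            **(metadata or {}),
        },
    }
    with open(path, "wb") as handle:
        pickle.dump(artifacts, handle)


def load_artifacts(path: Path) -> Dict[str, Any]:
    """Read the dictionary back; only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for a trained estimator in this sketch.
model = {"weights": [0.1, 0.2]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model.pkl"
    save_artifacts(model, target, {"target_column": "survived"})
    restored = load_artifacts(target)

print(restored["metadata"])
```

Keeping the metadata next to the model makes it possible to check later which target column and settings a saved file belongs to.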

Alternative Usage Patterns

    chain = MLFChain()
    chain.add_crafter(DataIngestCrafter(data_path="data.csv"))
    chain.add_crafter(CleanerCrafter(strategy="median"))
    chain.add_crafter(ModelCrafter(model_name="xgboost"))
    results = chain.run(target_column="target")

    artifacts = DeployCrafter.load_model("model.joblib")
    model = artifacts["model"]
    metadata = artifacts["metadata"]

  • Python: 3.8 or higher
  • Core Dependencies: pandas, scikit-learn, numpy, xgboost, joblib

Setup Development Environment

    git clone https://github.com/brkcvlk/mlfcrafter.git
    cd mlfcrafter
    pip install -r requirements-dev.txt
    pip install -e .

    # Run all tests
    python -m pytest tests/ -v

    # Run tests with coverage
    python -m pytest tests/ -v --cov=mlfcrafter --cov-report=html

    # Check code quality
    ruff check .

    # Auto-fix code issues
    ruff check --fix .

    # Format code
    ruff format .

Complete documentation is available at MLFCrafter Docs

We welcome contributions! Please see our Contributing Guidelines for details.

This project is licensed under the MIT License - see the LICENSE file for details.


Made for the ML Community
