ML Pipeline Automation Framework - Chain together data processing, model training, and deployment with minimal code
⭐ If you find MLFCrafter useful, please consider starring this repository!
Your support helps us continue developing and improving MLFCrafter for the ML community.
MLFCrafter is a Python framework that simplifies machine learning pipeline creation through chainable "crafter" components. Build, train, and deploy ML models with minimal code and maximum flexibility.
🔗 Chainable Architecture - Connect multiple processing steps seamlessly
📊 Smart Data Handling - Automatic data ingestion from CSV, Excel, JSON
🧹 Intelligent Cleaning - Multiple strategies for missing value handling
📏 Flexible Scaling - MinMax, Standard, and Robust scaling options
🤖 Multiple Models - Random Forest, XGBoost, Logistic Regression support
📈 Comprehensive Metrics - Accuracy, Precision, Recall, F1-Score
💾 Easy Deployment - One-click model saving with metadata
🔄 Context-Based - Seamless data flow between pipeline steps
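To make the "chainable" and "context-based" ideas above concrete, here is a minimal sketch of the general pattern in plain Python. It is not MLFCrafter's implementation: `TinyChain`, `ingest`, `clean`, and `summarize` are hypothetical names invented for this illustration, and the assumption is simply that each step receives a shared dictionary and returns an updated copy.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class TinyChain:
    """Hypothetical stand-in for a chain: runs steps in order over one context."""

    def __init__(self, *steps: Step) -> None:
        self.steps: List[Step] = list(steps)

    def run(self, **initial: Any) -> Context:
        context: Context = dict(initial)
        for step in self.steps:
            # Each step sees everything earlier steps stored in the context.
            context = step(context)
        return context


def ingest(context: Context) -> Context:
    # Stand-in for loading a file: store some values, one of them missing.
    return {**context, "data": [3.0, None, 5.0]}


def clean(context: Context) -> Context:
    # Drop missing values produced by the previous step.
    return {**context, "data": [x for x in context["data"] if x is not None]}


def summarize(context: Context) -> Context:
    data = context["data"]
    return {**context, "mean": sum(data) / len(data)}


result = TinyChain(ingest, clean, summarize).run(target_column="y")
print(result["mean"])
```

Because every step only reads from and writes to the context, steps can be reordered or swapped without changing their signatures.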
from mlfcrafter import (
    MLFChain,
    DataIngestCrafter,
    CleanerCrafter,
    ScalerCrafter,
    ModelCrafter,
    ScorerCrafter,
    DeployCrafter,
)

# Create ML pipeline in one line
chain = MLFChain(
    DataIngestCrafter(data_path="data/iris.csv"),
    CleanerCrafter(strategy="auto"),
    ScalerCrafter(scaler_type="standard"),
    ModelCrafter(model_name="random_forest"),
    ScorerCrafter(),
    DeployCrafter(),
)

# Run entire pipeline
results = chain.run(target_column="species")
print(f"Test Score: {results['test_score']:.4f}")
chain = MLFChain(
    DataIngestCrafter(data_path="data/titanic.csv", source_type="csv"),
    CleanerCrafter(strategy="mean", str_fill="Unknown"),
    ScalerCrafter(scaler_type="minmax", columns=["age", "fare"]),
    ModelCrafter(
        model_name="xgboost",
        model_params={"n_estimators": 200, "max_depth": 6},
        test_size=0.25,
    ),
    ScorerCrafter(),
    DeployCrafter(model_path="models/titanic_model.joblib"),
)

results = chain.run(target_column="survived")
Loads data from various file formats:
DataIngestCrafter(
    data_path="path/to/data.csv",
    source_type="auto",  # auto, csv, excel, json
)
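How `source_type="auto"` picks a format is not spelled out here. One plausible approach, shown purely as an assumption, is to look at the file extension. The helper `infer_source_type` and the extension table below are hypothetical and not part of MLFCrafter.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the source types listed above.
EXTENSION_TO_TYPE = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
}


def infer_source_type(data_path: str) -> str:
    """Guess the source type from the file extension (case-insensitive)."""
    suffix = Path(data_path).suffix.lower()
    if suffix not in EXTENSION_TO_TYPE:
        raise ValueError(f"Cannot infer source type from extension {suffix!r}")
    return EXTENSION_TO_TYPE[suffix]


print(infer_source_type("path/to/data.csv"))
```

Passing an explicit `source_type` avoids any guessing when a file has an unusual or missing extension.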
Handles missing values intelligently:
CleanerCrafter(
    strategy="auto",     # auto, mean, median, mode, drop, constant
    str_fill="missing",  # Fill value for strings
    int_fill=0.0,        # Fill value for numbers
)
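As a rough illustration of what the named strategies usually mean, the sketch below applies them to a single numeric column using only the standard library. `fill_missing` is a hypothetical helper, not MLFCrafter code, and the `auto` strategy is left out because its behavior is not described here.

```python
from statistics import mean, median, mode
from typing import List, Optional


def fill_missing(
    values: List[Optional[float]],
    strategy: str = "mean",
    constant: float = 0.0,
) -> List[float]:
    """Handle None entries in one numeric column with the given strategy."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        # Dropping removes the missing entries instead of replacing them.
        return present
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 8.0]
print(fill_missing(column, "mean"))
print(fill_missing(column, "median"))
```

The mean is pulled toward the outlying 8.0 while the median is not, which is the usual reason to prefer `median` on skewed columns.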
Scales numerical features:
ScalerCrafter(
    scaler_type="standard",    # standard, minmax, robust
    columns=["age", "income"],  # Specific columns or None for all numeric
)
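The three scaler names correspond to well-known transformations. The sketch below writes out the textbook formulas in plain Python so the differences are visible. It assumes population standard deviation for standard scaling and the 25th to 75th percentile range for robust scaling, and it is not taken from MLFCrafter's source.

```python
from statistics import mean, median, pstdev, quantiles
from typing import List


def standard_scale(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def minmax_scale(values: List[float]) -> List[float]:
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values: List[float]) -> List[float]:
    """Subtract the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]
print(minmax_scale(ages))
print(robust_scale(ages))
```

All three helpers divide by a spread measure, so a constant column would raise `ZeroDivisionError` in this sketch; production scalers typically special-case that.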
Trains ML models:
ModelCrafter(
    model_name="random_forest",  # random_forest, xgboost, logistic_regression
    model_params={"n_estimators": 100},
    test_size=0.2,
    stratify=True,
)
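The `test_size` and `stratify` options suggest a stratified train/test split, where each class keeps roughly the same share in both parts. The following is a minimal standard-library sketch of that idea. `stratified_split` is a hypothetical helper and makes no claim about how MLFCrafter performs the split internally.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with per-class proportions kept."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test part.
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(labels, test_size=0.2)
print(len(train_idx), len(test_idx))
```

Without stratification, a small test set drawn from imbalanced data can end up with very few or no samples of the rare class.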
Calculates performance metrics:
ScorerCrafter(
    metrics=["accuracy", "precision", "recall", "f1"],  # Default: all metrics
)
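For reference, the four metrics can be computed by hand for a binary problem as sketched below. `binary_metrics` is a hypothetical helper written for this illustration. How MLFCrafter averages these metrics for multi-class targets is not stated here, so the sketch covers only the binary case.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```

Accuracy counts all correct predictions, while precision and recall focus on the positive class, which makes them more informative on imbalanced data.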
Saves trained models:
DeployCrafter(
    model_path="model.joblib",
    save_format="joblib",  # joblib or pickle
    include_scaler=True,
    include_metadata=True,
)
Alternative Usage Patterns
chain = MLFChain()
chain.add_crafter(DataIngestCrafter(data_path="data.csv"))
chain.add_crafter(CleanerCrafter(strategy="median"))
chain.add_crafter(ModelCrafter(model_name="xgboost"))
results = chain.run(target_column="target")
artifacts = DeployCrafter.load_model("model.joblib")
model = artifacts["model"]
metadata = artifacts["metadata"]
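The loading example above returns a dictionary with `"model"` and `"metadata"` keys. A round trip of that shape can be sketched with the standard library's `pickle` module, as below. `save_artifacts` and `load_artifacts` are hypothetical helpers, the dictionary stands in for a trained model, and the actual file layout MLFCrafter writes may differ.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_artifacts(path: Path, model: Any, metadata: Dict[str, Any]) -> None:
    """Bundle the model and its metadata into one pickled dictionary."""
    with open(path, "wb") as handle:
        pickle.dump({"model": model, "metadata": metadata}, handle)


def load_artifacts(path: Path) -> Dict[str, Any]:
    """Load the bundle back; only unpickle files from trusted sources."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    # A plain dict stands in for a trained estimator in this sketch.
    save_artifacts(
        model_path,
        model={"weights": [0.1, 0.2]},
        metadata={"model_name": "demo"},
    )
    artifacts = load_artifacts(model_path)

print(artifacts["metadata"])
```

Keeping metadata next to the model makes it possible to check, at load time, which settings produced the saved artifact.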
Python: 3.8 or higher
Core Dependencies: pandas, scikit-learn, numpy, xgboost, joblib
Setup Development Environment
git clone https://github.com/brkcvlk/mlfcrafter.git
cd mlfcrafter
pip install -r requirements-dev.txt
pip install -e .
# Run all tests
python -m pytest tests/ -v
# Run tests with coverage
python -m pytest tests/ -v --cov=mlfcrafter --cov-report=html
# Check code quality
ruff check .
# Auto-fix code issues
ruff check --fix .
# Format code
ruff format .
Complete documentation is available at MLFCrafter Docs
We welcome contributions! Please see our Contributing Guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.
Made for the ML Community