Show HN: nanoTabPFN – A Lightweight and Educational Reimplementation of TabPFN

2 hours ago 2

Train your own small TabPFN in less than 500 LOC and a few minutes.

The purpose of this repository is to be a good starting point for students and researchers that are interested in learning about how TabPFN works under the hood.

Clone the repository, afterwards install dependencies via:

pip install numpy torch schedulefree h5py scikit-learn openml seaborn
  • model.py contains the implementation of the architecture and a sklearn-like interface in less than 200 lines of code.
  • train.py implements a simple training loop and prior dump data loader in under 200 lines
  • experiment.ipynb will recreate the experiment from the paper

Pretrain your own nanoTabPFN

To pretrain your own nanoTabPFN, you need to first download a prior data dump from here, then run train.py.

cd nanoTabPFN # download data dump curl http://ml.informatik.uni-freiburg.de/research-artifacts/nanoTabPFN/300k_150x5_2.h5 --output 300k_150x5_2.h5 python train.py

Step by Step explanation:

First we import our code from model.py and train.py

from model import NanoTabPFNModel from model import NanoTabPFNClassifier from train import PriorDumpDataLoader from train import train, get_default_device

Then we instantiate our model

model = NanoTabPFNModel( embedding_size=96, num_attention_heads=4, mlp_hidden_size=192, num_layers=3, num_outputs=2 )

and our dataloader

prior = PriorDumpDataLoader( "300k_150x5_2.h5", num_steps=2500, batch_size=32, )

Now we can train our model:

device = get_default_device() model, _ = train( model, prior, lr = 4e-3, device = device )

and finally we can instantiate our classifier:

clf = NanoTabPFNClassifier(model, device)

and use its .fit, .predict and .predict_proba:

from sklearn.datasets import load_breast_cancer from sklearn.metrics import roc_auc_score, accuracy_score from sklearn.model_selection import train_test_split X, y = load_breast_cancer(return_X_y=True) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5) clf.fit(X_train, y_train) prob = clf.predict_proba(X_test) pred = clf.predict(X_test) print('ROC AUC', roc_auc_score(y_test, prob)) print('Accuracy', accuracy_score(y_test, pred))

The nanoTabPFN repository is supposed to stay ultra small and simple, but we created another repository, the TFM-Playground which we are building out to have a lot more features, like regression, multiple prior interfaces, multiple architectures, ensembling of different pre-processings and more, so check it out if you are interested!

@article{pfefferle2025nanotabpfn, title={nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN}, author={Pfefferle, Alexander and Hog, Johannes and Purucker, Lennart and Hutter, Frank}, journal={arXiv preprint arXiv:2511.03634}, year={2025} }
Read Entire Article