June 19, 2025 · compoconf · composition · configuration · machine learning
(see also the lengthy version aided by ChatGPT)
Composition over Inheritance
“Composition over Inheritance” is one of the phrases you might have heard many, many times in software engineering. Self-contained units, modules, are at its core; they can be tested separately and easily replaced as long as they satisfy a common interface. This potential for switching out components makes it easier to compare models, ablate them (an important scientific practice), test on new datasets, and perform hyperparameter optimization. Early on, an important feature of PyTorch was its nn.Module system, which enabled a lot of flexibility and easy re-use of components and helped make it the leading framework for Machine Learning today. Still, common practice in Machine Learning is to hard-code many default values and then put the remaining configuration variables into a large flat configuration file, as seen in Hugging Face’s transformers. Passing configuration options around this way forces some reduction in configurational complexity; after all, who wants to keep track of dozens of hyperparameters for every component? This compression is beneficial once configurations have stabilized.
But it hides potential knobs to tune, makes it difficult to uncover bad default values, and, most importantly, leads to reproducibility issues: if a default is changed in a new version, the code’s history becomes crucial for reproducing experiments. Therefore, the compression should happen within the configuration itself, where all settings are fixed but the key parameters that are likely to change are highlighted.
Another common structure in ML frameworks is that certain config entries are assumed to be present or, if missing, are silently replaced by defaults. You can build a configuration that contains config.optimizer.lr, whereas the actual code expects config.optimizer.learning_rate. Simple naming issues like that can make your code run somehow, but not as intended: the mistyped value is silently ignored and the default is used instead. This can be solved by having a typed configuration. While dynamic typing has its proponents (Python’s duck typing, for instance), I think most people would agree that type annotations largely improve code quality and can catch many, many bugs before runtime.
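As a minimal sketch with plain dataclasses (class and field names are illustrative only), a typed config turns such a naming mismatch into an immediate error instead of a silent fallback to the default:

```python
from dataclasses import dataclass

@dataclass
class OptimizerConfig:
    learning_rate: float = 1e-3
    weight_decay: float = 0.0

# Correct usage:
opt_cfg = OptimizerConfig(learning_rate=3e-4)

# A typo fails loudly instead of silently falling back to the default:
# OptimizerConfig(lr=3e-4)  ->  TypeError: unexpected keyword argument 'lr'
```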
CompoConf
This leads to the core point: I want to provide a solution. The CompoConf library is built on Python dataclasses to ensure that everything is correctly typed, while keeping it flexible and easy to integrate into existing systems. Still, one problem remains: you have multiple options for, e.g., your model, let’s say a Vision Transformer or a CNN. They might have very different configuration options; for the Transformer, you could switch on a QK norm or layer scaling; for the CNN, you might want to change the kernel sizes. In your dataclass, this would then look something like the sketch below.
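All names in this sketch are illustrative; the exact fields depend on your models.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ViTConfig:
    num_heads: int = 8
    qk_norm: bool = False
    layer_scale: bool = False

@dataclass
class CNNConfig:
    kernel_sizes: tuple = (3, 3, 3)
    channels: tuple = (64, 128, 256)

@dataclass
class ModelConfig:
    # Only one of these should ever be set; the code has to check which.
    vit: Optional[ViTConfig] = None
    cnn: Optional[CNNConfig] = None
```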
For every additional model, you have to extend this further and further. And for the actual instantiation, you have to check which type the configuration actually has in order to create the right model (a factory pattern). When reading the config from a yaml file or similar, you also want to be able to disambiguate two models that have exactly the same configuration options, without adding dummy config fields solely for that purpose.
At the core of this are the Protocols or Interfaces whose implementations you actually want to choose from: all classes that implement a certain protocol or interface (e.g., a “Model”, “VisionModel”, or “ImageDataset”) are eligible for configuration.
And that’s where CompoConf avoids duplicated effort. For every interface you have, you leverage Python’s inheritance patterns and register new protocol implementations, i.e. classes that implement that interface. CompoConf holds a registry for every registered interface, which contains the registered classes that implement it. The registry idea comes from Maximilian Beck and worked well even before CompoConf was implemented. But back then, we had to extend the registries manually every time a new option was implemented; also, the connection to the interface was not as strict, and disambiguating config classes was a bit more difficult. CompoConf unites these ideas of compositional structures, enabled by interface classes, and composition via typed configuration with just two decorators and two base classes.
It also recognizes the config class based on Python annotations, thereby connecting the config class for the given interface with the associated implementation class.
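A minimal sketch of how this could look. The decorator and base-class names below (register_interface, register, RegistrableConfigInterface, ConfigInterface) are assumptions from memory about the compoconf API; check the library’s documentation for the exact names.

```python
from dataclasses import dataclass
# Names imported here are assumptions, not verified against the compoconf API.
from compoconf import (
    register_interface, register, RegistrableConfigInterface, ConfigInterface)

@register_interface
class Model(RegistrableConfigInterface):
    """Interface: every class registered under it becomes selectable via config."""

@dataclass
class ViTConfig(ConfigInterface):
    num_heads: int = 8
    qk_norm: bool = False

@register
class ViT(Model):
    # The annotation links the config class to this implementation.
    config: ViTConfig

    def __init__(self, config: ViTConfig):
        self.config = config
```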
With the type checks of CompoConf, you can even verify that your overall configuration is valid via typing (and potential __post_init__ dataclass checks) before initiating an ML run on many GPUs!
For the actual configuration format, compoconf is quite agnostic: it can be any structure of elementary Python data types, such as what you would get from a json or yaml file or from tools built on top of those, like hydra or omegaconf. compoconf parses the dictionary structure into dataclasses by matching their type annotations (for nested dataclasses) and the CompoConf-specific class_name attribute, which disambiguates configuration classes that otherwise share exactly the same attributes. This way, you can use compoconf on top of existing, established workflows.
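For instance, a raw dictionary from yaml, json, hydra, or omegaconf could be parsed into the typed structure up front. The parse_config name, its argument order, and the exact handling of the class_name key are assumptions on my part, continuing the sketch above.

```python
from compoconf import parse_config  # function name and signature assumed

raw = {
    "class_name": "ViT",  # disambiguates between registered config classes (exact value assumed)
    "num_heads": 12,
    "qk_norm": True,
}
# Parsing validates types (and any __post_init__ checks) up front,
# long before a multi-GPU run is launched.
model_cfg = parse_config(ViTConfig, raw)
```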
For a compositional class like a TransformerBlock, you now choose different submodules not via intricate if/else chains, but by simply instantiating what is in the config, as sketched below.
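This sketch continues the assumed API names from above; the ResidualModule and Norm interfaces, the config field annotation, and the .instantiate() call are my assumptions rather than confirmed compoconf signatures.

```python
from dataclasses import dataclass, field
import torch.nn as nn
from compoconf import (  # names assumed, as above
    register_interface, register, RegistrableConfigInterface, ConfigInterface)

@register_interface
class Norm(RegistrableConfigInterface):
    """Interface for normalization layers."""

@dataclass
class LayerNormConfig(ConfigInterface):
    dim: int = 512
    eps: float = 1e-5

@register
class LayerNorm(Norm, nn.Module):
    config: LayerNormConfig

    def __init__(self, config: LayerNormConfig):
        super().__init__()
        self.norm = nn.LayerNorm(config.dim, eps=config.eps)

    def forward(self, x):
        return self.norm(x)

@register_interface
class ResidualModule(RegistrableConfigInterface):
    """Interface for residually stacked blocks."""

@dataclass
class TransformerBlockConfig(ConfigInterface):
    # In practice this field would accept the config of any registered Norm
    # implementation (e.g., an RMSNorm config); the exact annotation depends
    # on the library.
    norm: LayerNormConfig = field(default_factory=LayerNormConfig)

@register
class TransformerBlock(ResidualModule, nn.Module):
    config: TransformerBlockConfig

    def __init__(self, config: TransformerBlockConfig):
        super().__init__()
        # No if/elif over submodule types: whatever the config selects is built.
        self.norm = config.norm.instantiate()
```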
When calling instantiate, you can also pass additional arguments that cannot be configured and are only known at runtime (e.g., device); in that case, the constructors of all interface implementations need to match in structure. Note that in the example above, the TransformerBlock itself can be instantiated from a higher level via the ResidualModule interface. So CompoConf allows not only horizontal composition but also nested composition. You can, but you don’t have to; at any point, you can “flatten” the config as you like, and that’s it.
If you have a large functional library, like torch.nn.init, with many functions that “implement an interface” but have different configuration options (like scale parameters), you can register them easily based on the type annotations of the function arguments.
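To illustrate the idea only, here is a hand-rolled sketch that is not compoconf’s actual mechanism: it derives a config dataclass from a function’s annotated keyword arguments.

```python
import inspect
from dataclasses import make_dataclass, asdict

import torch
import torch.nn.init as init

def config_from_function(fn):
    """Hand-rolled illustration: build a config dataclass from the keyword
    arguments (those with defaults) of a function, using their annotations."""
    fields = []
    for name, p in inspect.signature(fn).parameters.items():
        if p.default is inspect.Parameter.empty:
            continue  # skip runtime arguments like the tensor itself
        annotation = (p.annotation if p.annotation is not inspect.Parameter.empty
                      else type(p.default))
        fields.append((name, annotation, p.default))
    return make_dataclass(fn.__name__ + "Config", fields)

XavierUniformConfig = config_from_function(init.xavier_uniform_)
cfg = XavierUniformConfig(gain=2.0)

weight = torch.empty(128, 128)
init.xavier_uniform_(weight, **asdict(cfg))
```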
But this is quite an advanced use of compoconf.
It might take a bit to get into the structure, but in my latest research on pLSTM, it facilitated easy switching between torch.nn, flax.linen, and flax.nnx based module systems for torch- and jax-based training. For these different frameworks, you can now use the exact same configuration, with different modules implemented (and tested for matching outputs). You can trivially run experiments by switching out, e.g., LayerNorm for RMSNorm, or swapping activation functions with specific parameters, all by simply changing the configuration. The code remains untouched, so there is no need to copy a large code file to get a 1:1 comparison; the config is attached to the experiment and makes it reproducible. You can test your code in detail, while it stays flexible via the config.
This is what compoconf enabled for my Machine Learning workloads. I hope your projects can benefit from it as well; that’s why it is open source now!