Yet another minimal init system design


Status: early design RFC, no implementation yet.

"System state is a computed function of the dependency graph rather than a mutable, imperative set of unit states."

Concepts

Each service configuration is called a unit.
Unit names are irrelevant to daily operation.
Units provide one or more targets, which are used to specify dependencies. It is illegal for more than one unit to provide the same target.
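To make the uniqueness rule concrete, here is a minimal sketch of how a loader might check it. Nothing is implemented yet, so the function name `build_target_index` and the shape of its input (a plain dict from unit name to provided targets) are assumptions made for illustration only.

```python
from typing import Dict, List


def build_target_index(units: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each target name to the single unit that provides it.

    `units` maps a (hypothetical) unit name to the targets it provides.
    Raises ValueError if two different units provide the same target.
    """
    index: Dict[str, str] = {}
    for unit, targets in units.items():
        for target in targets:
            owner = index.get(target)
            if owner is not None and owner != unit:
                raise ValueError(
                    f"target {target!r} is provided by both {owner!r} and {unit!r}"
                )
            index[target] = unit
    return index


# Dependencies are then resolved through the index, never by unit name.
index = build_target_index({"maddy": ["imapd", "smtpd"], "unbound": ["dns"]})
```

Since dependencies only ever name targets, this index is the only place where unit names need to be looked up.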
Runtime states
- A target's state is equivalent to the state of the unit that provides it.
- A unit (and therefore its targets) may have one of the following states:
  - Disabled: Not required, so it's just off.
  - Waiting for dependencies
  - Failed: The unit failed (non-zero exit, unexpected exit, etc.)
  - Starting: The unit is being started and isn't ready yet.
  - Active:
    - Running: The unit is working properly.
    - Reloading: The unit is working properly and is currently being reloaded.
    - Exited: The main process exited, but it's successful because e.g. it's oneshot.
    - Running (degraded): The unit is running, but one or more of its direct or transitive dependencies are not working. None of the broken dependencies is a direct hard dependency, because that would have stopped the unit. The unit is still considered active; all of its dependents are degraded as well, but not stopped.
  - Stopping: The unit is being stopped.
- A target in a state other than "disabled" may have a "coincidental" flag set. This means the target itself is not needed, but the unit that provides it is not disabled because it also provides other targets that are needed, so this target is provided coincidentally.
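The state list and the "coincidental" flag could be modelled roughly as follows. This is only a sketch: the names `UnitState`, `TargetStatus` and `target_statuses` are made up here, and the set of needed targets is assumed to come from whatever resolves the dependency graph.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class UnitState(Enum):
    DISABLED = "disabled"
    WAITING = "waiting for dependencies"
    FAILED = "failed"
    STARTING = "starting"
    RUNNING = "active (running)"
    RELOADING = "active (reloading)"
    EXITED = "active (exited)"
    DEGRADED = "active (running, degraded)"
    STOPPING = "stopping"


@dataclass(frozen=True)
class TargetStatus:
    state: UnitState    # always equal to the providing unit's state
    coincidental: bool  # provided only because the unit is needed for another target


def target_statuses(
    provides: Dict[str, List[str]],
    unit_states: Dict[str, UnitState],
    needed_targets: Set[str],
) -> Dict[str, TargetStatus]:
    """Derive every target's status from the state of the unit that provides it."""
    result: Dict[str, TargetStatus] = {}
    for unit, targets in provides.items():
        state = unit_states[unit]
        for target in targets:
            coincidental = (
                state is not UnitState.DISABLED and target not in needed_targets
            )
            result[target] = TargetStatus(state, coincidental)
    return result


# Only smtpd is needed, but the same unit also provides imapd.
statuses = target_statuses(
    {"maddy": ["imapd", "smtpd"]},
    {"maddy": UnitState.RUNNING},
    needed_targets={"smtpd"},
)
```

Here `imapd` ends up running with the coincidental flag set, while `smtpd` is running without it.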
Types of units
- foreground-notified: The unit is a supervised foreground process that supports sd_notify. It is considered starting until the supervisor receives READY=1 from it. Other sd_notify states such as RELOADING are also supported.
- foreground-supervised: The unit is a supervised foreground process that does not support sd_notify. By default it is considered active as long as the process is running. If ready-wait is specified, the supervisor waits that many seconds and then considers the unit ready, provided the process has not exited. If ready-grep is specified, the unit is considered ready only once a line of the process's standard output or standard error matches the given regular expression.
- background-pidfile: Traditional forking daemon that starts in the foreground and forks into the background when it's ready, leaving a PID file at a predetermined path.
- background-scripted: Start/stop/reload are implemented by custom scripts.
- oneshot: The unit is just a command run once. When it's running it's considered "starting". It is considered "active but exited" when it exits successfully, and it's considered "failed" when it exits unsuccessfully.
- virtual: Exists solely for dependency management.
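The two readiness checks for foreground units can be sketched as below. sd_notify messages are newline-separated KEY=VALUE assignments sent as a datagram; how the supervisor actually receives them is left out. Whether ready-grep should match anywhere in the line (as `re.search` does here) is an assumption, and all function names are hypothetical.

```python
import re
from typing import Dict, Iterable


def parse_notify_datagram(datagram: bytes) -> Dict[str, str]:
    """Split an sd_notify-style datagram into its KEY=VALUE assignments."""
    fields: Dict[str, str] = {}
    for line in datagram.decode("utf-8", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def notified_ready(datagram: bytes) -> bool:
    """foreground-notified: the unit leaves "starting" once READY=1 arrives."""
    return parse_notify_datagram(datagram).get("READY") == "1"


def grep_ready(pattern: str, output_lines: Iterable[str]) -> bool:
    """foreground-supervised with ready-grep: ready once any output line matches."""
    regex = re.compile(pattern)
    return any(regex.search(line) for line in output_lines)


ready_by_notify = notified_ready(b"READY=1\nSTATUS=serving")
ready_by_grep = grep_ready(r"^connected to", ["starting up", "connected to example"])
```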
Units may have start and stop timeouts. If a unit is in the "starting"/"stopping" states for longer than the timeout specified, it is terminated (killed if unable to terminate), and it enters the failed state.
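As a rough illustration of the timeout rule, here is how a oneshot unit with a start timeout might be run. It is a sketch under assumptions: `run_oneshot`, `stop_with_timeout` and the `grace_seconds` parameter are invented names, and the returned strings simply mirror the states listed above.

```python
import subprocess
import sys
from typing import List


def stop_with_timeout(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask the process to terminate; kill it if it is still alive after the grace period."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return "killed"


def run_oneshot(argv: List[str], start_timeout: float, grace_seconds: float = 1.0) -> str:
    """Run a oneshot command and return the state the unit would end up in."""
    proc = subprocess.Popen(argv)
    try:
        code = proc.wait(timeout=start_timeout)
    except subprocess.TimeoutExpired:
        # Stuck in "starting" for too long: terminate (or kill) and mark it failed.
        stop_with_timeout(proc, grace_seconds)
        return "failed"
    return "active (exited)" if code == 0 else "failed"


# A command that exits successfully well within its start timeout.
state = run_oneshot([sys.executable, "-c", "pass"], start_timeout=10.0)
```

The same terminate-then-kill escalation would apply to a unit that exceeds its stop timeout.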
There are several types of dependency relationships. All dependency specifications use target names, not unit names.
- depends-on marks a hard runtime dependency. The dependent does not start until the dependency is active. If the dependency leaves the active state, the dependent is stopped.
- depends-ms marks a startup milestone. The dependency must start successfully before the dependent starts. Stopping the dependency later does not automatically stop the dependent.
- waits-for makes the dependent wait until the dependency is either active or failed before it is started.
Circular dependencies are illegal.
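The three relationships boil down to two predicates: when may a unit start, and when must it be stopped. The sketch below is one possible reading; the `Deps` class, the plain-string states and the `reached_once` set (targets that have been active at least once, used for depends-ms) are assumptions and not part of the design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ACTIVE = "active"
FAILED = "failed"


@dataclass
class Deps:
    depends_on: List[str] = field(default_factory=list)
    depends_ms: List[str] = field(default_factory=list)
    waits_for: List[str] = field(default_factory=list)


def may_start(deps: Deps, target_state: Dict[str, str], reached_once: Set[str]) -> bool:
    """True once every dependency condition for starting the dependent holds."""
    hard_ok = all(target_state.get(t) == ACTIVE for t in deps.depends_on)
    milestones_ok = all(t in reached_once for t in deps.depends_ms)
    waited = all(target_state.get(t) in (ACTIVE, FAILED) for t in deps.waits_for)
    return hard_ok and milestones_ok and waited


def must_stop(deps: Deps, target_state: Dict[str, str]) -> bool:
    """Only depends-on forces a stop when a dependency leaves the active state."""
    return any(target_state.get(t) != ACTIVE for t in deps.depends_on)


# A bot that needs the network milestone and merely waits for the IRC daemon.
bot = Deps(depends_ms=["network-online"], waits_for=["ircd"])
can_start = may_start(
    bot, {"network-online": ACTIVE, "ircd": FAILED}, reached_once={"network-online"}
)
```

With `ircd` failed the bot may still start; with `ircd` still starting it has to wait.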
There are no ad-hoc commands to start or stop individual services. At each point in time, the system is given one target to satisfy, and it deterministically finds the minimum set of units needed to satisfy that target. If you want temporary things to run, make a new target that lists your typical target as a dependency, and add the targets for the temporary things as further dependencies.
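Finding the minimum set of units is essentially a reachability walk over the dependency graph, starting at the goal target. The sketch below assumes that all three relationship types pull their dependency into the set, which the design does not state explicitly; `required_units` and its two input dicts are hypothetical.

```python
from typing import Dict, List, Set


def required_units(
    goal: str,
    provider: Dict[str, str],    # target name -> unit that provides it
    deps: Dict[str, List[str]],  # unit name -> targets it depends on (any relationship)
) -> Set[str]:
    """Return every unit reachable from the goal target via dependencies."""
    needed: Set[str] = set()
    pending = [goal]
    while pending:
        target = pending.pop()
        unit = provider[target]  # KeyError means no unit provides this target
        if unit in needed:
            continue
        needed.add(unit)
        pending.extend(deps.get(unit, []))
    return needed


provider = {
    "smtpd": "maddy", "imapd": "maddy", "network-online": "network-online",
    "netif": "netif", "dhcp": "dhcpcd", "dns": "unbound", "ircd": "inspircd",
}
deps = {
    "maddy": ["network-online"],
    "network-online": ["netif", "dhcp", "dns"],
    "dhcpcd": ["netif"],
    "unbound": ["netif"],
    "inspircd": ["network-online"],
}
mail_only = required_units("smtpd", provider, deps)
```

With `smtpd` as the goal, `inspircd` is not in the result, so it stays disabled. The result depends only on the goal and the graph, which is what makes the selection deterministic.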
Typically you don't want to use depends-on in your main system target, because when its dependency fails it would take down the entire system.
Things are started as concurrently as possible, constrained by the dependency model. Set a maximum number of units starting at a time via the central config file?
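One way to picture "as concurrently as possible, with a cap" is to group units into start waves using Python's standard `graphlib` (3.9+). This is a simplification, since a real supervisor would fill a free slot as soon as any unit becomes ready rather than wait for a whole wave. The function name and the `max_starting` parameter are assumptions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def start_waves(unit_deps: Dict[str, Set[str]], max_starting: int) -> List[List[str]]:
    """Group units into waves; each wave starts at most `max_starting` units.

    `unit_deps` maps a unit to the units that must be ready before it starts.
    Raises graphlib.CycleError for circular dependencies.
    """
    if max_starting < 1:
        raise ValueError("max_starting must be at least 1")
    sorter = TopologicalSorter(unit_deps)
    sorter.prepare()  # detects cycles up front
    waves: List[List[str]] = []
    ready: List[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort()  # stable, deterministic order within a wave
        wave, ready = ready[:max_starting], ready[max_starting:]
        waves.append(wave)
        sorter.done(*wave)  # pretend the whole wave became ready
    return waves


graph = {
    "netif": set(),
    "dhcpcd": {"netif"},
    "unbound": {"netif"},
    "network-online": {"netif", "dhcpcd", "unbound"},
    "maddy": {"network-online"},
}
waves = start_waves(graph, max_starting=2)
```

With a cap of 2, `dhcpcd` and `unbound` share a wave; with a cap of 1 they are started one after the other.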

TODOs

Formally, how do we move from the current to the desired state?
Restarts, and retries after timeouts
Transient targets to perhaps add imperative control temporarily for debugging?
background-scripted needs to be used very carefully; it's kinda necessary for compatibility with sysvinit scripts...
More robust crash recovery in general
Targets may currently be uniquely provided by one unit only. This is a bit limiting. Research better ways to select which unit provides each target.
Test_User's comments:
- "all units that provide a target should be started (optionally in parallel) when that target is needed; the target is considered degraded if one fails or w/e"
- "you'd need an actual disable mechanism that's prob better than editing dependency graphs"

Examples

I forgot how the services below start or present readiness... these are just conceptual examples.

Let's say you run Maddy, an email server that provides IMAP and SMTP in one binary. It uses sd_notify and depends on the network being online, but we don't want to terminate it when we go offline, so the dependency is a depends-ms milestone rather than depends-on. Note that exec-stop is typically unnecessary, as we send SIGTERM by default; we list it here only to show that you could change it.

    unit: maddy
    type: foreground-notified
    provides: imapd, smtpd
    depends-ms: network-online
    exec-start: /usr/bin/maddy run
    exec-reload: /bin/kill -USR1 $MAINPID
    exec-stop: /bin/kill -TERM $MAINPID
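No file format has been settled on, but if unit definitions are written with one `key: value` pair per line, reading them takes only a few lines. The parser below is a sketch under that assumption; `parse_unit` and the choice of which keys hold comma-separated lists are guesses based on the examples here.

```python
from typing import Dict, List, Union

# Keys assumed to hold comma-separated lists of target names.
LIST_KEYS = {"provides", "depends-on", "depends-ms", "waits-for"}


def parse_unit(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse one unit definition written as `key: value` lines."""
    unit: Dict[str, Union[str, List[str]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        if key in LIST_KEYS:
            unit[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            unit[key] = value
    return unit


maddy = parse_unit("""
unit: maddy
type: foreground-notified
provides: imapd, smtpd
depends-ms: network-online
exec-start: /usr/bin/maddy run
""")
```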

network-online is a virtual unit that is satisfied if and only if your DHCP client, network interfaces, local resolver, etc., are ready.

    unit: network-online
    type: virtual
    provides: network-online
    depends-on: netif, dhcp, dns

Of course, these dependencies also need to be defined:

    unit: dhcpcd
    provides: dhcp
    depends-on: netif
    ...

    unit: netif
    provides: netif
    type: oneshot
    exec-start: /etc/my-script-to-set-up-interfaces-with-iproute2
    ...

    unit: unbound
    provides: dns
    depends-on: netif
    ...

Let's run an IRC daemon.

    unit: inspircd
    provides: ircd
    depends-ms: network-online
    ...

We also have an IRC bot that connects to multiple networks. We want it to wait for our own IRC network to come up before it starts, but if our own network fails, it should just connect to the other networks anyway.

    unit: irc-bot
    type: foreground-supervised
    provides: irc-bot
    depends-ms: network-online
    waits-for: ircd
    ready-grep: ^connected to
    exec-start: /usr/local/bin/irc-bot -c /etc/irc-bot.conf
    exec-reload: /bin/kill -USR1 $MAINPID