CuBit: A Gen­er­al-Pur­pose Oper­at­ing Sys­tem in SPARK/ Ada

5 hours ago 1

Intro­duc­tion #

Last year, I start­ed eval­u­at­ing pro­gram­ming lan­guages for a for­mal­ly-ver­i­fied oper­at­ing sys­tem. I’ve been devel­op­ing soft­ware for a while, but only recent­ly began work in high integri­ty soft­ware devel­op­ment and for­mal meth­ods. There are sev­er­al oper­at­ing sys­tem projects, like the SeL4 micro­ker­nel and the Muen sep­a­ra­tion ker­nel, that make use of for­mal ver­i­fi­ca­tion. But I was inter­est­ed in using a for­mal­ly-ver­i­fied lan­guage to write a gen­er­al-pur­pose OS — an envi­ron­ment for abstract­ing the under­ly­ing hard­ware while act­ing as an arbiter for run­ning the nor­mal appli­ca­tions we’re used to.

Many of the lan­guages used to write for­mal­ly-ver­i­fied soft­ware rely on a run­time envi­ron­ment or make heavy use of garbage-col­lec­tion, which makes them dif­fi­cult to use for oper­at­ing sys­tem devel­op­ment. SPARK, with its sup­port for zero-foot­print run­times and embed­ded devices, seemed to fit the bill. Ada’s rich type sys­tem has a lot to offer the oper­at­ing sys­tem devel­op­er, and there is good sup­port for writ­ing bare-met­al code.

Overview #

CuBit is still in a very ear­ly stage of devel­op­ment. It boots into 64-bit mode on the x86-64, runs in the high­er-half of mem­o­ry, and is mul­ti­core (up to 128 log­i­cal proces­sors) with pre­emp­tive mul­ti­task­ing. There are vir­tu­al mem­o­ry object and phys­i­cal mem­o­ry allo­ca­tors, and CuBit can start prim­i­tive user-mode process­es with sup­port for the fast x86-64 SYSCALL instruc­tion for get­ting back into ker­nel mode. Filesys­tem dri­vers and VFS aren’t ready yet, but progress is being made in this area. My intent with CuBit is not to cre­ate a for­mal­ly-ver­i­fied Unix clone. I’ve tak­en inspi­ra­tion from a num­ber of dif­fer­ent oper­at­ing sys­tems, from embed­ded OSes like Xinu and QNX to BSD, Lin­ux and even Win­dows and VMS.

Hav­ing said all that, CuBit doesn’t real­ly do all that much yet! But I’ve had a lot of fun and learned a lot devel­op­ing what is there so far. I’ve spent a lot of time on doc­u­men­ta­tion and try­ing to make the code easy for con­trib­u­tors — so let’s dive in! Code is avail­able at https://​github​.com/​d​o​c​a​n​d​r​e​w​/​CuBit.

First Steps #

Luke Guest’s Ada Bare Bones arti­cle on osdev​.org (https://​wiki​.osdev​.org/​A​d​a​_​B​a​r​e​_​bones) does a good job of illus­trat­ing some of the lim­i­ta­tions that must be con­sid­ered when devel­op­ing an oper­at­ing sys­tem in Ada, and was the start­ing point I used. One of the top­ics he devotes some time to is the use of pragma to dis­able Ada fea­tures that aren’t suit­able for OS code. I use the gnat.adc file to list them all. Many of the prag­mas I spec­i­fy in gnat.adc are self-explana­to­ry, but it’s worth talk­ing about a few.

In gnat.adc:

pragma Suppress (Index_Check); pragma Suppress (Range_Check); pragma Suppress (Overflow_Check); ... pragma Restrictions (No_Floating_Point);

I sup­press the Index, Range and Over­flow checks because we should use gnatprove at com­pile-time to demon­strate the absence of these types of errors. The beau­ty of SPARK is that we can show the absence of these errors at com­pile-time rather than rely on run­time excep­tions. Now, just because we can prove absence of these errors doesn’t mean CuBit is there yet. There is still a lot of work to be done in the proof depart­ment. There’s a bal­ance between adding new fea­tures to the OS and going back and try­ing to ver­i­fy the work that was already done. Since this is a side-project for now, I find that usu­al­ly new fea­tures win my atten­tion. Hav­ing said that, I do enjoy going back, pick­ing a file, and try­ing to get it to pass all of gnatproves checks with no warn­ings or errors.

The next prag­ma dis­ables float­ing-point math. The x87 FPU has to be explic­it­ly ini­tial­ized pri­or to use. Depend­ing on where we do that in the OS (CuBit doesn’t do it at all, yet!), we might get our­selves into trou­ble by acci­den­tal­ly per­form­ing a float­ing-point oper­a­tion which would cause a hard­ware excep­tion. This prag­ma pre­vents us from doing so.

Gen­er­al­ly-speak­ing, use of float­ing-point in the OS is to be avoid­ed any­how, as it means that dur­ing a con­text switch (enter­ing the OS from user code or vice-ver­sa), we have to save the float­ing-point state of the user process and restore the kernel’s float­ing-point state. If the user process doesn’t make use of the FPU, then we can gain some effi­cien­cy by avoid­ing it in the ker­nel as well. I’m hop­ing CuBit can avoid float­ing-point math alto­geth­er in the ker­nel, but it remains to be seen whether that’s pos­si­ble long-term.

Since CuBit runs in 64-bit mode, we also have to con­sid­er SSE instruc­tions. These are a lit­tle bit sneaki­er. Unlike float­ing-point oper­a­tions, which are very easy to avoid using Ada’s mod­u­lar types, the GCC code gen­er­a­tor can put SSE instruc­tions in where you don’t expect them for bet­ter per­for­mance. Since CuBit doesn’t yet sup­port SSE, we need to explic­it­ly dis­able it with com­pil­er flags in the .gpr project file:

In cubit.gpr:

package Compiler is for Default_Switches ("Ada") use ( ... "-mno-sse" "-mno-sse2" ... ); end Compiler;

That one was learned the hard way, when I got an unex­pect­ed hard­ware excep­tion and had to dig through the gen­er­at­ed assem­bly to see what the prob­lem was. Speak­ing of assembly…

Some Assem­bly Required #

In order to get to our SPARK code, a lit­tle bit of assem­bly code is nec­es­sary. I pre­fer Intel syn­tax vs the AT&T syn­tax com­mon­ly used in GCC, so use the yasm assem­bler in CuBit. Inline assem­bly in GNAT is still using the native AT&T‑style syn­tax, so it can be seen in CuBit too, when we need to use native instruc­tions from Ada for spe­cif­ic hard­ware func­tion­al­i­ty, like read­ing I/O ports, or an x86 mod­el-spe­cif­ic reg­is­ter, shown here.

In x86.adb:

--------------------------------------------------------------------------- -- Read from a model-specific register (MSR) --------------------------------------------------------------------------- function rdmsr(msraddr : in MSR) return Unsigned_64 is low : Unsigned_32; high : Unsigned_32; begin Asm("rdmsr", Inputs => Unsigned_32'Asm_Input("c", msraddr), Outputs => (Unsigned_32'Asm_Output("=a", low), Unsigned_32'Asm_Output("=d", high)), Volatile => True); return (Shift_Left(Unsigned_64(high), 32) or Unsigned_64(low)); end rdmsr;

We aren’t allowed to use inline-assem­bly in SPARK-Mode, so we’ve marked the whole x86 pack­age body with SPARK_Mode => Off. How­ev­er, we can still make use of SPARK to ver­i­fy pre-con­di­tions and post-con­di­tions for these func­tions in the pack­age spec­i­fi­ca­tion. This is a tech­nique that CuBit uses quite often, as we often have to do point­er arith­metic, unchecked con­ver­sions and what John Barnes calls naughty peek­ing and pok­ing” with­in a sub­pro­gram body.

Link­ing #

One inter­est­ing thing about CuBit and many oper­at­ing sys­tems is that the ker­nel is just an ELF object file, with the typ­i­cal sec­tions we’d expect in a nor­mal appli­ca­tion — like .text, .rodata, .data, .bss, etc. We need to be a lit­tle more care­ful about how these sec­tions are linked (i.e. what address to use when refer­ring to a sym­bol). CuBit uses a link­er script to spec­i­fy exact­ly where the sec­tions should be linked, and then also pro­vide a load address using the AT key­word to tell the boot­loader where in phys­i­cal mem­o­ry to load it.

Since CuBit is 64-bit, we link the ker­nel in the upper 2GB of mem­o­ry, and use the -mcmodel=kernel com­pil­er flag to make sure the instruc­tions gen­er­at­ed by GNAT are using the appro­pri­ate address­ing mode. With a mul­ti­core archi­tec­ture, we con­sid­er the boot­strap” core or BSP and then appli­ca­tion proces­sors” or APs.

The APs will boot into real mode once we send them the appro­pri­ate inter­rupt, so the code there must be both phys­i­cal­ly in the first 64k of mem­o­ry, as well as linked there. After some ini­tial set­up to get to pro­tect­ed mode, they boot in a sim­i­lar man­ner to the BSP. We rename this sec­tion to .text_ap to dis­tin­guish it from the nor­mal ker­nel .text section.

In linker.ld:

/* AP starting point */ AP_START = 0x7000; /* kernel load and link locations */ KERNEL_PHYS = 0x00100000; KERNEL_BASE = 0xFFFFFFFF80000000; ... SECTIONS { . = AP_START; .text_ap : AT(AP_START) { stext_ap = .; *(.text_ap_entry) etext_ap = .; } . = KERNEL_PHYS + KERNEL_BASE; KERNEL_START_VIRT = .; .text : AT(KERNEL_PHYS) { stext = .; build/boot.o (.text .text.*) /* need this at the front */ *( EXCLUDE_FILE(build/init.o) .text .text.*) } . = ALIGN(4K); etext = .; .rodata : { srodata = .; *(.rodata .rodata.*) } ...

If you’ve nev­er worked with link­er scripts, this might look a lit­tle fun­ny. Just remem­ber that the . means the cur­rent address”. So start­ing at:

. = AP_START

we could say, the cur­rent address is AP_START (or 0x7000).” Then, from that address, we define a section

.text_ap : AT(AP_START) { stext_ap = .; *(.text_ap_entry) etext_ap = .; }

This means link the text_ap sec­tion start­ing from what­ev­er the cur­rent address is, and then AT(AP_START) puts a load address of AP_START in the object file, which our boot­loader will duti­ful­ly abide by. Note that there are what appear to be a cou­ple of assign­ments: stext_ap = .; and etext_ap = .;. These sym­bols will get export­ed for use in our Ada code. How­ev­er, we must always remem­ber to take the address of a sym­bol, nev­er refer to it direct­ly. To ensure this, CuBit uses the Ada type sys­tem to help out:

In util.ads:

--------------------------------------------------------------------------- -- Symbol is a useless type, used to prevent us from forgetting to use -- 'Address when referring to one. --------------------------------------------------------------------------- type Symbol is (USELESS) with Size => System.Word_Size;

In-between the stext_ap = .; and etext_ap = .; sym­bol exports, we have the cream fill­ing of the .text_ap sec­tion. Here’s where we spec­i­fy what gets includ­ed in that sec­tion. In this case, it’s every­thing marked as a .text_ap_entry sec­tion with­in indi­vid­ual object files — which in this case is just the com­piled boot_ap.asm or AP entry code. We use the section direc­tive in our assem­bly to spec­i­fy what the object file sec­tion will be called:

In boot_ap.asm:

... BITS 16 ; we'll link this section down low, since it has to be in first ; 65535 bytes for real mode. section .text_ap_entry ...

If we take a look at the object file with readelf we can see what’s going on:

> readelf -a build/boot_ap.o ... Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0 [ 1] .shstrtab STRTAB 0000000000000000 00000320 000000000000009e 0000000000000000 0 0 0 [ 2] .strtab STRTAB 0000000000000000 000003c0 000000000000009d 0000000000000000 0 0 0 [ 3] .symtab SYMTAB 0000000000000000 00000460 0000000000000198 0000000000000018 2 15 8 [ 4] .text PROGBITS 0000000000000000 00000040 0000000000000000 0000000000000000 AX 0 0 16 [ 5] .text_ap_entry PROGBITS 0000000000000000 00000040 000000000000008e 0000000000000000 A 0 0 16 ...

The .text_ap_entry sec­tion shown here will get placed in the .text_ap sec­tion we defined in our link­er script. It seems yasm gives us a .text sec­tion we didn’t ask for, but it’s size 0, as expect­ed, since boot_ap.asm does not spec­i­fy a .text section.

We export some oth­er use­ful sym­bols in the link­er file too — like stext, etext, srodata, erodata, etc. This is just to mark the start and end of the var­i­ous sec­tions in our ker­nel image. We’ll return to those in just a bit.

The oth­er sec­tions, for read-only data (.rodata), ini­tial­ized sta­t­ic data (.data) and unini­tial­ized sta­t­ic data (.bss) fol­low after the .text sec­tion. Note we set the cur­rent address” using the ALIGN(4K) direc­tives. I like to have these page-aligned (4k when using small pages on x86-64) for a cou­ple of rea­sons. The first, less-impor­tant rea­son is that it makes debug­ging eas­i­er when I can quick­ly iden­ti­fy the range of mem­o­ry that might be caus­ing an error.

The sec­ond, more-impor­tant rea­son is that we can apply per-page access pro­tec­tions. For instance, mark­ing the no-exe­cute” (NX) bit for data sec­tions, and mark­ing .rodata as read-only. In doing so, if we make a mis­take and access some­thing we shouldn’t, we’ll hope­ful­ly get a (eas­i­ly debug­gable) hard­ware excep­tion rather than (hard to debug) unde­fined behav­ior. This is also a secu­ri­ty enhancement.

This is per­formed in the Mem_Mgr pack­age, when we map all phys­i­cal mem­o­ry in the sys­tem into the kernel’s page tables with the appro­pri­ate flags. Here, Ada’s type sys­tem helps us out again, since we can define ranges with the sec­tion start and end address­es we export­ed in the link­er script:

In mem_mgr.ads:

... subtype kernelTextPages is Virtmem.PFN range Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.stext'Address))) .. Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.etext'Address) - 1)); subtype kernelROPages is Virtmem.PFN range Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.srodata'Address))) .. Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.erodata'Address) - 1)); subtype kernelRWPages is Virtmem.PFN range Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.sdata'Address))) .. Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.ebss'Address) - 1)); ...

This looks a bit unwieldy, but we’re just defin­ing Ada ranges for each ELF sec­tion, con­vert­ing them to their phys­i­cal address, and then to a PFN or page frame num­ber” to make them eas­i­er to work with. Lat­er, we’ll iter­ate through all the mem­o­ry areas that the boot­loader iden­ti­fies, map­ping pages appro­pri­ate­ly. Ada makes it easy to check whether the page frame we’re look­ing at belongs to one of those ranges:

In mem_mgr.adb:

procedure determineFlagsAndMapFrame(frame : in Virtmem.PFN) is ... begin if frame in kernelTextPages then mapPage(fromPhys, toVirtLinear, Virtmem.PG_KERNELDATARO, kernelP4, ok); if not ok then raise RemapException; end if; mapPage(fromPhys, toVirtKernel, Virtmem.PG_KERNELCODE, kernelP4, ok); ...

Here, if the PFN falls into the kernelTextPages range, then we’ll map it into vir­tu­al mem­o­ry twice. Once as read-only data (PG_KERNELDATARO) in what I call the lin­ear range”, at 0xFFFF_​8000_​0000_​0000 and up, where we map all phys­i­cal mem­o­ry in the sys­tem, and then again as in the upper 2GiB of mem­o­ry at 0xFFFF_​FFFF_​8000_​0000 and up, where it was linked and exe­cutes from. At boot, CuBit will dis­play the mem­o­ry address­es of the sec­tions as they are linked:

... Kernel sections: text: 0xFFFFFFFF80100000-0xFFFFFFFF80127000 rodata: 0xFFFFFFFF80127000-0xFFFFFFFF8012B000 data: 0xFFFFFFFF8012B000-0xFFFFFFFF8012C000 bss: 0xFFFFFFFF8012C000-0xFFFFFFFF804300A8 ...

Mod­i­fy­ing the Run­time #

SPARK comes with a zero foot­print run­time” (rts-zfp) that you can find in the GNAT CE pack­age. I’ve mod­i­fied it very slight­ly from the orig­i­nal, so the CuBit repos­i­to­ry has it’s own copy. Much of the work that I’ve put into CuBit is get­ting the oper­at­ing sys­tem to a point where it can sup­port lan­guage and library fea­tures that we’re all used to in our dai­ly devel­op­ment, such as text I/O and mem­o­ry allocation.

Allo­ca­tors #

Since we have to pro­vide all the mem­o­ry allo­ca­tion our­selves, CuBit has 2 phys­i­cal mem­o­ry allo­ca­tors. The first is the BootAllocator. This uses sta­t­ic bitmaps with sup­port for allo­cat­ing a lim­it­ed amount of mem­o­ry using a rel­a­tive­ly slow lin­ear search.

The oth­er is the BuddyAllocator. CuBit’s bud­dy allo­ca­tor is imple­ment­ed as an array of cir­cu­lar­ly-linked lists of free blocks of pow­er-of‑2 mul­ti­ples of the x86-64 small page size (4K). Whew, that was a mouthful!

I know what you’re think­ing: cir­cu­lar­ly-linked lists are pro­hib­it­ed in SPARK.” Well, yes. Here we get around that by over­lay­ing the free list record” on top of the under­ly­ing phys­i­cal mem­o­ry to be allo­cat­ed. When remov­ing the block from the free list (either because it was allo­cat­ed, or because we’re com­bin­ing it with its bud­dy to form a larg­er block), we call the unlink procedure.

In buddyallocator.adb:

procedure unlink(ord : in Order; addr : in Virtmem.PhysAddress) with SPARK_Mode => On, Pre => freeLists(ord).numFreeBlocks > 0, Post => freeLists(ord).numFreeBlocks = freeLists(ord).numFreeBlocks'Old - 1 is block : aliased FreeBlock with Import, Address => To_Address(addr); prevAddr : constant System.Address := block.prevBlock; nextAddr : constant System.Address := block.nextBlock; begin linkNeighbors: declare prevBlock : aliased FreeBlock with Import, Address => prevAddr; nextBlock : aliased FreeBlock with Import, Address => nextAddr; begin prevBlock.nextBlock := nextAddr; nextBlock.prevBlock := prevAddr; end linkNeighbors; -- decrement the free list count when we unlink somebody freeLists(ord).numFreeBlocks := freeLists(ord).numFreeBlocks - 1; end unlink;

This makes it a bit hard­er to prove cer­tain prop­er­ties about the allo­ca­tor, so might not be con­sid­ered the best tech­nique. I believe its an accept­able com­pro­mise here as it allows us to main­tain SPARK_​Mode in the sub­pro­gram body and allow prov­ing oth­er prop­er­ties (absence of over­flow, etc.) of the code.

CuBit enforces strict align­ment of all block sizes, which caus­es us to sac­ri­fice a lit­tle bit of mem­o­ry to exter­nal frag­men­ta­tion, but it makes the bud­dy cal­cu­la­tions for split­ting and rejoin­ing blocks sim­pler and faster.

For object allo­ca­tion, CuBit uses a curi­ous fea­ture of GNAT called the Simple_Storage_Pool prag­ma. I couldn’t find much doc­u­men­ta­tion about this, but was pleased to dis­cov­er that we need lit­tle-to-no run­time sup­port for this, but can still use the famil­iar new key­word to allo­cate objects at run­time. This might open up the use of tagged records and con­trolled types in CuBit, but needs more exper­i­men­ta­tion before I can say one way or the other.

The SlabAllocator imple­ments a cir­cu­lar­ly-linked list of free objects. It works sim­i­lar­ly to the BuddyAllocator, but instead of page order-aligned blocks of mem­o­ry, it works on objects of a fixed size, with user-spec­i­fied align­ment. By allo­cat­ing from and free-ing to the head of the list, we can get fast O(1) allo­ca­tions and deal­lo­ca­tions. We do give up some flex­i­bil­i­ty with the slab allo­ca­tor, since the object size is fixed and must be spec­i­fied ahead of time.

In SlabAllocator.ads:

type FreeNode is record next : System.Address; prev : System.Address; end record with Size => 16 * 8; for FreeNode use record next at 0 range 0..63; prev at 8 range 0..63; end record; type Slab is limited record freeList : FreeNode; numFree : Integer := 0; capacity : Integer := 0; blockOrder : BuddyAllocator.Order; blockAddr : Virtmem.PhysAddress; mutex : aliased Spinlock.Spinlock; alignment : System.Storage_Elements.Storage_Count; paddedSize : System.Storage_Elements.Storage_Count; initialized : Boolean := False; end record; -- GNAT-specific pragma pragma Simple_Storage_Pool_Type(Slab);

Use of this allo­ca­tor should look some­what famil­iar. We need to ini­tial­ize the slab allo­ca­tor with a set­up rou­tine that will allo­cate under­ly­ing phys­i­cal mem­o­ry from the bud­dy allo­ca­tor described earlier.

... objSlab : SlabAllocator.Slab; type myObjPtr is access myObject; for myObjPtr'Simple_Storage_Pool use objSlab; procedure free is new Ada.Unchecked_Deallocation(myObject, myObjPtr); obj : myObjPtr; begin SlabAllocator.setup(objSlab, myObject'Size); obj := new Object; ...

Record Rep­re­sen­ta­tion #

Note that I use both Size and a record rep­re­sen­ta­tion clause for the FreeNode record in SlabAllocator. This is overkill here, but gen­er­al­ly I like to use both, espe­cial­ly for records that might be over­laid on mem­o­ry-mapped hard­ware, used direct­ly by the CPU (like page tables), or over­laid on in-mem­o­ry BIOS struc­tures like the ACPI tables. It helps keep me hon­est so that when I add fields to a record the com­pil­er will ensure that the size and rep­re­sen­ta­tion match up. Con­sid­er some­thing like the ACPI FADT structure:

In acpi.ads:

--------------------------------------------------------------------------- -- FADT - Fixed ACPI Description Table. --------------------------------------------------------------------------- type FADTRecord is record header : SDTRecordHeader; firmwareControl : Unsigned_32; -- ignored if exFirmwareControl present dsdt : Unsigned_32; -- ignored if exDsdt present reserved1 : Unsigned_8; powerMgmtProfile : PowerManagementProfile; sciInterrupt : Unsigned_16; smiCommand : Unsigned_32; acpiEnable : Unsigned_8; acpiDisable : Unsigned_8; S4BIOSReq : Unsigned_8; pStateControl : Unsigned_8; PM1AEventBlock : Unsigned_32; PM1BEventBlock : Unsigned_32; PM1AControlBlock : Unsigned_32; PM1BControlBlock : Unsigned_32; PM2ControlBlock : Unsigned_32; PMTimerBlock : Unsigned_32; GPE0Block : Unsigned_32; GPE1Block : Unsigned_32; PM1EventLength : Unsigned_8; PM1ControlLength : Unsigned_8; PM2ControlLength : Unsigned_8; PMTimerLength : Unsigned_8; GPE0BlockLength : Unsigned_8; GPE1BlockLength : Unsigned_8; GPE1Base : Unsigned_8; cStateControl : Unsigned_8; pLevel2Latency : Unsigned_16; pLevel3Latency : Unsigned_16; flushSize : Unsigned_16; flushStride : Unsigned_16; dutyOffset : Unsigned_8; dutyWidth : Unsigned_8; dayAlarm : Unsigned_8; monthAlarm : Unsigned_8; century : Unsigned_8; -- RTC index into RTC RAM if not 0 intelBootArch : Unsigned_16; -- IA-PC boot architecture flags reserved2 : Unsigned_8; -- always 0 flags : Unsigned_32; -- fixed feature flags resetRegister : GenericAddressStructure; resetValue : Unsigned_8; armBootArch : Unsigned_16; fadtMinorVersion : Unsigned_8; exFirmwareControl : Unsigned_64; exDsdt : Unsigned_64; exPM1AEventBlock : GenericAddressStructure; exPM1BEventBlock : GenericAddressStructure; exPM1AControlBlock : GenericAddressStructure; exPM1BControlBlock : GenericAddressStructure; exPM2ControlBlock : GenericAddressStructure; exPMTimerBlock : GenericAddressStructure; exGPE0Block : GenericAddressStructure; exGPE1Block : GenericAddressStructure; -- ACPI 6 fields (not supported yet) --sleepControlReg : GenericAddressStructure; --sleepStatusReg : GenericAddressStructure; --hypervisorVendor : Unsigned_64; end record with Size => 244*8; for FADTRecord use record header at 0 range 0..287; firmwareControl at 36 range 0..31; dsdt at 40 range 0..31; reserved1 at 44 range 0..7; powerMgmtProfile at 45 range 0..7; sciInterrupt at 46 range 0..15; smiCommand at 48 range 0..31; acpiEnable at 52 range 0..7; acpiDisable at 53 range 0..7; S4BIOSReq at 54 range 0..7; pStateControl at 55 range 0..7; PM1AEventBlock at 56 range 0..31; PM1BEventBlock at 60 range 0..31; PM1AControlBlock at 64 range 0..31; PM1BControlBlock at 68 range 0..31; PM2ControlBlock at 72 range 0..31; PMTimerBlock at 76 range 0..31; GPE0Block at 80 range 0..31; GPE1Block at 84 range 0..31; PM1EventLength at 88 range 0..7; PM1ControlLength at 89 range 0..7; PM2ControlLength at 90 range 0..7; PMTimerLength at 91 range 0..7; GPE0BlockLength at 92 range 0..7; GPE1BlockLength at 93 range 0..7; GPE1Base at 94 range 0..7; cStateControl at 95 range 0..7; pLevel2Latency at 96 range 0..15; pLevel3Latency at 98 range 0..15; flushSize at 100 range 0..15; flushStride at 102 range 0..15; dutyOffset at 104 range 0..7; dutyWidth at 105 range 0..7; dayAlarm at 106 range 0..7; monthAlarm at 107 range 0..7; century at 108 range 0..7; intelBootArch at 109 range 0..15; reserved2 at 111 range 0..7; flags at 112 range 0..31; resetRegister at 116 range 0..95; resetValue at 128 range 0..7; armBootArch at 129 range 0..15; fadtMinorVersion at 131 range 0..7; exFirmwareControl at 132 range 0..63; exDsdt at 140 range 0..63; exPM1AEventBlock at 148 range 0..95; exPM1BEventBlock at 160 range 0..95; exPM1AControlBlock at 172 range 0..95; exPM1BControlBlock at 184 range 0..95; exPM2ControlBlock at 196 range 0..95; exPMTimerBlock at 208 range 0..95; exGPE0Block at 220 range 0..95; exGPE1Block at 232 range 0..95; -- ACPI 6 fields --sleepControlReg at 244 range 0..95; --sleepStatusReg at 256 range 0..95; --hypervisorVendor at 268 range 0..63; end record;

It’s very easy to make mis­takes in big records like this! Thank­ful­ly Ada’s record rep­re­sen­ta­tion claus­es allow us to depend on the field align­ments and sizes that we specify.

Per-CPU Data, Per-CPU Stacks, and Sec­ondary Stacks #

Since CuBit is a mul­ti-core oper­at­ing sys­tem, we need to main­tain sep­a­rate call stacks for each CPU. The mech­a­nism by which this is done is fair­ly prim­i­tive at this point. We just carve out a chunk of mem­o­ry (8 MiB as of the time of writ­ing, but I sus­pect we’ll even­tu­al­ly want more) for the ker­nel stacks (STACK_BOTTOM to STACK_TOP). This is divid­ed up by the max­i­mum num­ber of cores that CuBit is cur­rent­ly set to sup­port (128), so we get a 64k-sized stack for each CPU present.

These are ker­nel-mode stacks, so call chains should be fair­ly short. Each of these stacks is also divid­ed up into the pri­ma­ry stack, which grows down­ward, and then the Ada sec­ondary stack, which grows upward. The sec­ondary stack is not wide­ly used, but it does work. It’s set at a pal­try 2048 bytes for now, in the System.Parameters.Runtime_Default_Sec_Stack_Size run­time package.

We need to set the stack point­er ear­ly on, so that we can call adainit to per­form elab­o­ra­tion, and then our kmain function.

In boot.asm:

... setup_bsp: ; Setup our kernel stack. mov rsp, qword (STACK_TOP) ; Add a stack canary to bottom of primary stack for CPU #0 mov rax, qword (STACK_TOP - PER_CPU_STACK_SIZE + SEC_STACK_SIZE) mov rbx, 0xBAD_CA11_D37EC7ED mov [rax], rbx ; Save rdi, rsi so adainit doesn't clobber them push rdi push rsi ; Initialize with adainit for elaboration prior to entering Ada. mov rax, qword adainit call rax ; Restore arguments to kmain pop rsi pop rdi ; call into Ada code mov rax, qword kmain call rax ...

The sec­ondary stack is then ini­tial­ized with a call to System.Secondary_Stack.SS_Init with the address for the bot­tom of the CPU stack for the boot­strap processor/​CPU #0.

Remem­ber those APs that boot into real-mode? We need to set their ini­tial stack point­ers as well, once they get to 64-bit mode. When we tell those proces­sors to boot, we set a glob­al vari­able startingCPU in the Ada code pri­or to send­ing the inter­rupt that tells them to start.

We can use that to cal­cu­late where that CPU’s per-CPU stack should live in the setup_ap:. Each stack size is (STACK_TOP - STACK_BOTTOM) / MAX_CPUS. Then we can set the stack for CPU N by set­ting its RSP (stack point­er) to STACK_TOP - (startingCPU * stack size).

This rais­es anoth­er issue. Mul­ti-core code can use the pri­ma­ry stack eas­i­ly, since each CPU has its own stack point­er reg­is­ter. How­ev­er, the Ada sec­ondary stack needs to be manip­u­lat­ed using data and a point­er kept in main mem­o­ry. So how do we know which sec­ondary stack to use when we’re run­ning code on an arbi­trary CPU? It is not always obvi­ous which CPU a piece of code is run­ning on in a mul­ti-core system.

CuBit uses a per-CPU data struc­ture. It is imple­ment­ed sim­i­lar to the way that thread-local stor­age is, using the GS reg­is­ter to store an off­set to the per-CPU data. This data struc­ture includes infor­ma­tion like the CPU num­ber it rep­re­sents and the cur­rent­ly run­ning process ID. When code on a CPU attempts to use the sec­ondary stack, it gets a point­er to the ded­i­cat­ed sec­ondary stack for that CPU. This is obvi­ous­ly slow­er than access­ing the RSP reg­is­ter for the pri­ma­ry stack.

In this case, CuBit actu­al­ly keeps the per-CPU data on each of the pri­ma­ry stacks we set up ear­li­er. Gen­er­al­ly speak­ing its bad form to take the address of a stack vari­able, but here we cre­ate the per-CPU data record just pri­or to enter­ing the sched­ul­ing sub­pro­gram which loops infi­nite­ly, so it is safe to access as though it were sta­t­i­cal­ly allocated.

When the sec­ondary stack pack­age goes to allo­cate, it calls __gnat_get_secondary_stack, which is the export name of the getSecondaryStack func­tion in the PerCPUData pack­age. We use this address in SS_Allocate, which is called behind the scenes when­ev­er the sec­ondary stack is used.

Sched­ul­ing and Task­ing #

The CuBit sched­uler is very prim­i­tive and very sim­ple. It just uses a basic round-robin mod­el for now, with a lot of work to be done for var­i­ous process states like sleep­ing, etc. This is cur­rent­ly where I’m spend­ing a lot of my time on CuBit. Con­text switch­ing is the­o­ret­i­cal­ly just a mat­ter of sav­ing one reg­is­ter set and restor­ing anoth­er, and return­ing to the saved return address on the stack. x86 makes this a bit more chal­leng­ing, and the details are messy. I’m still work­ing through a lot of the fin­er points, but it most­ly works. As of the time of writ­ing, there’s a small user-mode stub in init.asm. This is the first user-mode process, which will even­tu­al­ly per­form the sys­tem call to exe­cute what­ev­er init frame­work is used.

SPARK can ver­i­fy con­cur­rent pro­grams using the Raven­scar task­ing mod­el. This is ide­al for bare-met­al appli­ca­tions where the tasks are known ahead of time. How­ev­er, in a gen­er­al-pur­pose OS, we don’t know what appli­ca­tions are going to be run, nor should we place unnec­es­sary restric­tions on them. There might be a good way to mod­el user process­es as Ada tasks in order to get some guar­an­tees from the com­pil­er, but I haven’t worked out a way to do this, and it may in fact not be feasible.

The dif­fi­cul­ty, as far as I can tell, lies in describ­ing which parts of the code might be run­ning con­cur­rent­ly and mod­el­ing their inter­ac­tions. Between mul­ti­ple cores, sys­tem calls, and inter­rupts, it gets hairy very quick­ly. Care­ful use of lock­ing prim­i­tives is the typ­i­cal route for deal­ing with this, but ensur­ing the absence of dead­lock is dif­fi­cult. If there are any SPARK experts read­ing this with some good ideas in this area, please let me know!

The Future #

CuBit is still in a very ear­ly stage of devel­op­ment. It’s just been a side-project of mine for now, and frankly I’m a lit­tle embar­rassed to write about it in it’s cur­rent state! I sup­pose they say if you aren’t embar­rassed by your first release than you’ve wait­ed too long to release”. I wel­come con­tri­bu­tions from the wider Ada/​SPARK com­mu­ni­ty. If, like me, you dream about for­mal­ly-ver­i­fied soft­ware tak­ing over the world some day, then please check it out at https://​github​.com/​d​o​c​a​n​d​r​e​w​/​C​uBit/ and start sub­mit­ting those pull requests!

Read Entire Article