The Sail instruction-set semantics specification language

3 months ago 2

Functions

In Sail, we often define functions in two parts. First we can write the type signature for the function using the val keyword, then define the body of the function using the function keyword. In this Subsection, we will write our own version of the replicate_bits function from the Sail library. This function takes a number n and a bitvector, and creates a new bitvector containing that bitvector copied n times.

val my_replicate_bits : forall 'n 'm, 'm >= 1 & 'n >= 1. (int('n), bits('m)) -> bits('n * 'm)

This signature shows how Sail can track the length of bitvectors and the value of integer variables in type signatures, using type variables. Type variables are written with a leading 'tick', so 'n and 'm are the type variables in the above signature.

Note

The leading tick is a convention derived from Standard ML, and other functional languages derived from Standard ML, such as OCaml. Readers who are familiar with Rust will also recognise this naming convention from lifetime variables in Rust types. The val keyword to declare functions is also taken from OCaml.

The type bits('m) is a bitvector of length 'm, and int('n) is an integer with the value 'n. The result of this function will therefore be a bitvector of length 'n * 'm. We can also add constraints on these types. Here we require that we are replicating the input bitvector at least once with the 'n >= 1 constraint, and that the input bitvector length is at least one with the 'm >= 1 constraint. Sail will check that all callers of this function are guaranteed to satisfy these constraints.

Sail will also ensure that the output of our function has precisely the length bits('n * 'm) for all possible inputs (hence why the keyword uses the mathematical forall quantifier).

A potential implementation of this function looks like:

function my_replicate_bits(n, xs) = { var ys = zeros(n * length(xs)); foreach (i from 1 to n) { ys = ys << length(xs); ys = ys | zero_extend(xs, length(ys)) }; ys }

Functions may also have implicit parameters, e.g. we can implement a zero extension function that implicitly picks up its result length from the calling context as follows:

val extz : forall 'n 'm, 'm >= 'n. (implicit('m), bits('n)) -> bits('m) function extz(m, xs) = zero_extend(xs, m)

Implicit parameters are always integers, and they must appear first before any other parameters in the function type signature. The first argument can then just be omitted when calling the function, like so:

let xs: bits(32) = 0x0000_0000; let ys: bits(64) = extz(xs);

Blocks

You may have noticed that in the definition of my_replicate_bits above, there was no return keyword. This is because unlike languages such as C and C++, and more similar to languages like OCaml and Rust, everything in Sail is an expression which evaluates to a value. A block in Sail is simply a sequence of expressions surrounded by curly braces { and }, and separated by semicolons. The value returned by a block is the value returned by the last expressions, and likewise the type of a block is determined by it’s final expressions, so { A; B; C }, will evaluate to the value of C after evaluating A and B in order. The expressions other than the final expression in the block must have type unit, which is discussed in the following section. Within blocks we can declare immutable variables using let, and mutable variables using var, for example:

{ let x : int = 3; var y : int = 2; y = y + 1; x + y }

The above block has type int and evaluates to the value 6.

Note	For those familiar with Rust, a trailing semicolon in Sail does not change the semantics of the block and is purely optional.

The unit type

The simplest type in Sail is the unit type unit. It is a type with a single member (). Rather than have functions that takes zero arguments, we have functions that take a single unit argument. Similarly, rather than having functions that return no results, a function with no meaningful return value can return (). The () notation reflects the fact that the unit type can be thought of as an empty tuple (see Tuples).

In Sail unit plays a similar role to void in C and C++, except unlike void it is an ordinary type and can appear anywhere and be used in generic functions.

Numeric types and bits

Sail has three basic numeric types, int, nat, and range. The type int which we have already seen above is an arbitrary precision mathematical integer. Likewise, nat is an arbitrary precision natural number. The type range('n, 'm) is an inclusive range between type variables 'n and 'm. For both int and nat we can specify a type variable that constrains elements of the type to be equal to the value of that type variable. In other words, the values of type int('n) are integers equal to 'n. So 5 : int(5) and similarly for any integer constant. These types can often be used interchangeably provided certain constraints are satisfied. For example, int('n) is equivalent to range('n, 'n) and range('n, 'm) can be converted into int('n) when 'n == 'm.

Note	The colon operator : is used for type ascription, so x : Y is read as x has type Y

Note that bit isn’t a numeric type (i.e. it’s not range(0, 1)). This is intentional, as otherwise it would be possible to write expressions like (1 : bit) + 5 which would end up being equivalent to 6 : range(5, 6). This kind of implicit casting from bits to other numeric types would be highly undesirable. The bit type itself is a two-element type with members bitzero and bitone.

In addition, we can write a numeric type that only contains a fixed set of integers. The type {32, 64} is a type that can only contain the values 32 and 64.

Note	In older Sail versions the numeric set type would have been denoted {\|32, 64\|}. This syntax is now deprecated.

Bitvector literals

Bitvector literals in Sail are written as either 0x followed by a sequence of hexadecimal digits (as in 0x12FE) or 0b followed by a sequence of binary digits (as in 0b1010100). The bit length of a hex literal is always four times the number of hexadecimal digits, and the bit length of binary literal is always the exact number of binary digits. So, 0x12FE has bit length 16, and 0b1010100 has bit length 7. To ensure bitvector logic in specifications is precisely specified, we do not allow any kind of implicit widening or truncation as might occur in C. To change the length of a bitvector, explicit zero/sign extension and truncation functions must be used. Underscores can be used in bitvector literals to separate groups, where each group is typically 16 bits. For example:

let large_bitvector : bits(64) = 0xFFFF_0000_1234_0000

We can make bitvectors as large as we need:

let even_larger_bitvector : bits(192) = 0xFFFF_FFFF_FFFF_FFFF_0000_0000_0000_0000_ABCD_ABCD_ABCD_ABCD

We can also write bitvectors very verbosely using bitzero and bitone, like:

let v : bits(2) = [bitzero, bitzero]

The @ operator is used to concatenate bitvectors, for example:

let x = 0xFFFF; let y = 0x0000; assert(x @ y == 0xFFFF_0000);

For historical reasons the bit type is not equal to bits(1), and while this does simplify naively mapping the bits type into a (very inefficient!) representation like bit list in Isabelle or OCaml, it might be something we reconsider in the future.

Sail allows two different types of bitvector orderings---increasing (inc) and decreasing (dec). These two orderings are shown for the bitvector 0b10110000 below.

ordering tikz

For increasing (bit)vectors, the 0 index is the most significant bit and the indexing increases towards the least significant bit. Whereas for decreasing (bit)vectors the least significant bit is 0 indexed, and the indexing decreases from the most significant to the least significant bit. For this reason, increasing indexing is sometimes called `most significant bit is zero' or MSB0, while decreasing indexing is sometimes called `least significant bit is zero' or LSB0. While this vector ordering makes most sense for bitvectors (it is usually called bit-ordering), in Sail it applies to all vectors. A default ordering can be set using

and this should usually be done right at the beginning of a specification. This setting is global, and increasing and decreasing bitvectors can therefore never be mixed within the same specification!

In practice decreasing order is the almost universal standard and only POWER uses increasing order. All currently maintained Sail specifications use decreasing. You may run into issues with increasing bitvectors as the code to support these is effectively never exercised as a result.

Vectors

Sail has the built-in type vector, which is a polymorphic type for fixed-length vectors. For example, we could define a vector v of three integers as follows:

let v : vector(3, int) = [3, 2, 1]

The first argument of the vector type is a numeric expression representing the length of the vector, and the second is the type of the vector’s elements. As mentioned in the bitvector section, the ordering of bitvectors and vectors is always the same, so:

let a_generic_vector : vector(3, bit) = [bitzero, bitzero, bitone]; let a_bitvector : bits(3) = [bitzero, bitzero, bitone]; // 0b001 assert(a_generic_vector[0] == bitone); assert(a_bitvector[0] == bitone)

Note that a generic vector of bits and a bitvector are not the same type, despite us being able to write them using the same syntax. This means you cannot write a function that is polymorphic over both generic vectors and bitvectors). This is because we typically want these types to have very different representations in our various Sail backends, and we don’t want to be forced into a compilation strategy that requires monomorphising such functions.

Accessing and updating vectors

A (bit)vector can be indexed by using the vector index notation. So, in the following code:

let v : vector(4, int) = [1, 2, 3, 4] let a = v[0] let b = v[3]

a will be 4, and b will be 1 (we assume default Order dec here). By default, Sail will statically check for out of bounds errors, and will raise a type error if it cannot prove that all such vector accesses are valid.

A vector v can be sliced using the v[n .. m] notation. The indexes are always supplied with the index closest to the MSB being given first, so we would take the bottom 32-bits of a decreasing bitvector v as v[31 .. 0], and the upper 32-bits of an increasing bitvector as v[0 .. 31], i.e. the indexing order for decreasing vectors decreases, and the indexing order for increasing vectors increases.

A vector v can have an index index using [v with index = expression]. Similarly, a sub-range of v can be updated using [v with n .. m = expression] where the order of the indexes is the same as described above for increasing and decreasing vectors.

The list type

In addition to vectors, Sail also has list as a built-in type. For example:

let l : list(int) = [|1, 2, 3|]

The cons operator is ::, so we could equally write:

let l : list(int) = 1 :: 2 :: 3 :: [||]

For those unfamiliar the word, 'cons' is derived from Lisp dialects, and has become standard functional programming jargon for such an operator — see en.wikipedia.org/wiki/Cons for more details.

Caution

The list type (plus the recursive functions typically used to manipulate lists) does not work well with certain Sail targets, such as the SMT and SystemVerilog backends. The vector type is almost always preferable to the list type. The inclusion of the list type (where we otherwise forbid recursive types) was perhaps a design mistake.

Tuples

Sail has tuple types which represent heterogeneous sequences containing values of different types. A tuple type (T1, T2, …) has values (x1, x2, …) where x1 : T1, x2 : T2 and so on. A tuple must have 2 or more elements. Some examples of tuples would be:

let tuple1 : (string, int) = ("Hello, World!", 3) let tuple2 : (nat, nat, nat) = (1, 2, 3)

Note that while the function type (A, B) -> C might look like a function taking a single tuple argument, it is in fact a function taking two arguments. If we wanted to write a function taking a single tuple argument, we would instead write:

val single_tuple_argument : ((int, int)) -> unit function single_tuple_argument(tuple) = { let (x, y) = tuple; print_int("x = ", x); print_int("y = ", y); }

which would be called as

single_tuple_argument((1, 2))

Note	This is because in Sail the function type is denoted (A, B, …) -> C, but we allow the brackets to be elided when the function has a single non-tuple argument so we can write A -> B rather than (A) -> B.

Strings

Sail has a string type, which is primarily used for error reporting and debugging.

Caution

Sail is not a language designed for working with strings, and the semantics of ISA specifications should not depend on any logic involving strings. If you find yourself using strings for reasons other than printing or logging errors in a Sail specification, you should probably reconsider.

A Sail string is any sequence of ASCII characters between double quotes. Backslash is used to introduce escape sequences, and the following escape sequences are supported:

\\ — Backslash
\n — Newline character
\t — Tab character
\b — Backspace character
\r — Carriage return
\' — Single quote (somewhat unnecessary, as single quotes are allowed in Sail strings)
\" — Double quote
\DDD — The character with decimal ASCII code DDD
\xHH — The character with hexadecimal ASCII code HH

Multi-line strings can be written by escaping the newline character at the end of a line:

print_endline("Hello, \ World!"); // Is equivalent to print_endline("Hello, World!")

User-defined types

Sail supports various kinds of user-defined types: structs, unions, and enums.

Structs

Like in C, structs are used to hold multiple values. Structs are created using the struct keyword followed by the name of the struct. The data the struct contains is then defined following the = symbol. Each piece of data stored in a struct is called a field, and each is given a unique name that identifies that piece of data, and can have a different type.

The following example struct defines three fields. The first contains a bitvector of length 5 and is called field1. The second, field2 contains an integer. The third, field3 contains a string.

struct My_struct = { field1 : bits(5), field2 : int, field3 : string, }

We can then construct struct values by using the struct keyword and providing values for all the fields. The individual fields can be accessed using the . operator. If the struct is mutable, we can also the . operator on the left-hand side of an assignment to assign to the struct field. For immutable structs, we can update them using the with keyword, which will create a new struct value with some fields changed.

let s : My_struct = struct { field1 = 0b11111, field2 = 5, field3 = "test", }; // Accessing a struct field, prints "field1 is 0b11111" print_bits("field1 is ", s.field1); // Updating struct fields, creating a new struct variable s2 var s2 = { s with field1 = 0b00000, field3 = "a string" }; // Prints "field1 is 0b00000" print_bits("field1 is ", s2.field1); // Prints "field2 is 5", as field2 was taken from 's' and not modified print_int("field2 is ", s2.field2); // Because s2 is mutable we can assign to it s2.field3 = "some string"; // Prints "some string" print_endline(s2.field3)

Structs can have user defined type parameters. The following example has an integer parameter 'n and a type parameter 'a.

struct My_other_struct('n, 'a) = { a_bitvector : bits('n), something : 'a }

The struct can then be instantiated with any length of bitvector for the a_bitvector field, and any data type for the something field, including another struct like the My_struct type we defined earlier.

var s3 : My_other_struct(2, string) = struct { a_bitvector = 0b00, something = "a string" }; var s4 : My_other_struct(32, My_struct) = struct { a_bitvector = 0xFFFF_FFFF, something = struct { field1 = 0b11111, field2 = 6, field3 = "nested structs!", } }; s3.a_bitvector = 0b11; // Note that once created, we cannot change a variable's type, so the following is forbidden: // s3.a_bitvector = 0xFFFF_FFFF; // Changing the type is also forbidden in a with expression: // let s4 : My_other_struct(32, string) = { s3 with a_bitvector = 0xFFFF_FFFF }; // If structs are nested, then field accesses can also be nested print_endline(s4.something.field3); // and assignments s4.something.field3 = "another string"

Enums

Sail enumerations are also much like their C counterparts. An enum is declared using the enum keyword followed by the name of the enumeration type. The members of the enumeration are specified as a comma-separated list of identifiers between curly braces, like so:

enum My_enum = { Foo, Bar, Baz, Quux, }

The above style is best when there are a lot of elements in the enumeration. We allow enums to be defined in a style similar to Haskell, where the identifiers are separated by a vertical bar | character, like so:

enum My_short_enum = A | B | C

This style is better when the enumeration is short and has few constructors, however specification authors may prefer to use the first style exclusively for consistency.

Sail will automatically generate functions for converting enumeration members into numbers, starting with 0 for the first element of the enumeration.

function enum_example1() = { assert(num_of_My_enum(Foo) == 0); assert(num_of_My_enum(Bar) == 1); assert(num_of_My_enum(Baz) == 2); assert(num_of_My_enum(Quux) == 3); }

Likewise, there is a function for converting the other way

function enum_example2() = { assert(My_enum_of_num(0) == Foo) }

If the names of the generated functions are not ideal, then custom names can be provided using the enum_number_conversions attribute.

$[enum_number_conversions { from_enum = to_number, to_enum = from_number }] enum Another_enum = { Member1, Member2, } function custom_conversions() -> unit = { assert(to_number(Member1) == 0); assert(from_number(1) == Member2); }

The no_enum_number_conversions attribute can be used to disable the generation of these functions entirely.

Unions

Structs provide us a way to group multiple pieces of related data, for example

struct rectangle = { width : int, height : int, }

defines a type rectangle that has both a width and a height. The other logical way to combine data is if we want to say we have some data or some other data, like if we wanted to represent a shape that could be a circle or a rectangle. In Sail, we use a union for this.

struct circle = { radius : int } union shape = { Rectangle : rectangle, Circle : circle, }

Note

While structs are ubiquitous in programming languages, the equivalent feature for safely representing disjunctions of types is often a second-class afterthought or even absent entirely. The requirement to safely represent disjunctions of types is why C’s union feature is something different to what we want here, which is sometimes called tagged unions or variants to distinguish them from the unsafe C construct. C++ is therefore a good example of a language where structs are first class, but std::variant is relegated to a library feature.

This defines a type with two constructors Rectangle and Circle. Constructors are used like functions with a single argument, as so:

function example() = { // Construct a shape using the Rectangle constructor let r : shape = Rectangle(struct { width = 30, height = 20 }); // Construct a shape using the Circle constructor // Note that we can allow the 'shape' type to be inferred let c = Circle(struct { radius = 15 }); }

To destructure unions, we typically use pattern matching to handle each case in the union separately. This will be discussed in detail in Pattern matching.

Constructors in Sail always have a single argument. If we want a constructor in a union type to contain no information, then we can use the unit type. For a constructor with multiple arguments we can use a tuple type. We provide some syntax sugar that allows writing constructor applications with either tuple or unit argument types as if they were functions with multiple arguments or no arguments respectively.

union example_unit_constructor = { Foo : (int, string), Bar : unit, } // Using a constructor with a tuple type let x1 = Foo(2, "a string") // Note that the above is equivalent to let x2 = Foo((2, "a string")) // Using a unit-type constructor let y1 = Bar() // Similarly, equivalent to let y2 = Bar(())

In our shape example, we defined rectangle and circle as two separate structs, but if we wanted we could declare those structs inline

union shape2 = { Rectangle2 : { width : int, height : int }, Circle2 : { radius : int }, }

Behind the scenes, Sail will just generate separate struct definitions, so we use this type in exactly the same was as we did previously.

As with structs, we can define polymorphic union types, like

union either('a, 'b) = { Left : 'a, Right : 'b, }

Type synonyms

We have now described all the various types available in Sail. However, in order to create more descriptive and self-documenting specifications, we might want to give types new names that better inform the reader of the intent behind the types. We can do this using type synonyms, which are created with the type keyword.

For example, if we have an architecture with 32 general purpose registers, we might want to index them using a 5-bit type. Rather than using bits(5) directly, we create a synonym:

We can now write functions that use this type:

register R1 : bits(32) register R2 : bits(32) function access_register(r : regbits) -> bits(32) = { match r { 0b00000 => 0x0000_0000, // zero register 0b00001 => R1, 0b00010 => R2, // and so on } }

Sail types also include numbers, as in bits(32) above. We can therefore write a type synonym for just such a number:

In which case, we could re-write our example as

register R1 : bits(xlen) register R2 : bits(xlen) function access_register(r : regbits) -> bits(xlen) = { match r { 0b00000 => 0x0000_0000, // zero register 0b00001 => R1, 0b00010 => R2, // and so on } }

Type synonyms are completely transparent to the Sail type-checker, so a type synonym can be used exactly the same as the original type.

We can also write type-synonyms that have arguments, for example:

type square('n, 'a) = vector('n, vector('n, 'a))

Type kinds

Throughout the previous section we have seen many different types. For example we could have a type like list(int), or a type like range(2, 5). The range type takes two numbers as an arguments, whereas list takes a type as an argument. We’ve also seen polymorphic user-defined types like:

struct foo('n, 'a) = { some_bits : bits('n), some_a : 'a, }

Which we could use as the type foo(32, string), or foo(16, int). What stops us from writing foo(string, int) or even range(int, int) or list(3) however? Those should clearly be type errors, and if they are type errors does that mean types have types? The answer to this is yes, and we call the types of types kinds. Sail has three different kinds of types, which we denote as Int, Bool, and Type. The Int kind is for type-level integers, Bool is for type-level (boolean) constraints, and Type is for ordinary types. Type constructors therefore have types much like functions, with list having the type Type → Type and range having the type (Int, Int) → Type.

Note

From a type-theory perspective, one really can think of type constructors as just functions at the type level. As functions are applied to terms, type constructors are applied to types. This is partly why we use parentheses for type constructor application in Sail — not only does it neatly sidestep the notoriously thorny parsing problems of < and > in types (see Rust’s infamous turbofish operator, or requiring extra spaces in C++ templates to avoid ambiguities with >>), but it really is the same concept when viewed in the right way.

Sail allows writing the kinds explicitly in type definitions, so we could write:

struct foo('n : Int, 'a : Type) = { some_bits : bits('n), some_a : 'a, }

but in practice we don’t need to do this, as Sail has kind-inference to avoid us having to think about this particular detail.

Pattern matching

Like most functional languages, and an increasing number of imperative languages like Rust, Sail supports pattern matching via the match keyword. For example:

let n : int = f(); match n { 1 => print_endline("1"), 2 => print_endline("2"), 3 => print_endline("3"), _ => print_endline("wildcard"), }

The match keyword takes an expression and then branches by comparing the matched value with a pattern. Each case in the match expression takes the form <pattern> => <expression>, separated by commas (a trailing comma is allowed). The cases are checked sequentially from top to bottom, and when the first pattern matches its expression will be evaluated.

The concrete match statement syntax in Sail is inspired by the syntax used in Rust — but programmers coming from languages with no pattern matching features may be unfamiliar with the concept. One can think of the match statement like a super-powered switch statement in C. At its most basic level a match statement can function like a switch statement (except without any fall-through). As in the above example we can use match to compare an expression against a sequence of values like so:

match expression { 1 => print_endline("expression == 1"), 2 => { // A match arm can have a single expression or a block // (because blocks in Sail are also expressions!) print_endline("expression == 2") // Note that unlike in C, no 'break' is needed as there is no fallthrough }, _ => print_endline("This wildcard pattern acts like default: in a C switch") }

However the pattern in a match statement can also bind variables. In the following example we match on a numeric expression x + y, and if it is equal to 1 we execute the first match arm. However, if that is not the case the value of x + y is bound to a new immutable variable z.

match x + y { 1 => print_endline("x + y == 1"), z => { // here z is a new variable defined as x + y. print_int("x + y = ", z) // z is only defined within the match arm }, }

Finally, we can use patterns to destructure values — breaking them apart into their constituent parts. For example if we have a pair expression we can break it apart into the first value in the pair and the second, which can then be used as individual variables:

match pair : (int, int) { (first, second) => { // here we have split a pair into two variables first and second print_int("first = ", first); print_int("second = ", second) } }

These features can be combined, so if we had a pattern (first, 3) in the above example, the expression for that pattern would be executed when the second element of the pair is equal to 3, and the first element would be bound to the variable first.

Sail will check match statements for exhaustiveness (meaning that the patterns in the match cover every possible value), and prints a warning if the overall match statement is not exhaustive. There are some limitations on the exhaustiveness checker which we will discuss further below.

Guards

What if we need to switch based on more complex logic than just the structure and values of the expressions we are matching on? For this matches in Sail support guards. A guard allows us to combine the behavior of a match expression and the boolean logic of an if expression — and the syntax is reflective of this, as we can use the if keyword to add extra logic to each match arm:

match x + y { z if z > 0 => print_endline("y is greater than 0"), z if z == 0 => print_endline("z is equal to 0"), z => print_endline("z is less than 0"), }

You may wonder — why not write z if z < 0 for the final case? Here we run into one of the limitations of the exhaustiveness checker mentioned above. Sail can only check the exhaustiveness of unguarded clauses, meaning that the checker only considers cases without an if guard. The z pattern by itself is exhaustive, so the checker is happy, but if we added a if z < 0 guard the checker would complain that there are no unguarded patterns for it to look at. This may seem suprising for such a simple case (we can easily see the three guards would cover all cases!), however each guard clause could contain arbitrarily complex logic potentially abstracted behind arbitrary function calls, which the completeness checker cannot reason about.

We now describe all the things that can be matched on in Sail

Matching on literals

First, and as we have already seen, we can match on literal values. These literal values can be (), bitvectors, the boolean values true and false, numbers, and strings.

Matching on enumerations

Match can be used to match on possible values of an enum, like so:

match x { A => print_endline("A"), B => print_endline("B"), C => print_endline("C") }

Note that because Sail places no restrictions on the lexical structure of enumeration elements to differentiate them from ordinary identifiers, pattern matches on variables and enum elements can be somewhat ambiguous. Issues with this are usually caught by the exhaustiveness checker — it can warn you if removing an enum constructor breaks a pattern match.

Matching on tuples

We use match to destructure tuple types, for example:

let x: (int, int) = (2, 3); match x { (y, z) => print_endline("y = 2 and z = 3") }

Matching on unions

Match can also be used to destructure tagged union constructors, for example using the option type from the Sail library.

$include <option.sail> val g : unit -> option(int) function match_union() -> unit = { let x = g(); match x { Some(n) => print_int("n = ", n), None() => print_endline("got None()!"), } }

Note that like how calling a function with a unit argument can be done as f() rather than f(()), matching on a constructor C with a unit type can be achieved by using C() rather than C(()).

Matching on lists

Sail allows lists to be destructured using patterns. There are two types of patterns for lists, cons patterns and list literal patterns. The cons pattern destructures lists into the first element (the head) and the rest of the list (the tail).

match ys { x :: xs => print_endline("cons pattern"), [||] => print_endline("empty list") }

On the other hand, a list pattern matches on the entire list:

match ys { [|1, 2, 3|] => print_endline("list pattern"), _ => print_endline("wildcard") }

Matching on structs

Match can also be used for structures, for example:

struct my_struct = { field1 : bits(16), field2 : int, } function match_struct(value : my_struct) -> unit = { match value { // match against literals in the struct fields struct { field1 = 0x0000, field2 = 3 } => (), // bind the struct field values to immutable variables struct { field1 = x, field2 = y } => { print_bits("value.field1 = ", x); print_int("value.field2 = ", y) } } }

We can omit fields from the match by using a wildcard _ in place of some of the fields:

match value { struct { field1 = x, _ } => { print_bits("value.field1 = ", x) } }

Finally, if we want to create a variable with the same name as a field, rather than typing struct { field_name = field_name, _ }, we can shorten this to just struct { field_name, _ }, So the above example is equivalent to:

match value { struct { field1, _ } => { print_bits("value.field1 = ", field1) } }

As patterns

Like OCaml, Sail also supports naming parts of patterns using the as keyword. For example, in the above cons pattern example we could bind the entire matched list to the zs variable:

match (1 :: 2 :: ys) : list(int) { x :: xs as zs => print_endline("cons with as pattern"), [||] => print_endline("empty list"), }

The as pattern has lower precedence than any other keyword or operator in a pattern, so in this example the pattern brackets as (x :: xs) as zs

Automatic wildcard insertion

The various theorem provers that Sail can produce definitions for are often strict, and enforce that pattern matches are exhaustive. However, their pattern exhaustiveness checkers do not understand bitvectors in the same way Sail does. For example, Sail can tell that the following match is complete:

match (x, y) { (0b1, _) => print_endline("1"), (_, 0b0) => print_endline("2"), (0b0, 0b1) => print_endline("3"), }

When translating to a target without bitvector literals, this would be rewritten to use guards:

match (x, y) { (t1, _) if t1 == 0b1 => print_endline("1"), (_, t2) if t2 == 0b0 => print_endline("2"), (t1, t2) if t1 == 0b0 & t2 == 0b1 => print_endline("3"), }

Most targets that check pattern exhaustiveness share the same limitation as Sail — they only check match arms without guards, so they would not see that this match is complete. To avoid this, Sail will attempt to replace literal patterns with wildcards when possible, so the above will be rewritten to:

match (x, y) { (0b1, _) => print_endline("1"), (_, 0b0) => print_endline("2"), (_, _) => print_endline("3"), }

which is equivalent.

One can find situations where such wildcards cannot be inserted. For example:

function cannot_wildcard(x: option(bits(1)), y: option(bits(1))) -> unit = { match (x, y) { (Some(0b0), Some(_)) => print_endline("1"), (Some(0b1), Some(0b0)) => print_endline("2"), (Some(0b1), _) => print_endline("3"), (Some(0b0), None()) => print_endline("4"), (None(), _) => print_endline("5"), } }

Here the 0b1 literal in the (Some(0b1), _) case would need to be replaced for an exhaustiveness checker without bitvector literals to check the case where x and y are both Some, but this would change the behavior when x is Some and y is None, hence a wildcard cannot be inserted.

In this case Sail will print a warning explaining the problem:

Warning: Required literal ../examples/cannot_wildcard.sail:17.14-17: 17 | (Some(0b1), _) => print_endline("3"), | ^-^ | Sail cannot simplify the above pattern match: This bitvector pattern literal must be kept, as it is required for Sail to show that the surrounding pattern match is complete. When translated into prover targets (e.g. Lem, Coq) without native bitvector patterns, they may be unable to verify that the match covers all possible cases.

This warning should be heeded, and the match simplified otherwise the generated theorem prover definitions produced by Sail may be rejected by the prover.

Matching in let and function arguments

The match statement isn’t the only place we can use patterns. We can also use patterns in function arguments and with let, for example:

struct S = { field1 : int, field2 : string, } function example1(s : S) -> unit = { // Destructure 's' using let let struct { field1, _ } = s; print_int("field1 = ", field1) } // Destructure directly in the function argument function example2(struct { field1, _ } : S) -> unit = { print_int("field1 = ", field1) }

Type patterns

In the previous section we saw as patterns, which allowed us bind additional variables for subpatterns. However, as patterns can also be used to bind type variables. For example:

// Some function that returns either 32 or 64 at runtime val get_current_xlen : unit -> {32, 64} register R : bits(32) val example : int -> unit function example() = { let xlen as int('n) = get_current_xlen() // Create a bitvector of length xlen let bv : bits('n) = zero_extend(xlen, *R); print_bits("bv = ", bv) }

You can think of the as int('n) as matching on the return type of the get_current_xlen rather than the value, and binding it’s length to a new type variable 'n, which we can subsequently use in types later in our function. Note that even though we only know if xlen will be 32 or 64 at runtime after the call to get_current_xlen, Sail is still able to statically check all our bitvector accesses.

If a type only contains a single type variable (as int('n) does), then we allow omitting the type name and just using a variable as the type pattern, for example the following would be equivalent to the first line of example above:

let xlen as 'n = get_current_xlen();

If we want to give the variable xlen and the type variable 'n the same name, we could go further and simplify to:

function example() = { let 'xlen = get_current_xlen() // Create a bitvector of length xlen let bv : bits('xlen) = zero_extend(xlen, *R); print_bits("bv = ", bv) }

Here we can use xlen within the function as both a regular variable xlen and as a type variable 'xlen.

Mutable variables

Bindings made using let are always immutable, but Sail also allows mutable variables. Mutable variables are created by using the var keyword within a block.

{ var x : int = 3; // Create a new mutable variable x initialised to 3 x = 2 // Rebind it to the value 2 }

Like let-bound variables, mutable variables are lexically scoped, so they only exist within the block that declared them.

Technically, unless the --strict-var option is used Sail also allows mutable variables to be implicitly declared, like so:

{ x : int = 3 // Create a new mutable variable x initialised to 3 x = 2 // Rebind it to the value 2 }

This functions identically, just without the keyword. We consider this deprecated and strongly encourage the use of the var keyword going forwards.

The assignment operator is the equality symbol, as in C and other programming languages. Sail supports a rich language of l-value forms, which can appear on the left of an assignment. These will be described in the next section.

One important thing to note is that Sail always infers the most specific type it can for variables, and in the presence of integer types with constraints, these types can be very specific. This is not a problem for immutable bindings, but can cause issues for mutable variables when explicit types are omitted. The following will not typecheck:

The reason is that Sail (correctly) infers that x has the type 'the integer equal to 3', and therefore refuses to allow us to assign 2 to it (as it well should), because two is not equal to three. To avoid this we must give an annotation with a less specific type like int.

Assignment and l-values

It is common in ISA specifications to assign to complex l-values, e.g. to a subvector or named field of a bitvector register, or to an l-value computed with some auxiliary function, e.g. to select the appropriate register for the current execution model.

Sail has l-values that allow writing to individual elements of a (bit)vector:

var v : bits(8) = 0xFF; v[0] = bitzero; assert(v == 0xFE)

As well as sub-ranges of a (bit)vector:

var v : bits(8) = 0xFF; v[3 .. 0] = 0x0; // assume default Order dec assert(v == 0xF0)

We also have vector concatenation l-values, which work much like vector concatenation patterns

var v1 : bits(4) = 0xF; var v2 : bits(4) = 0xF; v1 @ v2 = 0xAB; assert(v1 == 0xA & v2 == 0xB)

For structs we can write to an individual struct field as

// Assume S is a struct type with a single bits(8) field (called field) var s : S = struct { field = 0xFF }; s.field = 0x00; assert(s.field == 0x00)

We can also do multiple assignment using tuples, for example:

var (x, y) = (2, 3); assert(x == 2 & x == 3) // swap values (y, x) = (x, y) assert(x == 3 & x == 2)

Finally, we allow functions to appear in l-values. This is a very simple way to declare setter functions that look like custom l-values, for example:

memory(addr) = 0x0F // is just syntactic sugar for memory(addr, 0x0F)

This feature is commonly used when setting registers or memory that have some additional semantics when they are read or written. We commonly use the ad-hoc overloading feature to declare what appear to be getter/setter pairs, so for the above example we could implement a read_memory function and a write_memory function and overload them both as memory to allow us to write memory using memory(addr) = data and read memory as data = memory(addr), for example:

val read_memory : bits(64) -> bits(8) val write_memory : (bits(64), bits(8)) -> unit overload memory = {read_memory, write_memory}

Bitfields

The following example creates a bitfield type called cr_type and a register CR of that type. The underlying bitvector type (in this case bits(8)) must be specified as part of the bitfield declaration.

Note	The underlying bitvector type can be specified using a type synonym, like xlenbits in sail-riscv.

bitfield cr_type : bits(8) = { CR0 : 7 .. 4, LT : 7, GT : 6, CR1 : 3 .. 2, CR3 : 1 .. 0 } register CR : cr_type

If the bitvector is decreasing then indexes for the fields must also be in decreasing order, and vice-versa for an increasing vector. The field definitions can be overlapping and do not need to be contiguous.

A bitfield generates a type that is simply a struct wrapper around the underlying bitvector, with a single field called bits containing that bitvector. For the above example, this will be:

struct cr_type = { bits : bits(8) }

This representation is guaranteed, so it is expected that Sail code will use the bits field to access the underlying bits of the bitfield as needed. The following function shows how the bits contained in a bitfield can be accessed:

function bitfield_access_example() = { // Rather than using numeric indices, the bitfield names are used let cr0_field : bits(4) = CR[CR0]; // Bitfield accessors always return bitvector results let lt_field : bits(1) = CR[LT]; // Note 'bits(1)' not 'bit' // Can access the underlying bitvector using the bits field let some_bits : bits(7) = CR.bits[6 .. 0]; }

Similarly, bitfields can be updated much like regular vectors just using the field names rather than numeric indices:

function bitfield_update_example() = { // We can set fields on the CR register using their field names CR[CR3] = 0b01; // If we want to set the underlying bits CR.bits[1 .. 0] = 0b01; // We can even use vector update notation! CR = [CR with CR3 = 0b01, LT = 0b1]; // Because of the representation, we could set the whole thing like: CR = (struct { bits = 0x00 } : cr_type); }

Older versions of Sail did not guarantee the underlying representation of the bitfield (because it tried to do clever things to optimise them). This meant that bitfields had to be accessed using getter and setter functions, like so:

function legacy_bitfields() = { // Assigning to a field of a bitfield register CR->CR3() = 0b01; // '->' notation just means the setter takes the register by reference: (ref CR).CR3() = 0b01; // Assigning all the bits (now just 'CR.bits = 0x00') CR->bits() = 0x00; // Accessing a field let cr0_field : bits(4) = CR.CR0(); // Updating a field CR = update_CR3(CR, 0b01); // now '[ CR with CR3 = 0b01 ]' // Construct a new CR bitfield CR = Mk_cr_type(0x00); }

The method like accessor syntax was (overly sweet) syntactic sugar for getter and setter functions following a specific naming convention that was generated by the bitfield. These functions are still generated for backwards compatibility, but we would advise against using them.

Note	Except perhaps for the Mk_cr_type function or equivalent for other bitfields, which is still quite useful for creating bitfields.

Operators

Valid operators in Sail are sequences of the following non alpha-numeric characters: !%&*+-./:<>=@^|#. Additionally, any such sequence may be suffixed by an underscore followed by any valid identifier, so <=_u or even <=_unsigned are valid operator names. Operators may be left, right, or non-associative, and there are 10 different precedence levels, ranging from 0 to 9, with 9 binding the tightest. To declare the precedence of an operator, we use a fixity declaration like:

For left or right associative operators, we’d use the keywords infixl or infixr respectively. An operator can be used anywhere a normal identifier could be used via the operator keyword. As such, the <=_u operator can be defined as:

val operator <=_u : forall 'n. (bits('n), bits('n)) -> bool function operator <=_u(x, y) = unsigned(x) <= unsigned(y)

Built-in precedences

The precedence of several common operators are built into Sail. These include all the operators that are used in type-level numeric expressions, as well as several common operations such as equality, division, and modulus. The precedences for these operators are summarised in the following table.

Table 1. Default Sail operator precedences Precedence Left associative Non-associative Right associative

9
8			^
7	*, \, %
6	+, -
4		<, ⇐, >, >=, !=, =, ==
3			&
2			\|
1

Ad-hoc overloading

Sail has a flexible overloading mechanism using the overload keyword. For example:

overload foo = {bar, baz}

It takes an identifier name, and a list of other identifier names to overload that name with. When the overloaded name is seen in a Sail definition, the type-checker will try each of the overloads (that are in scope) in order from left to right until it finds one that causes the resulting expression to type-check correctly.

Multiple overload declarations are permitted for the same identifier, with each overload declaration after the first adding its list of identifier names to the right of the overload list (so earlier overload declarations take precedence over later ones). As such, we could split every identifier from above example into its own line like so:

overload foo = {bar} overload foo = {baz}

Note that if an overload is defined in module B using identifiers provided by another module A, then a module C that requires only B will not see any of the identifiers from A, unless it also requires A. See the section on modules for details. Note that this means an overload cannot be used to 're-export' definitions provided by another module.

As an example for how overloaded functions can be used, consider the following example, where we define a function print_int and a function print_string for printing integers and strings respectively. We overload print as either print_int or print_string, so we can print either number such as 4, or strings like "Hello, World!" in the following main function definition.

val print_int : int -> unit val print_string : string -> unit overload print = {print_int, print_string} function main() : unit -> unit = { print("Hello, World!"); print(4) }

We can see that the overloading has had the desired effect by dumping the type-checked AST to stdout using the following command sail -ddump_tc_ast examples/overload.sail. This will print the following, which shows how the overloading has been resolved

function main () : unit = { print_string("Hello, World!"); print_int(4) }

This option can sometimes be quite useful for observing how overloading has been resolved. Since the overloadings are done in the order they are defined, it can be important to ensure that this order is correct. A common idiom in the standard library is to have versions of functions that guarantee more constraints about their output be overloaded with functions that accept more inputs but guarantee less about their results. For example, we might have two division functions:

val div1 : forall 'm 'n, 'n >= 0 & 'm > 0. (int('n), int('m)) -> {'o, 'o >= 0. int('o)} val div2 : (int, int) -> option(int)

The first guarantees that if the first argument is greater than or equal to zero, and the second argument is greater than zero, then the result will be greater than or equal to zero. If we overload these definitions as

overload operator / = {div1, div2}

Then the first will be applied when the constraints on its inputs can be resolved, and therefore the guarantees on its output can be guaranteed, but the second will be used when this is not the case, and indeed, we will need to manually check for the division by zero case due to the option type. Notice that the return type can be different between different cases in the overload.

Mappings

Mappings are a feature of Sail that allow concise expression of bidirectional relationships between values that are common in ISA specifications: for example, bit-representations of an enum type, or the encoding-decoding of instructions.

They are defined similarly to functions, with a val keyword to specify the type and a definition, using a bidirectional arrow <-> rather than a function arrow ->, and a separate mapping definition.

val size_bits : word_width <-> bits(2) mapping size_bits = { BYTE <-> 0b00, HALF <-> 0b01, WORD <-> 0b10, DOUBLE <-> 0b11 }

As a shorthand, you can also specify a mapping and its type simultaneously:

mapping size_bits2 : word_width <-> bits(2) = { BYTE <-> 0b00, HALF <-> 0b01, WORD <-> 0b10, DOUBLE <-> 0b11 }

Mappings are used simply by calling them as if they were functions: type inference will determine in which direction the mapping is applied. (This gives rise to the restriction that the types on either side of a mapping must be different.)

let width : word_width = size_bits(0b00); let width : bits(2) = size_bits(BYTE);

Sometimes, because the subset of Sail allowed in bidirectional mapping clauses is quite restrictive, it can be useful to specify the forwards and backwards part of a mapping separately:

mapping size_bits3 : word_width <-> bits(2) = { BYTE <-> 0b00, HALF <-> 0b01, WORD <-> 0b10, forwards DOUBLE => 0b11, // forwards is left-to-right backwards 0b11 => DOUBLE, // backwards is right-to-left }

Sizeof and constraint

Sail allows for arbitrary type variables to be included within expressions. However, we can go slightly further than this, and include both arbitrary (type-level) numeric expressions in code, as well as type constraints. For example, if we have a function that takes two bitvectors as arguments, then there are several ways we could compute the sum of their lengths.

val f : forall 'n 'm. (bits('n), bits('m)) -> unit function f(xs, ys) = { let len = length(xs) + length(ys); let len = 'n + 'm; let len = sizeof('n + 'm); () }

Note that the second line is equivalent to

let len = sizeof('n) + sizeof('n)

There is also the constraint keyword, which takes a type-level constraint and allows it to be used as a boolean expression, so we could write:

function f(xs, ys) = { if constraint('n <= 'm) { // Do something } }

rather than the equivalent test length(xs) <= length(ys). This way of writing expressions can be succinct, and can also make it very explicit what constraints will be generated during flow typing. However, all the constraint and sizeof definitions must be re-written to produce executable code, which can result in the generated theorem prover output diverging (in appearance) somewhat from the source input. In general, it is probably best to use sizeof and constraint sparingly on type variables.

One common use for sizeof however, is to lower type-level integers down to the value level, for example:

// xlen is a type of kind 'Int' type xlen = 64 val f : int -> unit function xlen_example() = { let v = sizeof(xlen); f(v) }

Exceptions

Perhaps surprisingly for a specification language, Sail has exception support. This is because exceptions as a language feature do sometimes appear in vendor ISA pseudocode (they are a feature in Arm’s ASL language), and such code would be very difficult to translate into Sail if Sail did not itself support exceptions. In practice Sail language-level exceptions end up being quite a nice tool for implementing processor exceptions in ISA specifications.

For exceptions we have two language features: throw statements and try--catch blocks. The throw keyword takes a value of type exception as an argument, which can be any user defined type with that name. There is no built-in exception type, so to use exceptions one must be set up on a per-project basis. Usually the exception type will be a union, often a scattered union, which allows for the exceptions to be declared throughout the specification as they would be in OCaml, for example:

scattered union exception union clause exception = Epair : (range(0, 255), range(0, 255)) union clause exception = Eunknown : string function main() : unit -> unit = { try { throw(Eunknown("foo")) } catch { Eunknown(msg) => print_endline(msg), _ => exit() } } union clause exception = Eint : int end exception

Scattered definitions

In a Sail specification, sometimes it is desirable to collect together the definitions relating to each machine instruction (or group thereof), e.g.~grouping the clauses of an AST type with the associated clauses of decode and execute functions, as in the A tutorial RISC-V style example section. Sail permits this with syntactic sugar for `scattered' definitions. Functions, mappings, unions, and enums can be scattered.

One begins a scattered definition by declaring the name and kind (either function or union) of the scattered definition, e.g.

// We can declare scattered types, for either unions or enums scattered union U scattered enum E // For scattered functions and mappings, we have to provide a type signature with val val foo : U -> bits(64) scattered function foo val bar : E <-> string scattered mapping bar

This is then followed by clauses for either, which can be freely interleaved with other definitions. It is common to define both a scattered type (either union or enum), with a scattered function that operates on newly defined clauses of that type, as is shown below:

enum clause E = E_one union clause U = U_one : bits(64) function clause foo(U_one(x)) = x mapping clause bar = E_one <-> "first member of E"

Finally the scattered definitions are ended with the end keyword, like so:

end E end U end foo end bar

Technically the end keyword is not required, but it is good practice to include it as it informs Sail that the clauses are now complete, which forbids new clauses and means subsequent pattern completeness checks no longer have to require extra wildcards to account for new clauses being added.

Semantically, scattered definitions for types appear at the start of their definition (note however, that this does not mean that a module that requires just the start scattered union definition can access any constructors of a union defined in modules it does not require). Scattered definitions for functions and mappings appear at the end. A scattered function definition can be recursive, but mutually recursive scattered function definitions should be avoided.

Preludes and default environment

By default Sail has almost no built-in types or functions, except for the primitive types described in this Chapter. This is because different vendor-pseudocodes have varying naming conventions and styles for even the most basic operators, so we aim to provide flexibility and avoid committing to any particular naming convention or set of built-ins. However, each Sail backend typically implements specific external names, so for a PowerPC ISA description one might have:

val EXTZ = "zero_extend" : ...

while for ARM, to mimic ASL, one would have

val ZeroExtend = "zero_extend" : ...

where each backend knows about the "zero_extend" external name, but the actual Sail functions are named appropriately for each vendor’s pseudocode. As such each ISA spec written in Sail tends to have its own prelude.

Read Entire Article

The Sail instruction-set semantics specification language

Functions

Blocks

The unit type

Numeric types and bits

Bitvector literals

Vectors

Accessing and updating vectors

The list type

Tuples

Strings

User-defined types

Structs

Enums

Unions

Type synonyms

Type kinds

Pattern matching

Guards

Matching on literals

Matching on enumerations

Matching on tuples

Matching on unions

Matching on lists

Matching on structs

As patterns

Automatic wildcard insertion

Matching in let and function arguments

Type patterns

Mutable variables

Assignment and l-values

Bitfields

Operators

Built-in precedences

Ad-hoc overloading

Mappings

Sizeof and constraint

Exceptions

Scattered definitions

Preludes and default environment

Related

Show HN: Shelly Systems – Process Modeling as Code

Show HN: Zingle – an AI code reviewer for data teams (SQL/db...

Eval Protocol: RL for agents in any language, container, or ...