Coccinelle for Rust Progress Report

4 months ago 18

In collaboration with Inria, the French Institute for Research in Computer Science and Automation, Tathagata Roy shares the progress made over the past year on the CoccinelleForRust project, co-sponsored by Collabora.

Coccinelle is a tool for automatic program matching and transformation that was originally developed for making large-scale changes to the Linux kernel source code (i.e., C code). Matches and transformations are driven by user-specific transformation rules in the form of abstracted patches, referred to as semantic patches. As the Linux kernel—and systems software more generally—is starting to adopt Rust, we are developing Coccinelle for Rust to make the power of Coccinelle available to Rust codebases.

Example usage

This diff illustrates a patch in which the type_of function was being called before confirming that the target item’s trait was implemented. A straightforward CfR-based fix is to find every expression of the form

expression.type_of(impl_id)

and replace it with

expression.type_of(impl_id).subst_identity()

There are roughly fifty occurrences of this pattern in the diff, so updating them all by hand would be quite tedious. The accompanying Semantic Patch to perform this transformation automatically is:

@change_ty_of@ expression exp1, impl_id; @@ -exp1.type_of(impl_id) +exp1.type_of(impl_id).subst_identity()

While the above example could be achieved with a complicated and unreadable regex pattern, things can quickly become more complex.

The following patch changes a function signature and all the related calls.

@change_sig@ expression x; identifier fname, sname; @@ impl sname { ... pub(crate) fn fname(&self, - guard: &RevocableGuard<'_> ) -> Result { ... } ... } @modify_calls@ expression x, guard; identifier change_sig.fname; @@ x.fname( ... - guard )

Rule change_sig finds all the occurrences of functions which take a guard of type &RevocableGuard<'_> and removes that parameter. The rule modify_calls updates the calls to that method.

This semantic patch can be used on a whole code-base where once a guard variable is no longer needed, it can be removed. It can also serve as an integration test to check that no such code is present in new pull requests.

Developments (2024–Present)

CTL Engine

Core pattern-matching engine using CTL.
Shared with C version, adapted for Rust’s expression-heavy syntax.
Performance optimized with RefCells and hash tables.

SmPL Parsing

SmPL includes Rust + custom syntax (.., +, -, disjunctions).
Custom parser for SmPL constructs; uses Rust Analyzer for Rust code.

Rules

Define code transformations scoped to an environment.
Support for rule inheritance.
Rule dependencies not yet implemented.

Language Features

Ellipses (...)

Connects code segments within the same control flow.
Uses CTL AU.
Higher complexity due to Rust’s flexible syntax.

Note: ... when not yet supported.

Disjunctions

Alternative code match options.
Can combine with ellipses for complex patterns.

Developments in detail: 2024–Present

CTL Engine

Computational Tree Logic (CTL) is the heart of Coccinelle, which takes semantic patches and generalizes them over Rust files. Prior to using this engine, CfR used an ad-hoc method for matching patterns of code. This engine is the same as the one used for Coccinelle for C, with a few minor changes. Most of the changes were idiomatic but to the same effect. More information on the engine and its language (CTL-VW) can be found in the POPL Paper. With a standard engine, each step of the matching process can be logged, allowing us to learn and reuse the same design patterns from Coccinelle for C, including critical test cases.

The expression-dominated nature of Rust makes the matching and transformation process a bit different from that of C. For example, in the following semantic patch:

@@ expression e @@ -foo(e);

for C, foo(e) would be guaranteed to be present as an immediate child of a block, i.e.:

{ // <- start of a block foo(e); // <- this statement }

Blocks in C are present only in specific parts of the Abstract Syntax Tree, like in function definitions, loops, or conditional blocks. However, in Rust, blocks are expressions, which can appear anywhere an expression is allowed. For example:

while { f(&mut a); a > 1 } { // }

This makes searching much more computationally intensive. Thus, several optimizations were implemented in CfR to address this problem, including replacing lists in the CTL engine with RefCells and hash tables.

Semantic Patch Language (SmPL)

While developing the parser for SmPL, we decided not to reinvent the wheel by writing a parser for the Rust language from scratch. SmPL contains custom syntax such as dots (...), disjunctions, and modifiers (+ and -). In the latest version, we parse only these constructs ourselves and hand off the rest to Rust Analyzer.

Rules

A rule refers to a set of changes given an environment. Multiple rules can inherit values from one another to transform code in different parts of a file.

Language features

Dots / Ellipses

Used in SmPL as ..., ellipses connect two blocks of code:

@@ expression q; @@ drop_queue(q); ... pop(q);

This is implemented in CTL using the AU term.

Disjunctions

Disjunctions allow for conditional matching:

f1(10); ( // <--- disjunction start foo(1); | bar(10); ) // <--- disjunction end f2();

Matches either:

f1(10); foo(1); f2();

f1(10); bar(10); f2();

Macros

Transforming macros posed a problem because of their non-standard nature. For example, should the following semantic patch match

@@ expression e; @@ foo!( - e + 2 );

this code?

foo!(a b c)

AND

foo!(a)

To avoid discrepancies, we support only macros which look like function calls. For example, foo!(a, b, c) or foo![a; b; c].

Miscellaneous

Pretty Printing has been improved. The transformed code is formatted using rustfmt and it is then compared with the formatting from the original code. This way only the transformed code is formatted without messing up the original file formatting. Note: Pretty printing is still a work-in-progress for rust macros as they are notoriously hard to deal with and rustfmt thus leaves them alone.
Better tests have been added.
A more robust UI has been implemented, with various debugging flags.

Conclusion

Our current aim is to bring Coccinelle For Rust at par with Coccinelle For C in terms of basic functionalities. In the following months we are going on to work on:

Rule Dependance - It lets a rule run only if a condition is satisfied by the rules before it.
Scripting - Lets the user run arbitrary code for each match, allowing them to perform more things that are out of scope for now. This includes counting instances, matching with regex and performing other arbitrary operations.
Performance Improvements

If you want to try out CoccinelleForRust it is available on Gitlab. Please feel free to reach out to us at the email addresses on our website CoccinelleForRust, we would be happy to answer your questions!

Read Entire Article

Coccinelle for Rust Progress Report

Example usage

Developments (2024–Present)

CTL Engine

SmPL Parsing

Rules

Language Features

Ellipses (...)

Disjunctions

Developments in detail: 2024–Present

CTL Engine

Semantic Patch Language (SmPL)

Rules

Language features

Dots / Ellipses

Disjunctions

Macros

Miscellaneous

Conclusion

Related

What happens when you mix live faculty with generative AI?

China launches 18 day Arctic shipping route

A 400-Year-Old Ring That Unfolds to Track the Movements of t...