1. Changelog
2. Revision 0 - July 4th, 2025
- Initial release. ✨
3. Introduction and Motivation
A colloquial overview (but with a bit less technical depth) of these options is available as a writeup here ([lambdas-nested-functions-block-expressions-oh-my]).
C has had an extremely long and somewhat complicated history of wanting to pair a set of data with a given function call. Early problems first started with the standardization of C89’s qsort, which took only a single function pointer argument and offered no way to pass additional data through to that function:
    void qsort(void* ptr, size_t count, size_t size,
        int (*comp)(const void* left, const void* right));

This worked, until -- occasionally -- people wanted to provide modifications to certain behaviors based on local data rather than static data. For example, to modify the sort order of a call to qsort, one would originally have to program it like this in standard C:
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    static int in_reverse = 0;

    int compare(const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        return (in_reverse) ? *right - *left : *left - *right;
    }

    int main(int argc, char* argv[]) {
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
        return list[0];
    }

There were multiple limiting factors in getting data into the function outside of using static. Accessing data in the local block was impossible for a compare function that, necessarily, had to be defined outside of main. This necessitated a way of transporting data to the compare function that would still work with qsort. Since qsort offered no user data parameter, other forms of data transfer became commonplace:
- static data was the most popular ISO Standard C way of handling this in qsort-style APIs, resulting in a large number of (inline) static global variables throughout early programs, along with clever tricks to duplicate those variables via fork-style duplication or by loading several translation units with different static variables;
- and, _Thread_local, to avoid contention on those variables once multithreading was formally introduced circa C11.
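The second option in the list above can be sketched as follows. This is only an illustration of the pattern, not code from any particular program: the helper name `sort_ints` is hypothetical, and the comparator uses comparisons rather than subtraction purely to sidestep signed overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-thread sort direction: each thread sees its own copy, so two threads
   sorting in different directions do not race on one shared flag. */
static _Thread_local int in_reverse = 0;

static int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    /* Yields -1, 0, or 1 without risking overflow from a subtraction. */
    int ascending = (*left > *right) - (*left < *right);
    return in_reverse ? -ascending : ascending;
}

/* Hypothetical helper: sets the flag, sorts, then restores the old value. */
void sort_ints(int* list, size_t count, int reverse) {
    int saved = in_reverse;
    assert(list != NULL);
    in_reverse = reverse;
    qsort(list, count, sizeof(*list), compare);
    in_reverse = saved;
}
```

The data still travels through a variable outside the comparator's parameter list; `_Thread_local` only narrows who can observe it, which is why the sharing problems described next remain.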
This was, unfortunately, both bug-generating and limiting due to the data sharing issues. Concerns and problems around sharing data only grew in time as each program had to manage larger and larger swaths of 1980s-style global variable soups, something that has famously ended up being used as a sign of negative code quality in legal matters. When global variables became more discouraged towards C95 and C99 to prevent unintended data clobbering or sharing, a new extension was cooked up by GCC to handle this problem. Similar to an Ada feature but spun directly for C at the time, it was a feature lovingly dubbed Nested Functions.
3.1. GNU Nested Functions
Nested Functions ([nested-functions]) were the logical extension of function definition syntax pulled down into the local level. Despite being the oldest functions-with-data attempt, they have only one standardization proposal, made recently by Martin Uecker ([n2661]). The goal was very simple, and during the C89 timeframe eminently doable, because the security implications were not yet deeply understood and machines were not yet being actively exploited in both civilian and military contexts. The syntax of a function definition within a block scope created a function that could be called in the obvious way but still reference surrounding block scope objects by name:
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    int main(int argc, char* argv[]) {
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };

        int compare(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            return (in_reverse) ? *right - *left : *left - *right;
        }
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);

        return list[0];
    }

The notable benefits of this approach were:
- it looks, feels, and smells like a regular function definition;
- it can access local data instead of static data, preventing issues with forked data or shared data;
- it produces a single function pointer and so does not need an additional user data pointer (typically a void*);
- and, it places the function doing the comparison work closer to the actual usage site.
Unfortunately, the early and enduring design of this feature -- in order to enable some of the benefits listed above -- very quickly ran afoul of early security concerns, and soon earned itself a big CVE due to the way it worked. In particular, the implementation strategy for a nested function is a brilliant piece of engineering that runs afoul of one of the only enduring security mitigations that has not fallen by the wayside: Non-Executable Stacks. To understand why this matters, a brief description of what (Non-)Executable Stacks are, and why they are important, is in order.
3.1.1. Non-Executable Stacks
In C, it was common practice to leverage hot-patching, assembly hot-fixing, direct opcode injection, and more: programs frequently made use of stack-based data buffers where they also dumped their code. This meant that binaries would read from their stack with the instruction pointer, allowing programs to dynamically inject behavior into a currently running program either in parts or -- if that data came from outside sources -- a fully dynamic sort of live-machine coding. The problem with this approach was that if a normal user could use the stack to make the program do a set of behaviors, so could a malicious actor: this was the easiest and widest branch of attacks upon C programs, and it was an endemic issue given the extremely large number of stack-based, fixed-size buffers or -- after the advent of C99 -- variable-length arrays. Commandeering programs was as easy as finding a place where naïve programmers and hackers employed the now-removed gets, or where input was loaded in a too-relaxed way into a stack buffer. An attacker would overrun the buffer and gain control of the program by getting a jmp instruction to jump not back to some expected place in the stack, but instead to a piece of a new program written into the stack by malicious inputs. This made it easy to exploit C programs day in and day out, over and over again.
Then, compiler developers, operating system vendors, and runtime loader designers came up with one killer fix: making the program stack non-executable.
Without an executable stack, it did not matter how many buffers were overflowed or how many jmp pointers were overtaken by malicious data overwriting parts of the program stack through inadequately-sized user buffers. Code that was inadvertently overtaken by a buffer overflow did not have the ability to point to data written into a buffer that was also on the stack and start running. Such runs would be considered entirely and wholly illegal, and it would blow up the program with a SEGMENTATION FAULT or other issue. Most programs, whether using VLAs or not, loved to keep (small)-ish buffers on the stack that they constantly wrote data into in response to user input or read data (such as configuration files or network input). Preventing the ability of a poorly written program that did not guard against all forms of malicious input from turning into an easy Remote Code Execution issue was a gigantic security win. The overwhelming majority of exploits running at the time were effectively halted, even if they contained the perfect shellcode to do so. This permanently crippled red teamers, and completely changed both the simplicity and the power in basic code failures.
3.1.2. Early Design Flaw: Nested Functions turn the stack Executable!
The original design of Nested Functions suffered for its compact and brilliant design, unfortunately. Let’s remind ourselves of the previous example relating to qsort: somehow, without marking the variable as static or _Thread_local, a nested function is able to access all of the objects by name from its enclosing scope.
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    int main(int argc, char* argv[]) {
        /* LOCAL variable.... */
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };

        int compare(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        }
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);

        return list[0];
    }

The secret of this is in the brilliance of the original design: since it was so commonplace at the time, Nested Functions decided that the best way to be able to find the variables in the enclosing scope was to take a location associated with the enclosing block scope -- the Stack -- and turn it into an executable piece of code. This executable piece of code had an address. Because the executable code and its address came from the program’s stack, it meant that there was no need to provide a separate pointer to do reference-based / "by-name" variable capturing. The function pointer itself served as both the jump to the code to execute, and a location that could then have a number of pre-determined, fixed offsets applied to it to reach all of the variables in that piece of code. This meant that rather than needing an object with static storage or _Thread_local storage, the block-scope variable could be accessed directly!
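To make the hidden data flow concrete, here is a hand-written approximation of the nested compare function once it is lifted to file scope. It is a sketch under stated assumptions rather than what GCC actually emits: `struct main_frame`, `compare_lifted`, and `compare_in_frame` are hypothetical names, and a real compiler addresses the enclosing frame directly instead of going through a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of main's stack frame that the nested
   function reads. */
struct main_frame {
    int in_reverse;
};

/* The nested `compare`, lifted to file scope. The extra first parameter plays
   the role of the hidden pointer to the enclosing frame. */
static int compare_lifted(const struct main_frame* frame,
                          const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    assert(frame != NULL);
    return frame->in_reverse ? *right - *left : *left - *right;
}

/* Calls the lifted function the way code inside the enclosing block would. */
int compare_in_frame(int in_reverse, int left, int right) {
    struct main_frame frame = { in_reverse };
    return compare_lifted(&frame, &left, &right);
}
```

Note that `compare_lifted` takes three parameters, so it cannot be handed to qsort as-is. Supplying that first argument behind a plain two-parameter function pointer is exactly the job of the stack-resident code described above.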
Unfortunately, this design requires the stack to be executable in order to function.
This is the critical issue that has resulted in most other vendors not picking up the feature. Compilers want to work flawlessly with GCC-compiled code: this means that compiling a nested function and passing it to a function pointer has to result in a (somewhat) similar Application Binary Interface (ABI) that can be used as applications expect. This means the decision to make GCC nested functions trampoline from the stack is a detail that cannot be ignored. This is the primary reason Clang and other vendors have refused to implement Nested Functions, chiefly citing security concerns and the inability to develop a worthwhile ABI that is compatible. Just how prevalent are the security issues that Clang and other vendors avoided by refusing to implement the GNU Nested Functions extension? Well...
3.1.2.1. The Prevalence of Executable Stack and GNU Nested Functions
A ton of users relied on or otherwise kept using Executable Stacks, even into today. While the non-executable stack fix was deployed to great success in the early 2000s, some platforms -- particularly, Linux and similar POSIX-based platforms -- kept executable stacks available. Closing that gap later clashed with users who had become far too comfortable with the issues related to Executable Stack:
Year of Linux desktop will never happen as long as glibc is de-facto standard libc. As far as I know, this regression is wontfix, so many games just won’t work anymore.
Imagine being a game dev from 5 years ago and making a native Linux version of your game in a good faith, that’s 100% a net-negative money-wise, only for it to stop working in a couple years. Backward compatibility on Linux does not exist as far as regular users concerned, and the only way to make software that works in years to come is to make it for Windows, and hopefully let Valve and Wine teams handle the rest.
-- February 19, 2025, Valentin Ignatev
Briefly ignoring the extreme hyperbole: note the dates and the mentioned time of this tweet. 5 years ago; e.g. developed in 2020, with the complaint happening in 2025. The tweet included a screenshot from the article titled "The glibc 2.41 update has been causing problems for Linux gaming" ([gamingonlinux-dawe]). Windows and macOS have been disabling executable stacks globally, by default, for years and refusing to load such applications. This complaint happened when glibc stopped forcefully setting executable stack, even if the tweet that garnered the public pushback did not mention WHY these games were breaking. This speaks to the severe need for a solution in this space that is not security breaking, and even telling folks to simply "turn on executable stack" for themselves is not wonderful:
... Don’t just take my advice on it though if you’re a developer or gamer reading this, always look up what you’re doing fully. Run at your own risk. ...
-- February 13, 2025, Article Author Liam Dawe
Unfortunately, this lax approach to security -- especially for video games -- has resurfaced quite a few truly unfortunate bugs. Popular Windows games such as Dark Souls: PREPARE TO DIE Edition and Call of Duty: WWII have exploitable Remote Code Execution (RCE) bugs in their code. One of them seems to be getting patched, but the other -- in Dark Souls -- is not going to be patched out. The idea that glibc -- or platforms in general -- can take a lax position to security on devices, even for something as "frivolous" as the (multi-billion dollar revenue) video game industry, is simply not tenable. From a highly skilled Vision & Graphics engineer commenting on the controversy caused by the glibc 2.41 update:
glibc is in the right here. iirc windows and mac DEP policy disabled executable stack by default for the past ~20 years or so. shocked this was not already the case in linux userland
-- February 21, 2025 awr
The engineer in charge of pushing the change and looking over the situation commented on the article and general attitude represented by Mr. Ignatev:
It is interesting that the headline did not get into details why I made this change: https://sourceware.org/pipermail/libc-alpha/2024-December/163146.html
In a short: the old behavior was used in a know RCE described in CVE-2023-38408.
-- February 20, 2025, Adhemerval Zanella
It is important to know that many other developers do not share this perception. Even as far back as 2018, when Microsoft kept up its Security Posture in its Windows Subsystem for Linux, people railed against the high-security default of refusing to work with programs that allowed executable stack ([WSL-no-executable-stack]):
The accepted trade-off is to have a non-executable stack be default but have an executable stack for programs which need them. Not supporting this is just a deficiency in WSL.
-- August 7, 2018, Martin Uecker
Whose perspective is correct?
3.1.2.2. The Standard Is To Blame
This proposal agrees with both Mr. Zanella and Ms. Awr, and disagrees with Mr. Uecker and Mr. Ignatev; it is impossible to pretend like this is not a problem with more and more exploits taking advantage of not only executable stack, but directly targeting "harmless" software that makes use of it (such as video games). But, even more importantly, the real culprit here is ISO C. There were many alternatives to executable stack-based GNU Nested Functions, and other such entrapments. However, because of how convenient GNU Nested Functions are and how accepted they are in the GNU ecosystem, it has led to multiple security vulnerabilities that should have never existed in the first place (§ 5.3 Executable Stack CVEs).
Unlike Address Space Layout Randomization (ASLR) and several other run-time mitigations, non-executable stacks have been among the cheapest and most enduring security wins in the last 50 years of computing. Marking sections of memory as unable to be run as part of the program permanently shifted the landscape of targeted exploits to be focused almost exclusively on buffer overrun-into-heap-exploitation bugs, as well as trying to find sequences of logic to put programs in a state of disrepair that ultimately grant attackers either a Denial of Service (DoS) attack or full control via Remote Code Execution (RCE). Exploits were now far harder and required orchestration of multiple carefully-crafted scenarios to hit the "Weird Machine" state so coveted by red teamers/exploiters. This is not the case for executable stack, specifically with the most common way to access them: GNU Nested Functions.
3.1.3. Alternative Nested Function Implementations
We have already discussed the first and most popular attempt which leaves the stack executable. Because of the negative security properties of this, there were two more attempts, one still on-going (§ 3.1.3.2 Attempt 3) and one with limited success that requires the entire world to be recompiled or face potential ABI breaks (§ 3.1.3.1 Attempt 2).
3.1.3.1. Attempt 2
An Ada-style Function Descriptors implementation was attempted. The problem with this change for GCC is that it uses a bit (the lowest bit, which traditionally is always 0) in the function pointer itself to mark the function pointer as one that is relying on the Function Descriptor technology. Setting the lowest bit on the function pointer means that it is unsafe to call directly, and therefore every function pointer must first be checked and masked with func_ptr & ~0b1ull before being called. This is a runtime cost and a general-purpose pessimization that applies to ALL function pointers, making the Function Descriptor approach unsuitable for solving the ABI problem both internally with existing GCC code and to the satisfaction of other developers.
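The cost described above can be illustrated with a small simulation of low-bit tagging. This is a sketch, not GCC's actual descriptor layout: `struct descriptor`, `tag_descriptor`, and `call_any` are hypothetical names, the pointer-to-integer round trips are implementation-defined, and the `aligned` attribute assumes a GCC- or Clang-compatible compiler on a target where function addresses do not already use the low bit (as Thumb-mode ARM does).

```c
#include <assert.h>
#include <stdint.h>

typedef int plain_fn_t(int value);
typedef int chained_fn_t(void* env, int value);

/* Hypothetical descriptor: code address plus environment pointer. */
struct descriptor {
    chained_fn_t* code;
    void* env;
};

/* Tag a descriptor address by setting its lowest bit. */
uintptr_t tag_descriptor(struct descriptor* desc) {
    uintptr_t bits = (uintptr_t)desc;
    assert((bits & 1u) == 0); /* pointer-aligned, so the low bit is free */
    return bits | 1u;
}

/* Every indirect call pays for this test, whether the target is tagged or not. */
int call_any(uintptr_t target, int value) {
    if (target & 1u) {
        struct descriptor* desc = (struct descriptor*)(target & ~(uintptr_t)1);
        return desc->code(desc->env, value);
    }
    return ((plain_fn_t*)target)(value);
}

/* GNU attribute (assumption: GCC or Clang) keeps the low address bit clear. */
__attribute__((aligned(16))) int add_one(int value) { return value + 1; }

/* A function that needs an environment, reached through a descriptor. */
int add_env(void* env, int value) { return value + *(int*)env; }
```

Code compiled without knowledge of the tagging scheme would call a tagged value directly and jump to a misaligned or wrong address, which is the ABI hazard the paragraph above describes.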
3.1.3.2. Attempt 3
A third attempted implementation of Nested Functions attempts to use a separately-allocated trampoline: this either comes from a stack that is set up at program load time in coordination with the compiler and whose exclusive purpose is to be a memory region for both trampolines and a slot for a void* environment/context; OR, comes from a dynamic allocation to serve as the function pointer plus a void* environment/context pointer. On their own, these approaches do not work, because it is unclear when, if ever, the function pointer will stop being used. However, one part of Nested Functions -- the fact that they refer to everything by name / "capture by reference" all of the things they use -- means that this can be sufficiently approximated by simply stating that GNU Nested Functions will deallocate such a trampoline (or shrink the stack it was cut from) when the enclosing block that was used for the nested function is gone.
There is also the slight issue that using a separated trampoline that is on a separate heap or a separate stack might need (but does not necessarily require) a secondary level of indirection. The original, executable stack implementation of GNU Nested Functions avoids this because both the code and the variables are in-line: having a separated trampoline may require such a trampoline to first load the right function call with the __builtin_call_with_static_chain compiler intrinsic within the code of the trampoline to have the code work properly without making the original stack executable.
3.1.4. A Possible 4th Attempt: Explicit User Control
An experimental technique for allocating a trampoline can be done by having an _Any_func* pointer, as is being standardized by an in-progress proposal [_Any_func]. Then, rather than needing to implicitly create a trampoline on usage, a user can instead request it and control the allocation explicitly, while passing it back. A fictional example of such an intrinsic -- called __gnu_make_trampoline -- is seen here:
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    int main(int argc, char* argv[]) {
        /* LOCAL variable.... */
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };

        typedef int (compare_fn_t)(const void* left, const void* right);
        int compare(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        }

        // explicitly make a single-function-pointer trampoline, without an executable stack
        // __gnu_make_trampoline takes a function identifier, returns a _Any_func*
        compare_fn_t* compare_fn_ptr = __gnu_make_trampoline(compare);
        // use it
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_fn_ptr);
        // explicitly free a single-function-pointer trampoline, without an executable stack
        __gnu_destroy_trampoline(compare_fn_ptr);

        return list[0];
    }

This is discussed further in § 5.2 Make Trampoline and Singular Function Pointers.
3.1.5. The Nature of Captures
There’s a final issue with nested functions: they are not suitable for use with asynchronous code or code that returns "up". Consider the same bit of code as before, but slightly modified:
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    typedef int (compare_fn_t)(const void* left, const void* right);

    compare_fn_t* make_compare(int argc, char* argv[]) {
        /* LOCAL variable.... */
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }

        int compare(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        }
        return compare;
    }

    int main(int argc, char* argv[]) {
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
        compare_fn_t* compare = make_compare(argc, argv);
        qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
        return list[0];
    }

In this example, we have simply moved the in_reverse and compare generation into a function, for ease-of-use. One can imagine that we need to create this sort of function multiple times, from perhaps different sources of data. GNU Nested Functions allow us to do this and to return the function "up" the call stack. The problem, of course, is that a Nested Function (in either a heap-based implementation or the current executable stack implementation) points to the "function frame" that it is created in. That is, while make_compare -- once it has returned -- is no longer alive and all of its objects with automatic storage duration have died, the compare function pointer is still there and passed up the stack. This means that all accesses to in_reverse are accessing memory that is no longer alive, which is Undefined Behavior.
The actual manifestation of the undefined behavior in this program is very clear: adding the -r argument to make in_reverse turn to 1 does not have any effect on the program anymore:
https://godbolt.org/z/81d7Tqn1E
This is a critical failure of Nested Functions: they only ever "capture" by-name / by-reference. There is no option to capture by-value, and therefore the transportation and use of these function pointers to asynchronous code or "up" the call stack is fundamentally dangerous. This was not a huge problem in the early days of C, where programs were very flat and it was easy to always "move" function calls up. Unfortunately, we now have asynchronous programming, coroutine libraries / green threading models, callbacks that are saved and invoked much, much later in a program, and all sorts of models for shared code. This is the part of Nested Functions that cannot be saved; it is an intrinsic part of the design that unfortunately will always lead to Undefined Behavior because there is no way to get around that limitation in the GNU Nested Functions design. This is another serious problem that ultimately makes it impossible to consider Nested Functions as THE solution for all of the C ecosystem.
NOTE: As a general rule of thumb, if the entity being designed has the ability to be transferred out of its current scope (Blocks, Nested Functions, Lambdas, ...) then it must as a rule allow for determining how it interacts with the scope it is nested in. The answer for function calls in most languages is "cannot interact with its surrounding scope", which simplifies the problem. But, the whole point of these features IS to interact with the surrounding scope, and so care must be taken to make it work better.
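For contrast, here is a minimal standard C sketch of what by-value capture buys when returning "up" the call stack. It is not part of any existing extension: `struct compare_closure` and `call_compare` are hypothetical names, and the `-r` check is condensed to a `strcmp` that accepts the same argument as the earlier examples.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical by-value closure: the captured flag is copied into the object
   itself, so nothing refers back to make_compare's frame. */
struct compare_closure {
    int in_reverse; /* captured by value */
};

struct compare_closure make_compare(int argc, char* argv[]) {
    struct compare_closure closure = { 0 };
    if (argc > 1 && strcmp(argv[1], "-r") == 0) {
        closure.in_reverse = 1;
    }
    return closure; /* the returned copy outlives this function's frame */
}

/* The comparison reads the copy held in the closure, not a dead local. */
int call_compare(const struct compare_closure* closure,
                 const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    assert(closure != NULL);
    return closure->in_reverse ? *right - *left : *left - *right;
}
```

The price is that the result is no longer a bare function pointer: the caller has to carry the closure object alongside the code that uses it.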
3.2. Apple Blocks
Blocks are an approach to having functions and data that originate from Objective-C ([apple-blocks]) and are associated with Apple’s Clang. They were proposed a long time ago in an overview of Apple Extensions for C by Garst in 2009 ([n1370]), refined into a specific proposal in 2010 ([n1451] [n1457]). It was later further refined into a proper "Closures" proposal, and the name changed ([n2030]). However, none of these papers made the standard.
Because there is a wealth of proposals and literature talking about Blocks, their implementation, their runtime, and more (see an answer discussing the intrinsics and implementation bits for Blocks), rather than inform proposal readers how they work and why they are not suitable for ISO C, this proposal will focus exclusively on why they are not usable for the whole C ecosystem.
3.2.1. Expression
Unlike GNU Nested Functions, Block definitions are an expression. That means that -- given appropriate typing -- one can use blocks directly within a function call:
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    void apple_qsort(void* ptr, size_t count, size_t size,
        int (^comp)(const void* left, const void* right));

    int main(int argc, char* argv[]) {
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };

        apple_qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
            ^(const void* untyped_left, const void* untyped_right) {
                const int* left = (const int*)untyped_left;
                const int* right = (const int*)untyped_right;
                /* HOW is `in_reverse` usable here?! */
                return (in_reverse) ? *right - *left : *left - *right;
            }
        );

        return list[0];
    }

The special function_type^ and return_type (identifier^)(argument0, argumentN...) are Block types, which are special types that act as a handle to a Block. Blocks are not just a simplistic combination of a function and a context, however: much more effort is put into making them safe at execution time, and that is done by putting everything related to Blocks behind a hefty runtime.
3.2.2. Runtime Required
Apple Blocks are, at their core, dynamic objects that engage in type-erasure at the top-most level. Where C++ lambdas are completely non-type-erased and each contains a unique type, and where Nested Functions are completely type-erased behind normal function pointers with executable stacks + trampolines, Blocks are callbacks that are unique but have all of their type information erased and carted around in a new function pointer-like construct: the block. Block types are denoted by the ^ in their function type name, and typically are cheaply copiable handles to a heap-stored callable.
NOTE: The layout of the heap-stored callable is dictated by Apple, and has been reverse-engineered and deconstructed many times. There used to be working links to the Rich Text Files governing what it looks like from Apple’s Opensource Website, but all of those links (e.g.: https://opensource.apple.com/source/libclosure/libclosure-59/BlockSpec.rtf) return 404 error codes now. Thankfully, there’s an open source mirror available on GitHub with readable text here: https://github.com/apple-opensource-mirror/libclosure/blob/master/BlockImplementation.txt. In order to understand the layout, one would need to read the released Apple Public License and freely-available source code, to get the most up to date picture. Some of this work has been done by the Clang Team, and the most up-to-date, thorough specification for it is stored with the Clang documentation ([clang-blocks-spec]).
Of course, all of this is just to illustrate the problem: while Microsoft as a platform may not need to care, GCC and Clang both tend to occupy the same hardware and software spaces. Even if one compiler or another figured out how to be clever, the base layout -- and the premise of blocks being a handle to a heap, and not a compile-time sized object -- means that some form of dynamic allocation or heap is required. This is a net-negative for memory-constrained environments, and implementations that attempt to be ABI-compatible with the original Apple Blocks implementation will be forced to lay their blocks out in the way that Apple has already specified.
3.2.2.1. More Complications: Generally Unsafe to Return
Block literals are not initially placed on the heap or allocated through the run-time: as a blanket optimization applied to all block literals, they start out on the stack! This means that while the above code using a literal works just fine, the following code is actually Undefined Behavior, in a way that’s equally as bad as GNU Nested Functions:
    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    typedef int (compare_fn_t)(const void* left, const void* right);

    compare_fn_t^ make_compare(int argc, char* argv[]) {
        /* LOCAL variable.... */
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }

        compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        };
        return compare;
    }

    int compare_trampoline(const void* left, const void* right, void* user) {
        compare_fn_t^* p_compare = (compare_fn_t^*)user;
        return (*p_compare)(left, right);
    }

    int main(int argc, char* argv[]) {
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
        compare_fn_t^ compare = make_compare(argc, argv);
        qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
        return list[0];
    }

This code does not work. Despite having a runtime, it will NOT perform the copy automatically. The return from make_compare into compare_fn_t^ compare is a dangling reference, and is explicitly discouraged by reference materials and documentation from Apple. It must be modified to use Block_copy on the return:
    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>
    #include <Block.h> /* declares Block_copy and Block_release */

    typedef int (compare_fn_t)(const void* left, const void* right);

    compare_fn_t^ make_compare(int argc, char* argv[]) {
        /* LOCAL variable.... */
        int in_reverse = 0;
        if (argc > 1) {
            char* r_loc = strchr(argv[1], 'r');
            if (r_loc != NULL) {
                ptrdiff_t r_from_start = (r_loc - argv[1]);
                if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                    in_reverse = 1;
                }
            }
        }

        compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        };
        return Block_copy(compare);
    }

    int compare_trampoline(const void* left, const void* right, void* user) {
        compare_fn_t^* p_compare = (compare_fn_t^*)user;
        return (*p_compare)(left, right);
    }

    int main(int argc, char* argv[]) {
        int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
        compare_fn_t^ compare = make_compare(argc, argv);
        qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
        Block_release(compare);
        return list[0];
    }

Annoyingly, every call to Block_copy must be paired with a call to Block_release. This means that there’s now an invisible (from the perspective of main) block copy that needs to be managed with a specific call to Block_release. One can imagine that every function that returns a Block type should just be assumed to need releasing, but this isn’t always the case: this means there’s an invisible lifetime tracking that even the runtime and the heap do not solve for us! Truly, unfortunate.
There is also a small gotcha in this example, that only shows up based on where the compare block is created. This has to do with how Captures work under the Blocks feature.
3.2.3. Captures
Captures in Apple Blocks work in one of two ways.
-
referring to the existing object by-name in your code, which will make a copy of the object inside of the Block’s implicit capture;
-
or, annotating a variable with __block, which will load the object into a sort of intrusive pointer / automatic reference-counted place in memory (managed by the Blocks runtime) to allow it to be referred to by both the surrounding code and the block’s inner guts by-name.
The reason Apple used this technique, as talked about before, is that it is safer than Nested Functions in the particular regard of using variables and carting them around.
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int (compare_fn_t)(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
    /* LOCAL variable.... */
    int in_reverse = 0;
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        /* HOW is `in_reverse` usable here?! */
        return (in_reverse) ? *right - *left : *left - *right;
    };
    return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
    compare_fn_t^* p_compare = (compare_fn_t^*)user;
    return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    compare_fn_t^ compare = make_compare(argc, argv);
    qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
    Block_release(compare);
    return list[0];
}

The first thing to note is that, because this cannot be translated to a single function pointer, we cannot use qsort. This means that using such a function without creating some kind of special trampoline is off-limits to us. Again, this is something that could be solved by the introduction of an explicit heap-based trampoline creator (§ 5.2 Make Trampoline and Singular Function Pointers). One would need to introduce a "wide function pointer" type -- which is exactly what function_type^ is -- and change qsort’s signature to use it for the callback.
NOTE: Simply upgrading qsort with a Block type is an ABI break that would cause old, already-compiled libraries and programs mixed with new programs to combust in painful, hard-to-detect ways. To solve this problem one would need to rely on existing C extensions like Assembly Labels, or work towards Transparent Aliases ([transparent-aliases]).
But, if someone uses qsort_s -- which takes a void* user data parameter -- one can use it without altering the signature of qsort directly and create a void*-kick off point as a trampoline. However, there is a bit of an interesting conundrum. Take, for example, moving the creation of the block function further upwards inside of make_compare:
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int (compare_fn_t)(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
    /* LOCAL variable.... */
    int in_reverse = 0;
    compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        /* HOW is `in_reverse` usable here?! */
        return (in_reverse) ? *right - *left : *left - *right;
    };
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
    compare_fn_t^* p_compare = (compare_fn_t^*)user;
    return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    compare_fn_t^ compare = make_compare(argc, argv);
    qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
    Block_release(compare);
    return list[0];
}

This code will sort the list incorrectly (in ascending order) even if in_reverse is set to 1. That’s because Blocks capture the variables that get used at the point-of-creation. The value at the point-of-creation is 0; therefore, in_reverse is 0 when the function is called later. Even though the in_reverse variable is copied into the block, the block is now sensitive to where it is being created without any indication that it behaves that way. This is safe, but the behavior would completely throw off someone who uses Nested Functions religiously.
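The point-of-creation behavior can be imitated in standard C, without the Blocks extension. What follows is a minimal sketch, assuming that a by-name capture behaves like copying the variable into hidden storage when the block expression is evaluated. The types value_capture and pointer_capture and the helper functions are hypothetical names used only for illustration; they are not part of any Blocks implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for a Block's hidden capture storage:
   the captured variable is COPIED when the closure is created. */
struct value_capture {
    int in_reverse;
};

/* Hypothetical stand-in for a by-reference capture (roughly what
   Nested Functions do): only the ADDRESS of the variable is stored. */
struct pointer_capture {
    const int* in_reverse;
};

static int compare_by_value(const struct value_capture* self, int left, int right) {
    return self->in_reverse ? right - left : left - right;
}

static int compare_by_pointer(const struct pointer_capture* self, int left, int right) {
    return *self->in_reverse ? right - left : left - right;
}

/* Creates both captures BEFORE the variable changes, then compares
   1 and 2 through each one and stores the results in the out-parameters. */
static void capture_demo(int* by_value_result, int* by_pointer_result) {
    int in_reverse = 0;
    struct value_capture snapshot = { in_reverse }; /* copies the value 0 */
    struct pointer_capture live = { &in_reverse };  /* refers to the variable */

    in_reverse = 1; /* happens after both "closures" were created */

    *by_value_result = compare_by_value(&snapshot, 1, 2); /* still ascending order */
    *by_pointer_result = compare_by_pointer(&live, 1, 2); /* now descending order */
    assert(snapshot.in_reverse == 0);
}
```

The by-value variant keeps reporting ascending order because its copy was taken while in_reverse was still 0; the by-pointer variant observes the later assignment, at the cost of dangling once the variable goes out of scope.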
The second way it can capture variables is by using __block, which works like so:
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int (compare_fn_t)(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
    /* BLOCK-QUALIFIED variable.... */
    __block int in_reverse = 0;
    compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        /* HOW is `in_reverse` usable here?! */
        return (in_reverse) ? *right - *left : *left - *right;
    };
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
    compare_fn_t^* p_compare = (compare_fn_t^*)user;
    return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    compare_fn_t^ compare = make_compare(argc, argv);
    qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
    Block_release(compare);
    return list[0];
}

This will now work as expected, even though the block is created before in_reverse receives its final value. __block lifts the variable being declared up into a heap (or heap-like space) that is managed by either a garbage collector or an automatic reference-counted memory implementation. This both reverses the onus of capturing / handling the variable onto the surrounding scope and makes it safe-by-design. This does leave an unfortunate gap in that there’s no way to do the dangerous thing or opt into a direct reference without making an explicit declaration of the pointer and then using the pointer by-copy instead in that block, which can leave some memory footprint and program speed on the table without aggressive optimization.
NOTE: The address of in_reverse might actually change, depending on whether the Apple intrinsic Block_copy is used to copy the block itself before being run. This code does not depend on it, but a hash map that stores the address of variables might experience the address of any __block-annotated variables changing between creation and the invocation of Block_copy.
All in all, however, these two things are safe: either a copy is happening and is stored along with the creation of the block on the heap, or the variable itself is having its lifetime prolonged by a sort of automatic-tracking. The second of these is very against the typical properties of C, but that matters little in the face of the obvious safety it brings to the table. Unfortunately, because all of this happens magically and in a mostly-unspecified manner, it’s very detrimental to the proliferation of the C ecosystem and having several loosely-connected implementations working towards the same improved implementations.
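The "lifetime prolonged by automatic tracking" idea can be approximated by hand in standard C. The sketch below assumes a simple reference-counted box; it is NOT the real Blocks runtime layout (which is mostly unspecified), and every name in it (shared_int, compare_closure, and the helper functions) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted box standing in for a __block variable. */
struct shared_int {
    int value;
    unsigned references;
};

static struct shared_int* shared_int_create(int value) {
    struct shared_int* box = malloc(sizeof *box);
    if (box != NULL) {
        box->value = value;
        box->references = 1;
    }
    return box;
}

static struct shared_int* shared_int_retain(struct shared_int* box) {
    box->references += 1;
    return box;
}

/* Returns 1 when this call freed the box, 0 otherwise. */
static int shared_int_release(struct shared_int* box) {
    box->references -= 1;
    if (box->references == 0) {
        free(box);
        return 1;
    }
    return 0;
}

/* The "closure" holds a counted reference instead of a copy of the value. */
struct compare_closure {
    struct shared_int* in_reverse;
};

static int compare_call(const struct compare_closure* self, int left, int right) {
    return self->in_reverse->value ? right - left : left - right;
}

/* Mirrors the __block example: the closure is created first and the variable
   is changed later. Returns the comparison of 1 and 2, or 0 on allocation failure. */
static int shared_capture_demo(void) {
    struct shared_int* in_reverse = shared_int_create(0);
    if (in_reverse == NULL) {
        return 0;
    }
    struct compare_closure closure = { shared_int_retain(in_reverse) };
    in_reverse->value = 1; /* visible through the closure */
    int freed = shared_int_release(in_reverse); /* scope's reference ends */
    assert(freed == 0); /* the closure still keeps the box alive */
    int result = compare_call(&closure, 1, 2);
    freed = shared_int_release(closure.in_reverse);
    assert(freed == 1); /* last reference gone, storage released */
    return result;
}
```

Every retain and release here is spelled out by the user; the Blocks runtime performs equivalent bookkeeping implicitly, which is the "magic" the paragraph above objects to.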
3.2.4. Optimization: Folding Escapes
As a matter of optimization, Blocks do not necessarily have to pollute the heap. Indeed, most immediate invocations of a block, or pass-down (rather than pass-up) invocations that are visible to the compiler, will be optimized into a direct call. Unfortunately, this is not something that is encouraged by the general design of Apple Blocks and Objective-C or Objective-C++. Because taking the address or passing the function along leaves it open to how far the handle-to-some-heap Block-typed object might be passed, compilers have to correctly (and pessimistically) generate the full, indirect-function-call representation as a matter of course. Block_copy also needs to be used, explicitly, in many cases, making it not much better in regular C code.
Swift solves this problem while still being compatible with Objective-C and C++ by making it so every "Block" or "Closure" type must be annotated if it "escapes" beyond the compiler’s knowledge, otherwise the program just refuses to compile ([swift-escapes]). This allows aggressive optimization to be applied by-default, with weaker static analysis-based optimizations or escape analysis optimization only acting as a fallback in the @escaping-annotated case. The annotation also makes it so Block_copy does not need to be invoked and rather than having to copy it to a heap version of itself, it is simply put in the right format and place, ready to interoperate cleanly with C, C++, Objective-C or Objective-C++.
In the opposite direction, there is a C-style attribute that can be used to say "the closure never escapes", which enables optimizations for crunching the object down and assuming that it is never invoked outside of the function. This is available through the NS_NOESCAPE macro, which expands to __attribute__((noescape)) and can be used to gain better binary size and reduce indirection for code execution speed where possible.
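As a rough illustration of the annotation, the sketch below applies __attribute__((noescape)) to an ordinary pointer parameter in C. It assumes a compiler that reports support through __has_attribute (Clang does); the DEMO_NOESCAPE macro, for_each, and the other names are hypothetical, and on compilers without the attribute the macro expands to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portability macro: empty when the attribute is unsupported. */
#if defined(__has_attribute)
#  if __has_attribute(noescape)
#    define DEMO_NOESCAPE __attribute__((noescape))
#  endif
#endif
#ifndef DEMO_NOESCAPE
#  define DEMO_NOESCAPE
#endif

typedef int visit_fn(const int* element, void* context);

/* Calls `visit` on each element. The annotation documents the promise that
   `context` is not stored anywhere that outlives this call. */
static int for_each(const int* items, size_t count, visit_fn* visit,
                    DEMO_NOESCAPE void* context) {
    for (size_t i = 0; i < count; ++i) {
        int status = visit(&items[i], context);
        if (status != 0) {
            return status;
        }
    }
    return 0;
}

static int add_to_total(const int* element, void* context) {
    *(int*)context += *element;
    return 0;
}

static int sum_demo(void) {
    const int items[] = { 2, 11, 32 };
    int total = 0; /* plain stack storage is fine: the pointer is only used during the call */
    int status = for_each(items, sizeof(items) / sizeof(*items), add_to_total, &total);
    assert(status == 0);
    return total;
}
```

The same promise is what lets a compiler keep a closure's storage on the stack and skip the heap copy when the annotated parameter is a Block.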
3.3. C++-Style Lambdas
C++ lambdas, despite coming from C++ initially, are the only solution that does not apply executable stack, separate stack, Function Descriptors, or other dynamic/runtime data to the problem. They were detailed in a large collection of proposals from Jens Gustedt and almost made C23, but ambition in trying to allow for type-generic programming through lambdas with auto parameters stalled progress and ultimately halted everything ([n2923] [n2924] [n2892] [n2893]). The design makes a unique type for each and every lambda object that gets created using the syntax [ captures ... ] ( args ... ) { /* code ... */ }, and that object has an implementation-defined but compile-time known size.
NOTE: While type inference for function returns is not yet in, a section of [n2923] was broken off for just variable definitions, which ultimately succeeded. Type-inferred variable declarations work properly in C23 and are an important part of Lambdas being able to work in the manner envisioned by Gustedt’s proposals and by C++.
If a lambda has no captures, it can reduce to a function pointer like so:
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    auto compare = [](const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        /* HOW is `in_reverse` usable here?! */
        return (in_reverse) ? *right - *left : *left - *right;
    };
    // "compare" below becomes a function pointer
    qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
    return list[0];
}

The one thing that makes it different from GNU Nested Functions -- and somewhat similar to Blocks -- is that it is also an expression. That means that it can also be used inside of a function call expression as an argument, or as part of any other complicated chain of additions:
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
        [](const void* untyped_left, const void* untyped_right) {
            const int* left = (const int*)untyped_left;
            const int* right = (const int*)untyped_right;
            /* HOW is `in_reverse` usable here?! */
            return (in_reverse) ? *right - *left : *left - *right;
        }
    );
    return list[0];
}

Again, this only works because there are no captures. When captures get involved, the above code will simply stop compiling at all because the conversion to a function pointer stops working. This can be solved in the ways discussed previously, such as:
-
a version of qsort which takes a void* user data parameter, like qsort_s does;
-
a version of qsort which takes a "wide" function pointer type, similar to how Blocks do with the apple_qsort defined above (§ 3.2.1 Expression);
-
or, allowing a separate facility which lifts the complex closure type into a single function pointer through a trampoline or similar (§ 5.2 Make Trampoline and Singular Function Pointers).
This gives Lambdas a heavy weakness similar to all of the other solutions: there must be either a trampoline, a wide function pointer type, a void* user data/context pointer, or something else to accommodate the lack of transformation into a singular function pointer.
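The void* user data route is the easiest of these to show in today’s C. The following is a minimal sketch, assuming a small hand-written insertion sort in place of qsort_s (which many C libraries do not ship); sort_ints_with_context and the other names are hypothetical and not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>

typedef int compare_with_context_fn(const void* left, const void* right, void* context);

/* Hypothetical helper: insertion sort over an int array that forwards a
   context pointer to the comparison, in the style of qsort_s. */
static void sort_ints_with_context(int* items, size_t count,
                                   compare_with_context_fn* compare, void* context) {
    for (size_t i = 1; i < count; ++i) {
        int current = items[i];
        size_t j = i;
        while (j > 0 && compare(&items[j - 1], &current, context) > 0) {
            items[j] = items[j - 1];
            --j;
        }
        items[j] = current;
    }
}

/* The "captured" data arrives through `context` instead of a static variable. */
static int compare_ints(const void* untyped_left, const void* untyped_right, void* context) {
    const int left = *(const int*)untyped_left;
    const int right = *(const int*)untyped_right;
    const int in_reverse = *(const int*)context;
    /* (a > b) - (a < b) avoids the overflow that plain subtraction can hit */
    const int order = (left > right) - (left < right);
    return in_reverse ? -order : order;
}

/* Sorts the running example list and returns its first element. */
static int first_after_sort(int in_reverse) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    assert(in_reverse == 0 || in_reverse == 1);
    sort_ints_with_context(list, sizeof(list) / sizeof(*list), compare_ints, &in_reverse);
    return list[0];
}
```

The cost of this approach is visible in the signature: every API that wants to accept stateful callbacks has to grow an extra parameter and thread it through by hand.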
NOTE: Because C++ as a language has a more robust set of core primitives, it does not have to worry about this problem. std::function<FUNCTION_TYPE> is a type-erased way to transport a whole function object, as a copy, through API boundaries that is defined entirely in its library mechanisms. For a view into any old function, there is std::function_ref<FUNCTION_TYPE>, which is like a function pointer but allows pointing into many different kinds of functions that exist in C++. This makes the integration of lambdas into C++ easy; in C, it is much more difficult to have this all participate in the system automatically. There is no std::function-alike in C, and there’s no std::function_ref-alike in C either for new function types like GNU Nested Functions, Apple Blocks, or otherwise. This problem has to be solved separately, with a Wide Function Pointer type (§ 3.4.2 Wide Function Pointer Type?) or with explicit user trampoline-making capability such as § 5.2 Make Trampoline and Singular Function Pointers.
3.3.1. Captures and Data
Lambdas do not capture any context by default, unlike both GNU Nested Functions and Apple Blocks. Instead, every capture must be manually annotated as either being taken by value or by reference, or a "universal capture" must be used to set a default method of capturing all visible, in-scope, block objects.
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred"
auto make_compare(int argc, char* argv[]) {
    int in_reverse = 0;
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    // explicitly capture in the "[ ]" of the lambda
    auto compare = [in_reverse](const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        return (in_reverse) ? *right - *left : *left - *right;
    };
    return compare;
}

int main(int argc, char* argv[]) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    auto compare = make_compare(argc, argv);
    auto compare_trampoline = [](const void* left, const void* right, void* userdata) {
        typeof(compare)* p_compare = userdata;
        return (*p_compare)(left, right);
    };
    qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
    return list[0];
}

This, unfortunately, also makes them susceptible to location just like Blocks: with by-value captures, such as the unadorned identifier in_reverse or an = capture, the state they capture is the state at the moment of creation during execution. The following code captures in_reverse before it is modified:
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred" (`auto`)
auto make_compare(int argc, char* argv[]) {
    int in_reverse = 0;
    // explicitly capture in the "[ ]" of the lambda
    auto compare = [in_reverse](const void* untyped_left, const void* untyped_right) {
        const int* left = (const int*)untyped_left;
        const int* right = (const int*)untyped_right;
        return (in_reverse) ? *right - *left : *left - *right;
    };
    // lambdas and captures do not reflect any changes beyond this point,
    // including the `in_reverse`
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    return compare;
}

int main(int argc, char* argv[]) {
    int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
    auto compare = make_compare(argc, argv);
    auto compare_trampoline = [](const void* left, const void* right, void* userdata) {
        typeof(compare)* p_compare = userdata;
        return (*p_compare)(left, right);
    };
    qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
    return list[0];
}

Like Apple Blocks, a lambda with captures is safe to return with the by-value capture (if one briefly ignores the need for Block_copy to reseat the memory of a Block). Additionally, it is better here because there is no usage of the heap in the above code example. The only problem is that this requires a feature that was proposed for C23 but didn’t make it (along with Lambdas not making it): deduced return types. There was consensus to have the feature, but the feature was bundled with the set of Lambda proposals, and thus fell through during the final stretch of C23. Therefore, a proposal for inferring the type of a function return should be separated from the previous proposals (such as [n2923]) in order to accommodate such behavior in C.
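To make the "no heap, compile-time known size" property concrete, the sketch below shows one way a lambda with a by-value capture could be desugared by hand into today’s C. This is an assumption about a plausible lowering, not a description of what any proposal mandates; compare_lambda, compare_lambda_call, and the simplified make_compare taking an int are hypothetical.

```c
#include <assert.h>

/* Hypothetical desugaring of `[in_reverse](...) { ... }`:
   one unique struct type holding the by-value captures... */
struct compare_lambda {
    int in_reverse;
};

/* ...plus one ordinary function that receives the captures explicitly. */
static int compare_lambda_call(const struct compare_lambda* self,
                               const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return self->in_reverse ? *right - *left : *left - *right;
}

/* The size is known at compile time, like that of any other object type. */
static_assert(sizeof(struct compare_lambda) >= sizeof(int), "holds the captured int");

/* Returned by value: the capture travels inside the object, so nothing
   dangles and nothing is allocated. */
static struct compare_lambda make_compare(int in_reverse) {
    struct compare_lambda lambda = { in_reverse };
    return lambda;
}

/* Compares 2 against 11 through a freshly made closure object. */
static int compare_demo(int in_reverse) {
    const int two = 2;
    const int eleven = 11;
    struct compare_lambda compare = make_compare(in_reverse);
    return compare_lambda_call(&compare, &two, &eleven);
}
```

The hand-written version has to name the struct type in the return type of make_compare; a real lambda’s type is unnameable, which is why the deduced return type discussed above is needed.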
3.3.2. What About Lifetime / Destructors?
A common criticism of Lambdas and their unique type, whole-object approach is that such an approach with captures requires C++-style destructors to work well. We are completely unsure why this is the case or why this criticism keeps being leveled specifically at Lambdas. In the previous section on Apple Blocks (§ 3.2.2.1 More Complications: Generally Unsafe to Return), coordinated function calls and documentation are the only way to communicate that a user has Block_copy’d an object and therefore requires Block_release. Similarly with GNU Nested Functions, returning them up the stack at all is pure undefined behavior that has tangible effects on the program (§ 3.1.5 The Nature of Captures): these are problems endemic to C. Using a complex data structure like a binary tree or allocating memory requires that it is documented and communicated to the user: capturing such complex types and having the function called over a longer period of time simply means the user has the responsibility to clean up or free the resources.
C APIs that accept user-provided functions will always give the user a way to know when something must be cleaned up. For example, the Lua C API has an allocation function; when it is called with a "new size" parameter of 0, the memory passed in must be freed: that’s how it communicates what the current action is. Similarly, ev/libev -- with ev_set_allocator -- provides a hook to allow a user to manage the memory of the library, while also providing several statuses in callbacks for watchers (initialized/pending/running/stopped/etc.). Even for standard C, thrd_create passes a void* user data to the func that gets run on the new thread: a user must allocate and then pass the user data to the thread, and it becomes the thread’s responsibility to manage the lifetime of that data in a manner that is threadsafe.
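The Lua convention can be sketched without pulling in Lua itself. The function below uses the same parameter shape as Lua’s lua_Alloc (user data, pointer, old size, new size) and treats a new size of 0 as "free"; the alloc_stats bookkeeping and the demo function are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping passed through the user-data pointer. */
struct alloc_stats {
    size_t live_bytes;
};

/* Same parameter shape as Lua's lua_Alloc: a new size of 0 means "free". */
static void* counting_alloc(void* user, void* ptr, size_t old_size, size_t new_size) {
    struct alloc_stats* stats = user;
    if (new_size == 0) {
        if (ptr != NULL) {
            stats->live_bytes -= old_size;
        }
        free(ptr);
        return NULL;
    }
    void* moved = realloc(ptr, new_size);
    if (moved != NULL) {
        /* old_size only describes a block when ptr is not NULL */
        if (ptr != NULL) {
            stats->live_bytes -= old_size;
        }
        stats->live_bytes += new_size;
    }
    return moved;
}

/* Allocates, grows, then frees; returns the live byte count at the end,
   or (size_t)-1 if an allocation failed. */
static size_t alloc_demo(void) {
    struct alloc_stats stats = { 0 };
    void* block = counting_alloc(&stats, NULL, 0, 16);
    if (block == NULL) {
        return (size_t)-1;
    }
    void* grown = counting_alloc(&stats, block, 16, 64);
    if (grown == NULL) {
        counting_alloc(&stats, block, 16, 0);
        return (size_t)-1;
    }
    assert(stats.live_bytes == 64);
    counting_alloc(&stats, grown, 64, 0); /* new size 0: the API says "release this" */
    return stats.live_bytes;
}
```

Nothing here needs a destructor: the calling convention itself tells the callback when its resources are no longer needed.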
Any C API worth its salt, when dealing with convoluted lifetimes, provides a notification -- either to a given callback (through its parameters) or through a separate callback (for more full-fledged APIs) -- that tells the user the API is finished with a specific operation, and that it is therefore safe to close things out. The secondary alternative is, of course, statically sourcing lifetime until the user themselves guarantees all resources can be safely freed using some outside knowledge (e.g., an explicit set of calls after the start of the library to cleanup/close/stop the library). This is not just a point with lambdas: every attempt at solving this problem has to engage with this, whether it’s using Block_copy to ensure the lifetime of an Apple Block, or malloc to make sure a struct type being pointed to by a void* is accessible after a dispatch to an asynchronous function call.
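The "notify through the callback’s parameters" pattern can be sketched as follows. This is a made-up miniature API, loosely inspired by the watcher statuses mentioned above; watcher, watcher_run, and the event names are hypothetical and do not correspond to libev or any other real library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event kinds an API might report to a callback. */
enum watcher_event { WATCHER_FIRED, WATCHER_STOPPED };

typedef void watcher_fn(enum watcher_event event, void* user);

struct watcher {
    watcher_fn* callback;
    void* user;
};

/* Hypothetical API: delivers `times` events, then reports that it is done
   with `user`, so the owner knows cleanup is now safe. */
static void watcher_run(struct watcher* w, int times) {
    for (int i = 0; i < times; ++i) {
        w->callback(WATCHER_FIRED, w->user);
    }
    w->callback(WATCHER_STOPPED, w->user);
}

/* Heap-allocated "captured" state owned by the callback. */
struct counter {
    int fired;
    int* final_count;
};

static void on_event(enum watcher_event event, void* user) {
    struct counter* c = user;
    if (event == WATCHER_FIRED) {
        c->fired += 1;
    } else {
        /* the API will not touch `user` again: publish the result and free it */
        *c->final_count = c->fired;
        free(c);
    }
}

/* Returns the number of events observed, or -1 on allocation failure. */
static int watcher_demo(void) {
    int final_count = -1;
    struct counter* c = malloc(sizeof *c);
    if (c == NULL) {
        return -1;
    }
    c->fired = 0;
    c->final_count = &final_count;
    struct watcher w = { on_event, c };
    watcher_run(&w, 3);
    assert(final_count >= 0);
    return final_count;
}
```

The captured state is released exactly once, at the point the API declares itself finished, with no language-level destructor involved.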
3.4. The Solutions
This paper proposes two distinct options for standardization, with the recommendation to do both. It is critical to do both for the maximum amount of external language compatibility (C++ in particular, with § 4.2 Lambdas) and for the approval of the C ecosystem in general (with § 4.1 Capture Functions: Rehydrated Nested Function). The necessary and core goals of this proposal are focused on:
-
compile-time knowable size of the function object (can be treated as a regular object);
-
does not require a heap or a special stack;
-
can be interacted with like a normal function (but not necessarily as a normal function pointer);
-
and, does not compromise the security of all or a portion of the program.
Lambdas already have usage experience with well-known properties that can be directly translated to C and are easy enough to understand, despite the unfortunate syntax. Capture Functions are a simple modification of Nested Functions that produce a sized object (similar to Lambdas) and make their captures explicit, allowing for a degree of control and additional safety that was not present in the original Nested Functions design.
We are not focused on interoperability with singular function pointers. We believe that should be left to a separate, explicit mechanism in the language, capable of allowing the user to choose where the memory comes from and setting it up appropriately. This way, a user can make the decision on their own if they want to use e.g. executable stack (with the consequences that it brings) or just have a part of (heap) memory they set with e.g. Linux mprotect(...) or Win32 VirtualProtect to be readable, writable, and executable. Such a trampoline-maker (as briefly talked about in § 5.2 Make Trampoline and Singular Function Pointers) can also be applied across implementations in a way that the secret sauce powering Nested Functions cannot be: this is much more appealing as an approach.
We DO NOT take any of the design from Blocks because the Blocks design is, as a whole, unsuitable for C. While its deployment of a blocks "type" to fulfill the necessary notion of a "wide function pointer" type is superior to what Nested Functions have produced, the implementation details it imposes for __block variables and the excessive reliance on an (underspecified) runtime/heap are detrimental to a shared & unified approach to C.
NOTE: The Blocks runtime/heap layout has changed (at least) once in its history: the only reason this worked is because Apple owned every part of the Blocks ecosystem. Apple can do whatever they want with it, however they want, whenever they want: this does not work in a language with diverse, loosely coordinated implementations like C, as opposed to Objective-C, Objective-C++, or Swift.
As the heap is repulsive to typical freestanding implementations, we do not want to standardize something that will have technological drawbacks similar to those of VLAs, where -- even if no syntactical or language-design issues exist from the way blocks are written -- the presence of an unspecified source of memory (stack or heap) produces uncertainty in the final code generation of a program.
3.4.1. Statement Expressions?
Statement Expressions should be standardized. While the feature is related to these efforts, it is entirely separate and has a full, robust set of constraints and concerns in standardizing. It has more existing implementation experience, deployment experience, and implementer practice than Blocks and Nested Functions combined. Therefore, it will be pursued in a different proposal. This was briefly noted in a proposal collecting existing extensions in 2007 by Stoughton ([n1229]); while there was enthusiastic support at the time, nothing materialized from the mention nor the in-meeting enthusiasm. Some attempts are being made at standardizing it, but it is notably difficult to standardize due to the large number of corner cases: the semantics must be clarified for constructs that normally cannot appear in certain places but suddenly can, like a break; being placed in the initializer expression of a for loop.
Another advantage of Statement Expressions is that, unlike any of Apple Blocks / C++ Lambdas / GNU Nested Functions, there is no separating function body. This is critical for writing macros that coordinate with one another, AND is critical in writing reusable macros that have no additional cost and do not set up extra individual entry points. For example, there are hundreds of permutations of the functions in C2y’s <stdmchar.h> that could be written to make them easier to use, to make them not require double-pointers, to infer the size from a C-style string, and so on, and so forth. The choice of having a bunch of macros which simply repeat the same code means not having to add hundreds of permutations of the <stdmchar.h> functions (5 different character types across 5 different encoding types with 6 forms of "pointer and length, just pointer" for input/output, pairwise with one another where order matters).
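A small sketch of the macro use case, relying on the GNU statement expression extension as implemented by GCC and Clang (it is not standard C, and pedantic modes will warn about it). The COUNT_CHAR macro is a hypothetical example and is unrelated to <stdmchar.h>.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C extension: ({ ... }) is an expression whose value is that of its
   last statement. No separate function or entry point is created. */
#define COUNT_CHAR(STR, CH) ({                           \
    const char* count_char_p_ = (STR);                   \
    const char count_char_c_ = (CH);                     \
    size_t count_char_n_ = 0;                            \
    for (; *count_char_p_ != '\0'; ++count_char_p_) {    \
        if (*count_char_p_ == count_char_c_) {           \
            ++count_char_n_;                             \
        }                                                \
    }                                                    \
    count_char_n_;                                       \
})

static size_t count_demo(void) {
    /* Each expansion has its own scope, so the two uses below do not clash
       even though they declare identically named temporaries. */
    size_t total = COUNT_CHAR("-r -rr", 'r') + COUNT_CHAR("none", 'r');
    assert(total <= 6);
    return total;
}
```

Because the loop lives directly in the enclosing function, the macro adds no call overhead and no extra symbol, which is the property the paragraph above relies on.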
3.4.2. Wide Function Pointer Type?
We do hope that another paper creates a new "Wide Function Pointer" type of some kind. While the caret (^) cannot be used for this purpose thanks to Apple and Objective-C/Objective-C++ taking that design space, there are other possible spellings that can be utilized for this purpose. A proposal has already been written by Martin Uecker to solve this problem in C, though as of time of writing it still needs a good bit of work ([n2862]).
4. Design
THIS SECTION IS NOT STARTED AND INCOMPLETE.
4.1. Capture Functions: Rehydrated Nested Function
THIS SECTION IS INCOMPLETE.
int main () {
    int x = 3;
    int zero () {
        // OK, no external variables used
        return 0;
    }
    int double_it () {
        return x * 2; // constraint violation
    }
    int triple_it () _Capture(x) {
        return x * 3; // OK, x = 3 when called
    }
    int quadruple_it () _Capture(&x) {
        return x * 4; // OK, x = 5 when called
    }
    int quintuple_it () _Capture(=) {
        return x * 5; // OK, x = 3 when called
    }
    int sextuple_it () _Capture(&) {
        return x * 6; // OK, x = 5 when called
    }
    x = 5;
    return zero() + triple_it() + quadruple_it() + quintuple_it() + sextuple_it();
    // return 74;
    // 0 + (3 * 3) + (5 * 4)
    //   + (3 * 5) + (5 * 6)
}

Upsides:
-
Fixes ABI and allows Clang et al. to implement.
-
Provides for explicit captures with a keyword.
-
Retains Nested Function look and feel.
-
Name mangling is clearer (nested functions have a name, and can be used to do recursion, unlike Lambdas).
Downsides:
-
Not a real expression.
-
Cannot be used cleanly for macros!
-
Cannot be used in a for loop expression!
-
Cannot be quickly/anonymously defined as part of a function call argument!
-
Nested Function is an "object" now, with a lifetime and a size? (No other function-like thing in C or C++ behaves like this!)
-
Since it’s an Object, does not cleanly go into a function pointer unless something like § 5.2 Make Trampoline and Singular Function Pointers is deployed or GNU extensions are used.
-
It’s still Nested Functions, GCC will keep their extension and obviously that lures people into executable stack issues.
4.2. Lambdas
THIS SECTION IS INCOMPLETE.
int main () {
    int x = 3;
    auto zero = [] () {
        // OK, no external variables used
        return 0;
    };
    auto double_it = [] () {
        return x * 2; // constraint violation
    };
    auto triple_it = [x] () {
        return x * 3; // OK, x = 3 when called
    };
    auto quadruple_it = [&x] () {
        return x * 4; // OK, x = 5 when called
    };
    auto quintuple_it = [=] () {
        return x * 5; // OK, x = 3 when called
    };
    auto sextuple_it = [&] () {
        return x * 6; // OK, x = 5 when called
    };
    x = 5;
    return zero() + triple_it() + quadruple_it() + quintuple_it() + sextuple_it();
    // return 74;
    // 0 + (3 * 3) + (5 * 4)
    //   + (3 * 5) + (5 * 6)
}

Upsides:
-
Compatible with C++.
-
Provides for explicit captures.
-
Name mangling and other questions already answered thanks to C++ (which is just unspecified, there’s plenty of room).
-
It’s an EXPRESSION, so it can go anywhere an EXPRESSION can go!
-
Usable for macros!
-
Usable in the init / condition of for loops!
-
Usable to quickly define to go right into a function call argument!
-
And in many more sneaky places! (While having its own scope, so no pesky shadowing or jump in/out issues!)
Downsides:
-
It’s an Object, does not cleanly go into a function pointer unless something like § 5.2 Make Trampoline and Singular Function Pointers is deployed.
-
Ugly as (REDACTED).
-
C people hate it, C++ people live with it (acknowledge it as superior technologically but also recognize it as ugly).
-
Side note: Rust people also acknowledge it as superior and have been moving towards allowing all the things C++ lambdas do in its capture expressions as something they can do in their own, including with the move keyword on their closures so they can move arguments in (something C++ could do since the beginning).
5. Appendix
5.1. Statement Expressions
THIS SECTION IS INCOMPLETE AND, IN THE END, WILL NOT BE INCLUDED IN THIS PAPER. A SEPARATE PAPER STANDARDIZING STATEMENT EXPRESSIONS WILL BE COMPLETED: WHILE THE TWO ARE RELATED, STATEMENT EXPRESSIONS HAVE SEVERAL PROPERTIES THAT ARE FUNDAMENTALLY SEPARATE FROM THIS PAPER.
From lak; https://gist.github.com/LAK132/0d264549745e8196df1e632d5b518c37
5.2. Make Trampoline and Singular Function Pointers
In the later examples in § 3.1.3 Alternative Nested Function Implementations, a magic compiler builtin named __gnu_make_trampoline, with a secondary follow-on builtin named __gnu_destroy_trampoline, is used. This section talks about what that would look like, if it were to be implemented. In particular, an ideal solution that makes a trampoline needs to be an explicit request from the user because:
-
you want to opt-in to any dynamic allocations;
-
you want to provide a way to override the default allocation if possible;
-
and, you want explicit control over when those resources (the allocation, the protected memory, and similar) are released.
While this section was spawned from GNU Nested Functions, this same technique can be used to make possible single function pointer trampolines for Blocks with or without captures (§ 3.2 Apple Blocks) as well as C++-style Lambdas (§ 3.3 C++-Style Lambdas).
Therefore, the best design to do this would be -- using the [_Any_func]* paper and its new type -- the following:
typedef void*(allocate_function_t)(size_t alignment, size_t size);
typedef void(deallocate_function_t)(void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION-WITH-DATA-IDENTIFIER func);
_Any_func* stdc_make_trampoline_with(
    FUNCTION-WITH-DATA-IDENTIFIER func,
    allocate_function_t* alloc
);
void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func* func, deallocate_function_t* dealloc);

stdc_make_trampoline(f) is identical to stdc_make_trampoline_with(f, aligned_alloc), and stdc_destroy_trampoline(f) is identical to stdc_destroy_trampoline_with(f, free_aligned_sized). Providing an allocation and a deallocation function means that while the implementation controls what is done to the memory and how it gets set up, the user controls where that memory is surfaced from. This would prevent the problem of the Heap Alternative Nested Function implementation: rather than creating a special stack or having to rely on memory allocation functions, the compiler can instead source the memory from a user. This also makes such an allocation explicit, and means that its lifetime could be managed explicitly by the user.

Though, given our memory primitives, a slightly better implementation that would allow the implementation to take care of (potentially) extra space handed down by alignment and what not would be:
struct allocation { void* data; size_t size; };
typedef struct allocation(allocate_function_t)(size_t alignment, size_t size);
typedef void(deallocate_function_t)(void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION_TYPE* func);
_Any_func* stdc_make_trampoline_with(FUNCTION_TYPE* func, allocate_function_t* alloc);
void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func* func, deallocate_function_t* dealloc);

Regardless of the form that the make/destroy functions take, this sort of intrinsic would be capable of lifting not just typical GNU Nested Functions but all types of functions to be a single, independent function pointer with some kind of backing storage. Some desire may still exist to make the allocation and deallocation process automatic, but that should be left to compiler vendors to decide for ease-of-use tradeoffs versus e.g. security, like in § 3.1.2 Early Design Flaw: Nested Functions turn the stack Executable!.
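To illustrate what "the user controls where that memory is surfaced from" could mean in practice, the sketch below implements a user-side allocator pair matching the first, simpler pair of signatures (returning void*). The trampoline functions themselves do not exist, so none are called; the static arena, the bump-allocation scheme, and all names other than the two typedefs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two function types, as in the simpler form sketched in the text. */
typedef void*(allocate_function_t)(size_t alignment, size_t size);
typedef void(deallocate_function_t)(void* p, size_t alignment, size_t size);

/* Hypothetical fixed arena: memory comes from static storage, not the heap. */
static _Alignas(max_align_t) unsigned char arena[256];
static size_t arena_used = 0;

/* Bump allocation; `alignment` must be a power of two. Returns NULL when full. */
static void* arena_allocate(size_t alignment, size_t size) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;
    }
    uintptr_t base = (uintptr_t)arena;
    uintptr_t start = (base + arena_used + alignment - 1) & ~(uintptr_t)(alignment - 1);
    size_t offset = (size_t)(start - base);
    if (offset > sizeof(arena) || size > sizeof(arena) - offset) {
        return NULL;
    }
    arena_used = offset + size;
    return arena + offset;
}

/* Only the most recent allocation can be handed back in this simple scheme. */
static void arena_deallocate(void* p, size_t alignment, size_t size) {
    (void)alignment;
    unsigned char* bytes = p;
    if (bytes != NULL && bytes + size == arena + arena_used) {
        arena_used = (size_t)(bytes - arena);
    }
}

/* Both functions convert to the pointer types a *_with call would accept. */
static allocate_function_t* const trampoline_alloc = arena_allocate;
static deallocate_function_t* const trampoline_dealloc = arena_deallocate;

/* Returns 1 if the arena behaves as described, 0 otherwise. */
static int arena_demo(void) {
    const size_t alignment = _Alignof(max_align_t);
    void* first = trampoline_alloc(alignment, 32);
    if (first == NULL || ((uintptr_t)first % alignment) != 0) {
        return 0;
    }
    void* too_big = trampoline_alloc(alignment, sizeof(arena)); /* cannot fit */
    trampoline_dealloc(first, alignment, 32);
    assert(arena_used <= sizeof(arena));
    return too_big == NULL && arena_used == 0;
}
```

A real trampoline would additionally need the memory to be executable, which a plain static array is not; that setup step is the part the text assigns to the implementation rather than to the user.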
5.3. Executable Stack CVEs
WARNING: THIS SECTION IS INCOMPLETE.
The following CVEs are related to executable stack issues.