Shuyao Cheng1,2,3, Pengwei Jin1,2,3, Qi Guo1, Zidong Du1,4, Rui Zhang1, Xing Hu1,4, Yongwei Zhao1, Yifan Hao1, Xiangtao Guan5, Husheng Han1,2, Zhengyue Zhao1,2, Ximing Liu1,2, Xishan Zhang1,3, Yuejie Chu1, Weilong Mao1, Tianshi Chen3, Yunji Chen1,2
1State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3Cambricon Technologies
4Shanghai Innovation Center for Processor Technologies
5University of Science and Technology of China
Overview
Designing a central processing unit (CPU) requires intensive manual work by talented experts to implement the circuit logic from design specifications, an iterative process that demands significant effort in programming, debugging, and verification (shown in Figure 1 (a)). Although electronic design automation (EDA) has made considerable progress in reducing human effort, all existing tools require hand-written formal program code (e.g., Verilog, Chisel, or C) as input.
To automate CPU design without human programming, we aim to learn the CPU design solely from input-output (IO) examples, which are generated from the test cases of the design specification (shown in Figure 1 (b)). The key challenge is that the learned CPU design must have near-zero tolerance for inaccuracy, which renders well-known approximate algorithms, such as neural networks, ineffective.
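As a toy illustration of what learning from IO examples means, the sketch below enumerates the IO examples of a 1-bit full adder and scores two candidate circuits against them. The full adder, the function names, and both candidates are assumptions chosen for illustration; they are not part of the CPU design flow described here.

```python
from itertools import product

def full_adder_spec(a, b, cin):
    """Reference behavior (hypothetical spec): returns (sum bit, carry-out bit)."""
    total = a + b + cin
    return (total & 1, total >> 1)

def io_examples(spec, n_inputs):
    """Enumerate every input combination and record the spec's outputs."""
    return [(bits, spec(*bits)) for bits in product((0, 1), repeat=n_inputs)]

def accuracy(candidate, examples):
    """Fraction of IO examples on which the candidate matches exactly."""
    hits = sum(candidate(*x) == y for x, y in examples)
    return hits / len(examples)

def good_candidate(a, b, cin):
    # Gate-level full adder: XOR for the sum, AND/OR for the carry.
    return (a ^ b ^ cin, (a & b) | (cin & (a ^ b)))

def bad_candidate(a, b, cin):
    # Drops the carry propagated through cin, so some examples are wrong.
    return (a ^ b ^ cin, a & b)

EXAMPLES = io_examples(full_adder_spec, 3)
print(accuracy(good_candidate, EXAMPLES))
print(accuracy(bad_candidate, EXAMPLES))
```

Even a candidate that is right on most examples is unusable as a circuit, which is why the accuracy requirement is so strict.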
We propose a novel AI approach to generate the CPU design as a large-scale Boolean function, using only external IO examples instead of formal program code. This approach employs a new graph structure called the Binary Speculative Diagram (BSD) to accurately approximate the CPU-scale Boolean function. We introduce an efficient BSD expansion method based on Boolean Distance, a new metric to quantitatively measure the structural similarity between Boolean functions, gradually achieving 100% design accuracy.
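The idea of expanding a graph step by step until the function is exact can be sketched on a tiny example. The code below is only a loose analogy and not the BSD algorithm itself: it applies a plain Shannon-style expansion to the IO examples of a 3-input majority function and speculates a constant 0 or 1 at every unexpanded leaf by majority vote. The target function, the fixed variable order, and all names are assumptions, and Boolean Distance is not modeled at all.

```python
from itertools import product

def majority(a, b, c):
    """Toy target function (an assumption): 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

EXAMPLES = [(x, majority(*x)) for x in product((0, 1), repeat=3)]

def expand(examples, variables, depth):
    """Build a decision graph; unexpanded leaves hold a speculated constant."""
    outputs = {y for _, y in examples}
    if depth == 0 or not variables or len(outputs) <= 1:
        # Speculate the leaf value from the examples that reach this node.
        ones = sum(y for _, y in examples)
        return 1 if 2 * ones > len(examples) else 0
    var, rest = variables[0], variables[1:]
    low = [(x, y) for x, y in examples if x[var] == 0]
    high = [(x, y) for x, y in examples if x[var] == 1]
    return (var, expand(low, rest, depth - 1), expand(high, rest, depth - 1))

def evaluate(node, x):
    """Follow the branches selected by x until a constant leaf is reached."""
    while isinstance(node, tuple):
        var, low, high = node
        node = high if x[var] else low
    return node

def accuracy(node, examples):
    hits = sum(evaluate(node, x) == y for x, y in examples)
    return hits / len(examples)

for depth in range(4):
    graph = expand(EXAMPLES, (0, 1, 2), depth)
    print(depth, accuracy(graph, EXAMPLES))
```

In this toy, accuracy never decreases as more layers are expanded and reaches 100% once every input variable has been expanded.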
Our approach generates an industrial-scale RISC-V CPU design, over 1700× larger than those in existing work (shown in Table 1), in just 5 hours, reducing the design cycle by approximately 1000× without human involvement. The taped-out chip, Enlightenment-1, the world's first CPU designed by AI, successfully runs the Linux operating system and performs comparably to the human-designed Intel 80486SX CPU. Remarkably, our approach autonomously rediscovers human knowledge of the von Neumann architecture.


Enlightenment-1 (QiMeng-1): The World's First Automatically Generated CPU
We use the proposed approach to automatically generate a 32-bit RISC-V CPU, Enlightenment-1, within 5 hours, and demonstrate that the approach can discover human knowledge of the von Neumann architecture.
Automatically Design a RISC-V CPU
We use the proposed approach to generate the CPU design from a relatively small set of IO examples. Concretely, the CPU has 1789 input bits and 1826 output bits, and thus the total number of IO examples is 1826 × 2^1798, while fewer than 2^40 IO examples are randomly sampled for training. The training process takes less than 5 hours to achieve an accuracy of >99.99999999999% on validation tests. The generated CPU design then undergoes the physical design process with scripts at 65 nm technology to generate the layout for fabrication, and the detailed hardware characteristics are listed in Table 2. The layout of the entire chip with major components marked, the manufactured chip with a frequency of 300 MHz, and the printed circuit board containing the chip are illustrated below.
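To give a sense of scale, the following sketch works through the arithmetic, reading the figures above as 1826 × 2^1798 total IO examples and an upper bound of 2^40 sampled examples. The constant and function names are hypothetical, and the exponents are taken as quoted rather than derived independently.

```python
import math

OUTPUT_BITS = 1826       # output width quoted in the text
TOTAL_EXPONENT = 1798    # exponent of the example count, as quoted
SAMPLED_EXPONENT = 40    # upper bound on sampled examples: 2^40

def io_space_summary(output_bits=OUTPUT_BITS,
                     total_exp=TOTAL_EXPONENT,
                     sampled_exp=SAMPLED_EXPONENT):
    """Compare the full IO example space with the sampled training bound."""
    total = output_bits * 2 ** total_exp      # exact big-integer arithmetic
    sampled_bound = 2 ** sampled_exp
    return {
        "total_examples": total,
        "total_log10": math.log10(total),
        "sampled_bound": sampled_bound,
        # log2 of the largest fraction of the space the samples can cover
        "sampled_fraction_log2": math.log2(sampled_bound) - math.log2(total),
    }

summary = io_space_summary()
print(f"total examples ~ 10^{summary['total_log10']:.1f}")
print(f"sampled bound   = {summary['sampled_bound']}")
print(f"sampled share  <= 2^{summary['sampled_fraction_log2']:.1f}")
```

Under this reading, the training set covers a vanishingly small share of the full example space, so the design cannot be obtained by memorizing examples.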


Perform Comparably to Intel 80486SX CPU
We successfully run the Linux (kernel 5.15) operating system and SPEC CINT2000 on Enlightenment-1 to validate its functionality (see Figure 3 (a) below). We also use the widely used Dhrystone benchmark to evaluate its performance. Figure 3 (b) below compares the performance of Enlightenment-1 against different generations of commercial CPUs, e.g., Intel 80386 (1980s), Intel 80486SX (1990s), and Intel Pentium III (2000s). On the evaluated program, it performs comparably to the Intel 80486SX, designed in mid-1991. Although Enlightenment-1 performs worse than modern processors such as the Intel Core i7 3930K, it is the world's first automatically designed CPU, and its performance could be significantly improved with augmented algorithms, which we leave as future work.

Discover the von Neumann Architecture
By examining the generated circuit logic of Enlightenment-1 in detail, we demonstrate that our approach discovers human knowledge of the von Neumann architecture from the IO examples alone. Concretely, the generated CPU design, expressed as a BSD, contains the key components of the von Neumann architecture: the control unit, which is generated first in the BSD for global control, and the arithmetic unit (see Figure 4). The control unit generates the control signals for the entire CPU, and the arithmetic unit performs arithmetic operations (e.g., ADD and SUB) and logic operations (e.g., AND and OR). Moreover, we observe that both the control unit and the arithmetic unit can be recursively decomposed into smaller functional modules, such as the instruction decoder, ALU, and LSU (load/store unit), by expanding more BSD layers.
