Automatic Target Description File Generation

¹ State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
² University of Chinese Academy of Sciences, Beijing 100049, China
³ School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
*Corresponding Author

Overview

Agile hardware design is gaining momentum and bringing new chips to market faster and in larger quantities. However, it also brings new challenges for compiler developers, who must retarget existing compilers to these new chips in less time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend, to a new target requires compiler developers to manually write a set of target description files (totalling 10,300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures (ISAs). Second, ATG asks compiler developers to fill in the information for each instruction in the new target in tabular form, according to the synthesized list of target-specific properties, and then generates the target description files automatically according to the synthesized list of code-layout templates. The output of the first stage can often be reused by different new targets sharing similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1,029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort (by specifying each instruction in terms of at most 61 target-specific properties).

The Architecture of ATG

ATG automates the creation of compiler target description files (such as LLVM's *.td files). It analyzes the description files of existing targets to extract a reusable list of target-specific properties (TSP-List) and a list of code-layout templates (CLT-List). For a new target, developers provide only the values for the properties in the TSP-List, and ATG instantiates the templates to generate the full description files automatically, reducing manual effort and speeding up compiler support for new chips.
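The second stage described above can be pictured with a minimal Python sketch. It is not ATG's implementation: the template text, the placeholder names (`name`, `mnemonic`, `size`, `hi`, `lo`, `opcode`) and the helper `generate_td` are hypothetical, chosen only to show how a table of property values and a code-layout template could combine into TableGen-like text.

```python
from string import Template

# Hypothetical code-layout template (CLT). The placeholder names are
# illustrative and are not taken from ATG's actual TSP-List.
CLT_EXAMPLE = Template(
    "def $name : Instruction {\n"
    "  let Size = $size;\n"
    "  let Inst{$hi-$lo} = $opcode;\n"
    '  let AsmString = "$mnemonic";\n'
    "}\n"
)


def generate_td(rows, template):
    """Instantiate `template` once per table row and join the results.

    Each row maps property names to developer-supplied values.
    Template.substitute raises KeyError when a row lacks a property,
    which flags an incomplete table instead of emitting partial output.
    """
    return "\n".join(template.substitute(row) for row in rows)


# One row of the developer-filled table (values are illustrative).
rows = [
    {"name": "C_JR", "mnemonic": "c.jr", "size": 2,
     "hi": 1, "lo": 0, "opcode": "0b10"},
]

td_text = generate_td(rows, CLT_EXAMPLE)
print(td_text)
```

In this sketch the developer's work is limited to filling in `rows`; the layout of the generated text lives entirely in the template, which is the division of labour the architecture describes.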

Main Results

The core experimental results demonstrate that ATG successfully generates complete target description (*.td) files for complex RISC-V ISAs, covering up to 1,029 instructions across various standard and custom extensions. Its primary similarity-matching scheme handled most cases; a complementary auxiliary scheme handled the few highly customized instructions for which similarity matching failed, ensuring full instruction coverage.

Regarding development effort, the initial model synthesis requires about four days (a one-time cost) and target specification takes one to seven days (comparable to initial manual efforts), but the subsequent automatic generation of *.td files completes in minutes per target. This is a significant reduction compared with the months typically required for manual creation.

ATG Result

Crucially, the compiler backends generated using ATG's output (LLVMATG) proved functionally equivalent to the standard, manually developed LLVM backend. This was validated by producing identical assembly code and binaries for SPEC 2017 C/C++ benchmarks and successfully passing the entire suite of approximately 15,600 LLVM regression tests, matching the standard backend's results. Furthermore, using LLVMATG introduced no noticeable compilation time overhead.

ATG comparison with LLVM

The primary trade-off identified is that the automatically generated *.td files are 2x-3x larger in lines of code than their manual counterparts, though this did not negatively impact correctness or runtime performance. In summary, the experiments confirm ATG as an effective method for automating target description file generation, drastically reducing manual effort while maintaining functional correctness and performance.

Example

Figure 4 illustrates ATG's process using the RISC-V c.jr instruction, contrasting the complex, hand-written TableGen description typically required in LLVM (Fig. 4b) with ATG's streamlined approach. Instead of manually coding the intricate details involving classes and specific record definitions, the compiler developer only provides values for a small, pre-defined set of target-specific properties (TSP-List), such as the instruction's size, opcode value, and bit range, as depicted conceptually in Figs. 4d and 4k. Drawing on a pre-synthesized library of code-layout templates (CLT-List) derived from analyzing existing ISAs (such as the MIPS examples in Figs. 4e-j), ATG uses similarity-based matching (Fig. 4l) to select the most appropriate template for the provided properties. It then instantiates this template with the developer-supplied values to generate the final, functionally equivalent TableGen description (Fig. 4c), replacing complex coding with simple data specification.
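The select-then-instantiate step can be sketched in Python as follows. This is an assumption-laden illustration, not the paper's algorithm: the similarity measure used here (Jaccard similarity over property names), the two templates, and the names `CLT_LIST`, `select_template` and `instantiate` are all hypothetical, and the property values for c.jr are illustrative.

```python
from string import Template


def jaccard(a, b):
    """Jaccard similarity of two collections of property names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical CLT-List: each entry pairs the property names a template
# consumes with the template text itself.
CLT_LIST = [
    ({"name", "size", "hi", "lo", "opcode"},
     Template("def $name : Instruction { let Size = $size; "
              "let Inst{$hi-$lo} = $opcode; }")),
    ({"name", "pattern"},
     Template("def : Pat<$pattern, ($name)>;")),
]


def select_template(props, clt_list):
    """Return the template whose property set best matches `props`."""
    _, best = max(clt_list, key=lambda entry: jaccard(props, entry[0]))
    return best


def instantiate(props, clt_list):
    """Pick the closest template and fill it with the supplied values."""
    return select_template(props, clt_list).substitute(props)


# Developer-supplied properties for c.jr (illustrative values).
c_jr = {"name": "C_JR", "size": 2, "hi": 1, "lo": 0, "opcode": "0b10"}
print(instantiate(c_jr, CLT_LIST))
```

Here the first template shares all five property names with the c.jr row (similarity 1.0), while the second shares only `name`, so the first is chosen. A real matcher would also need a fallback for instructions that match no template well, which is the role the auxiliary scheme plays in the results above.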

ATG Example

BibTeX

@article{10.1007/s11390-022-1919-x,
author = {Geng, Hong-Na and Lyu, Fang and Zhong, Ming and Cui, Hui-Min and Xue, Jingling and Feng, Xiao-Bing},
title = {Automatic Target Description File Generation},
year = {2023},
issue_date = {Dec 2023},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
volume = {38},
number = {6},
issn = {1000-9000},
url = {https://doi.org/10.1007/s11390-022-1919-x},
doi = {10.1007/s11390-022-1919-x},
abstract = {Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster. However, it also takes new challenges for compiler developers to retarget existing compilers to these new chips in shorter time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend to a new target, requires compiler developers to write manually a set of target description files (totalling 10 300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures (ISAs). Second, ATG requests compiler developers to fill in the information for each instruction in the new target in tabular form according to the list of target-specific properties synthesized and then generates its target description files automatically according to the list of code-layout templates synthesized. The first stage can often be reused by different new targets sharing similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1 029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort (by specifying each instruction in terms of up to 61 target-specific properties only).},
journal = {J. Comput. Sci. Technol.},
month = nov,
pages = {1339–1355},
numpages = {17},
keywords = {retargetability, compiler, target description, target backend, automatic generator}
}