VEGA: Automatically Generating Compiler Backends using a Pre-trained Transformer Model

1 SKL of Processors, Institute of Computing Technology, CAS
2 University of Chinese Academy of Sciences
3 UNSW Sydney, Australia
* Corresponding Author

Overview

We introduce VEGA, an AI-driven system aimed at easing the development of compiler backends for new targets. Our approach categorizes functions from existing backends into function groups, each comprising various target-specific implementations of a standard compiler interface function, abstracted as a single function template. Generating a new backend therefore amounts to customizing these function templates to the requirements of a specific target. To capitalize on AI's capabilities in code generation, VEGA maps the statements in each target-specific version of a function template into feature vectors, distinguishing target-independent from target-specific properties. Leveraging a pre-trained model, VEGA can efficiently auto-generate a version of each function template tailored to a specific target, thereby enabling the construction of a complete compiler backend for a new target based solely on its target description files.

We evaluated VEGA on three distinct targets: a CPU processor (RISC-V), a customized processor with instruction extensions (RI5CY), and an IoT processor (xCORE). VEGA demonstrated high efficiency, generating each compiler backend in under an hour, which can substantially enhance developer productivity. Across the three targets, VEGA achieved accuracy rates of 71.5%, 73.2%, and 62.2% for all generated functions, significantly outperforming the traditional fork-flow method, which yielded less than 8% accuracy. Moreover, VEGA provides explicit confidence scores for generated functions and statements, allowing developers to quickly identify the areas that require manual intervention and keep that effort minimal. This research has the potential to improve the effectiveness of traditional compiler backend development.

The Architecture of VEGA

VEGA is an automated system that uses a pre-trained transformer model to generate compiler backends. By distinguishing target-specific from target-independent features, it can efficiently produce backend code for a new architecture, requiring only its target description files as input.

Main Results

VEGA demonstrates significant improvements in compiler backend generation efficiency and accuracy. It can generate complete backends for new targets like RISC-V, RI5CY, and xCORE in under an hour. Evaluation using pass@1 on LLVM regression tests shows average function-level accuracy rates of 71.5% (RISC-V), 73.2% (RI5CY), and 62.2% (xCORE). This vastly outperforms the traditional fork-flow approach, which achieved less than 8% accuracy for these targets. VEGA also provides confidence scores, aiding developers in identifying potentially incorrect code sections, further enhancing productivity.

[Figure: VEGA accuracy results]

Compared to the manual effort required by the traditional fork-flow method, VEGA drastically reduces the need for modifications. While fork-flow required over 85% of statements to be manually changed, VEGA achieves high statement-level accuracy (e.g., 55.0% for RISC-V, 58.5% for RI5CY) automatically, minimizing manual intervention.

[Figure: VEGA vs. fork-flow comparison]

Example

The paper walks through a complete example of VEGA's workflow: automatically generating the getRelocType function for the RISC-V target. From the existing implementations of this interface function, VEGA abstracts a function template, identifying its target-independent and target-specific parts.

[Figure: example-1]
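To make the template idea concrete, here is a small illustrative sketch (not VEGA's actual internal representation): a function template is modeled as a sequence of statements in which target-independent statements are fixed and target-specific slots are filled in per target. The getRelocType signature shown is simplified from LLVM's real interface.

# Illustrative sketch only -- not VEGA's internal representation.
# A function template is a list of statements; plain strings are
# target-independent, Slot entries must be filled per target.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Slot:
    hint: str  # what the target-specific statement should do

Template = List[Union[str, Slot]]

# Simplified template abstracted from existing getRelocType implementations.
GET_RELOC_TYPE_TEMPLATE: Template = [
    "unsigned getRelocType(const MCFixup &Fixup, bool IsPCRel) {",
    "  switch (Fixup.getKind()) {",
    Slot(hint="map each target fixup kind to its ELF relocation type"),
    "  }",
    "}",
]

def instantiate(template: Template, fills: List[str]) -> str:
    """Replace each Slot with the next target-specific statement."""
    it = iter(fills)
    return "\n".join(s if isinstance(s, str) else next(it) for s in template)

# Hypothetical target-specific fill for a new target:
print(instantiate(GET_RELOC_TYPE_TEMPLATE,
                  ["  case FK_Data_4: return ELF::R_RISCV_32;"]))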

First, VEGA collects existing implementations of getRelocType from ARM and MIPS backends and aligns their code using GumTree.

[Figure: example-2]
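VEGA performs this alignment with GumTree, an AST differencing tool. The sketch below is only a rough line-level stand-in built on Python's difflib; it shows how aligning two simplified implementations exposes which statements are target-independent (identical across targets) and which are target-specific (they differ).

# Rough illustration of statement alignment. VEGA uses GumTree on ASTs;
# this stand-in aligns simplified statement lists with difflib instead.
from difflib import SequenceMatcher

arm_stmts = [
    "switch (Fixup.getKind()) {",
    "case FK_Data_4: return ELF::R_ARM_ABS32;",
    "default: return ELF::R_ARM_NONE;",
    "}",
]
mips_stmts = [
    "switch (Fixup.getKind()) {",
    "case FK_Data_4: return ELF::R_MIPS_32;",
    "default: return ELF::R_MIPS_NONE;",
    "}",
]

# 'equal' runs are shared statements (target-independent); 'replace' runs
# pair up the differing, target-specific statements across the two targets.
sm = SequenceMatcher(None, arm_stmts, mips_stmts)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == "equal":
        for s in arm_stmts[i1:i2]:
            print("target-independent:", s)
    elif tag == "replace":
        for a, m in zip(arm_stmts[i1:i2], mips_stmts[j1:j2]):
            print("target-specific   :", a, "<->", m)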

Next, VEGA extracts feature vectors from these implementations, capturing semantic properties relevant to backend generation. Using these feature vectors, it fine-tunes a pre-trained transformer model. Finally, given only the RISC-V target description files, VEGA generates a new RISC-V-specific getRelocType function. Throughout the process, it assigns confidence scores to each generated statement, helping developers quickly locate and fix uncertain parts.
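The exact feature encoding and confidence model are described in the paper; the hypothetical records below merely illustrate the idea that each generated statement carries a target-specific flag and a confidence score, so low-confidence statements can be surfaced for review. The field names and threshold are made up for this sketch.

# Hypothetical per-statement records for a generated function; the schema
# and the review threshold are illustrative, not VEGA's actual format.
generated = [
    {"stmt": "switch (Fixup.getKind()) {",
     "target_specific": False, "confidence": 0.98},
    {"stmt": "case FK_Data_4: return ELF::R_RISCV_32;",
     "target_specific": True, "confidence": 0.91},
    {"stmt": "case FK_Data_8: return ELF::R_RISCV_64;",
     "target_specific": True, "confidence": 0.46},
    {"stmt": "}",
     "target_specific": False, "confidence": 0.99},
]

REVIEW_THRESHOLD = 0.5  # illustrative cutoff for flagging statements

# A function-level confidence could be summarized as the mean statement score.
function_confidence = sum(s["confidence"] for s in generated) / len(generated)
print(f"function confidence ~ {function_confidence:.2f}")

# Surface low-confidence statements for manual inspection.
for s in generated:
    if s["confidence"] < REVIEW_THRESHOLD:
        print("review:", s["stmt"])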

Quick Start

Hardware Dependencies:
- 8 Nvidia Tesla V100 GPUs, each with 16 GB memory.

Software Dependencies:
- CUDA version: 11.7
- Python version: 3.8.1
- Conda: Any version that supports Python 3.8.1

Dataset and Base Model
ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency
- Paper: https://neurips.cc/virtual/2024/poster/97455
- Dataset & Models: https://huggingface.co/docz1105/ComBack_Models
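If you want to poke at the base model outside the provided scripts, something like the following may work. This is a minimal sketch assuming the UniXcoder checkpoint loads through Hugging Face transformers (it is RoBERTa-based); the fine-tuned VEGA checkpoints are meant to be driven through the scripts below (run_test.sh and friends).

# Minimal sketch: load the UniXcoder base model with Hugging Face transformers.
# Assumes transformers/torch are installed (see requirements.txt).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

inputs = tokenizer("unsigned getRelocType(const MCFixup &Fixup);",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)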

Follow these steps to set up the environment and run a functionality test.

# 1. Clone the repository
git lfs clone https://huggingface.co/docz1105/VEGA_AE
cd VEGA_AE

# 2. Set up Conda environment
# Option A: Use provided YAML file
conda env create -f vega_ae.yml
conda activate vega_ae

# Option B: Create manually
conda create -n vega_ae python=3.8.1
conda activate vega_ae
pip install -r requirements.txt

# 3. Run Functionality Test (Generates one function for RI5CY)
# Takes < 3 minutes on 8x V100 GPUs
bash run_function_test.sh

# Check the output (generated code vs ground truth)
cat ./models/FT_Model/result.jsonl

# 4. Run Full Code Generation (for RISC-V, RI5CY, xCORE)
# Uses the provided fine-tuned model and test data
bash run_test.sh
# (Results will be in ./models/FT_Model/result.jsonl, overwriting previous results)

# 5. (Optional) Fine-tune the model from scratch
# Uses the original UniXcoder and training data
bash run_fine_tuning.sh
# (New model saved in ./models/New_FT_Model)

BibTeX


@inproceedings{zhong2025vega,
  title={VEGA: Automatically Generating Compiler Backends Using a Pre-Trained Transformer Model},
  author={Ming Zhong and Fang Lv and Lulin Wang and Lei Qiu and Yingying Wang and Ying Liu and Huimin Cui and Xiaobing Feng and Jingling Xue},
  booktitle={2025 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)},
  year={2025}
}