Main Results
Experimental results demonstrate the efficacy of the ComBack dataset. Fine-tuning six representative pre-trained language models on ComBack yielded substantial accuracy improvements across the three core compiler backend development tasks: statement-level completion, next-statement suggestion, and code generation. Edit Distance (ED) scores improved by 41.64 to 77.21 points on average, and Exact Match (EM) accuracy for statement completion saw absolute gains of 42.58% to 67.77%. Table 2 summarizes these improvements.
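
For readers unfamiliar with the two statement-level metrics, the sketch below shows one plausible way to compute them. It assumes ED is a character-level Levenshtein similarity normalized to a 0-100 scale (higher is better) and EM is the percentage of predictions identical to the reference; the paper's exact normalization may differ, and the sample string is purely illustrative.

```python
# Sketch of the two statement-level metrics reported above.
# Assumption: "ED" is a normalized edit-distance similarity on a
# 0-100 scale; the paper's exact normalization may differ.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def edit_similarity(pred: str, ref: str) -> float:
    """Edit similarity in [0, 100]: 100 means identical strings."""
    if not pred and not ref:
        return 100.0
    dist = levenshtein(pred, ref)
    return 100.0 * (1 - dist / max(len(pred), len(ref)))

def exact_match(preds: list[str], refs: list[str]) -> float:
    """Percentage of predictions that match the reference exactly."""
    hits = sum(p.strip() == r.strip() for p, r in zip(preds, refs))
    return 100.0 * hits / len(preds)

# Illustrative example: score one predicted backend statement.
print(edit_similarity("return TargetOpcode::G_ADD;",
                      "return TargetOpcode::G_SUB;"))
```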

Crucially, when generating code for new targets of existing types (such as RISC-V, ARC, and NVPTX), a CodeT5+ model with only 220M parameters, fine-tuned on ComBack, significantly outperformed both ChatGPT-3.5-Turbo and Code-LLaMA-34B-Instruct (Table 3). The fine-tuned CodeT5+ also surpassed the conventional 'Fork-Flow' development method for code generation (Figure 6).
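
For a sense of what the fine-tuning step involves, the following is a minimal sequence-to-sequence sketch. It assumes the public `Salesforce/codet5p-220m` checkpoint on HuggingFace and a hypothetical JSONL file (`comback_train.jsonl`) with `input`/`output` fields; ComBack's actual preprocessing, field names, and hyperparameters may differ.

```python
# Minimal seq2seq fine-tuning sketch for a 220M CodeT5+ model.
# Assumptions: the "Salesforce/codet5p-220m" HuggingFace checkpoint and
# a hypothetical JSONL dataset with "input"/"output" fields; ComBack's
# actual schema and hyperparameters may differ.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

ckpt = "Salesforce/codet5p-220m"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

ds = load_dataset("json", data_files={"train": "comback_train.jsonl"})

def preprocess(batch):
    # Tokenize source context and target statement/function body.
    enc = tok(batch["input"], max_length=512, truncation=True)
    enc["labels"] = tok(batch["output"], max_length=256,
                        truncation=True)["input_ids"]
    return enc

train = ds["train"].map(preprocess, batched=True,
                        remove_columns=ds["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="codet5p-comback",
                                  per_device_train_batch_size=8,
                                  learning_rate=5e-5,
                                  num_train_epochs=3),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tok, model=model),
)
trainer.train()
```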

As shown in Table 5, fine-tuning CodeT5+ on ComBack after iteratively expanding it with data for a customized target (RI5CY) yielded significant accuracy gains for that target on all three tasks: Statement-Level Completion EM rose by 7.90%, Next-Statement Suggestion EM by 9.96%, and Code Generation BLEU-4 by 25.05 points, demonstrating the dataset's ability to support specialized, evolving targets.
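
The code-generation gains above are measured in BLEU-4. A quick way to compute a comparable corpus-level score, assuming whitespace tokenization and NLTK's standard smoothing (the paper's tokenizer and smoothing choices may differ), is sketched below; the sample statements are illustrative, not drawn from ComBack.

```python
# Corpus-level BLEU-4 sketch for generated backend code.
# Assumption: whitespace tokenization with NLTK's smoothed BLEU; the
# paper's exact tokenizer and smoothing choices may differ.
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# One reference list per hypothesis; strings here are illustrative.
refs = [["if (STI.hasExtXcorev()) return true;".split()]]
preds = ["if (STI.hasExtXcorev()) return false;".split()]

score = corpus_bleu(refs, preds,
                    weights=(0.25, 0.25, 0.25, 0.25),  # BLEU-4
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {100 * score:.2f}")
```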
