Abstract:
Large Language Models (LLMs) have become prominent throughout the software development life cycle, yet their ability to generate performant source code, particularly through automatic parallelization, remains underexplored. This study compares 23 pre-trained LLMs against the Intel C Compiler (icc), a state-of-the-art auto-parallelizing compiler, evaluating their effectiveness at transforming sequential C source code into parallelized versions. Using 30 kernels from the PolyBench C benchmarks, we generated 667 parallelized code versions to assess the LLMs' zero-shot parallelization capabilities. Our experiments reveal that LLMs can outperform icc on non-functional properties such as speedup, surpassing icc's performance in 26.66% of cases; the best LLM-generated code achieved a 7.5× speedup compared to icc's 1.08×. However, only 90 of the 667 generated versions (13.5%) were error-free and functionally correct, underscoring significant reliability challenges. After filtering out versions with compilation errors or data races through detailed memory and threading analysis, we observed notable performance gains, but also recurring obstacles: cache miss rates and branch misses increased at higher thread counts, indicating that simply adding threads does not guarantee better performance. Optimizing memory access patterns, managing thread interactions, and validating correctness therefore remain critical for LLM-generated parallel code. Our findings demonstrate that, even without fine-tuning or advanced prompting techniques, pre-trained LLMs can compete with decades-old non-LLM compiler technology in zero-shot sequential-to-parallel code translation, highlighting their potential for automating code parallelization while emphasizing the need to address reliability and performance optimization challenges.
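To make the studied transformation concrete, the sketch below is an illustrative example only, not one of the paper's 30 PolyBench kernels, and it assumes an OpenMP-style parallelization target, which the abstract does not specify. It shows a sequential matrix-vector kernel of the kind a compiler or LLM would receive as input, and a parallelized rewrite of it, along with the dependence reasoning a correct rewrite requires.

```c
/* Illustrative sketch only: a generic PolyBench-style kernel, not one of
 * the paper's 30 benchmarks. An OpenMP target is assumed here. */
#include <omp.h>

#define N 1024

/* Sequential version: the input an auto-parallelizer or LLM would see. */
void kernel_seq(double A[N][N], double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

/* Parallelized version: each outer-loop iteration writes a distinct y[i],
 * so the loop carries no dependence and can be split across threads.
 * Introducing a data race instead (e.g., parallelizing the inner loop
 * without a reduction on y[i]) is exactly the kind of error that renders
 * a generated version functionally incorrect. */
void kernel_par(double A[N][N], double x[N], double y[N])
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}
```

For comparison under these assumptions, the sequential source could be compiled with icc's auto-parallelization flag (`icc -parallel`) to obtain the baseline, while the pragma-annotated version needs only OpenMP support (e.g., `-fopenmp` on gcc or clang).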