# Benchmark Datasets
Benchmarking LLMs for code translation is essential for understanding their capabilities and limitations on your data. Several benchmark datasets are widely used in the research community for this purpose. This page lists those benchmarks with download links; each can be used directly with CodeTransEngine to evaluate your LLM or translation methodology.
## Benchmarks
| Dataset Name | Description | Number of Translation Pairs | Languages Supported | Download Link |
|---|---|---|---|---|
| TransCoder | Coding problems and solutions from GeeksforGeeks, manually curated and verified for correctness by Yang et al. Packaged by the InterTrans authors. | 2,826 | C++, Java, Python | Download |
| HumanEval-X | Extended HumanEval dataset with solutions and test cases in six programming languages. Subset created by the InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
| CodeNet | Sourced from the AIZU and AtCoder competitive programming websites. Subset created by the InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
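The exact on-disk format depends on how each benchmark is packaged, so check the downloaded archive before evaluating. As an illustration only, assuming a benchmark is distributed as a JSONL file with one translation pair per line and hypothetical field names (`source_lang`, `target_lang`, `source_code`, `target_code` — not confirmed by any of these datasets), a minimal loader and language-pair filter might look like:

```python
import json

def load_translation_pairs(path):
    """Load benchmark translation pairs from a JSONL file.

    Assumes (hypothetically) one JSON object per line with fields
    'source_lang', 'target_lang', 'source_code', 'target_code'.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                pairs.append(json.loads(line))
    return pairs

def filter_pairs(pairs, source_lang=None, target_lang=None):
    """Select only the pairs for a given translation direction."""
    return [
        p for p in pairs
        if (source_lang is None or p["source_lang"] == source_lang)
        and (target_lang is None or p["target_lang"] == target_lang)
    ]
```

Filtering by translation direction is useful because each dataset above covers several languages, while a single evaluation run typically targets one source–target pair (e.g. Java to Python).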