Benchmark Datasets

Benchmarking an LLM on code translation is essential for understanding its capabilities and limitations on your dataset. Several benchmark datasets are widely used in the research community for this purpose. This page lists them with download links. Each can be used with CodeTransEngine to evaluate your LLM or translation methodology.

Benchmarks

| Dataset Name | Description | Number of Translation Pairs | Languages Supported | Download Link |
| --- | --- | --- | --- | --- |
| TransCoder | Coding problems and solutions from GeeksforGeeks, manually curated and verified for correctness by Yang et al. Packaged by InterTrans authors. | 2,826 | C++, Java, Python | Download |
| HumanEval-X | Extended HumanEval dataset with solutions and test cases in six programming languages. Subset created by InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
| CodeNet | Sourced from AIZU and AtCoder competitive programming websites. Subset created by InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
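
When planning an evaluation run, it can help to list the translation directions that a benchmark's languages allow. The sketch below is illustrative only: it assumes each benchmark covers every ordered (source, target) pair of its listed languages, which this page does not state, and the names `BENCHMARK_LANGUAGES` and `translation_directions` are hypothetical helpers, not part of CodeTransEngine.

```python
from itertools import permutations

# Language lists copied from the benchmark table on this page.
BENCHMARK_LANGUAGES = {
    "TransCoder": ["C++", "Java", "Python"],
    "HumanEval-X": ["C++", "Go", "Java", "JavaScript", "Python", "Rust"],
    "CodeNet": ["C++", "Go", "Java", "JavaScript", "Python", "Rust"],
}


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages.

    Assumption: all directions are present in the benchmark. Check the
    downloaded data before relying on this.
    """
    return list(permutations(languages, 2))


if __name__ == "__main__":
    for name, languages in BENCHMARK_LANGUAGES.items():
        directions = translation_directions(languages)
        print(f"{name}: {len(directions)} possible directions")
```

Under that assumption, a three-language benchmark allows 6 directions and a six-language benchmark allows 30. The actual directions present in each download may differ.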