# Benchmark Datasets
Benchmarking LLMs for code translation is essential for understanding their capabilities and limitations on your data. Several benchmark datasets are widely used in the research community for this purpose. This page lists those benchmarks with download links; each can be used directly with CodeTransEngine to evaluate your LLM or translation methodology.
## Benchmarks
| Dataset Name | Description | Number of Translation Pairs | Languages Supported | Download Link |
|---|---|---|---|---|
| TransCoder | Coding problems and solutions from GeeksforGeeks, manually curated and verified for correctness by Yang et al. Packaged by the InterTrans authors. | 2,826 | C++, Java, Python | Download |
| HumanEval-X | Extended HumanEval dataset with solutions and test cases in six programming languages. Subset created by the InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
| CodeNet | Sourced from the AIZU and AtCoder competitive programming websites. Subset created by the InterTrans authors. | 1,050 | C++, Go, Java, JavaScript, Python, Rust | Download |
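The exact on-disk format depends on how each benchmark is packaged, so check the downloaded archive before evaluating. As an illustration only, assuming a benchmark is distributed as a JSONL file with one translation pair per line and hypothetical field names (`source_lang`, `target_lang`, `source_code`, `target_code` — not confirmed by any of these datasets), a minimal loader and language-pair filter might look like:

```python
import json

def load_translation_pairs(path):
    """Load benchmark translation pairs from a JSONL file.

    Assumes (hypothetically) one JSON object per line with fields
    'source_lang', 'target_lang', 'source_code', 'target_code'.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                pairs.append(json.loads(line))
    return pairs

def filter_pairs(pairs, source_lang=None, target_lang=None):
    """Select only the pairs for a given translation direction."""
    return [
        p for p in pairs
        if (source_lang is None or p["source_lang"] == source_lang)
        and (target_lang is None or p["target_lang"] == target_lang)
    ]
```

Filtering by translation direction is useful because each dataset above covers several languages, while a single evaluation run typically targets one source–target pair (e.g. Java to Python).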