SemanticCodeCloneBERT is a fine-tuned CodeBERT model. While CodeBERT can be adapted to many downstream tasks, in this repository it is fine-tuned for semantic code clone detection.
Semantic code clone detection involves identifying functionally similar code fragments even when their syntax differs significantly. This project leverages a fine-tuned CodeBERT model to detect semantic equivalence in code, enabling applications in software maintenance, plagiarism detection, and code search optimization. The fine-tuning process adapts the pre-trained model to better capture functional similarities. Note, however, that this particular fine-tuned model targets Python source code only, since that is what the underlying dataset contains.
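As a rough illustration of how such a fine-tuned model could be queried, the sketch below loads a sequence-pair classifier with the Hugging Face `transformers` library and compares two Python snippets. The checkpoint path `./semantic-clone-bert`, the 512-token limit, and the assumption that label 1 means "clone" are illustrative placeholders, not confirmed details of this repository.

```python
# Hypothetical usage sketch — the checkpoint directory name and the
# label mapping are assumptions, not part of this repository.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def predict_clone(model, tokenizer, code_a: str, code_b: str) -> bool:
    """Return True if the classifier labels the pair as a semantic clone."""
    # CodeBERT-style models accept a sentence pair; the tokenizer inserts
    # the separator tokens and truncates to the model's 512-token limit.
    inputs = tokenizer(code_a, code_b, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumption: label index 1 corresponds to "semantic clone".
    return logits.argmax(dim=-1).item() == 1


if __name__ == "__main__":
    # Placeholder path to the fine-tuned weights saved as a HF checkpoint.
    tok = AutoTokenizer.from_pretrained("./semantic-clone-bert")
    mdl = AutoModelForSequenceClassification.from_pretrained("./semantic-clone-bert")
    print(predict_clone(mdl, tok,
                        "def add(a, b): return a + b",
                        "def total(x, y): return x + y"))
```

Passing the model and tokenizer in as arguments keeps the helper reusable across checkpoints and easy to batch over a whole dataset of candidate pairs.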
The CodeBERT base model was fine-tuned using transfer learning. Here are the details:
- Pre-trained Model: CodeBERT (Microsoft's pre-trained transformer model).
- Results:

| Model | Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CodeBERT-base | 0.058 | 0.987 | 0.980 | 0.994 | 0.987 |
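As a sanity check, the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the results table above.
print(round(f1_score(0.980, 0.994), 3))  # → 0.987, matching the table
```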
- Source: https://drive.google.com/open?id=1KicfslV02p6GDPPBjZHNlmiXk-9IoGWl.
- Description: For further details on this dataset, please refer to the original publications cited in the References section.
References:
- Farouq Al-Omari, Chanchal K. Roy, and Tonghao Chen. SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge. In 2020 IEEE 14th International Workshop on Software Clones (IWSC), pages 57–63, 2020.
- Saad Arshad, Shamsa Abid, and Shafay Shamail. CodeBERT for Code Clone Detection: A Replication Study. In 2022 IEEE 16th International Workshop on Software Clones (IWSC), pages 39–45, 2022. Replication pack: https://doi.org/10.5281/zenodo.6361315.
- Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv preprint arXiv:2002.08155, 2020. Model repository: https://github.com/microsoft/CodeBERT.