SemanticCodeCloneBERT

SemanticCodeCloneBERT is a fine-tuned CodeBERT model. CodeBERT can be adapted to many downstream tasks; in this repository it is fine-tuned for semantic code clone detection.


Table of Contents

  1. About the Project
  2. Model Fine-Tuning
  3. Dataset
  4. References

About the Project

Semantic code clone detection involves identifying functionally similar code fragments, even when their syntax differs significantly. This project leverages a fine-tuned CodeBERT model to detect semantic equivalence in code, enabling applications in software maintenance, plagiarism detection, and code search optimization. The fine-tuning process adapts the pre-trained model to better capture functional similarities between code fragments. Note that this fine-tuned model targets only Python source code from the existing dataset.
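As an illustration (hypothetical snippets, not taken from the benchmark), the following two Python functions are syntactically different but functionally equivalent — exactly the kind of pair a semantic clone detector should label as a clone:

```python
# Two syntactically different implementations of the same behavior:
# summing the squares of the even numbers in a list.

def sum_even_squares_loop(numbers):
    """Imperative version using an explicit loop."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

def sum_even_squares_functional(numbers):
    """Declarative version using a generator expression."""
    return sum(n ** 2 for n in numbers if n % 2 == 0)

# Despite the different syntax, both produce identical outputs,
# so a semantic clone detector should classify them as a clone pair.
print(sum_even_squares_loop([1, 2, 3, 4]))        # -> 20
print(sum_even_squares_functional([1, 2, 3, 4]))  # -> 20
```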


Model Fine-Tuning

The CodeBERT base model was fine-tuned using transfer learning techniques. Here are the details:

  • Pre-trained Model: CodeBERT (Microsoft pre-trained transformer model).

  • Result:

    | Model         | Loss  | Accuracy | Precision | Recall | F1    |
    |---------------|-------|----------|-----------|--------|-------|
    | CodeBERT-base | 0.058 | 0.987    | 0.980     | 0.994  | 0.987 |
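As a quick sanity check on the reported figures, F1 is the harmonic mean of precision and recall, so the table's F1 value can be reproduced directly from the other two columns:

```python
# Reported metrics from the results table above.
precision = 0.980
recall = 0.994

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # -> 0.987, matching the table
```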

Dataset


References

  • Farouq Al-Omari, Chanchal K. Roy, and Tonghao Chen. Semanticclonebench: A semantic code clone benchmark using crowd-source knowledge. In 2020 IEEE 14th International Workshop on Software Clones (IWSC), pages 57–63, 2020.
  • Saad Arshad, Shamsa Abid, and Shafay Shamail. CodeBERT for code clone detection: A replication study. In 2022 IEEE 16th International Workshop on Software Clones (IWSC), pages 39–45, 2022. Replication pack: https://doi.org/10.5281/zenodo.6361315
  • Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155, 2020. Model repository: https://github.com/microsoft/CodeBERT
