ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

Research and practice of plagiarism detection in program code assignments by college students

  • Received Date: 05 June 2020
  • Accepted Date: 24 June 2020
  • Revision Received Date: 24 June 2020
  • Publish Date: 31 August 2020
    The programming ability of students directly reflects how well they have learned in technical courses, and program code assignments account for a growing share of teaching evaluation. Because plagiarizing program code homework costs so little, plagiarism is widespread in colleges and universities, which seriously undermines both the development of students' abilities and the effectiveness of teaching. To address this, a homework plagiarism detection method is proposed that combines artificial intelligence algorithms with data processing and analysis techniques to detect similarities among students' assignments intelligently and automatically and to analyze the overall plagiarism situation. First, the complex characteristics of the program code assignments submitted by students are analyzed, and a data pre-processing pipeline is designed. Then, a similarity detection algorithm for program code assignments based on Karp-Rabin (KR) hashing and Winnowing is proposed. Compared with traditional detection methods, measures such as code formatting improve the accuracy of similarity detection on students' homework. In large-scale homework detection practice, the optimized algorithm increases the differentiation between the similarity results of different students' homework. To verify the validity and practicability of the core similarity calculation, a simulation experiment (including a comparison with the JPlag detection system) was designed, and similarity results were reported for different plagiarism types on the same experimental data set. Finally, the method has been applied in real scenarios on iFLYTEK's Bosi intelligent online learning platform. The experimental and practical application results show that the proposed method is valid and has high application value for detecting similarities in program code assignments by college students.
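The abstract names two core steps, code formatting during pre-processing and a Karp-Rabin (KR) plus Winnowing fingerprint comparison, without giving their details. The Python sketch below shows one conventional way to combine them, following the general Winnowing scheme of Schleimer, Wilkerson and Aiken. It is not the authors' implementation: the normalization rules (dropping C-style comments and all whitespace), the k-gram length K = 5, the window size W = 4, the hash base and modulus, the use of Jaccard similarity over fingerprint sets, and all function names are illustrative assumptions.

```python
import re
from typing import List, Set, Tuple

# Illustrative parameters (assumptions, not values taken from the paper).
K = 5                # k-gram length in characters
W = 4                # winnowing window size, in consecutive k-gram hashes
BASE = 256           # Karp-Rabin polynomial base
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def normalize(source: str) -> str:
    """Hypothetical 'code formatting' step: drop C-style comments and all whitespace.

    A naive regex pass; it would also strip '//' inside string literals, which a
    tokenizer-based pre-processor would avoid.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                # line comments
    return re.sub(r"\s+", "", source)                       # all whitespace


def kgram_hashes(text: str, k: int = K) -> List[int]:
    """Karp-Rabin rolling hash of every k-gram, each computed in O(1) from the last."""
    if len(text) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = [h]
    for i in range(k, len(text)):
        h = ((h - ord(text[i - k]) * high) * BASE + ord(text[i])) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: List[int], w: int = W) -> Set[Tuple[int, int]]:
    """Winnowing: keep the rightmost minimum hash of each window of w hashes.

    Storing (hash, position) pairs in a set records each selected k-gram once,
    even when consecutive windows pick the same one.
    """
    if not hashes:
        return set()
    w = min(w, len(hashes))  # short inputs: treat the whole sequence as one window
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost occurrence
        picked.add((smallest, start + offset))
    return picked


def fingerprint(source: str) -> Set[int]:
    """Set of fingerprint hash values for one submission (positions discarded)."""
    return {h for h, _ in winnow(kgram_hashes(normalize(source)))}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two fingerprint sets, in [0, 1] (an assumed metric)."""
    fa, fb = fingerprint(a), fingerprint(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    original = """
    int sum(int n) {  // add 1..n
        int s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }
    """
    # Same code with the comment and formatting removed.
    reformatted = "int sum(int n){int s=0;for(int i=1;i<=n;i++)s+=i;return s;}"
    print(similarity(original, reformatted))  # prints 1.0
```

Because every window of W hashes contributes a fingerprint, any common run of at least W + K - 1 = 8 normalized characters between two submissions yields at least one shared fingerprint, while matches shorter than K characters are ignored. Tuning K and W therefore trades noise suppression against sensitivity.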


