Analyzing Non-Textual Content Elements to Detect Academic Plagiarism by Norman Meuschke
Author: Norman Meuschke
Language: eng
Format: epub
ISBN: 9783658420628
Publisher: Springer Fachmedien Wiesbaden

5.3 Evaluation Dataset
To evaluate math-based detection methods and compare their effectiveness to citation-based and text-based methods, we created a new dataset because no existing dataset offers mathematical content. See Section 2.5.2 for summaries of existing datasets. Figure 5.3 illustrates the process for creating the dataset.
We selected 10 publications as test cases, which we refer to as C1…C10. We limited the selection to 10 cases for four reasons. First, we chose cases from research fields within our area of expertise to enable us to assess the relevance of identified similarities. Second, we chose cases most representative of the types of mathematical similarity we observed. Third, our preprocessing of documents required manual checks of automatically extracted mathematical content, as we explain in Section 5.3.1. The effort required for this step prevented us from converting more cases. Fourth, we restricted the test cases to disciplines covered by the NTCIR-11 MathIR Task dataset [9]. Appendix B in the electronic supplementary material describes the test cases.
We used the topically related NTCIR-11 MathIR Task dataset to create a reference collection. The NTCIR dataset includes about 60 million formulae from 105,120 scientific publications in computer science, mathematics, physics, and statistics. The dataset creators retrieved the publications from the arXiv [93] preprint repository in LaTeX format. We embedded the confirmed source documents for each of the test cases into the NTCIR dataset.
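The embedding step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure the authors implemented: the `Document` type, the `build_reference_collection` function, and all document identifiers and formulae in the usage example are hypothetical. The sketch assumes each document is reduced to an identifier and a tuple of formulae, and that the confirmed sources for each test case are known in advance.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical minimal document record: an ID plus its extracted formulae."""
    doc_id: str
    formulae: Tuple[str, ...]


def build_reference_collection(
    background: Iterable[Document],
    confirmed_sources: Dict[str, List[Document]],
) -> Tuple[Dict[str, Document], Dict[str, Set[str]]]:
    """Embed the confirmed sources of each test case into a background collection.

    Returns the merged collection (keyed by document ID) and a ground-truth map
    from test case ID to the IDs of its confirmed source documents.
    """
    collection = {doc.doc_id: doc for doc in background}
    ground_truth: Dict[str, Set[str]] = {}
    for case_id, sources in confirmed_sources.items():
        ground_truth[case_id] = set()
        for doc in sources:
            # Assumption: a source already present in the background
            # collection is kept as-is and not duplicated.
            collection.setdefault(doc.doc_id, doc)
            ground_truth[case_id].add(doc.doc_id)
    return collection, ground_truth


# Usage with made-up toy data (all IDs and formulae are hypothetical).
background_docs = [
    Document("bg-0001", (r"E = mc^2",)),
    Document("bg-0002", (r"a^2 + b^2 = c^2",)),
]
sources_by_case = {
    "C1": [Document("src-C1-a", (r"\sum_{i=1}^{n} i",))],
    "C2": [Document("bg-0002", (r"a^2 + b^2 = c^2",))],  # already in background
}
collection, ground_truth = build_reference_collection(background_docs, sources_by_case)
print(len(collection), ground_truth)
```

Keeping a ground-truth map alongside the merged collection is one possible design choice: it lets a retrieval experiment check whether the confirmed sources of a test case are ranked highly among the background documents.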
Figure 5.3 Creation of the evaluation dataset
