Intrinsic Plagiarism Detection 2009
Synopsis
- Task: Given a set of suspicious documents, the task is to identify all plagiarized text passages, e.g., by detecting breaches of writing style. Comparing a suspicious document with other documents is not allowed in this task.
- Input: [data]
- Evaluator: [code]
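The following minimal Python sketch makes the idea of a writing style breach concrete. It is not any participant's method. It assumes that style can be approximated by character trigram frequencies. It slides a window over the document, scores each window's profile against the whole-document profile with a normalized dissimilarity, and flags windows that score more than two standard deviations above the mean. The function names, the 1000-character window, the 200-character step, and the 2-sigma threshold are all hypothetical choices.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Map each character n-gram in text to its relative frequency."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def dissimilarity(window_profile, doc_profile):
    """Normalized squared-difference score in [0, 1] over the window's n-grams."""
    if not window_profile:
        return 0.0
    score = 0.0
    for gram, f_w in window_profile.items():
        f_d = doc_profile.get(gram, 0.0)
        # Each term lies in [0, 4]; f_w > 0, so the denominator is never zero.
        score += (2 * (f_w - f_d) / (f_w + f_d)) ** 2
    return score / (4 * len(window_profile))


def merge_spans(spans):
    """Merge overlapping (offset, length) spans into maximal passages."""
    merged = []
    for start, length in sorted(spans):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]


def flag_style_breaches(text, window=1000, step=200, sigma=2.0):
    """Return (offset, length) passages whose style deviates from the document."""
    doc_profile = char_ngram_profile(text)
    scored = []
    for start in range(0, max(len(text) - window, 0) + 1, step):
        chunk = text[start:start + window]
        scored.append((start, dissimilarity(char_ngram_profile(chunk), doc_profile)))
    scores = [s for _, s in scored]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    # Hypothetical decision rule: windows far above the document's typical score.
    outliers = [(start, len(text[start:start + window]))
                for start, s in scored if s > mean + sigma * std]
    return merge_spans(outliers)
```

Overlapping flagged windows are merged into passages reported as (offset, length) pairs. These pairs map directly onto the this_offset and this_length attributes of the required output. A real detector would need tuned parameters and a better decision rule. In very short documents, each window covers most of the text, so the comparison loses its power.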
Award
We are happy to announce the following overall winner of the 1st International Competition on Plagiarism Detection, who will be awarded 500 Euro sponsored by Yahoo! Research:
- The winner of the intrinsic analysis task is Efstathios Stamatatos from the University of the Aegean.
Congratulations!
Input
To develop your approach, we provide you with a training corpus which comprises a set of suspicious documents, each of which may contain plagiarized passages.
Output
For each suspicious document suspicious-documentXYZ.txt found in the evaluation corpora, your plagiarism detector shall output an XML file suspicious-documentXYZ.xml that contains meta information about all plagiarism cases detected in that document:
<document reference="suspicious-documentXYZ.txt">
  <feature name="detected-plagiarism" this_offset="5" this_length="1000" />
  ...
</document>
The XML documents must be valid with respect to the XML schema found here.
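Below is a sketch of writing the required output with Python's standard xml.etree.ElementTree module. The helper names detections_to_xml and write_detections are hypothetical, and the sketch assumes that detections are already available as (offset, length) character spans. It has not been checked against the official schema (for example, whether an XML declaration is expected), so you should still validate the result against that schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def detections_to_xml(doc_name, detections):
    """Serialize (offset, length) detections for one suspicious document."""
    root = ET.Element("document", reference=doc_name)
    for offset, length in detections:
        # One <feature> element per detected plagiarized passage.
        ET.SubElement(root, "feature", {
            "name": "detected-plagiarism",
            "this_offset": str(offset),
            "this_length": str(length),
        })
    return ET.tostring(root, encoding="unicode")


def write_detections(out_dir, txt_name, detections):
    """Write the .xml file for txt_name into out_dir and return its path."""
    xml_name = Path(txt_name).with_suffix(".xml").name  # ...XYZ.txt -> ...XYZ.xml
    path = Path(out_dir) / xml_name
    path.write_text(detections_to_xml(txt_name, detections), encoding="utf-8")
    return path
```

For example, write_detections("out", "suspicious-document00001.txt", [(5, 1000)]) writes out/suspicious-document00001.xml. The directory "out" is an assumption here and must already exist.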
Evaluation
Performance will be measured using macro-averaged precision and recall, granularity, and the plagdet score, which is a combination of the first three measures. For your convenience, we provide a reference implementation of the measures written in Python.
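The following sketch computes the four measures at the character level. It assumes the definitions used in the PAN evaluation framework:

- Recall is macro-averaged over true cases, and precision over detections.
- Granularity is the average number of detections per detected case.
- plagdet = F1 / log2(1 + granularity).

Cases and detections are modeled as hypothetical (document, offset, length) triples. The provided Python reference implementation remains authoritative, and the edge-case handling here is an assumption.

```python
import math


def char_positions(span):
    """Expand a (document, offset, length) span into a set of character positions."""
    doc, offset, length = span
    return {(doc, i) for i in range(offset, offset + length)}


def plagdet_scores(cases, detections):
    """Character-level macro precision/recall, granularity, and plagdet.

    cases: true plagiarism spans; detections: reported spans.
    All spans are assumed to have positive length.
    """
    S = [char_positions(s) for s in cases]
    R = [char_positions(r) for r in detections]
    covered_by_cases = set().union(*S)
    covered_by_detections = set().union(*R)

    # Macro-averaged: each case or detection counts equally, whatever its length.
    recall = sum(len(s & covered_by_detections) / len(s) for s in S) / len(S) if S else 0.0
    precision = sum(len(r & covered_by_cases) / len(r) for r in R) / len(R) if R else 0.0

    # Granularity: mean number of detections overlapping each detected case.
    detected = [s for s in S if s & covered_by_detections]
    granularity = (sum(sum(1 for r in R if s & r) for s in detected) / len(detected)
                   if detected else 1.0)

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    plagdet = f1 / math.log2(1 + granularity)
    return {"precision": precision, "recall": recall,
            "granularity": granularity, "plagdet": plagdet}
```

For example, splitting one 100-character case into two adjacent 50-character detections yields perfect precision and recall. However, the granularity of 2 lowers plagdet to 1/log2(3), roughly 0.63. The logarithm thus penalizes fragmented detections without letting granularity dominate the score. Expanding spans into character sets keeps the definitions readable but would be memory-hungry on a full corpus.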
Results
Intrinsic Plagiarism Detection Performance
| Plagdet | Participant |
|---|---|
| 0.2462 | E. Stamatatos, University of the Aegean, Greece |
| 0.1955 | B. Hagbi and M. Koppel, Bar Ilan University, Israel |
| 0.1766 | M. Zechner, M. Muhr, R. Kern, and M. Granitzer, Know-Center Graz, Austria |
| 0.1219 | L. M. Seaward and S. Matwin, University of Ottawa, Canada |
A more detailed analysis of detection performance in terms of precision, recall, and granularity can be found in the overview paper accompanying this task.