Authorship Attribution 2012
Synopsis
Task
Within the traditional authorship tasks there are different flavors:
- Traditional (closed-class or open-class, with varying numbers of candidate authors) authorship attribution. In the closed-class setting you are given a closed set of candidate authors and asked to identify which of them is the author of an anonymous text. In the open-class setting you must additionally consider the possibility that none of the candidates is the true author of the document. (A minimal baseline sketch for the closed-class setting follows this list.)
- Authorship clustering/intrinsic plagiarism: you are given a text (which, for simplicity, is segmented into a sequence of "paragraphs") and asked to cluster the paragraphs into exactly two clusters: one containing the paragraphs written by the "main" author of the text and another containing all paragraphs written by anybody else. (Thus, this year intrinsic plagiarism detection has been moved from the plagiarism task to the author identification track.)
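To make the closed-class setting concrete, here is a minimal, hypothetical baseline sketch (not part of the task materials): each text is represented by character 3-gram frequencies, and an anonymous text is attributed to the candidate whose known writing is most similar under cosine similarity. For the open-class setting one could additionally reject the attribution when the best similarity falls below a threshold. All names and data below are illustrative.

```python
# Minimal closed-class attribution sketch (illustrative only, not an official baseline).
from collections import Counter
from math import sqrt

def profile(text, n=3):
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[g] * q[g] for g in p if g in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def attribute(unknown_text, candidate_texts):
    """Return the candidate author whose profile is closest to the unknown text."""
    unknown = profile(unknown_text)
    return max(candidate_texts,
               key=lambda author: cosine(unknown, profile(candidate_texts[author])))

# Toy usage with made-up candidate writing samples:
candidates = {"A": "the quick brown fox jumps over the lazy dog " * 20,
              "B": "colorless green ideas sleep furiously tonight " * 20}
print(attribute("the quick brown fox naps beside the lazy dog", candidates))  # -> "A"
```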
Input
To develop your software, we provide you with a training corpus that comprises several different common attribution and clustering scenarios.
Output
As per repeated requests, here is a sample submission format to use for the Traditional Authorship Attribution Competition for PAN/CLEF. Please note that following this format is not mandatory and we will continue to accept anything we can interpret.
For traditional authorship problems (e.g. problem A), use the following (all words in ALL CAPS should be filled out appropriately):
team TEAM NAME : run RUN NUMBER task TASK IDENTIFIER file TEST FILE = AUTHOR IDENTIFIER file TEST FILE = AUTHOR IDENTIFIER ...
For problems E and F, there are no designated sample authors, so we recommend listing paragraph numbers. The author identifier is optional and arbitrary: if you prefer to talk about authors A and B or authors 1 and 2, you can insert that in the appropriate field. Any paragraphs not listed are assumed to belong to an unnamed default author.
team TEAM NAME : run RUN NUMBER task TASK IDENTIFIER file TEST FILE = AUTHOR IDENTIFIER (PARAGRAPH LIST) file TEST FILE = AUTHOR IDENTIFIER ...
For example:
team Jacob : run 1
task B file 12Btest01.txt = A file 12Btest02.txt = A file 12Btest03.txt = A file 12Btest04.txt = None of the Above file 12Btest05.txt = A file 12Btest06.txt = A file 12Btest07.txt = A file 12Btest08.txt = A file 12Btest09.txt = A file 12Btest10.txt = A
task C file 12Ctest01.txt = A file 12Ctest02.txt = A file 12Ctest03.txt = A file 12Ctest04.txt = A file 12Ctest05.txt = A file 12Ctest06.txt = A file 12Ctest07.txt = A file 12Ctest08.txt = A file 12Ctest09.txt = A
task F file 12Ftest01.txt = (1,2,3,6,7) file 12Ftest01.txt = (4,5)
In this sample file, we consider anything not listed in task F (paragraphs 8 and beyond) to be a third, unnamed author.
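As an illustration only, the following sketch serializes a set of answers into the sample layout shown above. The team name, run number, task identifiers, and file names are placeholders, and since the organizers accept any interpretable format, other layouts (e.g. one answer per line) are equally valid.

```python
# Hypothetical helper that builds a submission string in the sample format above.
def format_submission(team, run, tasks):
    """tasks maps a task identifier to a list of (test_file, answer) pairs;
    an answer is an author identifier, 'None of the Above', or a paragraph
    list such as '(1,2,3)' for the clustering problems E and F."""
    parts = ["team {} : run {}".format(team, run)]
    for task_id, answers in tasks.items():
        parts.append("task {}".format(task_id))
        for test_file, answer in answers:
            parts.append("file {} = {}".format(test_file, answer))
    return " ".join(parts)

# Toy usage mirroring the sample submission:
print(format_submission("Jacob", 1, {
    "B": [("12Btest01.txt", "A"), ("12Btest04.txt", "None of the Above")],
    "F": [("12Ftest01.txt", "(1,2,3,6,7)"), ("12Ftest01.txt", "(4,5)")],
}))
```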
Evaluation
The performance of your authorship attribution software will be judged by average precision, recall, and F1 over all authors in the given training set.
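For clarity, here is a sketch of one way to macro-average precision, recall, and F1 over authors. It is not the official evaluation script, and the exact averaging convention may differ (for instance, F1 could instead be averaged per author rather than computed from the averaged precision and recall).

```python
# Illustrative macro-averaged precision/recall/F1 over authors (not the official scorer).
from collections import Counter

def macro_prf(gold, predicted):
    """gold and predicted are parallel lists of author labels, one per test file."""
    authors = set(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    precision = sum(tp[a] / (tp[a] + fp[a]) if tp[a] + fp[a] else 0.0 for a in authors) / len(authors)
    recall = sum(tp[a] / (tp[a] + fn[a]) if tp[a] + fn[a] else 0.0 for a in authors) / len(authors)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy usage:
print(macro_prf(["A", "A", "B", "C"], ["A", "B", "B", "C"]))
```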
Results
Authorship attribution performance

| Overall | Participant |
|---|---|
| 86.37 | Marius Popescu* and Cristian Grozea°; °Fraunhofer FIRST, Germany, and *University of Bucharest, Romania |
| 83.40 | Navot Akiva; Bar Ilan University, Israel |
| 82.41 | Michael Ryan and John Noecker Jr; Duquesne University, USA |
| 70.81 | Ludovic Tanguy, Franck Sajous, Basilio Calderone, and Nabil Hathout; CLLE-ERSS: CNRS and University of Toulouse, France |
| 62.13 | Esteban Castillo°, Darnes Vilariño°, David Pinto°, Iván Olmos°, Jesús A. González*, and Maya Carrillo°; °Benemérita Universidad Autónoma de Puebla and *Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico |
| 59.77 | François-Marie Giraud and Thierry Artières; LIP6, Université Pierre et Marie Curie (UPMC), France |
| 58.35 | Upendra Sapkota and Thamar Solorio; University of Alabama at Birmingham, USA |
| 57.55 | Ramon de Graaff° and Cor J. Veenman*; °Leiden University and *Netherlands Forensics Institute, The Netherlands |
| 57.40 | Stefan Ruseti and Traian Rebedea; University Politehnica of Bucharest, Romania |
| 54.88 | Anna Vartapetiance and Lee Gillam; University of Surrey, UK |
| 43.18 | Roman Kern°*, Stefan Klampfl*, and Mario Zechner*; °Graz University of Technology and *Know-Center GmbH, Austria |
| 16.63 | Julian Brooke and Graeme Hirst; University of Toronto, Canada |
A more detailed analysis of the attribution performance with respect to precision, recall, and F1 can be found in the overview paper accompanying this task.
Related Work
- Patrick Juola. Authorship Attribution. In Foundations and Trends in Information Retrieval, Volume 1, Issue 3, December 2006.
- Moshe Koppel, Jonathan Schler, and Shlomo Argamon. Computational Methods in Authorship Attribution. Journal of the American Society for Information Science and Technology, Volume 60, Issue 1, pages 9-26, January 2009.
- Efstathios Stamatatos. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, Volume 60, Issue 3, pages 538-556, March 2009.