Sexual Predator Identification 2012
Synopsis
Task
The goal of this sub-task is to identify a class of authors, namely online predators. You will be given chat logs involving two (or more) people and have to determine which participant is trying to convince the other(s) to provide some sexual favour. You will also need to identify the particular conversations in which that person exhibits this predatory behavior.
The task can therefore be divided into two parts:
- Identify the predators (within all the users)
- Identify the parts (the lines) of the predators' conversations which are the most distinctive of the predatory behavior
Given the public nature of the dataset, we ask participants not to use external or online resources (e.g. search engines) to solve this task, but to extract evidence from the provided datasets only.
Input
To develop your software, we provide you with a training corpus consisting of chat logs in which minors and adults pretending to be minors are chatting.
Output
For each of the two parts we require a different format.
Identify the predators (within all the users).
Participants should submit a text file containing one user-id per line, listing only the users identified as predators:
...
a7c5056a2c30e2dc637907f448934ca3
58f15bbb100bbeb6963b4b967ce04bdf
e040eb115e3f7ad3824e93141665fc2a
3d57ed3fac066fa4f8a52432db51c019
...
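The file layout above (one user-id per line, predators only) can be produced with a few lines of Python. This is a minimal sketch, not an official submission tool: the function names `format_predator_ids` and `write_predator_ids` are hypothetical, and dropping duplicates and blank entries is an assumption about what a clean submission should look like.

```python
def format_predator_ids(predator_ids):
    """Return the submission text: one user-id per line.

    Assumption: duplicates and empty entries are dropped, and the
    first-seen order of the ids is preserved.
    """
    unique_ids = dict.fromkeys(uid.strip() for uid in predator_ids)
    return "".join(f"{uid}\n" for uid in unique_ids if uid)


def write_predator_ids(predator_ids, path):
    """Write the formatted id list to `path` (hypothetical output file)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_predator_ids(predator_ids))
```

Calling `write_predator_ids(ids, "predators.txt")` with the ids your classifier flagged would then yield a file in the layout shown above; the file name is only an example.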
Identify the parts (the lines) of the predators' conversations which are the most distinctive of the predatory behavior.
Participants should submit an XML file similar to those in the corpus, containing the conversation-ids and the message lines considered suspicious (the line numbers together with all the other message information: author, time, text):
<conversations>
  ...
  <conversation id="0042762e26ed295a8576806f5548cad9">
    <message line="3">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>03:20</time>
      <text>whats up?</text>
    </message>
    <message line="4">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>03:21</time>
      <text>how u doing?</text>
    </message>
    ...
    <message line="10">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>04:00</time>
      <text>sse you llater?</text>
    </message>
  </conversation>
  ...
  <conversation id="0209b0a30c8eced86863631ada73a530">
    <message line="3">
      <author>0042762e26ed295a8576806f5548cad9</author>
      <time>01:17</time>
      <text>and that i dont touch u</text>
    </message>
  </conversation>
  ...
</conversations>
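An XML file of this shape can be assembled with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the function names and the input structure (a dictionary from conversation-id to a list of `(line, author, time, text)` tuples) are assumptions, not something the task prescribes. Only the element and attribute names are taken from the sample above.

```python
import xml.etree.ElementTree as ET


def build_suspicious_xml(flagged):
    """Build a <conversations> tree from the flagged messages.

    `flagged` is a hypothetical structure: it maps a conversation-id to
    a list of (line, author, time, text) tuples for suspicious messages.
    """
    root = ET.Element("conversations")
    for conv_id, messages in flagged.items():
        conv = ET.SubElement(root, "conversation", id=conv_id)
        # Emit the messages in ascending line order, as in the corpus.
        for line, author, time, text in sorted(messages, key=lambda m: m[0]):
            msg = ET.SubElement(conv, "message", line=str(line))
            ET.SubElement(msg, "author").text = author
            ET.SubElement(msg, "time").text = time
            ET.SubElement(msg, "text").text = text
    return root


def serialize(root):
    """Return the tree as a Unicode string (special characters are escaped)."""
    return ET.tostring(root, encoding="unicode")
```

Building the tree through the library rather than by string concatenation means characters such as `<` or `&` in chat messages are escaped automatically.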
Evaluation
The performance of your predator identification will be judged by average precision, recall, and F, computed over all persons involved (for predator identification) and over all lines of the conversations (for predator line identification). Participants are ranked by F0.5 for predator identification and by F3 for predator line identification.
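To make the two ranking measures concrete, the following sketch computes precision, recall, and the standard F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). It assumes this textbook definition and is not the official evaluation script; the function names are hypothetical. With beta = 0.5 precision weighs more heavily than recall, while with beta = 3 recall dominates.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta):
    """Standard F-beta score; returns 0.0 when both inputs are zero."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy example with made-up ids: two predictions, one correct, three true items.
p, r = precision_recall({"u1", "u2"}, {"u1", "u3", "u4"})
f_half = f_beta(p, r, 0.5)   # measure used to rank predator identification
f_three = f_beta(p, r, 3.0)  # measure used to rank predator line identification
```

For predator identification the sets would hold user-ids; for line identification they would hold (conversation-id, line number) pairs.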
Results
Sexual predator identification performance: 1) identify predators

F0.5 | Participant |
---|---|
0.9346 | Esaú Villatoro-Tello°*, Antonio Juárez-González°, Hugo J. Escalante°, Manuel Montes-y-Gómez°, and Luis Villaseñor-Pineda° °Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) and *Universidad Autónoma Metropolitana, Mexico |
0.9168 | Tim Snider Porfiau Inc., Canada |
0.8691 | Javier Parapar°, David E. Losada*, and Alvaro Barreiro° °University of A Coruña and *Universidade de Santiago de Compostela, Spain |
0.8652 | Colin Morris and Graeme Hirst University of Toronto, Canada |
0.8638 | Gunnar Eriksson and Jussi Karlgren Gavagai AB, Sweden |
0.8137 | Claudia Peersman, Frederik Vaassen, Vincent Van Asch, and Walter Daelemans University of Antwerp, Belgium |
0.7316 | Cristian Grozea° and Marius Popescu* °Fraunhofer FIRST, Germany, and *University of Bucharest, Romania |
0.7060 | Rachel Sitarz Purdue University, USA |
0.5537 | Anna Vartapetiance and Lee Gillam University of Surrey, UK |
0.3946 | April Kontostathis°, Andy Garron*, Kelly Reynolds^, Will West^, and Lynne Edwards° °Ursinus College, *The University of Maryland, and ^Lehigh University, USA |
0.2554 | In-Su Kang°, Chul-Kyu Kim°, Shin-Jae Kang*, and Seung-Hoon Na^ °Kyungsung University, *Daegu University, and ^Electronics and Telecommunications Research Institute, South Korea |
0.1791 | Roman Kern°*, Stefan Klampfl*, Mario Zechner* °Graz University of Technology and *Know-Center GmbH, Austria |
0.0316 | Dasha Bogdanova° and Paolo Rosso* °University of Saint Petersburg, Russia, and *Universitat Politècnica de València, Spain |
0.0250 | Sriram Prasath Elango KTH/Gavagai, Sweden |
0.0232 | Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León Benemérita Universidad Autónoma de Puebla, Mexico |
0.0059 | José María Gómez Hidalgo° and Andrés Alfonso Caurcel Díaz* °Optenet and *Universidad Politécnica de Madrid, Spain |

Sexual predator identification performance: 2) identify predator lines

F3 | Participant |
---|---|
0.4762 | Cristian Grozea° and Marius Popescu* °Fraunhofer FIRST, Germany, and *University of Bucharest, Romania |
0.4174 | April Kontostathis°, Andy Garron*, Kelly Reynolds^, Will West^, and Lynne Edwards° °Ursinus College, *The University of Maryland, and ^Lehigh University, USA |
0.2679 | Claudia Peersman, Frederik Vaassen, Vincent Van Asch, and Walter Daelemans University of Antwerp, Belgium |
0.2364 | Rachel Sitarz Purdue University, USA |
0.1986 | Colin Morris and Graeme Hirst University of Toronto, Canada |
0.1838 | Roman Kern°*, Stefan Klampfl*, Mario Zechner* °Graz University of Technology and *Know-Center GmbH, Austria |
0.1633 | Gunnar Eriksson and Jussi Karlgren Gavagai AB, Sweden |
0.0770 | Sriram Prasath Elango KTH/Gavagai, Sweden |
0.0174 | Javier Parapar°, David E. Losada*, and Alvaro Barreiro° °University of A Coruña and *Universidade de Santiago de Compostela, Spain |
0.0154 | Anna Vartapetiance and Lee Gillam University of Surrey, UK |
0.0074 | Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León Benemérita Universidad Autónoma de Puebla, Mexico |
0.0007 | Dasha Bogdanova° and Paolo Rosso* °University of Saint Petersburg, Russia, and *Universitat Politècnica de València, Spain |
0.0002 | Esaú Villatoro-Tello°*, Antonio Juárez-González°, Hugo J. Escalante°, Manuel Montes-y-Gómez°, and Luis Villaseñor-Pineda° °Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) and *Universidad Autónoma Metropolitana, Mexico |
0.0000 | José María Gómez Hidalgo° and Andrés Alfonso Caurcel Díaz* °Optenet and *Universidad Politécnica de Madrid, Spain |
A more detailed analysis of the detection performance with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.