Sexual Predator Identification 2012

Synopsis

  • Task: Identify sexual predators in chat logs.
  • Input: [data]
  • Baseline: [code]

Task

The goal of this sub-task is to identify a class of authors, namely online predators. You will be given chat logs involving two (or more) people and have to determine which participant is trying to convince the other(s) to provide some sexual favour. You will also need to identify the particular parts of the conversations in which this person displays predatory behavior.

The task can therefore be divided into two parts:

  1. Identify the predators (among all users)
  2. Identify the parts (the lines) of the predators' conversations that are most distinctive of predatory behavior

Given the public nature of the dataset, we ask participants not to use external or online resources (e.g. search engines) to solve this task, but to extract evidence from the provided datasets only.

Input

To develop your software, we provide you with a training corpus consisting of chat logs in which minors and adults pretending to be minors are chatting.
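As a starting point for working with the corpus, the following is a minimal sketch of how the chat logs might be loaded. It assumes that the corpus files use the same XML layout as the output example on this page (conversation elements with an id, containing message elements with a line attribute and author, time, and text children). The function name messages_by_author is hypothetical and not part of any provided baseline.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def messages_by_author(xml_text):
    """Group messages by author id.

    Assumes the layout <conversations><conversation id=...><message line=...>
    with <author>, <time>, and <text> children, as in the output example.
    Returns {author_id: [(conversation_id, line_number, text), ...]}.
    """
    root = ET.fromstring(xml_text)
    by_author = defaultdict(list)
    for conversation in root.iter("conversation"):
        conversation_id = conversation.get("id")
        for message in conversation.iter("message"):
            author = message.findtext("author", default="")
            text = message.findtext("text", default="")
            line_number = int(message.get("line"))
            by_author[author].append((conversation_id, line_number, text))
    return dict(by_author)
```

Grouping by author is only one possible view of the data; it is chosen here because the first part of the task asks for a decision per user rather than per conversation.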

Output

For each of the two parts we require a different format.

Identify the predators (within all the users).

Participants should upload a text file containing one user-id per line, listing only the users identified as predators:

...
a7c5056a2c30e2dc637907f448934ca3
58f15bbb100bbeb6963b4b967ce04bdf
e040eb115e3f7ad3824e93141665fc2a
3d57ed3fac066fa4f8a52432db51c019
...
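A minimal sketch of producing such a file is given below. The function names and the choice to drop duplicate and empty ids are assumptions made for illustration; the task description only requires one user-id per line.

```python
from pathlib import Path


def format_predator_ids(user_ids):
    """Return the file content: one user-id per line.

    Assumption: duplicates and empty entries are dropped, and the
    first-seen order of the ids is preserved.
    """
    stripped = (uid.strip() for uid in user_ids)
    unique_ids = dict.fromkeys(uid for uid in stripped if uid)
    return "".join(f"{uid}\n" for uid in unique_ids)


def write_predator_ids(user_ids, path):
    """Write the formatted ids to a UTF-8 text file at the given path."""
    Path(path).write_text(format_predator_ids(user_ids), encoding="utf-8")
```

For example, write_predator_ids(predicted_ids, "predators.txt") would create the file, where predicted_ids and the file name are placeholders.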

Identify the part (the lines) of the predator conversations which are the most distinctive of the predator bad behavior.

Participants should upload an XML file similar to those in the corpus, containing the conversation-ids and the message line numbers considered suspicious (the line numbers together with all other message information: author, time, text):

<conversations>
  ...
  <conversation id="0042762e26ed295a8576806f5548cad9">
    <message line="3">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>03:20</time>
      <text>whats up?</text>
    </message>
    <message line="4">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>03:21</time>
      <text>how u doing?</text>
    </message>
    ...
    <message line="10">
      <author>f069dbec9ab3e090972d432db279e3eb</author>
      <time>04:00</time>
      <text>sse you llater?</text>
    </message>
  </conversation>
  ...
  <conversation id="0209b0a30c8eced86863631ada73a530">
    <message line="3">
      <author>0042762e26ed295a8576806f5548cad9</author>
      <time>01:17</time>
      <text>and that i dont touch u</text>
    </message>
  </conversation>
  ...
</conversations>
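A file of this shape can be generated with a standard XML library instead of string concatenation, which takes care of escaping special characters in the message text. The following is a minimal sketch; the function name build_suspicious_xml and the input structure (a dictionary from conversation-id to a list of message dictionaries) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_suspicious_xml(flagged):
    """Serialize flagged messages in the layout of the example above.

    flagged maps a conversation id to a list of dicts with the keys
    "line", "author", "time", and "text" (a hypothetical input format).
    """
    root = ET.Element("conversations")
    for conversation_id, messages in flagged.items():
        conversation = ET.SubElement(root, "conversation", id=conversation_id)
        # Emit messages in ascending line order within each conversation.
        for msg in sorted(messages, key=lambda m: m["line"]):
            node = ET.SubElement(conversation, "message", line=str(msg["line"]))
            ET.SubElement(node, "author").text = msg["author"]
            ET.SubElement(node, "time").text = msg["time"]
            ET.SubElement(node, "text").text = msg["text"]
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```

The returned string can then be written to the file that is uploaded for the second part of the task.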

Evaluation

The performance of your predator identification will be judged by average precision, recall, and F-measure over all persons involved and over the lines of the conversations. In addition, F0.5 is used to rank participants for predator identification, and F3 is used to rank them for predator line identification.
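For reference, the standard F-beta measure combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R); beta = 0.5 weights precision more heavily, whereas beta = 3 weights recall more heavily. The sketch below computes it for sets of predicted and true items. It is an illustration of the standard definition under the assumption of simple set matching, not the organizers' evaluation script, and the function names are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta):
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator
```

With a precision of 0.8 and a recall of 0.4, for instance, this gives F0.5 of about 0.667 but F3 of about 0.421, which shows how strongly the choice of beta shifts the score towards precision or recall.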

Results

Sexual predator identification performance: 1) identify predators
F0.5 Participant
0.9346 Esaú Villatoro-Tello°*, Antonio Juárez-González°, Hugo J. Escalante°, Manuel Montes-y-Gómez°, and Luis Villaseñor-Pineda°
°Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) and *Universidad Autónoma Metropolitana, Mexico
0.9168 Tim Snider
Porfiau Inc., Canada
0.8691 Javier Parapar°, David E. Losada*, and Alvaro Barreiro°
°University of A Coruña and *Universidade de Santiago de Compostela, Spain
0.8652 Colin Morris and Graeme Hirst
University of Toronto, Canada
0.8638 Gunnar Eriksson and Jussi Karlgren
Gavagai AB, Sweden
0.8137 Claudia Peersman, Frederik Vaassen, Vincent Van Asch, and Walter Daelemans
University of Antwerp, Belgium
0.7316 Cristian Grozea° and Marius Popescu*
°Fraunhofer FIRST, Germany, and *University of Bucharest, Romania
0.7060 Rachel Sitarz
Purdue University, USA
0.5537 Anna Vartapetiance and Lee Gillam
University of Surrey, UK
0.3946 April Kontostathis°, Andy Garron*, Kelly Reynolds^, Will West^, and Lynne Edwards°
°Ursinus College, *The University of Maryland, and ^Lehigh University, USA
0.2554 In-Su Kang°, Chul-Kyu Kim°, Shin-Jae Kang*, and Seung-Hoon Na^
°Kyungsung University, *Daegu University, and ^Electronics and Telecommunications Research Institute, South Korea
0.1791 Roman Kern°*, Stefan Klampfl*, Mario Zechner*
°Graz University of Technology and *Know-Center GmbH, Austria
0.0316 Dasha Bogdanova° and Paolo Rosso*
°University of Saint Petersburg, Russia, and *Universitat Politècnica de València, Spain
0.0250 Sriram Prasath Elango
KTH/Gavagai, Sweden
0.0232 Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León
Benemérita Universidad Autónoma de Puebla, Mexico
0.0059 José María Gómez Hidalgo° and Andrés Alfonso Caurcel Díaz*
°Optenet and *Universidad Politécnica de Madrid, Spain

Sexual predator identification performance: 2) identify predator lines
F3 Participant
0.4762 Cristian Grozea° and Marius Popescu*
°Fraunhofer FIRST, Germany, and *University of Bucharest, Romania
0.4174 April Kontostathis°, Andy Garron*, Kelly Reynolds^, Will West^, and Lynne Edwards°
°Ursinus College, *The University of Maryland, and ^Lehigh University, USA
0.2679 Claudia Peersman, Frederik Vaassen, Vincent Van Asch, and Walter Daelemans
University of Antwerp, Belgium
0.2364 Rachel Sitarz
Purdue University, USA
0.1986 Colin Morris and Graeme Hirst
University of Toronto, Canada
0.1838 Roman Kern°*, Stefan Klampfl*, Mario Zechner*
°Graz University of Technology and *Know-Center GmbH, Austria
0.1633 Gunnar Eriksson and Jussi Karlgren
Gavagai AB, Sweden
0.0770 Sriram Prasath Elango
KTH/Gavagai, Sweden
0.0174 Javier Parapar°, David E. Losada*, and Alvaro Barreiro°
°University of A Coruña and *Universidade de Santiago de Compostela, Spain
0.0154 Anna Vartapetiance and Lee Gillam
University of Surrey, UK
0.0074 Darnes Vilariño, Esteban Castillo, David Pinto, Iván Olmos, Saul León
Benemérita Universidad Autónoma de Puebla, Mexico
0.0007 Dasha Bogdanova° and Paolo Rosso*
°University of Saint Petersburg, Russia, and *Universitat Politècnica de València, Spain
0.0002 Esaú Villatoro-Tello°*, Antonio Juárez-González°, Hugo J. Escalante°, Manuel Montes-y-Gómez°, and Luis Villaseñor-Pineda°
°Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) and *Universidad Autónoma Metropolitana, Mexico
0.0000 José María Gómez Hidalgo° and Andrés Alfonso Caurcel Díaz*
°Optenet and *Universidad Politécnica de Madrid, Spain

A more detailed analysis of the detection performance with respect to precision, recall, and granularity can be found in the overview paper accompanying this task.

Task Committee