ChEMU: Cheminformatics Elsevier Melbourne University
Information Extraction from Chemical Patents
NEWS update: Training data and submission website now available!
Please head over to our new website at: http://chemu.eng.unimelb.edu.au/ to access the data and formally participate.
We will be running a new evaluation lab named ChEMU, part of the 11th Conference and Labs of the Evaluation Forum (CLEF-2020).
ChEMU proposes two key information extraction tasks over chemical reactions from patents.
- Task 1: Named Entity Recognition involves identifying chemical compounds and their types in context, i.e., assigning each chemical compound a label according to the role it plays within a chemical reaction.
- Task 2: Event extraction over chemical reactions involves event trigger detection and argument recognition.
The tasks are briefly explained in our upcoming ECIR 2020 paper:
- Nguyen DQ, Zhai Z, Yoshikawa H, Fang B, Druckenbrodt C, Thorne C, Hoessel R, Akhondi SA, Cohn T, Baldwin T and Verspoor K. (2020) ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. To appear in ECIR 2020. PDF
Annotation Guidelines
To learn how the datasets are annotated and to gain further insight into the task, please see the annotation guidelines:
Sample dataset is available
The data for this task is released in BRAT format.
This is a standoff format, with the text in one plain text file (*.txt), and the annotations in a different file (*.ann).
The configuration files required for BRAT are included in each of the two subdirectories, "ner" for Task 1 and "ee" for Task 2.
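As a rough illustration of the standoff layout, the sketch below reads the text-bound ("T") lines of a BRAT .ann file and checks that their character offsets point back into the matching .txt file. The sentence, the labels SOLVENT and TIME, and the names Entity and parse_entities are hypothetical and are not taken from the ChEMU data. How the "ee" files encode triggers and arguments is not described here, so all other line types are simply skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    ann_id: str   # annotation identifier, e.g. "T1"
    label: str    # entity type
    start: int    # character offset into the .txt file (inclusive)
    end: int      # character offset into the .txt file (exclusive)
    text: str     # surface string as recorded in the .ann file


def parse_entities(ann_content: str) -> List[Entity]:
    """Parse text-bound ("T") lines from the content of a BRAT .ann file."""
    entities = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # skip every line that is not a text-bound annotation
        ann_id, type_and_span, text = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans are written "s1 e1;s2 e2"; keep the outer bounds.
        fragments = [frag.split() for frag in span.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        entities.append(Entity(ann_id, label, start, end, text))
    return entities


# Hypothetical example content standing in for a .txt / .ann file pair.
sample_txt = "The mixture was stirred in methanol for 2 hours."
sample_ann = "T1\tSOLVENT 27 35\tmethanol\nT2\tTIME 40 47\t2 hours\n"

entities = parse_entities(sample_ann)
for ent in entities:
    # For contiguous spans, the offsets index directly into the plain text.
    assert sample_txt[ent.start:ent.end] == ent.text
    print(ent.ann_id, ent.label, repr(ent.text))
```

For real files, the same function can be applied to the contents of each *.ann file, with the *.txt file of the same base name supplying the text for the offset check.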
A visualization of the latest sample dataset is provided here: Visualization of Sample Dataset.
Latest version: On 7 April, we removed the labeled trigger words from the annotation files in "ner", since those words are not part of the target output for Task 1. This version is available at:
chemu_sample.v3.zip
Second version: On 18 March, we created the second version of the sample dataset, correcting some inconsistencies in how character entities were handled. This version is available at:
chemu_sample.v2.zip
Note that the file numbers in this version of the sample differ from those in the first version.
First version: The first version of the sample dataset is available here: chemu_sample.zip
Relevant background:
- Zhai Z, Nguyen DQ, Akhondi S, Thorne C, Druckenbrodt C, Cohn T, Gregory M and Verspoor K. (2019) Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP) at ACL 2019. https://www.aclweb.org/anthology/W19-5035.pdf
- Yoshikawa H, Verspoor K, Baldwin T, Nguyen DQ, Zhai Z, Akhondi S, Thorne C, Druckenbrodt C. (2019) Detecting Chemical Reaction Schemes in Patents. Australian Language Technology Association Workshop (ALTA 2019). Sydney, Australia, December 2019. https://www.aclweb.org/anthology/U19-1014.pdf
If you are interested in participating in the CLEF2020 ChEMU task on information extraction from chemical patents, please register here:
http://clef2020-labs-registration.dei.unipd.it/registrationForm.php.
To access the data and the submission site, you will also need to register and accept the data usage agreement here: http://chemu.eng.unimelb.edu.au/
This project is a collaboration between the University of Melbourne natural language processing group in the School of Computing and Information Systems, the Elsevier Content Transformations, Life Science team, and RMIT University. The principal investigator of the project is Karin Verspoor. The research is supported by an Australian Research Council Linkage Project, LP160101469, and Elsevier.
Key Dates
- Registration opens: 20 November 2019 (Registration Form)
- Sample set release: 9 March (chemu_sample.v3.zip)
- Training set release: 10 April (previously mid March), http://chemu.eng.unimelb.edu.au/
- Registration closes: 26 April 2020
- Evaluation period of Task 1: 22 May 2020 - 28 May 2020 (previously 24 April 2020 - 3 May 2020)
- Evaluation period of Task 2: 29 May 2020 - 3 June 2020 (previously 4 May 2020 - 8 May 2020)
- End of Evaluation Cycle and feedback for participants: Fri 5 June 2020 (previously 10 May 2020)
- Submission of Participant Papers [CEUR-WS]: 17 July 2020 (previously 24 May 2020)
- Review process of participant papers: 17 July - 14 August 2020 (previously 24 May - 14 June 2020)
- Notification of Acceptance Participant Papers [CEUR-WS]: 14 August 2020 (previously 14 June 2020)
- Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS]: 28 August 2020 (previously 28 June 2020)
- Evaluation Lab meeting @CLEF 2020, Thessaloniki, Greece: September 22-25 2020
For questions about the task, please email: chemu.clef2020@gmail.com