NUS Natural Language Processing Group Corpora

WSJ Preposition Senses

This data set contains preposition word senses for prepositional phrases in the Wall Street Journal (WSJ) section of the Penn Treebank. The data was used in the experiments of Dahlmeier et al. (2009).

The data is available for download.

NUS Corpus of Learner English (NUCLE)

The NUS Corpus of Learner English (NUCLE) was collected in a collaborative project between the National University of Singapore (NUS) Natural Language Processing (NLP) Group, led by Prof. Hwee Tou Ng, and the NUS Centre for English Language Communication (CELC), led by Prof. Siew Mei Wu. The work was carried out as part of the PhD thesis research of Daniel Dahlmeier at the NUS NLP Group.

The corpus consists of about 1,400 essays written by students at the National University of Singapore on a wide range of topics, such as environmental pollution and healthcare. It contains over one million words, all fully annotated with error tags and corrections. All annotations were made by professional English instructors at the NUS CELC.

The corpus is distributed under the standard NUS licensing agreement and can be downloaded from the NUS Enterprise R2M portal.