The 8th International Joint Conference on Natural Language Processing

Workshop on NLP Techniques for Educational Applications
Chinese Spelling Check Shared Task

The goal of this task is to develop a computer-assisted system that automatically diagnoses writing errors in Traditional Chinese sentences written by native Hong Kong primary school students. There are two kinds of errors: (1) typos and (2) Cantonese usages. Given a sentence, the system should (1) identify where the errors are, (2) indicate the type of each identified error, and (3) offer correction suggestions for each identified error. Note that a sentence may contain no errors, multiple errors, or multiple types of errors.

For more information, see the overview paper:

Gabriel Pui Cheong Fung, Maxime Debosschere, Dingmin Wang, Bo Li, Jia Zhu, and Kam-Fai Wong. NLPTEA 2017 Shared Task – Chinese Spelling Check. In Proceedings of the Workshop on NLP Techniques for Educational Applications, The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), pp. 29–34, 2017.

Organizers

Gabriel Pui Cheong Fung, The Chinese University of Hong Kong
Jia Zhu, South China Normal University

Data Description

The training data consists of two files, “training-sentences.json” and “training-corrections.json”. The first contains the training sentences and the second contains the corresponding gold standard corrections. Both files are in JSON format. We explain their contents using the examples below:

training-sentences.json:

[
    {
        "id":"ASTRI01",
        "sentence":"我很喜歡吃媽媽做的凉瓜炒蛋飯。"
    },
    {
        "id":"ASTRI02",
        "sentence":"我很喜歡吃媽媽做的梁瓜炒蛋飯。"
    },
    {
        "id":"ASTRI03",
        "sentence":"我很鍾意吃媽媽做的凉瓜炒蛋飯。"
    },
    {
        "id":"ASTRI04",
        "sentence":"我很鍾意食媽媽做的梁瓜炒旦飯。"
    }
]

training-corrections.json:

[
    {
        "id":"ASTRI01",
        "typo":null,
        "cantonese":null,
        "reorder":null
    },
    {
        "id":"ASTRI02",
        "typo":[
            {"position":10, "correction":["凉"]}
        ],
        "cantonese":null,
        "reorder":null
    },
    {
        "id":"ASTRI03",
        "typo":null,
        "cantonese":[
            {"position":3, "length":2, "correction":["喜歡"]}
        ],
        "reorder":null
    },
    {
        "id":"ASTRI04",
        "typo":[
            {"position":10, "correction":["凉"]},
            {"position":13, "correction":["蛋"]}
        ],
        "cantonese":[
            {"position":3, "length":2, "correction":["喜歡"]},
            {"position":5, "length":1, "correction":["吃"]}
        ],
        "reorder":null
    }
]

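The structure of the two files is largely self-explanatory: each “position” is a 1-based character offset into the sentence, “length” gives the number of characters covered by a Cantonese or reorder error (typo entries have no “length” and cover a single character), and “correction” lists the suggested replacements. As noted in Section 1 – Background, Cantonese usage takes several forms. Some can be fixed by replacing words (the “cantonese” field), while others require the word order to be changed, which is why the corrections also have a “reorder” field. For example, given the following sentences: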

[
    {
        "id":"ASTRI05",
        "sentence":"我走先然後去打球。"
    },
    {
        "id":"ASTRI06",
        "sentence":"大家討論緊這件事。"
    }
]

Then, the corresponding gold standard corrections are:

[
    {
        "id":"ASTRI05",
        "typo":null,
        "cantonese":null,
        "reorder":[
            {"position":1, "length":8, "correction":["我先走然後去打球"]}
        ]
    },
    {
        "id":"ASTRI06",
        "typo":null,
        "cantonese":[
            {"position":5, "length":1, "correction":["正在"]}
        ],
        "reorder":[
            {"position":1, "length":8, "correction":["大家緊討論這件事"]}
        ]
    }
]
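To make the position and length conventions concrete, here is a minimal Python sketch that applies the first suggestion of each typo and Cantonese correction to a sentence. The helper name `apply_corrections` is hypothetical and not part of the task's tools. It assumes, consistently with the examples above, that positions are 1-based character offsets into the original sentence and that a typo covers a single character. Reorder entries are deliberately ignored, because this page does not specify how their positions interact with Cantonese corrections in the same sentence (as in ASTRI06).

```python
import json


def apply_corrections(sentence: str, record: dict) -> str:
    """Apply the first suggestion of every typo and Cantonese correction.

    Assumption: positions are 1-based offsets into the ORIGINAL sentence,
    and typo entries (which have no "length") cover one character.
    Reorder entries are ignored in this sketch.
    """
    edits = []
    for item in record.get("typo") or []:  # null or [] means no edits
        edits.append((item["position"], 1, item["correction"][0]))
    for item in record.get("cantonese") or []:
        edits.append((item["position"], item["length"], item["correction"][0]))
    # Apply right to left so earlier positions stay valid even when a
    # replacement changes the sentence length.
    for position, length, replacement in sorted(edits, reverse=True):
        start = position - 1
        sentence = sentence[:start] + replacement + sentence[start + length:]
    return sentence


# Gold standard corrections for ASTRI04 from training-corrections.json.
record = json.loads("""{
    "id":"ASTRI04",
    "typo":[{"position":10, "correction":["凉"]},
            {"position":13, "correction":["蛋"]}],
    "cantonese":[{"position":3, "length":2, "correction":["喜歡"]},
                 {"position":5, "length":1, "correction":["吃"]}],
    "reorder":null
}""")
print(apply_corrections("我很鍾意食媽媽做的梁瓜炒旦飯。", record))
# prints 我很喜歡吃媽媽做的凉瓜炒蛋飯。
```

The result is exactly the error-free sentence ASTRI01, which is a quick sanity check that the offsets are 1-based.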

Evaluation

We observed the following:

Regarding the gold standard: an error may be correctable in more than one way. Hence, the gold standard includes as many valid corrections as possible for each error.
Regarding corrections: the spellcheckers in modern word processing software typically suggest multiple corrections to the user, which is reasonable and maximises flexibility. Hence, participants may submit a list of correction suggestions for each error. How such lists are scored is explained below.

For example, given the following sentences:

[
    {
        "id":"ASTRI2000", 
        "sentence":"佢想禾你共進免餐。"
    },
    {
        "id":"ASTRI2001", 
        "sentence":"仍記得小學下課的時候，我總愛到草推裏捉蠶蟲。"
    },
    {
        "id":"ASTRI2002",
        "sentence":"我走先然後去打球。"
    }
]

the gold standard will be:

[
    {
        "id":"ASTRI2000", 
        "typo":[
            {"position":3, "correction":["和"]},
            {"position":7, "correction":["晚", "午"]}
        ],
        "cantonese":[
            {"position":1, "length":1, "correction":["他", "她"]}
        ],
        "reorder":[]
    },
    {
        "id":"ASTRI2001", 
        "typo":[
            {"position":17, "correction":["堆"]}
        ],
        "cantonese":[],
        "reorder":[]
    },
    {
        "id":"ASTRI2002",
        "typo":[],
        "cantonese":[],
        "reorder":[
            {"position":1, "length":8, "correction":["我先走然後去打球"]}
        ]
    }
]

In this example, for the sentence with id ASTRI2000, 免 is a typo. Since 免 and 晚 have similar shapes whereas 免 and 午 have similar pronunciations (in Cantonese), we consider both 晚 and 午 to be valid corrections of 免.

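Note that the value of the key “correction” is always an array, whereas an error field with no entries may be either null or an empty array [] (both forms appear in the examples on this page). If any of your suggestions for an error matches one of the gold standard suggestions, your correction is considered correct. The scoring details, including the penalty for long suggestion lists, are given below.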
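The matching rule amounts to a non-empty set intersection. The sketch below uses a hypothetical helper name, `suggestion_matches`; it only decides whether a suggestion list counts as correct and does not apply the penalty for extra suggestions described under Correction Performance.

```python
def suggestion_matches(gold: list[str], submitted: list[str]) -> bool:
    """Return True if the submitted suggestions share at least one entry with the gold list."""
    return bool(set(gold) & set(submitted))


# Gold suggestions for the typo 免 at position 7 of ASTRI2000.
gold = ["晚", "午"]
print(suggestion_matches(gold, ["午"]))        # True
print(suggestion_matches(gold, ["挽", "行"]))  # False
```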

Detection Performance

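The system should correctly detect the position, the length (where applicable), and the type of every error in every sentence. A reported error counts as a true positive (TP) only if all three match an error in the gold standard; a reported error with no such match is a false positive (FP), and a gold standard error that is not reported is a false negative (FN). Mathematically,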

\(\mathit{Precision} = \dfrac{\mathit{TP}}{\mathit{TP} + \mathit{FP}}\)
\(\mathit{Recall} = \dfrac{\mathit{TP}}{\mathit{TP} + \mathit{FN}}\)
\(\mathit{Performance}_\mathit{Detection} = \dfrac{2 \times \mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}\)

For example, if the participant submits the following correction file:

[
    {
        "id":"ASTRI2000", 
        "typo":[
            {"position":3, "correction":["和"]},
            {"position":7, "correction":["晚", "挽", "行"]}
        ],
        "cantonese":[
            {"position":1, "length":1, "correction":["他", "她"]}
        ],
        "reorder":[]
    },
    {
        "id":"ASTRI2001", 
        "typo":[
            {"position":1, "correction":["也"]}
        ],
        "cantonese":[],
        "reorder":[]
    },
    {
        "id":"ASTRI2002", 
        "typo":[],
        "cantonese":[],
        "reorder":[]
    }
]

then:

\(\mathit{TP} = 3\) (detected the typos “禾” and “免”; detected the Cantonese usage “佢”)
\(\mathit{FP} = 1\) (incorrectly suggested “仍” as a typo in ASTRI2001)
\(\mathit{FN} = 2\) (missed the typo “推” in ASTRI2001 and the word-order problem in ASTRI2002)
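These counts can be reproduced mechanically. The sketch below is our own illustration, not the official evaluation script: it flattens each record into (id, type, position, length) keys, assuming a length of 1 for typo entries, and compares the gold and submitted key sets. On the example it finds the three correct detections, the false alarm on 仍, and the two missed errors (推 in ASTRI2001 and the reordering in ASTRI2002), giving Precision = 3/4, Recall = 3/5 and a detection score of 2/3.

```python
from fractions import Fraction


def error_keys(records):
    """Flatten correction records into a set of (id, type, position, length) keys.

    Assumption: typo entries carry no "length", so a length of 1 is used.
    Fields that are null or [] contribute nothing.
    """
    keys = set()
    for record in records:
        for kind in ("typo", "cantonese", "reorder"):
            for item in record.get(kind) or []:
                keys.add((record["id"], kind, item["position"], item.get("length", 1)))
    return keys


def detection_performance(gold, submitted):
    """Return (TP, FP, FN, F1); position, length and type must all match."""
    gold_keys, sub_keys = error_keys(gold), error_keys(submitted)
    tp = len(gold_keys & sub_keys)
    fp = len(sub_keys - gold_keys)
    fn = len(gold_keys - sub_keys)
    precision = Fraction(tp, tp + fp) if tp + fp else Fraction(0)
    recall = Fraction(tp, tp + fn) if tp + fn else Fraction(0)
    if precision + recall == 0:
        return tp, fp, fn, Fraction(0)
    return tp, fp, fn, 2 * precision * recall / (precision + recall)


# The gold standard and submission from the example above, with correction
# lists omitted because detection does not look at them.
gold = [
    {"id": "ASTRI2000", "typo": [{"position": 3}, {"position": 7}],
     "cantonese": [{"position": 1, "length": 1}], "reorder": []},
    {"id": "ASTRI2001", "typo": [{"position": 17}], "cantonese": [], "reorder": []},
    {"id": "ASTRI2002", "typo": [], "cantonese": [],
     "reorder": [{"position": 1, "length": 8}]},
]
submitted = [
    {"id": "ASTRI2000", "typo": [{"position": 3}, {"position": 7}],
     "cantonese": [{"position": 1, "length": 1}], "reorder": []},
    {"id": "ASTRI2001", "typo": [{"position": 1}], "cantonese": [], "reorder": []},
    {"id": "ASTRI2002", "typo": [], "cantonese": [], "reorder": []},
]
print(detection_performance(gold, submitted))  # (3, 1, 2, Fraction(2, 3))
```

Exact fractions are used so the score can be compared directly with the hand calculation; a real scorer would more likely report a rounded float.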

Correction Performance

For each detected error, the system should deliver one or more appropriate correction suggestions. Since multiple suggestions are allowed for a given error, the suggestions are considered correct if the intersection between the gold standard suggestions and the participant's suggestions is non-empty. However, to discourage participants from submitting a long list of suggestions for every error, each error is scored by the fraction of the submitted suggestions that appear in the gold standard, so every extra incorrect suggestion lowers the score. Mathematically,

\(\mathit{Performance}_\mathit{Correction} = \dfrac{1}{|W|}\sum_{i \in W}\dfrac{|G_i \cap U_i|}{|U_i|}\)

where \(W\) is the set containing all correctly detected errors (including typos, Cantonese, and reorders), \(G_i\) is the set containing the gold standard suggestions for error \(i\in W\), and \(U_i\) is the set containing the participant suggestions for error \(i\in W\).
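As a worked illustration, the sketch below (a hypothetical helper, not the official script) evaluates this formula on the example submission from the Detection Performance section. There, \(W\) holds the three correctly detected errors of ASTRI2000. The typo at position 7 scores 1/3 because only one of the three submitted suggestions (晚) is in the gold standard, while the other two errors score 1. An error submitted with an empty suggestion list is scored 0 here, which is our assumption since the formula is undefined for \(|U_i| = 0\).

```python
from fractions import Fraction


def correction_performance(matched):
    """Mean of |G_i ∩ U_i| / |U_i| over matched, a list of (G_i, U_i) pairs for i in W."""
    if not matched:
        return Fraction(0)
    total = sum(
        Fraction(len(set(g) & set(u)), len(set(u))) if u else Fraction(0)
        for g, u in matched
    )
    return total / len(matched)


# (gold suggestions, submitted suggestions) for each correctly detected error.
W = [
    (["和"], ["和"]),                   # typo at position 3
    (["晚", "午"], ["晚", "挽", "行"]),  # typo at position 7
    (["他", "她"], ["他", "她"]),        # Cantonese usage at position 1
]
print(correction_performance(W))  # 7/9
```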

Overall System Performance

Systems are ranked by their overall performance, defined as the harmonic mean of the detection and correction performance:

\(\mathit{Performance}_\mathit{Overall} = \dfrac{2 \times \mathit{Performance}_\mathit{Detection} \times \mathit{Performance}_\mathit{Correction}}{\mathit{Performance}_\mathit{Detection} + \mathit{Performance}_\mathit{Correction}}\)
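For the example submission in the Detection Performance section, \(\mathit{Precision} = 3/4\) and \(\mathit{Recall} = 3/5\), so \(\mathit{Performance}_\mathit{Detection} = 2/3\); the per-error correction scores are 1, 1/3 and 1, so \(\mathit{Performance}_\mathit{Correction} = 7/9\). Substituting these values gives:

\(\mathit{Performance}_\mathit{Overall} = \dfrac{2 \times \frac{2}{3} \times \frac{7}{9}}{\frac{2}{3} + \frac{7}{9}} = \dfrac{28/27}{13/9} = \dfrac{28}{39} \approx 0.718\)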

Important Dates

Registration open: May 19, 2017
Release of training data: June 2, 2017
Release of gold standard (training data): June 12, 2017
Release of evaluation script: June 12, 2017
Registration close: August 20, 2017
Release of testing data: August 21, 2017
Testing results submission due: August 23, 2017
Release of gold standard (testing data): August 25, 2017
Release of testing results: August 25, 2017
Technical report submission due: September 12, 2017
Report reviews returned: September 30, 2017
Camera-ready due: October 10, 2017

Download

Training

Sentences: training-sentences.zip

Gold standard for the corrections: training-corrections.zip

Testing

Sentences: testing-sentences.zip

Gold standard for the corrections: testing-corrections.zip
