---
language: en
datasets:
- Lang-8
- NUCLE
- CoNLL-2014
tags:
- GEC
- text-classification
inference: false
---
# Grammatical Error Correction
You can **test the model** at [Grammatical Error Correction](https://huggingface.co/spaces/aisingapore/grammatical-error-correction).
If you want to find out more information, please contact us at sg-nlp@aisingapore.org.
## Table of Contents
- [Model Details](#model-details)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)
- [Training](#training)
- [Evaluation](#evaluation)
- [Model Parameters](#model-parameters)
- [Other Information](#other-information)
## Model Details
- **Model Name:** Cross Sentence GEC
- **Description:** This model is based on the convolutional encoder-decoder architecture described in the associated paper.
- **Paper:** Cross-sentence grammatical error correction. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019 (pp. 435-445).
- **Author(s):** Chollampatt, S., Wang, W., & Ng, H. T. (2019).
- **URL:** https://aclanthology.org/P19-1042
# How to Get Started With the Model
## Install Python package
SGnlp is an initiative by AI Singapore's NLP Hub that aims to bridge the gap between research and industry, promote translational research, and encourage the adoption of NLP techniques in industry.
Various NLP models other than grammatical error correction are available in the Python package. You can try them out at [SGNLP-Demo](https://sgnlp.aisingapore.net/) | [SGNLP-Github](https://github.com/aisingapore/sgnlp).
```shell
pip install sgnlp
```
## Examples
For the full code (such as Grammatical Error Correction), please refer to the [SGnlp GitHub repository](https://github.com/aisingapore/sgnlp).
Alternatively, you can also try out the [Demo](https://huggingface.co/spaces/aisingapore/grammatical-error-correction) | [SGNLP-Docs](https://sgnlp.aisingapore.net/docs/api_reference/sgnlp.models.csgec.html#module-sgnlp.models.csgec).
Example of Grammatical Error Correction:
```python
from sgnlp.models.csgec import (
    CsgConfig,
    CsgModel,
    CsgTokenizer,
    CsgecPreprocessor,
    CsgecPostprocessor,
    download_tokenizer_files,
)

# Load the pretrained model configuration and weights
config = CsgConfig.from_pretrained("https://storage.googleapis.com/sgnlp-models/models/csgec/config.json")
model = CsgModel.from_pretrained(
    "https://storage.googleapis.com/sgnlp-models/models/csgec/pytorch_model.bin",
    config=config,
)

# Download the source, context, and target tokenizer files into local folders
download_tokenizer_files(
    "https://storage.googleapis.com/sgnlp-models/models/csgec/src_tokenizer/",
    "csgec_src_tokenizer",
)
download_tokenizer_files(
    "https://storage.googleapis.com/sgnlp-models/models/csgec/ctx_tokenizer/",
    "csgec_ctx_tokenizer",
)
download_tokenizer_files(
    "https://storage.googleapis.com/sgnlp-models/models/csgec/tgt_tokenizer/",
    "csgec_tgt_tokenizer",
)

# Load the tokenizers from the downloaded folders
src_tokenizer = CsgTokenizer.from_pretrained("csgec_src_tokenizer")
ctx_tokenizer = CsgTokenizer.from_pretrained("csgec_ctx_tokenizer")
tgt_tokenizer = CsgTokenizer.from_pretrained("csgec_tgt_tokenizer")

# The preprocessor encodes source and context; the postprocessor decodes target ids
preprocessor = CsgecPreprocessor(src_tokenizer=src_tokenizer, ctx_tokenizer=ctx_tokenizer)
postprocessor = CsgecPostprocessor(tgt_tokenizer=tgt_tokenizer)

# Input passages containing grammatical errors for the model to correct
texts = [
    "All of us are living in the technology realm society. Have you ever wondered why we use these tools to connect "
    "ourselves with other people? It started withthe invention of technology which has evolved tremendously over the "
    "past few decades. In the past, we travel by ship and now we can use airplane to do so. In the past, it took a few "
    "days to receive a message as we need to post our letter and now, we can use e-mail which stands for electronic "
    "message to send messages to our friends or even use our handphone to send our messages.",
    "Machines have replaced a bunch of coolies and heavy labor. Cars and trucks diminish the redundancy of long time "
    "shipment. As a result, people have more time to enjoy advantage of modern life. One can easily travel to the "
    "other half of the globe to see beautiful scenery that one dreams for his lifetime. One can also easily see his "
    "deeply loved one through internet from miles away.",
]

# Encode the inputs, decode the corrections, and convert the predicted ids back to text
batch_source_ids, batch_context_ids = preprocessor(texts)
predicted_ids = model.decode(batch_source_ids, batch_context_ids)
predicted_texts = postprocessor(predicted_ids)
print(predicted_texts)
```
# Training
The training data comprises the Lang-8 and NUCLE datasets.
Both datasets must be requested from their maintainers: Lang-8 from NAIST and NUCLE from NUS.
# Evaluation
The reported evaluation scores are measured on the [CoNLL-2014](https://www.comp.nus.edu.sg/~nlp/conll14st.html) benchmark.
The full dataset can be downloaded from the shared task page.
#### Evaluation Scores
- **Retrained scores:** N/A. The demo uses the authors' original code.
- **Scores reported in paper:** Single Model F0.5: 53.06, Ensemble + BERT Rescoring F0.5: 54.87
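
For readers unfamiliar with the metric, the following is a minimal sketch of how an F0.5 score is derived from precision and recall. It only illustrates the general F-beta formula; it is not the official CoNLL-2014 scorer, and the precision and recall values are hypothetical, chosen purely for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """General F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0  # no correct edits proposed and none recalled
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical values for illustration only (not taken from the paper)
precision, recall = 0.60, 0.30
print(f"F0.5 = {f_beta(precision, recall):.4f}")            # prints "F0.5 = 0.5000"
print(f"F1   = {f_beta(precision, recall, beta=1.0):.4f}")  # prints "F1   = 0.4000"
```

With precision higher than recall, F0.5 comes out above F1, which reflects the metric's preference for systems that propose fewer but more reliable corrections.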
# Model Parameters
- **Model Inputs:** Source sentence (the sentence to be corrected), context (the two immediately preceding sentences), and target (either padding tokens and the start token, or the last 3 previously predicted tokens).
- **Model Outputs:** Array of logits for each token in the target vocabulary. This can be converted into the probability distribution for the next word using the softmax function.
- **Model Inference Info:** Not available.
- **Usage Scenarios:** Grammar and spell checker app / feature.
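
The inputs and outputs described above can be sketched in plain Python. This is an illustrative sketch only, not the SGnlp implementation: the helper names (`context_for`, `target_window`, `softmax`), the `<pad>` and `<s>` token strings, the toy vocabulary, and the handling of the case where fewer than 3 tokens have been predicted are all assumptions made for the example.

```python
import math
from typing import List, Sequence

PAD, START = "<pad>", "<s>"  # hypothetical token strings


def context_for(sentences: Sequence[str], index: int, n_context: int = 2) -> List[str]:
    """Return up to `n_context` sentences immediately preceding sentences[index]."""
    return list(sentences[max(0, index - n_context):index])


def target_window(predicted: Sequence[str], width: int = 3) -> List[str]:
    """Return the last `width` tokens of START + predicted, left-padded with PAD.

    Assumption: while fewer than `width` tokens exist, the window is filled
    with padding tokens on the left.
    """
    history = [START] + list(predicted)
    window = history[-width:]
    return [PAD] * (width - len(window)) + window


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


sentences = ["First sentence .", "Second sentence .", "Third sentence .", "Sentence to correct ."]
print(context_for(sentences, 3))  # the two sentences before index 3
print(target_window([]))          # padding tokens and the start token

# Toy target vocabulary and logits, for illustration only
vocab = ["the", "a", "an"]
probs = softmax([2.0, 1.0, 0.1])
print(vocab[probs.index(max(probs))])  # most probable next token
```

In practice, `CsgecPreprocessor` and `model.decode` handle these steps, so the sketch is only meant to make the input and output descriptions concrete.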
# Other Information
- **Original Code:** [link](https://github.com/nusnlp/crosentgec)