This model is primarily designed for the automatic grading of English essays, particularly those written by second language (L2) learners. The training dataset used is the English Language Learner Insight, Proficiency, and Skills Evaluation (ELLIPSE) Corpus. This freely available resource comprises approximately 6,500 writing composition samples from English language learners, each scored for overall holistic language proficiency as well as analytic scores pertaining to cohesion, syntax, vocabulary, phraseology, grammar, and conventions. The scores were obtained through assessments by a number of professional English teachers adhering to rigorous procedures. The training dataset guarantees that our model acuqires high practicality and accuracy, closely emulating professional grading standards.

The model's performance on the test dataset, which includes around 980 English essays, is summarized by the following metrics: 'mean accuracy'= 0.91 and 'mean f1 score' = 0.9, mean Quadratic Weighted Kappa (QWK) =0.85.

Upon inputting an essay, the model outputs six scores corresponding to cohesion, syntax, vocabulary, phraseology, grammar, and conventions. Each score ranges from 1 to 5, with higher scores indicating greater proficiency within the essay. These dimensions collectively assess the quality of the input essay from multiple perspectives. The model serves as a valuable tool for EFL teachers and researchers, and it is also beneficial for English L2 learners and parents for self-evaluating their composition skills.

Please cite the following paper if you use this model:

@article{sun2024automatic,
  title={Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression},
  author={Kun Sun and Rong Wang},
  year={2024},
  journal={ArXiv},
  url={https://arxiv.org/abs/2406.01198}
}

To test the model, run the following code or paste your essay into the API interface:

  1. Please use the following Python code if you want to get the ouput values ranging from 1 to 5.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("Kevintu/Engessay_grading_ML")
tokenizer = AutoTokenizer.from_pretrained("Kevintu/Engessay_grading_ML")

new_text = "The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus is a freely available corpus of ~6,500 ELL writing samples that have been scored for overall holistic language proficiency as well as analytic proficiency scores related to cohesion, syntax, vocabulary, phraseology, grammar, and conventions. In addition, the ELLIPSE corpus provides individual and demographic information for the ELL writers in the corpus including economic status, gender, grade level (8-12), and race/ethnicity. The corpus provides language proficiency scores for individual writers and was developed to advance research in corpus and NLP approaches to assess overall and more fine-grained features of proficiency."

# Define the path to your text file
#file_path = 'path/to/yourfile.txt'

# Read the content of the file
#with open(file_path, 'r', encoding='utf-8') as file:
#    new_text = file.read()

encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)
model.eval()

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)

predictions = outputs.logits.squeeze()

predicted_scores = predictions.numpy()  
item_names = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

# Scale predictions from the raw output to the range [1, 5]
scaled_scores = 1 + 4 * (predicted_scores - np.min(predicted_scores)) / (np.max(predicted_scores) - np.min(predicted_scores))

# Round scores to the nearest 0.5
rounded_scores = np.round(scaled_scores * 2) / 2

for item, score in zip(item_names, rounded_scores):
    print(f"{item}: {score:.1f}")

# Example output:
# cohesion: 3.5
# syntax: 3.5
# vocabulary: 4.0
# phraseology: 4.0
# grammar: 4.0
# conventions: 3.5
  1. However, implement the following code if you expect to obtain the output values between 1 to 10.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("Kevintu/Engessay_grading_ML")
tokenizer = AutoTokenizer.from_pretrained("Kevintu/Engessay_grading_ML")

new_text = "The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus is a freely available corpus of ~6,500 ELL writing samples that have been scored for overall holistic language proficiency as well as analytic proficiency scores related to cohesion, syntax, vocabulary, phraseology, grammar, and conventions. In addition, the ELLIPSE corpus provides individual and demographic information for the ELL writers in the corpus including economic status, gender, grade level (8-12), and race/ethnicity. The corpus provides language proficiency scores for individual writers and was developed to advance research in corpus and NLP approaches to assess overall and more fine-grained features of proficiency."
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

model.eval()
with torch.no_grad():
    outputs = model(**encoded_input)

predictions = outputs.logits.squeeze()
predicted_scores = predictions.numpy()  # Convert to numpy array
item_names = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

# Scale predictions from 1 to 10 and round to the nearest 0.5
scaled_scores = 2.25 * predicted_scores - 1.25
rounded_scores = [round(score * 2) / 2 for score in scaled_scores]  # Round to nearest 0.5

for item, score in zip(item_names, rounded_scores):
    print(f"{item}: {score:.1f}")

# Example output:
# cohesion: 6.5
# syntax: 7.0
# vocabulary: 7.5
# phraseology: 7.5
# grammar: 7.5
# conventions: 7.0

Examples:

# the first example (A1 level)

new_text ="Dear Mauro, Thank you for agreeing to take a care of my house and my pets in my absence. This is my daily routine. Every day I water the plants, I walk the my dog in the morning and in the evening. I feed food it twice a day, I check water's dog twice a week. I take out trash every Friday. I sweep the floor and clean house on Monday and on Wednesday. In your free time you can watch TV and play video games.  In the fridge I left coca cola and ice-cream for you  Have a nice week. "
##ouput
cohesion: 5.0
syntax: 5.0
vocabulary: 5.5
phraseology: 5.0
grammar: 5.0
conventions: 6.0


# the second example (C1 level)

new_text = " Dear Mr. Tromps It was so good to hear from you and your group of international buyers are visiting our company next month. And in response to your question, I would like to recommend some suggestions about business etiquette in my country. Firstly, you'll need to make hotel's reservations with anticipation, especially when the group is numerous. There are several five starts hotels in the commercial center of the Guayaquil city, very close to our offices. Business appointments well in advance and don't be late. Usually, at those meetings the persons exchange presentation cards. Some places include tipping by services in restaurant bills, but if any not the tip is 10% of the bill. The people is very kind here, surely you'll be invited to a meal at a house, you can take a small gift as flowers, candy or wine. Finally, remember it's a beautiful summer here, especially in our city is always warm, then you might include appropriate clothes for this weather. If you have any questions, please just let me know. Have you a nice and safe trip.  Sincerely,  JG Marketing Dpt. LP Representations."
##output:
cohesion: 8.0
syntax: 8.0
vocabulary: 8.0
phraseology: 8.5
grammar: 8.5
conventions: 8.5

Downloads last month
48
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.