willwade/t5-small-spoken-typo

@@ -72,7 +72,7 @@ print(decoded_output)
 # Model Details
 ## Model Description
-The `t5-small-spoken-typo` model is specifically designed to tackle the challenges of text correction within user-generated content, particularly in short, conversation-like sentences. It corrects for missing spaces, removes unnecessary punctuation, introduces and then corrects typos, and normalizes text by replacing informal contractions and abbreviations with their full forms.
 It has been trained on
 - [BNC 2014 Spoken](http://cass.lancs.ac.uk/cass-projects/spoken-bnc2014/)
 - [Daily Dialog](https://huggingface.co/datasets/daily_dialog)
@@ -90,7 +90,7 @@ Then injecting  typos from a range of places
   - Find these resources [here](https://www.dcs.bbk.ac.uk/~ROGER/corpora.html).
 - **TOEFL Spell** A dataset of Spelling Annotations for English language learner essays written for TOEFL exams.
   - Find this [here](https://github.com/EducationalTestingService/TOEFL-Spell/tree/master)
-- **Homonyms** We replace words in BNC and Dialy Dialog occasionally with homonyms from this list https://github.com/pimentel/homophones/
 - **Our own typo augment function** This generates errors that are likely on an English QWERTY layout, as well as substitutions, deletions, etc.
 We then add space-free (compressed) versions of the sentences, both correct and typo'd, to our dataset. (This addresses the problem that some people write without spaces.)
@@ -136,6 +136,14 @@ Users are encouraged to critically assess the model's output, especially when us
 # Training Details
 ## Training Data
 The model was trained on a curated subset of the DailyDialog and BNC corpora (2014 spoken), focusing on sentences 2-5 words in length, with manual introduction of typos and removal of spaces for robustness in text correction tasks. You can see the pre-processing code [here](https://github.com/willwade/dailyDialogCorrections/tree/main).
@@ -264,9 +272,6 @@ We hope to build on this by further fine-tuning in time on real corpous of indvi
 ## Model Architecture and Objective
 The model follows the T5 architecture, fine-tuned for the specific task of text correction with a focus on typo correction and space insertion.
-## Compute Infrastructure
-- **Hardware**: T4 GPU (Google Colab)
-- **Software**: PyTorch 1.8.1 with Transformers 4.8.2
 # Citation

 # Model Details
 ## Model Description
+The `t5-small-spoken-typo` model is specifically designed to tackle the challenges of text correction within user-generated content, particularly in short, conversation-like sentences. It corrects for missing spaces, removes unnecessary punctuation, corrects typos, and normalizes text by replacing informal contractions and abbreviations with their full forms.
 It has been trained on
 - [BNC 2014 Spoken](http://cass.lancs.ac.uk/cass-projects/spoken-bnc2014/)
 - [Daily Dialog](https://huggingface.co/datasets/daily_dialog)
   - Find these resources [here](https://www.dcs.bbk.ac.uk/~ROGER/corpora.html).
 - **TOEFL Spell** A dataset of Spelling Annotations for English language learner essays written for TOEFL exams.
   - Find this [here](https://github.com/EducationalTestingService/TOEFL-Spell/tree/master)
+- **Homonyms** We occasionally replace words in BNC and Daily Dialog with homonyms from this list: https://github.com/pimentel/homophones
 - **Our own typo augment function** This generates errors that are likely on an English QWERTY layout, as well as substitutions, deletions, etc.
 We then add space-free (compressed) versions of the sentences, both correct and typo'd, to our dataset. (This addresses the problem that some people write without spaces.)
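The augmentation described above might look roughly like the following. This is a minimal sketch and not the project's actual function: the names `QWERTY_NEIGHBOURS`, `add_typo`, `compress` and `make_pairs` are hypothetical, the adjacency map is deliberately partial, and the choice of one edit per sentence is an assumption.

```python
import random

# Hypothetical, partial QWERTY adjacency map (assumption: the real
# augment function's key map is not shown in this model card).
QWERTY_NEIGHBOURS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "t": "rfgy", "n": "bhjm", "s": "awedxz", "r": "edft",
}


def add_typo(sentence: str, rng: random.Random) -> str:
    """Apply one random character-level edit to the sentence."""
    if len(sentence) < 2:
        return sentence
    i = rng.randrange(len(sentence) - 1)
    op = rng.choice(["substitute", "delete", "transpose"])
    ch = sentence[i].lower()
    if op == "substitute" and ch in QWERTY_NEIGHBOURS:
        # Replace the character with a neighbouring key.
        return sentence[:i] + rng.choice(QWERTY_NEIGHBOURS[ch]) + sentence[i + 1:]
    if op == "delete":
        # Drop the character entirely.
        return sentence[:i] + sentence[i + 1:]
    # Swap two adjacent characters (also the fallback when the
    # character has no entry in the adjacency map).
    return sentence[:i] + sentence[i + 1] + sentence[i] + sentence[i + 2:]


def compress(sentence: str) -> str:
    """Remove all spaces, imitating users who type without them."""
    return sentence.replace(" ", "")


def make_pairs(sentence: str, rng: random.Random) -> list[tuple[str, str]]:
    """Build (noisy input, clean target) pairs for one clean sentence."""
    typod = add_typo(sentence, rng)
    return [
        (typod, sentence),            # typo'd version
        (compress(sentence), sentence),  # correct but space-free
        (compress(typod), sentence),     # typo'd and space-free
    ]


pairs = make_pairs("how are you", random.Random(0))
```

Each clean sentence yields three training pairs here, so the model sees typos and missing spaces both separately and combined; the actual ratio used for the dataset is not stated in the card.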
 # Training Details
+## System
+- System configuration: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
+- Runtime: Python 3.10.12
+- Hardware: NVIDIA A10 GPU with 24GB GDDR6 dedicated memory
+- CPU Cores: 30 logical cores @ 2.59GHz
+- Disk Space: Approximately 1.3TB
 ## Training Data
 The model was trained on a curated subset of the DailyDialog and BNC corpora (2014 spoken), focusing on sentences 2-5 words in length, with manual introduction of typos and removal of spaces for robustness in text correction tasks. You can see the pre-processing code [here](https://github.com/willwade/dailyDialogCorrections/tree/main).
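As an illustration of the 2-5 word length filter, a minimal sketch follows. It assumes simple whitespace tokenisation, and the function name `is_short_utterance` and the sample utterances are hypothetical; the linked pre-processing repository is the authoritative source for how the subset was actually built.

```python
def is_short_utterance(sentence: str, min_words: int = 2, max_words: int = 5) -> bool:
    """Return True if the sentence has between min_words and max_words words.

    Words are counted by splitting on whitespace (an assumption).
    """
    n_words = len(sentence.split())
    return min_words <= n_words <= max_words


# Hypothetical sample utterances, not taken from either corpus.
utterances = [
    "Hello",
    "How are you",
    "I am fine thank you",
    "This sentence has far too many words",
]
short = [u for u in utterances if is_short_utterance(u)]
```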
 ## Model Architecture and Objective
 The model follows the T5 architecture, fine-tuned for the specific task of text correction with a focus on typo correction and space insertion.
 # Citation