HelpMum-Personal committed
Commit aba0b46 · verified · 1 Parent(s): 9b954de

Update README.md

Files changed (1): README.md (+69 -19)
README.md CHANGED
@@ -1,35 +1,89 @@
---
library_name: transformers
license: mit
- base_model: HelpMum-Personal/9ja-to-eng
tags:
- translation
- generated_from_trainer
model-index:
- - name: 9ja-to-eng2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

- # 9ja-to-eng2

- This model is a fine-tuned version of [HelpMum-Personal/9ja-to-eng](https://huggingface.co/HelpMum-Personal/9ja-to-eng) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

### Training hyperparameters
 
@@ -42,14 +96,10 @@ The following hyperparameters were used during training:
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
-
- ### Training results
-
-
-
### Framework versions

- Transformers 4.44.2
- - Pytorch 2.4.1+cu121
- - Datasets 3.0.0
- - Tokenizers 0.19.1
 
---
library_name: transformers
license: mit
+ base_model: facebook/m2m100_418M
tags:
- translation
- generated_from_trainer
model-index:
+ - name: m2m100_418M-nig-en
  results: []
+ language:
+ - yo
+ - ig
+ - ha
+ pipeline_tag: translation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

+ # AI-translator-9ja-to-eng

+ This model is a 418-million-parameter translation model built for translating Yoruba, Igbo, and Hausa into English. It was trained on a dataset of 1,500,000 sentence pairs (500,000 per language) to provide high-quality translations for these languages.
+ It was built to make it easier to communicate with LLMs in Igbo, Hausa, and Yoruba.

+ ## Model Details

+ - **Languages Supported**:
+   - Source Languages: Yoruba, Igbo, Hausa
+   - Target Language: English

+ ### Model Usage

+ To use this model for translation tasks, you can load it from Hugging Face's `transformers` library:

+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ # Load the fine-tuned model and tokenizer
+ model = M2M100ForConditionalGeneration.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+ tokenizer = M2M100Tokenizer.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+
+ # Translate Igbo to English
+ igbo_text = "Nlekọta ahụike bụ mpaghara dị mkpa n'ihe fọrọ nke nta ka ọ bụrụ obodo ọ bụla n'ihi na ọ na-emetụta ọdịmma na ịdịmma ndụ nke ndị mmadụ n'otu n'otu. Ọ gụnyere ọtụtụ ọrụ na ọrụ dị iche iche, gụnyere nlekọta mgbochi, nchoputa, ọgwụgwọ na njikwa ọrịa na ọnọdụ. Usoro nlekọta ahụike dị mma na-achọ imeziwanye nsonaazụ ahụike, belata ọrịa ọrịa, yana hụ na ndị mmadụ n'otu n'otu nwere ohere ịnweta ọrụ ahụike dị mkpa."
+ tokenizer.src_lang = "ig"
+ tokenizer.tgt_lang = "en"
+ encoded_ig = tokenizer(igbo_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_ig, forced_bos_token_id=tokenizer.get_lang_id("en"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+
+ # Translate Yoruba to English
+ yoruba_text = "Itọju ilera jẹ aaye pataki ni o fẹrẹ to gbogbo awujọ nitori pe o taara ni ilera ati didara igbesi aye eniyan kọọkan. O ni awọn iṣẹ lọpọlọpọ ati awọn oojọ, pẹlu itọju idena, iwadii aisan, itọju, ati iṣakoso awọn arun ati awọn ipo. Awọn eto ilera ti o munadoko ṣe ifọkansi lati ni ilọsiwaju awọn abajade ilera, dinku iṣẹlẹ ti aisan, ati rii daju pe awọn eniyan kọọkan ni iraye si awọn iṣẹ iṣoogun pataki."
+ tokenizer.src_lang = "yo"
+ tokenizer.tgt_lang = "en"
+ encoded_yo = tokenizer(yoruba_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_yo, forced_bos_token_id=tokenizer.get_lang_id("en"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+
+ # Translate Hausa to English
+ hausa_text = "Kiwon lafiya fage ne mai mahimmanci a kusan kowace al'umma domin yana shafar jin daɗi da ingancin rayuwar ɗaiɗaikun kai tsaye. Ya ƙunshi nau'ikan ayyuka da sana'o'i, gami da kulawa na rigakafi, ganewar asali, jiyya, da kula da cututtuka da yanayi. Ingantattun tsarin kiwon lafiya na nufin inganta sakamakon kiwon lafiya, rage yawan kamuwa da cututtuka, da kuma tabbatar da cewa mutane sun sami damar yin amfani da ayyukan likita masu mahimmanci."
+ tokenizer.src_lang = "ha"
+ tokenizer.tgt_lang = "en"
+ encoded_ha = tokenizer(hausa_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_ha, forced_bos_token_id=tokenizer.get_lang_id("en"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+ ```
+
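+ The card's stated goal is to make it easier to communicate with LLMs in these languages; a minimal sketch of that flow is below, in which a prompt is first translated into English with this model and the English text is then handed to an English-capable LLM. The `ask_llm` helper is a hypothetical placeholder, not part of this repository or of `transformers`.
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ model = M2M100ForConditionalGeneration.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+ tokenizer = M2M100Tokenizer.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+
+ def to_english(text, src_lang):
+     """Translate Yoruba ("yo"), Igbo ("ig"), or Hausa ("ha") text into English."""
+     tokenizer.src_lang = src_lang
+     encoded = tokenizer(text, return_tensors="pt")
+     generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
+     return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
+
+ def ask_llm(prompt_en):
+     # Hypothetical placeholder: swap in whatever chat/completion client you use.
+     raise NotImplementedError
+
+ yoruba_prompt = "Báwo ni?"  # Yoruba greeting, roughly "How are you?"
+ english_prompt = to_english(yoruba_prompt, src_lang="yo")
+ print(english_prompt)
+ # answer = ask_llm(english_prompt)
+ ```
+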
+ ### Supported Language Codes
+ - **English**: `en`
+ - **Yoruba**: `yo`
+ - **Igbo**: `ig`
+ - **Hausa**: `ha`
+
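+ The same checkpoint and language codes should also work with the high-level `pipeline` API; a minimal sketch (the example sentence is illustrative):
+
+ ```python
+ from transformers import pipeline
+
+ # Translation pipeline backed by the fine-tuned checkpoint.
+ # src_lang / tgt_lang take the language codes listed above.
+ translator = pipeline(
+     "translation",
+     model="HelpMum-Personal/AI-translator-9ja-to-eng",
+     src_lang="ig",
+     tgt_lang="en",
+ )
+
+ print(translator("Kedu?"))  # Igbo greeting, roughly "How are you?"
+ ```
+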
+ ### Training Dataset

+ The training dataset consists of 1,500,000 translation pairs, sourced from a combination of open-source parallel corpora and curated datasets specific to Yoruba, Igbo, and Hausa.

+ ## Limitations

+ - While the model performs well across Yoruba, Igbo, and Hausa to English translations, performance may vary depending on the complexity and domain of the text.
+ - Translation quality may decrease for extremely long sentences or ambiguous contexts; splitting long passages into sentences before translating can help (see the sketch below).
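+
+ A rough sketch of that sentence-by-sentence approach is below; the naive period-based splitter is an assumption, and a proper sentence segmenter may work better.
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ model = M2M100ForConditionalGeneration.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+ tokenizer = M2M100Tokenizer.from_pretrained("HelpMum-Personal/AI-translator-9ja-to-eng")
+
+ def translate_long_text(text, src_lang):
+     """Translate a long passage sentence by sentence to keep each input short."""
+     tokenizer.src_lang = src_lang
+     sentences = [s.strip() for s in text.split(".") if s.strip()]  # naive splitter
+     translations = []
+     for sentence in sentences:
+         encoded = tokenizer(sentence, return_tensors="pt")
+         generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("en"))
+         translations.append(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
+     return " ".join(translations)
+ ```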

### Training hyperparameters

- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
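
For reference, the listed values map onto `Seq2SeqTrainingArguments` roughly as follows; `output_dir` and any hyperparameters not shown above (learning rate, batch sizes, etc.) are assumptions or library defaults.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of training arguments matching the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="m2m100_418M-nig-en",  # assumed output directory name
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,  # mixed_precision_training: Native AMP
)
```
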
### Framework versions

- Transformers 4.44.2
+ - Pytorch 2.4.0+cu121
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1