---
language:
- en
license: gpl
tags:
- autograding
- essay question
- sentence similarity
metrics:
- accuracy
library_name: peft
datasets:
- mohamedemam/Essay-quetions-auto-grading
---

# Model Card for EM2

A fine-tuned version of Mistral for automatic essay-question grading, trained on the mohamedemam/Essay-quetions-auto-grading dataset.



### Model Description

<!-- Provide a longer summary of what this model is. -->

We are thrilled to introduce our graduation project, the EM2 model, designed for automated essay grading in both Arabic and English. 📝✨

To develop this model, we first created a custom dataset for training. We adapted the QuAC and OpenOrca datasets to make them suitable for our automated essay grading application.

Our project fine-tuned the following base models, with the accuracies shown:

- Mistral: 96%
- LLaMA: 93%
- FLAN-T5: 93%
- BLOOMZ (Arabic): 86%
- MT0 (Arabic): 84%

You can try our models for auto-grading on Hugging Face! 🌐

We then deployed these models for practical use. We are proud of our team's hard work and the potential impact of the EM2 model in the field of education. 🌟

#MachineLearning #AI #Education #EssayGrading #GraduationProject

- **Developed by:** Mohamed Emam
- **Model type:** decoder-only
- **Language(s) (NLP):** English
- **License:** GPL
- **Finetuned from model:** Mistral


<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/mohamed-em2m/Automatic-Grading-AI
### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Automatic grading of essay questions.

### How It Works
- The model takes three inputs: the context (or a reference "perfect" answer), a question about that context, and the student's answer. It then outputs whether the student's answer is true or false; a filled-in example prompt is shown below the figure.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6456f2eca9b8e1fd4cbe5ebe/_O75HT2zb2TYZOEkX4YXO.png)
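
For illustration, the snippet below fills the model's prompt template with a toy example. The context, question, and answer here are made up; the template text is copied verbatim from the pipeline code further down.

```python
# Illustrative only: a toy context/question/answer, formatted with the exact
# prompt template used by the pipeline below (template text copied verbatim).
context = "Water boils at 100 degrees Celsius at sea level."
question = "At what temperature does water boil at sea level?"
answer = "It boils at 100 C."

prompt = ("Instruction:/n check answer is true or false of next quetion using context below:\n"
          "#context: " + context + ".\n#quetion: " + question +
          ".\n#student answer: " + answer + ".\n#response:")
print(prompt)
```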

### Training Data
- **mohamedemam/Essay-quetions-auto-grading-arabic**


### Training Procedure

The model was fine-tuned with the TRL library. The settings used are listed under Configuration below, and a training sketch based on those settings follows that section.
### Pipeline
```python
import torch.nn.functional as F


class MyPipeline:
    """Minimal wrapper that formats the prompt, optionally generates a textual
    verdict, and returns the probability that the student answer is correct."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def chat_Format(self, context, quetion, answer):
        # NOTE: the template wording (including the literal "/n" and the
        # spelling "quetion") is kept exactly as used during fine-tuning.
        return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
                "#context: " + context + ".\n#quetion: " + quetion +
                ".\n#student answer: " + answer + ".\n#response:")

    def __call__(self, context, quetion, answer, generate=1, max_new_tokens=4,
                 num_beams=2, do_sample=False, num_return_sequences=1):
        inp = self.chat_Format(context, quetion, answer)
        w = self.tokenizer(inp, add_special_tokens=True,
                           return_attention_mask=True,
                           return_tensors='pt')

        response = ""
        if generate:
            # Decode the model's textual verdict (e.g. "true ..." / "false ...").
            outputs = self.tokenizer.batch_decode(
                self.model.generate(input_ids=w['input_ids'].cuda(),
                                    attention_mask=w['attention_mask'].cuda(),
                                    max_new_tokens=max_new_tokens,
                                    num_beams=num_beams,
                                    do_sample=do_sample,
                                    num_return_sequences=num_return_sequences),
                skip_special_tokens=True)
            response = outputs

        # Probability distribution over the first token generated after "#response:".
        s = self.model(input_ids=w['input_ids'].cuda(),
                       attention_mask=w['attention_mask'].cuda())['logits'][0][-1]
        s = F.softmax(s, dim=-1)
        yes_token_id = self.tokenizer.convert_tokens_to_ids(self.tokenizer.tokenize("True")[0])
        no_token_id = self.tokenizer.convert_tokens_to_ids(self.tokenizer.tokenize("False")[0])

        # Pool the probability mass of affirmative and negative token variants.
        for i in ["Yes", "yes", "True", "true", "صحيح"]:
            for word in self.tokenizer.tokenize(i):
                s[yes_token_id] += s[self.tokenizer.convert_tokens_to_ids(word)]
        for i in ["No", "no", "False", "false", "خطأ"]:
            for word in self.tokenizer.tokenize(i):
                s[no_token_id] += s[self.tokenizer.convert_tokens_to_ids(word)]

        true = (s[yes_token_id] / (s[no_token_id] + s[yes_token_id])).item()
        return {"response": response, "true": true}
context="""Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:

    Model Selection: Choosing a pre-trained model that fits the application's needs.
    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.
    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.
    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.
    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed.""" 
quetion="What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?"
answer="""When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:
    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.
    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.
    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.
    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.
    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.
    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.

By evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model."""
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LoRA adapter on top of the base model, then move it to the GPU
# (the pipeline above sends its inputs to CUDA).
config = PeftConfig.from_pretrained("mohamedemam/Em2-llama-7b")
base_model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "mohamedemam/Em2-llama-7b").cuda()
tokenizer = AutoTokenizer.from_pretrained("mohamedemam/Em2-llama-7b", trust_remote_code=True)

pipe = MyPipeline(model, tokenizer)
print(pipe(context, quetion, answer, generate=True, max_new_tokens=4,
           num_beams=2, do_sample=False, num_return_sequences=1))

```
- **output:**{'response': ["Instruction:/n check answer is true or false of next quetion using context below:\n#context: Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:\n\n    Model Selection: Choosing a pre-trained model that fits the application's needs.\n    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.\n    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.\n    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.\n    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed..\n#quetion: What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?.\n#student answer: When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:\n    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.\n    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.\n    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.\n    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.\n    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.\n    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.\n\nBy evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model..\n#response:  true the answer is"], 'true': 0.943033754825592}
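
The `true` field is the pooled probability that the student's answer is correct. How this score maps to a grade is application-specific; continuing from the example above, the snippet below shows one simple possibility, where the 0.5 cut-off is an arbitrary choice and not part of the model:

```python
# Run without text generation and threshold the pooled probability.
result = pipe(context, quetion, answer, generate=False)
grade = "correct" if result["true"] >= 0.5 else "incorrect"  # 0.5 is an arbitrary cut-off
print(grade, round(result["true"], 3))
```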

### Chat Format Function
This function builds the prompt from the context, question, and student answer. The template wording (including the literal "/n" and the spelling "quetion") matches the pipeline implementation above, which is the format the model expects.

```python
def chat_Format(self, context, question, answer):
    # The template text must match the fine-tuning format exactly.
    return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
            "#context: " + context + ".\n#quetion: " + question +
            ".\n#student answer: " + answer + ".\n#response:")
```


## Configuration

### Dropout Probability for LoRA Layers
- **lora_dropout:** 0.05

### Quantization Settings
- **use_4bit:** True
- **bnb_4bit_compute_dtype:** "float16"
- **bnb_4bit_quant_type:** "nf4"
- **use_nested_quant:** False

### Output Directory
- **output_dir:** "./results"

### Training Parameters
- **num_train_epochs:** 1
- **fp16:** False
- **bf16:** False
- **per_device_train_batch_size:** 1
- **per_device_eval_batch_size:** 4
- **gradient_accumulation_steps:** 8
- **gradient_checkpointing:** True
- **max_grad_norm:** 0.3
- **learning_rate:** 5e-5
- **weight_decay:** 0.001
- **optim:** "paged_adamw_8bit"
- **lr_scheduler_type:** "constant"
- **max_steps:** -1
- **warmup_ratio:** 0.03
- **group_by_length:** True

### Logging and Saving
- **save_steps:** 100
- **logging_steps:** 25
- **max_seq_length:** False
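
For reference, below is a minimal sketch of how the settings above map onto a PEFT + bitsandbytes + TRL fine-tuning script. The LoRA rank and alpha, the dataset text field name, and the exact base checkpoint are assumptions (they are not stated on this card); everything else mirrors the values listed above. TRL's API has changed across versions, so treat this as illustrative rather than a verbatim copy of the original training code.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

base_model_name = "mistralai/Mistral-7B-v0.1"  # assumption: exact base checkpoint not stated on the card
dataset = load_dataset("mohamedemam/Essay-quetions-auto-grading", split="train")

# Quantization settings from the card: 4-bit NF4, float16 compute, no nested quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(base_model_name, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA config: only the dropout (0.05) is given on the card; r and alpha are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments taken from the card.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    learning_rate=5e-5,
    weight_decay=0.001,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    group_by_length=True,
    fp16=False,
    bf16=False,
    save_steps=100,
    logging_steps=25,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumption: name of the column holding the formatted prompts
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```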