---
language:
- en
license: gpl
tags:
- autograding
- essay question
- sentence similarity
metrics:
- accuracy
library_name: peft
datasets:
- mohamedemam/Essay-quetions-auto-grading
---

# Model Card for Em2-llama-7b

A fine-tuned version of LLaMA on the Essay-quetions-auto-grading dataset.
This model supports auto-grading in English only.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

We are thrilled to introduce our graduation project, the EM2 model, designed for automated essay grading in both Arabic and English. 📝✨

To develop this model, we first created a custom dataset for training. We adapted the QuAC and OpenOrca datasets to make them suitable for our automated essay grading application.

We fine-tuned the following models, which reach these accuracies on our grading task:

- Mistral: 96%
- LLaMA: 93%
- FLAN-T5: 93%
- BLOOMZ (Arabic): 86%
- MT0 (Arabic): 84%

You can try our models for auto-grading on Hugging Face! 🌐

We then deployed these models for practical use. We are proud of our team's hard work and the potential impact of the EM2 model in the field of education. 🌟

#MachineLearning #AI #Education #EssayGrading #GraduationProject

- **Developed by:** Mohamed Emam
- **Model type:** decoder-only
- **Language(s) (NLP):** English
- **License:** GPL
- **Finetuned from model:** LLaMA (NousResearch/Llama-2-7b-hf)


<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/mohamed-em2m/Automatic-Grading-AI
### Direct Use

Automatic grading of essay questions.

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

Text generation

### How it works
- The model takes three inputs: the context (or reference answer), a question on that context, and the student answer.
- It then outputs whether the student answer is correct.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6456f2eca9b8e1fd4cbe5ebe/EcEyKSJsYsO4ooxuEMeO-.png)

### Training Data
- **mohamedemam/Essay-quetions-auto-grading-arabic**


### Training Procedure

Fine-tuned with the TRL library (see the configuration section below and the training sketch at the end of this card).
### Pipeline
```python
import torch.nn.functional as F


class MyPipeline:
    """Minimal grading pipeline: builds the prompt, optionally generates a
    textual verdict, and returns the probability that the answer is correct."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def chat_Format(self, context, quetion, answer):
        # Prompt template used during fine-tuning (spelling and "/n" kept verbatim,
        # since the model was trained on this exact format).
        return ("Instruction:/n check answer is true or false of next quetion using context below:\n"
                + "#context: " + context + ".\n#quetion: " + quetion
                + ".\n#student answer: " + answer + ".\n#response: ")

    def __call__(self, context, quetion, answer, generate=1, max_new_tokens=4,
                 num_beams=2, do_sample=False, num_return_sequences=1):
        inp = self.chat_Format(context, quetion, answer)
        w = self.tokenizer(inp, add_special_tokens=True,
                           return_attention_mask=True, return_tensors='pt')

        response = ""
        if generate:
            # Optionally decode the model's textual verdict.
            outputs = self.tokenizer.batch_decode(
                self.model.generate(input_ids=w['input_ids'].cuda(),
                                    attention_mask=w['attention_mask'].cuda(),
                                    max_new_tokens=max_new_tokens,
                                    num_beams=num_beams,
                                    do_sample=do_sample,
                                    num_return_sequences=num_return_sequences),
                skip_special_tokens=True)
            response = outputs

        # Score the answer from the logits of the last prompt token.
        s = self.model(input_ids=w['input_ids'].cuda(),
                       attention_mask=w['attention_mask'].cuda())['logits'][0][-1]
        s = F.softmax(s, dim=-1)
        yes_token_id = self.tokenizer.convert_tokens_to_ids(self.tokenizer.tokenize("True")[0])
        no_token_id = self.tokenizer.convert_tokens_to_ids(self.tokenizer.tokenize("False")[0])

        # Pool probability mass over "yes"-like and "no"-like surface forms
        # (English and Arabic) before normalizing.
        for i in ["Yes", "yes", "True", "true", "صحيح"]:
            for word in self.tokenizer.tokenize(i):
                s[yes_token_id] += s[self.tokenizer.convert_tokens_to_ids(word)]
        for i in ["No", "no", "False", "false", "خطأ"]:
            for word in self.tokenizer.tokenize(i):
                s[no_token_id] += s[self.tokenizer.convert_tokens_to_ids(word)]

        true = (s[yes_token_id] / (s[no_token_id] + s[yes_token_id])).item()
        return {"response": response, "true": true}


context = """Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:

    Model Selection: Choosing a pre-trained model that fits the application's needs.
    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.
    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.
    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.
    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed."""
quetion = "What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?"
answer = """When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:
    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.
    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.
    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.
    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.
    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.
    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.

By evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model."""

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LoRA adapter on top of the Llama-2-7b base model.
# device_map="auto" places the model on the GPU, since the pipeline moves its inputs to CUDA.
config = PeftConfig.from_pretrained("mohamedemam/Em2-llama-7b")
base_model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf", device_map="auto")
model = PeftModel.from_pretrained(base_model, "mohamedemam/Em2-llama-7b")
tokenizer = AutoTokenizer.from_pretrained("mohamedemam/Em2-llama-7b", trust_remote_code=True)

pipe = MyPipeline(model, tokenizer)
print(pipe(context, quetion, answer, generate=True, max_new_tokens=4,
           num_beams=2, do_sample=False, num_return_sequences=1))
```
- **output:**{'response': ["Instruction:/n check answer is true or false of next quetion using context below:\n#context: Large language models, such as GPT-4, are trained on vast amounts of text data to understand and generate human-like text. The deployment of these models involves several steps:\n\n    Model Selection: Choosing a pre-trained model that fits the application's needs.\n    Infrastructure Setup: Setting up the necessary hardware and software infrastructure to run the model efficiently, including cloud services, GPUs, and necessary libraries.\n    Integration: Integrating the model into an application, which can involve setting up APIs or embedding the model directly into the software.\n    Optimization: Fine-tuning the model for specific tasks or domains and optimizing it for performance and cost-efficiency.\n    Monitoring and Maintenance: Ensuring the model performs well over time, monitoring for biases, and updating the model as needed..\n#quetion: What are the key considerations when choosing a cloud service provider for deploying a large language model like GPT-4?.\n#student answer: When choosing a cloud service provider for deploying a large language model like GPT-4, the key considerations include:\n    Compute Power: Ensure the provider offers high-performance GPUs or TPUs capable of handling the computational requirements of the model.\n    Scalability: The ability to scale resources up or down based on the application's demand to handle varying workloads efficiently.\n    Cost: Analyze the pricing models to understand the costs associated with compute time, storage, data transfer, and any other services.\n    Integration and Support: Availability of tools and libraries that support easy integration of the model into your applications, along with robust technical support and documentation.\n    Security and Compliance: Ensure the provider adheres to industry standards for security and compliance, protecting sensitive data and maintaining privacy.\n    Latency and Availability: Consider the geographical distribution of data centers to ensure low latency and high availability for your end-users.\n\nBy evaluating these factors, you can select a cloud service provider that aligns with your deployment needs, ensuring efficient and cost-effective operation of your large language model..\n#response:  true the answer is"], 'true': 0.943033754825592}
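
The `true` field is a probability between 0 and 1 that the student answer is correct. As a minimal sketch (the 0.5 cut-off below is an assumption for illustration, not something prescribed by the model), it can be turned into a binary grade:

```python
# Re-uses pipe, context, quetion, and answer from the example above.
result = pipe(context, quetion, answer, generate=False)

# Assumed decision rule: treat probabilities >= 0.5 as a correct answer.
grade = "correct" if result["true"] >= 0.5 else "incorrect"
print(grade, result["true"])
```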

### Chat Format Function
This function formats the input context, question, and answer into a specific structure for the model to process.

```python
def chat_Format(self, context, question, answer):
    return "Instruction:/n check answer is true or false of next question using context below:\n" + "#context: " + context + f".\n#question: " + question + f".\n#student answer: " + answer + ".\n#response: "
```
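
As an illustration, and assuming the `pipe` object from the pipeline code above, the assembled prompt can be inspected directly (the inputs below are made-up toy values):

```python
# Toy inputs, purely illustrative.
toy_context = "Water boils at 100 degrees Celsius at sea level."
toy_question = "At what temperature does water boil at sea level?"
toy_answer = "At 100 degrees Celsius."

# Prints the exact prompt string that is fed to the model.
print(pipe.chat_Format(toy_context, toy_question, toy_answer))
```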


## Configuration

### Dropout Probability for LoRA Layers
- **lora_dropout:** 0.05

### Quantization Settings
- **use_4bit:** True
- **bnb_4bit_compute_dtype:** "float16"
- **bnb_4bit_quant_type:** "nf4"
- **use_nested_quant:** False

### Output Directory
- **output_dir:** "./results"

### Training Parameters
- **num_train_epochs:** 1
- **fp16:** False
- **bf16:** False
- **per_device_train_batch_size:** 1
- **per_device_eval_batch_size:** 4
- **gradient_accumulation_steps:** 8
- **gradient_checkpointing:** True
- **max_grad_norm:** 0.3
- **learning_rate:** 5e-5
- **weight_decay:** 0.001
- **optim:** "paged_adamw_8bit"
- **lr_scheduler_type:** "constant"
- **max_steps:** -1
- **warmup_ratio:** 0.03
- **group_by_length:** True

### Logging and Saving
- **save_steps:** 100
- **logging_steps:** 25
- **max_seq_length:** False
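
For reference, the values above correspond to a standard QLoRA fine-tuning run with TRL. The sketch below shows how they could be wired together; it is not the exact training script. The LoRA rank and alpha, the dataset text column, and `device_map` are assumptions not listed in this card, and `SFTTrainer` argument names vary between TRL versions.

```python
# Minimal QLoRA fine-tuning sketch using the hyperparameters listed above.
# Assumes transformers, peft, trl, bitsandbytes, and datasets are installed.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

# Quantization settings from this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit
    bnb_4bit_compute_dtype=torch.float16,   # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,        # use_nested_quant
)

base_model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",                      # assumption: automatic GPU placement
)
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token   # Llama-2 has no pad token by default

# LoRA settings from this card; rank and alpha are not listed, so the values
# here are placeholders.
peft_config = LoraConfig(
    lora_dropout=0.05,
    r=16,                                   # assumption
    lora_alpha=32,                          # assumption
    task_type="CAUSAL_LM",
)

# Training parameters from this card.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    fp16=False,
    bf16=False,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=5e-5,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    lr_scheduler_type="constant",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    save_steps=100,
    logging_steps=25,
)

dataset = load_dataset("mohamedemam/Essay-quetions-auto-grading", split="train")

trainer = SFTTrainer(
    model=base_model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",              # assumption: adjust to the dataset schema
    # max_seq_length is listed as "False" in this card, i.e. left unset here.
)
trainer.train()
```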