Waris01 committed on
Commit 6a81eb9 · verified · 1 Parent(s): 51cb7f4

Updated by Author

Files changed (1)
  1. README.md +75 -0
README.md CHANGED
@@ -210,6 +210,81 @@ The training utilized **Google Colab GPUs, which provided the necessary computat
  The training process was carried out using **PyTorch** as the primary framework, leveraging libraries such as **Hugging Face Transformers** for model implementation and training.
 
 
+ ## ROUGE Evaluation
+
+ To evaluate the quality of the generated summaries, we used ROUGE (Recall-Oriented Understudy for Gisting Evaluation), a family of metrics that measures n-gram and longest-common-subsequence overlap between a generated summary and a reference summary. Each metric is reported as precision, recall, and F1.
+
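To make the overlap computation concrete, here is a minimal sketch of ROUGE-N written in plain Python. It is not the `rouge_score` implementation: it assumes simple lowercase alphanumeric tokenization and applies no stemming, and the helper names (`tokenize`, `ngram_counts`, `rouge_n`) are illustrative only. The sentence pair is one of the pairs evaluated in this section.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric runs are tokens; no stemming here.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, generated, n=1):
    ref = ngram_counts(tokenize(reference), n)
    gen = ngram_counts(tokenize(generated), n)
    # Clipped overlap: an n-gram counts at most as often as it occurs in both texts.
    overlap = sum((ref & gen).values())
    precision = overlap / max(sum(gen.values()), 1)  # share of generated n-grams matched
    recall = overlap / max(sum(ref.values()), 1)     # share of reference n-grams matched
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "Algorithms analyze market trends and help in fraud detection."
generated = "In finance, algorithms analyze market trends and assist in fraud detection."

for n in (1, 2):
    p, r, f = rouge_n(reference, generated, n)
    print(f"ROUGE-{n}: precision={p:.2%} recall={r:.2%} f1={f:.2%}")
```

For this pair, 8 of the 11 generated unigrams and 6 of the 10 generated bigrams also occur in the reference, which is where the precision values come from; recall divides the same overlap by the reference's 9 unigrams and 8 bigrams.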
+ ### Evaluation Code
+
+ We used the `rouge_score` library to compute ROUGE-1, ROUGE-2, and ROUGE-L for each reference/generated pair:
+
+ ```python
+ from rouge_score import rouge_scorer
+
+ # Reference summaries, paired by index with the generated summaries below.
+ reference_summaries = [
+     "AI systems in healthcare improve diagnostics and personalize treatments.",
+     "Algorithms analyze market trends and help in fraud detection.",
+ ]
+
+ generated_summaries = [
+     "In healthcare, AI systems are used for predictive analytics and improving diagnostics.",
+     "In finance, algorithms analyze market trends and assist in fraud detection.",
+ ]
+
+ # Score unigram (rouge1), bigram (rouge2), and longest-common-subsequence
+ # (rougeL) overlap; use_stemmer=True stems tokens before matching.
+ scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
+
+ for reference, generated in zip(reference_summaries, generated_summaries):
+     # score() takes the target (reference) first, then the prediction.
+     scores = scorer.score(reference, generated)
+     print(f"Reference: {reference}")
+     print(f"Generated: {generated}")
+     print(f"ROUGE Scores: {scores}\n")
+ ```
+
+ ### ROUGE Scores
+
+ #### Summary 1
+ - **Reference**: "AI systems in healthcare improve diagnostics and personalize treatments."
+ - **Generated**: "In healthcare, AI systems are used for predictive analytics and improving diagnostics."
+
+ **ROUGE-1**:
+ - Precision: 58.33%
+ - Recall: 77.78%
+ - F1-Score: 66.67%
+
+ After stemming, seven of the reference's nine unigrams appear in the generated summary, so most of the reference vocabulary is covered. Precision is lower because the 12-word generated summary adds content that the reference lacks.
+
+ **ROUGE-2**:
+ - Precision: 27.27%
+ - Recall: 37.50%
+ - F1-Score: 31.58%
+
+ Only three bigrams are shared, because the generated summary reorders and rephrases the reference rather than reusing its phrases.
+
+ **ROUGE-L**:
+ - Precision: 33.33%
+ - Recall: 44.44%
+ - F1-Score: 38.10%
+
+ The longest common subsequence is four words long, which shows that the generated summary does not follow the reference's word order closely.
+
+ #### Summary 2
+ - **Reference**: "Algorithms analyze market trends and help in fraud detection."
+ - **Generated**: "In finance, algorithms analyze market trends and assist in fraud detection."
+
+ **ROUGE-1**:
+ - Precision: 72.73%
+ - Recall: 88.89%
+ - F1-Score: 80.00%
+
+ **ROUGE-2**:
+ - Precision: 60.00%
+ - Recall: 75.00%
+ - F1-Score: 66.67%
+
+ **ROUGE-L**:
+ - Precision: 72.73%
+ - Recall: 88.89%
+ - F1-Score: 80.00%
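
The ROUGE-L figures for Summary 2 can be reproduced by hand from the longest common subsequence (LCS) of the two token sequences. The following is a rough sketch rather than the `rouge_score` implementation: it assumes lowercase alphanumeric tokenization with no stemming, and the names `tokenize`, `lcs_length`, and `rouge_l` are hypothetical helpers introduced for illustration.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphanumeric runs are tokens; no stemming here.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    # Dynamic-programming LCS length, keeping only the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, generated):
    ref, gen = tokenize(reference), tokenize(generated)
    lcs = lcs_length(ref, gen)
    precision = lcs / max(len(gen), 1)  # LCS length over generated length
    recall = lcs / max(len(ref), 1)     # LCS length over reference length
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return lcs, precision, recall, f1


reference = "Algorithms analyze market trends and help in fraud detection."
generated = "In finance, algorithms analyze market trends and assist in fraud detection."

lcs, p, r, f = rouge_l(reference, generated)
print(f"LCS={lcs} precision={p:.2%} recall={r:.2%} f1={f:.2%}")
```

Here the LCS is "algorithms analyze market trends and in fraud detection" (8 tokens), measured against 11 generated tokens and 9 reference tokens.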
 
  ## Glossary [optional]