zhuchi76 committed
Commit 4b1f2c1 · verified · 1 Parent(s): 0a94565

Update README.md

Files changed (1)
  1. README.md +55 -3
README.md CHANGED
@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # bert-finetuned-sst2
 
- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the SST-2 dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6081
 - Accuracy: 0.64
@@ -30,9 +30,61 @@ More information needed
 
 ## Training and evaluation data
 
- More information needed
+ [SST-2 dataset](https://huggingface.co/datasets/gimmaru/glue-sst2)
+ We randomly select 100 training examples and 100 evaluation examples.
+
+ ## How to use
+
+ ```
+ import numpy as np
+
+ import evaluate
+ from datasets import load_dataset
+ from transformers import (
+     AutoModelForSequenceClassification,
+     AutoTokenizer,
+     DataCollatorWithPadding,
+     Trainer,
+     TrainingArguments,
+ )
+
+ # Load SST-2 and tokenize it with the fine-tuned checkpoint's tokenizer
+ raw_datasets = load_dataset("glue", "sst2")
+ checkpoint = "zhuchi76/bert-finetuned-sst2"
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+
+
+ def tokenize_function(example):
+     return tokenizer(example["sentence"], truncation=True)
+
+
+ tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
+
+ # Randomly select 100 training and 100 evaluation examples
+ small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(100))
+ small_eval_dataset = tokenized_datasets["validation"].shuffle(seed=42).select(range(100))
+
+ data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
+
+ training_args = TrainingArguments(
+     output_dir="bert-finetuned-sst2",
+     evaluation_strategy="epoch",
+     hub_model_id="zhuchi76/bert-finetuned-sst2",
+ )
+
+ model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
+
+ # GLUE SST-2 metric (accuracy)
+ metric = evaluate.load("glue", "sst2")
+
+
+ def compute_metrics(eval_preds):
+     logits, labels = eval_preds
+     preds = np.argmax(logits, axis=-1)
+     return metric.compute(predictions=preds, references=labels)
+
+
+ trainer = Trainer(
+     model,
+     training_args,
+     train_dataset=small_train_dataset,  # the small subsets keep this runnable on cpu
+     eval_dataset=small_eval_dataset,
+     data_collator=data_collator,
+     tokenizer=tokenizer,
+     compute_metrics=compute_metrics,
+ )
+
+ # Evaluation
+ predictions = trainer.predict(small_eval_dataset)
+ print(predictions.predictions.shape, predictions.label_ids.shape)
+ preds = np.argmax(predictions.predictions, axis=-1)
+ metric.compute(predictions=preds, references=predictions.label_ids)
+ ```
 
- ## Training procedure
 
 ### Training hyperparameters
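For quick inference without a `Trainer`, the checkpoint can also be loaded through the `pipeline` API. A minimal sketch, not part of this commit's README, assuming only the standard `transformers` text-classification pipeline and the checkpoint above:

```
from transformers import pipeline

# Sketch: load the fine-tuned checkpoint as a text-classification pipeline
# and score a couple of illustrative sentences (hypothetical inputs)
classifier = pipeline("text-classification", model="zhuchi76/bert-finetuned-sst2")
print(classifier(["This movie was great!", "A dull, lifeless film."]))
```

Each prediction is a dict with a label and a score; unless an `id2label` mapping was saved with the model, the labels default to `LABEL_0`/`LABEL_1`.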