thethinkmachine committed "Update README.md"

README.md CHANGED
@@ -49,10 +49,7 @@ $$\text{S}_{predicted} \times (\text{max} - \text{min}) + \text{min}$$
 Use the code below to get started with the model.
 
 ```python
-
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
-model_name = "thethinkmachine/Maxwell-Task-Complexity-Exp-v1"
+model_name = "thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
 
@@ -66,16 +63,29 @@ def get_deita_complexity_score(question: str) -> int:
     final_score = torch.round(final_score)
     return final_score.item()
 
-
-
-
+def get_scaled_complexity_score(question: str) -> float:
+    inputs = tokenizer(question, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+    normalized_pred = outputs.logits.squeeze()
+    final_score = normalized_pred * (max_score - min_score) + min_score
+    final_score = torch.clamp(final_score, min=min_score, max=max_score)
+    final_score = final_score.item()
+    return round(final_score, 2)
+
+query = "Is learning equivalent to decreasing local entropy?"
+max_score = 100
+min_score = 0
+
+print("DEITA Evol-Complexity Score:", get_deita_complexity_score(query)) # 2
+print("Scaled Complexity Score:", get_scaled_complexity_score(query)) # 28.39...
 ```
 
 ## Training Details
 
 ### Training Data
 
-We use [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) 'deita'set for training the model. The dataset contains 66.5K diverse English instructions along with their complexity scores computed using the DEITA-Evol-Complexity scoring scheme which uses an LLM-judge to rank a sextuple containing 1 seed + 5 progressively complexified (*evolved*) instructions based on their
+We use [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) 'deita'set for training the model. The dataset contains 66.5K diverse English instructions along with their complexity scores computed using the DEITA-Evol-Complexity scoring scheme which uses an LLM-judge to rank a sextuple containing 1 seed + 5 progressively complexified (*evolved*) instructions based on their complexity & difficulty. The scheme assigns scores within [1, 6] range, with 1 being the least complex and 6 being the most complex.
 
 However, the training dataset used was observed to have instruction-score pairs across a diversity of scores within the range [0,9]. We suspect that this range includes scoring errors, as anomalous scores (0, 7, 8, 9) account for less than 1% of the total instructions.
 
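The hunk header carries the README's rescaling formula, S_predicted * (max - min) + min. The added `get_scaled_complexity_score` applies that formula to the model's output and then clamps it. Below is a minimal pure-Python sketch of the same mapping on a plain float, with no `torch` dependency. It assumes the regression head emits a value normalized to [0, 1]. The helper names `rescale` and `normalize` are hypothetical, and the prediction 0.2839 is made up for illustration rather than taken from the model.

```python
def rescale(normalized_pred: float, min_score: float, max_score: float) -> float:
    """Map a prediction in [0, 1] onto [min_score, max_score], then clamp.

    Mirrors S_predicted * (max - min) + min followed by a clamp, as in
    get_scaled_complexity_score, but on a plain float instead of a tensor.
    """
    score = normalized_pred * (max_score - min_score) + min_score
    # Clamp so out-of-range predictions (e.g. -0.1 or 1.2) stay within bounds.
    return min(max(score, min_score), max_score)


def normalize(score: float, min_score: float, max_score: float) -> float:
    """Hypothetical inverse min-max mapping: [min_score, max_score] -> [0, 1]."""
    return (score - min_score) / (max_score - min_score)


if __name__ == "__main__":
    pred = 0.2839  # hypothetical normalized prediction, not a real model output
    print(round(rescale(pred, 0, 100), 2))  # 28.39 on the [0, 100] scale
    print(round(rescale(pred, 1, 6)))       # 2 on an assumed [1, 6] DEITA scale
```

With min = 0 and max = 100, the made-up prediction lands at about 28.39. Suppose `get_deita_complexity_score` recovers the DEITA score with min = 1 and max = 6. This is an unverified assumption, since its body is not part of this diff. In that case the same prediction maps to about 2.42, which rounds to 2.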
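The training-data note says the DEITA Evol-Complexity scheme assigns scores in [1, 6]. The dataset, however, contains scores from 0 to 9, and the out-of-scheme values (0, 7, 8, 9) make up under 1% of instructions. The sketch below shows one way to flag such pairs. It assumes each record is an `(instruction, score)` tuple with an integer score, which is an illustrative layout and not the actual column schema of BhabhaAI/DEITA-Complexity. The function name `split_anomalies` is hypothetical. The diff also does not say whether training dropped, kept, or clipped these pairs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, int]  # (instruction, score); illustrative layout only

# Score range assigned by the DEITA Evol-Complexity scheme.
DEITA_MIN, DEITA_MAX = 1, 6


def split_anomalies(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Partition records into in-scheme scores [1, 6] and anomalous ones."""
    valid: List[Record] = []
    anomalous: List[Record] = []
    for instruction, score in records:
        bucket = valid if DEITA_MIN <= score <= DEITA_MAX else anomalous
        bucket.append((instruction, score))
    return valid, anomalous


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the real dataset.
    toy = [("a", 1), ("b", 6), ("c", 0), ("d", 9), ("e", 3)]
    valid, anomalous = split_anomalies(toy)
    share = len(anomalous) / len(toy)
    print(f"anomalous share: {share:.1%}")           # 40.0% on this toy sample
    print(Counter(score for _, score in anomalous))  # Counter({0: 1, 9: 1})
```

The toy sample is deliberately small, so its anomalous share is far higher than the under-1% reported for the real dataset.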