thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2

@@ -49,10 +49,7 @@ $$\text{S}_{predicted} \times (\text{max} - \text{min}) + \text{min}$$
 Use the code below to get started with the model.
 ```python
-import torch
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-model_name = "thethinkmachine/Maxwell-Task-Complexity-Exp-v1"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
@@ -66,16 +63,29 @@ def get_deita_complexity_score(question: str) -> int:
     final_score = torch.round(final_score)
     return final_score.item()
-query = "What is the capital of France?"
-final_score = get_deita_complexity_score(query)
-print(final_score)
 ```
 ## Training Details
 ### Training Data
-We use [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) 'deita'set for training the model. The dataset contains 66.5K diverse English instructions along with their complexity scores computed using the DEITA-Evol-Complexity scoring scheme which uses an LLM-judge to rank a sextuple containing 1 seed + 5 progressively complexified (*evolved*) instructions based on their contextual complexity & difficulty. The scheme assigns scores within [1, 6] range, with 1 being the least complex and 6 being the most complex.
 However, we observed that the training dataset contains instruction-score pairs with scores spanning the range [0, 9]. We suspect that this range includes scoring errors, as the anomalous scores (0, 7, 8, 9) account for less than 1% of the total instructions.

 Use the code below to get started with the model.
 ```python
+model_name = "thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForSequenceClassification.from_pretrained(model_name)
     final_score = torch.round(final_score)
     return final_score.item()
+def get_scaled_complexity_score(question: str) -> float:
+    inputs = tokenizer(question, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+    normalized_pred = outputs.logits.squeeze()
+    final_score = normalized_pred * (max_score - min_score) + min_score
+    final_score = torch.clamp(final_score, min=min_score, max=max_score)
+    final_score = final_score.item()
+    return round(final_score, 2)
+query = "Is learning equivalent to decreasing local entropy?"
+max_score = 100
+min_score = 0
+print("DEITA Evol-Complexity Score:", get_deita_complexity_score(query)) # 2
+print("Scaled Complexity Score:", get_scaled_complexity_score(query)) # 28.39...
 ```
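The two printed values above can be related through the rescaling formula $\text{S}_{predicted} \times (\text{max} - \text{min}) + \text{min}$. The following is a minimal sketch of that arithmetic in plain Python, with no model involved. It assumes that `get_deita_complexity_score` rescales to the [1, 6] range before rounding (its body is not fully shown here), and the value `0.2839` is a hypothetical normalized prediction chosen to match the `28.39...` example, not an actual model output.

```python
def rescale(normalized: float, min_score: float, max_score: float) -> float:
    """Map a normalized prediction (nominally in [0, 1]) onto [min_score, max_score]."""
    value = normalized * (max_score - min_score) + min_score
    # Clamp so that predictions slightly outside [0, 1] stay within the target range.
    return max(min_score, min(max_score, value))


# Hypothetical normalized prediction, chosen to match the 28.39... example above.
normalized_pred = 0.2839

# Scaled score on a user-chosen range, here [0, 100].
scaled = rescale(normalized_pred, 0, 100)

# Assumed DEITA-style score: rescale to [1, 6], then round to the nearest integer.
deita = round(rescale(normalized_pred, 1, 6))

print("Scaled Complexity Score:", round(scaled, 2))
print("DEITA Evol-Complexity Score:", deita)
```

Under these assumptions, the same normalized prediction gives roughly 28.39 on the [0, 100] range and 2 on the [1, 6] range, which is consistent with the comments in the snippet above.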
 ## Training Details
 ### Training Data
+We use the [BhabhaAI/DEITA-Complexity](https://huggingface.co/datasets/BhabhaAI/DEITA-Complexity) 'deita'set to train the model. The dataset contains 66.5K diverse English instructions along with their complexity scores, computed using the DEITA-Evol-Complexity scoring scheme. This scheme uses an LLM judge to rank a sextuple of 1 seed instruction plus 5 progressively more complex (*evolved*) instructions by their complexity and difficulty. It assigns scores in the range [1, 6], with 1 being the least complex and 6 the most complex.
 However, we observed that the training dataset contains instruction-score pairs with scores spanning the range [0, 9]. We suspect that this range includes scoring errors, as the anomalous scores (0, 7, 8, 9) account for less than 1% of the total instructions.
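One way to inspect such out-of-range labels is to separate them from the scores the scheme is expected to produce. The sketch below is illustrative only: the model card does not say how (or whether) the anomalous scores were handled during training, the helper `split_by_validity` is hypothetical, and the `rows` list is made-up toy data rather than samples from the dataset.

```python
from collections import Counter

# Range assigned by the DEITA Evol-Complexity scheme, as described above.
VALID_MIN, VALID_MAX = 1, 6


def split_by_validity(rows):
    """Split (instruction, score) pairs into in-range and out-of-range lists."""
    valid, anomalous = [], []
    for instruction, score in rows:
        if VALID_MIN <= score <= VALID_MAX:
            valid.append((instruction, score))
        else:
            anomalous.append((instruction, score))
    return valid, anomalous


# Toy data for illustration only (not taken from the actual dataset).
rows = [
    ("What is the capital of France?", 1),
    ("Summarize the causes of the French Revolution.", 3),
    ("Prove that the square root of 2 is irrational.", 5),
    ("Hypothetical mislabeled instruction A", 0),
    ("Hypothetical mislabeled instruction B", 8),
]

valid, anomalous = split_by_validity(rows)

# Count how often each out-of-range score occurs, and what share of rows it affects.
anomaly_counts = Counter(score for _, score in anomalous)
anomaly_share = len(anomalous) / len(rows)

print("Anomalous score counts:", dict(anomaly_counts))
print("Share of anomalous rows:", anomaly_share)
```

On the real dataset, a share below 1% would match the observation above. Whether to drop, clip, or keep such rows is a design choice that this sketch leaves open.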