thethinkmachine committed
Commit 5a68db7 · verified · 1 Parent(s): f3f8abb

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -16,7 +16,7 @@ Maxwell-TCS-v0.2 is a task/instruction complexity score annotator based on the M
 
 ### Model Description
 
-Maxwell-TCS-v0.2 is an experimental SOTA **t**ask **c**omplexity **s**corer based on the state-of-the-art ModernBERT-Large backbone. It is trained under a data-heavy setting over 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times larger in size. For a given user instruction, the model predicts normalized scores between 0 and 1 across a single complexity dimension.
+Maxwell-TCS-v0.2 is an experimental SOTA **t**ask **c**omplexity **s**corer based on the state-of-the-art ModernBERT-Large backbone. It is trained under a data-heavy setting (TTP ratio of ~3.5 tokens per trainable parameter) over 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times larger in size. For a given user instruction, the model predicts normalized scores between 0 and 1 across a single complexity dimension.
 
 Maxwell-TCS can be used in a variety of downstream tasks such as prompt difficulty prediction, dataset annotation, dataset augmentation and more.
 
@@ -121,6 +121,8 @@ You are advised to use the model keeping these factors in mind.
 - **Base Model:** ModernBERT-Large
 - **Task:** Sequence Classification
 - **Training regime:** FP32 Non-Mixed Precision
+- **# Training tokens:** 50.3 million
+- **Tokens-Per-Parameter Ratio:** ~3.5 (on 14.4 million trainable parameters)
 - **Batch size:** 8
 - **Max epochs:** 3
 - **Learning rate:** 5e-5
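
For reference, the tokens-per-parameter figure added above follows directly from the other two new numbers: 50.3 million training tokens divided by 14.4 million trainable parameters ≈ 3.5 tokens per trainable parameter.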
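
Since the diff describes a ModernBERT-Large sequence-classification head that emits a single normalized complexity score, a minimal inference sketch may help illustrate the intended usage. This is a hedged example: the repo ID `thethinkmachine/Maxwell-TCS-v0.2`, the single-logit head, and the clamping step are assumptions based on the description above, not details confirmed by this commit.

```python
# Minimal sketch: scoring one instruction's complexity with a
# ModernBERT-based sequence-classification model.
# Assumptions (not confirmed by this commit): the repo ID below,
# a single-logit regression head, and that the raw output is already
# roughly in the documented [0, 1] range.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "thethinkmachine/Maxwell-TCS-v0.2"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

instruction = "Write a function that merges two sorted linked lists."
inputs = tokenizer(instruction, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

score = logits.squeeze().item()
# Defensive clamp to the documented [0, 1] range; an assumption about
# the output head, not documented behaviour.
score = max(0.0, min(1.0, score))
print(f"Predicted complexity score: {score:.3f}")
```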