thethinkmachine
committed
Update README.md
README.md CHANGED
@@ -16,7 +16,7 @@ Maxwell-TCS-v0.2 is a task/instruction complexity score annotator based on the M
 
 ### Model Description
 
-Maxwell-TCS-v0.2 is an experimental state-of-the-art **t**ask **c**omplexity **s**corer built on the ModernBERT-Large backbone. It is trained in a data-heavy setting on 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times its size. For a given user instruction, the model predicts a normalized score between 0 and 1 on a single complexity dimension.
+Maxwell-TCS-v0.2 is an experimental state-of-the-art **t**ask **c**omplexity **s**corer built on the ModernBERT-Large backbone. It is trained in a data-heavy setting (a TTP ratio of ~3.5 tokens per trainable parameter) on 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times its size. For a given user instruction, the model predicts a normalized score between 0 and 1 on a single complexity dimension.
 
 Maxwell-TCS can be used for a variety of downstream tasks such as prompt difficulty prediction, dataset annotation, dataset augmentation, and more.
 
@@ -121,6 +121,8 @@ You are advised to use the model keeping these factors in mind.
 - **Base Model:** ModernBERT-Large
 - **Task:** Sequence Classification
 - **Training regime:** FP32 Non-Mixed Precision
+- **# Training tokens:** 50.3 million
+- **Tokens-Per-Parameter Ratio:** ~3.5 (on 14.4 million trainable parameters)
 - **Batch size:** 8
 - **Max epochs:** 3
 - **Learning rate:** 5e-5
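The TTP figure added in this commit follows directly from the two quoted numbers; a quick sanity check in plain Python, using only the values from the diff:

```python
# Sanity-check the tokens-per-parameter (TTP) ratio added in this commit.
training_tokens = 50.3e6    # "# Training tokens: 50.3 million"
trainable_params = 14.4e6   # "14.4 million trainable parameters"
print(f"TTP ratio: {training_tokens / trainable_params:.2f}")  # ~3.49, i.e. ~3.5
```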
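For context, here is a minimal inference sketch for a scorer with this setup (ModernBERT-Large backbone, sequence classification, normalized scores in [0, 1]) using 🤗 Transformers. This is not taken from the model card: the repo id is a guess assembled from the committer and model names, and the single-logit head plus sigmoid squashing are assumptions; adjust both to match the actual card.

```python
# Minimal sketch: scoring an instruction's complexity with a
# ModernBERT-based sequence-classification scorer.
# ASSUMPTIONS: the repo id is hypothetical, and the model is assumed to
# expose a single-logit head whose output maps to a normalized [0, 1] score.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "thethinkmachine/Maxwell-TCS-v0.2"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

instruction = "Prove that there are infinitely many prime numbers."
inputs = tokenizer(instruction, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1) for a single score head

# If the head was trained as a bounded regression, logits.item() may already
# be the score; the sigmoid keeps this sketch safely inside [0, 1].
score = torch.sigmoid(logits).item()
print(f"complexity score: {score:.3f}")
```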