thethinkmachine committed
Commit 5a68db7 · verified · 1 Parent(s): f3f8abb

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -16,7 +16,7 @@ Maxwell-TCS-v0.2 is a task/instruction complexity score annotator based on the M
 
 ### Model Description
 
-Maxwell-TCS-v0.2 is an experimental SOTA **t**ask **c**omplexity **s**corer based on the state-of-the-art ModernBERT-Large backbone. It is trained under a data-heavy setting over 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times larger in size. For a given user instruction, the model predicts normalized scores between 0 and 1 across a single complexity dimension.
+Maxwell-TCS-v0.2 is an experimental SOTA **t**ask **c**omplexity **s**corer based on the state-of-the-art ModernBERT-Large backbone. It is trained under a data-heavy setting (TTP ratio of ~3.5 tokens per trainable parameter) over 66.5K diverse instruction-score pairs, and performs on par with other [complexity scorers](https://huggingface.co/hkust-nlp/deita-complexity-scorer) 34 times larger in size. For a given user instruction, the model predicts normalized scores between 0 and 1 across a single complexity dimension.
 
 Maxwell-TCS can be used in a variety of downstream tasks such as prompt difficulty prediction, dataset annotation, dataset augmentation and more.
 
@@ -121,6 +121,8 @@ You are advised to use the model keeping these factors in mind.
 - **Base Model:** ModernBERT-Large
 - **Task:** Sequence Classification
 - **Training regime:** FP32 Non-Mixed Precision
+- **# Training tokens:** 50.3 million
+- **Tokens-Per-Parameter Ratio:** ~3.5 (on 14.4 million trainable parameters)
 - **Batch size:** 8
 - **Max epochs:** 3
 - **Learning rate:** 5e-5
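
For reference, the tokens-per-parameter figure added above follows directly from the other two new numbers: 50.3 million training tokens divided by 14.4 million trainable parameters ≈ 3.5 tokens per trainable parameter.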
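
Since the diff describes a ModernBERT-Large sequence-classification head that emits a single normalized complexity score, a minimal inference sketch may help illustrate the intended usage. This is a hedged example: the repo ID `thethinkmachine/Maxwell-TCS-v0.2`, the single-logit head, and the clamping step are assumptions based on the description above, not details confirmed by this commit.

```python
# Minimal sketch: scoring one instruction's complexity with a
# ModernBERT-based sequence-classification model.
# Assumptions (not confirmed by this commit): the repo ID below,
# a single-logit regression head, and that the raw output is already
# roughly in the documented [0, 1] range.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "thethinkmachine/Maxwell-TCS-v0.2"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

instruction = "Write a function that merges two sorted linked lists."
inputs = tokenizer(instruction, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

score = logits.squeeze().item()
# Defensive clamp to the documented [0, 1] range; an assumption about
# the output head, not documented behaviour.
score = max(0.0, min(1.0, score))
print(f"Predicted complexity score: {score:.3f}")
```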