andersonbcdefg committed 32807cd (1 parent: 022ec2d): Update README.md
This is a version of [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) distilled down to 16 layers out of 22. This reduces the parameter count from 149M to 119M; however, since the embedding parameters contribute little to latency, the practical effect is to shrink the "trunk" of the model from 110M parameters to 80M. I would expect this to reduce latency by roughly 25% (increasing throughput by roughly 33%).
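As a back-of-envelope check (assuming latency scales roughly linearly with trunk parameter count, which ignores attention-pattern and kernel effects), the figures above can be derived as:

```python
# Rough speedup estimate, assuming latency scales linearly with the
# number of "trunk" (non-embedding) parameters. Figures from the text.
trunk_before = 110_000_000  # original ModernBERT-base trunk
trunk_after = 80_000_000    # after removing 6 of 22 layers

latency_ratio = trunk_after / trunk_before  # ~0.73x the original latency
latency_reduction = 1 - latency_ratio       # ~27%, i.e. "roughly 25%"
throughput_gain = 1 / latency_ratio - 1     # ~37.5%, i.e. "roughly 33%"

print(f"latency reduction: {latency_reduction:.0%}")
print(f"throughput gain:   {throughput_gain:.0%}")
```

The stated 25%/33% pair is internally consistent (1 / 0.75 ≈ 1.33); the raw parameter ratio gives a slightly larger estimate, so the prose figures read as conservative round-downs.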

The last 6 local attention layers were removed:

0. Global