andersonbcdefg committed 32807cd (1 parent: 022ec2d): Update README.md
This is a version of [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) distilled down to 16 layers out of 22. This reduces the parameter count from 149M to 119M; however, since the embedding parameters contribute little to latency, the practical effect is to shrink the "trunk" of the model from 110M parameters to 80M. I would expect this to reduce latency by roughly 25% (increasing throughput by roughly 33%).
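As a back-of-envelope check (assuming latency scales roughly linearly with trunk parameter count, which ignores attention-pattern and kernel effects), the figures above can be derived as:

```python
# Rough speedup estimate, assuming latency scales linearly with the
# number of "trunk" (non-embedding) parameters. Figures from the text.
trunk_before = 110_000_000  # original ModernBERT-base trunk
trunk_after = 80_000_000    # after removing 6 of 22 layers

latency_ratio = trunk_after / trunk_before  # ~0.73x the original latency
latency_reduction = 1 - latency_ratio       # ~27%, i.e. "roughly 25%"
throughput_gain = 1 / latency_ratio - 1     # ~37.5%, i.e. "roughly 33%"

print(f"latency reduction: {latency_reduction:.0%}")
print(f"throughput gain:   {throughput_gain:.0%}")
```

The stated 25%/33% pair is internally consistent (1 / 0.75 ≈ 1.33); the raw parameter ratio gives a slightly larger estimate, so the prose figures read as conservative round-downs.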

The last 6 local attention layers were removed:

0. Global