Update README.md
README.md CHANGED
@@ -30,6 +30,11 @@ library_name: transformers
 tags:
 - code
 - art
+---
+#### ❗ This model gives up when the input reaches a critical mass of about tree fiddy thousand tokens
+
+I have dun goofed and not tested the [base model](https://huggingface.co/h2oai/h2o-danube-1.8b-chat) enough (and possibly goofed in other ways too), but I'm already training the new one based on [h2oai/h2o-danube2-1.8b-chat](https://huggingface.co/h2oai/h2o-danube2-1.8b-chat). Perhaps S² attn or RoPE scaling will work and make a hella big context window possible? We'll see.
+
 ---
 
 This is [NinjaMouse](https://huggingface.co/trollek/NinjaMouse-2.4B-32L-danube) extended even further. Instead of Cosmopedia I used different coding datasets.
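For context on the two techniques the commit mentions: S² attn is the shifted sparse attention scheme from LongLoRA, a training-time change that would have to go into the fine-tuning setup, while RoPE scaling stretches the rotary position embeddings so the model can attend beyond its trained context length. As a minimal, untested sketch (not part of this commit): recent transformers releases expose linear RoPE position interpolation through the model config. The 2.0 factor below is an illustrative assumption, and whether the danube2 (Mistral-style) config honours `rope_scaling` depends on the installed transformers version.

```python
# Minimal sketch, not from the commit: trying linear RoPE scaling
# (position interpolation) on the new base model. The 2.0 factor is an
# illustrative assumption; support for `rope_scaling` on Mistral-style
# configs depends on the installed transformers version.
from transformers import AutoConfig, AutoModelForCausalLM

base = "h2oai/h2o-danube2-1.8b-chat"

config = AutoConfig.from_pretrained(base)
# Stretch the trained context window by 2x via position interpolation.
config.rope_scaling = {"type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained(base, config=config)
```

Note that interpolation alone usually degrades quality somewhat unless the model is also fine-tuned at the longer length, which is presumably why the commit frames both options as open questions.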