Tags: Safetensors · English · falcon_mamba · 4-bit precision · bitsandbytes
JingweiZuo committed (verified) · Commit 8817b19 · 1 parent: 9ea3bd1

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -104,7 +104,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Data
 
-Falcon-Mamba has been trained with ~ 6,000 GT mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large volume web-only dataset filtered and deduplicated.
+Falcon-Mamba has been trained with ~ 5,500 GT mainly coming from [Refined-Web](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large volume web-only dataset filtered and deduplicated.
 Similar to the others [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite models, Falcon-Mamba has been trained leveraging a multi-stage training strategy to increase the context-length from 2,048 to 8,192.
 Moreover, inspired by the concept of Curriculum Learning, we carefully selected data mixtures throughout the training stages, considering both data diversity and complexity.
 Note that at inference the context-length is not relevant as the Mamba architecture has no limit on long range dependency.
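
For context on the "4-bit precision / bitsandbytes" tags above and the `print(tokenizer.decode(outputs[0]))` call referenced in the hunk header, here is a minimal sketch of loading the checkpoint in 4-bit and running a short generation. The model id `tiiuae/falcon-mamba-7b`, the prompt, and the generation settings are assumptions for illustration, not taken from this diff.

```python
# Minimal sketch (assumed model id and settings; not part of this commit):
# load the checkpoint with 4-bit bitsandbytes quantization and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-mamba-7b"  # assumed repo id

# 4-bit quantization config matching the "4-bit precision / bitsandbytes" tags
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Example prompt; any text works since inference is not bound by the 8,192-token training context.
inputs = tokenizer("Question: What is the capital of France?\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```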