mpt-7b-instruct-sharded
What are the steps required to replicate this for mpt-7b-instruct?
Hey - if it's useful, I can take a look at replicating this for mpt-7b-instruct, but it might take me some time to get around to it.
The short version of how to DIY this is:
- load the model as it says on the original MosaicML model card
- if you want to have it on the hub, make a new model repo & clone your repo locally
- follow the transformers docs for saving a sharded model checkpoint & save it and the tokenizer to `my_model_dir`. For this, I used `model.save_pretrained(my_model_dir, max_shard_size="2GB")`, but you can change the shard size as needed (see the sketch after this list)
- to add basic support for `device_map="auto"`, gradient checkpointing, etc., update the relevant `.py` files as on this model - see the commit history. Now you can use it like this one, push to hub, etc.
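
Roughly, the load-and-save steps look like this - a minimal sketch, with `my_model_dir` as a placeholder local path (point it at your cloned hub repo if you made one):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) load the model as on the original MosaicML model card
#    (MPT uses custom modeling code, hence trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-instruct")

# 2) placeholder: a local directory, or the clone of your new hub repo
my_model_dir = "mpt-7b-instruct-sharded"

# 3) save a sharded checkpoint plus the tokenizer; adjust max_shard_size as needed
model.save_pretrained(my_model_dir, max_shard_size="2GB")
tokenizer.save_pretrained(my_model_dir)
```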
@pszemraj I was able to replicate this easily with the instructions you provided. For anyone interested, the resulting model weights are available at jprafael/mpt-7b-instruct-sharded.
Awesome, great stuff! BTW, I am discussing with a user on this discussion post - there may be some additional updates to make sure that everything works with `device_map="auto"`, specifically in the case of a multi-GPU setup. I have tested inference and fine-tuning with a single GPU and everything works fine, so don't worry about this if multi-GPU is irrelevant for you 👍
I'll reply here/ping you if/when that happens, but just FYI.
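
For reference, loading the sharded checkpoint with `device_map="auto"` looks roughly like this - a minimal sketch, assuming `accelerate` is installed and using the jprafael/mpt-7b-instruct-sharded repo mentioned above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jprafael/mpt-7b-instruct-sharded",
    torch_dtype=torch.bfloat16,
    device_map="auto",       # lets accelerate place the shards on available GPU(s)/CPU
    trust_remote_code=True,  # MPT uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained("jprafael/mpt-7b-instruct-sharded")

prompt = "Explain gradient checkpointing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```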
Currently I'm just using a single GPU, but I'm happy to incorporate the changes on my side when they're done.
will keep you posted!