
HF safetensors version

#3
by ehartford - opened

Who is making the hf safetensors version?

I've never used an nvidia model. Any idea on how to convert them to hf st?

What I could find:

  • the MLP has only fc1 and fc2 (presumably up_proj and down_proj, in that order) and no gate_proj, which already rules out a direct conversion to the Llama format
  • the normalization layers have a bias and use layernorm1p, which also rules out any conversion to the Llama format
  • this model is GQA (96 query heads, 8 KV heads)
  • the activation function is squared ReLU
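
To make the list above concrete, here is a minimal, dependency-free sketch of the three pieces that differ from Llama. It is not taken from NeMo or Transformers: the function names are made up for illustration, the MLP is assumed to have no bias terms, and the grouping of consecutive query heads onto one KV head is an assumption about the layout.

```python
import math


def squared_relu(x):
    """Squared ReLU: max(x, 0) squared."""
    return max(x, 0.0) ** 2


def mlp(x, fc1, fc2):
    """Non-gated MLP: fc2(squared_relu(fc1(x))).

    fc1 and fc2 are lists of rows (out_features x in_features).
    Assumption: no bias terms, and there is no gate_proj branch at all.
    """
    hidden = [squared_relu(sum(w * v for w, v in zip(row, x))) for row in fc1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in fc2]


def layernorm1p(x, weight, bias, eps=1e-5):
    """LayerNorm whose stored weight is an offset from 1.

    The effective scale is (1 + weight), so a zero-initialized weight
    behaves like an ordinary LayerNorm with scale 1.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * (1.0 + w) + b
            for v, w, b in zip(x, weight, bias)]


def kv_head_for(query_head, num_q_heads=96, num_kv_heads=8):
    """GQA: map a query head index to the KV head it shares.

    Assumption: consecutive query heads form a group (96 / 8 = 12 per group).
    """
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size
```

The practical consequence for conversion is that a stored layernorm1p weight cannot be copied into a plain LayerNorm as-is; either the modeling code adds 1 at run time, or the converter adds 1 to the weight once.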

With all of that said, writing a modeling file seems inevitable unless we can find an existing Transformers architecture that matches all of these characteristics...

Partial rotary embeddings also need to be implemented: rotary_pct is 0.5 here, so only half of each head's dimensions are rotated (see GPT-NeoX for reference):

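        # Number of leading head dimensions that receive rotary embeddings
        # (a value of 0.5 rotates half of each head).
        self.rotary_ndims = int(self.head_dim * config.rope_pct)
        ...

        # Split each head into the rotated part and the pass-through part.
        query_rot = query_states[..., : self.rotary_ndims]
        query_pass = query_states[..., self.rotary_ndims :]
        key_rot = key_states[..., : self.rotary_ndims]
        key_pass = key_states[..., self.rotary_ndims :]

        # cos/sin must cover rotary_ndims dimensions, not the full head_dim.
        cos, sin = self.rotary_emb(value_states, position_ids)
        query_states, key_states = apply_rotary_pos_emb(query_rot, key_rot, cos, sin)

        # Re-attach the untouched dimensions after rotation.
        query_states = torch.cat((query_states, query_pass), dim=-1)
        key_states = torch.cat((key_states, key_pass), dim=-1)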
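
For anyone who wants to check the idea without PyTorch, below is a small standalone sketch of partial rotary embeddings on a single head vector. It assumes the GPT-NeoX "rotate half" convention and a base of 10000, neither of which is confirmed for this model, and the function names are made up for illustration. rotary_ndims is assumed to be even.

```python
import math


def rotate_half(x):
    """Map (x1, x2) to (-x2, x1), where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def partial_rope(head_vec, position, rotary_pct=0.5, base=10000.0):
    """Rotate only the first rotary_pct of a head's dimensions.

    The remaining dimensions are passed through unchanged and
    concatenated back on, mirroring the torch.cat step above.
    """
    head_dim = len(head_vec)
    rotary_ndims = int(head_dim * rotary_pct)
    rot, passthrough = head_vec[:rotary_ndims], head_vec[rotary_ndims:]

    # One frequency per rotated pair, duplicated so it lines up with rotate_half.
    inv_freq = [base ** (-2 * i / rotary_ndims) for i in range(rotary_ndims // 2)]
    angles = [position * f for f in inv_freq] * 2
    cos = [math.cos(a) for a in angles]
    sin = [math.sin(a) for a in angles]

    rotated = [v * c + r * s
               for v, r, c, s in zip(rot, rotate_half(rot), cos, sin)]
    return rotated + passthrough
```

With an 8-dimensional toy head and rotary_pct=0.5, only the first 4 values change with position; the last 4 are identical at every position, which is a quick sanity check for a ported implementation.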

It is also pre-layernorm only, instead of pre and post like Llama.

The closest architecture may be Phi-3, but I'm unsure.

My checkpoint after fine-tuning with the NeMo framework looks like this checkpoint, but I only have model_weights, with no model_config.yaml or .model file. How can I convert it to the HF safetensors format?

FailSpy has started an effort, but it seems to have stalled.

"My checkpoint after fine-tuning with the NeMo framework looks like this checkpoint, but I only have model_weights, with no model_config.yaml or .model file. How can I convert it to the HF safetensors format?"

When training completes, you get a .nemo file. The .nemo file is basically a tar archive (you can unpack it with tar xvf) containing the weights, tokenizer, config, and one or more other files that I forget. I usually do a small run, get the variants, and once I have the .nemo file, I extract it and then train longer. I can then evaluate checkpoints. In my experience, you can only use these inside NeMo; conversion to HF is tricky and not fully supported. You can only convert to existing architectures using their existing converters. Models like the Reward and Instruct variants cannot be made compatible with HF at the moment.
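
Since the .nemo file is described above as a tar archive, a quick way to see what a given file actually contains is Python's standard tarfile module. This is only a sketch: the helper names are made up, and the exact member names inside a real archive (model_config.yaml is the one mentioned earlier in this thread) will vary.

```python
import tarfile


def list_nemo_members(nemo_path):
    """Return (name, size_in_bytes) for every entry in a .nemo archive."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(nemo_path, "r:*") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


def read_member(nemo_path, member_name):
    """Read one file out of the archive without unpacking everything."""
    with tarfile.open(nemo_path, "r:*") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            # Directories and other non-regular entries have no content.
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read()
```

Listing the members first shows whether the config and tokenizer files are present, which matters for the case above where only model_weights is available.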
