diff --git a/.gitattributes b/.gitattributes index a6344aac8c09253b3b630fb776ae94478aa0275b..87cd3a7e551f7820613d93addf0b36b001a69970 100644 --- a/.gitattributes +++ b/.gitattributes @@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text *.zip filter=lfs diff=lfs merge=lfs -text *.zst filter=lfs diff=lfs merge=lfs -text *tfevents* filter=lfs diff=lfs merge=lfs -text +model_weights/module.embedding.position_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text +model_weights/module.embedding.word_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text +model_weights/module.output_layer.weight/0.0 filter=lfs diff=lfs merge=lfs -text diff --git a/README.md b/README.md new file mode 100644 index 0000000000000000000000000000000000000000..180aa3cfaa611306c78dc0366fa8181dfa286cc1 --- /dev/null +++ b/README.md @@ -0,0 +1,196 @@ +--- +license: cc-by-4.0 +library_name: nemo +tags: +- pytorch +- NeMo +--- + +# Abc5 + + + +[![Model architecture](https://img.shields.io/badge/Model_Arch-PUT-YOUR-ARCHITECTURE-HERE-lightgrey#model-badge)](#model-architecture) +| [![Model size](https://img.shields.io/badge/Params-PUT-YOUR-MODEL-SIZE-HERE-lightgrey#model-badge)](#model-architecture) +| [![Language](https://img.shields.io/badge/Language-PUT-YOUR-LANGUAGE-HERE-lightgrey#model-badge)](#datasets) + +**Put a short model description here.** + +See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html) for complete architecture details. + + +## NVIDIA NeMo: Training + +To train, fine-tune, or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version.
+``` +pip install nemo_toolkit['all'] +``` + +## How to Use this Model + +The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset. + +### Automatically instantiate the model + +**NOTE**: Please update the model class below to match the class of the model being uploaded. + +```python +import nemo.core import ModelPT +model = ModelPT.from_pretrained("smajumdar/abc5") +``` + +### NOTE + + Add some information about how to use the model here. An example is provided for ASR inference below. + + ### Transcribing using Python + First, let's get a sample + ``` + wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav + ``` + Then simply do: + ``` + asr_model.transcribe(['2086-149220-0033.wav']) + ``` + + ### Transcribing many audio files + + ```shell + python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="smajumdar/abc5" audio_dir="" + ``` + +### Input + +**Add some information about what are the inputs to this model** + +### Output + +**Add some information about what are the outputs of this model** + +## Model Architecture + +**Add information here discussing architectural details of the model or any comments to users about the model.** + +## Training + +**Add information here about how the model was trained. It should be as detailed as possible, potentially including the the link to the script used to train as well as the base config used to train the model. If extraneous scripts are used to prepare the components of the model, please include them here.** + +### NOTE + + An example is provided below for ASR + + The NeMo toolkit [3] was used for training the models for over several hundred epochs. 
These models were trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/fast-conformer_transducer_bpe.yaml). + + The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py). + + +### Datasets + +**Try to provide as detailed a list of datasets as possible. If possible, provide links to the datasets on HF by adding them to the manifest section at the top of the README (marked by ---).** + +### NOTE + + An example of the manifest section is provided below for ASR datasets + + datasets: + - librispeech_asr + - fisher_corpus + - Switchboard-1 + - WSJ-0 + - WSJ-1 + - National-Singapore-Corpus-Part-1 + - National-Singapore-Corpus-Part-6 + - vctk + - voxpopuli + - europarl + - multilingual_librispeech + - mozilla-foundation/common_voice_8_0 + - MLCommons/peoples_speech + + The corresponding text in this section for those datasets is stated below - + + The model was trained on 64K hours of English speech collected and prepared by the NVIDIA NeMo and Suno teams. + + The training dataset consists of a private subset with 40K hours of English speech plus 24K hours from the following public datasets: + + - Librispeech 960 hours of English speech + - Fisher Corpus + - Switchboard-1 Dataset + - WSJ-0 and WSJ-1 + - National Speech Corpus (Part 1, Part 6) + - VCTK + - VoxPopuli (EN) + - Europarl-ASR (EN) + - Multilingual Librispeech (MLS EN) - 2,000 hour subset + - Mozilla Common Voice (v7.0) + - People's Speech - 12,000 hour subset + + +## Performance + +**Add information here about the performance of the model. Discuss the metric used to evaluate the model, and if there are external links explaining the custom metric, please link to them.
+ +### NOTE + + An example is provided below of an ASR metrics list that can be added to the top of the README + + model-index: + - name: PUT_MODEL_NAME + results: + - task: + name: Automatic Speech Recognition + type: automatic-speech-recognition + dataset: + name: AMI (Meetings test) + type: edinburghcstr/ami + config: ihm + split: test + args: + language: en + metrics: + - name: Test WER + type: wer + value: 17.10 + - task: + name: Automatic Speech Recognition + type: automatic-speech-recognition + dataset: + name: Earnings-22 + type: revdotcom/earnings22 + split: test + args: + language: en + metrics: + - name: Test WER + type: wer + value: 14.11 + +In the discussion, provide any caveats about the results presented at the top so that nuance is not lost. + +It should ideally be in a tabular format (you can use the following website to make your tables in markdown format - https://www.tablesgenerator.com/markdown_tables)** + +## Limitations + +**Discuss any practical limitations of the model when used in real-world cases. These can also be legal disclaimers, or discussion regarding the safety of the model (particularly in the case of LLMs).** + + +### NOTE + + An example is provided below + + Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech. + + +## License + +The license to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. + +## References + +**Provide appropriate references in the markdown link format below.
Please order them numerically.** + +[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) diff --git a/model_config.yaml b/model_config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2ef3105963cd6c2aa690742485ec15262580cc97 --- /dev/null +++ b/model_config.yaml @@ -0,0 +1,129 @@ +tensor_model_parallel_size: 1 +pipeline_model_parallel_size: 1 +virtual_pipeline_model_parallel_size: null +sequence_parallel: false +context_parallel_size: 1 +expert_model_parallel_size: 1 +moe_extended_tp: false +perform_initialization: true +use_cpu_initialization: false +fp16: false +bf16: false +params_dtype: float32 +timers: null +finalize_model_grads_func: null +grad_scale_func: null +no_sync_func: null +grad_sync_func: null +param_sync_func: null +deterministic_mode: false +enable_autocast: false +autocast_dtype: float32 +num_microbatches_with_partial_activation_checkpoints: null +gradient_accumulation_fusion: false +async_tensor_model_parallel_allreduce: false +use_te_rng_tracker: false +tp_comm_overlap: false +tp_comm_bulk_wgrad: true +tp_comm_bulk_dgrad: true +tp_comm_overlap_ag: true +tp_comm_overlap_rs: true +tp_comm_overlap_rs_dgrad: false +tp_comm_split_ag: true +tp_comm_atomic_ag: false +tp_comm_split_rs: true +tp_comm_atomic_rs: false +pipeline_dtype: null +variable_seq_lengths: false +overlap_p2p_comm: false +batch_p2p_comm: true +batch_p2p_sync: true +use_ring_exchange_p2p: false +deallocate_pipeline_outputs: false +defer_embedding_wgrad_compute: false +pipeline_model_parallel_split_rank: null +cpu_offloading: false +cpu_offloading_num_layers: 0 +_cpu_offloading_context: null +cpu_offloading_activations: true +cpu_offloading_weights: true +barrier_with_L1_time: true +fp16_lm_cross_entropy: false +parallel_output: true +share_embeddings_and_output_weights: false +make_vocab_size_divisible_by: 128 +position_embedding_type: learned_absolute +rotary_base: 10000 +rotary_percent: 1.0 +seq_len_interpolation_factor: null +seq_length: 2048 +optim: + 
name: fused_adam + sched: null +optimizer_fn: null +tokenizer_filepath: null +num_layers: 4 +hidden_size: 256 +num_attention_heads: 4 +num_query_groups: 4 +ffn_hidden_size: 256 +kv_channels: 64 +hidden_dropout: 0.1 +attention_dropout: 0.1 +fp32_residual_connection: false +apply_residual_connection_post_layernorm: false +layernorm_epsilon: 1.0e-05 +layernorm_zero_centered_gamma: false +add_bias_linear: true +add_qkv_bias: false +gated_linear_unit: false +activation_func: gelu +activation_func_fp8_input_store: false +num_moe_experts: null +rotary_interleaved: false +window_size: null +normalization: LayerNorm +qk_layernorm: false +test_mode: false +calculate_per_token_loss: false +init_method: init_ +output_layer_init_method: init_ +init_method_std: 0.02 +apply_query_key_layer_scaling: false +attention_softmax_in_fp32: true +bias_activation_fusion: false +masked_softmax_fusion: false +persist_layer_norm: false +memory_efficient_layer_norm: false +bias_dropout_fusion: false +apply_rope_fusion: false +recompute_granularity: null +recompute_method: null +recompute_num_layers: null +distribute_saved_activations: null +fp8: null +fp8_margin: 0 +fp8_interval: 1 +fp8_amax_history_len: 1 +fp8_amax_compute_algo: most_recent +fp8_wgrad: true +fp8_dot_product_attention: false +fp8_multi_head_attention: false +moe_router_load_balancing_type: aux_loss +moe_router_topk: 2 +moe_grouped_gemm: false +moe_aux_loss_coeff: 0.0 +moe_z_loss_coeff: null +moe_input_jitter_eps: null +moe_token_dropping: false +moe_token_dispatcher_type: allgather +moe_per_layer_logging: false +moe_expert_capacity_factor: null +moe_pad_expert_input_to_capacity: false +moe_token_drop_policy: probs +moe_layer_recompute: false +clone_scatter_output_in_embedding: true +disable_parameter_transpose_cache: false +enable_cuda_graph: false +target: nemo.collections.llm.gpt.model.base_v2.GPTModelV2 +nemo_version: 2.0.0rc1 diff --git a/model_weights/common.pt b/model_weights/common.pt new file mode 100644 index 
0000000000000000000000000000000000000000..adde03ac922f81f982a7df845a24fb79bdaeb2c2 --- /dev/null +++ b/model_weights/common.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4e4090fa34d96307127606cccef3ae99aedae58279e8bdf1746d44d3bf7aa47 +size 860 diff --git a/model_weights/metadata.json b/model_weights/metadata.json new file mode 100644 index 0000000000000000000000000000000000000000..efdcae4b720b402ac0295007ff69eefab33a2e82 --- /dev/null +++ b/model_weights/metadata.json @@ -0,0 +1 @@ +{"sharded_backend": "zarr", "sharded_backend_version": 1, "common_backend": "torch", "common_backend_version": 1} \ No newline at end of file diff --git a/model_weights/module.decoder.final_layernorm.bias/.zarray b/model_weights/module.decoder.final_layernorm.bias/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..5de96d94449fb5b42c7aa27f143bea52472890af --- /dev/null +++ b/model_weights/module.decoder.final_layernorm.bias/.zarray @@ -0,0 +1,14 @@ +{ + "chunks": [ + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.final_layernorm.bias/0 b/model_weights/module.decoder.final_layernorm.bias/0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.final_layernorm.bias/0 differ diff --git a/model_weights/module.decoder.final_layernorm.weight/.zarray b/model_weights/module.decoder.final_layernorm.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..5de96d94449fb5b42c7aa27f143bea52472890af --- /dev/null +++ b/model_weights/module.decoder.final_layernorm.weight/.zarray @@ -0,0 +1,14 @@ +{ + "chunks": [ + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": 
[ + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.final_layernorm.weight/0 b/model_weights/module.decoder.final_layernorm.weight/0 new file mode 100644 index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8 --- /dev/null +++ b/model_weights/module.decoder.final_layernorm.weight/0 @@ -0,0 +1 @@ +�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�? \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_0_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_0_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..af9c0d2f7a9154b6db4b60f739a17a04261b8254 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_0_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fe8cf89ac8228df7c20a5fbac2a50c841310072585016f109c1955934c30a0f +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_1_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_1_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..e457104e80ff37eb0fadab384fbb4d74a039dc83 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_1_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb0399cdab9deadf27e09a92b0fffb8a3c3ba32d2d100bbeaf41c8056257c338 +size 1832 diff --git 
a/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_2_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_2_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..25e6ad98c99c125b7f81f77b5acb0be7a4c13621 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_2_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4195c2ae65e03ab3843c91ed7bca9cd02ca971de54785f765564867b3ba53e07 +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_3_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_3_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc163334ff59a1c37fefd5cc4696234a708560e2 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1._extra_state/shard_3_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0cc0f2a0c549c38845a9e549180085c1e05916eb0fc2eef084e3411c67b1379b +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.bias/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 1, + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 4, + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.bias/0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/0.0 differ diff --git 
a/model_weights/module.decoder.layers.mlp.linear_fc1.bias/1.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/1.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.bias/2.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/2.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.bias/3.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.bias/3.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 1, + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 4, + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/0.0 differ diff --git 
a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/1.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/1.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/2.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/2.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/3.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_bias/3.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 1, + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 4, + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8 --- 
/dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 @@ -0,0 +1 @@ +�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�? \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 @@ -0,0 +1 @@ +�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�? 
\ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 @@ -0,0 +1 @@ +�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�? \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 @@ -0,0 +1 @@ +�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�? 
\ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.weight/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..abc247f0c8cae28b0ea693bee2c683771910b77c --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/.zarray @@ -0,0 +1,18 @@ +{ + "chunks": [ + 1, + 256, + 256 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 4, + 256, + 256 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.weight/0.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/0.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..6f0ae90c316281a083bb99a60aa9e40674b749a5 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/0.0.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.weight/1.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/1.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f301abe64e92e6c2636d0d41d4487689417dff69 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/1.0.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.weight/2.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/2.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..c6cdb2991e0301327461154263c0a3ec8a2f7dc6 Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/2.0.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc1.weight/3.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/3.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..c03bc6d70be21681de3841d7d0f44e104c4250e0 Binary files /dev/null and 
b/model_weights/module.decoder.layers.mlp.linear_fc1.weight/3.0.0 differ diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_0_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_0_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..af9c0d2f7a9154b6db4b60f739a17a04261b8254 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_0_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8fe8cf89ac8228df7c20a5fbac2a50c841310072585016f109c1955934c30a0f +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_1_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_1_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..e457104e80ff37eb0fadab384fbb4d74a039dc83 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_1_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb0399cdab9deadf27e09a92b0fffb8a3c3ba32d2d100bbeaf41c8056257c338 +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_2_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_2_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..25e6ad98c99c125b7f81f77b5acb0be7a4c13621 --- /dev/null +++ b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_2_4.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4195c2ae65e03ab3843c91ed7bca9cd02ca971de54785f765564867b3ba53e07 +size 1832 diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_3_4.pt b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_3_4.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc163334ff59a1c37fefd5cc4696234a708560e2 --- /dev/null +++ 
b/model_weights/module.decoder.layers.mlp.linear_fc2._extra_state/shard_3_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0cc0f2a0c549c38845a9e549180085c1e05916eb0fc2eef084e3411c67b1379b
+size 1832
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.bias/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78
--- /dev/null
+++ b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        1,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.bias/0.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/0.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.bias/1.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/1.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/1.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.bias/2.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/2.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/2.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.bias/3.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/3.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.bias/3.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.weight/.zarray b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..abc247f0c8cae28b0ea693bee2c683771910b77c
--- /dev/null
+++ b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/.zarray
@@ -0,0 +1,18 @@
+{
+    "chunks": [
+        1,
+        256,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.weight/0.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/0.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..03270f43d5f7b6484efce64d3a6e97f79c3bf181
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/0.0.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.weight/1.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/1.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..dd17d8be0e74ef0dd28804b862622741ed638dac
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/1.0.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.weight/2.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/2.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..014d26de7b8e0ea0c510806d9b8df9daee8d323b
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/2.0.0 differ
diff --git a/model_weights/module.decoder.layers.mlp.linear_fc2.weight/3.0.0 b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/3.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..82d7ae5c0cef63b88f80e0d764ae178e23c15332
Binary files /dev/null and b/model_weights/module.decoder.layers.mlp.linear_fc2.weight/3.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_0_4.pt b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_0_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..af9c0d2f7a9154b6db4b60f739a17a04261b8254
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_0_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8fe8cf89ac8228df7c20a5fbac2a50c841310072585016f109c1955934c30a0f
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_1_4.pt b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_1_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..e457104e80ff37eb0fadab384fbb4d74a039dc83
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_1_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb0399cdab9deadf27e09a92b0fffb8a3c3ba32d2d100bbeaf41c8056257c338
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_2_4.pt b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_2_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..25e6ad98c99c125b7f81f77b5acb0be7a4c13621
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_2_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4195c2ae65e03ab3843c91ed7bca9cd02ca971de54785f765564867b3ba53e07
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_3_4.pt b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_3_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..dc163334ff59a1c37fefd5cc4696234a708560e2
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj._extra_state/shard_3_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0cc0f2a0c549c38845a9e549180085c1e05916eb0fc2eef084e3411c67b1379b
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.bias/.zarray b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        1,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.bias/0.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.bias/1.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/1.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/1.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.bias/2.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/2.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/2.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.bias/3.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/3.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.bias/3.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.weight/.zarray b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..abc247f0c8cae28b0ea693bee2c683771910b77c
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/.zarray
@@ -0,0 +1,18 @@
+{
+    "chunks": [
+        1,
+        256,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.weight/0.0.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/0.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..7b997aa9ac7ef08f732361d5c7fe979f061969a7
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/0.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.weight/1.0.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/1.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..a71fcf289e8812343ad11f98c18b2201b7405630
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/1.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.weight/2.0.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/2.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..a914cd1b057d3584d8dceba76879974957ea9451
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/2.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_proj.weight/3.0.0 b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/3.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..cd82c3849e02b21253d6a8c4efc226012e730f06
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_proj.weight/3.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_4.pt b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..af9c0d2f7a9154b6db4b60f739a17a04261b8254
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8fe8cf89ac8228df7c20a5fbac2a50c841310072585016f109c1955934c30a0f
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_4.pt b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..e457104e80ff37eb0fadab384fbb4d74a039dc83
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb0399cdab9deadf27e09a92b0fffb8a3c3ba32d2d100bbeaf41c8056257c338
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_4.pt b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..25e6ad98c99c125b7f81f77b5acb0be7a4c13621
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4195c2ae65e03ab3843c91ed7bca9cd02ca971de54785f765564867b3ba53e07
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_4.pt b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_4.pt
new file mode 100644
index 0000000000000000000000000000000000000000..dc163334ff59a1c37fefd5cc4696234a708560e2
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_4.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0cc0f2a0c549c38845a9e549180085c1e05916eb0fc2eef084e3411c67b1379b
+size 1832
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/.zarray b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..638ec2daa362b0a18f4cd1b1816f9fbfc9844b5a
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        1,
+        768
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        768
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..49840e85c85399ec320f8248ae85c78921b97b06
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/1.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/1.0
new file mode 100644
index 0000000000000000000000000000000000000000..49840e85c85399ec320f8248ae85c78921b97b06
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/1.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/2.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/2.0
new file mode 100644
index 0000000000000000000000000000000000000000..49840e85c85399ec320f8248ae85c78921b97b06
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/2.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/3.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/3.0
new file mode 100644
index 0000000000000000000000000000000000000000..49840e85c85399ec320f8248ae85c78921b97b06
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.bias/3.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/.zarray b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        1,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/1.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/1.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/1.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/2.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/2.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/2.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/3.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/3.0
new file mode 100644
index 0000000000000000000000000000000000000000..a64a5a93fb4aef4d5f63d79cb2582731b9ac5063
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_bias/3.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..91bcb254821f80cbed7cc4f944e647b3de090e78
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        1,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0
@@ -0,0 +1 @@
+�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0
new file mode 100644
index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0
@@ -0,0 +1 @@
+�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0
new file mode 100644
index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0
@@ -0,0 +1 @@
+�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0
new file mode 100644
index 0000000000000000000000000000000000000000..7b666430fe939d73d91a6fce26ede6086f5445c8
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0
@@ -0,0 +1 @@
+�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/.zarray b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..f729fcd4f6ae6d4399d47f47333318e7a1cf59be
--- /dev/null
+++ b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/.zarray
@@ -0,0 +1,18 @@
+{
+    "chunks": [
+        1,
+        768,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        4,
+        768,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/0.0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/0.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..db14ecc5f474fd111d08200202b1030a0547b5ff
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/0.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/1.0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/1.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..0b6aa64bc5e7397269bbb5cdb8340e30368b992c
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/1.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/2.0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/2.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..125557e98e277e2648e75468ee9e7818f552e8e7
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/2.0.0 differ
diff --git a/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/3.0.0 b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/3.0.0
new file mode 100644
index 0000000000000000000000000000000000000000..fbfaa0ea88d960dbfeacfdb756d6e4033b9fda40
Binary files /dev/null and b/model_weights/module.decoder.layers.self_attention.linear_qkv.weight/3.0.0 differ
diff --git a/model_weights/module.embedding.position_embeddings.weight/.zarray b/model_weights/module.embedding.position_embeddings.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..4e379f18c95e2b4490e1f52ff5fc1c109868e9d6
--- /dev/null
+++ b/model_weights/module.embedding.position_embeddings.weight/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        2048,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        2048,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.embedding.position_embeddings.weight/0.0 b/model_weights/module.embedding.position_embeddings.weight/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..fa1cc541861f119ea9bdb85012419c4ee3106954
--- /dev/null
+++ b/model_weights/module.embedding.position_embeddings.weight/0.0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:59f59e21de2e02b2ac7a11853225a2537dc2f8632258bd1fd8327e0f8ed64167
+size 1048576
diff --git a/model_weights/module.embedding.word_embeddings.weight/.zarray b/model_weights/module.embedding.word_embeddings.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..df4c9cc4e816bbac866b7fe01c9c226469878742
--- /dev/null
+++ b/model_weights/module.embedding.word_embeddings.weight/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        50304,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        50304,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.embedding.word_embeddings.weight/0.0 b/model_weights/module.embedding.word_embeddings.weight/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..c4673cf9f9c3497e86102180946f0dd92570d17a
--- /dev/null
+++ b/model_weights/module.embedding.word_embeddings.weight/0.0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45e2b22415976e8aa8e53369f90c3e7fafa65372f6090fc2a378be1d25448041
+size 25755648
diff --git a/model_weights/module.output_layer.weight/.zarray b/model_weights/module.output_layer.weight/.zarray
new file mode 100644
index 0000000000000000000000000000000000000000..df4c9cc4e816bbac866b7fe01c9c226469878742
--- /dev/null
+++ b/model_weights/module.output_layer.weight/.zarray
@@ -0,0 +1,16 @@
+{
+    "chunks": [
+        50304,
+        256
+    ],
+    "compressor": null,
+    "dtype": "bfloat16",
+    "fill_value": null,
+    "filters": null,
+    "order": "C",
+    "shape": [
+        50304,
+        256
+    ],
+    "zarr_format": 2
+}
\ No newline at end of file
diff --git a/model_weights/module.output_layer.weight/0.0 b/model_weights/module.output_layer.weight/0.0
new file mode 100644
index 0000000000000000000000000000000000000000..4ce00cce426f442cc9e2c32b23fd9671fe7d619a
--- /dev/null
+++ b/model_weights/module.output_layer.weight/0.0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e1b8de6c6ce8539eb1de25677b0ea513c0a4314c74be1c9195c26e3863a4ede8
+size 25755648