LLaMutation-Qwen2.5-14B-SFFT-v0.0

image/webp

This model is a Spectrum FFT of Qwen/Qwen2.5-14B on a code translation dataset evolved with EvolKit.

Model description

Code translation and completion model trained on Qwen2.5-14B as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model thus there will be quirks to it's useage parameters.

I will refine the model both for completion and create an instruct/chat variant.

Intended uses & limitations

Differing system prompts for code translation and use as a tab autocomplete model with continue.dev

Chat template and sampling paramaters.

Chat template is chatml.

Sampling parameters for the generation and demo at the hackathon are here:

image/png

SYSTEM PROMPT MUST BE USED FOR THIS MODEL

You are an Al assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!

Training procedure

Spectrum FFT/SFFT

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss
0.3948 0.0237 1 0.3920
0.2392 0.4970 21 0.2500
0.2606 0.9941 42 0.2621

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.3.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-14B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

plugins:
  - axolotl.integrations.spectrum.SpectrumPlugin

spectrum_top_fraction: 0.5
# Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
spectrum_model_name: Qwen/Qwen2.5-14B

datasets:
  - path: datasets/LLaMutation.jsonl
    type: sharegpt
  - path: datasets/LLaMutationMAX_Train.json
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.1
output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0

sequence_len: 8192
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
wandb_entity:
wandb_watch:
wandb_name: Unit-00
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: linear
learning_rate: 0.0005
max_grad_norm: 3

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint: 
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 50
evals_per_epoch: 2
saves_per_epoch: 2
save_safetensors: true
hub_model_id: 
hub_strategy: 
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: false  # Changed from true
#   fsdp_use_orig_params: true  # Changed from false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#   fsdp_activation_checkpointing: true
#   fsdp_state_dict_type: SHARDED_STATE_DICT  # Changed from FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: true  # Added
#   fsdp_backward_prefetch: "BACKWARD_POST"  # Added
#   fsdp_backward_prefetch_limit: 1  # Added
#   fsdp_mixed_precision: BF16  # Added

Downloads last month
15
Safetensors
Model size
14.8B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Kearm/LLaMutation-Qwen2.5-14B-SFFT-v0.0

Base model

Qwen/Qwen2.5-14B
Finetuned
(31)
this model