---
license: apache-2.0
---

RWKV-x070-2B9-CJE-Instruct Model Card

Model Overview

  • Model Name: RWKV-x070-2B9-CJE-Instruct
  • Description: An instruction-tuned model specialized for Japanese, Chinese, and English
  • Base Model: rwkv-x070-2b9-world-v3-40%trained-20250113-ctx4k.pth
  • Architecture: RWKV x070 "Goose"
  • Parameters: 2.9B
  • Model Dimension: 2560
  • Number of Layers: 32

Fine-tuning Details

Training Configuration

  • Trainer: RWKV-LM-RLHF (https://github.com/OpenMOSE/RWKV-LM-RLHF)
  • PEFT Mode: Hybrid training combining frozen embeddings, Bone (Block Affine Transformation), and full-parameter training
  • SFT Method: SmoothingLoss SFT
  • Context Window: 5120 tokens (trained with a 1024-token overlap between consecutive windows; see the chunking sketch below)
  • Compute: AMD Instinct MI100 × 2, 60 hours (100% solar energy)
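
As a rough illustration of the context-window setting above, here is a minimal Python sketch of sliding-window chunking with a 5120-token window and a 1024-token overlap. It is an assumption about how such an overlap is typically applied, not the exact preprocessing used by RWKV-LM-RLHF.

# Minimal sketch, not the trainer's actual preprocessing: split a long token
# sequence into 5120-token windows where consecutive windows share 1024 tokens
# (stride = 5120 - 1024 = 4096).
from typing import List

def sliding_windows(tokens: List[int], window: int = 5120, overlap: int = 1024) -> List[List[int]]:
    stride = window - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), stride):
        chunks.append(tokens[start:start + window])
    return chunks

# Example: a 12,000-token document yields windows starting at 0, 4096, and 8192.
print([len(c) for c in sliding_windows(list(range(12_000)))])  # [5120, 5120, 3808]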

Dataset Specifications

  • Size: 800k pairs
  • Content:
    • Mixed data in Japanese, Chinese, and English
    • Conversations
    • Programming code
    • Translation tasks
    • Chain-of-Thought reasoning tasks

How to Use

curl http://127.0.0.1:9000/loadmodel \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model_filename":"models/rwkv-x070-2b9-cje-instruct-1.pth","model_viewname":"RWKV x070 2B9 CJE Instruct-1","model_strategy":"fp16","endtoken":"\\n\\n\\x17"}'
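
The same request can be issued from Python. This is a minimal sketch using the requests library; it assumes an inference server is already listening on 127.0.0.1:9000, as in the curl example above, and the response handling is generic because the endpoint's reply format is not documented here.

import requests

payload = {
    "model_filename": "models/rwkv-x070-2b9-cje-instruct-1.pth",
    "model_viewname": "RWKV x070 2B9 CJE Instruct-1",
    "model_strategy": "fp16",
    # Kept in the same escaped form as the curl payload above.
    "endtoken": "\\n\\n\\x17",
}

resp = requests.post("http://127.0.0.1:9000/loadmodel", json=payload, timeout=120)
resp.raise_for_status()
print(resp.status_code, resp.text)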

Important Note

  • Set the end token to '\n\n\x17'

Example prompt format:

User: who are you?\n\n\x17
Assistant: gooday i'm rwkv\n\n\x17
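
Below is a minimal Python sketch of assembling a prompt in this format and trimming a completion at the end token. The generate call is a hypothetical stand-in for whatever inference interface you use; only the end token and the User:/Assistant: format come from this card.

END_TOKEN = "\n\n\x17"  # two newlines followed by the 0x17 control character

def build_prompt(user_message: str) -> str:
    # Each completed turn is terminated by the end token; the assistant turn is left open.
    return f"User: {user_message}{END_TOKEN}Assistant:"

def extract_reply(raw_completion: str) -> str:
    # Keep only the text before the first end token the model emits.
    return raw_completion.split(END_TOKEN, 1)[0].strip()

prompt = build_prompt("who are you?")
# raw_completion = generate(prompt)   # hypothetical inference call
# print(extract_reply(raw_completion))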

Limitations and Considerations

  • This is an experimental model; inference stability is not fully guaranteed
  • Unexpected behaviors may occur
  • Continuous improvements are being made; feedback is welcome

License

Apache License 2.0

Acknowledgments

We thank the RWKV base model authors and the RWKV community for their support in developing this model.