license: apache-2.0
datasets:
  - berkeley-nest/Nectar
language:
  - en
library_name: transformers
tags:
  - reward model
  - RLHF
  - RLAIF
quantized_by: bartowski
pipeline_tag: text-generation
lm_studio:
  param_count: 7b
  use_case: general
  release_date: 19-03-2024
  model_creator: Nexusflow
  prompt_template: OpenChat
  system_prompt: none
  base_model: mistral
  original_repo: Nexusflow/Starling-LM-7B-beta

💫 Community Model> Starling-LM-7B-beta by Nexusflow

👾 LM Studio Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on Discord.

Model creator: Nexusflow
Original model: Starling-LM-7B-beta
GGUF quantization: provided by bartowski based on llama.cpp release b2440

Model Summary:

Starling-LM-7B-beta is an iteration on the Starling series of models, building on the work of berkeley-nest, who released Starling-LM-7B-alpha.
This model is fine-tuned from openchat/openchat-3.5-0106, which is itself based on Mistral-7B-v0.1.
Training used Nexusflow/Starling-RM-34B as the reward model and the policy optimization method from Fine-Tuning Language Models from Human Preferences (PPO), which yields strong results for a model of this size.

Prompt Template:

Choose the 'OpenChat' preset in LM Studio.

Under the hood, the model will see a prompt that's formatted like so:

GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:

You can also choose the 'OpenChatCode' preset for a coding assistant.

Under the hood, the model will see a prompt that's formatted like so:

Code User: {prompt}<|end_of_turn|>Code Assistant:

Note that this model does not support a System prompt.
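As a rough illustration of the two templates above, the following Python sketch assembles a single-turn prompt string by hand. The function name `build_prompt` and its `mode` parameter are hypothetical, introduced only for this example; LM Studio applies the preset for you, so this matters only if you construct prompts manually.

```python
END_OF_TURN = "<|end_of_turn|>"

# Role prefixes for each preset, copied from the templates shown above.
ROLE_PREFIXES = {
    "chat": ("GPT4 Correct User", "GPT4 Correct Assistant"),
    "code": ("Code User", "Code Assistant"),
}


def build_prompt(prompt: str, mode: str = "chat") -> str:
    """Format a single-turn prompt (hypothetical helper, not an LM Studio API)."""
    if mode not in ROLE_PREFIXES:
        raise ValueError(f"unknown mode: {mode!r}")
    user, assistant = ROLE_PREFIXES[mode]
    # No system prompt is emitted because the model does not support one.
    return f"{user}: {prompt}{END_OF_TURN}{assistant}:"


if __name__ == "__main__":
    print(build_prompt("Why is the sky blue?"))
    print(build_prompt("Implement quicksort using C++", mode="code"))
```

The prompt ends right after the assistant prefix so that the model continues with its answer.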

Use case and examples

Starling-LM-7B-beta is tuned for strong multi-turn chat performance. The base model was designed as a generalist and coding-tuned model, so it works well for general tasks as well as coding.
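For multi-turn chat, earlier turns have to be carried along in the prompt. The sketch below assumes that every completed turn is closed with `<|end_of_turn|>` and that turns are concatenated without separators; this follows the usual OpenChat convention but is not spelled out in this card. `build_chat_prompt` is a hypothetical helper name.

```python
from typing import List, Tuple

END_OF_TURN = "<|end_of_turn|>"
USER = "GPT4 Correct User"
ASSISTANT = "GPT4 Correct Assistant"


def build_chat_prompt(history: List[Tuple[str, str]], new_message: str) -> str:
    """Build a multi-turn prompt from (user, assistant) pairs plus a new message.

    Assumption: completed turns are simply concatenated, each ending with
    the end-of-turn token.
    """
    parts = []
    for user_msg, assistant_msg in history:
        parts.append(f"{USER}: {user_msg}{END_OF_TURN}")
        parts.append(f"{ASSISTANT}: {assistant_msg}{END_OF_TURN}")
    # Leave the final assistant turn open for the model to complete.
    parts.append(f"{USER}: {new_message}{END_OF_TURN}{ASSISTANT}:")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chat_prompt([("Hello", "Hi")], "How are you today?"))
```

With an empty history this reduces to the single-turn 'OpenChat' format.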

General knowledge:

GPT4 Correct User: Why is the sky blue?<|end_of_turn|> GPT4 Correct Assistant: The sky appears blue due to a phenomenon called Rayleigh scattering, named after British scientist Lord Rayleigh who first studied it in detail. Here's a simplified explanation of why this occurs:

  1. Sunlight is composed of various colors of light, each with different wavelengths. These colors include red, orange, yellow, green, blue, indigo, and violet, which together form the visible spectrum.

  2. When sunlight enters Earth's atmosphere, it interacts with molecules of air (mainly nitrogen and oxygen) and small particles like dust and water droplets. These interactions cause the different colors of light to scatter in various directions.

  3. Rayleigh scattering is more effective at shorter wavelengths, meaning that blue and violet light are scattered more than red and yellow light. However, our eyes are more sensitive to blue light than violet, so we perceive the sky as blue rather than a mixture of blue and violet.

  4. At sunrise and sunset, the sun is lower in the sky, and its light has to travel through more atmosphere. This causes even more scattering of shorter wavelengths (blue and violet), leaving longer wavelengths (red, orange, and yellow) to dominate, resulting in the warm colors we see during these times.

  5. In some cases, such as after a rainstorm or in areas with high pollution levels, the sky may appear more grayish or hazy. This is because the scattering of light is less selective, and more of the shorter wavelengths (blue) are scattered out of our line of sight, along with the longer wavelengths (red).

In summary, the sky appears blue due to the selective scattering of sunlight by the atmosphere, with shorter wavelengths (blue) being scattered more than longer wavelengths (red), and our eyes being more sensitive to blue light.

Coding:

Code User: Implement quicksort using C++<|end_of_turn|> Code Assistant: Here is an example of how to implement the quicksort algorithm in C++:

#include <iostream>
#include <vector>

using namespace std;

void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high];
    int i = (low - 1);
    
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

void quickSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

int main() {
    vector<int> arr = {10, 7, 8, 9, 1, 5};
    int n = arr.size();
    
    quickSort(arr, 0, n - 1);
    
    cout << "Sorted array: \n";
    for (int i = 0; i < n; i++) {
        cout << arr[i] << " ";
    }
    cout << endl;
    
    return 0;
}

This code defines a quickSort function that takes a vector of integers, a starting index (low), and an ending index (high) as parameters. The function uses the partition method to find the pivot element's correct position in the sorted array and then recursively sorts the sub-arrays before and after the pivot.

The main function demonstrates how to use the quickSort function by sorting an example vector of integers and printing the sorted result.

Technical Details

Uses Nexusflow/Starling-RM-34B as a reward model for PPO.
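To give a sense of how a reward model enters PPO-style fine-tuning, here is a minimal sketch in plain Python. It assumes the KL-penalized reward described in Fine-Tuning Language Models from Human Preferences and the standard clipped surrogate objective of PPO. The function names and the values of `beta` and `epsilon` are illustrative assumptions, not the settings used to train Starling-LM-7B-beta.

```python
import math


def kl_penalized_reward(rm_score, logp_policy, logp_reference, beta=0.1):
    """Reward-model score minus a penalty for drifting from the reference model.

    logp_policy and logp_reference are log-probabilities of the same response
    under the model being tuned and under the frozen starting model.
    beta is an assumed, illustrative coefficient.
    """
    return rm_score - beta * (logp_policy - logp_reference)


def clipped_surrogate(logp_new, logp_old, advantage, epsilon=0.2):
    """PPO clipped objective for a single sample (a quantity to maximize)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes the incentive to move the policy too far
    # from the one that generated the sample.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(kl_penalized_reward(rm_score=1.0, logp_policy=-1.0, logp_reference=-2.0))
    print(clipped_surrogate(logp_new=-1.0, logp_old=-1.5, advantage=1.0))
```

In a real training loop these quantities are computed per token over batches with an autodiff framework; the scalar version only shows the arithmetic.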

Nexusflow/Starling-RM-34B was trained on the following dataset: berkeley-nest/Nectar.

Special thanks

๐Ÿ™ Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

Disclaimers

TBD