metadata
language:
  - en
license: llama2
datasets:
  - sordonia/flan-10k-flat
model-index:
  - name: Llama-2-7b-chat-hf-afr-200step-flan
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 52.47
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 78.02
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 48.42
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 45.47
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.69
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 18.65
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Korabbit/Llama-2-7b-chat-hf-afr-200step-flan
          name: Open LLM Leaderboard

Hi there! This is a test of my approach, "AFR training". The base model used is "Llama-2-7b-chat".

System Prompt: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. \n If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.

Instruction: Implement binary search in python.

Response: Sure! Here is an example of how to implement a binary search algorithm in Python:

def binary_search(arr, target):
    # Initialize the bounds of the search range (arr must be sorted in ascending order)
    low = 0
    high = len(arr) - 1

    # Loop until the target element is found or the search range is empty
    while low <= high:
        # Calculate the middle index of the current range
        mid = (low + high) // 2

        # Check if the target element is at the middle index
        if arr[mid] == target:
            return mid

        # If the middle element is greater than the target, search the lower half;
        # otherwise search the upper half
        if arr[mid] > target:
            high = mid - 1
        else:
            low = mid + 1

    # If the target element is not found in the array, return -1
    return -1

# Test the binary search function
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target = 5
result = binary_search(arr, target)
print(f"The target element {target} is at index {result}")

This code defines a binary_search function that takes an array and a target element as input, and returns the index of the target element in the array if it is found, or -1 if it is not found.

The function works by repeatedly halving the search range of a sorted array. At each step it compares the element at the middle index with the target. If they are equal, the function returns the middle index. Otherwise it discards the half of the range that cannot contain the target and continues with the other half. If the range becomes empty, the target is not in the array and the function returns -1.

The code also includes a test case to demonstrate how to use the binary_search function. In this case, the target element is 5, and the function returns the index of the element 5 in the array.

I hope this helps! Let me know if you have any questions.
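The sample response tests a single target that happens to sit at the first midpoint, so it never exercises the branch logic. As a minimal sketch (not part of the model output), a binary search can be cross-checked against Python's standard `bisect` module over every element plus out-of-range values. The helper name `reference_search` is hypothetical and introduced only for illustration.

```python
from bisect import bisect_left


def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] > target:
            high = mid - 1  # target can only be in the lower half
        else:
            low = mid + 1   # target can only be in the upper half
    return -1


def reference_search(arr, target):
    """Hypothetical reference lookup built on the standard-library bisect_left."""
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1


arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Probe every element plus one value below and one above the list.
for target in [0, *arr, 11]:
    assert binary_search(arr, target) == reference_search(arr, target)
# An empty list should always report "not found".
assert binary_search([], 5) == -1
```

The elements here are unique, so both functions agree on the exact index. With duplicates, `bisect_left` returns the leftmost match while a plain binary search may return any matching index.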

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 52.62 |
| AI2 Reasoning Challenge (25-Shot) | 52.47 |
| HellaSwag (10-Shot) | 78.02 |
| MMLU (5-Shot) | 48.42 |
| TruthfulQA (0-shot) | 45.47 |
| Winogrande (5-shot) | 72.69 |
| GSM8k (5-shot) | 18.65 |
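
The "Avg." row is consistent with an unweighted mean of the six benchmark scores. A short sketch of that arithmetic, assuming equal weighting (the variable names are illustrative only):

```python
# Scores copied from the results listed above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 52.47,
    "HellaSwag (10-Shot)": 78.02,
    "MMLU (5-Shot)": 48.42,
    "TruthfulQA (0-shot)": 45.47,
    "Winogrande (5-shot)": 72.69,
    "GSM8k (5-shot)": 18.65,
}

# Unweighted mean across the six benchmarks (assumed weighting).
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 52.62"
```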