
license:
  - apache-2.0
  - cc-by-nc-4.0
datasets: pszemraj/fleece2instructions-codealpaca
tags:
  - generated_from_trainer
  - instruct
  - instructions
  - code
metrics:
  - rouge
language:
  - en
widget:
  - text: >
      import torch

      from transformers import AutoTokenizer, AutoModelForSequenceClassification


      checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

      tokenizer = AutoTokenizer.from_pretrained(checkpoint)

      model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

      sequences = ["I've been waiting for a HuggingFace course my whole life.",
      "So have I!"]


      tokens = tokenizer(sequences, padding=True, truncation=True,
      return_tensors="pt")

      output = model(**tokens)
    example_title: Example One
  - text: >
      import torch

      from tqdm.auto import tqdm


      device = torch.device("cuda") if torch.cuda.is_available() else
      torch.device("cpu")

      model.to(device)


      progress_bar = tqdm(range(num_training_steps))


      model.train()

      for epoch in range(num_epochs):
          for batch in train_dataloader:
              batch = {k: v.to(device) for k, v in batch.items()}
              outputs = model(**batch)
              loss = outputs.loss
              loss.backward()

              optimizer.step()
              lr_scheduler.step()
              optimizer.zero_grad()
              progress_bar.update(1)
    example_title: Example Two
  - text: |
      import evaluate

      metric = evaluate.load("glue", "mrpc")
      model.eval()
      for batch in eval_dataloader:
          batch = {k: v.to(device) for k, v in batch.items()}
          with torch.no_grad():
              outputs = model(**batch)

          logits = outputs.logits
          predictions = torch.argmax(logits, dim=-1)
          metric.add_batch(predictions=predictions, references=batch["labels"])

      metric.compute()
    example_title: Example Three
  - text: |
      git lfs install
      huggingface-cli lfs-enable-largefiles .
      git lfs track "*.bin"
      git add .
      git commit -a -m "add fp32 chkpt"
      git push
    example_title: Example Four
  - text: |
      export interface DocumentParams {
        pageContent: string;

        // eslint-disable-next-line @typescript-eslint/no-explicit-any
        metadata: Record<string, any>;
      }

      /**
       * Interface for interacting with a document.
       */
      export class Document implements DocumentParams {
        pageContent: string;

        // eslint-disable-next-line @typescript-eslint/no-explicit-any
        metadata: Record<string, any>;

        constructor(fields?: Partial<DocumentParams>) {
          this.pageContent = fields?.pageContent ?? this.pageContent;
          this.metadata = fields?.metadata ?? {};
        }
      }
    example_title: Example Five
inference:
  parameters:
    max_length: 96
    num_beams: 4

bart-base-code-instructiongen

Use this text2text model to find out what LLM instructions might be able to generate an arbitrary piece of code!
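A minimal sketch of how the model might be called with the `transformers` library. The repo id `pszemraj/bart-base-code-instructiongen` is an assumption pieced together from the dataset author's namespace and this card's title, and `guess_instruction` is a hypothetical helper name. The generation settings are the ones listed under `inference.parameters` in the metadata.

```python
# Assumed repo id (author namespace + card title); adjust if it differs.
MODEL_ID = "pszemraj/bart-base-code-instructiongen"

# Generation settings copied from the card's `inference.parameters` block.
GENERATION_KWARGS = {"max_length": 96, "num_beams": 4}


def guess_instruction(code: str, model_id: str = MODEL_ID) -> str:
    """Return the instruction the model predicts could have produced `code`."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long snippets to the model's maximum input length.
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `guess_instruction("def add(a, b):\n    return a + b\n")` downloads the checkpoint on first use and returns a single generated instruction string.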

This model is a fine-tuned version of facebook/bart-base on the pszemraj/fleece2instructions-codealpaca dataset. It achieves the following results on the evaluation set:

Loss: 1.0136
Rouge1: 59.9513
Rouge2: 33.9118
Rougel: 55.7815
Rougelsum: 56.9064
Gen Len: 29.7146
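
To give a feel for what the Rouge1 number measures, here is a simplified unigram-overlap F-measure scaled to 0-100. It is an illustration only, not the scorer used to produce the numbers above: it assumes lowercase whitespace tokenization and applies no stemming, and `rouge1_f1` and the two example strings are made up for this sketch.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure on whitespace tokens, scaled to 0-100."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a unigram counts only as often as it occurs in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "write a function that adds two numbers",  # hypothetical model output
    "write a function to add two numbers",     # hypothetical reference instruction
)
print(round(score, 2))  # 71.43 (5 of 7 unigrams match on each side)
```

Rouge2 applies the same idea to bigrams, and RougeL scores the longest common subsequence instead of a bag of n-grams.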

Intended uses & limitations

🚨 note: as the authors elected to release the original dataset under cc-by-nc, that license carries over to this model, which therefore cannot be used for commercial activity.

This is just a base-size model, which does a decent job for its size but is not perfect. For better-quality instructions, check out bart-large or fine-tune your own larger model on the dataset :)

Intended use: Research on domain adaptation and/or other improvements to LLMs by extending instruction:text data pairs.

Training and evaluation data

Refer to the linked dataset card for pszemraj/fleece2instructions-codealpaca or the original dataset repo.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.02
num_epochs: 3.0
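
As a rough sketch, the list above could be expressed as keyword arguments for `transformers.Seq2SeqTrainingArguments`. Using that class is an assumption based on the `generated_from_trainer` tag, the `output_dir` default is hypothetical, and the card's `train_batch_size` and `eval_batch_size` are assumed to be per-device values.

```python
# Hyperparameters from the card, renamed to TrainingArguments keyword names.
TRAINING_KWARGS = {
    "learning_rate": 8e-05,
    "per_device_train_batch_size": 4,  # card: train_batch_size (assumed per device)
    "per_device_eval_batch_size": 4,   # card: eval_batch_size (assumed per device)
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.02,
    "num_train_epochs": 3.0,
}


def build_training_args(output_dir: str = "./bart-base-code-instructiongen"):
    """Build training arguments; `output_dir` is a hypothetical path."""
    # Imported lazily so the dict above can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Settings the card does not report (mixed precision, evaluation schedule, generation options during evaluation) are left at their library defaults here, so this is not a full reproduction recipe.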

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
1.1165	1.0	281	1.1090	57.9239	31.9259	53.8737	54.9811	28.2924
1.0763	2.0	563	1.0267	59.9605	34.0298	55.7523	56.8021	29.6966
0.9595	2.99	843	1.0136	59.9513	33.9118	55.7815	56.9064	29.7146
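
The hyperparameters and step counts above fit together as in the short calculation below. Two things are assumptions rather than reported facts: a device count of one (the card lists none, and 4 × 16 already equals the reported total of 64), and rounding the warmup step count up. The example count per epoch is only an estimate derived from the step column.

```python
import math

# Values copied from the hyperparameter list.
per_device_batch = 4
grad_accum_steps = 16
warmup_ratio = 0.02

# Assumption: the card reports no device count, so one device is assumed.
num_devices = 1

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # 64, matching total_train_batch_size

# Final step from the results table, taken as the total number of optimizer steps.
total_steps = 843
warmup_steps = math.ceil(warmup_ratio * total_steps)  # rounding up is an assumption
print(warmup_steps)  # 17

# Rough number of training examples seen in the first epoch (281 steps).
steps_first_epoch = 281
print(steps_first_epoch * effective_batch)  # 17984
```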