---
license: apache-2.0
datasets:
- 6cf/liveideabench
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
tags:
- chemistry
- biology
- climate
- medical
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/JEZgA_xV6oF8AIsya9dop.jpeg)

# IdeaWhiz Model Card

## Model Summary

IdeaWhiz is a fine-tuned version of QwQ-32B-Preview, optimized for scientific creativity and step-by-step reasoning. It is trained on the LiveIdeaBench dataset to improve its ability to generate novel scientific ideas and hypotheses.

## Key Features

- Base Model: QwQ-32B-Preview
- Training Dataset: LiveIdeaBench
- Main Focus: Scientific creativity and idea generation
- Reasoning Style: o1-style step-by-step reasoning

## Intended Use

- Scientific hypothesis generation
- Creative problem-solving in research
- Step-by-step scientific reasoning
- Research direction brainstorming

## Model Performance Compared to QwQ-32B-Preview

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/a1PnP5YH_4b5SrH7JdGBf.png)

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "6cf/QwQ-32B-Preview-IdeaWhiz-v1"

# Load the model and tokenizer; dtype and device placement are selected automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The prompt assigns a keyword and asks for the answer after "**Final Idea:**".
prompt = """I'll be submitting your next responses to a "Good Scientific Idea" expert review panel. If they consider your idea to be a good one, you'll receive a reward. Your assigned keyword is: "cancer". You may provide background information. The idea MUST be within 100 words (including background information). (Note: good scientific ideas should be novel, verifiable, practically valuable, and able to advance the field.). NOTE: You MUST give your answer after **Final Idea:**
"""

messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
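
The Quickstart prompt hard-codes the keyword "cancer". To try other keywords, the same wording can be wrapped in a small helper. This is only a sketch: `PROMPT_TEMPLATE`, `build_idea_prompt`, and the `word_limit` parameter are hypothetical names introduced here for illustration and are not part of the model or the dataset tooling.

```python
# Hypothetical helper: the template text is copied from the Quickstart prompt,
# with the keyword and the word limit turned into parameters.
PROMPT_TEMPLATE = (
    'I\'ll be submitting your next responses to a "Good Scientific Idea" '
    'expert review panel. If they consider your idea to be a good one, '
    'you\'ll receive a reward. Your assigned keyword is: "{keyword}". '
    'You may provide background information. The idea MUST be within '
    '{word_limit} words (including background information). '
    '(Note: good scientific ideas should be novel, verifiable, practically '
    'valuable, and able to advance the field.). '
    'NOTE: You MUST give your answer after **Final Idea:**\n'
)


def build_idea_prompt(keyword: str, word_limit: int = 100) -> str:
    """Fill the Quickstart prompt template with a keyword and a word limit."""
    return PROMPT_TEMPLATE.format(keyword=keyword, word_limit=word_limit)


# Example: build a prompt for a different keyword.
prompt = build_idea_prompt("climate")
print(prompt)
```

The resulting string can be passed as the user message in place of the fixed prompt. Whether the model behaves as well with a different word limit is untested here; the default of 100 matches the prompt shown above.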
# Example Outputs

```
Alright, so I have this task to come up with a good scientific idea related to cancer,
and it has to be novel, verifiable, practically valuable, and able to advance the field.
Plus, it needs to be within 100 words, including any background information.
That's a pretty tight constraint, but I think I can manage. First, ...

...

**Final Idea:** propose a novel approach to cancer treatment by developing personalized
cancer vaccines tailored to each patient's unique tumor mutations. By integrating machine
learning algorithms with comprehensive omics data, including genomics, epigenetics,
transcriptomics, and proteomics, this method aims to identify and prioritize the most
immunogenic tumor-specific antigens. This integrated strategy could enhance vaccine
efficacy and represents an advancement in precision medicine for cancer treatment.
```
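
Because the prompt asks the model to place its answer after `**Final Idea:**`, the idea can be separated from the preceding reasoning with plain string handling. The following is a minimal sketch, assuming the marker appears verbatim in the response; `extract_final_idea` and `count_words` are hypothetical helper names, not part of any library.

```python
from typing import Optional

FINAL_IDEA_MARKER = "**Final Idea:**"


def extract_final_idea(response: str) -> Optional[str]:
    """Return the text after the last marker, or None if the marker is absent."""
    _, marker, tail = response.rpartition(FINAL_IDEA_MARKER)
    if not marker:
        return None
    # Collapse line breaks and repeated whitespace into single spaces.
    return " ".join(tail.split())


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


# Small illustrative response (made up for this sketch, not real model output).
sample = "Alright, let me think...\n\n**Final Idea:** Develop personalized\ncancer vaccines."
idea = extract_final_idea(sample)
print(idea)               # Develop personalized cancer vaccines.
print(count_words(idea))  # 4
```

The word count is a rough check only: the 100-word limit in the prompt includes background information, and simple whitespace splitting may not match how a review panel counts words.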

# Training Dataset

## LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

### Dataset

[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-yellow)](https://huggingface.co/datasets/6cf/liveideabench)

### Paper

[![arXiv](https://img.shields.io/badge/arXiv-2412.17596-b31b1b.svg)](https://arxiv.org/abs/2412.17596)

If you use this model, please cite:

```
@article{ruan2024liveideabench,
  title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
  author={Ruan, Kai and Wang, Xuan and Hong, Jixiang and Sun, Hao},
  journal={arXiv preprint arXiv:2412.17596},
  year={2024}
}
```