EnvGPT: Leveraging a Large Language Model for Environmental Science
EnvGPT is the first domain-specific large language model tailored for environmental science tasks.
Environmental science poses unique challenges for LLMs because of its interdisciplinary nature. EnvGPT addresses these challenges with a domain-specific environmental science instruction dataset and benchmark.
The model was fine-tuned on this instruction dataset, ChatEnv, via Supervised Fine-Tuning (SFT). ChatEnv contains 107,197,329 tokens in total, reflecting its depth and coverage of environmental science tasks.
🚀 Getting Started
Download the model
Clone the EnvGPT repository from Hugging Face. Git LFS is required to fetch the model weights:
git lfs install
git clone https://huggingface.co/SustcZhangYX/EnvGPT
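If Git LFS is not set up when you clone, the large weight files are replaced by small text "pointer" files, and loading the model will fail. The helper below is a minimal, hypothetical sketch that flags such pointer files. It assumes the weights are stored as `.safetensors` or `.bin` files and that the clone lives in a folder named `EnvGPT` (the default `git clone` target). This card does not confirm either assumption.

```python
from pathlib import Path

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir, suffixes=(".safetensors", ".bin")):
    """Hypothetical helper: list weight files that are still Git LFS pointers."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Read only as many bytes as the pointer header needs.
            with path.open("rb") as f:
                head = f.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers


if __name__ == "__main__":
    model_dir = Path("EnvGPT")  # assumed: default folder created by `git clone`
    if not model_dir.is_dir():
        print(f"{model_dir} not found; clone the repository first.")
    else:
        pointers = find_lfs_pointers(model_dir)
        if pointers:
            print("Weights missing; run `git lfs pull` inside the clone:")
            for p in pointers:
                print("  ", p)
        else:
            print("No Git LFS pointer files found.")
```

If any pointers are reported, running `git lfs pull` inside the cloned folder should download the actual weights.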
Model Usage
The following Python snippet shows how to load the tokenizer and model with the transformers pipeline and generate text with EnvGPT.
import transformers
import torch
# Set the path to your local copy of the model (e.g. the folder created by git clone)
model_path = "YOUR_LOCAL_MODEL_PATH"
# The text-generation pipeline loads both the tokenizer and the model
pipeline = transformers.pipeline(
    "text-generation",
    model=model_path,  # Use local model path
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are an expert assistant in environmental science, EnvGPT. You are a helpful assistant."},
    {"role": "user", "content": "What is the definition of environmental science?"},
]
# Enable sampling so that top_p and temperature take effect
outputs = pipeline(
    messages,
    max_new_tokens=4096,
    do_sample=True,
    top_p=0.7,  # Nucleus sampling threshold
    temperature=0.9,  # Temperature scaling
)
# For chat-style input, generated_text holds the whole conversation; the last message is the model's reply
print(outputs[0]["generated_text"][-1]["content"])
This code loads the tokenizer and model from your local path, defines an environmental science prompt as a chat-style message list, and generates a response using top-p (nucleus) sampling with temperature scaling.
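To make the `top_p` and `temperature` settings concrete, here is an illustrative pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy list of logits. It is not the transformers implementation. The function names `nucleus_candidates` and `sample_top_p` are hypothetical, and the default values simply mirror those in the call above.

```python
import math
import random


def nucleus_candidates(logits, temperature=0.9, top_p=0.7):
    """Apply temperature scaling, then keep the smallest set of most-likely
    tokens whose cumulative probability reaches top_p.
    Returns (kept token indices, renormalized probabilities)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus (top-p) filtering: walk tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / mass for i in kept]


def sample_top_p(logits, temperature=0.9, top_p=0.7, rng=random):
    """Draw one token index from the nucleus."""
    kept, weights = nucleus_candidates(logits, temperature, top_p)
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
    print(nucleus_candidates(toy_logits))
    print(sample_top_p(toy_logits, rng=random.Random(0)))
```

With these toy logits, `temperature=0.9`, and `top_p=0.7`, only the two most likely tokens survive filtering, so sampling always returns one of them. Lowering `top_p` or `temperature` concentrates probability on fewer tokens and makes outputs more repeatable.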
🌏 Acknowledgement
EnvGPT is fine-tuned from the open-source LLaMA model. We thank Meta AI for their contributions to the community.
❗Disclaimer
This project is intended solely for academic research and exploration. Please note that, like all large language models, this model may exhibit limitations, including potential inaccuracies or hallucinations in generated outputs.
Limitations
- The model may produce hallucinations or inaccurate outputs, a limitation common to large language models.
- The model's self-identity has not been specifically tuned, so when asked about itself it may produce responses resembling those of other LLaMA-based models or similar architectures.
- Generated outputs can vary between attempts because of sampling, and are sensitive to prompt phrasing and context.
🚩Citation
If you use EnvGPT in your research or applications, please cite this work as follows:
[Placeholder for Citation]
Please refer to the forthcoming publication for details about EnvGPT.
This section will be updated with the citation once the paper is officially published.