---
license: mit
---
## 🗞️ Model description
**InstructCell** is a multi-modal AI copilot that integrates natural language with single-cell RNA sequencing data, enabling researchers to perform tasks like cell type annotation, pseudo-cell generation, and drug sensitivity prediction through intuitive text commands.
By leveraging a specialized multi-modal architecture and our multi-modal single-cell instruction dataset, InstructCell reduces technical barriers and enhances accessibility for single-cell analysis.
**Instruct version**: This checkpoint generates only the answer portion, without additional explanatory text, so outputs are concise and task-specific.
### 🚀 How to use
We provide a simple example for quick reference. This demonstrates a basic **cell type annotation** workflow.
Make sure to specify the paths for `H5AD_PATH` and `GENE_VOCAB_PATH` appropriately:
- `H5AD_PATH`: Path to your `.h5ad` single-cell data file (e.g., `H5AD_PATH = "path/to/your/data.h5ad"`).
- `GENE_VOCAB_PATH`: Path to your gene vocabulary file (e.g., `GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"`).
```python
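# `mmllm` and `utils` are local modules, assumed here to come from the InstructCell GitHub repository
from mmllm.module import InstructCell
import anndata
import numpy as np
from utils import unify_gene_features

# Replace these placeholders with the paths to your own files
H5AD_PATH = "path/to/your/data.h5ad"
GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"
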
# Load the pre-trained InstructCell model from HuggingFace
model = InstructCell.from_pretrained("zjunlp/InstructCell-instruct")
# Load the single-cell data (H5AD format) and gene vocabulary file (numpy format)
adata = anndata.read_h5ad(H5AD_PATH)
gene_vocab = np.load(GENE_VOCAB_PATH)
adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)

# Select a random single-cell sample and extract its gene counts and metadata
k = np.random.randint(0, len(adata))
# .toarray() assumes adata.X is a SciPy sparse matrix; drop the call if X is already dense
gene_counts = adata[k, :].X.toarray()
sc_metadata = adata[k, :].obs.iloc[0].to_dict()

# Define the model prompt with placeholders for metadata and gene expression profile
prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)
# Use the model to generate predictions
for key, value in model.predict(
    prompt,
    gene_counts=gene_counts,
    sc_metadata=sc_metadata,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    max_new_tokens=256,
).items():
    # Print each key-value pair
    print(f"{key}: {value}")
```
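The prompt above contains named placeholders, so it can help to check that every one of them has a matching metadata entry before calling the model. The following is a minimal sketch, not part of InstructCell: `prompt_fields` and `missing_metadata` are hypothetical helpers, the metadata values are made up for illustration, and the sketch assumes that `{input}` is reserved for the gene expression profile while all other placeholders are looked up by key in `sc_metadata`.

```python
from string import Formatter


def prompt_fields(prompt: str) -> list:
    """Return the named placeholders in a format-style prompt, in order."""
    return [name for _, name, _, _ in Formatter().parse(prompt) if name]


def missing_metadata(prompt: str, sc_metadata: dict, reserved=("input",)) -> list:
    """List placeholders that are neither reserved nor present in sc_metadata.

    Assumption: `{input}` stands for the gene expression profile, and every
    other placeholder is filled from a key of the same name in sc_metadata.
    """
    return [
        field
        for field in prompt_fields(prompt)
        if field not in reserved and field not in sc_metadata
    ]


prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)

# Hypothetical metadata for illustration; real values come from adata.obs
sc_metadata = {"species": "human", "tissue": "liver"}

print(missing_metadata(prompt, sc_metadata))  # ['sequencing_method']
```

If the returned list is non-empty, either add the missing columns to `adata.obs` or remove the corresponding placeholders from the prompt.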
For more detailed explanations and additional examples, please refer to the Jupyter notebook [demo.ipynb](https://github.com/zjunlp/InstructCell/blob/main/demo.ipynb).