|
## Introduction |
|
<p align="center"> |
|
<br> |
|
<img src="assets/FAPM.png"/> |
|
<br> |
|
</p>
|
|
|
Hugging Face repo: *https://huggingface.co/wenkai/FAPM/*
|
|
|
## Installation |
|
|
|
1. (Optional) Create a conda environment:
|
|
|
```bash |
|
conda create -n lavis python=3.8 |
|
conda activate lavis |
|
``` |
|
|
|
2. For development, you can build from source:
|
|
|
```bash |
|
git clone https://github.com/xiangwenkai/FAPM.git |
|
cd FAPM |
|
pip install -e . |
|
|
|
pip install Biopython |
|
pip install fair-esm |
|
``` |
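
If the install succeeded, the key packages should import cleanly. A minimal sanity check (assuming a standard PyTorch setup; adjust to your environment):

```python
# Minimal sanity check that the core dependencies import cleanly.
import torch  # deep-learning backbone
import lavis  # installed above via `pip install -e .`
import esm    # fair-esm, used for ESM2 embeddings
import Bio    # Biopython, used for sequence parsing

print("Environment OK; torch", torch.__version__)
```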
|
|
|
### Datasets |
|
#### 1. Raw dataset
|
Raw data are available at *https://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2023_04/knowledgebase/*. This file is very large and needs to be processed to extract each protein's name, sequence, GO labels, function description, and prompt.
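
A minimal sketch of that processing step, using Biopython's `SwissProt` parser on the uncompressed `uniprot_sprot.dat` flat file (the exact fields and output format used to build FAPM's dataset may differ):

```python
# Sketch: pull name, sequence, GO labels, and function description
# out of the Swiss-Prot flat file with Biopython.
from Bio import SwissProt

with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        name = record.entry_name                    # e.g. "HBB_HUMAN"
        sequence = record.sequence                  # amino-acid string
        go_labels = [ref[1] for ref in record.cross_references
                     if ref[0] == "GO"]             # e.g. ["GO:0005833", ...]
        function = next((c for c in record.comments
                         if c.startswith("FUNCTION:")), "")
        # ...assemble the prompt and write out your train/val/test records
```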
|
The domain-level protein dataset we used is available at *https://ftp.ebi.ac.uk/pub/databases/interpro/releases/95.0/protein2ipr.dat.gz*.
|
In this repository, we provide the experimental train/val/test splits of Swiss-Prot, which are available at *data/swissprot_exp*.
|
#### 2. ESM2 embeddings
|
Source code for ESM2 embedding generation: *https://github.com/facebookresearch/esm*
|
The generation command: |
|
```bash |
|
conda activate lavis

python esm_scripts/extract.py esm2_t36_3B_UR50D your_path/protein.fasta your_path_to_save_embedding_files --repr_layers 36 --truncation_seq_length 1024 --include per_tok
|
``` |
|
Example: |
|
```bash

conda activate lavis
|
python esm_scripts/extract.py esm2_t36_3B_UR50D data/fasta/example.fasta data/emb_esm2_3b --repr_layers 36 --truncation_seq_length 1024 --include per_tok |
|
``` |
|
The default output directory for the embedding files is **data/emb_esm2_3b**.
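
Each saved `.pt` file holds a dictionary; with `--include per_tok`, the per-residue embedding sits under layer 36. A quick way to inspect one (a sketch; the key layout follows ESM's `extract.py`):

```python
# Sketch: inspect an embedding file produced by esm_scripts/extract.py.
import torch

emb = torch.load("data/emb_esm2_3b/P18281.pt")
print(emb["label"])                       # FASTA record ID
print(emb["representations"][36].shape)   # (seq_len, 2560) for esm2_t36_3B_UR50D
```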
|
You can refer to *data/fasta/prepare_custom_fasta.py* to prepare your own FASTA data.
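
If you would rather skip that script, writing a FASTA file directly is simple (a sketch with a hypothetical sequence; the record IDs become the names of the output `.pt` files):

```python
# Sketch: write a custom FASTA file for embedding generation.
sequences = {
    "MY_PROTEIN": "MKTAYIAKQR...",  # hypothetical ID -> amino-acid sequence
}
with open("data/fasta/custom.fasta", "w") as f:
    for pid, seq in sequences.items():
        f.write(f">{pid}\n{seq}\n")
```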
|
|
|
|
|
## Pretrained language model
|
Source: *https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B* |
|
|
|
## Training |
|
- Data config: *lavis/configs/datasets/protein/GO_defaults_cap.yaml*
- Stage 1 config: *lavis/projects/blip2/train/protein_pretrain_stage1.yaml*
- Stage 1 training command: *run_scripts/blip2/train/protein_pretrain_domain_stage1.sh*
- Stage 2 config: *lavis/projects/blip2/train/protein_pretrain_stage2.yaml*
- Stage 2 training/finetuning command: *run_scripts/blip2/train/protein_pretrain_domain_stage2.sh*
|
|
|
## Trained models |
|
The models are available at **https://huggingface.co/wenkai/FAPM/tree/main/model**.
|
You can also download our trained models from Google Drive: *https://drive.google.com/drive/folders/1aA0eSYxNw3DvrU5GU1Cu-4q2kIxxAGSE?usp=drive_link*
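
For example, the stage 2 checkpoint used in the inference example below can be fetched with `huggingface_hub` (a sketch; the filename follows the Hub repo's `model/` folder):

```python
# Sketch: download a trained FAPM checkpoint from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="wenkai/FAPM", filename="model/checkpoint_mf2.pth")
print("Checkpoint saved at:", ckpt_path)
```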
|
|
|
## Testing |
|
- Config: *lavis/projects/blip2/eval/caption_protein_eval.yaml*
- Command: *run_scripts/blip2/eval/eval_cap_protein.sh*
|
|
|
## Inference example |
|
```bash
|
python FAPM_inference.py \ |
|
--model_path model/checkpoint_mf2.pth \ |
|
--example_path data/emb_esm2_3b/P18281.pt \ |
|
--device cuda \ |
|
--prompt Acanthamoeba \ |
|
--prop True |
|
``` |
|
|
|