What should be the length of each protein for prediction

#5
by rohitsatyam - opened

Hi!!

I am interested in using your model. Is this the final version? Also, what should the length of each input protein be? (Some methods perform poorly when the length exceeds 1500 aa.) I tried running your code and it works well. In the example, though, do you use proteins that do not interact? The prediction comes out as zero, so I just want to confirm.

For the record, I created the environment with the following command.

mamba create -n synteract python pytorch::pytorch conda-forge::transformers conda-forge::tokenizers pytorch::torchvision
Gleghorn Lab org

Yep, this model is in its final version. We are actively working on SYNTERACT 2.0, which uses a completely different methodology, but I'm not sure when it will be released.

(like some method perform poor when the length exceeds 1500 aa)

This is expected. SYNTERACT was trained only on pairs of proteins whose lengths sum to less than 1024. This was common practice when SYNTERACT was released, but much more efficient implementations of attention are available now. SYNTERACT 2.0 will have a combined context length of at least 2048, if not much longer.
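
A quick pre-check along these lines can flag pairs that fall outside the trained range before you run predictions. This is only a sketch: the helper names `fits_context` and `split_pairs` are made up for illustration and are not part of the SYNTERACT code, and it assumes the limit counts amino acids, so any special tokens the tokenizer adds would shrink the budget slightly.

```python
from typing import Iterable, List, Tuple

# From the reply above: training pairs had lengths summing to less than 1024.
# Assumption: the limit is counted in amino acids, not tokenizer tokens.
MAX_COMBINED_LENGTH = 1024

Pair = Tuple[str, str]


def fits_context(seq_a: str, seq_b: str, limit: int = MAX_COMBINED_LENGTH) -> bool:
    """Return True if the two sequences together are shorter than the limit."""
    return len(seq_a) + len(seq_b) < limit


def split_pairs(
    pairs: Iterable[Pair], limit: int = MAX_COMBINED_LENGTH
) -> Tuple[List[Pair], List[Pair]]:
    """Separate pairs into those within the limit and those that exceed it."""
    kept: List[Pair] = []
    skipped: List[Pair] = []
    for seq_a, seq_b in pairs:
        if fits_context(seq_a, seq_b, limit):
            kept.append((seq_a, seq_b))
        else:
            skipped.append((seq_a, seq_b))
    return kept, skipped
```

Pairs that land in `skipped` are outside what the model saw during training, so predictions for them should be treated with caution.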

But in the example, do you use proteins that do not interact because the prediction results in zero value (just confirming)?

If you are asking whether the example input is a negative interaction, then yes. It is a negative example from our test set.

Please let me know if you have any other questions.
