How to finetune models

#56
by yangguofeng - opened

Hello,
I currently have access to approximately 30 million protein sequences and am fine-tuning ProtGPT2 with a LoRA-based approach due to limited computational resources. Given that ProtGPT2 contains 36 transformer blocks, is it more effective to apply LoRA adapters to all blocks, or to restrict them to a subset (such as the upper or middle-to-upper layers) to balance performance and efficiency? I would appreciate any guidance on which blocks tend to matter most for adaptation in this setting while preserving the pretrained protein-level priors.

Thanks for reaching out, and apologies for the massive delay!
I haven’t personally fine-tuned ProtGPT2 with LoRA (I’ve only done full fine-tuning), but based on what’s in the literature, I’d apply the adapters to the upper blocks of the network rather than to every block.
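
For reference, here is a minimal sketch of how one might restrict LoRA to the upper blocks using the Hugging Face `peft` library. The layer range, rank, and alpha values below are illustrative placeholders, not tuned recommendations:

```python
# Minimal sketch: apply LoRA only to the upper transformer blocks of ProtGPT2.
# Assumes the `transformers` and `peft` libraries; hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

model_name = "nferruz/ProtGPT2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# ProtGPT2 follows the GPT-2 architecture, so the fused attention projection is `c_attn`.
# `layers_to_transform` restricts the adapters to the listed block indices;
# here we target the upper half (blocks 18-35 of the 36 blocks).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # illustrative rank
    lora_alpha=16,                         # illustrative scaling
    lora_dropout=0.05,
    target_modules=["c_attn"],
    layers_to_transform=list(range(18, 36)),
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction should be trainable
```

The resulting model can then be passed to a standard `Trainer` loop on your sequence data; shifting the `layers_to_transform` range lets you compare adapting upper versus middle blocks at the same parameter budget.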

Hope that helps!
