Show HN: ESM C, setting a new state of the art for protein language models
evolutionaryscale.ai
Hey HN! We're releasing ESM C, a series of protein language models that set a new state of the art. By training on metagenomic data and fine-tuning on UniRef, we're able to match ESM2 650M with ESM C 300M, and ESM C 6B has incredible unsupervised contact prediction precision. These models are meant to be used as base models and representation learning models for protein sequences, and we'll release a preprint describing the exact details soon.
Love to see this on HN, very interesting research. I'm looking forward to the paper release, and I appreciate the way you are licensing these models. ESM2 650M is still a solid baseline, but seeing ESM C 6B outperform it by this kind of margin is encouraging for the future of protein language models. I'd be very interested to find out how well it performs on other benchmarks (e.g. ProteinGym zero-shot).