
Protein structure prediction, search, and analysis with AI

    • Thursday, November 21, 2024, 1:30-4:30pm CT
      • Lead: ARS scientists Hye-Seon Kim and Carson Andorf
      • Prerequisites:
        • Familiarity with basic command-line concepts. We will offer virtual training for these skills before the Forum begins.

In this workshop, participants will learn how to use cutting-edge, AI-based tools for analyzing protein structure and function.

The workshop will start by exploring 3D protein structure prediction using AlphaFold for alignment-based structure prediction and ESMFold for single-sequence structure prediction. Participants will then learn how to use FoldSeek for structure-based protein similarity search. The last part of the workshop will bring all of these concepts together by using PanEffect to explore how genetic variations in protein sequence can influence an organism’s phenotype.

Tutorial Setup Instructions

Steps to prepare for the tutorial session:

  1. Log in to Atlas Open OnDemand at https://atlas-ood.hpc.msstate.edu/. For more information on login procedures for web-based SCINet access, see the SCINet access user guide.

  2. Open a command-line session by clicking on “Clusters” -> “Atlas Shell Access” on the top menu. This will open a new tab with a command-line session on Atlas’ login node.

  3. Request resources on a compute node by running the following command:

     salloc --reservation=forum-gpu -A scinet_workshop1 -p gpu-a100-mig7 -n1 --gres=gpu:1 -t 3:00:00
    

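     When the allocation is granted, salloc prints output similar to the following:

     salloc: Granted job allocation <job-id>
     salloc: Nodes atlas-0245 are ready for job

     Then start an interactive shell on the compute node, replacing <job-id> with the job ID reported above:

     srun --jobid=<job-id> --pty bash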
    
  4. Create a workshop working directory and copy the workshop materials into it by running the following commands. Note: you do not need to replace $USER with your username; the shell fills it in automatically.

     mkdir -p /90daydata/shared/$USER/ 
     cd /90daydata/shared/$USER/ 
     cp -r /project/ai_forum/protein_structure . 
    
  5. Stop the interactive job on the compute node by running the command exit.
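Steps 3 and 4 above can be sketched as two small shell helpers. This is a minimal illustration, not part of the workshop materials: the function names job_id_from_salloc and setup_workshop_dir are hypothetical, and the first helper assumes salloc reports the job ID in the "Granted job allocation" message format shown in step 3.

```shell
#!/bin/sh
# Illustrative helpers only; both function names are hypothetical.

# Read salloc messages on standard input and print the numeric job ID
# from the "Granted job allocation" line (format assumed from step 3).
job_id_from_salloc() {
    sed -n 's/^salloc: Granted job allocation \([0-9][0-9]*\).*$/\1/p'
}

# Copy a materials directory into a working directory, failing early
# with a message on stderr if the source directory is missing.
setup_workshop_dir() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "materials not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    cp -r "$src" "$dest"/ || return 1
    # Print the path of the copied directory so the caller can cd into it.
    echo "$dest/$(basename "$src")"
}

# Example usage on Atlas (paths taken from step 4):
#   setup_workshop_dir /project/ai_forum/protein_structure "/90daydata/shared/$USER"
```

Checking that the source directory exists before copying gives a clearer error than a bare cp failure if the materials path is mistyped or not yet available.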

Schedule

Materials                            Start    Est. minutes  Topic                 Presenter
Introduction                         1:30 PM  10 minutes    Introduction          Hye-Seon Kim & Carson Andorf
Protein Structure Prediction         1:40 PM  30 minutes    AlphaFold 2 & 3       Hye-Seon Kim
                                                            AlphaFold online      Hye-Seon Kim
                                     2:10 PM  30 minutes    ESMFold               Carson Andorf
                                                            ESMFold online        Carson Andorf
                                     2:40 PM  20 minutes    OmegaFold             Stephen Harding
Protein Structure Search             3:00 PM  30 minutes    FoldSeek              Olivia Haley
                                                            FoldSeek online       Stephen Harding
Missense Variant Effect Predictions  3:30 PM  30 minutes    ESM-variant           Carson Andorf
                                                            PanEffect (Fusarium)  Hye-Seon Kim
                                                            PanEffect (Maize)     Carson Andorf
Protein Binder Predictions           4:00 PM  30 minutes    RFdiffusion           Olivia Haley
                                                            RFdiffusion online    Olivia Haley

Additional Resources: