We focus on building the practical knowledge and skills needed to perform analyses in genomics or analyses that use genomic data. The material will sometimes touch on statistical theory and computing, but we will often accept common practice as best practice. This does not mean that popular approaches should go unchallenged, only that our time is limited. Coding examples will be provided throughout the course where appropriate, but keep in mind that there are often many solutions, and you should use the languages and coding styles that benefit you most.

Learning Goals

  • Understand strategies and challenges for assembling chromosome-level genomes
  • Establish best practices for the analysis of population genomic data
  • Develop strategies for associating molecular variation with traits in non-model systems
  • Understand how evolutionary models scale to large datasets
  • Gain intuition on how genome features can be treated as trait data
  • Develop a high-level understanding of the statistical theory, algorithms, and computing used in common genomic applications

Learning Outcomes

  • Assemble a genome using a combination of Illumina, PacBio, and Hi-C data with MaSuRCA and FALCON-Phase
  • Annotate a genome with MAKER
  • Evaluate variant quality from population resequencing data with GATK
  • Compute basic population structure and molecular variation statistics across a genome with BEDTools, VCFtools, and PopGenome
  • Characterize changes in architecture between genomes with MCScanX
  • Estimate divergence times and rates of evolution from genome-scale alignments with MCMCTREE
  • Model gene duplication and loss over a phylogenetic tree with CAFÉ
  • Write basic scripts to facilitate genomic analyses on a computing cluster
  • Advance the investigation of your own data by applying general skills from the modules