Mixture models for detecting whole-genome duplications
Evolutionary Biologist
I have implemented some mixture models in R that might be useful for detecting ancient whole-genome duplications from genomic or transcriptomic data.
I never made this into a proper R package on CRAN because I do not think the code is that novel; it mostly repackages pre-existing algorithms to implement some slightly different models. The code has no dependencies, so it is easy enough to run with a source() call. The models can be used to analyze any data; they are not limited to the task of detecting whole-genome duplications.
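For readers unfamiliar with the general idea, here is a minimal sketch of fitting a univariate Gaussian mixture by expectation-maximization and comparing component counts with BIC. It is written in Python with the standard library only and is not the R code described above; the function names, the Gaussian component family, the quantile-based initialization, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of the data under the mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))
        total += math.log(max(dens, 1e-300))  # guard against underflow
    return total


def fit_gaussian_mixture(data, k, n_iter=200):
    """Fit a k-component univariate Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, sds, log_likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Initialise means at evenly spaced quantiles, with a shared sd and equal weights
    means = [ordered[int((j + 0.5) / k * n)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    sds = [max(grand_sd, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and sds from the weighted points
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)

    return weights, means, sds, mixture_loglik(data, weights, means, sds)


def bic(loglik, k, n):
    """BIC with k means, k sds, and k - 1 free mixing weights."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Simulated two-component data, purely for demonstration
rng = random.Random(42)
demo = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(300)]
for k in (1, 2, 3):
    w, m, s, ll = fit_gaussian_mixture(demo, k)
    print(k, round(bic(ll, k, len(demo)), 1))
```

The component count with the lowest BIC is the preferred one under this criterion. EM only finds a local optimum, so in practice several starting points are usually worth trying.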
I implemented a simulation-based test for the placement of ancient whole-genome duplications on phylogenies based on summary statistics from reconciled gene trees.
Models exist that might be more interesting these days, but as far as I am aware, our approach is the best option for large-scale phylogenomic studies because it is fast. It is probably best used as a data exploration method, with a candidate set of hypotheses then tested using some of the more rigorous models.
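The general logic of a simulation-based test can be sketched as a Monte Carlo p-value: an observed summary statistic is compared against the same statistic computed on data simulated under a null model. The Python sketch below is generic and is not the implementation described above; the function name is hypothetical, and the choice of statistic (for example, a count of duplications mapped to one branch) and the null simulations themselves are assumed to come from elsewhere.

```python
def monte_carlo_p_value(observed, simulated):
    """One-sided Monte Carlo p-value (hypothetical helper).

    observed:  summary statistic computed from the real data
    simulated: the same statistic computed on each null simulation
    Adding 1 to the numerator and denominator counts the observed value
    as one draw from the null, so the p-value is never exactly zero.
    """
    at_least_as_extreme = sum(1 for value in simulated if value >= observed)
    return (at_least_as_extreme + 1) / (len(simulated) + 1)
```

A small p-value indicates that the observed statistic is larger than expected under the null model, which is the kind of signal a follow-up analysis with a more rigorous model would then examine.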
Collaborators and I have implemented a pipeline for phasing sequence data from polyploids.
This is still under development but is capable of giving users analysis-ready output and hopefully avoids a number of bioinformatic headaches for biologists.
It took me a while to find a way to simulate data that allows for genealogical discordance under the multispecies coalescent (MSC) and introgression. BPP does this very well, and here are some scripts that might be helpful for others. I found BPP to be much more intuitive than ms, but ms or fastsimcoal2 might be more appropriate for simulating under an infinite-sites model for population genomics.