Notes on SARS-CoV-2 and phylodynamics: Software, Papers, links, numbers, questions

Started 2020-03-24.

Software for phylodynamics: Inference

BEAST2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. Many people (including me) have written packages for particular types of analysis. Some (not including mine) focus on infectious diseases. I got the BEAST2 ones from here.

BEAST2 packages

BADTRIP. Infer transmission time for non-haplotype data and epi data

SCOTTI. Structured COalescent Transmission Tree Inference

bdmm. Multitype birth-death model (aka birth-death-migration model)

BDSKY. Birth-death skyline. Handles serially sampled tips, piecewise constant rate changes through time and sampled ancestors.

EpiInf. BD/SIR/SIS epidemic trajectory inference.

PhyDyn. Epidemiological modelling with BEAST

phylodynamics. BDSIR and Stochastic Coalescent

BASTA. Bayesian structured coalescent approximation

CoalRe. Infer viral reassortment networks

BEAST v1 models

There is also beastlier, which is implemented in BEAST v1. GitHub link.

Others

TransPhylo R package

PHYLOSCANNER R package. Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution.

Phybreak R package. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

TreeFix-TP. TreeFix-TP: Phylogenetic Error-Correction for Infectious Disease Transmission Network Inference

SLAFEEL. Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

No doubt there is other software...

Software for phylodynamics: Simulations

FAVITES. Simultaneous simulation of transmission networks, phylogenetic trees and sequences

Software for phylodynamics: Visualisation and analysis

The previously mentioned R package PHYLOSCANNER may be useful.

About the virus

Mutation rate

Based on figures for the first SARS, Andrew Rambaut's preliminary look, and 30000 sites in the genome, I reckon somewhat less than one mutation per genome per week, or roughly one within each host. (0.0015 per site per year gives 30000*0.0015/50 = 0.9 per genome per week, taking a year as roughly 50 weeks; 0.0015 is on the high side of the estimates.)

Direct RNA sequencing and early evolution of SARS-CoV-2, George Taiaroa et al, puts it at 0.0012 substitutions/site/year (95% HPD 0.00063 to 0.0017)

Phylodynamic analyses based on 128 sequences, Tanja Stadler et al, has two estimates, for two different models, both near 0.0007.
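
To compare the estimates above on the same scale, here is a minimal Python sketch of the per-genome-per-week conversion. The function name and the dictionary labels are my own, the year is rounded to 50 weeks as in the back-of-envelope figure above, and 0.0007 stands in for the two "near 0.0007" estimates, so treat the output as rough.

```python
# Rough conversion from substitutions/site/year to substitutions/genome/week.
# Assumptions: 30000 sites in the genome and a year rounded to 50 weeks,
# matching the back-of-envelope calculation in these notes.
GENOME_SITES = 30_000
WEEKS_PER_YEAR = 50


def per_genome_per_week(rate_per_site_per_year,
                        sites=GENOME_SITES,
                        weeks=WEEKS_PER_YEAR):
    """Scale a per-site, per-year rate to a per-genome, per-week rate."""
    return rate_per_site_per_year * sites / weeks


# Labels are informal; 0.0007 approximates the two "near 0.0007" estimates.
estimates = {
    "high-side figure": 0.0015,
    "Taiaroa et al": 0.0012,
    "Stadler et al (approx.)": 0.0007,
}

for label, rate in estimates.items():
    print(f"{label}: {per_genome_per_week(rate):.2f} per genome per week")
```

Using 52 weeks instead of 50 lowers each figure by about 4%, which is well inside the uncertainty of the estimates themselves.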

Replication fidelity

Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity has some useful information about murine coronavirus, but I haven't got an absolute value.

Coronaviruses as DNA Wannabes: A New Model for the Regulation of RNA Virus Replication Fidelity

Thinking Outside the Triangle: Replication Fidelity of the Largest RNA Viruses. If I read Fig 2 correctly, the number of errors per nucleotide per replication cycle is between 1e-6 and 1e-7 for coronaviruses.
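
As a rough sketch, the per-nucleotide range can be scaled to a whole genome. This assumes my reading of Fig 2 is right and reuses the 30000-site genome length from the mutation rate section; the function name is just for illustration.

```python
# Scale a per-nucleotide error rate to errors per genome per replication cycle.
# Assumption: 30000 sites in the genome, as used elsewhere in these notes.
GENOME_SITES = 30_000


def errors_per_genome_per_cycle(errors_per_nucleotide, sites=GENOME_SITES):
    """Expected number of errors in one copy of the genome."""
    return errors_per_nucleotide * sites


# Bounds taken from my reading of Fig 2 (1e-7 to 1e-6 per nucleotide per cycle).
low = errors_per_genome_per_cycle(1e-7)
high = errors_per_genome_per_cycle(1e-6)
print(f"{low:.3f} to {high:.3f} errors per genome per replication cycle")
```

On these numbers, most genome copies would carry no new error at all, but that conclusion is only as good as the figure reading behind it.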

Generation time

How long does it take from a virus particle entering a cell to its progeny entering other cells? I don't know.

Recombination rate

This is probably high. This article has the following abstract.

Mouse hepatitis virus (MHV), a coronavirus, has been shown to undergo a high frequency of RNA recombination both in tissue culture and in animal infection. So far, RNA recombination has been demonstrated only between genomic RNAs of two coinfecting viruses. To understand the mechanism of RNA recombination and to further explore the potential of RNA recombination, we studied whether recombination could occur between a replicating MHV RNA and transfected RNA fragments. We first used RNA fragments which represented the 5' end of genomic-sense sequences of MHV RNA for transfection. By using polymerase chain reaction amplification with two specific primers, we were able to detect recombinant RNAs which incorporated the transfected fragment into the 5' end of the viral RNA in the infected cells. Surprisingly, even the anti-genomic-sense RNA fragments complementary to the 5' end of MHV genomic RNA could also recombine with the MHV genomic RNAs. This observation suggests that RNA recombination can occur during both positive- and negative-strand RNA synthesis. Furthermore, the recombinant RNAs could be detected in the virion released from the infected cells even after several passages of virus in tissue culture cells, indicating that these recombinant RNAs represented functional virion RNAs. The crossover sites of these recombinants were detected throughout the transfected RNA fragments. However, when an RNA fragment with a nine-nucleotide (CUUUAUAAA) deletion immediately downstream of a pentanucleotide (UCUAA) repeat sequence in the leader RNA was transfected into MHV-infected cells, most of the recombinants between this RNA and the MHV genome contained crossover sites near this pentanucleotide repeat sequence. In contrast, when exogenous RNAs with the intact nine-nucleotide sequence were used in similar experiments, the crossover sites of recombinants in viral genomic RNA could be detected at more-downstream sites. 
This study demonstrated that recombination can occur between replicating MHV RNAs and RNA fragments which do not replicate, suggesting the potential of RNA recombination for genetic engineering.

General links

There is an excellent podcast called TWIV (This Week in Virology). The episodes are long conversations or interviews; some require quite a bit of background, and some answer emails from the general public. There are many hours about COVID-19.

Half-hour talk on the mathematics of the Corona outbreak by Tom Britton. Only needs school maths.

If you are a programmer and want to help, these may interest you.
Biohackathons
Kaggle

More generally, there is
Crowdfight COVID-19

You can contribute your computer's time here, to help understand the virus proteins.
Folding at home