[PubMed automatic search; some publications might not appear]
2024
Dispersal history of SARS-CoV-2 in Galicia, Spain
The dynamics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission are influenced by a variety of factors, including social restrictions and the emergence of distinct variants. In this study, we delve into the origins and dissemination of the Alpha, Delta, and Omicron-BA.1 variants of concern in Galicia, northwest Spain. For this, we leveraged genomic data collected by the EPICOVIGAL Consortium and from the GISAID database, along with mobility information from other Spanish regions and foreign countries. Our analysis indicates that initial introductions during the Alpha phase were predominantly from other Spanish regions and France. However, as the pandemic progressed, introductions from Portugal and the United States became increasingly significant. The number of detected introductions varied from 96 and 101 for Alpha and Delta to 39 for Omicron-BA.1. Most of these introductions left a low number of descendants (<10), suggesting a limited impact on the evolution of the pandemic in Galicia. Notably, Galicia's major coastal cities emerged as critical hubs for viral transmission, highlighting their role in sustaining and spreading the virus. This research emphasizes the critical role of regional connectivity in the spread of SARS-CoV-2 and offers essential insights for enhancing public health strategies and surveillance measures.
Dispersal history of SARS-CoV-2 variants Alpha, Delta, and Omicron (BA.1) in Spain
Different factors influence the spread of SARS-CoV-2, from the inherent transmission capabilities of the different variants to the control measures put in place. Here we studied the introduction of the Alpha, Delta, and Omicron-BA.1 variants of concern (VOCs) into Spain. For this, we collected genomic data from the GISAID database and combined it with data on connectivity between Spain and other countries to perform a phylodynamic Bayesian analysis of the introductions. Our findings reveal that the introductions of these VOCs predominantly originated from France, especially in the case of Alpha. As travel restrictions were eased during the Delta and Omicron-BA.1 waves, the number of introductions from distinct countries increased, with the United Kingdom and Germany becoming significant sources of the virus. The largest number of introductions detected corresponded to the Delta wave, which was associated with fewer restrictions and the summer period, when Spain receives a considerable number of tourists. This research underscores the importance of monitoring international travel patterns and implementing targeted public health measures to manage the spread of SARS-CoV-2.
Crykey: Rapid identification of SARS-CoV-2 cryptic mutations in wastewater
Wastewater surveillance for SARS-CoV-2 provides early warnings of emerging variants of concern and can be used to screen for novel cryptic linked-read mutations, which are co-occurring single nucleotide mutations that are rare, or entirely missing, in existing SARS-CoV-2 databases. While previous approaches have focused on specific regions of the SARS-CoV-2 genome, there is a need for computational tools capable of efficiently tracking cryptic mutations across the entire genome and investigating their potential origin. We present Crykey, a tool for rapidly identifying rare linked-read mutations across the genome of SARS-CoV-2. We evaluated the utility of Crykey on over 3,000 wastewater and over 22,000 clinical samples; our findings are three-fold: i) we identify hundreds of cryptic mutations that cover the entire SARS-CoV-2 genome, ii) we track the presence of these cryptic mutations across multiple wastewater treatment plants and over three years of sampling in Houston, and iii) we find a handful of cryptic mutations in wastewater mirror cryptic mutations in clinical samples and investigate their potential to represent real cryptic lineages. In summary, Crykey enables large-scale detection of cryptic mutations in wastewater that represent potential circulating cryptic lineages, serving as a new computational tool for wastewater surveillance of SARS-CoV-2.
Dispersal history of SARS-CoV-2 in Galicia, Spain
The dynamics of SARS-CoV-2 transmission are influenced by a variety of factors, including social restrictions and the emergence of distinct variants. In this study, we delve into the origins and dissemination of the Alpha, Delta, and Omicron variants of concern in Galicia, northwest Spain. For this, we leveraged genomic data collected by the EPICOVIGAL Consortium and from the GISAID database, along with mobility information from other Spanish regions and foreign countries. Our analysis indicates that initial introductions during the Alpha phase were predominantly from other Spanish regions and France. However, as the pandemic progressed, introductions from Portugal and the USA became increasingly significant. Notably, Galicia's major coastal cities emerged as critical hubs for viral transmission, highlighting their role in sustaining and spreading the virus. This research emphasizes the critical role of regional connectivity in the spread of SARS-CoV-2 and offers essential insights for enhancing public health strategies and surveillance measures.
2023
Evolutionary and spatiotemporal analyses reveal multiple introductions and cryptic transmission of SARS-CoV-2 VOC/VOI in Malta
Our study provides insights into the evolution of the coronavirus disease 2019 (COVID-19) pandemic in Malta, a highly connected and understudied country. We combined epidemiological and phylodynamic analyses to analyze trends in the number of new cases, deaths, tests, positivity rates, and evolutionary and dispersal patterns from August 2020 to January 2022. Our reconstructions inferred 173 independent severe acute respiratory syndrome coronavirus 2 introductions into Malta from various global regions. Our study demonstrates that characterizing epidemiological trends coupled with phylodynamic modeling can inform the implementation of public health interventions to help control COVID-19 transmission in the community.
Somatic evolution of marine transmissible leukemias in the common cockle, Cerastoderma edule
Transmissible cancers are malignant cell lineages that spread clonally between individuals. Several such cancers, termed bivalve transmissible neoplasia (BTN), induce leukemia-like disease in marine bivalves. This is the case of BTN lineages affecting the common cockle, Cerastoderma edule, which inhabits the Atlantic coasts of Europe and northwest Africa. To investigate the evolution of cockle BTN, we collected 6,854 cockles, diagnosed 390 BTN tumors, generated a reference genome and assessed genomic variation across 61 tumors. Our analyses confirmed the existence of two BTN lineages with hemocytic origins. Mitochondrial variation revealed mitochondrial capture and host co-infection events. Mutational analyses identified lineage-specific signatures, one of which likely reflects DNA alkylation. Cytogenetic and copy number analyses uncovered pervasive genomic instability, with whole-genome duplication, oncogene amplification and alkylation-repair suppression as likely drivers. Satellite DNA distributions suggested ancient clonal origins. Our study illuminates long-term cancer evolution under the sea and reveals tolerance of extreme instability in neoplastic genomes.
Crykey: Rapid Identification of SARS-CoV-2 Cryptic Mutations in Wastewater
We present Crykey, a computational tool for rapidly identifying cryptic mutations of SARS-CoV-2. Specifically, we identify co-occurring single nucleotide mutations on the same sequencing read, called linked-read mutations, that are rare or entirely missing in existing databases, and have the potential to represent novel cryptic lineages found in wastewater. While previous approaches exist for identifying cryptic linked-read mutations from specific regions of the SARS-CoV-2 genome, there is a need for computational tools capable of efficiently tracking cryptic mutations across the entire genome and for tens of thousands of samples and with increased scrutiny, given their potential to represent either artifacts or hidden SARS-CoV-2 lineages. Crykey fills this gap by identifying rare linked-read mutations that pass stringent computational filters to limit the potential for artifacts. We evaluate the utility of Crykey on >3,000 wastewater and >22,000 clinical samples; our findings are three-fold: i) we identify hundreds of cryptic mutations that cover the entire SARS-CoV-2 genome, ii) we track the presence of these cryptic mutations across multiple wastewater treatment plants and over three years of sampling in Houston, and iii) we find a handful of cryptic mutations in wastewater mirror cryptic mutations in clinical samples and investigate their potential to represent real cryptic lineages. In summary, Crykey enables large-scale detection of cryptic mutations representing potential cryptic lineages in wastewater.
Single-cell phylogenies reveal changes in the evolutionary rate within cancer and healthy tissues
Cell lineages accumulate somatic mutations during organismal development, potentially leading to pathological states. The rate of somatic evolution within a cell population can vary due to multiple factors, including selection, a change in the mutation rate, or differences in the microenvironment. Here, we developed a statistical test called the Poisson Tree (PT) test to detect varying evolutionary rates among cell lineages, leveraging the phylogenetic signal of single-cell DNA sequencing (scDNA-seq) data. We applied the PT test to 24 healthy and cancer samples, rejecting a constant evolutionary rate in 11 out of 15 cancer and five out of nine healthy scDNA-seq datasets. In six cancer datasets, we identified subclonal mutations in known driver genes that could explain the rate accelerations of particular cancer lineages. Our findings demonstrate the efficacy of scDNA-seq for studying somatic evolution and suggest that cell lineages often evolve at different rates within cancer and healthy tissues.
Wastewater early warning system for SARS-CoV-2 outbreaks and variants in A Coruña, Spain
Wastewater-based epidemiology has been widely used as a cost-effective method for tracking the COVID-19 pandemic at the community level. Here we describe COVIDBENS, a wastewater surveillance program running from June 2020 to March 2022 in the wastewater treatment plant of Bens in A Coruña (Spain). The main goal of this work was to provide an effective early warning tool based on wastewater epidemiology to help in decision-making at both the social and public health levels. RT-qPCR procedures and Illumina sequencing were used to monitor the viral load weekly and to detect SARS-CoV-2 mutations in wastewater, respectively. In addition, in-house statistical models were applied to estimate the real number of infected people and the frequency of each emerging variant circulating in the community, which considerably improved the surveillance strategy. Our analysis detected 6 viral load waves in A Coruña with concentrations between 10 and 10 SARS-CoV-2 RNA copies/L. Our system was able to anticipate community outbreaks during the pandemic 8-36 days in advance of clinical reports and to detect the emergence of new SARS-CoV-2 variants in A Coruña such as Alpha (B.1.1.7), Delta (B.1.617.2), and Omicron (B.1.1.529 and BA.2) in wastewater 42, 30, and 27 days, respectively, before the health system did. Data generated here helped local authorities and health managers to give a faster and more efficient response to the pandemic situation, and also allowed important industrial companies to adapt their production to each situation. The wastewater-based epidemiology program developed in our metropolitan area of A Coruña (Spain) during the SARS-CoV-2 pandemic served as a powerful early warning system combining statistical models with mutations and viral load monitoring in wastewater over time.
2022
Comparative analysis of capture methods for genomic profiling of circulating tumor cells in colorectal cancer
The genomic profiling of circulating tumor cells (CTCs) in the bloodstream should provide clinically relevant information on therapeutic efficacy and help predict cancer survival. Here, we contrasted the genomic profiles of CTC pools recovered from metastatic colorectal cancer (mCRC) patients using different enrichment strategies (CellSearch, Parsortix, and FACS). Mutations inferred in the CTC pools differed depending on the enrichment strategy and, in all cases, represented a subset of the mutations detected in the matched primary tumor samples. However, the CTC pools from Parsortix, and in part, CellSearch, showed diversity estimates, mutational signatures, and drug-suitability scores remarkably close to those found in matching primary tumor samples. In addition, FACS CTC pools were enriched in apparent sequencing artifacts, leading to much higher genomic diversity estimates. Our results highlight the utility of CTCs to assess the genomic heterogeneity of individual tumors and help clinicians prioritize drugs in mCRC.
SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data
We present SIEVE, a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from single-cell DNA sequencing. SIEVE leverages raw read counts for all nucleotides and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods in phylogenetic reconstruction and variant calling accuracy, especially in the inference of homozygous variants. Applying SIEVE to three datasets, one for triple-negative breast (TNBC), and two for colorectal cancer (CRC), we find that double mutant genotypes are rare in CRC but unexpectedly frequent in the TNBC samples.
Clonality and timing of relapsing colorectal cancer metastasis revealed through whole-genome single-cell sequencing
Recurrence of tumor cells following local and systemic therapy is a significant hurdle in cancer. Most patients with metastatic colorectal cancer (mCRC) will relapse, despite resection of the metastatic lesions. A better understanding of the evolutionary history of recurrent lesions is required to identify the spatial and temporal patterns of metastatic progression and expose the genetic and evolutionary determinants of therapeutic resistance. With this goal in mind, here we leveraged a unique single-cell whole-genome sequencing dataset from recurrent hepatic lesions of an mCRC patient. Our phylogenetic analysis confirms that the treatment induced a severe demographic bottleneck in the liver metastasis but also that a previously diverged lineage survived this surgery, possibly after migration to a different site in the liver. This lineage evolved very slowly for two years under adjuvant drug therapy and diversified again in a very short period. We identified several non-silent mutations specific to this lineage and inferred a substantial contribution of chemotherapy to the overall, genome-wide mutational burden. All in all, our study suggests that mCRC subclones can migrate locally and evade resection, keep evolving despite rounds of chemotherapy, and re-expand explosively.
Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data
Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data.
Single-cell mtDNA heteroplasmy in colorectal cancer
Human mitochondria can be genetically distinct within the same individual, a phenomenon known as heteroplasmy. In cancer, this phenomenon seems exacerbated, and most mitochondrial mutations seem to be heteroplasmic. How this genetic variation is arranged within and among normal and tumor cells is not well understood. To address this question, here we sequenced single-cell mitochondrial genomes from multiple normal and tumoral locations in four colorectal cancer patients. Our results suggest that single cells, both normal and tumoral, can carry various mitochondrial haplotypes. Remarkably, this intra-cell heteroplasmy can arise before tumor development and be maintained afterward in specific tumoral cell subpopulations. At least in the colorectal patients studied here, the somatic mutations in the single-cells do not seem to have a prominent role in tumorigenesis.
Mitochondrial genome sequencing of marine leukaemias reveals cancer contagion between clam species in the Seas of Southern Europe
Clonally transmissible cancers are tumour lineages that are transmitted between individuals via the transfer of living cancer cells. In marine bivalves, leukaemia-like transmissible cancers, called hemic neoplasia (HN), have demonstrated the ability to infect individuals from different species. We performed whole-genome sequencing in eight warty venus clams that were diagnosed with HN, from two sampling points located more than 1000 nautical miles apart on the Atlantic Ocean and Mediterranean Sea coasts of Spain. Mitochondrial genome sequencing analysis from neoplastic animals revealed the coexistence of haplotypes from two different clam species. Phylogenies estimated from mitochondrial and nuclear markers confirmed that this leukaemia originated in striped venus clams and was later transmitted to clams of the species warty venus, in which it survives as a contagious cancer. The analysis of mitochondrial and nuclear gene sequences supports that all studied tumours belong to a single neoplastic lineage that spreads in the Seas of Southern Europe.
SARS-CoV-2 Evolution and Spike-Specific CD4+ T-Cell Response in Persistent COVID-19 with Severe HIV Immune Suppression
Intra-host evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been reported in cases with persistent coronavirus disease 2019 (COVID-19). In this study, we describe a severely immunosuppressed individual with HIV-1/SARS-CoV-2 coinfection with a long-term course of SARS-CoV-2 infection. A 28-year-old man was diagnosed with HIV-1 infection (CD4+ count: 3 cells/µL and 563000 HIV-1 RNA copies/mL) and simultaneous pneumonia, disseminated infection and SARS-CoV-2 infection. SARS-CoV-2 real-time reverse transcription polymerase chain reaction positivity from nasopharyngeal samples was prolonged for 15 weeks. SARS-CoV-2 was identified as variant Alpha (PANGO lineage B.1.1.7) with mutation S:E484K. Spike-specific T-cell response was similar to HIV-negative controls although enriched in IL-2, and showed disproportionately increased immunological exhaustion marker levels. Despite persistent SARS-CoV-2 infection, adaptive intra-host SARS-CoV-2 evolution was not identified. Spike-specific T-cell response protected against a severe COVID-19 outcome and the increased immunological exhaustion marker levels might have favoured SARS-CoV-2 persistence.
CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy .
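As a rough illustration of where a count of 16 diploid states can come from, the sketch below enumerates ordered pairs of the four nucleotides. It assumes the states correspond to phased allele pairs; the names `NUCLEOTIDES`, `GENOTYPES`, and `is_heterozygous` are hypothetical and are not taken from the CellPhy code base.

```python
from itertools import product

# The four nucleotides that each allele of a diploid site can take.
NUCLEOTIDES = "ACGT"

# Assumption for illustration: a genotype state is an ordered (phased)
# pair of alleles, giving 4 x 4 = 16 possible states.
GENOTYPES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def is_heterozygous(genotype: str) -> bool:
    """Return True when the two alleles of the genotype differ."""
    return genotype[0] != genotype[1]


if __name__ == "__main__":
    homozygous = [g for g in GENOTYPES if not is_heterozygous(g)]
    heterozygous = [g for g in GENOTYPES if is_heterozygous(g)]
    print(len(GENOTYPES), "states:", " ".join(GENOTYPES))
    print(len(homozygous), "homozygous,", len(heterozygous), "heterozygous")
```

Under this assumption, 4 of the states are homozygous and 12 are heterozygous; error terms such as allelic dropout and amplification error are not modeled here.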
Limited genomic reconstruction of SARS-CoV-2 transmission history within local epidemiological clusters
A detailed understanding of how and when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission occurs is crucial for designing effective prevention measures. Other than contact tracing, genome sequencing provides information to help infer who infected whom. However, the effectiveness of the genomic approach in this context depends on both (high enough) mutation and (low enough) transmission rates. Today, the level of resolution that we can obtain when describing SARS-CoV-2 outbreaks using genomic information alone remains unclear. In order to answer this question, we sequenced forty-nine SARS-CoV-2 patient samples from ten local clusters in NW Spain for which partial epidemiological information was available and inferred transmission history using genomic variants. Importantly, we obtained high-quality genomic data, sequencing each sample twice and using unique barcodes to exclude cross-sample contamination. Phylogenetic and cluster analyses showed that consensus genomes were generally sufficient to discriminate among independent transmission clusters. However, levels of intrahost variation were low, which prevented in most cases the unambiguous identification of direct transmission events. After filtering out recurrent variants across clusters, the genomic data were generally compatible with the epidemiological information but did not support specific transmission events over possible alternatives. We estimated the effective transmission bottleneck size to be one to two viral particles for sample pairs whose donor-recipient relationship was likely. Our analyses suggest that intrahost genomic variation in SARS-CoV-2 might be generally limited and that homoplasy and recurrent errors complicate identifying shared intrahost variants. Reliable reconstruction of direct SARS-CoV-2 transmission based solely on genomic data seems hindered by a slow mutation rate, potential convergent events, and technical artifacts.
Detailed contact tracing seems essential in most cases to study SARS-CoV-2 transmission at high resolution.
Somatic variant calling from single-cell DNA sequencing data
Single-cell sequencing has gained popularity in recent years. Despite its numerous applications, single-cell DNA sequencing data is highly error-prone due to technical biases arising from uneven sequencing coverage, allelic dropout, and amplification error. With these artifacts, the identification of somatic genomic variants becomes a challenging task, and over the years, several methods have been developed explicitly for this type of data. Single-cell variant callers implement distinct strategies, make different use of the data, and typically result in many discordant calls when applied to real data. Here, we review current approaches for single-cell variant calling, emphasizing single nucleotide variants. We highlight their potential benefits and shortcomings to help users choose a suitable tool for their data at hand.
2021
Coalescent models derived from birth-death processes
A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled in N, the size of the present-day population, while letting N→∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model assuming the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.
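For context, the standard Kingman coalescent that the abstract uses as its point of comparison is cheap to simulate: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. The sketch below shows only that standard baseline with a constant population size; it is not the LBD coalescent model or any of its approximations, and the function names are hypothetical.

```python
import random


def kingman_coalescent_times(n, rng=None):
    """Simulate the n - 1 waiting times between coalescent events.

    While k lineages remain, the waiting time is exponential with rate
    k * (k - 1) / 2 (standard Kingman coalescent, constant population
    size, time measured in coalescent units).
    """
    rng = rng or random.Random()
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(42)
    replicates = 20000
    mean_tmrca = sum(
        sum(kingman_coalescent_times(10, rng)) for _ in range(replicates)
    ) / replicates
    print(f"simulated mean TMRCA: {mean_tmrca:.3f}")
    print(f"expected TMRCA:       {expected_tmrca(10):.3f}")
```

Each replicate needs only n - 1 exponential draws, which is part of why standard coalescent simulations are fast.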
OmniSARS2: A Highly Sensitive and Specific RT-qPCR-Based COVID-19 Diagnostic Method Designed to Withstand SARS-CoV-2 Lineage Evolution
Extensive transmission of SARS-CoV-2 during the COVID-19 pandemic allowed the generation of thousands of mutations within its genome. While several of these become rare, others largely increase in prevalence, potentially jeopardizing the sensitivity of PCR-based diagnostics. Taking advantage of SARS-CoV-2 genomic knowledge, we designed a one-step probe-based multiplex RT-qPCR (OmniSARS2) to simultaneously detect short fragments of the SARS-CoV-2 genome in ORF1ab, E gene and S gene. Comparative genomics of the most common SARS-CoV-2 lineages and of other human betacoronaviruses and alphacoronaviruses was the basis for this design, targeting regions that are both highly conserved across SARS-CoV-2 lineages and variable or absent in other viruses. The highest analytical sensitivity of this method for SARS-CoV-2 detection was 94.2 copies/mL at 95% detection probability (~1 copy per total reaction volume) for the S gene assay, matching the most sensitive available methods. In vitro specificity tests, performed using reference strains, showed no cross-reactivity with other human coronaviruses or common pathogens. The method was compared with commercially available methods and detected the virus in clinical samples encompassing different SARS-CoV-2 lineages, including B.1, B.1.1, B.1.177 or B.1.1.7 and rarer lineages. OmniSARS2 proved to be a sensitive and specific viral detection method that is less likely to be affected by oligonucleotide-sample mismatches arising from lineage evolution, which is relevant to ensuring the accuracy of COVID-19 molecular diagnostic methods.
Felsenstein Phylogenetic Likelihood
In 1981, the Journal of Molecular Evolution (JME) published an article entitled "Evolutionary trees from DNA sequences: A maximum likelihood approach" by Joseph (Joe) Felsenstein (J Mol Evol 17:368-376, 1981). This groundbreaking work laid the foundation for the emerging field of statistical phylogenetics, providing a tractable way of finding maximum likelihood (ML) estimates of evolutionary trees from DNA sequence data. This paper is the second most cited (more than 9000 citations) in JME after Kimura's (J Mol Evol 16:111-120, 1980) seminal paper on a model of nucleotide substitution (with nearly 20,000 citations). On the occasion of the 50th anniversary of JME, we elaborate on the significance of Felsenstein's ML approach to estimating phylogenetic trees.
SARS-CoV-2 genomic diversity and the implications for qRT-PCR diagnostics and transmission
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 lineages that have spread throughout the world. In this study, we investigated 129 RNA-seq data sets and 6928 consensus genomes to contrast the intra-host and inter-host diversity of SARS-CoV-2. Our analyses yielded three major observations. First, the mutational profile of SARS-CoV-2 highlights intra-host single nucleotide variant (iSNV) and SNP similarity, albeit with differences in C > U changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to MERS-CoV than SARS-CoV-1. Third, a significant fraction of insertions and deletions contribute to the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.
2020
Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel
The Mediterranean mussel Mytilus galloprovincialis is an ecologically and economically relevant edible marine bivalve, highly invasive and resilient to biotic and abiotic stressors causing recurrent massive mortalities in other bivalves. Although these traits have been recently linked with the maintenance of a high genetic variation within natural populations, the factors underlying the evolutionary success of this species remain unclear.
Malignant transformation and genetic alterations are uncoupled in early colorectal cancer progression
Colorectal cancer (CRC) development is generally accepted as a sequential process, with genetic mutations determining phenotypic tumor progression. However, matching genetic profiles with histological transition requires the analyses of temporal samples from the same patient at key stages of progression.
Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there is a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 clades that have spread throughout the world. In this study, we investigated over 7,000 SARS-CoV-2 datasets to unveil both intrahost and interhost diversity. Our intrahost and interhost diversity analyses yielded three major observations. First, the mutational profile of SARS-CoV-2 highlights intrahost single nucleotide variant (iSNV) and SNP similarity, albeit with high variability in C>T changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to those of MERS-CoV than to those of SARS-CoV-1. Third, a significant fraction of small indels fuel the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.
CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples
Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.
Emerging Frontiers in the Study of Molecular Evolution
A group of editors of the Journal of Molecular Evolution have come together to pose a set of key challenges and future directions for the field of molecular evolution. Topics include challenges and new directions in prebiotic chemistry and the RNA world, reconstruction of early cellular genomes and proteins, macromolecular and functional evolution, evolutionary cell biology, genome evolution, molecular evolutionary ecology, viral phylodynamics, theoretical population genomics, somatic cell molecular evolution, and directed evolution. While our effort is not meant to be exhaustive, it reflects research questions and problems in the field of molecular evolution that are exciting to our editors.
ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models
ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate, and it introduces several new features, such as ascertainment bias correction, mixture and free-rate models, and the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest (last accessed September 2, 2019).