[PubMed automatic search; some publications might not appear]
2024
2023
2022
Comparative analysis of capture methods for genomic profiling of circulating tumor cells in colorectal cancer
The genomic profiling of circulating tumor cells (CTCs) in the bloodstream should provide clinically relevant information on therapeutic efficacy and help predict cancer survival. Here, we contrasted the genomic profiles of CTC pools recovered from metastatic colorectal cancer (mCRC) patients using different enrichment strategies (CellSearch, Parsortix, and FACS). Mutations inferred in the CTC pools differed depending on the enrichment strategy and, in all cases, represented a subset of the mutations detected in the matched primary tumor samples. However, the CTC pools from Parsortix, and in part, CellSearch, showed diversity estimates, mutational signatures, and drug-suitability scores remarkably close to those found in matching primary tumor samples. In addition, FACS CTC pools were enriched in apparent sequencing artifacts, leading to much higher genomic diversity estimates. Our results highlight the utility of CTCs to assess the genomic heterogeneity of individual tumors and help clinicians prioritize drugs in mCRC.
SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data
We present SIEVE, a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from single-cell DNA sequencing data. SIEVE leverages raw read counts for all nucleotides and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods in phylogenetic reconstruction and variant calling accuracy, especially in the inference of homozygous variants. Applying SIEVE to three datasets, one for triple-negative breast cancer (TNBC) and two for colorectal cancer (CRC), we find that double mutant genotypes are rare in CRC but unexpectedly frequent in the TNBC samples.
Clonality and timing of relapsing colorectal cancer metastasis revealed through whole-genome single-cell sequencing
Recurrence of tumor cells following local and systemic therapy is a significant hurdle in cancer. Most patients with metastatic colorectal cancer (mCRC) will relapse, despite resection of the metastatic lesions. A better understanding of the evolutionary history of recurrent lesions is required to identify the spatial and temporal patterns of metastatic progression and expose the genetic and evolutionary determinants of therapeutic resistance. With this goal in mind, here we leveraged a unique single-cell whole-genome sequencing dataset from recurrent hepatic lesions of an mCRC patient. Our phylogenetic analysis confirms that the treatment induced a severe demographic bottleneck in the liver metastasis but also that a previously diverged lineage survived this surgery, possibly after migration to a different site in the liver. This lineage evolved very slowly for two years under adjuvant drug therapy and diversified again in a very short period. We identified several non-silent mutations specific to this lineage and inferred a substantial contribution of chemotherapy to the overall, genome-wide mutational burden. All in all, our study suggests that mCRC subclones can migrate locally and evade resection, keep evolving despite rounds of chemotherapy, and re-expand explosively.
Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but included only 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
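Gene tree parsimony scores a candidate species tree (and hence a candidate root) by reconciling each gene tree with it and counting duplications and losses. The minimal sketch below shows only the classic LCA-mapping step that counts duplications for a single gene tree; losses are omitted. The tree encodings (nested 2-tuples for the gene tree, a child-to-parent dictionary for the rooted species tree) and all function names are assumptions made for illustration, not the implementation used in the study.

```python
def ancestors(node, parent):
    """Path from a species-tree node up to the root (child -> parent dict)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    anc_a = set(ancestors(a, parent))
    for x in ancestors(b, parent):
        if x in anc_a:
            return x
    raise ValueError("nodes are not in the same tree")


def count_duplications(gene_tree, parent):
    """LCA-map a binary gene tree (nested 2-tuples, leaves = species names)
    onto the species tree; return (mapped species node, duplication count).

    A gene-tree node is a duplication when it maps to the same species
    node as at least one of its children.
    """
    if isinstance(gene_tree, str):
        return gene_tree, 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, parent)
    m_right, d_right = count_duplications(right, parent)
    m = lca(m_left, m_right, parent)
    is_dup = m in (m_left, m_right)
    return m, d_left + d_right + int(is_dup)


# Hypothetical rooted species tree ((A,B)AB,C)root
species_parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
print(count_duplications((("A", "B"), ("A", "C")), species_parent))
```

Summing such counts (plus losses) over many gene families, for each candidate rooting, is what lets a parsimony criterion prefer one root over another.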
Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data
Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data.
Single-cell mtDNA heteroplasmy in colorectal cancer
Human mitochondria can be genetically distinct within the same individual, a phenomenon known as heteroplasmy. In cancer, this phenomenon seems exacerbated, and most mitochondrial mutations seem to be heteroplasmic. How this genetic variation is arranged within and among normal and tumor cells is not well understood. To address this question, here we sequenced single-cell mitochondrial genomes from multiple normal and tumoral locations in four colorectal cancer patients. Our results suggest that single cells, both normal and tumoral, can carry various mitochondrial haplotypes. Remarkably, this intra-cell heteroplasmy can arise before tumor development and be maintained afterward in specific tumoral cell subpopulations. At least in the colorectal cancer patients studied here, the somatic mutations in single cells do not seem to have a prominent role in tumorigenesis.
Mitochondrial genome sequencing of marine leukaemias reveals cancer contagion between clam species in the Seas of Southern Europe
Clonally transmissible cancers are tumour lineages that are transmitted between individuals via the transfer of living cancer cells. In marine bivalves, leukaemia-like transmissible cancers, called hemic neoplasia (HN), have demonstrated the ability to infect individuals from different species. We performed whole-genome sequencing of eight warty venus clams diagnosed with HN, collected at two sampling points on the Atlantic and Mediterranean coasts of Spain, more than 1000 nautical miles apart. Mitochondrial genome sequencing analysis of the neoplastic animals revealed the coexistence of haplotypes from two different clam species. Phylogenies estimated from mitochondrial and nuclear markers confirmed that this leukaemia originated in striped venus clams and was later transmitted to warty venus clams, in which it survives as a contagious cancer. The analysis of mitochondrial and nuclear gene sequences supports that all studied tumours belong to a single neoplastic lineage that spreads in the Seas of Southern Europe.
SARS-CoV-2 Evolution and Spike-Specific CD4+ T-Cell Response in Persistent COVID-19 with Severe HIV Immune Suppression
Intra-host evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been reported in cases with persistent coronavirus disease 2019 (COVID-19). In this study, we describe a severely immunosuppressed individual with HIV-1/SARS-CoV-2 coinfection and a long-term course of SARS-CoV-2 infection. A 28-year-old man was diagnosed with HIV-1 infection (CD4+ count: 3 cells/µL and 563000 HIV-1 RNA copies/mL) and simultaneous pneumonia, disseminated infection and SARS-CoV-2 infection. SARS-CoV-2 real-time reverse transcription polymerase chain reaction positivity in nasopharyngeal samples persisted for 15 weeks. SARS-CoV-2 was identified as variant Alpha (PANGO lineage B.1.1.7) with mutation S:E484K. The spike-specific T-cell response was similar to that of HIV-negative controls, although enriched in IL-2, and showed disproportionately increased levels of immunological exhaustion markers. Despite persistent SARS-CoV-2 infection, adaptive intra-host SARS-CoV-2 evolution was not identified. The spike-specific T-cell response protected against a severe COVID-19 outcome, and the increased levels of immunological exhaustion markers might have favoured SARS-CoV-2 persistence.
CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy .
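The 16 diploid states mentioned above can be read as the ordered (phased) pairs of nucleotides; collapsing phase leaves 10 unordered genotypes. The sketch below enumerates both and, as a purely illustrative assumption, applies a simplified error model in which a heterozygous genotype loses one allele with probability `ado`, after which the genotype is miscalled uniformly with probability `err`. This is not CellPhy's actual error model; `observation_probs` and its parameters are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# 16 ordered (phased) diploid states: AA, AC, ..., TT
STATES16 = [a + b for a, b in product(BASES, repeat=2)]

# Collapsing phase gives the 10 unordered genotypes (AA, AC, ..., TT)
GENOTYPES10 = sorted({"".join(sorted(s)) for s in STATES16})


def observation_probs(true_gt, ado, err):
    """Hypothetical model of P(observed genotype | true unordered genotype).

    1. Allelic dropout: a heterozygous genotype 'xy' is read as 'xx' or 'yy'
       with probability ado/2 each (homozygotes are unaffected).
    2. Genotyping error: with probability err the post-dropout genotype is
       replaced by one of the other 9 genotypes, chosen uniformly.
    """
    x, y = true_gt
    after_ado = {true_gt: 1.0}
    if x != y:
        after_ado = {true_gt: 1.0 - ado, x + x: ado / 2, y + y: ado / 2}

    probs = {g: 0.0 for g in GENOTYPES10}
    for g, p in after_ado.items():
        for obs in GENOTYPES10:
            probs[obs] += p * ((1.0 - err) if obs == g else err / 9)
    return probs


print(len(STATES16), len(GENOTYPES10))  # 16 10
print(observation_probs("AC", ado=0.2, err=0.01)["AA"])
```

Each row of this matrix sums to one, so it can be plugged into a likelihood as the probability of an observed genotype given a true one; a real model would also use read counts rather than called genotypes.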
Limited genomic reconstruction of SARS-CoV-2 transmission history within local epidemiological clusters
A detailed understanding of how and when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission occurs is crucial for designing effective prevention measures. Besides contact tracing, genome sequencing provides information to help infer who infected whom. However, the effectiveness of the genomic approach in this context depends on both (high enough) mutation and (low enough) transmission rates. Today, the level of resolution that we can obtain when describing SARS-CoV-2 outbreaks using genomic information alone remains unclear. To answer this question, we sequenced forty-nine SARS-CoV-2 patient samples from ten local clusters in NW Spain for which partial epidemiological information was available and inferred transmission history using genomic variants. Importantly, we obtained high-quality genomic data, sequencing each sample twice and using unique barcodes to exclude cross-sample contamination. Phylogenetic and cluster analyses showed that consensus genomes were generally sufficient to discriminate among independent transmission clusters. However, levels of intrahost variation were low, which in most cases prevented the unambiguous identification of direct transmission events. After filtering out recurrent variants across clusters, the genomic data were generally compatible with the epidemiological information but did not support specific transmission events over possible alternatives. We estimated the effective transmission bottleneck size to be one to two viral particles for sample pairs whose donor-recipient relationship was likely. Our analyses suggest that intrahost genomic variation in SARS-CoV-2 might be generally limited and that homoplasy and recurrent errors complicate identifying shared intrahost variants. Reliable reconstruction of direct SARS-CoV-2 transmission based solely on genomic data seems hindered by a slow mutation rate, potential convergent events, and technical artifacts. Detailed contact tracing seems essential in most cases to study SARS-CoV-2 transmission at high resolution.
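A transmission bottleneck estimate can be illustrated with a simple presence/absence model: each of `nb` transmitted particles independently carries a donor variant with probability equal to its donor frequency p, so the variant shows up in the recipient with probability 1 - (1 - p)^nb. Whether the study used this exact model is not stated here; the function names and the donor frequencies below are hypothetical, and frequencies are assumed to lie strictly between 0 and 1.

```python
import math


def bottleneck_log_likelihood(nb, donor_freqs, detected):
    """Log-likelihood of bottleneck size nb under a presence/absence model:
    a donor variant at frequency p reaches the recipient with probability
    1 - (1 - p) ** nb and is absent with probability (1 - p) ** nb."""
    ll = 0.0
    for p, present in zip(donor_freqs, detected):
        p_absent = (1.0 - p) ** nb
        ll += math.log(1.0 - p_absent) if present else math.log(p_absent)
    return ll


def estimate_bottleneck(donor_freqs, detected, max_nb=100):
    """Maximum-likelihood bottleneck size over the grid 1..max_nb."""
    return max(range(1, max_nb + 1),
               key=lambda nb: bottleneck_log_likelihood(nb, donor_freqs, detected))


# Hypothetical donor iSNV frequencies and whether each was seen in the recipient
freqs = [0.10, 0.20, 0.30, 0.05]
seen = [False, True, False, False]
print(estimate_bottleneck(freqs, seen))  # 2
```

With few shared intrahost variants, as reported above, the likelihood surface is flat and such estimates carry wide uncertainty, which is why they are best restricted to pairs with a likely donor-recipient relationship.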
Somatic variant calling from single-cell DNA sequencing data
Single-cell sequencing has gained popularity in recent years. Despite its numerous applications, single-cell DNA sequencing data is highly error-prone due to technical biases arising from uneven sequencing coverage, allelic dropout, and amplification error. With these artifacts, the identification of somatic genomic variants becomes a challenging task, and over the years, several methods have been developed explicitly for this type of data. Single-cell variant callers implement distinct strategies, make different use of the data, and typically result in many discordant calls when applied to real data. Here, we review current approaches for single-cell variant calling, emphasizing single nucleotide variants. We highlight their potential benefits and shortcomings to help users choose a suitable tool for the data at hand.
2021
Coalescent models derived from birth-death processes
A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled in N, the size of the present-day population, while letting N→∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model assuming the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.
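To illustrate what a deterministic population-size approximation looks like in practice, the sketch below simulates coalescence times for a sample of size n when the population size is assumed to shrink exponentially back in time, N(t) = n0·exp(-r·t) with r ≥ 0. This is a generic variable-population-size coalescent, not the LBD model or the specific approximations proposed in the paper; the function name and the parameters `n0` and `r` are assumptions.

```python
import math
import random


def exp_growth_coalescent_times(n, n0, r, rng):
    """Return the n-1 coalescence times (backward from the present) for a
    sample of size n, assuming N(t) = n0 * exp(-r * t) with r >= 0.

    With k lineages the coalescence rate at time u is C(k,2) / N(u). The
    waiting time tau from time s solves
        (C / (n0 * r)) * exp(r * s) * (exp(r * tau) - 1) = E,  E ~ Exp(1),
    which is inverted in closed form below (time-rescaling).
    """
    if r < 0:
        raise ValueError("this sketch assumes a non-negative growth rate")
    times = []
    s = 0.0
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)
        if r == 0:
            tau = e * n0 / pairs  # constant size: standard Kingman coalescent
        else:
            tau = math.log1p(e * n0 * r * math.exp(-r * s) / pairs) / r
        s += tau
        times.append(s)
    return times


times = exp_growth_coalescent_times(10, n0=1e4, r=0.01, rng=random.Random(1))
print(len(times), times[-1])  # 9 coalescences; last entry is the TMRCA
```

Because the population is smaller in the past, coalescences are compressed toward the root relative to a constant-size population, which is the qualitative signature a birth-death history leaves on sample genealogies.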
OmniSARS2: A Highly Sensitive and Specific RT-qPCR-Based COVID-19 Diagnostic Method Designed to Withstand SARS-CoV-2 Lineage Evolution
Extensive transmission of SARS-CoV-2 during the COVID-19 pandemic allowed the generation of thousands of mutations within its genome. While several of these remain rare, others largely increase in prevalence, potentially jeopardizing the sensitivity of PCR-based diagnostics. Taking advantage of SARS-CoV-2 genomic knowledge, we designed a one-step probe-based multiplex RT-qPCR (OmniSARS2) to simultaneously detect short fragments of the SARS-CoV-2 genome in ORF1ab, the E gene, and the S gene. The design was based on comparative genomics of the most common SARS-CoV-2 lineages and other human betacoronaviruses and alphacoronaviruses, targeting regions that are highly conserved across SARS-CoV-2 lineages but variable or absent in other viruses. The highest analytical sensitivity of this method for SARS-CoV-2 detection was 94.2 copies/mL at 95% detection probability (~1 copy per total reaction volume) for the S gene assay, matching the most sensitive available methods. In vitro specificity tests, performed using reference strains, showed no cross-reactivity with other human coronaviruses or common pathogens. The method was compared with commercially available methods and detected the virus in clinical samples encompassing different SARS-CoV-2 lineages, including B.1, B.1.1, B.1.177, and B.1.1.7, as well as rarer lineages. OmniSARS2 is a sensitive and specific viral detection method that is less likely to be affected by oligonucleotide-sample mismatch arising from lineage evolution, which is relevant to ensuring the accuracy of COVID-19 molecular diagnostic methods.
Felsenstein Phylogenetic Likelihood
In 1981, the Journal of Molecular Evolution (JME) published an article entitled "Evolutionary trees from DNA sequences: A maximum likelihood approach" by Joseph (Joe) Felsenstein (J Mol Evol 17:368-376, 1981). This groundbreaking work laid the foundation for the emerging field of statistical phylogenetics, providing a tractable way of finding maximum likelihood (ML) estimates of evolutionary trees from DNA sequence data. It is the second most cited paper in JME (more than 9000 citations), after Kimura's seminal paper on a model of nucleotide substitution (J Mol Evol 16:111-120, 1980; nearly 20,000 citations). On the occasion of the 50th anniversary of JME, we elaborate on the significance of Felsenstein's ML approach to estimating phylogenetic trees.
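What made Felsenstein's ML approach tractable is the pruning algorithm, which computes the likelihood of an alignment site by passing conditional likelihood vectors from the tips to the root instead of summing over every combination of ancestral states. The minimal sketch below assumes the Jukes-Cantor (JC69) substitution model, a single site, and a tree encoded as nested lists of `(child, branch_length)` pairs with taxon names at the tips; these encodings, the function names, and the example tree are illustrative choices.

```python
import math
from itertools import product

BASES = "ACGT"


def jc69(i, j, t):
    """JC69 probability of base index i -> j over branch length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def conditional(node, site):
    """Pruning step: L[x] = P(tip bases below node | node is in state x)."""
    if isinstance(node, str):  # tip: the observed base is known
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    vec = [1.0] * 4
    for child, t in node:  # internal node: list of (child, branch_length)
        child_vec = conditional(child, site)
        for x in range(4):
            vec[x] *= sum(jc69(x, y, t) * child_vec[y] for y in range(4))
    return vec


def site_likelihood(tree, site):
    """Sum over root states weighted by JC69 stationary frequencies (1/4 each)."""
    return sum(0.25 * lx for lx in conditional(tree, site))


# Hypothetical rooted tree ((t1:0.1, t2:0.2):0.05, t3:0.3)
tree = [([("t1", 0.1), ("t2", 0.2)], 0.05), ("t3", 0.3)]
print(site_likelihood(tree, {"t1": "A", "t2": "A", "t3": "A"}))
```

The cost grows linearly with the number of tips, and the log-likelihood of an alignment is the sum of per-site log-likelihoods; summing `site_likelihood` over all 64 possible site patterns returns 1, a convenient sanity check.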
SARS-CoV-2 genomic diversity and the implications for qRT-PCR diagnostics and transmission
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 lineages that have spread throughout the world. In this study, we investigated 129 RNA-seq data sets and 6928 consensus genomes to contrast the intra-host and inter-host diversity of SARS-CoV-2. Our analyses yielded three major observations. First, the mutational profile of SARS-CoV-2 highlights intra-host single nucleotide variant (iSNV) and SNP similarity, albeit with differences in C > U changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to MERS-CoV than SARS-CoV-1. Third, a significant fraction of insertions and deletions contribute to the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.
2020
Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel
The Mediterranean mussel Mytilus galloprovincialis is an ecologically and economically relevant edible marine bivalve, highly invasive and resilient to biotic and abiotic stressors causing recurrent massive mortalities in other bivalves. Although these traits have been recently linked with the maintenance of a high genetic variation within natural populations, the factors underlying the evolutionary success of this species remain unclear.
Malignant transformation and genetic alterations are uncoupled in early colorectal cancer progression
Colorectal cancer (CRC) development is generally accepted as a sequential process, with genetic mutations determining phenotypic tumor progression. However, matching genetic profiles with histological transition requires the analyses of temporal samples from the same patient at key stages of progression.
Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 clades that have spread throughout the world. In this study, we investigated over 7,000 SARS-CoV-2 datasets to unveil both intrahost and interhost diversity. Our intrahost and interhost diversity analyses yielded three major observations. First, the mutational profile of SARS-CoV-2 highlights iSNV and SNP similarity, albeit with high variability in C>T changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to MERS-CoV than SARS-CoV-1. Third, a significant fraction of small indels fuel the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.
CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples
Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.
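To make the error processes listed above concrete, the sketch below simulates read counts at one diploid site of one cell: allelic dropout may remove one of the two alleles, reads are drawn from the surviving alleles, and each read is miscalled with a fixed sequencing error probability. This is a toy model written for illustration, not CellCoal's implementation; it ignores amplification error and allelic imbalance, and `simulate_reads` and its parameters are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def simulate_reads(genotype, depth, ado, seq_err, rng):
    """Toy read simulation for one diploid site in one cell.

    genotype: two-letter string such as "AC" (one base per allele).
    ado: probability that one of the two alleles drops out.
    seq_err: per-read probability of reporting a different, random base.
    """
    alleles = list(genotype)
    if rng.random() < ado:
        alleles = [rng.choice(alleles)]  # only one allele is left to sequence
    counts = Counter()
    for _ in range(depth):
        base = rng.choice(alleles)
        if rng.random() < seq_err:
            base = rng.choice([b for b in BASES if b != base])
        counts[base] += 1
    return counts


rng = random.Random(42)
print(simulate_reads("AC", depth=20, ado=0.3, seq_err=0.01, rng=rng))
```

Running this many times for a heterozygous site shows why single-cell callers struggle: a sizeable fraction of cells report only one allele, and these look like homozygous genotypes unless dropout is modeled explicitly.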
Emerging Frontiers in the Study of Molecular Evolution
Several editors of the Journal of Molecular Evolution have come together to pose a set of key challenges and future directions for the field of molecular evolution. Topics include challenges and new directions in prebiotic chemistry and the RNA world, reconstruction of early cellular genomes and proteins, macromolecular and functional evolution, evolutionary cell biology, genome evolution, molecular evolutionary ecology, viral phylodynamics, theoretical population genomics, somatic cell molecular evolution, and directed evolution. While our effort is not meant to be exhaustive, it reflects research questions and problems in the field of molecular evolution that are exciting to our editors.
ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models
ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest while being equally accurate, and it introduces several new features, such as ascertainment bias correction, mixture and free-rate models, and the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest , last accessed September 2, 2019.
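Model selection tools of this kind rank fitted candidate models with information criteria that trade goodness of fit (maximized log-likelihood) against the number of free parameters; AIC, AICc, and BIC are the usual choices. The sketch below computes the three criteria; the log-likelihoods, parameter counts (which in practice would also include branch lengths), and alignment length are hypothetical numbers chosen only to show that the criteria can disagree.

```python
import math


def information_criteria(lnl, k, n):
    """AIC, AICc and BIC for a model with maximized log-likelihood lnl,
    k free parameters and sample size n (e.g. number of alignment sites)."""
    aic = 2 * k - 2 * lnl
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnl
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(models, n, criterion):
    """Return the model name with the lowest value of the chosen criterion."""
    return min(models, key=lambda m: information_criteria(*models[m], n)[criterion])


# Hypothetical fits: model name -> (log-likelihood, free parameters)
models = {"JC": (-5000.0, 0), "HKY": (-4900.0, 4), "GTR": (-4895.0, 8)}
for crit in ("AIC", "AICc", "BIC"):
    print(crit, best_model(models, n=1000, criterion=crit))
# prints: AIC GTR / AICc GTR / BIC HKY (one per line)
```

BIC penalizes each extra parameter by ln(n) rather than 2, so with 1000 sites it prefers the simpler HKY model here, while AIC and AICc accept GTR's small likelihood gain.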