Mol. Biol. Evol. 37(7):2034–2044; doi:10.1093/molbev/msaa065
Molecular Biology and Evolution provides yearly recognition of recently published manuscripts that have made strong impressions on our research community since their publication. Below, we highlight ten discoveries, five methods, and five resources as “Emerging Classics” based on citations accrued per fractional year since print publication. Articles are listed alphabetically by the first author’s family name. Total citation counts were obtained from Web of Science on November 6, 2020. We congratulate these authors on the significance of their contributions and look forward to seeing new classics emerge in the years to come.
Abstract: Presumably due to a rapid early diversification, major parts of the higher-level phylogeny of birds are still resolved controversially in different analyses or are considered unresolvable. To address this problem, we produced an avian tree of life, which includes molecular sequences of one or several species of ∼90% of the currently recognized family-level taxa (429 species, 379 genera) including all 106 family-level taxa of the nonpasserines and 115 of the passerines (Passeriformes). The unconstrained analyses of noncoding 3-prime untranslated region (3′-UTR) sequences and those of coding sequences yielded different trees. In contrast to the coding sequences, the 3′-UTR sequences resulted in a well-resolved and stable tree topology. The 3′-UTR contained, unexpectedly, transcription factor binding motifs that were specific for different higher-level taxa. In this tree, grebes and flamingos are the sister clade of all other Neoaves, which are subdivided into five major clades. All nonpasserine taxa were placed with robust statistical support including the long-time enigmatic hoatzin (Opisthocomiformes), which was found to be the sister taxon of the Caprimulgiformes. The comparatively late radiation of family-level clades of the songbirds (oscine Passeriformes) contrasts with the attenuated diversification of nonpasseriform taxa since the early Miocene. This correlates with the evolution of vocal production learning, an important speciation factor, which is ancestral for songbirds and evolved convergently only in hummingbirds and parrots. As 3′-UTR-based phylotranscriptomics resolved the avian family-level tree of life, we suggest that this procedure will also resolve the all-species avian tree of life.
Abstract: Reconstructing the evolutionary history of island biotas is complicated by unusual morphological evolution in insular environments. However, past human-caused extinctions limit the use of molecular analyses to determine origins and affinities of enigmatic island taxa. The Caribbean formerly contained a morphologically diverse assemblage of caviomorph rodents (33 species in 19 genera), ranging from ∼0.1 to 200 kg and traditionally classified into three higher-order taxa (Capromyidae/Capromyinae, Heteropsomyinae, and Heptaxodontidae). Few species survive today, and the evolutionary affinities of living and extinct Caribbean caviomorphs to each other and to mainland taxa are unclear: Are they monophyletic, polyphyletic, or paraphyletic? We use ancient DNA techniques to present the first genetic data for extinct heteropsomyines and heptaxodontids, as well as for several extinct capromyids, and demonstrate through analysis of mitogenomic and nuclear data sets that all sampled Caribbean caviomorphs represent a well-supported monophyletic group. The remarkable morphological and ecological variation observed across living and extinct caviomorphs from Cuba, Hispaniola, Jamaica, Puerto Rico, and other islands was generated through within-archipelago evolutionary radiation following a single Early Miocene overwater colonization. This evolutionary pattern contrasts with the origination of diversity in many other Caribbean groups. All living and extinct Caribbean caviomorphs comprise a single biologically remarkable subfamily (Capromyinae) within the morphologically conservative living Neotropical family Echimyidae. Caribbean caviomorphs represent an important new example of insular mammalian adaptive radiation, where taxa retaining “ancestral-type” characteristics coexisted alongside taxa occupying novel island niches. Diversification was associated with the greatest insular body mass increase recorded in rodents and possibly the greatest for any mammal lineage.
Abstract: Substantial progress has been made globally to control malaria; however, there is a growing need for innovative new tools to ensure continued progress. One approach is to harness genetic sequencing and accompanying methodological approaches as have been used in the control of other infectious diseases. However, to utilize these methodologies for malaria, we first need to extend the methods to capture the complex interactions between parasites, human and vector hosts, and environment, which all impact the level of genetic diversity and relatedness of malaria parasites. We develop an individual-based transmission model to simulate malaria parasite genetics parameterized using estimated relationships between complexity of infection and age from five regions in Uganda and Kenya. We predict that cotransmission and superinfection contribute equally to within-host parasite genetic diversity at 11.5% PCR prevalence, above which superinfections dominate. Finally, we characterize the predictive power of six metrics of parasite genetics for detecting changes in transmission intensity, before grouping them in an ensemble statistical model. The model predicted malaria prevalence with a mean absolute error of 0.055. Different assumptions about the availability of sample metadata were considered, with the most accurate predictions of malaria prevalence made when the clinical status and age of sampled individuals are known. Parasite genetics may provide a novel surveillance tool for estimating the prevalence of malaria in areas in which prevalence surveys are not feasible. However, the findings presented here reinforce the need for patient metadata to be recorded and made available within all future attempts to use parasite genetics for surveillance.
Abstract: The genus Acropora comprises the most diverse and abundant scleractinian corals (Anthozoa, Cnidaria) in coral reefs, the most diverse marine ecosystems on Earth. However, the genetic basis for the success and wide distribution of Acropora is unknown. Here, we sequenced complete genomes of 15 Acropora species and 3 other acroporid taxa belonging to the genera Montipora and Astreopora to examine genomic novelties that explain their evolutionary success. We successfully obtained reasonable draft genomes of all 18 species. Molecular dating indicates that the Acropora ancestor survived warm periods without sea ice from the mid or late Cretaceous to the Early Eocene and that diversification of Acropora may have been enhanced by subsequent cooling periods. In general, the scleractinian gene repertoire is highly conserved; however, coral- or cnidarian-specific possible stress response genes are tandemly duplicated in Acropora. Enzymes that cleave dimethylsulfoniopropionate into dimethyl sulfide, which promotes cloud formation and combats greenhouse gases, are the most duplicated genes in the Acropora ancestor. These may have been acquired by horizontal gene transfer from algal symbionts belonging to the family Symbiodiniaceae, or from coccolithophores, suggesting that although functions of this enzyme in Acropora are unclear, Acropora may have survived warmer marine environments in the past by enhancing cloud formation. In addition, possible antimicrobial peptides and symbiosis-related genes are under positive selection in Acropora, perhaps enabling adaptation to diverse environments. Our results suggest unique Acropora adaptations to ancient, warm marine environments and provide insights into its capacity to adjust to rising seawater temperatures.
Abstract: Correspondence between evolution and development has been discussed for more than two centuries. Recent work reveals that phylogeny−ontogeny correlations are indeed present in developmental transcriptomes of eukaryotic clades with complex multicellularity. Nevertheless, it has been largely ignored that the pervasive presence of phylogeny−ontogeny correlations is a hallmark of development in eukaryotes. This perspective opens a possibility to look for similar parallelisms in biological settings where developmental logic and multicellular complexity are more obscure. For instance, it has been increasingly recognized that multicellular behavior underlies biofilm formation in bacteria. However, it remains unclear whether bacterial biofilm growth shares some basic principles with development in complex eukaryotes. Here we show that the ontogeny of growing Bacillus subtilis biofilms recapitulates phylogeny at the expression level. Using time-resolved transcriptome and proteome profiles, we found that biofilm ontogeny correlates with evolutionary measures, such that evolutionarily younger and more diverged genes were increasingly expressed toward later timepoints of biofilm growth. Molecular and morphological signatures also revealed that biofilm growth is highly regulated and organized into discrete ontogenetic stages, analogous to those of eukaryotic embryos. Together, this suggests that biofilm formation in Bacillus is a bona fide developmental process comparable to organismal development in animals, plants, and fungi. Given that most cells on Earth reside in the form of biofilms and that biofilms represent the oldest known fossils, we anticipate that the widely adopted vision of the first life as a single-cell and free-living organism needs rethinking.
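One widely used way to quantify a phylogeny−ontogeny correlation of the kind described in the preceding abstract is the transcriptome age index: the expression-weighted mean of gene ages (phylostrata) in a sample. The sketch below is illustrative only. Whether it matches the exact evolutionary measures used in the study is an assumption, and the gene ages and expression values are hypothetical toy numbers.

```python
from typing import Sequence


def transcriptome_age_index(phylostrata: Sequence[int],
                            expression: Sequence[float]) -> float:
    """Expression-weighted mean phylostratum for one sample.

    phylostrata[i] is the age class of gene i (1 = oldest);
    expression[i] is the expression level of gene i in the sample.
    """
    if len(phylostrata) != len(expression):
        raise ValueError("need exactly one phylostratum per gene")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(ps * e for ps, e in zip(phylostrata, expression)) / total


# Hypothetical toy data: three genes, phylostratum 1 is the oldest.
PHYLOSTRATA = [1, 2, 3]
EARLY = [10.0, 5.0, 1.0]  # old genes dominate expression
LATE = [1.0, 5.0, 10.0]   # young genes dominate expression

if __name__ == "__main__":
    print(transcriptome_age_index(PHYLOSTRATA, EARLY))
    print(transcriptome_age_index(PHYLOSTRATA, LATE))
```

A higher index means the sample is dominated by younger genes, so an index that rises across timepoints would be consistent with younger genes being expressed later in biofilm growth.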
Abstract: Population genetic theory and empirical evidence indicate that deleterious alleles can be purged in small populations. However, this viewpoint remains controversial. It is unclear whether natural selection is powerful enough to purge deleterious mutations when wild populations continue to decline. Pheasants are terrestrial birds facing a long-term risk of extinction as a result of anthropogenic perturbations and exploitation. Nevertheless, there are scant genomics resources available for conservation management and planning. Here, we analyzed comparative population genomic data for the three extant isolated populations of Brown eared pheasant (Crossoptilon mantchuricum) in China. We showed that C. mantchuricum has low genome-wide diversity and a contracting effective population size because of persistent declines over the past 100,000 years. We compared genome-wide variation in C. mantchuricum with that of its closely related sister species, the Blue eared pheasant (C. auritum) for which the conservation concern is low. There were detrimental genetic consequences across all C. mantchuricum genomes including extended runs of homozygous sequences, slow rates of linkage disequilibrium decay, excessive loss-of-function mutations, and loss of adaptive genetic diversity at the major histocompatibility complex region. To the best of our knowledge, this study is the first to perform a comprehensive conservation genomic analysis on this threatened pheasant species. Moreover, we demonstrated that natural selection may not suffice to purge deleterious mutations in wild populations undergoing long-term decline. The findings of this study could facilitate conservation planning for threatened species and help recover their population size.
Abstract: It has been suggested that, due to the structure of the genetic code, nonsynonymous transitions are less likely than transversions to cause radical changes in amino acid physicochemical properties and so are on average less deleterious. This view was supported by some but not all mutagenesis experiments. Because laboratory measures of fitness effects have limited sensitivities and relative frequencies of different mutations in mutagenesis studies may not match those in nature, we here revisit this issue using comparative genomics. We extend the standard codon model of sequence evolution by adding the parameter η that quantifies the ratio of the fixation probability of transitional nonsynonymous mutations to that of transversional nonsynonymous mutations. We then estimate η from the concatenated alignment of all protein-coding DNA sequences of two closely related genomes. Surprisingly, η ranges from 0.13 to 2.0 across 90 species pairs sampled from the tree of life, with 51 incidences of η < 1 and 30 incidences of η > 1 that are statistically significant. Hence, whether nonsynonymous transversions are overall more deleterious than nonsynonymous transitions is species-dependent. Because the corresponding groups of amino acid replacements differ between nonsynonymous transitions and transversions, η is influenced by the relative exchangeabilities of amino acid pairs. Indeed, an extensive search reveals that the large variation in η is primarily explainable by the recently reported among-species disparity in amino acid exchangeabilities. These findings demonstrate that genome-wide nucleotide substitution patterns in coding sequences have species-specific features and are more variable among evolutionary lineages than are currently thought.
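To make the role of η in the preceding abstract concrete, the sketch below shows one plausible way to add it to a Goldman–Yang-style codon rate matrix: synonymous transitions are scaled by κ, nonsynonymous transversions by ω, and nonsynonymous transitions by κ·ω·η. This parameterization is an assumption made for illustration, not the authors' published formulation. The target-codon frequency term is omitted, and the function name `relative_rate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def relative_rate(codon_i: str, codon_j: str,
                  kappa: float, omega: float, eta: float) -> float:
    """Relative substitution rate from codon_i to codon_j.

    Assumed scheme (illustrative, not taken from the paper):
      synonymous transversion      -> 1
      synonymous transition        -> kappa
      nonsynonymous transversion   -> omega
      nonsynonymous transition     -> kappa * omega * eta
    Changes at more than one position, identical codons, and changes
    to or from stop codons get rate 0.
    """
    if CODE[codon_i] == "*" or CODE[codon_j] == "*":
        return 0.0
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0
    a, b = diffs[0]
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    transition = (a in PURINES) == (b in PURINES)
    nonsynonymous = CODE[codon_i] != CODE[codon_j]
    rate = 1.0
    if transition:
        rate *= kappa
    if nonsynonymous:
        rate *= omega
        if transition:
            rate *= eta
    return rate
```

Under this scheme η < 1 means nonsynonymous transitions fix less readily than nonsynonymous transversions and η > 1 means the opposite, which is the contrast the abstract reports as varying among species pairs.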
Abstract: Cytoplasmic incompatibility is a selfish reproductive manipulation induced by the endosymbiont Wolbachia in arthropods. In males, Wolbachia modifies sperm, leading to embryonic mortality in crosses with Wolbachia-free females. In females, Wolbachia rescues the cross and allows development to proceed normally. This provides a reproductive advantage to infected females, allowing the maternally transmitted symbiont to spread rapidly through host populations. We identified homologs of the genes underlying this phenotype, cifA and cifB, in 52 of 71 new and published Wolbachia genome sequences. They are strongly associated with cytoplasmic incompatibility. There are up to seven copies of the genes in each genome, and phylogenetic analysis shows that Wolbachia frequently acquires new copies due to pervasive horizontal transfer between strains. In many cases, the genes have subsequently acquired loss-of-function mutations to become pseudogenes. As predicted by theory, this tends to occur first in cifB, whose sole function is to modify sperm, and then in cifA, which is required to rescue the cross in females. Although cif genes recombine, recombination is largely restricted to closely related homologs. This is predicted under a model of coevolution between sperm modification and embryonic rescue, where recombination between distantly related pairs of genes would create a self-incompatible strain. Together, these patterns of gene gain, loss, and recombination support evolutionary models of cytoplasmic incompatibility.
Abstract: In correctly predicting that selection efficiency is positively correlated with the effective population size (Ne), the nearly neutral theory provides a coherent understanding of between-species variation in numerous genomic parameters, including heritable error (germline mutation) rates. Does the same theory also explain variation in phenotypic error rates and in abundance of error mitigation mechanisms? Translational read-through provides a model to investigate both issues as it is common, mostly nonadaptive, and has a good proxy for rate (TAA being the least leaky stop codon) and potential error mitigation via “fail-safe” 3′ additional stop codons (ASCs). Prior theory of translational read-through has suggested that when population sizes are high, weak selection for local mitigation can be effective, thus predicting a positive correlation between ASC enrichment and Ne. Contrary to prediction, we find that ASC enrichment is not correlated with Ne. ASC enrichment, although highly phylogenetically patchy, is, however, more common both in unicellular species and in genes expressed in unicellular modes in multicellular species. By contrast, Ne does positively correlate with TAA enrichment. These results imply that local phenotypic error rates, not local mitigation rates, are consistent with a drift barrier/nearly neutral model.
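The “fail-safe” additional stop codons (ASCs) mentioned in the preceding abstract are in-frame stops downstream of the primary stop codon. A minimal sketch of how such positions could be located is given below. The function name, the ten-codon default window, and the input convention (the sequence starts immediately after the primary stop) are all assumptions for illustration and are not taken from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def additional_stop_positions(downstream: str, max_codons: int = 10) -> list:
    """Return 0-based in-frame codon indices of stop codons.

    `downstream` is the sequence that begins immediately after the primary
    stop codon (an assumed input convention). Only complete codons within
    the first `max_codons` codons are examined. RNA input (U) is accepted.
    """
    seq = downstream.upper().replace("U", "T")
    n_codons = min(len(seq) // 3, max_codons)
    return [i for i in range(n_codons) if seq[3 * i:3 * i + 3] in STOP_CODONS]


if __name__ == "__main__":
    # Codons: GCA TAA GGG TGA
    print(additional_stop_positions("GCATAAGGGTGA"))
```

Counting how often each downstream codon position holds a stop, and comparing that with the frequency expected from nucleotide composition, is one way an enrichment measure could then be built; that second step is not shown here.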
Abstract: Amino acid substitutions at nonconserved protein positions can have noncanonical and “long-distance” outcomes on protein function. Such outcomes might arise from changes in the internal protein communication network, which is often accompanied by changes in structural flexibility. To test this, we calculated flexibilities and dynamic coupling for positions in the linker region of the lactose repressor protein. This region contains nonconserved positions for which substitutions alter DNA-binding affinity. We first chose to study 11 substitutions at position 52. In computations, substitutions showed long-range effects on flexibilities of DNA-binding positions, and the degree of flexibility change correlated with experimentally measured changes in DNA binding. Substitutions also altered dynamic coupling to DNA-binding positions in a manner that captured other experimentally determined functional changes. Next, we broadened calculations to consider the dynamic coupling between 17 linker positions and the DNA-binding domain. Experimentally, these linker positions exhibited a wide range of substitution outcomes: Four conserved positions tolerated hardly any substitutions (“toggle”), ten nonconserved positions showed progressive changes from a range of substitutions (“rheostat”), and three nonconserved positions tolerated almost all substitutions (“neutral”). In computations with wild-type lactose repressor protein, the dynamic couplings between the DNA-binding domain and these linker positions showed varied degrees of asymmetry that correlated with the observed toggle/rheostat/neutral substitution outcomes. Thus, we propose that long-range and noncanonical substitution outcomes at nonconserved positions arise from rewiring long-range communication among functionally important positions. Such calculations might enable predictions for substitution outcomes at a range of nonconserved positions.
Abstract: Divergence of gene function and expression during development can give rise to phenotypic differences at the level of cells, tissues, organs, and ultimately whole organisms. To gain insights into the evolution of gene expression and novel genes at spatial resolution, we compared the spatially resolved transcriptomes of two distantly related nematodes, Caenorhabditis elegans and Pristionchus pacificus, that diverged 60–90 Ma. The spatial transcriptomes of adult worms show little evidence for strong conservation at the level of single genes. Instead, regional expression is largely driven by recent duplication and emergence of novel genes. Estimation of gene ages across anatomical structures revealed an enrichment of novel genes in sperm-related regions. This provides the first evidence in nematodes for the “out of testis” hypothesis that has been previously postulated based on studies in Drosophila and mammals. “Out of testis” genes represent a mix of products of pervasive transcription as well as fast evolving members of ancient gene families. Strikingly, numerous novel genes have known functions during meiosis in Caenorhabditis elegans, indicating that even universal processes such as meiosis may be targets of rapid evolution. Our study highlights the importance of novel genes in generating phenotypic diversity and explicitly characterizes gene origination in sperm-related regions. Furthermore, it proposes new functions for previously uncharacterized genes and establishes the spatial transcriptome of Pristionchus pacificus as a catalog for future studies on the evolution of gene expression and function.
Abstract: Telomerase RNA (TR) is a noncoding RNA essential for the function of telomerase ribonucleoprotein. TRs from vertebrates, fungi, ciliates, and plants exhibit extreme diversity in size, sequence, secondary structure, and biogenesis pathway. However, the evolutionary pathways leading to such unusual diversity among eukaryotic kingdoms remain elusive. Within the metazoan kingdom, the study of TR has been limited to vertebrates and echinoderms. To understand the origin and evolution of TR across the animal kingdom, we employed a phylogeny-guided, structure-based bioinformatics approach to identify 82 novel TRs from eight previously unexplored metazoan phyla, including the basal-branching sponges. Synthetic TRs from two representative species, a hemichordate and a mollusk, reconstitute active telomerase in vitro with their corresponding telomerase reverse transcriptase components, confirming that they are authentic TRs. Comparative analysis shows that three functional domains, template-pseudoknot (T-PK), CR4/5, and box H/ACA, are conserved between vertebrates and the basal metazoan lineages, indicating a monophyletic origin of the animal TRs with a snoRNA-related biogenesis mechanism. Nonetheless, TRs along separate animal lineages evolved with divergent structural elements in the T-PK and CR4/5 domains. For example, TRs from echinoderms and protostomes lack the canonical CR4/5 and have independently evolved functionally equivalent domains with different secondary structures. In the T-PK domain, a P1.1 stem common in most metazoan clades defines the template boundary, which is replaced by a P1-defined boundary in vertebrates. This study provides unprecedented insight into the divergent evolution of detailed TR secondary structures across broad metazoan lineages, revealing ancestral and later-diversified elements.
Abstract: The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype–fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.
Abstract: The evolutionary transition from outcrossing to selfing can have important genomic consequences. Decreased effective population size and the reduced efficacy of selection are predicted to play an important role in the molecular evolution of the genomes of selfing species. We investigated evidence for molecular signatures of the genomic selfing syndrome using 66 species of Primula including distylous (outcrossing) and derived homostylous (selfing) taxa. We complemented our comparative analysis with a microevolutionary study of P. chungensis, which is polymorphic for mating system and consists of both distylous and homostylous populations. We generated chloroplast and nuclear genomic data sets for distylous, homostylous, and distylous–homostylous species and identified patterns of nonsynonymous to synonymous divergence (dN/dS) and polymorphism (πN/πS) in species or lineages with contrasting mating systems. Our analysis of coding sequence divergence and polymorphism detected strongly reduced genetic diversity and heterozygosity, decreased efficacy of purifying selection, purging of large-effect deleterious mutations, and lower rates of adaptive evolution in samples from homostylous compared with distylous populations, consistent with theoretical expectations of the genomic selfing syndrome. Our results demonstrate that self-fertilization is a major driver of molecular evolutionary processes with genomic signatures of selfing evident in both old and relatively young homostylous populations.
Abstract: Sex chromosomes are classically predicted to stop recombining in the heterogametic sex, thereby enforcing linkage between sex-determining (SD) and sex-antagonistic (SA) genes. With the same rationale, a pre-existing sex asymmetry in recombination is expected to affect the evolution of heterogamety; for example, a low rate of male recombination might favor transitions to XY systems by generating immediate linkage between SD and SA genes. Furthermore, the accumulation of deleterious mutations on nonrecombining Y chromosomes should favor XY-to-XY transitions (which discard the decayed Y), but disfavor XY-to-ZW transitions (which fix the decayed Y as an autosome). Like many anuran amphibians, Hyla tree frogs have been shown to display drastic heterochiasmy (males only recombine at chromosome tips) and are typically XY, which seems to fit the above expectations. Instead, here we demonstrate that two species, H. sarda and H. savignyi, have shared a common ZW system for at least 11 My. Surprisingly, the typical pattern of restricted male recombination has been maintained since then, despite female heterogamety. Hence, sex chromosomes recombine freely in ZW females, not in ZZ males. This suggests that heterochiasmy does not constrain heterogamety (and vice versa), and that the role of SA genes in the evolution of sex chromosomes might have been overemphasized.
Abstract: The postsynaptic density extends across the postsynaptic dendritic spine with discs large (DLG) as the most abundant scaffolding protein. DLG dynamically alters the structure of the postsynaptic density, thus controlling the function and distribution of specific receptors at the synapse. DLG contains three PDZ domains, and one important interaction governing postsynaptic architecture is that between the PDZ3 domain from DLG and a protein called cysteine-rich interactor of PDZ3 (CRIPT). However, little is known regarding functional evolution of the PDZ3:CRIPT interaction. Here, we subjected PDZ3 and CRIPT to ancestral sequence reconstruction, resurrection, and biophysical experiments. We show that the PDZ3:CRIPT interaction is an ancient interaction, which was likely present in the last common ancestor of Eukaryotes, and that high affinity is maintained in most extant animal phyla. However, affinity is low in nematodes and insects, raising questions about the physiological function of the interaction in species from these animal groups. Our findings demonstrate how an apparently established protein–protein interaction involved in cellular scaffolding in bilaterians can suddenly be subject to dynamic evolution including possible loss of function.
Abstract: We studied five chemically distinct but related 1,3,5-triazine antifolates with regard to their effects on growth of a set of mutants in dihydrofolate reductase. The mutants comprise a combinatorially complete data set of all 16 possible combinations of four amino acid replacements associated with resistance to pyrimethamine in the malaria parasite Plasmodium falciparum. Pyrimethamine was a mainstay medication for malaria for many years, and it is still in use in intermittent treatment during pregnancy or as a partner drug in artemisinin combination therapy. Our goal was to investigate the extent to which the alleles yield similar adaptive topographies and patterns of epistasis across chemically related drugs. We find that the adaptive topographies are indeed similar, with the same or closely related alleles being fixed in computer simulations of stepwise evolution. For all but one of the drugs, the topography features at least one suboptimal fitness peak. Our data are consistent with earlier results indicating that third-order and higher epistatic interactions appear to contribute only modestly to the overall adaptive topography, and they are largely conserved. In regard to drug development, our data suggest that higher-order interactions are likely to be of little value as an advisory tool in the choice of lead compounds.
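The notions of a combinatorially complete set of 16 genotypes and a suboptimal fitness peak in the preceding abstract can be illustrated with a short sketch. The fitness values below are hypothetical and chosen only to produce one suboptimal peak; they are not data from the study. The deterministic greedy walk is also a simplification: the study's simulations of stepwise evolution may well be probabilistic.

```python
from itertools import product

# All 2**4 = 16 combinations of four replacements (0 = wild type, 1 = mutant).
GENOTYPES = list(product((0, 1), repeat=4))


def neighbors(genotype):
    """Genotypes that differ from `genotype` at exactly one site."""
    return [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))]


def local_peaks(fitness):
    """Genotypes that are fitter than every single-step neighbor."""
    return sorted(g for g in fitness
                  if all(fitness[g] > fitness[n] for n in neighbors(g)))


def greedy_walk(fitness, start):
    """Repeatedly move to the fittest neighbor until no neighbor is fitter."""
    path = [start]
    while True:
        best = max(neighbors(path[-1]), key=fitness.get)
        if fitness[best] <= fitness[path[-1]]:
            return path
        path.append(best)


# Hypothetical landscape: fitness rises with the number of replacements,
# except that one single mutant is unusually fit (sign epistasis).
FITNESS = {g: float(sum(g)) for g in GENOTYPES}
FITNESS[(1, 0, 0, 0)] = 3.5

if __name__ == "__main__":
    print(local_peaks(FITNESS))
    print(greedy_walk(FITNESS, (0, 0, 0, 0)))
```

In this toy landscape a walk starting from the wild type stops at the single mutant, which is fitter than all of its neighbors but less fit than the quadruple mutant. That is what a suboptimal peak means for stepwise evolution.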
Abstract: Phylogenetic dating is one of the most powerful and commonly used methods of drawing epidemiological interpretations from pathogen genomic data. Building such trees requires considering a molecular clock model, which represents the rate at which substitutions accumulate on genomes. When the molecular clock rate is constant throughout the tree, the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models consider variations in the clock rate, often based on a distribution of rates for each branch. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the substitution number on a single branch of equivalent length. We call this expectation the additivity property. We further show how assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model where the additivity property is satisfied. We illustrate the use of our new additive relaxed clock model on a range of simulated and real data sets, and we show that using this new model leads to more accurate estimates of mean evolutionary rates and ancestral dates.
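The additivity property defined in the preceding abstract can be checked with simple variance arithmetic. The sketch below compares two toy models and ignores Poisson sampling noise. In the first, an independent rate is drawn per branch, so the expected substitution number is rate × length and its variance scales with length squared. In the second, the branch total is Gamma distributed with shape proportional to branch length and a fixed scale, so the variance scales linearly with length. Both models and all parameter names are illustrative assumptions and not necessarily the model proposed in the paper.

```python
def var_independent_rate(length: float, rate_var: float) -> float:
    """Variance of rate * length when one rate is drawn per branch."""
    return rate_var * length ** 2


def var_additive_gamma(length: float, shape_per_unit: float, scale: float) -> float:
    """Variance of a Gamma(shape = shape_per_unit * length, scale) branch total."""
    return shape_per_unit * length * scale ** 2


if __name__ == "__main__":
    l1, l2 = 1.0, 3.0

    # Per-branch rates: two neighboring branches versus one merged branch.
    split = var_independent_rate(l1, 0.2) + var_independent_rate(l2, 0.2)
    merged = var_independent_rate(l1 + l2, 0.2)
    print("independent rates:", split, "vs", merged)   # the two differ

    # Gamma with shape proportional to length: the two agree.
    split = var_additive_gamma(l1, 2.0, 0.5) + var_additive_gamma(l2, 2.0, 0.5)
    merged = var_additive_gamma(l1 + l2, 2.0, 0.5)
    print("additive gamma:   ", split, "vs", merged)
```

The second model is additive in distribution, not just in variance, because the sum of independent Gamma variables with a common scale is again Gamma with the shapes added. The first model fails already at the level of variances, so splitting a branch in two changes the implied distribution of substitutions.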
Abstract: Spermatogenesis is an essential process for producing sperm cells. Reproductive strategies evolve so that a species can adapt to a particular ecological system. However, the roles of newly evolved genes in testis autophagy remain unclear. In this study, we found that a newly evolved gene, srag (Sox9-regulated autophagy gene), plays an important role in promoting autophagy in testis in the lineage of the teleost Monopterus albus. The gene integrated into an interaction network through a two-way strategy of evolution, via Sox9-binding in its promoter and interaction with Becn1 in the coding region. Its promoter region evolved a cis element for binding of Sox9, a transcription factor for male sex determination. Both in vitro and in vivo analyses demonstrated that transcription factor Sox9 could bind to and activate the srag promoter. Its coding region acquired the ability to interact with the key autophagy initiation factor Becn1 via the conserved C-terminus, indicating that srag integrated into the preexisting autophagy network. Moreover, we determined that Srag enhanced autophagy by interacting with Becn1. Notably, srag transgenic zebrafish revealed that Srag exerted the same function by enhancing autophagy through the Srag–Becn1 pathway. Thus, the new gene srag regulates autophagy in testis by integrating into the preexisting autophagy network.
Abstract: Human herpesvirus 6A and 6B (HHV-6) can integrate into the germline, and as a result, ∼70 million people harbor the genome of one of these viruses in every cell of their body. Until now, it has been largely unknown whether 1) these integrations are ancient, 2) they still occur, and 3) circulating virus strains differ from integrated ones. Here, we used next-generation sequencing and mining of public human genome data sets to generate the largest and most diverse collection of circulating and integrated HHV-6 genomes studied to date. In genomes of geographically dispersed, only distantly related people, we identified clades of integrated viruses that originated from a single ancestral event, confirming this with fluorescent in situ hybridization to directly observe the integration locus. In contrast to HHV-6B, circulating and integrated HHV-6A sequences form distinct clades, arguing against ongoing integration of circulating HHV-6A or “reactivation” of integrated HHV-6A. Taken together, our study provides the first comprehensive picture of the evolution of HHV-6, and reveals that integration of heritable HHV-6 has occurred since the time of, if not before, human migrations out of Africa.
AbstractLarge-scale re-engineering of synonymous sites is a promising strategy to generate vaccines either through synthesis of attenuated viruses or via codon-optimized genes in DNA vaccines. Attenuation typically relies on deoptimization of codon pairs and maximization of CpG dinucleotide frequencies. To formulate evolutionarily informed attenuation strategies that aim to force nucleotide usage against the direction favored by selection, we examine available whole-genome sequences of SARS-CoV-2 to infer patterns of mutation and selection on synonymous sites. Analysis of mutational profiles indicates a strong mutation bias toward U. In turn, analysis of observed synonymous site composition implicates selection against U. Accounting for dinucleotide effects reinforces this conclusion, observed UU content being a quarter of that expected under neutrality. Possible mechanisms of selection against U mutations include selection for higher expression, for higher mRNA stability, or for lower immunogenicity of viral genes. Consistent with gene-specific selection against CpG dinucleotides, we observe systematic differences of CpG content between SARS-CoV-2 genes. We propose an evolutionarily informed approach to attenuation that, unusually, seeks to increase usage of the already most common synonymous codons. Comparable analysis of H1N1 and Ebola finds that deviation of GC3 from neutral equilibrium is not a universal feature, cautioning against generalization of results.
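The comparison of observed with expected UU content can be illustrated with a small sketch. The abstract does not say how the neutral expectation was derived, so the code below swaps in a simpler and commonly used baseline: the product of the two mononucleotide frequencies. It also scans a whole sequence rather than synonymous sites only. The function name `dinucleotide_obs_exp` is hypothetical and is not taken from the study.

```python
from collections import Counter


def dinucleotide_obs_exp(seq: str, dinuc: str) -> float:
    """Observed/expected ratio for one dinucleotide.

    ASSUMPTION: "expected" is the product of the two single-base
    frequencies (independence baseline), not the mutational-equilibrium
    expectation a neutral model would give.
    """
    # Treat DNA and RNA alphabets alike by mapping T to U.
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter dinucleotide")

    base_counts = Counter(seq)
    # Count overlapping dinucleotides: positions 0..n-2.
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    observed = pair_counts[dinuc] / (n - 1)
    expected = (base_counts[dinuc[0]] / n) * (base_counts[dinuc[1]] / n)
    if expected == 0:
        raise ValueError("dinucleotide bases are absent from the sequence")
    return observed / expected
```

Under this baseline, a ratio near 1 means the dinucleotide occurs about as often as its base composition predicts, and a ratio of 0.25 would correspond to the "a quarter of that expected" pattern, although the numbers from this simplified baseline are not comparable to those in the study.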
AbstractThe ribosome is an essential cellular machine performing protein biosynthesis. Its structure and composition are highly conserved in all species. However, some bacteria have been reported to have an incomplete set of ribosomal proteins. We have analyzed ribosomal protein composition in 214 small bacterial genomes (<1 Mb) and found that although the ribosome composition is fairly stable, some ribosomal proteins may be absent, especially in bacteria with dramatically reduced genomes. The protein composition of the large subunit is less conserved than that of the small subunit. We have identified the set of frequently lost ribosomal proteins and demonstrated that they tend to be positioned on the ribosome surface and have fewer contacts with other ribosome components. Moreover, some proteins are lost in an evolutionarily correlated manner. The reduction of ribosomal RNA is also common, with deletions mostly occurring in free loops. Finally, the loss of the anti-Shine–Dalgarno sequence is associated with the loss of a higher number of ribosomal proteins.
AbstractThe Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
AbstractDirect comparisons between historical and contemporary populations allow for detecting changes in genetic diversity through time and assessment of the impact of habitat fragmentation. Here, we determined the genetic architecture of both historical and modern lions to document changes in genetic diversity over the last century. We surveyed microsatellite and mitochondrial genome variation from 143 high-quality museum specimens of known provenance, allowing us to directly compare this information with data from several recently published nuclear and mitochondrial studies. Our results provide evidence for male-mediated gene flow and recent isolation of local subpopulations, likely due to habitat fragmentation. Nuclear markers showed a significant decrease in genetic diversity from the historical (HE = 0.833) to the modern (HE = 0.796) populations, whereas mitochondrial genetic diversity was maintained (Hd = 0.98 for both). Although the historical population appears to have been panmictic based on nDNA data, hierarchical structure analysis identified four tiers of genetic structure in modern populations and was able to detect most sampling locations. Mitogenome analyses identified four clusters: Southern, Mixed, Eastern, and Western and were consistent between modern and historically sampled haplotypes. Within the last century, habitat fragmentation caused lion subpopulations to become more geographically isolated as human expansion changed the African landscape. This resulted in an increase in fine-scale nuclear genetic structure and loss of genetic diversity as lion subpopulations became more differentiated, whereas mitochondrial structure and diversity were maintained over time.
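For readers unfamiliar with the diversity statistics quoted above, the following minimal sketch shows how expected heterozygosity (HE) is conventionally computed at a single locus as one minus the sum of squared allele frequencies. The function name and the `unbiased` flag are illustrative assumptions, and the abstract does not state which estimator or software the authors used. With the sample-size correction n/(n − 1) applied to haplotype counts, the same formula gives haplotype diversity (Hd).

```python
from collections import Counter
from typing import Hashable, Iterable


def expected_heterozygosity(alleles: Iterable[Hashable], unbiased: bool = False) -> float:
    """Gene diversity at one locus: 1 - sum(p_i^2).

    `alleles` is a flat list of sampled gene copies (or haplotypes).
    ASSUMPTION: with unbiased=True the result is scaled by n/(n-1),
    where n is the number of sampled copies.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled gene copies")

    # Probability that two copies drawn with replacement differ.
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        h *= n / (n - 1)
    return h
```

A multilocus value such as the HE figures reported for microsatellites is typically the average of this quantity across loci.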
Despite the fact that viruses are among the simplest biological entities—consisting only of DNA or RNA encapsulated in a protein shell—they can have devastating consequences, with viruses such as influenza, human immunodeficiency virus (HIV), and Ebola having dramatically affected the course of human history. Because they generally lack the cellular machinery necessary to reproduce, they propagate by hijacking host cells, often to the host’s detriment. Although their status as a “living” organism may be in question, there is no doubt that viruses are shaped by evolutionary forces that influence their genomes, as well as their replication, host range, virulence, and other features. With the emergence of SARS-CoV-2, a virus that has intimately affected virtually all aspects of human life for the past year, the study of viral evolution may seem more profound and relevant than ever before. Genome Biology and Evolution’s latest virtual issue is a collection of thought-provoking articles in the field of viral evolution from the past 2 years, providing new insight into the evolutionary mechanisms that influence viruses, their genomes, and their hosts, as well as showcasing their use in the study of evolution.
AbstractsangeranalyseR is a feature-rich, free, and open-source R package for processing Sanger sequencing data. It allows users to go from loading reads to saving aligned contigs in a few lines of R code by using sensible defaults for most actions. It also provides complete flexibility for determining how individual reads and contigs are processed, both at the command line in R and via interactive Shiny applications. sangeranalyseR provides a wide range of options for all steps in Sanger processing pipelines including trimming reads, detecting secondary peaks, viewing chromatograms, detecting indels and stop codons, aligning contigs, estimating phylogenetic trees, and more. Input data can be in either ABIF or FASTA format. sangeranalyseR comes with extensive online documentation and outputs aligned and unaligned reads and contigs in FASTA format, along with detailed interactive HTML reports. sangeranalyseR supports the use of colorblind-friendly palettes for viewing alignments and chromatograms. It is released under an MIT licence and available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR, last accessed February 22, 2021) and on GitHub (https://github.com/roblanf/sangeranalyseR, last accessed February 22, 2021).
AbstractFrom a genomics perspective, bivalves (Mollusca: Bivalvia) have been poorly explored, with the exception of those of high economic value. The bivalve order Unionida, or freshwater mussels, has been of interest in recent genomic studies due to their unique mitochondrial biology and peculiar life cycle. However, genomic studies have been hindered by the lack of a high-quality reference genome. Here, I present a genome assembly of Potamilus streckersoni using Pacific Biosciences single-molecule real-time long reads and 10X Genomics linked-read sequencing. Further, I use RNA sequencing from multiple tissue types and life stages to annotate the reference genome. The final assembly was far superior to any previously published freshwater mussel genome and was represented by 2,368 scaffolds (2,472 contigs) and 1,776,755,624 bp, with a scaffold N50 of 2,051,244 bp. A high proportion of the assembly was comprised of repetitive elements (51.03%), aligning with genomic characteristics of other bivalves. The functional annotation returned 52,407 gene models (41,065 protein-coding genes, 11,342 tRNAs), which was concordant with the estimated number of genes in other freshwater mussel species. This genetic resource, along with future studies developing high-quality genome assemblies and annotations, will be integral to unraveling the genomic bases of ecologically and evolutionarily important traits in this hyper-diverse group.
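Because N50 is the headline contiguity statistic in this and several of the following genome reports, a minimal sketch of its standard definition may help: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` is illustrative; assembly tools compute this internally, and the code is not taken from any of the studies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered or ordered[-1] <= 0:
        raise ValueError("need a non-empty collection of positive lengths")

    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total must reach half of the sum")
```

For example, lengths of 2, 3, 4, 5, and 6 sum to 20; the two longest sequences (6 + 5 = 11) already cover half of that, so the N50 is 5.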
AbstractVertebrate mitochondrial genomes generally present a typical gene order. Exceptions are uncommon and important for studying the genetic mechanisms of gene order rearrangements and their consequences for phylogenetic output and mitochondrial function. Antarctic notothenioid fish carry some peculiar rearrangements of the mitochondrial gene order. In this first systematic study of 28 species, we analyzed known and undescribed mitochondrial genome rearrangements for a total of eight different gene orders within the notothenioid fish. Our reconstructions suggest that transpositions, duplications, and inversions of multiple genes are the most likely mechanisms of rearrangement in notothenioid mitochondrial genomes. In Trematominae, we documented an extremely rare inversion of a large genomic segment of 5,300 bp that partially affected the gene compositional bias but not the phylogenetic output. The genomic region delimited by nad5 and trnF, close to the area of the Control Region, was identified as the hot spot of variation in Antarctic fish mitochondrial genomes. Analyzing the sequence of several intergenic spacers and mapping the arrangements on a newly generated phylogeny showed that the entire history of the Antarctic notothenioids is characterized by multiple, relatively rapid events of disruption of the gene order. We hypothesized that a pre-existing genomic flexibility of the ancestor of the Antarctic notothenioids may have generated a precondition for gene order rearrangement, and the pressure of purifying selection could have worked for a rapid restoration of the mitochondrial functionality and compactness after each event of rearrangement.
AbstractThe novel dark septate endophyte (DSE) Laburnicola rhizohalophila (Pleosporales, Ascomycota) is frequently found in the halophytic seepweed (Suaeda salsa). In this article, we report a near-chromosome-level hybrid assembly of this fungus, using short-read Illumina data to polish assemblies generated from long-read Nanopore data. The reference genome for L. rhizohalophila was assembled into 26 scaffolds with a total length of 64.0 Mb and an N50 length of 3.15 Mb. Of them, 17 scaffolds approached the length of intact chromosomes, and 5 had telomeres at one end only. A total of 10,891 gene models were predicted. Intriguingly, 27.5 Mb of repeat sequences, accounting for 42.97% of the genome, were identified, and long terminal repeat retrotransposons were the most frequent known transposable elements, indicating that transposable element proliferation contributes to its increased genome size. BUSCO analyses using the Fungi_odb10 data set showed that 95.0% of genes were complete. In addition, 292 carbohydrate active enzymes, 33 secondary metabolite clusters, and 84 putative effectors were identified in silico. The resulting high-quality assembly and genome features are not only an important resource for further research on the mechanism of root-fungus symbiotic interactions but will also contribute to comparative analyses of genome biology and evolution within Pleosporalean species.
AbstractDifferences in immune function between species could be a result of interspecific divergence in coding sequence and/or expression of immune genes. Here, we investigate how the degree of divergence in coding sequence and expression differs between functional categories of immune genes, and whether differences between categories occur independently of other factors (expression level, pleiotropy). To this end, we compared spleen transcriptomes of wild-caught yellow-necked mice and bank voles. Immune genes expressed in the spleen were divided into four categories depending on the function of the encoded protein: pattern recognition receptors (PRR); signal transduction proteins; transcription factors; and cyto- and chemokines and their receptors. Genes encoding PRR and cyto-/chemokines had higher sequence divergence than genes encoding signal transduction proteins and transcription factors, even when controlling for potentially confounding factors. Genes encoding PRR also had higher expression divergence than genes encoding signal transduction proteins and transcription factors. There was a positive correlation between expression divergence and coding sequence divergence, in particular for PRR genes. We propose that this arises because divergence in PRR coding sequence leads to divergence in PRR expression through positive feedback of PRR ligand binding on PRR expression. When controlling for sequence divergence, expression divergence of PRR genes did not differ from other categories. Taken together, the results indicate that coding sequence divergence of PRR genes is a major cause of differences in immune function between species.
AbstractMitochondrial DNA (mtDNA) is present in multiple copies within an organism. Since these copies are not identical, a single individual carries a heterogeneous population of mtDNAs, a condition known as heteroplasmy. Several factors play a role in the dynamics of the within-organism mtDNA population: among them, genetic bottlenecks, selection, and strictly maternal inheritance are known to shape the levels of heteroplasmy across mtDNAs. In Metazoa, the only evolutionarily stable exception to the strictly maternal inheritance of mitochondria is the doubly uniparental inheritance (DUI), reported in 100+ bivalve species. In DUI species, there are two highly divergent mtDNA lineages, one inherited through oocyte mitochondria (F-type) and the other through sperm mitochondria (M-type). Having both parents contributing to the mtDNA pool of the progeny makes DUI a unique system to study the dynamics of mtDNA populations. Since, in bivalves, the spermatozoon has few mitochondria (4–5), M-type mtDNA faces a tight bottleneck during embryo segregation, one of the narrowest mitochondrial bottlenecks investigated so far. Here, we analyzed the F- and M-type mtDNA variability within individuals of the DUI species Ruditapes philippinarum and investigated for the first time the effects of such a narrow bottleneck affecting mtDNA populations. As a potential consequence of this narrow bottleneck, the M-type mtDNA shows a large variability in different tissues, a condition so pronounced that genotypes from different tissues of the same individual do not cluster together. We believe that such results may help in understanding the effect of low population size on mtDNA bottleneck.
AbstractAs a polyphagous soil-dwelling predatory mite, Stratiolaelaps scimitus (Womersley) (Acari: Laelapidae), formerly known as Stratiolaelaps miles (Berlese), is native to the Northern Hemisphere and preys on soil invertebrates, including fungus gnats, springtails, thrips nymphs, nematodes, and other species of mites. Already mass-produced and commercialized in North America, Europe, Oceania, and China, S. scimitus will very likely be introduced to other countries and regions as a biocontrol agent against edaphic pests in the near future. The introduction, however, can lead to unexpected genetic changes within populations of biological control agents, which might decrease the efficacy of pest management or increase the risks to local environments. To better understand the genetic basis of its biology and behavior, we sequenced and assembled the draft genome of S. scimitus using the PacBio Sequel II platform. We generated ∼150× (64.81 Gb) PacBio long reads with an average read length of 12.60 kb. Reads longer than 5 kb were assembled into contigs, resulting in a final assembly of 158 contigs with an N50 length of 7.66 Mb that captured 93.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set (n = 1,066). We identified 16.39% (69.91 Mb) repetitive elements, 1,686 noncoding RNAs, and 13,305 protein-coding genes, which represented 95.8% BUSCO completeness. Combining analyses of gene family evolution and function enrichment of gene ontology and pathway, we found that a total of 135 families experienced significant expansions, which were mainly involved in digestion, detoxification, immunity, and venom. Major expansions of the detoxification enzymes, that is, P450s and carboxylesterases, suggest a possible genetic mechanism underlying polyphagy and ecological adaptations. Our high-quality genome assembly and annotation provide new insights into the evolutionary biology, soil ecology, and biological control of predaceous mites.
AbstractThe Pelagophyceae are marine stramenopile algae that include Aureoumbra lagunensis and Aureococcus anophagefferens, two microbial species notorious for causing harmful algal blooms. Despite their ecological significance, relatively few genomic studies of pelagophytes have been carried out. To improve understanding of the biology and evolution of pelagophyte algae, we sequenced complete mitochondrial genomes for A. lagunensis (CCMP1510), Pelagomonas calceolata (CCMP1756), and five strains of Aureoc. anophagefferens (CCMP1707, CCMP1708, CCMP1850, CCMP1984, and CCMP3368) using Nanopore long-read sequencing. All pelagophyte mitochondrial genomes assembled into single, circular mapping contigs between 39,376 bp (P. calceolata) and 55,968 bp (A. lagunensis) in size. Mitochondrial genomes for the five Aureoc. anophagefferens strains varied slightly in length (42,401–42,621 bp) and were 99.4–100.0% identical. Gene content and order were highly conserved between the Aureoc. anophagefferens and P. calceolata genomes, with the only major difference being a unique region in Aureoc. anophagefferens containing DNA adenine and cytosine methyltransferase (dam/dcm) genes that appear to be the product of lateral gene transfer from a prokaryotic or viral donor. Although the A. lagunensis mitochondrial genome shares seven distinct syntenic blocks with the other pelagophyte genomes, it has a tandem repeat expansion comprising ∼40% of its length, and lacks identifiable rps19 and glycine tRNA genes. Laterally acquired self-splicing introns were also found in the 23S rRNA (rnl) gene of P. calceolata and the coxI gene of the five Aureoc. anophagefferens genomes. Overall, these data provide baseline knowledge about the genetic diversity of bloom-forming pelagophytes relative to nonbloom-forming species.
ForewordIn an occasional series of articles, we will be publishing autobiographical sketches from some of those working in the field of genome evolution. The series will feature both the very eminent and researchers closer to the start of their careers, as well as those from underrepresented groups. The series will show the unusual paths that academics sometimes take and the obstacles they have overcome. We start this series with one of the most influential researchers in the field of molecular evolution, Wen-Hsiung Li. Wen-Hsiung has contributed enormously to the field and published on a wide diversity of topics, as described in this autobiographical sketch; he also wrote two textbooks, one of them with Dan Graur, which for many years were the bibles of the field. He was awarded the Motoo Kimura prize by the Society of Molecular Biology and Evolution in 2019 in recognition of his contributions to our subject.
AbstractAllopatric divergence is one of the principal mechanisms for speciation of macro-organisms. Microbes by comparison are assumed to disperse more freely and to be less limited by dispersal barriers. However, thermophilic prokaryotes restricted to geothermal springs have shown clear signals of geographic isolation, but robust studies on this topic for microbes with less strict habitat requirements are scarce. Furthermore, it has only recently been recognized that homologous recombination among conspecific individuals provides species coherence in a wide range of prokaryotes. Recombination barriers thus may define prokaryotic species boundaries, yet, the extent to which geographic distance between populations gives rise to such barriers is an open question. Here, we investigated gene flow and population structure in a widespread species of pelagic freshwater bacteria, Polynucleobacter paneuropaeus. Through comparative genomics of 113 conspecific strains isolated from freshwater lakes and ponds located across a North–South range of more than 3,000 km, we were able to reconstruct past gene flow events. The species turned out to be highly recombinogenic as indicated by significant signs of gene transfer and extensive genome mosaicism. Although genomic differences increased with spatial distance on a regional scale (<170 km), such correlations were mostly absent on larger scales up to 3,400 km. We conclude that allopatric divergence in European P. paneuropaeus is minor, and that effective gene flow across the sampled geographic range in combination with a high recombination efficacy maintains species coherence.
AbstractDue to their pluripotent nature and unlimited cell renewal, stem cells have been proposed as an ideal material for establishing long-term cnidarian cell cultures. However, the lack of unifying principles associated with “stemness” across the phylum complicates stem cells’ identification and isolation. Here, we for the first time report gene expression profiles for cultured coral cells, focusing on regulatory gene networks underlying pluripotency and differentiation. Cultures were initiated from Acropora digitifera tip fragments, the fastest growing tissue in Acropora. Overall, in vitro transcription resembled early larvae, overexpressing orthologs of premetazoan and Hydra stem cell markers, and transcripts with roles in cell division, migration, and differentiation. Our results suggest the presence of pluripotent cell types in cultures and indicate the existence of ancestral genome regulatory modules underlying pluripotency and cell differentiation in cnidaria. Cultured cells appear to be synthesizing protein, differentiating, and proliferating.
AbstractThe Northern house mosquito, Culex pipiens pallens, serves as an important temperate vector of several diseases, particularly epidemic encephalitis and lymphatic filariasis. A reference genome of Cx. pipiens pallens helps in understanding the genomic basis underlying the complexity of mosquito biology. Using 142 Gb (∼250×) of PacBio long reads, we assembled a draft genome of 567.56 Mb. The assembly includes 1,714 contigs with an N50 length of 0.84 Mb and a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 95.6% (n = 1,367). We masked 60.63% (344.11 Mb) of the genome as repetitive elements and identified 2,032 noncoding RNAs. A total of 18,122 protein-coding genes captured 94.1% of the BUSCO gene set. Gene family evolution and function enrichment analyses revealed that significantly expanded gene families were mainly involved in immunity, gustatory and olfactory chemosensation, and DNA replication/repair.
AbstractTrichoptera (caddisflies) play an essential role in freshwater ecosystems; for instance, larvae process organic material from the water and are food for a variety of predators. Knowledge on the genomic diversity of caddisflies can facilitate comparative and phylogenetic studies thereby allowing scientists to better understand the evolutionary history of caddisflies. Although Trichoptera are the most diverse aquatic insect order, they remain poorly represented in terms of genomic resources. To date, all long-read based genomes have been sequenced from individuals in the retreat-making suborder, Annulipalpia, leaving ∼275 Ma of evolution without high-quality genomic resources. Here, we report the first long-read based de novo genome assemblies of two tube case-making Trichoptera from the suborder Integripalpia, Agrypnia vestita Walker and Hesperophylax magnus Banks. We find that these tube case-making caddisflies have genome sizes that are at least 3-fold larger than those of currently sequenced annulipalpian genomes and that this pattern is at least partly driven by major expansion of repetitive elements. In H. magnus, long interspersed nuclear elements alone exceed the entire genome size of some annulipalpian counterparts suggesting that caddisflies have high potential as a model for understanding genome size evolution in diverse insect lineages.
AbstractGene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis to other major parasitic trematodes. First, we reconstructed the species phylogeny, and dated the split of A. winterbourni from the Opisthorchiata suborder to approximately 237.4 Ma (±120.4 Myr). We then addressed the question of which expanded gene families and gained genes are potentially involved in adaptation to parasitism. To do this, we used hierarchical orthologous groups to reconstruct three ancestral genomes on the phylogeny leading to A. winterbourni and performed a GO (Gene Ontology) enrichment analysis of the gene composition of each ancestral genome, allowing us to characterize the subsequent genomic changes. Out of the 11,499 genes in the A. winterbourni genome, as much as 24% have arisen through duplication events since the speciation of A. winterbourni from the Opisthorchiata, and as much as 31.9% appear to be novel, that is, newly acquired. We found 13 gene families in A. winterbourni to have had more than ten genes arising through these recent duplications; all of which have functions potentially relating to host behavioral manipulation, host tissue penetration, and hiding from host immunity through antigen presentation. We identified several families with genes evolving under positive selection. Our results provide a valuable resource for future studies on the genomic basis of adaptation to parasitism and point to specific candidate genes putatively involved in antagonistic host–parasite adaptation.