In the past 25 years, stretching from the dawn of personal computing to the social media age, the “always free” molecular evolutionary genetics analysis (MEGA) tool has been downloaded 1.6 million times worldwide.
Vincent L. Cannataro and Jeffrey P. Townsend
Abstract: A selective sweep is a reduction of variation at presumably neutrally evolving sites (hitchhikers) in the genome that is caused by the spread of a selected allele at a linked focal site, and it is widely used to test for the action of positive selection. Nonetheless, selective sweeps may also provide an unprecedented opportunity for studying nonequilibrium properties of the neutral variation itself. We have demonstrated this possibility in relation to an ancient selective sweep for modern human-specific changes and ongoing selective sweeps for local population-specific changes.
Abstract: The Molecular Evolutionary Genetics Analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.
Abstract: In this perspective, we evaluate the explanatory power of the neutral theory of molecular evolution, 50 years after its introduction by Kimura. We argue that the neutral theory was supported by unreliable theoretical and empirical evidence from the beginning, and that in light of modern, genome-scale data, we can firmly reject its universality. The ubiquity of adaptive variation both within and between species means that a more comprehensive theory of molecular evolution must be sought.
Abstract: Kimura’s neutral theory provides the whole theoretical basis for the behavior of mutations in a Wright–Fisher population. We here discuss how it can be applied to a cancer cell population, in which there is an increasing interest in genetic variation within a tumor. We explain a couple of fundamental differences between cancer cell populations and asexual organismal populations. Once these differences are taken into account, a number of powerful theoretical tools developed for a Wright–Fisher population could readily contribute to a deeper understanding of the evolutionary dynamics of cancer cell populations.
Abstract: HIV is one of the fastest evolving organisms known. It evolves about 1 million times faster than its host, humans. Because HIV establishes chronic infections, with continuous evolution, its divergence within a single infected human surpasses the divergence of the entire humanoid history. Yet, it is still the same virus, infecting the same cell types and using the same replication machinery year after year. Hence, one would think that most mutations that HIV accumulates are neutral. But the picture is more complicated than that. HIV evolution is also a clear example of strong positive selection, that is, mutants have a survival advantage. How do these facts come together?
Abstract: The importance of chance, finiteness, and history in evolution is pointed out with special reference to the neutral theory.
Abstract: The evolution of viral pathogens is shaped by strong selective forces that are exerted during jumps to new hosts, confrontations with host immune responses and antiviral drugs, and numerous other processes. However, while undeniably strong and frequent, adaptive evolution is largely confined to small parts of information-packed viral genomes, and the majority of observed variation is effectively neutral. The predictions and implications of the neutral theory have proven immensely useful in this context, with applications spanning understanding within-host population structure, tracing the origins and spread of viral pathogens, predicting evolutionary dynamics, and modeling the emergence of drug resistance. We highlight the multiple ways in which the neutral theory has had an impact, which has been accelerated in the age of high-throughput, high-resolution genomics.
Abstract: Research in population genetics and evolutionary biology has always provided a computational backbone for the life sciences as a whole. Today, evolutionary and population biology reasoning is essential for the interpretation of the large, complex datasets that are characteristic of all domains of today’s life sciences, ranging from cancer biology to microbial ecology. This situation makes algorithms and software tools developed by our community more important than ever before. This means that we, developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments and to provide adequate documentation and training.
Abstract: Genetic differences between species and within populations are two sides of the same coin under the neutral theory of molecular evolution. This theory posits that a vast majority of evolutionary substitutions, which appear as differences between species, are (nearly) neutral, that is, these substitutions are permitted without a significantly adverse impact on a species’ survival. We refer to them as evolutionarily permissible (ePerm) variation. Evolutionary permissibility of any possible variant can be inferred from multispecies sequence alignments by applying sophisticated statistical methods to the evolutionary tree of species. Here, we explore the evolutionary permissibility of amino acid variants associated with genetic diseases and those observed in personal exomes. Consistent with the predictions of the neutral theory, disease-associated amino acid variants are rarely ePerm, much more biochemically radical, and found predominantly at more conserved positions than their non-disease counterparts. Only 10% of amino acid mutations are ePerm, but these variants rise to become two-thirds of all substitutions in the human lineage (a 6-fold enrichment). In contrast, only a minority of the variants in a personal exome are ePerm, a seemingly counterintuitive pattern that results from a combination of mutational and evolutionary processes that are, in fact, broadly consistent with the neutral theory. Evolutionarily forbidden variants outnumber detrimental variants in individual exomes and may play an underappreciated role in protecting against disease. We discuss these observations and conclude that the long-term evolutionary history of species can illuminate functional biomedical properties of variation present in personal exomes.
Abstract: Among the multitude of papers published yearly in scientific journals, precious few may be worth looking back on half a century later to appreciate the significance of discoveries that went on to become common knowledge and to shape a field or several adjacent fields. Here, Kimura’s fundamental concept of neutral mutation-random drift, which was published 50 years ago, is re-examined in light of its pervasive influence on comparative genomics and, more specifically, on the contribution of transposable elements to eukaryotic genome evolution.
Abstract: I detail four major open problems in microbial population genetics with direct implications for the study of molecular evolution: the lack of neutral polymorphism, the modeling of promiscuous genetic exchanges, the genetics of ill-defined populations, and the difficulty of untangling selection and demography in the light of these issues. Together with the historical focus on the study of single nucleotide polymorphism and widespread non-random sampling, these problems limit our understanding of the genetic variation in bacterial populations and their adaptive effects. I argue that we need novel theoretical approaches accounting for pervasive selection and strong genetic linkage to better understand microbial evolution.
Abstract: Kimura’s neutral theory argued that positive selection was not responsible for an appreciable fraction of molecular substitutions. Correspondingly, quantitative analysis reveals that the vast majority of substitutions in cancer genomes are not detectably under selection. Insights from the somatic evolution of cancer reveal that beneficial substitutions in cancer constitute a small but important fraction of the molecular variants. The cancer molecular evolution community will benefit by incorporating the neutral theory of molecular evolution into its understanding and analysis of cancer evolution, and by accepting the use of tractable, predictive models, even when there is some evidence that they are not perfect.
Abstract: Kimura’s neutral theory of molecular evolution has been essential to virtually every advance in evolutionary genetics, and by extension, is foundational to the field of conservation genetics. Conservation genetics utilizes the key concepts of neutral theory to identify species and populations at risk of losing evolutionary potential by detecting patterns of inbreeding depression and low effective population size. In turn, this information can inform the management of organisms and their habitat, providing hope for the long-term preservation of both. We expand upon Avise’s “inventorial” and “functional” categories of conservation genetics by proposing a third category that is linked to the coalescent and that we refer to as “process-driven.” It is here that connections between Kimura’s theory and conservation genetics are strongest. Process-driven conservation genetics can be especially applied to large genomic data sets to identify patterns of historical risk, such as population bottlenecks, and accordingly, yield informed intuitions for future outcomes. By examining inventorial, functional, and process-driven conservation genetics in sequence, we assess the progression from theory, to data collection and analysis, and ultimately, to the production of hypotheses that can inform conservation policies.
Abstract: DAMBE is a comprehensive software package for genomic and phylogenetic data analysis on Windows, Linux, and Macintosh computers. New functions include imputing missing distances and phylogeny simultaneously (paving the way to build large phage and transposon trees), new bootstrapping/jackknifing methods for PhyPA (phylogenetics from pairwise alignments), and an improved function for fast and accurate estimation of the shape parameter of the gamma distribution for fitting rate heterogeneity over sites. The previous method corrects multiple hits for each site independently. DAMBE’s new method uses all sites simultaneously for correction. DAMBE, featuring a user-friendly graphic interface, is freely available from http://dambe.bio.uottawa.ca (last accessed, April 17, 2018).
Abstract: Fish mitochondrial genome (mitogenome) data form a fundamental basis for revealing vertebrate evolution and hydrosphere ecology. Here, we report recent functional updates of MitoFish, which is a database of fish mitogenomes with a precise annotation pipeline, MitoAnnotator. Most importantly, we describe the implementation of the MiFish pipeline for metabarcoding analysis of fish mitochondrial environmental DNA, which is a fast-emerging and powerful technology in fish studies. MitoFish, MitoAnnotator, and the MiFish pipeline constitute a key platform for studies of fish evolution, ecology, and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed April 7th, 2018).
Abstract: Here, we present a synthetic view of how Kimura’s neutral theory has helped us gain insight into the different evolutionary forces that shape human evolution. We put this perspective in the frame of recent emerging challenges: the use of whole genome data for reconstructing population histories, natural selection on complex polygenic traits, and integrating cultural processes in human evolution.
Abstract: Primates have traditionally been regarded as vision-oriented animals with low olfactory ability, though this “microsmatic primates” view has been challenged recently. To clarify when and how degeneration of the olfactory system occurred and to specify the relevant factors during primate evolution, we here examined the olfactory receptor (OR) genes from 24 phylogenetically and ecologically diverse primate species. The results revealed that strepsirrhines with curved noses had functional OR gene repertoires that were nearly twice as large as those for haplorhines with simple noses. Neither activity pattern (nocturnal/diurnal) nor color vision system showed significant correlation with the number of functional OR genes when phylogeny and nose structure (haplorhine/strepsirrhine) were statistically controlled, but the extent of folivory did. We traced the evolutionary fates of individual OR genes by identifying orthologous gene groups, demonstrating that the rates of OR gene losses were accelerated at the ancestral branch of haplorhines, which coincided with the acquisition of acute vision. The highest rate of OR gene loss was observed at the ancestral branch of leaf-eating colobines; this reduction is possibly linked with the dietary transition from frugivory to folivory because odor information is essential for fruit foraging but less so for leaf foraging. Intriguingly, we found accelerations of OR gene losses in an external branch to every hominoid species examined. These findings suggest that the current OR gene repertoire in each species has been shaped by a complex interplay of phylogeny, anatomy, and habitat; therefore, multiple factors may contribute to the olfactory degeneration in primates.
Abstract: Although the neutral theory of molecular evolution was proposed to explain DNA and protein sequence evolution, in principle it could also explain phenotypic evolution. Nevertheless, overall, phenotypes should be less likely than genotypes to evolve neutrally. I propose that, when phenotypic traits are stratified according to a hierarchy of biological organization, the fraction of evolutionary changes in phenotype that are adaptive rises with the phenotypic level considered. Consistently, molecular traits are frequently found to evolve neutrally whereas a large, random set of organismal traits were recently reported to vary largely adaptively. Many more studies of unbiased samples of phenotypic traits are needed to test the general validity of this hypothesis.
Abstract: In its initial formulation by Motoo Kimura, the neutral theory was concerned solely with the level of variability maintained by random genetic drift of selectively neutral mutations, and the rate of molecular evolution caused by the fixation of such mutations. The original theory considered events at a single genetic locus in isolation from the rest of the genome. It did not take long, however, for theoreticians to wonder whether selection at one or more loci might influence neutral variability at linked sites. Once DNA sequence variability could be studied, and especially when resequencing of whole genomes became possible, it became clear that patterns of neutral variability in genomes are affected by selection at linked sites, and that these patterns could advance our understanding of natural selection, and can be used to detect the action of selection in genomic regions, including selection much weaker than could be detected by direct measurements of the relative fitnesses of different genotypes. We outline the different types of processes that have been studied, in approximate order of their historical development.
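The fixation process this abstract refers to can be illustrated with a minimal Wright-Fisher simulation. This is a sketch under stated assumptions, not code from the article: the function name `fixation_fraction`, the population size, the replicate count, and the seed are all hypothetical choices made for illustration. Under strict neutrality, a new mutation that starts as one copy among 2N gene copies is expected to fix with probability 1/(2N), so with a per-copy mutation rate μ the substitution rate is 2Nμ × 1/(2N) = μ.

```python
import random


def fixation_fraction(num_copies=20, replicates=4000, seed=1):
    """Fraction of new neutral mutations that reach fixation.

    num_copies plays the role of 2N, the number of gene copies in a
    diploid population of size N (an illustrative, hypothetical value).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1  # a new mutation starts as a single copy
        while 0 < count < num_copies:
            freq = count / num_copies
            # Wright-Fisher resampling: each copy in the next generation
            # is the mutant with probability equal to its current frequency.
            count = sum(rng.random() < freq for _ in range(num_copies))
        if count == num_copies:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    two_n = 20
    observed = fixation_fraction(num_copies=two_n)
    print(f"simulated fixation fraction: {observed:.3f}")
    print(f"neutral expectation 1/(2N):  {1 / two_n:.3f}")
```

The simulated fraction should fall close to 1/(2N), with some sampling noise that shrinks as the number of replicates grows. The sketch deliberately ignores selection at linked sites, which is the complication the abstract goes on to discuss.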
Abstract: Small insertions and deletions (INDELs; ≤50 bp) are the most common type of variability after single nucleotide polymorphism (SNP). However, compared with SNPs, we know little about the distribution of fitness effects (DFE) of new INDEL mutations and how prevalent adaptive INDEL substitutions are. Studying INDELs has been difficult partly because identifying ancestral states at these sites is error-prone and misidentification can lead to severely biased estimates of the strength of selection. To solve these problems, we develop new maximum likelihood methods, which use polymorphism data to simultaneously estimate the DFE, the mutation rate, and the misidentification rate. These methods are applicable to both INDELs and SNPs. Simulations show that they can provide highly accurate results. We applied the methods to an INDEL polymorphism data set in Drosophila melanogaster. We found that the DFE for polymorphic INDELs in protein-coding regions is bimodal, with the variants being either nearly neutral or strongly deleterious. Based on the DFE, we estimated that 71.5–83.7% of the INDEL substitutions that took place along the D. melanogaster lineage were fixed by positive selection, which is comparable with the prevalence of adaptive substitutions at nonsynonymous sites. The new methods have been implemented in the software package anavar.
Abstract: Mammalian diversification has coincided with a rapid proliferation of various types of noncoding RNAs, including members of both snRNAs and snoRNAs. The significance of this expansion, however, remains obscure. While some ncRNA copy-number expansions have been linked to functionally tractable effects, such events may equally well be neutral, perhaps as a result of random retrotransposition. Hindering progress in our understanding of such observations is the difficulty in establishing function for the diverse features that have been identified in our own genome. Projects such as ENCODE and FANTOM have revealed a hidden world of genomic expression patterns, as well as a host of other potential indicators of biological function. However, such projects have been criticized, particularly by practitioners in the field of molecular evolution, where many suspect these data provide limited insight into biological function. The molecular evolution community has largely taken a skeptical view; thus, it is important to establish tests of function. We use a range of data, including data drawn from ENCODE and FANTOM, to examine the case for function for the recent copy number expansion in mammals of six evolutionarily ancient RNA families involved in splicing and rRNA maturation. We use several criteria to assess evidence for function: conservation of sequence and structure, genomic synteny, evidence for transposition, and evidence for species-specific expression. Applying these criteria, we find that only a minority of loci show strong evidence for function and that, for the majority, we cannot reject the null hypothesis of no function.
Abstract: The genetic basis of parallel evolution of similar species is of great interest in evolutionary biology. In the adaptive radiation of Lake Victoria cichlid fishes, sister species with either blue or red-back male nuptial coloration have evolved repeatedly, often associated with shallower and deeper water, respectively. One such case is blue and red-backed Pundamilia species, for which we recently showed that a young species pair may have evolved through “hybrid parallel speciation”. Coalescent simulations suggested that the older species P. pundamilia (blue) and P. nyererei (red-back) admixed in the Mwanza Gulf and that new “nyererei-like” and “pundamilia-like” species evolved from the admixed population. Here, we use genome scans to study the genomic architecture of differentiation, and assess the influence of hybridization on the evolution of the younger species pair. For each of the two species pairs, we find over 300 genomic regions, widespread across the genome, which are highly differentiated. A subset of the most strongly differentiated regions of the older pair are also differentiated in the younger pair. These shared differentiated regions often show parallel allele frequency differences, consistent with the hypothesis that admixture-derived alleles were targeted by divergent selection in the hybrid population. However, two-thirds of the genomic regions that are highly differentiated between the younger species are not highly differentiated between the older species, suggesting independent evolutionary responses to selection pressures. Our analyses reveal how divergent selection on admixture-derived genetic variation can facilitate new speciation events.
Abstract: Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop and animal breeding, as well as personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large numbers of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.
Abstract: Detecting selection on codon usage (CU) is a difficult task, since CU can be shaped by both the mutational process and selective constraints operating at the DNA, RNA, and protein levels. Yang and Nielsen (2008) developed a test (which we call CUYN) for detecting selection on CU using two competing mutation-selection models of codon substitution. The null model assumes that CU is determined by the mutation bias alone, whereas the alternative model assumes that both mutation bias and selection act on CU. In applications on mammalian-scale alignments, the CUYN test detects selection on CU for numerous genes. This is surprising, given the small effective population size of mammals, and prompted us to use simulations to evaluate the robustness of the test to model violations. Simulations using a modest level of CpG hypermutability completely mislead the test, with 100% false positives. Surprisingly, a high level of false positives (56.1%) resulted simply from using the HKY mutation-level parameterization within the CUYN test on simulations conducted with a GTR mutation-level parameterization. Finally, by using a crude optimization procedure on a parameter controlling the CpG hypermutability rate, we find that this mutational property could explain a very large part of the observed mammalian CU. Altogether, our work emphasizes the need to evaluate the potential impact of model violations on statistical tests in the field of molecular phylogenetic analysis. The source code of the simulator and the mammalian genes used are available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Abstract: When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
Abstract: The olfactory receptor (OR) gene families, which govern mammalian olfaction, have undergone extensive expansion and contraction through duplication and pseudogenization. Previous studies have shown that broadly defined environmental adaptations (e.g., terrestrial vs. aquatic) are correlated with the number of functional and non-functional OR genes retained. However, to date, no study has examined species-specific gene duplications in multiple phylogenetically divergent mammals to elucidate OR evolution and adaptation. Here, we identify the OR gene families driving adaptation to different ecological niches by mapping the fate of species-specific gene duplications in the OR repertoire of 94 diverse mammalian taxa, using molecular phylogenomic methods. We analyze >70,000 OR gene sequences mined from whole genomes, generated from novel amplicon sequencing data, and collated with data from previous studies, comprising one of the largest OR studies to date. For the first time, we demonstrate statistically significant patterns of OR species-specific gene duplications associated with the presence of a functioning vomeronasal organ. With respect to dietary niche, we uncover a novel link between a large number of duplications in OR family 5/8/9 and herbivory. Our results also highlight differences between social and solitary niches, indicating that a greater OR repertoire expansion may be associated with a solitary lifestyle. This study demonstrates the utility of species-specific duplications in elucidating gene family evolution, revealing how the OR repertoire has undergone expansion and contraction with respect to a number of ecological adaptations in mammals.
Abstract: With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
Abstract: β-Catenin acts as a transcriptional coactivator in the Wnt/β-catenin signaling pathway and a cytoplasmic effector in cadherin-based cell adhesion. These functions are ancient within animals, but the earliest steps in β-catenin evolution remain unresolved due to limited data from key lineages: sponges, ctenophores, and placozoans. Previous studies in sponges have characterized β-catenin expression dynamics and used GSK3B antagonists to ectopically activate the Wnt/β-catenin pathway; both approaches rely upon untested assumptions about the conservation of β-catenin function and regulation in sponges. Here, we test these assumptions using an antibody raised against β-catenin from the sponge Ephydatia muelleri. We find that cadherin-complex genes coprecipitate with endogenous Em β-catenin from cell lysates, but that Wnt pathway components do not. However, through immunostaining we detect both cell boundary and nuclear populations, and we find evidence that Em β-catenin is a conserved substrate of GSK3B. Collectively, these data support conserved roles for Em β-catenin in both cell adhesion and Wnt signaling. Additionally, we find evidence for an Em β-catenin population associated with the distal ends of F-actin stress fibers in apparent cell–substrate adhesion structures that resemble focal adhesions. This finding suggests a fundamental difference in the adhesion properties of sponge tissues relative to other animals, in which the adhesion functions of β-catenin are typically restricted to cell–cell adhesions.
Abstract: The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method, Network Analysis of Protein Adaptation (NAPA), to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences. The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which frequently arise from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure.
Abstract: The visual systems of snakes are heavily modified relative to other squamates, a condition often thought to reflect their fossorial origins. Further modifications are seen in caenophidian snakes, where evolutionary transitions between rod and cone photoreceptors, termed photoreceptor transmutations, have occurred in many lineages. Little previous work, however, has focused on the molecular evolutionary underpinnings of these morphological changes. To address this, we sequenced seven snake eye transcriptomes and utilized new whole-genome and targeted capture sequencing data. We used these data to analyze gene loss and shifts in selection pressures in phototransduction genes that may be associated with snake evolutionary origins and photoreceptor transmutation. We identified the surprising loss of rhodopsin kinase (GRK1), despite a low degree of gene loss overall and a lack of relaxed selection early during snake evolution. These results provide some of the first evolutionary genomic corroboration for a dim-light ancestor that lacks strong fossorial adaptations. Our results also indicate that snakes with photoreceptor transmutation experienced significantly different selection pressures from other reptiles. Significant positive selection was found primarily in cone-specific genes, but not rod-specific genes, contrary to our expectations. These results reveal potential molecular adaptations associated with photoreceptor transmutation and also highlight unappreciated functional differences between rod- and cone-specific phototransduction proteins. This intriguing example of snake visual system evolution illustrates how the underlying molecular components of a complex system can be reshaped in response to changing selection pressures.
AbstractGenome reduction is a recurring theme of symbiont evolution. The genus Spiroplasma contains species that are mostly facultative insect symbionts. The typical genome sizes of those species within the Apis clade were estimated to be ∼1.0–1.4 Mb. Intriguingly, Spiroplasma clarkii was found to have a genome size that is >30% larger than the median of other species within the same clade. To investigate the molecular evolution events that led to the genome expansion of this bacterium, we determined its complete genome sequence and inferred the evolutionary origin of each protein-coding gene based on the phylogenetic distribution of homologs. Among the 1,346 annotated protein-coding genes, 641 originated from within the Apis clade while 233 were putatively acquired from outside of the clade (including 91 high-confidence candidates). Additionally, 472 were specific to S. clarkii without homologs in the current database (i.e., the origins remained unknown). The acquisition of protein-coding genes, rather than mobile genetic elements, appeared to be a major contributing factor to genome expansion. Notably, >50% of the high-confidence acquired genes are related to carbohydrate transport and metabolism, suggesting that these acquired genes contributed to the expansion of both genome size and metabolic capability. The findings of this work provide an interesting counterexample to the general evolutionary trend observed among symbiotic bacteria and further demonstrate the flexibility of Spiroplasma genomes. For future studies, investigation of the functional integration of these acquired genes, as well as inference of their contribution to fitness, could improve our knowledge of symbiont evolution.
AbstractLactobacillus curvatus is a lactic acid bacterium encountered in many different types of fermented food (meat, seafood, vegetables, and cereals). Although this species plays an important role in the preservation of these foods, few attempts have been made to assess its genomic diversity. This study uses comparative analyses of 13 published genomes (complete or draft) to better understand the evolutionary processes acting on the genome of this species. Phylogenomic analysis, based on a coalescent model of evolution, revealed that the 6,742 sites of single nucleotide polymorphism within the L. curvatus core genome delineate two major groups, with lineage 1 represented by the newly sequenced strain FLEC03, and lineage 2 represented by the type-strain DSM20019. The two lineages could also be distinguished by the content of their accessory genome, which sheds light on a long-term evolutionary process of lineage-dependent genetic acquisition and the possibility of population structure. Interestingly, one clade from lineage 2 shared more accessory genes with strains of lineage 1 than with other strains of lineage 2, indicating recent convergence in carbohydrate catabolism. Both lineages had a wide repertoire of accessory genes involved in the fermentation of plant-derived carbohydrates that are released from polymers of α/β-glucans, α/β-fructans, and N-acetylglucosan. Other gene clusters were distributed among strains according to the type of food from which the strains were isolated. These results give new insight into the ecological niches in which L. curvatus may naturally thrive (such as silage or compost heaps) in addition to fermented food.
AbstractIn multicellular organisms, such as vertebrates and flowering plants, horizontal transfer (HT) of genetic information is thought to be a rare event. However, recent findings unveiled unexpectedly frequent HT of RTE-clade LINEs. To elucidate the molecular footprints of the genomic integration machinery of RTE-related retroposons, the sequence patterns surrounding the insertion sites of plant Au-like SINE families were analyzed in the genomes of a wide variety of flowering plants. A novel and remarkable finding regarding target site duplications (TSDs) for SINEs was that they start with thymine approximately one helical pitch (ten nucleotides) downstream of a thymine stretch. This TSD pattern was also found in RTE-clade LINEs, which share the 3′-end sequence of these SINEs, in the genomes of leguminous plants. These results demonstrably show that Au-like SINEs were mobilized by the enzymatic machinery of RTE-clade LINEs. Further, we discovered the same TSD pattern in animal SINEs from lizard and mammals, in which the RTE-clade LINEs sharing the 3′-end sequence with these animal SINEs showed a distinct TSD pattern. Moreover, a significant correlation was observed between the first nucleotide of TSDs and microsatellite-like sequences found at the 3′-ends of SINEs and LINEs. We propose that RTE-encoded protein could preferentially bind to a DNA region that contains a thymine stretch to cleave a phosphodiester bond downstream of the stretch. Further, determination of cleavage sites and/or efficiency of primer sites for reverse transcription may depend on microsatellite-like repeats in the RNA template. Such a unique mechanism may have enabled retroposons to successfully expand in frontier genomes after HT.
AbstractThe small and conserved genomes of birds are likely a result of flight-related metabolic constraints. Recombination-driven deletions and minimal transposable element (TE) expansions have led to continually shrinking genomes during evolution of many lineages of volant birds. Despite constraints on genome size in birds, we identified multiple waves of amplification of TEs in Piciformes (woodpeckers, honeyguides, toucans, and barbets). Relative to other bird species’ genomic TE abundance (<10% of genome), we found ∼17–30% TE content in multiple clades within Piciformes. Several families of the retrotransposon superfamily chicken repeat 1 (CR1) expanded in at least three different waves of activity. The most recent CR1 expansions (∼4–7% of genome) preceded bursts of diversification in the woodpecker clade and in the American barbets + toucans clade. Additionally, we identified several thousand polymorphic CR1 insertions (hundreds per individual) in three closely related woodpecker species. Woodpecker CR1 insertion polymorphisms are maintained at lower frequencies than single nucleotide polymorphisms, indicating that purifying selection is acting against additional CR1 copies and that these elements impose a fitness cost on their host. These findings provide evidence of large-scale and ongoing TE activity in avian genomes despite continual constraint on genome size.
AbstractCandidate genes associated with migration have been identified in multiple taxa, including salmonids, many of which perform migrations requiring a series of physiological changes associated with the freshwater–saltwater transition. We screened over 5,500 SNPs for signatures of selection related to migratory behavior of brown trout Salmo trutta by focusing on ten differentially migrating freshwater populations from two watersheds (the Koutajoki and the Oulujoki). We found eight outlier SNPs potentially associated with migratory versus resident life history using multiple (≥3) outlier detection approaches. Comparison of three migratory versus resident population pairs in the Koutajoki watershed revealed seven outlier SNPs, of which three mapped close to genes ZNF665-like, GRM4-like, and PCDH8-like that have been previously associated with migration and smoltification in salmonids. Two outlier SNPs mapped to genes involved in mucus secretion (ST3GAL1-like) and osmoregulation (C14orf37-like). The last two strongly supported outlier SNPs mapped to thermally induced genes (FNTA1-like, FAM134C-like). Within the Oulujoki, the only consistent outlier SNP mapped close to a gene (EZH2) that is associated with compensatory growth in fasted trout. Our results suggest that a relatively small yet common set of genes responsible for physiological functions associated with resident and migratory life histories is evolutionarily conserved.
AbstractWe sequenced mitochondrial genomes from five diverse diatoms (Toxarium undulatum, Psammoneis japonica, Eunotia naegelii, Cylindrotheca closterium, and Nitzschia sp.), chosen to fill important phylogenetic gaps and help us characterize broadscale patterns of mitochondrial genome evolution in diatoms. Although gene content was strongly conserved, intron content varied widely across species. The vast majority of introns were of group II type and were located in the cox1 or rnl genes. Although recurrent intron loss appears to be the principal underlying cause of the sporadic distributions of mitochondrial introns across diatoms, phylogenetic analyses showed that intron distributions superficially consistent with a recurrent-loss model were sometimes more complicated, implicating horizontal transfer as a likely mechanism of intron acquisition as well. It was not clear, however, whether diatoms were the donors or recipients of horizontally transferred introns, highlighting a general challenge in resolving the evolutionary histories of many diatom mitochondrial introns. Although some of these histories may become clearer as more genomes are sampled, high rates of intron loss suggest that the origins of many diatom mitochondrial introns are likely to remain unclear.
AbstractVariation in genome content is a potent mechanism of microbial adaptation. The genomes of members of the cyanobacterial genus Acaryochloris vary greatly in gene content as a consequence of the idiosyncratic retention of both recent gene duplicates and plasmid-encoded genes acquired by horizontal transfer. For example, the genome of Acaryochloris strain MBIC11017, which was isolated from an iron-limited environment, is enriched in duplicated and novel genes involved in iron assimilation. Here, we took an integrative approach to characterize the adaptation of Acaryochloris MBIC11017 to low environmental iron availability and the relative contributions of the expression of duplicated versus novel genes. We observed that Acaryochloris MBIC11017 grew faster and to a higher yield in the presence of nanomolar concentrations of iron than did a closely related strain. These differences were associated with both a higher rate of iron assimilation and a greater abundance of iron assimilation transcripts. However, recently duplicated genes contributed little to increased transcript dosage; rather, the maintenance of these duplicates in the MBIC11017 genome is likely due to the sharing of ancestral dosage by expression reduction. Instead, novel, horizontally transferred genes are responsible for the differences in transcript abundance. The study provides insights on the mechanisms of adaptive genome evolution and gene expression in Acaryochloris.
AbstractMeiosis is one of the most conserved molecular processes in eukaryotes. The fidelity of pairing and segregation of homologous chromosomes has a major impact on the proper transmission of genetic information. Aberrant chromosomal transmission can have major phenotypic consequences, yet the mechanisms are poorly understood. Fungi are excellent models to investigate processes of chromosomal transmission, because many species have highly polymorphic genomes that include accessory chromosomes. Inheritance of accessory chromosomes is often unstable and chromosomal losses have little impact on fitness. We analyzed chromosomal inheritance in 477 progeny coming from two crosses of the fungal wheat pathogen Zymoseptoria tritici. For this, we developed a high-throughput screening method based on restriction site-associated DNA sequencing that generated dense coverage of genetic markers along each chromosome. We identified rare instances of chromosomal duplications (disomy) in core chromosomes. Accessory chromosomes showed high overall frequencies of disomy. Chromosomal rearrangements were found exclusively on accessory chromosomes and were more frequent than disomy. Accessory chromosomes present in only one of the parents in an analyzed cross were inherited at significantly higher rates than the expected 1:1 segregation ratio. Both the chromosome and the parental background had significant impacts on the rates of disomy, losses, rearrangements, and distorted inheritance. We found that chromosomes with higher sequence similarity and lower repeat content were inherited more faithfully. The large number of rearranged progeny chromosomes identified in this species will enable detailed analyses of the mechanisms underlying chromosomal rearrangement.
AbstractGonadal sex differentiation and reproduction are the keys to the perpetuation of favorable gene combinations and positively selected traits. In vertebrates, several gonad development features that differentiate tetrapods and fishes are likely to be, at least in part, related to the water-to-land transition. The collection of information from basal sarcopterygians, coelacanths, and lungfishes, is crucial to improve our understanding of the molecular evolution of pathways involved in reproductive functions, since these organisms are generally regarded as “living fossils” and as the direct ancestors of tetrapods. Here, we report for the first time the characterization of >50 genes related to sex differentiation and gametogenesis in Latimeria menadoensis and Protopterus annectens. Although the expression profiles of most genes are consistent with the intermediate position of basal sarcopterygians between actinopterygian fish and tetrapods, their phylogenetic placement and presence/absence patterns often reveal a closer affinity to the tetrapod orthologs. On the other hand, particular genes, for example, the male gonad factor gsdf (Gonadal Soma-Derived Factor), provide examples of ancestral traits shared with actinopterygians, which disappeared in the tetrapod lineage.
AbstractmicroRNAs are conserved noncoding regulatory factors implicated in diverse physiological and developmental processes in multicellular organisms, as causal macroevolutionary agents and for phylogeny inference. However, the conservation and phylogenetic utility of microRNAs have been questioned on evidence of pervasive loss. Here, we show that apparent widespread losses are, largely, an artefact of poorly sampled and annotated microRNAomes. Using a curated data set of animal microRNAomes, we reject the view that miRNA families are never lost, but show that they are rarely lost (92% are never lost). A small number of families account for a majority of losses (1.7% of families account for >45% of losses), and losses are associated with lineages exhibiting phenotypic simplification. Phylogenetic analyses based on the presence/absence of microRNA families among animal lineages, and based on microRNA sequences among Osteichthyes, demonstrate the power of these small data sets in phylogenetic inference. Perceptions of widespread evolutionary loss of microRNA families are due to the uncritical use of public archives corrupted by spurious microRNA annotations, and failure to discriminate false absences that occur because of incomplete microRNAome annotation.
AbstractThe host plant range of herbivorous insects is a major aspect of insect–plant interaction, but the genetic basis of host range expansion in insects is poorly understood. In butterflies, gustatory receptor genes (GRs) play important roles in host plant selection by ovipositing females. Since several studies have shown associations between the repertoire sizes of chemosensory gene families and the diversity of resource use, we hypothesized that the increase in the number of genes in the GR family is associated with host range expansion in butterflies. Here, we analyzed the evolutionary dynamics of GRs among related species, including the host generalist Vanessa cardui and three specialists. Although the increase of the GR repertoire itself was not observed, we found that the gene birth rate of GRs was the highest in the lineage leading to V. cardui compared with other specialist lineages. We also identified two taxon-specific subfamilies of GRs, characterized by frequent lineage-specific duplications and higher non-synonymous substitution rates. Together, our results suggest that frequent gene duplications in GRs, which might be involved in the detection of plant secondary metabolites, were associated with host range expansion in the V. cardui lineage. These evolutionary patterns imply that the capability to perceive various compounds during host selection was favored during adaptation to diverse host plants.
AbstractThe merging of two divergent genomes in a hybrid is believed to trigger a “genomic shock”, disrupting gene regulation and transposable element (TE) silencing. Here, we tested this expectation by comparing the pattern of expression of transposable elements in their native and hybrid genomic context. For this, we sequenced the transcriptome of the Arabidopsis thaliana genotype Col-0, the A. lyrata genotype MN47 and their F1 hybrid. Contrary to expectations, we observe that the level of TE expression in the hybrid is strongly correlated to levels in the parental species. We detect that at most 1.1% of expressed transposable elements belonging to two specific subfamilies change their expression level upon hybridization. Most of these changes, however, are of small magnitude. We observe that the few hybrid-specific modifications in TE expression are more likely to occur when TE insertions are close to genes. In addition, changes in epigenetic histone marks H3K9me2 and H3K27me3 following hybridization do not coincide with TEs with changed expression. Finally, we further examined TE expression in parents and hybrids exposed to severe dehydration stress. Despite the major reorganization of gene and TE expression by stress, we observe that hybridization does not lead to increased disorganization of TE expression in the hybrid. Although our study did not examine TE transposition activity in hybrids, the examination of the transcriptome shows that TE expression is globally robust to hybridization. The term “genomic shock” is perhaps not appropriate to describe transcriptional modification in a viable hybrid merging divergent genomes.
AbstractStatistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.
AbstractEndozoicomonas bacteria are generally beneficial symbionts of diverse marine invertebrates including reef-building corals, sponges, sea squirts, sea slugs, molluscs, and bryozoans. In contrast, the recently reported Ca. Endozoicomonas cretensis was identified as a vertebrate pathogen, causing epitheliocystis in fish larvae resulting in massive mortality. Here, we describe the Ca. E. cretensis draft genome, which is currently undergoing genome decay as evidenced by massive insertion sequence (IS element) expansion and pseudogene formation. Many of the insertion sequences are also predicted to carry outward-directed promoters, implying that they may be able to modulate the expression of neighbouring coding sequences (CDSs). Comparative genomic analysis has revealed many Ca. E. cretensis-specific CDSs, phage integration and novel gene families. Potential virulence-related CDSs and machineries were identified in the genome, including secretion systems and related effector proteins, and systems related to biofilm formation and directed cell movement. Mucin degradation would be of importance to a fish pathogen, and many candidate CDSs associated with this pathway have been identified. The genome may reflect a bacterium in the process of changing niche from symbiont to pathogen, through expansion of virulence genes and some loss of metabolic capacity.
AbstractComparative genomics has become a central tool for evolutionary biology, and a better knowledge of understudied taxa represents the foundation for future work. In this study, we characterized the transcriptome of male and female mature gonads in the European clam Ruditapes decussatus, compared with that in the Manila clam Ruditapes philippinarum, providing, for the first time in bivalves, information about transcription dynamics and sequence evolution of sex-biased genes. In both species, we found a relatively low number of sex-biased genes (1,284, corresponding to 41.3% of the orthologous genes between the two species), probably due to the absence of sexual dimorphism, and the transcriptional bias is maintained in only 33% of the orthologs. The dN/dS is generally low, indicating purifying selection, with genes where the female-biased transcription is maintained between the two species showing a significantly higher dN/dS. Genes involved in embryo development, cell proliferation, and maintenance of genome stability show faster sequence evolution. Finally, we report a lack of clear correlation between transcription level and evolutionary rate in these species, in contrast with studies that reported a negative correlation. We discuss this discrepancy and call into question some methodological approaches and rationales generally used in this type of comparative study.