In the past 25 years, stretching from the dawn of personal computing to the social media age, the “always free” molecular evolutionary genetics analysis (MEGA) tool has been downloaded 1.6 million times worldwide.
Vincent L. Cannataro and Jeffrey P. Townsend
Abstract: A selective sweep is a reduction in variation at presumably neutrally evolving sites (hitchhikers) in the genome that is caused by the spread of a selected allele at a linked focal site, and it is widely used to test for the action of positive selection. Nonetheless, a selective sweep may also provide an unprecedented opportunity for studying nonequilibrium properties of the neutral variation itself. We have demonstrated this possibility in relation to ancient selective sweeps for modern human-specific changes and ongoing selective sweeps for local population-specific changes.
Abstract: The Molecular Evolutionary Genetics Analysis (MEGA) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of MEGA to enable cross-platform use on Microsoft Windows and Linux operating systems. MEGA X does not require virtualization or emulation software and provides a uniform user experience across platforms. MEGA X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. MEGA X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.
Abstract: In this perspective, we evaluate the explanatory power of the neutral theory of molecular evolution, 50 years after its introduction by Kimura. We argue that the neutral theory was supported by unreliable theoretical and empirical evidence from the beginning, and that in light of modern, genome-scale data, we can firmly reject its universality. The ubiquity of adaptive variation both within and between species means that a more comprehensive theory of molecular evolution must be sought.
Abstract: Kimura’s neutral theory provides the whole theoretical basis of the behavior of mutations in a Wright–Fisher population. We here discuss how it can be applied to a cancer cell population, in which there is an increasing interest in genetic variation within a tumor. We explain a couple of fundamental differences between cancer cell populations and asexual organismal populations. Once these differences are taken into account, a number of powerful theoretical tools developed for a Wright–Fisher population could readily contribute to a deeper understanding of the evolutionary dynamics of cancer cell populations.
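As a rough illustration of the Wright–Fisher model mentioned above, the following minimal sketch simulates a neutral allele in a haploid, asexual population of constant size. It is not taken from the article: the function names, the constant-size assumption, and the parameter values are all hypothetical, and a growing tumor would violate the constant-size assumption. It shows the classic neutral result that the probability of fixation is approximately equal to the allele's starting frequency.

```python
import random


def wright_fisher(pop_size, init_count, rng, max_generations=100_000):
    """Follow a neutral allele until it is lost or fixed.

    Returns (final_count, generations_elapsed). Assumes a haploid population
    of constant size with non-overlapping generations (illustrative only).
    """
    count = init_count
    for generation in range(max_generations):
        if count == 0 or count == pop_size:
            return count, generation
        freq = count / pop_size
        # Binomial resampling: each offspring draws its parent at random,
        # so it carries the allele with probability equal to its frequency.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return count, max_generations


def fixation_fraction(pop_size, init_count, replicates, seed=1):
    """Fraction of replicate populations in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        final_count, _generations = wright_fisher(pop_size, init_count, rng)
        if final_count == pop_size:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    # Starting frequency is 5/20 = 0.25, so the estimate should be near 0.25.
    print(fixation_fraction(pop_size=20, init_count=5, replicates=2000))
```

Swapping the binomial resampling step for a branching process with unequal birth and death rates would be one way to explore the differences between tumor cell populations and the idealized model, although that extension is outside this sketch.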
Abstract: HIV is one of the fastest evolving organisms known. It evolves about 1 million times faster than its host, humans. Because HIV establishes chronic infections, with continuous evolution, its divergence within a single infected human surpasses the divergence of the entire humanoid history. Yet, it is still the same virus, infecting the same cell types and using the same replication machinery year after year. Hence, one would think that most mutations that HIV accumulates are neutral. But the picture is more complicated than that. HIV evolution is also a clear example of strong positive selection, that is, mutants have a survival advantage. How do these facts come together?
Abstract: The importance of chance, finiteness, and history in evolution is pointed out with special reference to the neutral theory.
Abstract: The evolution of viral pathogens is shaped by strong selective forces that are exerted during jumps to new hosts, confrontations with host immune responses and antiviral drugs, and numerous other processes. However, while undeniably strong and frequent, adaptive evolution is largely confined to small parts of information-packed viral genomes, and the majority of observed variation is effectively neutral. The predictions and implications of the neutral theory have proven immensely useful in this context, with applications spanning understanding within-host population structure, tracing the origins and spread of viral pathogens, predicting evolutionary dynamics, and modeling the emergence of drug resistance. We highlight the multiple ways in which the neutral theory has had an impact, which has been accelerated in the age of high-throughput, high-resolution genomics.
Abstract: Research in population genetics and evolutionary biology has always provided a computational backbone for the life sciences as a whole. Today, evolutionary and population biology reasoning is essential for the interpretation of the large, complex datasets that are characteristic of all domains of today’s life sciences, ranging from cancer biology to microbial ecology. This situation makes the algorithms and software tools developed by our community more important than ever before. It means that we, the developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments as well as to provide adequate documentation and training.
Abstract: Genetic differences between species and within populations are two sides of the same coin under the neutral theory of molecular evolution. This theory posits that a vast majority of evolutionary substitutions, which appear as differences between species, are (nearly) neutral, that is, these substitutions are permitted without a significantly adverse impact on a species’ survival. We refer to them as evolutionarily permissible (ePerm) variation. Evolutionary permissibility of any possible variant can be inferred from multispecies sequence alignments by applying sophisticated statistical methods to the evolutionary tree of species. Here, we explore the evolutionary permissibility of amino acid variants associated with genetic diseases and those observed in personal exomes. Consistent with the predictions of the neutral theory, disease associated amino acid variants are rarely ePerm, much more biochemically radical, and found predominantly at more conserved positions than their non-disease counterparts. Only 10% of amino acid mutations are ePerm, but these variants rise to become two-thirds of all substitutions in the human lineage (a 6-fold enrichment). In contrast, only a minority of the variants in a personal exome are ePerm, a seemingly counterintuitive pattern that results from a combination of mutational and evolutionary processes that are, in fact, broadly consistent with the neutral theory. Evolutionarily forbidden variants outnumber detrimental variants in individual exomes and may play an underappreciated role in protecting against disease. We discuss these observations and conclude that the long-term evolutionary history of species can illuminate functional biomedical properties of variation present in personal exomes.
Abstract: Among the multitude of papers published yearly in scientific journals, precious few are worth looking back on half a century later to appreciate the significance of discoveries that went on to become common knowledge and to shape a field or several adjacent fields. Here, Kimura’s fundamental concept of neutral mutation-random drift, which was published 50 years ago, is re-examined in light of its pervasive influence on comparative genomics and, more specifically, on the contribution of transposable elements to eukaryotic genome evolution.
Abstract: I detail four major open problems in microbial population genetics with direct implications for the study of molecular evolution: the lack of neutral polymorphism, the modeling of promiscuous genetic exchanges, the genetics of ill-defined populations, and the difficulty of untangling selection and demography in the light of these issues. Together with the historical focus on the study of single nucleotide polymorphism and widespread non-random sampling, these problems limit our understanding of the genetic variation in bacterial populations and their adaptive effects. I argue that we need novel theoretical approaches accounting for pervasive selection and strong genetic linkage to better understand microbial evolution.
Abstract: Kimura’s neutral theory argued that positive selection was not responsible for an appreciable fraction of molecular substitutions. Correspondingly, quantitative analysis reveals that the vast majority of substitutions in cancer genomes are not detectably under selection. Insights from the somatic evolution of cancer reveal that beneficial substitutions in cancer constitute a small but important fraction of the molecular variants. The community studying the molecular evolution of cancer will benefit from incorporating the neutral theory of molecular evolution into its understanding and analysis of cancer evolution, and from accepting the use of tractable, predictive models, even when there is some evidence that they are not perfect.
Abstract: Kimura’s neutral theory of molecular evolution has been essential to virtually every advance in evolutionary genetics, and by extension, is foundational to the field of conservation genetics. Conservation genetics utilizes the key concepts of neutral theory to identify species and populations at risk of losing evolutionary potential by detecting patterns of inbreeding depression and low effective population size. In turn, this information can inform the management of organisms and their habitat, providing hope for the long-term preservation of both. We expand upon Avise’s “inventorial” and “functional” categories of conservation genetics by proposing a third category that is linked to the coalescent and that we refer to as “process-driven.” It is here that connections between Kimura’s theory and conservation genetics are strongest. Process-driven conservation genetics can be especially applied to large genomic data sets to identify patterns of historical risk, such as population bottlenecks, and accordingly, yield informed intuitions for future outcomes. By examining inventorial, functional, and process-driven conservation genetics in sequence, we assess the progression from theory, to data collection and analysis, and ultimately, to the production of hypotheses that can inform conservation policies.
Abstract: DAMBE is a comprehensive software package for genomic and phylogenetic data analysis on Windows, Linux, and Macintosh computers. New functions include imputing missing distances and phylogeny simultaneously (paving the way to build large phage and transposon trees), new bootstrapping/jackknifing methods for PhyPA (phylogenetics from pairwise alignments), and an improved function for fast and accurate estimation of the shape parameter of the gamma distribution for fitting rate heterogeneity over sites. The previous method corrects multiple hits for each site independently, whereas DAMBE’s new method uses all sites simultaneously for correction. DAMBE, featuring a user-friendly graphic interface, is freely available from http://dambe.bio.uottawa.ca (last accessed, April 17, 2018).
Abstract: Fish mitochondrial genome (mitogenome) data form a fundamental basis for revealing vertebrate evolution and hydrosphere ecology. Here, we report recent functional updates of MitoFish, which is a database of fish mitogenomes with a precise annotation pipeline, MitoAnnotator. Most importantly, we describe the implementation of the MiFish pipeline for metabarcoding analysis of fish mitochondrial environmental DNA, which is a fast-emerging and powerful technology in fish studies. MitoFish, MitoAnnotator, and the MiFish pipeline constitute a key platform for studies of fish evolution, ecology, and conservation, and are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed April 7th, 2018).
Abstract: Here, we present a synthetic view of how Kimura’s neutral theory has helped us gain insight into the different evolutionary forces that shape human evolution. We put this perspective in the frame of recent emerging challenges: the use of whole genome data for reconstructing population histories, natural selection on complex polygenic traits, and integrating cultural processes in human evolution.
Abstract: Primates have traditionally been regarded as vision-oriented animals with low olfactory ability, though this “microsmatic primates” view has been challenged recently. To clarify when and how degeneration of the olfactory system occurred and to specify the relevant factors during primate evolution, we here examined the olfactory receptor (OR) genes from 24 phylogenetically and ecologically diverse primate species. The results revealed that strepsirrhines with curved noses had functional OR gene repertoires that were nearly twice as large as those for haplorhines with simple noses. Neither activity pattern (nocturnal/diurnal) nor color vision system showed significant correlation with the number of functional OR genes when phylogeny and nose structure (haplorhine/strepsirrhine) were statistically controlled, but the extent of folivory did. We traced the evolutionary fates of individual OR genes by identifying orthologous gene groups, demonstrating that the rates of OR gene losses were accelerated at the ancestral branch of haplorhines, which coincided with the acquisition of acute vision. The highest rate of OR gene loss was observed at the ancestral branch of leaf-eating colobines; this reduction is possibly linked with the dietary transition from frugivory to folivory because odor information is essential for fruit foraging but less so for leaf foraging. Intriguingly, we found accelerations of OR gene losses in an external branch to every hominoid species examined. These findings suggest that the current OR gene repertoire in each species has been shaped by a complex interplay of phylogeny, anatomy, and habitat; therefore, multiple factors may contribute to the olfactory degeneration in primates.
Abstract: Although the neutral theory of molecular evolution was proposed to explain DNA and protein sequence evolution, in principle it could also explain phenotypic evolution. Nevertheless, overall, phenotypes should be less likely than genotypes to evolve neutrally. I propose that, when phenotypic traits are stratified according to a hierarchy of biological organization, the fraction of evolutionary changes in phenotype that are adaptive rises with the phenotypic level considered. Consistently, molecular traits are frequently found to evolve neutrally whereas a large, random set of organismal traits were recently reported to vary largely adaptively. Many more studies of unbiased samples of phenotypic traits are needed to test the general validity of this hypothesis.
Abstract: In its initial formulation by Motoo Kimura, the neutral theory was concerned solely with the level of variability maintained by random genetic drift of selectively neutral mutations, and the rate of molecular evolution caused by the fixation of such mutations. The original theory considered events at a single genetic locus in isolation from the rest of the genome. It did not take long, however, for theoreticians to wonder whether selection at one or more loci might influence neutral variability at linked sites. Once DNA sequence variability could be studied, and especially when resequencing of whole genomes became possible, it became clear that patterns of neutral variability in genomes are affected by selection at linked sites, and that these patterns could advance our understanding of natural selection and could be used to detect the action of selection in genomic regions, including selection much weaker than could be detected by direct measurements of the relative fitnesses of different genotypes. We outline the different types of processes that have been studied, in approximate order of their historical development.
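For orientation, the two quantities named at the start of this abstract have well-known textbook forms, sketched below. The notation is an assumption introduced here and does not come from the article: $N$ is the number of diploid individuals, $N_e$ the effective population size, $\mu$ the neutral mutation rate per generation, $k$ the substitution rate, and $H$ the expected equilibrium heterozygosity under the infinite-alleles model.

```latex
% Rate of neutral molecular evolution: 2N\mu new neutral mutations arise per
% generation, and each one is eventually fixed with probability 1/(2N).
k = 2N\mu \cdot \frac{1}{2N} = \mu

% Level of neutral variability maintained by drift (infinite-alleles model):
H = \frac{4N_e\mu}{1 + 4N_e\mu}
```

The first result does not depend on population size, whereas the second does; selection at linked sites is usually described as changing the effective population size that enters the second expression.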
Abstract: Small insertions and deletions (INDELs; ≤50 bp) are the most common type of variability after single nucleotide polymorphism (SNP). However, compared with SNPs, we know little about the distribution of fitness effects (DFE) of new INDEL mutations and how prevalent adaptive INDEL substitutions are. Studying INDELs has been difficult partly because identifying ancestral states at these sites is error-prone and misidentification can lead to severely biased estimates of the strength of selection. To solve these problems, we develop new maximum likelihood methods, which use polymorphism data to simultaneously estimate the DFE, the mutation rate, and the misidentification rate. These methods are applicable to both INDELs and SNPs. Simulations show that they can provide highly accurate results. We applied the methods to an INDEL polymorphism data set in Drosophila melanogaster. We found that the DFE for polymorphic INDELs in protein-coding regions is bimodal, with the variants being either nearly neutral or strongly deleterious. Based on the DFE, we estimated that 71.5–83.7% of the INDEL substitutions that took place along the D. melanogaster lineage were fixed by positive selection, which is comparable with the prevalence of adaptive substitutions at nonsynonymous sites. The new methods have been implemented in the software package anavar.
Abstract: Mammalian diversification has coincided with a rapid proliferation of various types of noncoding RNAs, including members of both snRNAs and snoRNAs. The significance of this expansion, however, remains obscure. While some ncRNA copy-number expansions have been linked to functionally tractable effects, such events may equally be neutral, perhaps as a result of random retrotransposition. Hindering progress in our understanding of such observations is the difficulty in establishing function for the diverse features that have been identified in our own genome. Projects such as ENCODE and FANTOM have revealed a hidden world of genomic expression patterns, as well as a host of other potential indicators of biological function. However, such projects have been criticized, particularly by practitioners in the field of molecular evolution, where many suspect these data provide limited insight into biological function. The molecular evolution community has largely taken a skeptical view; thus, it is important to establish tests of function. We use a range of data, including data drawn from ENCODE and FANTOM, to examine the case for function for the recent copy number expansion in mammals of six evolutionarily ancient RNA families involved in splicing and rRNA maturation. We use several criteria to assess evidence for function: conservation of sequence and structure, genomic synteny, evidence for transposition, and evidence for species-specific expression. Applying these criteria, we find that only a minority of loci show strong evidence for function and that, for the majority, we cannot reject the null hypothesis of no function.
Abstract: The genetic basis of parallel evolution of similar species is of great interest in evolutionary biology. In the adaptive radiation of Lake Victoria cichlid fishes, sister species with either blue or red-back male nuptial coloration have evolved repeatedly, often associated with shallower and deeper water, respectively. One such case is blue and red-backed Pundamilia species, for which we recently showed that a young species pair may have evolved through “hybrid parallel speciation”. Coalescent simulations suggested that the older species P. pundamilia (blue) and P. nyererei (red-back) admixed in the Mwanza Gulf and that new “nyererei-like” and “pundamilia-like” species evolved from the admixed population. Here, we use genome scans to study the genomic architecture of differentiation, and assess the influence of hybridization on the evolution of the younger species pair. For each of the two species pairs, we find over 300 genomic regions, widespread across the genome, which are highly differentiated. A subset of the most strongly differentiated regions of the older pair are also differentiated in the younger pair. These shared differentiated regions often show parallel allele frequency differences, consistent with the hypothesis that admixture-derived alleles were targeted by divergent selection in the hybrid population. However, two-thirds of the genomic regions that are highly differentiated between the younger species are not highly differentiated between the older species, suggesting independent evolutionary responses to selection pressures. Our analyses reveal how divergent selection on admixture-derived genetic variation can facilitate new speciation events.
Abstract: Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop and animal breeding, and personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large numbers of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.
Abstract: Detecting selection on codon usage (CU) is a difficult task, since CU can be shaped by both the mutational process and selective constraints operating at the DNA, RNA, and protein levels. Yang and Nielsen (2008) developed a test (which we call CUYN) for detecting selection on CU using two competing mutation-selection models of codon substitution. The null model assumes that CU is determined by the mutation bias alone, whereas the alternative model assumes that both mutation bias and selection act on CU. In applications on mammalian-scale alignments, the CUYN test detects selection on CU for numerous genes. This is surprising, given the small effective population size of mammals, and prompted us to use simulations to evaluate the robustness of the test to model violations. Simulations using a modest level of CpG hypermutability completely mislead the test, with 100% false positives. Surprisingly, a high level of false positives (56.1%) resulted simply from using the HKY mutation-level parameterization within the CUYN test on simulations conducted with a GTR mutation-level parameterization. Finally, by using a crude optimization procedure on a parameter controlling the CpG hypermutability rate, we find that this mutational property could explain a very large part of the observed mammalian CU. Altogether, our work emphasizes the need to evaluate the potential impact of model violations on statistical tests in the field of molecular phylogenetic analysis. The source code of the simulator and the mammalian genes used are available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Abstract: When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
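The "reduction in the deviance" referred to above can be written generically as follows. This is the standard likelihood-ratio form for nested models and is offered only as a reading aid; the article's own formal definition of PL is not reproduced in the abstract, and the symbols $\hat{\ell}_0$, $\hat{\ell}_1$, and $q$ are notation assumed here.

```latex
% \hat{\ell}_0: maximized log-likelihood of the model without the parameter
% \hat{\ell}_1: maximized log-likelihood of the model with the parameter
\Delta D = D_0 - D_1 = 2\left(\hat{\ell}_1 - \hat{\ell}_0\right)

% Under the usual regularity conditions, and if the simpler model is true,
% \Delta D is approximately chi-square distributed with q degrees of freedom,
% where q is the number of added parameters.
\Delta D \;\overset{\text{approx.}}{\sim}\; \chi^2_q
```

A large $\Delta D$ therefore makes a parameter look statistically significant, which is why a parameter that absorbs variation produced by other, unmodeled processes can appear well supported.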
Abstract: The olfactory receptor (OR) gene families, which govern mammalian olfaction, have undergone extensive expansion and contraction through duplication and pseudogenization. Previous studies have shown that broadly defined environmental adaptations (e.g., terrestrial vs. aquatic) are correlated with the number of functional and non-functional OR genes retained. However, to date, no study has examined species-specific gene duplications in multiple phylogenetically divergent mammals to elucidate OR evolution and adaptation. Here, we identify the OR gene families driving adaptation to different ecological niches by mapping the fate of species-specific gene duplications in the OR repertoire of 94 diverse mammalian taxa, using molecular phylogenomic methods. We analyze >70,000 OR gene sequences mined from whole genomes, generated from novel amplicon sequencing data, and collated with data from previous studies, comprising one of the largest OR studies to date. For the first time, we demonstrate statistically significant patterns of OR species-specific gene duplications associated with the presence of a functioning vomeronasal organ. With respect to dietary niche, we uncover a novel link between a large number of duplications in OR family 5/8/9 and herbivory. Our results also highlight differences between social and solitary niches, indicating that a greater OR repertoire expansion may be associated with a solitary lifestyle. This study demonstrates the utility of species-specific duplications in elucidating gene family evolution, revealing how the OR repertoire has undergone expansion and contraction with respect to a number of ecological adaptations in mammals.
Abstract: With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
Abstract: β-Catenin acts as a transcriptional coactivator in the Wnt/β-catenin signaling pathway and a cytoplasmic effector in cadherin-based cell adhesion. These functions are ancient within animals, but the earliest steps in β-catenin evolution remain unresolved due to limited data from key lineages: sponges, ctenophores, and placozoans. Previous studies in sponges have characterized β-catenin expression dynamics and used GSK3B antagonists to ectopically activate the Wnt/β-catenin pathway; both approaches rely upon untested assumptions about the conservation of β-catenin function and regulation in sponges. Here, we test these assumptions using an antibody raised against β-catenin from the sponge Ephydatia muelleri. We find that cadherin-complex genes coprecipitate with endogenous Em β-catenin from cell lysates, but that Wnt pathway components do not. However, through immunostaining we detect both cell boundary and nuclear populations, and we find evidence that Em β-catenin is a conserved substrate of GSK3B. Collectively, these data support conserved roles for Em β-catenin in both cell adhesion and Wnt signaling. Additionally, we find evidence for an Em β-catenin population associated with the distal ends of F-actin stress fibers in apparent cell–substrate adhesion structures that resemble focal adhesions. This finding suggests a fundamental difference in the adhesion properties of sponge tissues relative to other animals, in which the adhesion functions of β-catenin are typically restricted to cell–cell adhesions.
Abstract: The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method, Network Analysis of Protein Adaptation (NAPA), to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences. The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which frequently arise from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure.
Abstract: The visual systems of snakes are heavily modified relative to other squamates, a condition often thought to reflect their fossorial origins. Further modifications are seen in caenophidian snakes, where evolutionary transitions between rod and cone photoreceptors, termed photoreceptor transmutations, have occurred in many lineages. Little previous work, however, has focused on the molecular evolutionary underpinnings of these morphological changes. To address this, we sequenced seven snake eye transcriptomes and utilized new whole-genome and targeted capture sequencing data. We used these data to analyze gene loss and shifts in selection pressures in phototransduction genes that may be associated with snake evolutionary origins and photoreceptor transmutation. We identified the surprising loss of rhodopsin kinase (GRK1), despite a low degree of gene loss overall and a lack of relaxed selection early during snake evolution. These results provide some of the first evolutionary genomic corroboration for a dim-light ancestor that lacks strong fossorial adaptations. Our results also indicate that snakes with photoreceptor transmutation experienced significantly different selection pressures from other reptiles. Significant positive selection was found primarily in cone-specific genes, but not rod-specific genes, contrary to our expectations. These results reveal potential molecular adaptations associated with photoreceptor transmutation and also highlight unappreciated functional differences between rod- and cone-specific phototransduction proteins. This intriguing example of snake visual system evolution illustrates how the underlying molecular components of a complex system can be reshaped in response to changing selection pressures.
Abstract: Many insects rely on bacterial symbionts to supply essential amino acids and vitamins that are deficient in their diets, but metabolic comparisons of closely related gut bacteria in insects with different dietary preferences have not been performed. Here, we demonstrate that herbivorous ants of the genus Dolichoderus from the Peruvian Amazon host bacteria of the family Bartonellaceae, known for establishing chronic or pathogenic infections in mammals. We detected these bacteria in all studied Dolichoderus species, and found that they reside in the midgut wall, that is, the same location as many previously described nutritional endosymbionts of insects. The genomic analysis of four divergent strains infecting different Dolichoderus species revealed genes encoding pathways for nitrogen recycling and biosynthesis of several vitamins and all essential amino acids. In contrast, several biosynthetic pathways have been lost, whereas genes for the import and conversion of histidine and arginine to glutamine have been retained in the genome of a closely related gut bacterium of the carnivorous ant Harpegnathos saltator. The broad biosynthetic repertoire in Bartonellaceae of herbivorous ants resembled that of gut bacteria of honeybees that likewise feed on carbohydrate-rich diets. Taken together, the broad distribution of Bartonellaceae across Dolichoderus ants, their small genome sizes, the specific location within hosts, and the broad biosynthetic capability suggest that these bacteria are nutritional symbionts in herbivorous ants. The results highlight the important role of the host nutritional biology for the genomic evolution of the gut microbiota and, conversely, the importance of the microbiota for the nutrition of hosts.
Abstract: The mutational patterns of large tandem arrays of short sequence repeats remain largely unknown, despite observations of their high levels of variation in sequence and genomic abundance within and between species. Many factors can influence the dynamics of tandem repeat evolution; however, their evolution has only been examined over a limited phylogenetic sample of taxa. Here, we use publicly available whole-genome sequencing data of 85 haploid mutation accumulation lines derived from six geographically diverse Chlamydomonas reinhardtii isolates to investigate genome-wide mutation rates and patterns in tandem repeats in this species. We find that tandem repeat composition differs among ancestral strains, both in genome-wide abundance and presence/absence of individual repeats. Estimated mutation rates (repeat copy number expansion and contraction) were high, averaging 4.3×10⁻⁴ per generation per single unit copy. Although orders of magnitude higher than other types of mutation previously reported in C. reinhardtii, these tandem repeat mutation rates were one order of magnitude lower than what has recently been found in Daphnia pulex, even after correcting for lower overall genome-wide satellite abundance in C. reinhardtii. Most high-abundance repeats were related to others by a single mutational step. Correlations of repeat copy number changes within genomes revealed clusters of closely related repeats that were strongly correlated positively or negatively, and similar patterns of correlation arose independently in two different mutation accumulation experiments. Together, these results paint a dynamic picture of tandem repeat evolution in this unicellular alga.
Abstract: Over recent decades, the mammalian genome has been proposed to have regions prone to breakage and reorganization concentrated in certain chromosomal bands that seem to correspond to evolutionary breakpoints. These bands are likely to be involved in chromosome fragility or instability. In Primates, some biomarkers of genetic damage may be associated with various degrees of genomic instability. Here, we investigated the usefulness of Sister Chromatid Exchange as a biomarker of potential sites of frequent chromosome breakage and rearrangement in Alouatta caraya, Ateles chamek, Ateles paniscus, and Cebus cay. These Neotropical species have particular genomic and chromosomal features allowing the analysis of genomic instability for comparative purposes. We determined the frequency of spontaneous induction of Sister Chromatid Exchanges and assessed the relationship between these and structural rearrangements implicated in the evolution of the primates of interest. Overall, A. caraya and C. cay presented a low proportion of statistically significant unstable bands, suggesting fairly stable genomes and the existence of some kind of protection against endogenous damage. In contrast, Ateles showed a highly significant proportion of unstable bands; these were mainly found in the rearranged regions, which is consistent with the numerous genomic reorganizations that might have occurred during the evolution of this genus.
Abstract: Freshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis, which will serve as a valuable genomic resource given the ecological value and unique “doubly uniparental inheritance” mode of mitochondrial DNA transmission of freshwater mussels. The genome described here was obtained by combining high-coverage short reads (65× genome coverage of Illumina paired-end and 11× genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3× genome coverage). Briefly, the final scaffold assembly accounted for a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 kb, with 2.3% of “N” nucleotides), representing 86% of the predicted genome size of 1.80 Gb, while over one third of the genome (37.5%) consisted of repeated elements and >85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
Abstract: Dosage compensation has evolved in concert with Y-chromosome degeneration in many taxa that exhibit heterogametic sex chromosomes. Dosage compensation overcomes the biological challenge of a “half dose” of X chromosome gene transcripts in the heterogametic sex. The need to equalize gene expression of a hemizygous X with that of autosomes arises from the fact that the X chromosomes retain hundreds of functional genes that are actively transcribed in both sexes and interact with genes expressed on the autosomes. Sex determination and heterogametic sex chromosomes have evolved multiple times in Diptera, and in each case the genetic control of dosage compensation is tightly linked to sex determination. In the Anopheles gambiae species complex (Culicidae), maleness is conferred by the Y-chromosome gene Yob, which despite its conserved role between species is polymorphic in its copy number between them. Previous work demonstrated that An. gambiae s.s. males exhibit complete dosage compensation in pupal and adult stages. In the present study, we have extended this analysis to three sister species in the An. gambiae complex: An. coluzzii, An. arabiensis, and An. quadriannulatus. In addition, we analyzed dosage compensation in bi-directional F1 hybrids between these species to determine if hybridization results in the mis-regulation and disruption of dosage compensation. Our results confirm that dosage compensation operates in the An. gambiae species complex through the hypertranscription of the male X chromosome. Additionally, dosage compensation in hybrid males does not differ from parental males, indicating that hybridization does not result in the mis-regulation of dosage compensation.
Abstract: Heterotrophic plants provide evolutionarily independent, natural experiments in the genomic consequences of radically altered nutritional regimes. Here, we have sequenced and annotated the plastid genome of the endangered mycoheterotrophic orchid Hexalectris warnockii. This orchid bears a plastid genome that is ∼80% the total length of that of the leafy, photosynthetic Phalaenopsis, and contains just over half the number of putatively functional genes of the latter. The plastid genome of H. warnockii bears pseudogenes and has experienced losses of genes encoding proteins directly (e.g., psa/psb, rbcL) and indirectly involved in photosynthesis (atp genes), suggesting it has progressed beyond the initial stages of plastome degradation, based on previous models of plastid genome evolution. Several dispersed and tandem repeats were detected that are potentially useful as conservation genetic markers. In addition, a 29-kb inversion and a significant contraction of the inverted repeat boundaries are observed in this plastome. The Hexalectris warnockii plastid genome adds to a growing body of data useful in refining evolutionary models in parasites, and provides a resource for conservation studies in these endangered orchids.