Abstract
Some of the fastest-evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched near regulatory genes, and several HARs function as developmental enhancers. To investigate whether this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA near genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration, consistent with positive selection or loss of constraint, is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered the expression of a core set of developmental genes across ape evolution.
Abstract
Model organisms subjected to sustained experimental evolution often show levels of phenotypic differentiation that dramatically exceed the phenotypic differences observed in natural populations. Genome-wide sequencing of pooled populations then offers the opportunity to infer which genes cause these phenotypic differences. Using computer simulations, we tested the efficacy of a statistical learning technique called the “fused lasso additive model” (FLAM). We focused on the ability of FLAM to distinguish differentiated genes that directly affect a phenotype from differentiated genes that have no effect on it. FLAM can separate these two classes of genes even with relatively small samples (10 populations in total). The efficacy of FLAM improves with a larger number of populations, reduced environmental phenotypic variation, and increased within-treatment among-replicate variation. To illustrate the method, we applied FLAM to SNP variation measured in both twenty-population and thirty-population studies of Drosophila subjected to selection for age at reproduction.
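As a sketch of what FLAM optimizes, the objective below is the fused lasso additive model in its general published form, paraphrased in our own notation (not necessarily this study's). Each of the p features, for example the allele frequency at one candidate SNP across the n populations (our assumption about the setup), receives a vector θ_j of fitted values; P_j sorts those values by the feature, and D takes first differences:

```latex
\min_{\theta_0,\,\theta_1,\dots,\theta_p}\;
\frac{1}{2}\Big\lVert y-\theta_0\mathbf{1}-\sum_{j=1}^{p}\theta_j\Big\rVert_2^2
+\alpha\lambda\sum_{j=1}^{p}\lVert D P_j\theta_j\rVert_1
+(1-\alpha)\lambda\sum_{j=1}^{p}\lVert\theta_j\rVert_2,
\qquad \text{subject to } \mathbf{1}^\top\theta_j = 0 .
```

The fused-lasso term makes each fitted effect piecewise constant in its feature, while the group-lasso term can set an entire θ_j to zero; the second property is what allows the model to discard differentiated SNPs that do not contribute to the phenotype. Here λ ≥ 0 controls overall sparsity and α ∈ [0, 1] balances the two penalties.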
Abstract
Mitochondrial DNA sequences are frequently transferred into the nuclear genome, giving rise to numts (nuclear mitochondrial DNA segments). In the absence of whole-genome data, avian numts had been suggested to be rare and relatively short. We examined 64 bird genomes to test hypotheses regarding numt frequency, distribution among taxa, and likelihood of homoplasy. We discovered 100-fold variation in numt number across species. Two songbirds, Geospiza fortis (Darwin’s finch) and Zonotrichia albicollis (white-throated sparrow), had the largest number of numts. Ancestral state reconstruction of 957 numt insertions in these two species and their close relatives indicated a remarkable acceleration of numt insertion in the ancestor of Geospiza and Zonotrichia, followed by slower, continued accumulation in each lineage. These numts appear to result primarily from de novo insertion, with duplication of existing numts representing a secondary pathway. Insertion events were essentially homoplasy-free, and numts appear to represent perfect rare genomic changes.
Abstract
Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein–DNA binding events, and our previous studies showed that transcription factors (TFs) use both DNA sequence and shape readout to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To test this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis, based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data, confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.
Abstract
Population genomic data can be used to infer historical effective population sizes (Ne), which help in studying the impact of past climate changes on biodiversity. Previous genome sequencing of one individual of the common bottlenose dolphin Tursiops truncatus revealed an unusual, sharp rise in Ne during the last glacial period, raising questions about the reliability, generality, underlying cause, and biological implications of this finding. Here we first verify this result by additional sampling of T. truncatus. We then sequence and analyze the genomes of its close relative, the Indo-Pacific bottlenose dolphin T. aduncus. The two species exhibit contrasting demographic changes during the last glacial period, likely through actual changes in population size and/or alterations in the level of gene flow among populations. Our findings suggest that even closely related species can respond very differently to climatic change, which makes predicting the fate of individual species under ongoing global warming a serious challenge.
Abstract
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which would imply that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, suggesting that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms that could explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested, but they failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and that hundreds of CAP alleles are protective in genotype–phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
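A minimal sketch of the discordance idea, assuming each allele already has an evolutionary probability (EP) computed from a multispecies alignment: flag alleles that the phylogeny predicts to be rare (low EP) but that are common in a present-day human population. The `Allele` record, the function name, and both cutoffs are hypothetical placeholders chosen for illustration, not the thresholds or code used in the study.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    """Hypothetical record for one missense allele."""
    name: str
    evolutionary_probability: float  # EP from a multispecies alignment, 0-1
    population_frequency: float      # observed frequency in a human population, 0-1


# Illustrative cutoffs only (assumptions, not the study's values).
EP_MAX = 0.05    # allele counts as "evolutionarily unexpected" below this EP
FREQ_MIN = 0.05  # allele counts as "common" at or above this frequency


def candidate_adaptive(alleles, ep_max=EP_MAX, freq_min=FREQ_MIN):
    """Return alleles whose low EP is discordant with a high observed frequency."""
    return [
        a for a in alleles
        if a.evolutionary_probability < ep_max
        and a.population_frequency >= freq_min
    ]


if __name__ == "__main__":
    alleles = [
        Allele("v1", 0.01, 0.30),   # unexpected by evolution, common today: flagged
        Allele("v2", 0.60, 0.30),   # expected and common: not flagged
        Allele("v3", 0.02, 0.001),  # unexpected but still rare: not flagged
    ]
    print([a.name for a in candidate_adaptive(alleles)])  # ['v1']
```

A flagged allele is only a candidate; the nonadaptive explanations listed above (demography, mutation-rate variation, and so on) would still have to be ruled out before treating it as adaptive.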
Abstract
Genomic data drive evolutionary research on the relationships and timescale of life, but the genomes of most species remain poorly sampled. Phylogenetic trees can be reconstructed reliably using small data sets, and the same has been assumed for the estimation of divergence times with molecular clocks. However, we show here that undersampling of molecular data results in a bias, termed the small sample artifact, expressed as disproportionately short branch lengths and underestimated divergence times in the youngest nodes and branches. In turn, this produces apparent increases in speciation and diversification rates towards the present. Any evolutionary analyses derived from these biased branch lengths and speciation rates will be similarly biased. The widely used timetrees from major species-rich studies of amphibians, birds, mammals, and squamate reptiles are all data-poor and show upswings in diversification rate, suggesting that their results were biased by undersampling. Our results show that greater sampling of genomes is needed for accurate estimation of times and rates, which are basic data used in ecological and evolutionary research.
Abstract
Along with tRNAs, enzymes that modify anticodon bases are a key aspect of translation across the tree of life. tRNA modifications extend wobble pairing, allowing specific (“target”) tRNAs to recognize multiple codons and cover for other (“nontarget”) tRNAs, often improving translation efficiency and accuracy. However, the detailed evolutionary history and impact of tRNA-modifying enzymes have not been analyzed. Using ancestral reconstruction of five tRNA modifications across 1093 bacteria, we show that most modifications were ancestral to eubacteria but were repeatedly lost in many lineages. Most modification losses coincided with evolutionary shifts in nontarget tRNAs, often driven by increased bias in genomic GC and associated codon use, or by genome reduction. In turn, the loss of tRNA modifications stabilized otherwise highly dynamic tRNA gene repertoires. Our work thus traces the complex history of bacterial tRNA modifications, providing the first clear evidence for their role in the evolution of bacterial translation.
Abstract
We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India, or Tibet at over 500,000 SNPs, and analyzed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and genetic adaptations to its harsh environment. The Himalayan populations resembled other South and East Asians but also displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian) and to the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identify specific positively selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 and Egl-9 Family Hypoxia Inducible Factor 1 loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.
Abstract
Dissecting the evolutionary genetic processes underlying eye reduction and vision loss in obligate cave-dwelling organisms has been a long-standing challenge in evolutionary biology. Independent vision-loss events in related subterranean organisms can provide critical insight into these processes, as well as into the nature of convergent loss of complex traits. Advances in evolutionary developmental biology have illuminated the significant role of heritable gene expression variation in the evolution of new forms. Here, we analyze gene expression variation in adult eye tissue across freshwater crayfish, representing four independent vision-loss events in caves. Species and individual expression patterns cluster by eye function rather than phylogeny, suggesting convergence in transcriptome evolution in independently blind animals. However, after accounting for phylogenetic expectations, this clustering is not greater than that observed in surface species with conserved eye function. Modeling expression evolution suggests a common increase in evolutionary rates in the blind lineages, consistent with a relaxation of the selective constraint that maintains optimal expression levels. These results provide evidence for a repeated loss of expression constraint in the transcriptomes of blind animals and suggest that convergence occurs via a similar trajectory through genetic drift.
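The abstract does not name the expression-evolution model, but one common way to formalize "constraint maintaining optimal expression levels" is an Ornstein–Uhlenbeck (OU) process, shown here only as an illustration. X_t is a lineage's expression level, θ the optimum, α the strength of the pull toward it, and σ² the rate of random change:

```latex
dX_t = \alpha\,(\theta - X_t)\,dt + \sigma\,dW_t,
\qquad
\operatorname{Var}_{\infty}(X) = \frac{\sigma^2}{2\alpha}.
```

Under this model, a higher rate σ² or a weaker pull α in blind lineages would both widen the spread of expression around the optimum; with α = 0 the process reduces to Brownian motion, whose variance grows without bound as σ²t.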
Abstract
Aging is a complex process affecting different species and individuals in different ways. Comparing genetic variation across species with their aging phenotypes will help us understand the molecular basis of aging and longevity. Although most studies on aging have so far focused on short-lived model organisms, recent comparisons of genomic, transcriptomic, and metabolomic data across lineages with different lifespans are unveiling molecular signatures associated with longevity. Here, we examine the relationship between genomic variation and maximum lifespan across primate species using two different approaches. First, we searched for parallel amino-acid mutations that co-occur with increases in longevity across the primate lineage. We identified twenty-five such amino-acid variants, several of which have been previously reported by studies with different experimental setups and in different model organisms. The genes harboring these mutations are mainly enriched in functional categories such as wound healing, blood coagulation, and cardiovascular disorders. We demonstrate that these pathways are highly enriched for pleiotropic effects, as predicted by the antagonistic pleiotropy theory of aging. The second approach focused on changes in rates of protein evolution across the primate phylogeny. Using phylogenetic generalized least squares (PGLS), we show that some genes exhibit strong correlations between their evolutionary rates and longevity-associated traits. These include genes in the sphingosine 1-phosphate pathway, PI3K signaling, and the thrombin/protease-activated receptor pathway, among other cardiovascular processes. Together, these results shed light on human senescence patterns and underscore the power of comparative genomics to identify pathways related to aging and longevity.
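For readers unfamiliar with PGLS, the estimator below is the standard generalized least squares fit with a phylogenetic covariance matrix. The Brownian-motion form of V is an assumption, since the abstract does not state which evolutionary model was used; as an illustration, y could hold each species' evolutionary rate for one gene and X an intercept column plus a longevity trait:

```latex
\hat{\beta} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y,
\qquad
V_{ik} = \sigma^2\, t_{ik},
```

where t_{ik} is the branch length shared by species i and k from the root to their most recent common ancestor, and t_{ii} is the root-to-tip length. Closely related species therefore have correlated residuals and are not treated as independent data points, as they would be under ordinary least squares.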
Abstract
Repeated evolutionary events imply underlying genetic constraints that can make evolutionary mechanisms predictable. Morphological traits are thought to evolve frequently through cis-regulatory changes because these mechanisms bypass constraints in pleiotropic genes that are reused during development. In contrast, the constraints acting on metabolic traits during evolution are less well studied. Here we show how a metabolic bottleneck gene has repeatedly adopted similar cis-regulatory solutions during evolution, likely due to its pleiotropic role integrating flux from multiple metabolic pathways. Specifically, the genes encoding phosphoglucomutase activity (PGM1/PGM2), which connect galactose catabolism to glycolysis, have gained and lost direct regulation by the transcription factor Gal4 several times during yeast evolution. Through targeted mutations of predicted Gal4-binding sites in yeast genomes, we show that this galactose-mediated regulation of PGM1/2 supports vigorous growth on galactose in multiple yeast species, including Saccharomyces uvarum and Lachancea kluyveri. Furthermore, the addition of galactose-inducible PGM1 alone is sufficient to improve growth on galactose of multiple species that lack this regulation, including Saccharomyces cerevisiae. The strong association between Gal4 regulation of PGM1/2 and galactose growth even enables remarkably accurate predictions of galactose growth phenotypes between closely related species. This repeated mode of evolution suggests that this specific cis-regulatory connection is a common way for diverse yeasts to govern flux through the pathway, likely due to the constraints imposed by this pleiotropic bottleneck gene. Because metabolic pathways are highly interconnected, we argue that cis-regulatory evolution might be widespread at pleiotropic genes that control metabolic bottlenecks and intersections.
Abstract
While the natural history of flatfish has been debated for decades, the mode of diversification of this biologically and economically important group has never been elucidated. To address this question, we assembled the largest molecular data set to date, covering more than 300 species (out of ca. 800 extant) from 13 of the 14 known families over nine genes, and employed relaxed molecular clocks to uncover their patterns of diversification. Because the fossil record of flatfish is contentious, we used sister species distributed on both sides of the American continent to calibrate clock models based on the closure of the Central American Seaway (CAS) and on their current species ranges. We show that flatfish diversified in two bouts: species that are today distributed around the equator diverged during the closure of the CAS, whereas those with a northern range diverged afterwards. This suggests a post-closure dispersal of these northern species, most likely along a trans-Arctic northern route, a hypothesis fully compatible with paleogeographic reconstructions.
Abstract
The Universal Gene Set of Life (UGSL) is common to the genomes of all extant organisms. The UGSL is small, consisting of <100 genes, and is dominated by genes encoding the translation system. Here we extend the search for biological universality to three dimensions. We characterize and quantify the universality of structure of macromolecules that are common to all of life. We determine that around 90% of prokaryotic ribosomal RNA (rRNA) forms a common core, which is the structural and functional foundation of the rRNAs of all cytoplasmic ribosomes. We have established a database, which we call the Sparse and Efficient Representation of the Extant Biology (the SEREB database). This database contains complete and cross-validated rRNA sequences of species chosen, as far as possible, to sparsely and efficiently sample all known phyla. Atomic-resolution structures of ribosomes provide data for structural comparison and for validation of sequence-based models. We developed a similarity statistic called pairing adjusted sequence entropy, which characterizes paired nucleotides by their adherence to covariation and unpaired nucleotides by conventional conservation of identity. For canonically paired nucleotides, the unit of structure is the nucleotide pair; for unpaired nucleotides, it is the nucleotide. By quantitatively defining the common core of rRNA, we systematize the conservation and divergence of the translational system across the tree of life and can begin to understand the unique evolutionary pressures that cause its universality. We explore the relationship between ribosomal size and diversity, geological time, and organismal complexity.
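The sketch below illustrates the idea behind a pairing-adjusted entropy under a simplifying assumption of our own: every Watson–Crick or G·U wobble pair counts as one "canonical" state, so compensatory substitutions that preserve pairing add no entropy, while unpaired columns are scored by ordinary Shannon entropy of nucleotide identity. It is not the published pairing adjusted sequence entropy statistic, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

# Simplifying assumption of this sketch: Watson-Crick pairs plus the G-U wobble
# are all treated as a single "canonical" structural state.
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def shannon_entropy(states):
    """Shannon entropy (bits) of the empirical distribution of `states`."""
    counts = Counter(states)
    n = sum(counts.values())
    return sum((c / n) * log2(n / c) for c in counts.values())


def unpaired_entropy(alignment, i):
    """Unpaired column i: the unit of structure is the single nucleotide."""
    return shannon_entropy(seq[i] for seq in alignment)


def paired_entropy(alignment, i, j):
    """Paired columns (i, j): the unit of structure is the nucleotide pair.

    Compensatory changes between canonical pairs (e.g. G-C -> A-U) keep the
    pair intact, so they add no entropy; non-canonical pairs do.
    """
    states = []
    for seq in alignment:
        pair = seq[i] + seq[j]
        states.append("canonical" if pair in CANONICAL_PAIRS else pair)
    return shannon_entropy(states)


if __name__ == "__main__":
    # Toy alignment; columns 0 and 4 are assumed to base-pair.
    alignment = ["GACAC", "AAGAU", "CAUAG", "GAAAU"]
    print(paired_entropy(alignment, 0, 4))  # 0.0: every row keeps a canonical pair
    print(unpaired_entropy(alignment, 2))   # 2.0: four different nucleotides
```

Scored one nucleotide at a time, column 0 of the toy alignment (G, A, C, G) would get 1.5 bits even though the pair it belongs to is structurally conserved, which is why the pair, not the nucleotide, is the natural unit for paired positions.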
Abstract
Ustilaginomycotina is home to a broad array of fungi, including important plant pathogens collectively called smut fungi. Smuts are biotrophs that produce characteristic perennating propagules called teliospores; one smut, Ustilago maydis, is a model genetic organism. Broad exploration of smut biology has been hampered by limited phylogenetic resolution of Ustilaginomycotina, as well as an overall lack of genomic data for members of this subphylum. In this study, we sequenced eight Ustilaginomycotina genomes from previously unrepresented lineages, resolved ordinal-level phylogenetic relationships for the subphylum, and performed comparative analyses. Unlike those of other Basidiomycota subphyla, all sampled Ustilaginomycotina genomes are relatively small and compact. Ancestral state reconstruction analyses indicate that teliospore formation was present at the origin of the subphylum. Divergence time estimation dates the divergence of most extant smut fungi after that of grasses (Poaceae). However, we found limited conservation of well-characterized genes related to smut pathogenesis from U. maydis, indicating that dissimilar pathogenic mechanisms exist across other smut lineages. The genomes of Malasseziomycetes are highly diverged from the other sampled Ustilaginomycotina, likely due to their unique history as mammal-associated lipophilic yeasts. Despite extensive genomic data, the phylogenetic placement of this class remains ambiguous. Although the sampled Ustilaginomycotina members lack many core enzymes for plant cell wall decomposition and starch catabolism, we identified several novel carbohydrate-active enzymes potentially related to pectin breakdown. Finally, ∼50% of Ustilaginomycotina species-specific genes are present in previously undersampled and rare lineages, highlighting the importance of exploring fungal diversity as a resource for novel gene discovery.
Abstract
Lipids are essential structural and functional components of cells. Little is known, however, about the evolution of lipid composition in different tissues. Here, we report a large-scale analysis of lipidome evolution in six tissues of 32 species representing primates, rodents, and bats. While changes in gene sequence and expression accumulate in proportion to phylogenetic distance, <2% of the lipidome evolves this way. Yet the lipids constituting this 2% cluster in specific functions shared among all tissues. Among species, humans show the largest number of species-specific lipidome differences. Many of the uniquely human lipidome features localize in the brain cortex and cluster in specific pathways implicated in cognitive disorders.
Abstract
Phenotypic plasticity results in a diversity of phenotypes from a single genotype in response to environmental cues. To understand the molecular basis of phenotypic plasticity, studies have focused on differences in gene expression levels between environmentally determined phenotypes. The extent of alternative splicing differences among environmentally determined phenotypes has been largely understudied. Here, we study alternative splicing differences among plastically produced morphs of the pea aphid using RNA-seq data. Pea aphids express two separate polyphenisms (plasticity with discrete phenotypes): a wing polyphenism consisting of winged and wingless females, and a reproductive polyphenism consisting of asexual and sexual females. We find that pea aphids alternatively splice 34% of their genes, a high percentage for invertebrates. We also find extensive use of differentially spliced events between genetically identical, polyphenic females. These differentially spliced events are enriched for exon skipping and mutually exclusive exon events that maintain the open reading frame, suggesting that polyphenic morphs use alternative splicing to produce phenotype-biased proteins. Many genes that are differentially spliced between polyphenic morphs have putative functions associated with their respective phenotypes. We find that the majority of differentially spliced genes are not differentially expressed. Our results provide a rich candidate gene list for future functional studies, including genes that would not have been considered based solely on gene expression studies, such as ensconsin in the reproductive polyphenism and CAKI in the wing polyphenism. Overall, this study suggests an important role for alternative splicing in the expression of environmentally determined phenotypes.
Abstract
Unlike most crops, which were domesticated through long periods of selection by ancient humans, horticultural plants were primarily domesticated through intentional selection over short time periods. The molecular mechanisms underlying the origin and spread of novel traits during domestication have remained largely unexplored in horticultural plants. Gloxinia (Sinningia speciosa), whose attractive peloric flowers influenced Darwin’s thinking, has been cultivated since the early 19th century, but its origin and genetic basis are currently unknown. By employing multiple experimental approaches, including genetic analysis, genotype–phenotype associations, gene expression analysis, and functional interrogations, we showed that a single gene encoding a TCP protein, SsCYC, controls both floral orientation and zygomorphy in gloxinia. We revealed that a causal mutation responsible for the development of peloric gloxinia is a 10-bp deletion in the coding sequence of SsCYC. By combining genetic inference and literature searches, we traced the putative ancestor and reconstructed the domestication path of peloric gloxinia, in which a 10-bp deletion in SsCYC under selection triggered its evolution from the wild progenitor. These results suggest that a simple genetic change in a pleiotropic gene can promote the elaboration of floral organs under intensive selection pressure.
Abstract
Horizontal gene transfer (HGT) can equip organisms with novel genes, expanding the repertoire of genetic material available for evolutionary innovation and allowing recipient lineages to colonize new environments. However, few studies have characterized the functions of HGT genes experimentally or examined postacquisition functional divergence. Here, we report the use of ancestral sequence reconstruction and heterologous expression in Saccharomyces cerevisiae to examine the evolutionary history of an oomycete transporter gene family that was horizontally acquired from fungi. We demonstrate that the inferred ancestral oomycete HGT transporter proteins and their extant descendants transport dicarboxylic acids that are intermediates of the tricarboxylic acid cycle. The substrate specificity profile of the most ancestral protein has largely been retained throughout the radiation of oomycetes, including in both plant and animal pathogens and in a free-living saprotroph, indicating that the ancestral HGT transporter function has been maintained by selection across a range of different lifestyles. No evidence of neofunctionalization in substrate specificity was detected for different HGT transporter paralogues that have different patterns of temporal expression. However, a striking expansion of substrate range was observed for one plant-pathogenic oomycete: an HGT-derived paralogue from Pythium aphanidermatum encodes a protein that enables tricarboxylic acid uptake in addition to dicarboxylic acid uptake. This demonstrates that HGT acquisitions can provide functional additions to the recipient proteome as well as foundation material for the evolution of expanded protein functions.
Abstract
Red algae (Rhodophyta) underwent two phases of large-scale genome reduction during their early evolution. Red seaweeds did not attain the genome sizes or gene inventories typical of other multicellular eukaryotes. We generated a high-quality 92.1 Mb draft genome assembly from the red seaweed Gracilariopsis chorda, including methylation and small (s)RNA data. We analyzed these and other Archaeplastida genomes to address three questions: 1) What is the role of repeats and transposable elements (TEs) in explaining Rhodophyta genome size variation? 2) What is the history of genome duplication and gene family expansion/reduction in these taxa? 3) Is there evidence for TE suppression in red algae? We find that the number of predicted genes in red algae is relatively small (4,803–13,125 genes), particularly when compared with land plants, with no evidence of polyploidization. Genome size variation is primarily explained by TE expansion, with the red seaweeds having the largest genomes. Long terminal repeat elements and DNA repeats are the major contributors to genome size growth. About 8.3% of the G. chorda genome is subject to cytosine methylation across gene bodies, promoters, and TEs, and 71.5% of TEs contain methylated DNA, with 57% of these regions associated with sRNAs. These latter results suggest a role for TE-associated sRNAs in RNA-dependent DNA methylation to facilitate silencing. We postulate that the evolution of genome size in red algae is the result of the combined action of TE spread and the concomitant emergence of its epigenetic suppression, together with other important factors such as changes in population size.
Abstract
Thermotolerance is a polygenic trait that contributes to cell survival and growth under unusually high temperatures. Although some genes associated with high-temperature growth (Htg+) have been identified, how cells accumulate mutations to achieve prolonged thermotolerance remains unclear. Here, we conducted experimental evolution of a Saccharomyces cerevisiae laboratory strain with stepwise temperature increases until it could grow at 42 °C. Whole-genome resequencing of 14 evolved strains and the parental strain revealed a total of 153 mutations in the evolved strains, including single nucleotide variants, small INDELs, and segmental duplication/deletion events. Some mutations persisted from an intermediate temperature to 42 °C, so they might be Htg+ mutations. Functional categorization of mutations revealed enrichment of exonic mutations in the SWI/SNF complex and F-type ATPase, pointing to their involvement in high-temperature tolerance. In addition, multiple mutations were found in a general stress-associated signal transduction network consisting of the Hog1-mediated pathway, the RAS-cAMP pathway, and the Rho1-Pkc1-mediated cell wall integrity pathway, implying that cells can achieve Htg+ partly by modifying existing stress regulatory mechanisms. Using pooled segregant analysis of five Htg+ phenotype-oriented pools, we inferred causative mutations for growth at 42 °C and identified those with stronger impacts on the phenotype. Finally, we experimentally validated a number of the candidate Htg+ mutations. This study increases our understanding of the genetic basis of yeast tolerance to high temperature.
Abstract
The common ancestry of archaea and eukaryotes is evident in their genome architecture. All eukaryotic and several archaeal genomes consist of multiple chromosomes, each replicated from multiple origins. Three scenarios have been proposed for the evolution of this genome architecture: 1) mutational diversification of a multi-copy chromosome; 2) capture of a new chromosome by horizontal transfer; and 3) acquisition of new origins and splitting into two replication-competent chromosomes. We report an example of the third scenario: the multi-origin chromosome of the archaeon Haloferax volcanii has split into two elements via homologous recombination. The newly generated elements are bona fide chromosomes, because each bears “chromosomal” replication origins, rRNA loci, and essential genes. The new chromosomes were stable during routine growth, but additional genetic manipulation, which involves selective bottlenecks, provoked further rearrangements. To the best of our knowledge, rearrangement of a naturally evolved prokaryotic genome to generate two new chromosomes has not been described previously.
On October 13, 2017, a sandstorm blew off the west coast of Africa, creating a plume of dust that stretched thousands of miles across the Atlantic Ocean and reached the Caribbean five days later. Each year, up to five billion tons of dust is ejected into Earth’s atmosphere, mostly from large deserts like the Sahara in Africa and the Gobi in Asia. Such dust plumes affect all regions of the planet, with some individual plumes even circling the globe.
Abstract
Ribosomes are highly abundant in cells and comprise, besides RNAs of varying lengths, 55–80 similarly sized, short proteins. This seemingly unusual composition is thought to have resulted from selection for rapid autocatalytic ribosome production. Here, we demonstrate that ribosomal protein-splitting mutations cannot accelerate ribosome production. The autocatalytic explanation is also unnecessary, because protein lengths generally decline with expression levels. Although ribosomal proteins are shorter than expected from their expression levels, they are not outliers in mean protein length or coefficient of variation among members of large protein complexes. These observations can be explained because 1) shortening proteins lowers their synthetic cost and reduces the waste from mistranslation-induced protein dysfunction and degradation, 2) such benefits rise with expression levels, and 3) members of large complexes participate in more protein–protein interactions and are therefore less tolerant of mistranslation. These and other considerations suggest that the compositional features of ribosomes originate from cellular energy economics.
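Two quantities in this argument can be made concrete. The first is the coefficient of variation of protein lengths within a complex, computed here as the sample standard deviation divided by the mean. The second is the number of amino-acid residues a cell avoids polymerizing when every copy of a protein is shortened. The Python sketch below computes both. The lengths and copy numbers are invented for illustration, and treating synthetic cost as proportional to residues polymerized is a simplifying assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(lengths: list[int]) -> float:
    """Sample standard deviation of protein lengths divided by their mean."""
    if len(lengths) < 2:
        raise ValueError("need at least two proteins to estimate variation")
    return stdev(lengths) / mean(lengths)


def residues_saved(shortening: int, copies_per_cell: int) -> int:
    """Amino acids not polymerized when every copy of a protein is shortened.

    Assumes synthetic cost is proportional to residues polymerized, so the
    saving from a fixed shortening grows linearly with expression level.
    """
    return shortening * copies_per_cell


# Hypothetical lengths (amino acids) for members of one complex.
print(round(coefficient_of_variation([100, 150, 200]), 3))  # 0.333
# Shortening by 10 residues saves 100x more in a 100x more abundant protein.
print(residues_saved(10, 1_000), residues_saved(10, 100_000))  # 10000 1000000
```

The linear scaling in `residues_saved` mirrors point 2) above: the same shortening is worth more in a highly expressed protein.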
Abstract
DNA methylation is an evolutionarily ancient epigenetic modification that is phylogenetically widespread. Comparative studies of the methylome across a diverse range of non-conventional and conventional model organisms are expected to help reveal how the landscape of DNA methylation and its functions have evolved. Here, we explore the DNA methylation profile of two species of the crustacean Daphnia using whole genome bisulfite sequencing. We then compare our data with the methylomes of two insects and two mammals to better understand the function of DNA methylation in Daphnia. Using RNA-sequencing data for all six species, we investigate the correlation between DNA methylation and gene expression. DNA methylation in Daphnia is mainly enriched within the coding regions of genes, with the highest methylation levels observed at exons 2–4. In contrast, vertebrate genomes are globally methylated, with methylation rising to its highest level at exon 2 and remaining high across the rest of the gene body. Although DNA methylation patterns differ among all species, their methylation profiles share a bimodal distribution across the genome. Genes with low levels of CpG methylation and of gene expression are mainly enriched for species-specific genes. In contrast, genes with highly methylated CpG sites are highly transcribed and evolutionarily conserved across all species. Finally, the positive correlation between methylation of internal exons and gene expression potentially points to an evolutionarily conserved mechanism, whereas the negative regulation of gene expression via methylation of promoters and exon 1 is potentially a secondary mechanism that evolved in vertebrates.
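In whole genome bisulfite sequencing, the methylation level of a CpG site is commonly estimated as the fraction of covering reads that retain a C (methylated) rather than reading as T. A rank correlation is one simple way to relate gene-level methylation to expression. The Python sketch below assumes that read counts per CpG and one expression value per gene are already available. It uses a Spearman correlation with average ranks for ties, which is an illustrative choice and not necessarily the statistic used in this study.

```python
from statistics import mean


def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of bisulfite reads at a CpG that retain C (i.e., were methylated)."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG")
    return methylated_reads / total


def gene_methylation(cpg_counts: list[tuple[int, int]]) -> float:
    """Mean methylation level over the CpGs assigned to one gene."""
    return mean(methylation_level(m, u) for m, u in cpg_counts)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: (methylated, unmethylated) read counts per CpG for three genes.
genes = [[(1, 9), (0, 10)], [(5, 5), (6, 4)], [(9, 1), (10, 0)]]
expression = [2.0, 15.0, 40.0]
levels = [gene_methylation(g) for g in genes]
print(round(spearman(levels, expression), 3))  # 1.0
```

A rank-based statistic suits this setting because per-gene methylation is bimodal rather than normally distributed.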
Abstract
In species with large population sizes such as Drosophila, natural selection may have substantial effects on genetic diversity and divergence. However, the implications of this widespread nonneutrality for standard population genetic assumptions and practices remain poorly resolved. Here, we assess the consequences of recurrent hitchhiking (RHH), in which selective sweeps occur at a given rate randomly across the genome. We use forward simulations to examine two published RHH models for D. melanogaster, reflecting relatively common/weak and rare/strong selection. We find that, unlike the rare/strong RHH model, the common/weak model entails a slight degree of Hill–Robertson interference in high recombination regions. We also find that the common/weak RHH model is more consistent with our genome-wide estimate of the proportion of substitutions fixed by natural selection between D. melanogaster and D. simulans (19%). Finally, we examine how these models of RHH might bias demographic inference. We find that these RHH scenarios can bias demographic parameter estimation, but such biases are weaker for parameters relating to recently diverged populations and for the common/weak RHH model in general. Thus, even for species with important genome-wide impacts of selective sweeps, neutralist demographic inference can have some utility in understanding the histories of recently diverged populations.
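One widely used way to estimate the proportion of substitutions fixed by positive selection is the McDonald–Kreitman-based estimator α = 1 − (Ds·Pn)/(Dn·Ps). Here Dn and Ds are nonsynonymous and synonymous fixed differences between species, and Pn and Ps are the corresponding polymorphism counts within a species. The abstract does not state which estimator produced the 19% figure, so the Python sketch below illustrates only the general idea, and the counts are invented.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimate alpha, the fraction of nonsynonymous substitutions fixed by positive selection.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive to avoid division by zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps), i.e. 1 minus the neutrality index
    return 1.0 - (ds * pn) / (dn * ps)


# Invented counts: 100 nonsynonymous and 400 synonymous fixed differences,
# 50 nonsynonymous and 250 synonymous polymorphisms.
print(round(mk_alpha(dn=100, ds=400, pn=50, ps=250), 3))  # 0.2
```

This simple form can return negative values. It is also biased downward when slightly deleterious nonsynonymous polymorphisms inflate Pn, which is why refined variants of the estimator are often preferred.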
Abstract
DNA acquisition via genetic recombination is considered advantageous because it can bring together beneficial mutations that emerge independently within a population. Furthermore, recombination is considered to contribute to the maintenance of genome stability by purging slightly deleterious mutations. The prevalence of recombination differs among prokaryotic species and depends on the accessibility of DNA transfer mechanisms. An exceptional example is the human pathogen Mycobacterium tuberculosis (MTB), for which no clear transfer mechanisms have so far been characterized and the presence of recombination is questioned. Here, we analyze completely assembled MTB genomes in search of evidence of recombination. We find that putative recombination events are enriched in strains reconstructed by reference-guided assembly and in regions with unreliable alignments. In addition, assembly and alignment artefacts introduce phylogenetic signals that conflict with the established MTB phylogeny. Our results reveal that the recombination events reported so far in MTB are likely to stem from methodological artefacts. We conclude that no reliable signal of recombination is observed in the currently available MTB genomes. Moreover, our study demonstrates the limitations of reference-guided genome assembly for phylogenetic reconstruction. Rigorously de novo assembled genomes of high quality are mandatory to distinguish true evolutionary signal from noise, in particular for low-diversity species such as MTB.
Abstract
Mycobacterium africanum consists of Lineages L5 and L6 of the Mycobacterium tuberculosis complex (MTBC) and causes human tuberculosis in specific regions of Western Africa, but is generally not transmitted in other parts of the world. Because M. africanum occupies an evolutionary position close to and between the globally dispersed Mycobacterium tuberculosis and the animal-adapted MTBC members, these lineages provide valuable insight into M. tuberculosis evolution. Here, we have collected 15 M. africanum L5 strains isolated in France over four decades. Illumina sequencing and phylogenomic analysis revealed a previously underappreciated diversity within L5, which consists of distinct sublineages. L5 strains caused relatively high levels of extrapulmonary tuberculosis and included multi- and extensively drug-resistant isolates, especially in the newly defined sublineage L5.2. The specific L5 sublineages also exhibit distinct phenotypic characteristics related to in vitro growth, protein secretion, and in vivo immunogenicity. In particular, we identified a PE_PGRS and PPE-MPTR secretion defect specific to sublineage L5.2, which was independent of PPE38. Furthermore, contrary to previous predictions, L5 isolates were able to efficiently secrete ESX-1 substrates and induce immune responses against them. These phenotypes of Type VII protein secretion and immunogenicity provide valuable information to better link genome sequences to phenotypic traits and thereby understand the evolution of the MTBC.
Abstract
We previously proposed that changes in the efficiency of protein translation are associated with autism spectrum disorders (ASDs). This hypothesis connects environmental and genetic factors, because each can alter translation efficiency. For genetic factors, we previously tested our hypothesis using a small set of ASD-associated genes, a small set of ASD-associated variants, and a statistic that quantifies how much a single nucleotide variant (SNV) in a protein coding region changes translation speed. In this study, we confirm and extend our hypothesis using a published set of 1,800 autism quartets (parents, one affected child, and one unaffected child) and genome-wide variants. We then extend the test statistic to combine translation efficiency with other possibly relevant variables: ribosome profiling data, presence/absence of CpG dinucleotides, and phylogenetic conservation. The inclusion of ribosome profiling abundances strengthens our results for male–male sibling pairs. The inclusion of CpG information strengthens our results for female–female pairs, offering insight into the significant gender differences in autism incidence. By combining the single-variant test statistic for all variants in a gene, we obtain a single gene score that evaluates how well a gene distinguishes between affected and unaffected siblings. Using statistical methods, we identify gene sets whose variants, through their effects on translation efficiency, have some power to distinguish between affected and unaffected siblings. Pathway and enrichment analyses of those gene sets suggest the importance of Wnt signaling pathways, some other cancer-related pathways, ATP binding, and ATPase pathways in the etiology of ASDs.
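The abstract describes a single-variant statistic for how much an SNV changes translation speed, which is then combined over all variants in a gene into a gene score. The details of both steps are not given here, so the Python sketch below is purely illustrative. Several choices in it are hypothetical assumptions rather than the authors' statistic: the codon speed table, the restriction to alanine codons, and the rule of summing per-variant changes and taking the affected-minus-unaffected difference.

```python
# Hypothetical relative translation speeds for the four alanine codons.
# These numbers are placeholders for illustration, NOT measured values.
CODON_SPEED: dict[str, float] = {"GCT": 1.0, "GCC": 0.8, "GCA": 0.6, "GCG": 0.3}

Variant = tuple[str, str]  # (reference codon, alternative codon)


def variant_score(variant: Variant, speed: dict[str, float] = CODON_SPEED) -> float:
    """Change in relative translation speed caused by one codon substitution."""
    ref_codon, alt_codon = variant
    return speed[alt_codon] - speed[ref_codon]


def gene_score(affected: list[Variant], unaffected: list[Variant],
               speed: dict[str, float] = CODON_SPEED) -> float:
    """Hypothetical aggregation: summed speed change in the affected sibling
    minus that in the unaffected sibling, over the variants in one gene."""
    return (sum(variant_score(v, speed) for v in affected)
            - sum(variant_score(v, speed) for v in unaffected))


print(round(gene_score([("GCT", "GCG")], [("GCC", "GCT")]), 2))  # -0.9
```

Extending the score with ribosome profiling, CpG status, or conservation, as the abstract describes, would amount to weighting each `variant_score` by those covariates. How they are combined is not specified here.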
Abstract
Many organisms have a global mechanism for dosage compensation (DC) operating along the entire male X chromosome, which equalizes gene expression on the male X with that on the two Xs in females and/or on autosomes. At the initial stage of sex chromosome evolution, however, gene-by-gene (or localized) DC may also be necessary because the degeneration of Y-linked genes occurs independently at different times. We therefore tested whether the up-regulation of X-linked genes depends on the status of their Y-linked homologs, using the young sex chromosomes, neo-X and neo-Y, in Drosophila miranda. In support of the presence of gene-by-gene DC, the extent of up-regulation in males was indeed higher for neo-X-linked genes with pseudogenized neo-Y-linked homologs than for neo-X-linked genes with functional neo-Y-linked homologs. Further molecular evolutionary analysis also supports the idea that many individual neo-X-linked genes first acquired the potential for up-regulation, which then enabled the pseudogenization of neo-Y-linked homologs, without serious deleterious effects on male fitness.
Abstract
The study of microbe domestication has witnessed major advances that contribute to a better understanding of the emergence of artificially selected phenotypes and set the foundations for their rational improvement for biotechnology. Several features make Saccharomyces cerevisiae an ideal model for such a study, notably the availability of a catalogue of signatures of artificial selection and the extensive knowledge of its biological processes. Here, we use population and comparative genomics to investigate a set of strains used for cachaça fermentation, a Brazilian beverage based on the fermentation of sugar cane juice. We ask whether the selective pressures posed by this fermentation have given rise to a domesticated lineage distinct from those already known, such as wine, beer, bread, and sake yeasts. Our results show that cachaça yeasts derive from wine yeasts that have undergone an additional round of domestication, which we define as secondary domestication. As a consequence, cachaça strains combine features of wine yeasts, such as the presence of genes relevant for wine fermentation and advantageous gene inactivations, with features of beer yeasts, like resistance to the effects of inhibitory compounds present in molasses. For other markers, like those related to sulfite resistance and biotin metabolism, our analyses revealed distributions more complex than previously reported that support the secondary domestication hypothesis. We propose a multilayered microbe domestication model encompassing not only transitions from wild to primarily domesticated populations, as in the case of wine yeasts, but also secondary domestications like those of cachaça yeasts.
Abstract
Dust and sandstorm events inject substantial quantities of foreign microorganisms into global ecosystems, with the ability to impact distant environments. The majority of these microorganisms originate from deserts and drylands, where the soil is laden with highly stress-resistant microbes capable of thriving under extreme environmental conditions, and a substantial portion of them survive long journeys through the atmosphere. This large-scale transmission of highly resilient alien microbial contaminants raises concerns with regard to the invasion of sensitive and/or pristine sink environments and to human health, concerns exacerbated by increases in the rate of desertification. Further increases in the transport of dust-associated microbiota could extend the spread of foreign microbes to new ecosystems, increase their load in present sink environments, disrupt ecosystem balance, and potentially introduce new pathogens. Our present understanding of these microorganisms, including their phylogenetic affiliations and functional significance, is insufficient to determine their impact. The purpose of this review is to provide an overview of available data regarding dust and sandstorm microbiota and their potential ramifications for human and ecosystem health. We conclude by discussing current gaps in dust and sandstorm microbiota research, and the need for collaborative studies combining high-resolution meta-omic approaches with extensive ecological time-series studies to advance the field towards a sufficient understanding of these invisible atmospheric travelers and their global ramifications.
Abstract
The globin gene superfamily has been well characterized in vertebrates; however, there has been limited research in early-diverging lineages, such as the phylum Cnidaria. This study aimed to identify globin genes in multiple cnidarian lineages and to use bioinformatic approaches to characterize the evolution, structure, and expression of these genes. Phylogenetic analyses and in silico protein predictions showed that all cnidarians have undergone an expansion of globin genes, which likely have a hexacoordinate protein structure. Our protein modeling also revealed the possibility of a single pentacoordinate globin lineage in anthozoan species. Some cnidarian globin genes displayed tissue- and development-specific expression, with very few orthologous genes similarly expressed across species. Our phylogenetic analyses also revealed that eumetazoan globin genes form a polyphyletic relationship with vertebrate globin genes. Overall, our analyses suggest that an Ngb-like and a GbX-like gene were most likely present in the globin gene repertoire of the last common ancestor of eumetazoans. The identification of a large-scale expansion and subfunctionalization of globin genes in actiniarians provides an excellent starting point to further our understanding of the evolution and function of the globin gene superfamily in early-diverging lineages.
Abstract
Drosophila guanche is a member of the obscura group that originated in the Canary Islands archipelago upon its colonization by D. subobscura. It evolved into a new species in the laurisilva, a laurel forest found in wet regions of the islands that experience only minor long-term weather fluctuations. Oceanic island endemic species such as D. guanche can become model species for investigating not only the relative roles of drift and adaptation in speciation but also how population size affects nucleotide variation. Moreover, the previous identification of two satellite DNAs in D. guanche makes this species attractive for studying how centromeric DNA evolves. As a prerequisite for establishing it as a model species suitable for addressing all these questions, we generated a high-quality D. guanche genome sequence composed of 42 cytologically mapped scaffolds, which are assembled into six super-scaffolds (one per chromosome). Comparative analysis of the D. guanche proteome with those of twelve other Drosophila species identified 151 genes that were subject to adaptive evolution in the D. guanche lineage, a subset of which are involved in flight and genome stability. For example, the Centromere Identifier (CID) protein, which interacts directly with centromeric satellite DNA, shows signals of adaptation in this species. Both genomic analyses and FISH of the two satellites are consistent with an ongoing replacement of centromeric satellite DNA in D. guanche.
Abstract
The emergence of robust single-cell ‘omics techniques enables studies of uncultivable species, allowing for the (re)discovery of diverse genomic features. In this study, we combine single-cell genomics and transcriptomics to explore genome evolution in ciliates (a clade more than 1 Gy old). Analysis of the data resulting from these single-cell ‘omics approaches shows that: 1) the description of the ciliates in the class Karyorelictea as “primitive” is inaccurate, because their somatic macronuclei contain loci of varying copy number (i.e., they have been processed by genome rearrangements from the zygotic nucleus); 2) gene-sized somatic chromosomes exist in the class Litostomatea, consistent with Balbiani’s (1890) observation of giant chromosomes in this lineage; and 3) gene scrambling exists in the underexplored Postciliodesmatophora (the classes Heterotrichea and Karyorelictea, abbreviated here as the Po-clade), one of the two major clades of ciliates. Together, these data highlight the complex evolutionary patterns underlying germline genome architectures in ciliates and provide a basis for further exploration of principles of genome evolution in diverse microbial lineages.