Abstract: Some of the fastest evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched near regulatory genes, and several HARs function as developmental enhancers. To investigate whether this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA near genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration consistent with positive selection or loss of constraint is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered expression of a core set of developmental genes across ape evolution.
Abstract: Model organisms subjected to sustained experimental evolution often show levels of phenotypic differentiation that dramatically exceed the phenotypic differences observed in natural populations. Genome-wide sequencing of pooled populations then offers the opportunity to make inferences about the genes that cause these phenotypic differences. We tested, through computer simulations, the efficacy of a statistical learning technique called the “fused lasso additive model” (FLAM). We focused on the ability of FLAM to distinguish differentiated genes that directly affect a phenotype from differentiated genes that have no effect on the phenotype. FLAM can separate these two classes of genes even with relatively small samples (10 populations in total). The efficacy of FLAM improves with an increased number of populations, reduced environmental phenotypic variation, and increased within-treatment among-replicate variation. To illustrate the application of the method, FLAM was applied to SNP variation measured in both twenty-population and thirty-population studies of Drosophila subjected to selection for age-at-reproduction.
Abstract: Mitochondrial DNA sequences are frequently transferred into the nuclear genome, giving rise to numts (nuclear mitochondrial DNA segments). In the absence of whole genomes, avian numts have been suggested to be rare and relatively short. We examined 64 bird genomes to test hypotheses regarding numt frequency, distribution among taxa, and likelihood of homoplasy. We discovered 100-fold variation in numt number across species. Two songbirds, Geospiza fortis (Darwin’s finch) and Zonotrichia albicollis (white-throated sparrow), had the largest number of numts. Ancestral state reconstruction of 957 numt insertions in these two species and their close relatives indicated a remarkable acceleration of numt insertion in the ancestor of Geospiza and Zonotrichia, followed by slower, continued accumulation in each lineage. These numts appear to result primarily from de novo insertion, with the duplication of existing numts representing a secondary pathway. Insertion events were essentially homoplasy-free, and numts appear to represent perfect rare genomic changes.
Abstract: Noncoding DNA sequences, which play various roles in gene expression and regulation, are under evolutionary pressure. Gene regulation requires specific protein–DNA binding events, and our previous studies showed that both DNA sequence and shape readout are employed by transcription factors (TFs) to achieve DNA binding specificity. By investigating the shape-disrupting properties of single nucleotide polymorphisms (SNPs) in human regulatory regions, we established a link between disruptive local DNA shape changes and loss of specific TF binding. Furthermore, we described cases where disease-associated SNPs may alter TF binding through DNA shape changes. This link led us to hypothesize that local DNA shape within and around TF binding sites is under selection pressure. To verify this hypothesis, we analyzed SNP data derived from 216 natural strains of Drosophila melanogaster. Comparing SNPs located in functional and nonfunctional regions within experimentally validated cis-regulatory modules (CRMs) from D. melanogaster that are active in the blastoderm stage of development, we found that SNPs within functional regions tended to cause smaller DNA shape variations. Furthermore, SNPs with higher minor allele frequency were more likely to result in smaller DNA shape variations. The same analysis based on a large number of SNPs in putative CRMs of the D. melanogaster genome derived from DNase I accessibility data confirmed these observations. Taken together, our results indicate that common SNPs in functional regions tend to maintain DNA shape, whereas shape-disrupting SNPs are more likely to be eliminated through purifying selection.
Abstract: Population genomic data can be used to infer historical effective population sizes (Ne), which help in studying the impact of past climate changes on biodiversity. Previous genome sequencing of one individual of the common bottlenose dolphin Tursiops truncatus revealed an unusual, sharp rise in Ne during the last glacial, raising questions about the reliability, generality, underlying cause, and biological implications of this finding. Here we first verify this result by additional sampling of T. truncatus. We then sequence and analyze the genomes of its close relative, the Indo-Pacific bottlenose dolphin T. aduncus. The two species exhibit contrasting demographic changes in the last glacial, likely through actual changes in population size and/or alterations in the level of gene flow among populations. Our findings suggest that even closely related species can respond very differently to climatic changes, which makes predicting the fate of individual species under ongoing global warming a serious challenge.
Abstract: The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and hundreds of CAP alleles are protective in genotype–phenotype association data. Our integrated phylogenomic and population genetic EPA predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
Abstract: Genomic data drive evolutionary research on the relationships and timescale of life, but the genomes of most species remain poorly sampled. Phylogenetic trees can be reconstructed reliably using small data sets, and the same has been assumed for the estimation of divergence time with molecular clocks. However, we show here that undersampling of molecular data results in a bias expressed as disproportionately shorter branch lengths and underestimated divergence times in the youngest nodes and branches, termed the small sample artifact. In turn, this leads to apparent increases in speciation and diversification rates towards the present. Any evolutionary analyses derived from these biased branch lengths and speciation rates will be similarly biased. The widely used timetrees of the major species-rich studies of amphibians, birds, mammals, and squamate reptiles are all data-poor and show upswings in diversification rate, suggesting that their results were biased by undersampling. Our results show that greater sampling of genomes is needed for accurate time and rate estimation, which are basic data used in ecological and evolutionary research.
Abstract: Along with tRNAs, enzymes that modify anticodon bases are a key aspect of translation across the tree of life. tRNA modifications extend wobble pairing, allowing specific (“target”) tRNAs to recognize multiple codons and cover for other (“nontarget”) tRNAs, often improving translation efficiency and accuracy. However, the detailed evolutionary history and impact of tRNA modifying enzymes have not been analyzed. Using ancestral reconstruction of five tRNA modifications across 1093 bacteria, we show that most modifications were ancestral to eubacteria, but were repeatedly lost in many lineages. Most modification losses coincided with evolutionary shifts in nontarget tRNAs, often driven by increased bias in genomic GC and associated codon use, or by genome reduction. In turn, the loss of tRNA modifications stabilized otherwise highly dynamic tRNA gene repertoires. Our work thus traces the complex history of bacterial tRNA modifications, providing the first clear evidence for their role in the evolution of bacterial translation.
Abstract: We genotyped 738 individuals belonging to 49 populations from Nepal, Bhutan, North India, or Tibet at over 500,000 SNPs, and analyzed the genotypes in the context of available worldwide population data in order to investigate the demographic history of the region and the genetic adaptations to the harsh environment. The Himalayan populations resembled other South and East Asians, but in addition displayed their own specific ancestral component and showed strong population structure and genetic drift. We also found evidence for multiple admixture events involving Himalayan populations and South/East Asians between 200 and 2,000 years ago. In comparisons with available ancient genomes, the Himalayans, like other East and South Asian populations, showed similar genetic affinity to Eurasian hunter-gatherers (a 24,000-year-old Upper Palaeolithic Siberian), and the related Bronze Age Yamnaya. The high-altitude Himalayan populations all shared a specific ancestral component, suggesting that genetic adaptation to life at high altitude originated only once in this region and subsequently spread. Combining four approaches to identifying specific positively selected loci, we confirmed that the strongest signals of high-altitude adaptation were located near the Endothelial PAS domain-containing protein 1 and Egl-9 Family Hypoxia Inducible Factor 1 loci, and discovered eight additional robust signals of high-altitude adaptation, five of which have strong biological functional links to such adaptation. In conclusion, the demographic history of Himalayan populations is complex, with strong local differentiation, reflecting both genetic and cultural factors; these populations also display evidence of multiple genetic adaptations to high-altitude environments.
Abstract: Dissecting the evolutionary genetic processes underlying eye reduction and vision loss in obligate cave-dwelling organisms has been a long-standing challenge in evolutionary biology. Independent vision loss events in related subterranean organisms can provide critical insight into these processes as well as into the nature of convergent loss of complex traits. Advances in evolutionary developmental biology have illuminated the significant role of heritable gene expression variation in the evolution of new forms. Here, we analyze gene expression variation in adult eye tissue across the freshwater crayfish, representing four independent vision-loss events in caves. Species and individual expression patterns cluster by eye function rather than phylogeny, suggesting convergence in transcriptome evolution in independently blind animals. However, this clustering is not greater than what is observed in surface species with conserved eye function after accounting for phylogenetic expectations. Modeling expression evolution suggests that there is a common increase in evolutionary rates in the blind lineages, consistent with a relaxation of selective constraint maintaining optimal expression levels. This provides evidence for a repeated loss of expression constraint in the transcriptomes of blind animals and suggests that convergence occurs via a similar trajectory through genetic drift.
Abstract: Aging is a complex process affecting different species and individuals in different ways. Comparing genetic variation across species with their aging phenotypes will help us understand the molecular basis of aging and longevity. Although most studies on aging have so far focused on short-lived model organisms, recent comparisons of genomic, transcriptomic, and metabolomic data across lineages with different lifespans are unveiling molecular signatures associated with longevity. Here, we examine the relationship between genomic variation and maximum lifespan across primate species. We used two different approaches. First, we searched for parallel amino-acid mutations that co-occur with increases in longevity across the primate lineage. Twenty-five such amino-acid variants were identified, several of which have been previously reported by studies with different experimental setups and in different model organisms. The genes harboring these mutations are mainly enriched in functional categories such as wound healing, blood coagulation, and cardiovascular disorders. We demonstrate that these pathways are highly enriched for pleiotropic effects, as predicted by the antagonistic pleiotropy theory of aging. A second approach focused on changes in rates of protein evolution across the primate phylogeny. Using phylogenetic generalized least squares, we show that some genes exhibit strong correlations between their evolutionary rates and longevity-associated traits. These include genes in the Sphingosine 1-phosphate pathway, PI3K signaling, and the Thrombin/protease-activated receptor pathway, among other cardiovascular processes. Together, these results shed light on human senescence patterns and underscore the power of comparative genomics to identify pathways related to aging and longevity.
Abstract: Repeated evolutionary events imply underlying genetic constraints that can make evolutionary mechanisms predictable. Morphological traits are thought to evolve frequently through cis-regulatory changes because these mechanisms bypass constraints in pleiotropic genes that are reused during development. In contrast, the constraints acting on metabolic traits during evolution are less well studied. Here we show how a metabolic bottleneck gene has repeatedly adopted similar cis-regulatory solutions during evolution, likely due to its pleiotropic role integrating flux from multiple metabolic pathways. Specifically, the genes encoding phosphoglucomutase activity (PGM1/PGM2), which connect GALactose catabolism to glycolysis, have gained and lost direct regulation by the transcription factor Gal4 several times during yeast evolution. Through targeted mutations of predicted Gal4-binding sites in yeast genomes, we show that this galactose-mediated regulation of PGM1/2 supports vigorous growth on galactose in multiple yeast species, including Saccharomyces uvarum and Lachancea kluyveri. Furthermore, the addition of galactose-inducible PGM1 alone is sufficient to improve the growth on galactose of multiple species that lack this regulation, including Saccharomyces cerevisiae. The strong association between regulation of PGM1/2 by Gal4 and growth on galactose even enables remarkably accurate predictions of galactose growth phenotypes between closely related species. This repeated mode of evolution suggests that this specific cis-regulatory connection is a common way that diverse yeasts can govern flux through the pathway, likely due to the constraints imposed by this pleiotropic bottleneck gene. Since metabolic pathways are highly interconnected, we argue that cis-regulatory evolution might be widespread at pleiotropic genes that control metabolic bottlenecks and intersections.
Abstract: While the natural history of flatfish has been debated for decades, the mode of diversification of this biologically and economically important group has never been elucidated. To address this question, we assembled the largest molecular data set to date, covering >300 species (out of ca. 800 extant) from 13 of the 14 known families over nine genes, and employed relaxed molecular clocks to uncover their patterns of diversification. As the fossil record of flatfish is contentious, we used sister species distributed on both sides of the American continent to calibrate clock models based on the closure of the Central American Seaway (CAS), and on their current species range. We show that flatfish diversified in two bouts: species that are today distributed around the equator diverged during the closure of the CAS, whereas those with a northern range diverged after it. This suggests a post-CAS-closure dispersal for these northern species, most likely along a trans-Arctic northern route, a hypothesis fully compatible with paleogeographic reconstructions.
Abstract: The Universal Gene Set of Life (UGSL) is common to genomes of all extant organisms. The UGSL is small, consisting of <100 genes, and is dominated by genes encoding the translation system. Here we extend the search for biological universality to three dimensions. We characterize and quantitate the universality of structure of macromolecules that are common to all of life. We determine that around 90% of prokaryotic ribosomal RNA (rRNA) forms a common core, which is the structural and functional foundation of rRNAs of all cytoplasmic ribosomes. We have established a database, which we call the Sparse and Efficient Representation of the Extant Biology (the SEREB database). This database contains complete and cross-validated rRNA sequences of species chosen, as far as possible, to sparsely and efficiently sample all known phyla. Atomic-resolution structures of ribosomes provide data for structural comparison and validation of sequence-based models. We developed a similarity statistic called pairing adjusted sequence entropy, which characterizes paired nucleotides by their adherence to covariation and unpaired nucleotides by conventional conservation of identity. For canonically paired nucleotides the unit of structure is the nucleotide pair. For unpaired nucleotides, the unit of structure is the nucleotide. By quantitatively defining the common core of rRNA, we systematize the conservation and divergence of the translational system across the tree of life, and can begin to understand the unique evolutionary pressures that cause its universality. We explore the relationship between ribosomal size and diversity, geological time, and organismal complexity.
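As a rough illustration of the idea behind such a statistic (not the authors' actual definition of pairing adjusted sequence entropy), the sketch below computes Shannon entropy for one column of an rRNA alignment. It treats the nucleotide as the unit for unpaired positions and the nucleotide pair as the unit for paired positions. The function names, the toy alignment, and the choice to collapse all canonical pairs (Watson-Crick plus G·U wobble) into a single state are assumptions made for illustration only.

```python
import math
from collections import Counter

# Assumption: these pairs count as "canonical" (Watson-Crick plus G-U wobble).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}

def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def column_entropy(alignment, i, partner=None):
    """Entropy of alignment column i.

    Unpaired position (partner is None): the unit is the nucleotide,
    so entropy measures conventional conservation of identity.
    Paired position: the unit is the nucleotide pair (i, partner);
    every canonical pair is collapsed into one state, so a column that
    covaries while staying paired scores as fully conserved.
    """
    if partner is None:
        return shannon_entropy([seq[i] for seq in alignment])
    pairs = [(seq[i], seq[partner]) for seq in alignment]
    states = ["paired" if p in CANONICAL_PAIRS else p for p in pairs]
    return shannon_entropy(states)

# Hypothetical three-sequence alignment; columns 0 and 2 form a base pair.
toy_alignment = ["GAC", "CAG", "AAU"]
unpaired_score = column_entropy(toy_alignment, 0)           # identity varies
paired_score = column_entropy(toy_alignment, 0, partner=2)  # pairing is kept
```

In this toy case, column 0 looks variable when scored by identity alone but conserved when scored as a pair, which is the distinction the abstract draws between the two units of structure.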
Abstract: Ustilaginomycotina is home to a broad array of fungi including important plant pathogens collectively called smut fungi. Smuts, one of which, Ustilago maydis, is a model genetic organism, are biotrophs that produce characteristic perennating propagules called teliospores. Broad exploration of smut biology has been hampered by limited phylogenetic resolution of Ustilaginomycotina as well as an overall lack of genomic data for members of this subphylum. In this study, we sequenced eight Ustilaginomycotina genomes from previously unrepresented lineages, deciphered ordinal-level phylogenetic relationships for the subphylum, and performed comparative analyses. Unlike other Basidiomycota subphyla, all sampled Ustilaginomycotina genomes are relatively small and compact. Ancestral state reconstruction analyses indicate that teliospore formation was present at the origin of the subphylum. Divergence time estimation dates the divergence of most extant smut fungi after that of grasses (Poaceae). However, we found limited conservation of well-characterized genes related to smut pathogenesis from U. maydis, indicating that dissimilar pathogenic mechanisms exist across other smut lineages. The genomes of Malasseziomycetes are highly diverged from the other sampled Ustilaginomycotina, likely due to their unique history as mammal-associated lipophilic yeasts. Despite extensive genomic data, the phylogenetic placement of this class remains ambiguous. Although the sampled Ustilaginomycotina members lack many core enzymes for plant cell wall decomposition and starch catabolism, we identified several novel carbohydrate active enzymes potentially related to pectin breakdown. Finally, ∼50% of Ustilaginomycotina species-specific genes are present in previously undersampled and rare lineages, highlighting the importance of exploring fungal diversity as a resource for novel gene discovery.
Abstract: Lipids are essential structural and functional components of cells. Little is known, however, about the evolution of lipid composition in different tissues. Here, we report a large-scale analysis of lipidome evolution in six tissues of 32 species representing primates, rodents, and bats. While changes in gene sequence and expression accumulate proportionally to phylogenetic distance, <2% of the lipidome evolves this way. Yet the lipids constituting this 2% cluster in specific functions shared among all tissues. Among species, humans show the largest number of species-specific lipidome differences. Many of the uniquely human lipidome features localize in the brain cortex and cluster in specific pathways implicated in cognitive disorders.
Abstract: Phenotypic plasticity results in a diversity of phenotypes from a single genotype in response to environmental cues. To understand the molecular basis of phenotypic plasticity, studies have focused on differential gene expression levels between environmentally determined phenotypes. The extent of alternative splicing differences among environmentally determined phenotypes has largely been understudied. Here, we study alternative splicing differences among plastically produced morphs of the pea aphid using RNA-sequence data. Pea aphids express two separate polyphenisms (plasticity with discrete phenotypes): a wing polyphenism consisting of winged and wingless females and a reproduction polyphenism consisting of asexual and sexual females. We find that pea aphids alternatively splice 34% of their genes, a high percentage for invertebrates. We also find that there is extensive use of differentially spliced events between genetically identical, polyphenic females. These differentially spliced events are enriched for exon skipping and mutually exclusive exon events that maintain the open reading frame, suggesting that polyphenic morphs use alternative splicing to produce phenotype-biased proteins. Many genes that are differentially spliced between polyphenic morphs have putative functions associated with their respective phenotypes. We find that the majority of differentially spliced genes are not differentially expressed. Our results provide a rich list of candidate genes for future functional studies that would not have been considered based solely on gene expression studies, such as ensconsin in the reproductive polyphenism and CAKI in the wing polyphenism. Overall, this study suggests an important role for alternative splicing in the expression of environmentally determined phenotypes.
Abstract: Unlike most crops, which were domesticated through long periods of selection by ancient humans, horticultural plants were primarily domesticated through intentional selection over short time periods. The molecular mechanisms underlying the origin and spread of novel traits in the domestication process have remained largely unexplored in horticultural plants. Gloxinia (Sinningia speciosa), whose attractive peloric flowers influenced the thoughts of Darwin, has been cultivated since the early 19th century, but its origin and genetic basis are currently unknown. By employing multiple experimental approaches including genetic analysis, genotype–phenotype associations, gene expression analysis, and functional interrogations, we showed that a single gene encoding a TCP protein, SsCYC, controls both floral orientation and zygomorphy in gloxinia. We revealed that a causal mutation responsible for the development of peloric gloxinia lies in a 10-bp deletion in the coding sequence of SsCYC. By combining genetic inference and literature searches, we have traced the putative ancestor and reconstructed the domestication path of the peloric gloxinia, in which a 10-bp deletion in SsCYC under selection triggered its evolution from the wild progenitor. The results presented here suggest that a simple genetic change in a pleiotropic gene can promote the elaboration of floral organs under intensive selection pressure.
Abstract: Horizontal gene transfer (HGT) can equip organisms with novel genes, expanding the repertoire of genetic material available for evolutionary innovation and allowing recipient lineages to colonize new environments. However, few studies have characterized the functions of HGT genes experimentally or examined postacquisition functional divergence. Here, we report the use of ancestral sequence reconstruction and heterologous expression in Saccharomyces cerevisiae to examine the evolutionary history of an oomycete transporter gene family that was horizontally acquired from fungi. We demonstrate that the inferred ancestral oomycete HGT transporter proteins and their extant descendants transport dicarboxylic acids, which are intermediates of the tricarboxylic acid cycle. The substrate specificity profile of the most ancestral protein has largely been retained throughout the radiation of oomycetes, including in both plant and animal pathogens and in a free-living saprotroph, indicating that the ancestral HGT transporter function has been maintained by selection across a range of different lifestyles. No evidence of neofunctionalization in terms of substrate specificity was detected for different HGT transporter paralogues that have different patterns of temporal expression. However, a striking expansion of substrate range was observed for one plant pathogenic oomycete, with an HGT-derived paralogue from Pythium aphanidermatum encoding a protein that enables tricarboxylic acid uptake in addition to dicarboxylic acid uptake. This demonstrates that HGT acquisitions can provide functional additions to the recipient proteome as well as the foundation material for the evolution of expanded protein functions.
Abstract: Red algae (Rhodophyta) underwent two phases of large-scale genome reduction during their early evolution. The red seaweeds did not attain genome sizes or gene inventories typical of other multicellular eukaryotes. We generated a high-quality 92.1 Mb draft genome assembly from the red seaweed Gracilariopsis chorda, including methylation and small (s)RNA data. We analyzed these and other Archaeplastida genomes to address three questions: 1) what is the role of repeats and transposable elements (TEs) in explaining Rhodophyta genome size variation, 2) what is the history of genome duplication and gene family expansion/reduction in these taxa, and 3) is there evidence for TE suppression in red algae? We find that the number of predicted genes in red algae is relatively small (4,803–13,125 genes), particularly when compared with land plants, with no evidence of polyploidization. Genome size variation is primarily explained by TE expansion, with the red seaweeds having the largest genomes. Long terminal repeat elements and DNA repeats are the major contributors to genome size growth. About 8.3% of the G. chorda genome undergoes cytosine methylation among gene bodies, promoters, and TEs, and 71.5% of TEs contain methylated DNA, with 57% of these regions associated with sRNAs. These latter results suggest a role for TE-associated sRNAs in RNA-dependent DNA methylation to facilitate silencing. We postulate that the evolution of genome size in red algae is the result of the combined action of TE spread and the concomitant emergence of its epigenetic suppression, together with other important factors such as changes in population size.
Abstract: Thermotolerance is a polygenic trait that contributes to cell survival and growth under unusually high temperatures. Although some genes associated with high-temperature growth (Htg+) have been identified, how cells accumulate mutations to achieve prolonged thermotolerance is still unclear. Here, we conducted experimental evolution of a Saccharomyces cerevisiae laboratory strain with stepwise temperature increases until it could grow at 42 °C. Whole genome resequencing of 14 evolved strains and the parental strain revealed a total of 153 mutations in the evolved strains, including single nucleotide variants, small INDELs, and segmental duplication/deletion events. Some mutations persisted from an intermediate temperature to 42 °C, so they might be Htg+ mutations. Functional categorization of mutations revealed enrichment of exonic mutations in the SWI/SNF complex and F-type ATPase, pointing to their involvement in high-temperature tolerance. In addition, multiple mutations were found in a general stress-associated signal transduction network consisting of the Hog1-mediated pathway, the RAS-cAMP pathway, and the Rho1-Pkc1-mediated cell wall integrity pathway, implying that cells can achieve Htg+ partly through modifying existing stress regulatory mechanisms. Using pooled segregant analysis of five Htg+ phenotype-orientated pools, we inferred causative mutations for growth at 42 °C and identified those mutations with stronger impacts on the phenotype. Finally, we experimentally validated a number of the candidate Htg+ mutations. This study increases our understanding of the genetic basis of yeast tolerance to high temperature.
Abstract: The common ancestry of archaea and eukaryotes is evident in their genome architecture. All eukaryotic and several archaeal genomes consist of multiple chromosomes, each replicated from multiple origins. Three scenarios have been proposed for the evolution of this genome architecture: 1) mutational diversification of a multi-copy chromosome; 2) capture of a new chromosome by horizontal transfer; 3) acquisition of new origins and splitting into two replication-competent chromosomes. We report an example of the third scenario: the multi-origin chromosome of the archaeon Haloferax volcanii has split into two elements via homologous recombination. The newly generated elements are bona fide chromosomes, because each bears “chromosomal” replication origins, rRNA loci, and essential genes. The new chromosomes were stable during routine growth but additional genetic manipulation, which involves selective bottlenecks, provoked further rearrangements. To the best of our knowledge, rearrangement of a naturally evolved prokaryotic genome to generate two new chromosomes has not been described previously.
AbstractPseudomonas aeruginosa is an important opportunistic pathogen in hospitals, responsible for various infections that are difficult to treat due to intrinsic and acquired antibiotic resistance. Here, 20 epidemiologically unrelated strains isolated from patients in a general hospital over a time period of two decades were analyzed using whole genome sequencing. The genomes were compared in order to assess the presence of a predominant clone or sequence type (ST). No clonal structure was identified, but core genome-based single nucleotide polymorphism (SNP) analysis distinguished two major, previously identified phylogenetic groups. Interestingly, most of the older strains isolated between 1994 and 1998 harbored exoU, encoding a cytotoxic phospholipase. In contrast, most strains isolated between 2011 and 2016 were exoU-negative and phylogenetically very distinct from the older strains, suggesting a population shift of nosocomial P. aeruginosa over time. Three out of 20 strains were ST235 strains, a global high-risk clonal lineage; these carried several additional resistance determinants including aac(6’)Ib-cr encoding an aminoglycoside N-acetyltransferase that confers resistance to fluoroquinolones. Core genome comparison with ST235 strains from other parts of the world showed that the three strains clustered together with other Brazilian/Argentinean isolates. Despite this regional relatedness, the individuality of each of the three ST235 strains was revealed by core genome-based SNPs and the presence of genomic islands in the accessory genome. Similarly, strain-specific characteristics were detected for the remaining strains, indicative of individual evolutionary histories and elevated genome plasticity.
AbstractMicrorchidia (MORC) proteins have been described as epigenetic regulators and plant immune mediators in Arabidopsis. Typically, plant and animal MORC proteins contain a hallmark GHKL-type (Gyrase, Hsp90, Histidine kinase, MutL) ATPase domain in their N-terminus. Here, 356 and 83 MORC orthologues were identified in 60 plant and 27 animal genomes, respectively. Large-scale MORC sequence analyses revealed the presence of a highly conserved motif composition that we defined as the MORC domain. The MORC domain was present in both plants and animals, indicating that it originated in the common ancestor before the divergence of plants and animals. Phylogenetic analyses showed that MORC genes in both plant and animal lineages were clearly classified into two major groups, named Plants-Group I, Plants-Group II and Animals-Group I, Animals-Group II, respectively. Further analyses of MORC genes in green plants uncovered that Group I can be subdivided into Group I-1 and Group I-2. Group I-1 only contains seed plant genes, suggesting that Group I-1 and I-2 divergence occurred at least before the emergence of spermatophytes. Group I-2 and Group II have undergone several gene duplications, resulting in the expansion of the MORC gene family in angiosperms. Additionally, MORC gene expression analyses in Arabidopsis, soybean, and rice revealed a higher expression level in reproductive tissues compared with other organs, and showed divergent expression patterns for several paralogous gene pairs. Our studies offered new insights into the origins, phylogenetic relationships, and expression patterns of MORC family members in green plants, which would help to further reveal their functions as plant epigenetic regulators.
AbstractPseudogenes are a paradigm of neutral evolution and their study has the potential to reveal intrinsic mutational biases. However, this potential is mitigated by the fact that pseudogenes are quickly purged from bacterial genomes. Here, we assembled a large set of pseudogenes from genomes experiencing reductive evolution as well as functional references for which we could establish reliable phylogenetic relationships. Using this unique dataset, we identified 857 independent insertion and deletion mutations and discovered a pervasive bias towards deletions, but not insertions, with sizes that are multiples of 3 nt. We further show that selective constraints for the preservation of gene frame are unlikely to account for the observed mutational bias and propose that a mechanistic bias in alternative end-joining repair, a recombination-independent double strand break DNA repair mechanism, is responsible for the accumulation of 3n deletions.
AbstractMany insects rely on bacterial symbionts to supply essential amino acids and vitamins that are deficient in their diets, but metabolic comparisons of closely related gut bacteria in insects with different dietary preferences have not been performed. Here, we demonstrate that herbivorous ants of the genus Dolichoderus from the Peruvian Amazon host bacteria of the family Bartonellaceae, known for establishing chronic or pathogenic infections in mammals. We detected these bacteria in all studied Dolichoderus species, and found that they reside in the midgut wall, that is, the same location as many previously described nutritional endosymbionts of insects. The genomic analysis of four divergent strains infecting different Dolichoderus species revealed genes encoding pathways for nitrogen recycling and biosynthesis of several vitamins and all essential amino acids. In contrast, several biosynthetic pathways have been lost, whereas genes for the import and conversion of histidine and arginine to glutamine have been retained in the genome of a closely related gut bacterium of the carnivorous ant Harpegnathos saltator. The broad biosynthetic repertoire in Bartonellaceae of herbivorous ants resembled that of gut bacteria of honeybees that likewise feed on carbohydrate-rich diets. Taken together, the broad distribution of Bartonellaceae across Dolichoderus ants, their small genome sizes, the specific location within hosts, and the broad biosynthetic capability suggest that these bacteria are nutritional symbionts in herbivorous ants. The results highlight the important role of the host nutritional biology for the genomic evolution of the gut microbiota—and conversely, the importance of the microbiota for the nutrition of hosts.
AbstractRenibacterium salmoninarum, a slow-growing facultative intracellular pathogen belonging to the high C + G content Actinobacteria phylum, is the causative agent of bacterial kidney disease, a progressive granulomatous infection affecting salmonids worldwide. This Gram-positive bacterium has existed in the Chilean salmonid industry for >30 years, but little or no information is available regarding the virulence mechanisms and genomic characteristics of Chilean isolates. In this study, the genomes of two Chilean isolates (H-2 and DJ2R) were sequenced, a search was conducted for genes and proteins involved in virulence and pathogenicity, and the results were compared with the genome of the type strain ATCC 33209T. The genome sizes of H-2 and DJ2R are 3,155,332 bp and 3,155,228 bp, respectively. The genomes presented six ribosomal RNAs, 46 transfer RNAs, and 25 noncoding RNAs, and both had the same 56.27% G + C content described for the type strain ATCC 33209T. A total of 3,522 and 3,527 coding sequences were found for H-2 and DJ2R, respectively. Meanwhile, the ATCC 33209T type strain had 3,519 coding sequences. The in silico genome analysis revealed genes related to the tricarboxylic acid cycle, glycolysis, iron transport, and other metabolic pathways. Also, the data indicated that R. salmoninarum may have a variety of possible virulence-factor and antibiotic-resistance strategies. Interestingly, many of the genes had high identities with those of Mycobacterium species, which are known pathogenic Actinobacteria. In summary, this study provides the first insights into and initial steps towards understanding the molecular basis of antibiotic resistance, virulence mechanisms and host/environment adaptation in two Chilean R. salmoninarum isolates that contain proteins similar to those of Mycobacterium. Furthermore, important information is presented that could facilitate the development of preventive and treatment measures against R. salmoninarum in Chile and worldwide.
AbstractFunctional redundancy, understood as the functional overlap of different genes, is a double-edged sword. On the one hand, it is thought to serve as a robustness mechanism that buffers the deleterious effect of mutations hitting one of the redundant copies, thus resulting in pseudogenization. On the other hand, it is considered a source of genetic and functional innovation. In any case, genetically redundant genes are expected to show an acceleration in the rate of molecular evolution. Here, we tackle the role of functional redundancy in viral RNA genomes. To this end, we have evaluated the rates of compensatory evolution for deleterious mutations affecting an essential function, the suppression of RNA silencing plant defense, of tobacco etch potyvirus (TEV). TEV genotypes containing deleterious mutations in the presence/absence of engineered functional redundancy were evolved and the pattern of fitness and pathogenicity recovery evaluated. Genetically redundant genotypes suffered less from the effect of deleterious mutations and showed relatively minor changes in fitness and pathogenicity. By contrast, nongenetically redundant genotypes had very low fitness and pathogenicity at the beginning of the evolution experiment that were fully recovered by the end. At the molecular level, the outcome depended on the combination of the actual mutations being compensated and the presence/absence of functional redundancy. Reversions to wild-type alleles were the norm in the nonredundant genotypes while redundant ones either did not fix any mutation at all or showed a higher nonsynonymous mutational load.
AbstractThe Origin of Life Domain (OLD) is the period during which life on Earth began. Here, we derive and use a new phylogenetic algorithm to analyze Protein Families in order to reconstruct the chronological steps by which the OLD evolved. During this period, life began with the appearance of the fundamental components of life such as RNAs, DNAs, amino acids, and membranes. Chronologically, the Origin of Life preceded the Last Universal Common Ancestor, which then subsequently engendered modern life on Earth. Our phylogenetic algorithm allows us to explicitly answer previously unknown origin of life questions. Specifically, we explain and illustrate our computational methods by reconstructing the rings describing the evolution of the RNA and DNA worlds. We phylogenetically reconstruct how the RNA and DNA worlds evolved, infer the origins and chronological order of appearance of the first genetic codes, test whether the Ribosomal RNA world preceded the Membrane world, and interpret these new findings with respect to the experimental and theoretical origin of life studies by others.
AbstractThere is now ample evidence that endosymbionts can contribute to host adaptation to environmental challenges. However, how endosymbiont presence affects the adaptive trajectory and outcome of the host is still largely unexplored. In Drosophila, Wolbachia confers protection against RNA virus infection, an effect that differs between Wolbachia strains and can be targeted by selection. Adaptation to RNA virus infections is mediated by both Wolbachia and the host, raising the question of whether adaptive genetic changes in the host vary with the presence/absence of the endosymbiont. Here, we address this question using a polymorphic D. melanogaster population previously adapted to DCV infection for 35 generations in the presence of Wolbachia, from which we removed the endosymbiont and followed survival over the subsequent 20 generations of infection. After an initial severe drop, survival frequencies upon DCV selection increased significantly, as seen before in the presence of Wolbachia. Whole-genome sequencing revealed that the major genes involved in the first selection experiment, pastrel and Ubc-E2H, continued to be selected in Wolbachia-free D. melanogaster, with the frequencies of protective alleles being closer to fixation in the absence of Wolbachia. Our results suggest that heterogeneity in Wolbachia infection status may be sufficient to maintain polymorphisms even in the absence of costs.
AbstractDespite many hypotheses regarding the roles of fluorescent proteins (FPs), their biological roles and the genetic basis of FP-mediated color polymorphisms in Acropora remain unclear. In this study, we determined the genetic mechanism underlying fluorescent polymorphisms in A. digitifera. Using a high-throughput sequencing approach, we found that FP gene sequences in FP multigene family exhibit presence–absence polymorphism among individuals. A few particular sequences in short-to-middle wavelength emission and middle-to-long wavelength emission clades were highly expressed in adults, and different sequences were highly expressed in larvae. These highly expressed sequences were absent in the genomes of individuals with low total FP gene expression. In adults, presence–absence differences of the highly expressed FP sequences were consistent with measurements of emission spectra of corals, suggesting that presence–absence polymorphisms of these FP sequences contributed to the fluorescent polymorphisms. The functions of recombinant FPs encoded by highly expressed sequences in adult and larval stages were different, suggesting that expression of FP sequences with different functions may depend on the life-stage of A. digitifera. Highly expressed FP sequences exhibited presence–absence polymorphisms in subpopulations of A. digitifera, suggesting that presence–absence status is maintained during the evolution of A. digitifera subpopulations. The difference in FPs between adults and larvae and the polymorphisms of highly expressed FP genes may provide key insight into the biological roles of FPs in corals.
AbstractThis study investigated long-term substitution rate differences using three calibration points, divergences between lobe-finned vertebrates and ray-finned fish, between mammals and sauropsids, and between holosteans (gar and bowfin) and teleost fish with amino acid sequence data of 625 genes for 25 bony vertebrates. The results showed that the substitution rate was two to three times higher in the stem branches of lobe-finned vertebrates before the mammal-sauropsid divergence than in amniotes. The rate in the stem branch of ray-finned fish before the holostean-teleost fish divergence was also a few times higher than the holostean rate, whereas it was similar to or somewhat slower than the teleost fish rate. The phylogenetic relationship of coelacanth and lungfish with tetrapods was difficult to determine because of the short interval between the divergences. Considering the high rate in the stem branches, the divergences of coelacanth and lungfish from the stem branch were estimated as 408–427 Ma and 399–414 Ma, respectively, with an interval of 9–13 Myr. With the external calibration of the mammal-sauropsid split, the estimated times for ordinal divergences within eutherian mammals tend to be smaller than those in previous studies that used the calibration points within the lineage, with deeper divergences before the Cretaceous–Paleogene boundary and shallower ones after the boundary. In contrast, the estimated times within birds were larger than those of previous studies, with the divergence between Galliformes and Anseriformes ∼80 Ma and that between Galloanserae and Neoaves 110 Ma.
AbstractPolydnaviruses (PDVs) are compelling examples of viral domestication, in which wasps express a large set of genes originating from a chromosomally integrated virus to produce particles necessary for their reproductive success. Parasitoid wasps generally use PDVs as a virulence gene delivery system allowing the protection of their progeny in the body of the parasitized host. However, in the wasp Venturia canescens an independent viral domestication process led to an alternative strategy as the wasp incorporates virulence proteins in viral liposomes named virus-like particles (VLPs), instead of DNA molecules. Proteomic analysis of purified VLPs and transcriptome sequencing revealed the loss of some viral functions. In particular, the genes coding for capsid components are no longer expressed, which explains why VLPs do not incorporate DNA. Here, a thorough examination of the V. canescens genome revealed the presence of pseudogenes corresponding to most of the genes involved in lost functions. This strongly suggests that an accumulation of mutations that leads to gene-specific pseudogenization precedes the loss of viral genes observed during virus domestication. No evidence was found for block loss of collinear genes, although extensive gene order reshuffling of the viral genome was identified from comparisons between endogenous and exogenous viruses. These results provide the first insights into the early stages of large DNA virus domestication implicating massive genome reduction through gene-specific pseudogenization, a process which differs from the large deletions described for bacterial endosymbionts.
AbstractThe mutational patterns of large tandem arrays of short sequence repeats remain largely unknown, despite observations of their high levels of variation in sequence and genomic abundance within and between species. Many factors can influence the dynamics of tandem repeat evolution; however, their evolution has only been examined over a limited phylogenetic sample of taxa. Here, we use publicly available whole-genome sequencing data of 85 haploid mutation accumulation lines derived from six geographically diverse Chlamydomonas reinhardtii isolates to investigate genome-wide mutation rates and patterns in tandem repeats in this species. We find that tandem repeat composition differs among ancestral strains, both in genome-wide abundance and presence/absence of individual repeats. Estimated mutation rates (repeat copy number expansion and contraction) were high, averaging 4.3 × 10⁻⁴ per generation per single unit copy. Although orders of magnitude higher than other types of mutation previously reported in C. reinhardtii, these tandem repeat mutation rates were one order of magnitude lower than what has recently been found in Daphnia pulex, even after correcting for lower overall genome-wide satellite abundance in C. reinhardtii. Most high-abundance repeats were related to others by a single mutational step. Correlations of repeat copy number changes within genomes revealed clusters of closely related repeats that were strongly correlated positively or negatively, and similar patterns of correlation arose independently in two different mutation accumulation experiments. Together, these results paint a dynamic picture of tandem repeat evolution in this unicellular alga.
AbstractThe combined actions of proteins in networks underlie all fundamental cellular functions. Deeper insights into the dynamics of network composition across species and their functional consequences are crucial to fully understand protein network evolution. Large-scale comparative studies with high phylogenetic resolution are now feasible through the recent rise in available genomic data sets of both model and nonmodel species. Here, we focus on the polarity network, which is universally essential for cell proliferation and studied in great detail in the model organism, Saccharomyces cerevisiae. We examine 42 proteins, directly related to cell polarization, across 298 fungal strains/species to determine the composition of the network and patterns of conservation and diversification. We observe strong protein conservation for a group of 23 core proteins: >95% of all examined strains/species possess at least 14 of these core proteins, albeit in varying compositions, and none of the individual core proteins is 100% conserved. We find high levels of variation in prevalence and sequence identity in the remaining 19 proteins, resulting in distinct lineage-specific compositions of the network in the majority of strains/species. We show that the observed diversification in network composition correlates with lineage, lifestyle, and genetic distance. Yeasts, filamentous fungi, and basal unicellular fungi form distinctive groups based on these analyses, with substantial differences in their polarization networks. Our study shows that the fungal polarization network is highly dynamic, even between closely related species, and that functional conservation appears to be achieved by varying the specific components of the fungal polarization repertoire.
AbstractDuring the last decades, the mammalian genome has been proposed to have regions prone to breakage and reorganization concentrated in certain chromosomal bands that seem to correspond to evolutionary breakpoints. These bands are likely to be involved in chromosome fragility or instability. In Primates, some biomarkers of genetic damage may be associated with various degrees of genomic instability. Here, we investigated the usefulness of Sister Chromatid Exchange as a biomarker of potential sites of frequent chromosome breakage and rearrangement in Alouatta caraya, Ateles chamek, Ateles paniscus, and Cebus cay. These Neotropical species have particular genomic and chromosomal features allowing the analysis of genomic instability for comparative purposes. We determined the frequency of spontaneous induction of Sister Chromatid Exchanges and assessed the relationship between these and structural rearrangements implicated in the evolution of the primates of interest. Overall, A. caraya and C. cay presented a low proportion of statistically significant unstable bands, suggesting fairly stable genomes and the existence of some kind of protection against endogenous damage. In contrast, Ateles showed a highly significant proportion of unstable bands; these were mainly found in the rearranged regions, which is consistent with the numerous genomic reorganizations that might have occurred during the evolution of this genus.
AbstractFreshwater mussels (Bivalvia: Unionida) serve an important role as aquatic ecosystem engineers but are one of the most critically imperilled groups of animals. Here, we used a combination of sequencing strategies to assemble and annotate a draft genome of Venustaconcha ellipsiformis, which will serve as a valuable genomic resource given the ecological value and unique “doubly uniparental inheritance” mode of mitochondrial DNA transmission of freshwater mussels. The genome described here was obtained by combining high-coverage short reads (65× genome coverage of Illumina paired-end and 11× genome coverage of mate-pair sequences) with low-coverage Pacific Biosciences long reads (0.3× genome coverage). Briefly, the final scaffold assembly accounted for a total size of 1.54 Gb (366,926 scaffolds, N50 = 6.5 kb, with 2.3% of “N” nucleotides), representing 86% of the predicted genome size of 1.80 Gb, while over one third of the genome (37.5%) consisted of repeated elements and >85% of the core eukaryotic genes were recovered. Given the repeated genetic bottlenecks of V. ellipsiformis populations as a result of glaciation events, heterozygosity was also found to be remarkably low (0.6%), in contrast to most other sequenced bivalve species. Finally, we reassembled the full mitochondrial genome and found six polymorphic sites with respect to the previously published reference. This resource opens the way to comparative genomics studies to identify genes related to the unique adaptations of freshwater mussels and their distinctive mitochondrial inheritance mechanism.
AbstractDosage compensation has evolved in concert with Y-chromosome degeneration in many taxa that exhibit heterogametic sex chromosomes. Dosage compensation overcomes the biological challenge of a “half dose” of X chromosome gene transcripts in the heterogametic sex. The need to equalize gene expression of a hemizygous X with that of autosomes arises from the fact that the X chromosomes retain hundreds of functional genes that are actively transcribed in both sexes and interact with genes expressed on the autosomes. Sex determination and heterogametic sex chromosomes have evolved multiple times in Diptera, and in each case the genetic control of dosage compensation is tightly linked to sex determination. In the Anopheles gambiae species complex (Culicidae), maleness is conferred by the Y-chromosome gene Yob, which despite its conserved role between species is polymorphic in its copy number between them. Previous work demonstrated that An. gambiae s.s. males exhibit complete dosage compensation in pupal and adult stages. In the present study, we have extended this analysis to three sister species in the An. gambiae complex: An. coluzzii, An. arabiensis, and An. quadriannulatus. In addition, we analyzed dosage compensation in bi-directional F1 hybrids between these species to determine if hybridization results in the mis-regulation and disruption of dosage compensation. Our results confirm that dosage compensation operates in the An. gambiae species complex through the hypertranscription of the male X chromosome. Additionally, dosage compensation in hybrid males does not differ from parental males, indicating that hybridization does not result in the mis-regulation of dosage compensation.
AbstractAlpha satellite is the major repeated DNA element of primate centromeres. Specific evolutionary mechanisms have led to a great diversity of sequence families with peculiar genomic organization and distribution, which have until now been studied mostly in great apes. Using high throughput sequencing of alpha satellite monomers obtained by enzymatic digestion followed by computational and cytogenetic analysis, we compare here the diversity and genomic distribution of alpha satellite DNA in two related Old World monkey species, Cercopithecus pogonias and Cercopithecus solatus, which are known to have diverged about 7 Ma. Two main families of monomers, called C1 and C2, are found in both species. A detailed analysis of our data sets revealed the existence of numerous subfamilies within the centromeric C1 family. Although the most abundant subfamily is conserved between both species, our fluorescence in situ hybridization (FISH) experiments clearly show that some subfamilies are specific for each species and that their distribution is restricted to a subset of chromosomes, thereby pointing to the existence of recurrent amplification/homogenization events. The pericentromeric C2 family is very abundant on the short arm of all acrocentric chromosomes in both species, pointing to specific mechanisms that lead to this distribution. Results obtained using two different restriction enzymes are fully consistent with a predominant monomeric organization of alpha satellite DNA that coexists with higher order organization patterns in the C. pogonias genome. Our study suggests that alpha satellite DNA is highly dynamic in Cercopithecini, with recurrent appearance of new sequence variants and interchromosomal sequence transfer.
AbstractHeterotrophic plants provide evolutionarily independent, natural experiments in the genomic consequences of radically altered nutritional regimes. Here, we have sequenced and annotated the plastid genome of the endangered mycoheterotrophic orchid Hexalectris warnockii. This orchid bears a plastid genome that is ∼80% of the total length of that of the leafy, photosynthetic Phalaenopsis, and contains just over half the number of putatively functional genes of the latter. The plastid genome of H. warnockii bears pseudogenes and has experienced losses of genes encoding proteins directly (e.g., psa/psb, rbcL) and indirectly involved in photosynthesis (atp genes), suggesting it has progressed beyond the initial stages of plastome degradation, based on previous models of plastid genome evolution. Several dispersed and tandem repeats that are potentially useful as conservation genetic markers were detected. In addition, a 29-kb inversion and a significant contraction of the inverted repeat boundaries are observed in this plastome. The Hexalectris warnockii plastid genome adds to a growing body of data useful in refining evolutionary models in parasites, and provides a resource for conservation studies in these endangered orchids.