Molecular Biology and Evolution, Volume 35, Issue 9, 1 September 2018, Pages 2230–2239, https://doi.org/10.1093/molbev/msy123
AbstractFungi are evolutionary shape shifters and adapt quickly to new environments. Ectomycorrhizal (EM) symbioses are mutualistic associations between fungi and plants and have evolved repeatedly and independently across the fungal tree of life, suggesting lineages frequently reconfigure genome content to take advantage of open ecological niches. To date analyses of genomic mechanisms facilitating EM symbioses have involved comparisons of distantly related species, but here, we use the genomes of three EM and two asymbiotic (AS) fungi from the genus Amanita as well as an AS outgroup to study genome evolution following a single origin of symbiosis. Our aim was to identify the defining features of EM genomes, but our analyses suggest no clear differentiation of genome size, gene repertoire size, or transposable element content between EM and AS species. Phylogenetic inference of gene gains and losses suggests the transition to symbiosis was dominated by the loss of plant cell wall decomposition genes, a confirmation of previous findings. However, the same dynamic defines the AS species A. inopinata, suggesting loss is not strictly associated with origin of symbiosis. Gene expansions in the common ancestor of EM Amanita were modest, but lineage specific and large gene family expansions are found in two of the three EM extant species. Even closely related EM genomes appear to share few common features. The genetic toolkit required for symbiosis appears already encoded in the genomes of saprotrophic species, and this dynamic may explain the pervasive, recurrent evolution of ectomycorrhizal associations.
AbstractVertebrate estrogen receptors (ERs) perform numerous cell signaling and transcriptional regulatory functions. ERα (Esr1) and ERβ (Esr2) likely evolved from an ancestral receptor that duplicated and diverged at the protein and cis-regulatory levels, but the evolutionary history of ERs, including the timing of proposed duplications, remains unresolved. Here we report the identification of two distinct ERs in cartilaginous fishes and demonstrate their orthology to ERα and ERβ. Phylogenetic analyses place the ERα/ERβ duplication near the base of crown gnathostomes (jawed vertebrates). We find that ERα and ERβ from little skate (Leucoraja erinacea) and mammals share key subtype-specific residues, indicating conserved protein evolution. In contrast, jawless fishes have multiple non-orthologous Esr genes that arose by parallel duplications. Esr1 and Esr2 are expressed in subtype-specific and sexually dimorphic patterns in skate embryos, suggesting that ERs might have functioned in sexually dimorphic development before the divergence of cartilaginous and bony fishes.
AbstractEnzymes are known to fine-tune their sequences to optimize catalytic function, yet quantitative evolutionary design principles of enzymes remain elusive on the proteomic scale. Recently, it was found that the catalytic site in enzymes induces long-range evolutionary constraint, where even sites distant to the catalytic site are more conserved than expected. Given that protein-fold usage is generally different between enzymes and nonenzymes, it remains an open question to what extent this long-range evolutionary constraint in enzymes is dictated, either directly or indirectly, by the special three-dimensional structure of the enzyme. To investigate this question, we have compared evolutionary properties of enzymes with those of counterpart pseudoenzymes that share the same protein fold but are catalytically inactive. We found that the long-range evolutionary constraint observed in enzymes is significantly reduced in pseudoenzyme counterparts, despite very high structural similarity (∼1.5 Å RMSD on average). Furthermore, this significant reduction in long-range evolutionary constraint is observed even in pseudoenzyme counterparts which retain the ligand-binding ability of enzymes. Finally, the distance between the site that induces the highest gradient of sequence conservation and the pseudocatalytic site in pseudoenzymes is significantly larger than the corresponding distance in enzymes. Taken together, our results suggest that the long-range evolutionary constraint in enzymes is induced mainly by the presence of the catalytic site rather than by the special three-dimensional structure of the enzyme, and that such long-range evolutionary constraint in enzymes depends mainly on the catalytic function of the active site rather than on the ligand-binding ability of the enzyme.
AbstractA key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation–selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
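The likelihood-free idea behind ABC can be illustrated with a minimal rejection sampler. This sketch is not the CABC method described in the abstract (which combines ABC with MCMC); it is plain rejection ABC on a toy problem, and the names `simulate` and `abc_rejection`, the uniform prior, and the tolerance are all assumptions made for illustration.

```python
import random


def simulate(theta, n_sites, rng):
    """Toy simulator: number of mutated sites out of n_sites at per-site rate theta."""
    return sum(rng.random() < theta for _ in range(n_sites))


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior (an assumption)
        if abs(simulate(theta, n_sites, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior = abc_rejection(observed=30, n_sites=100, n_draws=20000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean ~ {posterior_mean:.3f}")
```

No likelihood is evaluated anywhere: the sampler only needs to simulate data, which is why the approach remains usable when across-site dependence makes the likelihood intractable.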
AbstractFor 30 years, it has been clear that angiosperm mitochondrial genomes evolve rapidly in sequence arrangement (i.e., synteny), yet absolute rates of rearrangement have not been measured in any plant group, nor is it known how much these rates vary. To investigate these issues, we sequenced and reconstructed the rearrangement history of seven mitochondrial genomes in Monsonia (Geraniaceae). We show that rearrangements (occurring mostly as inversions) not only take place at generally high rates in these genomes but also uncover significant variation in rearrangement rates. For example, the hyperactive mitochondrial genome of Monsonia ciliata has accumulated at least 30 rearrangements over the last million years, whereas the branch leading to M. ciliata and its sister species has sustained rearrangement at a rate that is at least ten times lower. Furthermore, our analysis of published data shows that rates of mitochondrial genome rearrangement in seed plants vary by at least 600-fold. We find that sites of rearrangement are highly preferentially located in very close proximity to repeated sequences in Monsonia. This provides strong support for the hypothesis that rearrangement in angiosperm mitochondrial genomes occurs largely through repeat-mediated recombination. Because there is little variation in the amount of repeat sequence among Monsonia genomes, the variable rates of rearrangement in Monsonia probably reflect variable rates of mitochondrial recombination itself. Finally, we show that mitochondrial synonymous substitutions occur in a clock-like manner in Monsonia; rates of mitochondrial substitutions and rearrangements are therefore highly uncoupled in this group.
AbstractMeiotic recombination is an evolutionary force that generates new genetic diversity upon which selection can act. Whereas multiple studies have assessed genome-wide patterns of recombination and specific cases of intragenic recombination, few studies have assessed intragenic recombination genome-wide in higher eukaryotes. We identified recombination events within or near genes in a population of maize recombinant inbred lines (RILs) using RNA-sequencing data. Our results are consistent with case studies that have shown that intragenic crossovers cluster at the 5′ ends of some genes. Further, we identified cases of intragenic crossovers that generate transgressive transcript accumulation patterns, that is, recombinant alleles displayed higher or lower levels of expression than did nonrecombinant alleles in any of ∼100 RILs, implicating intragenic recombination in the generation of new variants upon which selection can act. Thousands of apparent gene conversion events were identified, allowing us to estimate the genome-wide rate of gene conversion at SNP sites (4.9 × 10⁻⁵). The density of syntenic genes (i.e., those conserved at the same genomic locations since the divergence of maize and sorghum) exhibits a substantial correlation with crossover frequency, whereas the density of nonsyntenic genes (i.e., those which have transposed or been lost subsequent to the divergence of maize and sorghum) shows little correlation, suggesting that crossovers occur at higher rates in syntenic genes than in nonsyntenic genes. Increased rates of crossovers in syntenic genes could be either a consequence of the evolutionary conservation of synteny or a biological process that helps to maintain synteny.
AbstractThe oxymonad Monocercomonoides exilis was recently reported to be the first eukaryote that has completely lost the mitochondrial compartment. It was proposed that an important prerequisite for such a radical evolutionary step was the acquisition of the SUF Fe–S cluster assembly pathway from prokaryotes, making the mitochondrial ISC pathway dispensable. We have investigated genomic and transcriptomic data from six oxymonad species and their relatives, composing the group Preaxostyla (Metamonada, Excavata), for the presence and absence of enzymes involved in Fe–S cluster biosynthesis. None possesses enzymes of mitochondrial ISC pathway and all apparently possess the SUF pathway, composed of SufB, C, D, S, and U proteins, altogether suggesting that the transition from ISC to SUF preceded their last common ancestor. Interestingly, we observed that SufDSU were fused in all three oxymonad genomes, and in the genome of Paratrimastix pyriformis. The donor of the SUF genes is not clear from phylogenetic analyses, but the enzyme composition of the pathway and the presence of SufDSU fusion suggests Firmicutes, Thermotogae, Spirochaetes, Proteobacteria, or Chloroflexi as donors. The inventory of the downstream CIA pathway enzymes is consistent with that of closely related species that retain ISC, indicating that the switch from ISC to SUF did not markedly affect the downstream process of maturation of cytosolic and nuclear Fe–S proteins.
AbstractThe genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).
AbstractUnderstanding how microalgae adapt to rapidly changing environments is not only important to science but can help clarify the potential impact of climate change on the biology of primary producers. We sequenced and analyzed the nuclear genome of multiple Picochlorum isolates (Chlorophyta) to elucidate strategies of environmental adaptation. It was previously found that coordinated gene regulation is involved in adaptation to salinity stress, and here we show that gene gain and loss also play key roles in adaptation. We determined the extent of horizontal gene transfer (HGT) from prokaryotes and their role in the origin of novel functions in the Picochlorum clade. HGT is an ongoing and dynamic process in this algal clade with adaptation being driven by transfer, divergence, and loss. One HGT candidate that is differentially expressed under salinity stress is indolepyruvate decarboxylase that is involved in the production of a plant auxin that mediates bacteria–diatom symbiotic interactions. Large differences in levels of heterozygosity were found in diploid haplotypes among Picochlorum isolates. Biallelic divergence was pronounced in P. oklahomensis (salt plains environment) when compared with its closely related sister taxon Picochlorum SENEW3 (brackish water environment), suggesting a role of diverged alleles in response to environmental stress. Our results elucidate how microbial eukaryotes with limited gene inventories expand habitat range from mesophilic to halophilic through allelic diversity, and with minor but important contributions made by HGT. We also explore how the nature and quality of genome data may impact inference of nuclear ploidy.
AbstractMolluscan shells, mainly composed of calcium carbonate, also contain organic components such as proteins and polysaccharides. Shell organic matrices construct frameworks of shell structures and regulate crystallization processes during shell formation. To date, a number of shell matrix proteins (SMPs) have been identified, and their functions in shell formation have been studied. However, previous studies focused only on SMPs extracted from adult shells, secreted after metamorphosis. Using proteomic analyses combined with genomic and transcriptomic analyses, we have identified 31 SMPs from larval shells of the pearl oyster, Pinctada fucata, and 111 from the Pacific oyster, Crassostrea gigas. Larval SMPs are almost entirely different from those of adults in both species. RNA-seq data also confirm that gene expression profiles for larval and adult shell formation are nearly completely different. Therefore, bivalves have two repertoires of SMP genes to construct larval and adult shells. Despite considerable differences in larval and adult SMPs, some functional domains are shared by both SMP repertoires. Conserved domains include von Willebrand factor type A (VWA), chitin-binding (CB), carbonic anhydrase (CA), and acidic domains. These conserved domains are thought to play crucial roles in shell formation. Furthermore, a comprehensive survey of animal genomes revealed that the CA and VWA–CB domain-containing protein families expanded in molluscs after their separation from other lophotrochozoan lineages such as the Brachiopoda. After gene expansion, some family members were co-opted as molluscan SMPs, which may have triggered the development of mineralized shells from ancestral, nonmineralized chitinous exoskeletons.
AbstractAs are most non-European populations, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our data set. Individuals from this data set came from 24 out of 33 administrative divisions across China (including 19 provinces, 4 municipalities, and 1 autonomous region), thus allowing us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identified previously unrecognized population structure along the East–West axis of China, demonstrated a general pattern of isolation-by-distance among Han Chinese, and reported unique regional signals of admixture, such as European influences among the Northwestern provinces of China. Furthermore, we identified a number of highly differentiated, putatively adaptive, loci (e.g., MTHFR, ADH7, and FADS, among others) that may be driven by immune response, climate, and diet in the Han Chinese. Finally, we have made available allele frequency estimates stratified by administrative divisions across China in the Geography of Genetic Variant browser for the broader community. By leveraging the largest currently available genetic data set for Han Chinese, we have gained insights into the history and population structure of the world’s largest ethnic group.
AbstractHuman populations often exhibit contrasting patterns of genetic diversity in the mtDNA and the nonrecombining portion of the Y-chromosome (NRY), which reflect sex-specific cultural behaviors and population histories. Here, we sequenced 2.3 Mb of the NRY from 284 individuals representing more than 30 Native American groups from Northwestern Amazonia (NWA) and compared these data to previously generated mtDNA genomes from the same groups, to investigate the impact of cultural practices on genetic diversity and gain new insights about NWA population history. Relevant cultural practices in NWA include postmarital residential rules and linguistic exogamy, a marital practice in which men are required to marry women speaking a different language. We identified 2,969 SNPs in the NRY sequences, only 925 of which were previously described. The NRY and mtDNA data showed different sex-specific demographic histories: female effective population size has been larger than that of males through time, which might reflect larger variance in male reproductive success. Both markers show an increase in lineage diversification beginning ∼5,000 years ago, which may reflect the intensification of agriculture, technological innovations, and the expansion of regional trade networks documented in the archaeological evidence. Furthermore, we find similar excesses of NRY versus mtDNA between-population divergence at both the local and continental scale, suggesting long-term stability of female versus male migration. We also find evidence of the impact of sociocultural practices on diversity patterns. Finally, our study highlights the importance of analyzing high-resolution mtDNA and NRY sequences to reconstruct demographic history, since this can differ considerably between sexes.
AbstractBacteria regulate genes to survive antibiotic stress, but regulation can be far from perfect. When regulation is not optimal, mutations that change gene expression can contribute to antibiotic resistance. It is not systematically understood to what extent natural gene regulation is or is not optimal for distinct antibiotics, and how changes in expression of specific genes quantitatively affect antibiotic resistance. Here we discover a simple quantitative relation between fitness, gene expression, and antibiotic potency, which rationalizes our observation that a multitude of genes and even innate antibiotic defense mechanisms have expression that is critically nonoptimal under antibiotic treatment. First, we developed a pooled-strain drug-diffusion assay and screened Escherichia coli overexpression and knockout libraries, finding that resistance to a range of 31 antibiotics could result from changing expression of a large and functionally diverse set of genes, in a primarily but not exclusively drug-specific manner. Second, by synthetically controlling the expression of single-drug and multidrug resistance genes, we observed that their fitness–expression functions changed dramatically under antibiotic treatment in accordance with a log-sensitivity relation. Thus, because many genes are nonoptimally expressed under antibiotic treatment, many regulatory mutations can contribute to resistance by altering expression and by activating latent defenses.
AbstractUnder the nearly neutral theory of molecular evolution, the proportion of effectively neutral mutations is expected to depend upon the effective population size (Ne). Here, we investigate whether this is the case across the genome of Drosophila melanogaster using polymorphism data from North American and African lines. We show that the ratio of the number of nonsynonymous and synonymous polymorphisms is negatively correlated to the number of synonymous polymorphisms, even when the nonindependence is accounted for. The relationship is such that the proportion of effectively neutral nonsynonymous mutations increases by ∼45% as Ne is halved. However, we also show that this relationship is steeper than expected from an independent estimate of the distribution of fitness effects from the site frequency spectrum. We investigate a number of potential explanations for this and show, using simulation, that this is consistent with a model of genetic hitchhiking: Genetic hitchhiking depresses diversity at neutral and weakly selected sites, but has little effect on the diversity of strongly selected sites.
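The reported ∼45% increase per halving of Ne can be restated as a scaling exponent. The short calculation below assumes, purely for illustration, that the proportion of effectively neutral mutations follows a power law in Ne (proportion ∝ Ne^(−β)); that functional form is an assumption here, not something stated in the abstract, and `power_law_exponent` is a hypothetical helper name.

```python
import math


def power_law_exponent(fold_change, ne_ratio):
    """Solve fold_change = ne_ratio ** (-beta) for beta (assumed power-law scaling)."""
    return -math.log(fold_change) / math.log(ne_ratio)


# Abstract: the proportion increases by ~45% (fold change 1.45) when Ne is halved.
beta = power_law_exponent(fold_change=1.45, ne_ratio=0.5)

# Under the same assumption, quartering Ne applies the halving effect twice.
fold_change_quarter = 0.25 ** (-beta)

print(f"beta ~ {beta:.3f}; fold change when Ne is quartered ~ {fold_change_quarter:.2f}")
```

Under this assumption the exponent is about 0.54, and a fourfold reduction in Ne would roughly double the proportion of effectively neutral nonsynonymous mutations.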
AbstractPhylogeny estimation is difficult for closely related populations and species, especially if they have been exchanging genes. We present a hierarchical Bayesian, Markov-chain Monte Carlo method with a state space that includes all possible phylogenies in a full Isolation-with-Migration model framework. The method is based on a new type of genealogy augmentation called a “hidden genealogy” that enables efficient updating of the phylogeny. This is the first likelihood-based method to fully incorporate directional gene flow and genetic drift for estimation of a species or population phylogeny. Application to human hunter-gatherer populations from Africa revealed a clear phylogenetic history, with strong support for gene exchange with an unsampled ghost population, and relatively ancient divergence between a ghost population and modern human populations, consistent with human/archaic divergence. In contrast, a study of five chimpanzee populations reveals a clear phylogeny with several pairs of populations having exchanged DNA, but does not support a history with an unsampled ghost population.
AbstractAdaptive divergence between marine and freshwater (FW) environments is important in generating phyletic diversity within fishes, but the genetic basis of this process remains poorly understood. Genome selection scans can identify adaptive loci, but incomplete knowledge of genotype–phenotype connections makes interpreting their significance difficult. In contrast, association mapping (genome-wide association mapping [GWAS], random forest [RF] analyses) links genotype to phenotype, but offers limited insight into the evolutionary forces shaping variation. Here, we combined GWAS, RF, and selection scans to identify loci important in adaptation to FW environments. We utilized FW-native and brackish water (BW)-native populations of Atlantic killifish (Fundulus heteroclitus) as well as a naturally admixed population between the two. We measured morphology and multiple physiological traits that differ between populations and may contribute to osmotic adaptation (salinity tolerance, hypoxia tolerance, metabolic rate, body shape) and used a reduced representation approach for genome-wide genotyping. Our results show patterns of population divergence in physiological capabilities that are consistent with local adaptation. Population genomic scans between BW-native and FW-native populations identified genomic regions evolving by natural selection, whereas association mapping revealed loci that contribute to variation for each trait. There was substantial overlap in the genomic regions putatively under selection and loci associated with phenotypic traits, particularly for salinity tolerance, suggesting that these regions and genes are important for adaptive divergence between BW and FW environments. Together, these data provide insight into the mechanisms that enable diversification of fishes across osmotic boundaries.
AbstractCytolytic pore-forming proteins are widespread in living organisms, being mostly involved in both sides of the host–pathogen interaction, either contributing to the innate defense or promoting infection. In venomous organisms, such as spiders, insects, scorpions, and sea anemones, pore-forming proteins are often secreted as key components of the venom. Coluporins are pore-forming proteins recently discovered in the Mediterranean hematophagous snail Cumia reticulata (Colubrariidae), highly expressed in the salivary glands that discharge their secretion at close contact with the host. To understand their putative functional role, we investigated coluporins’ molecular diversity and evolutionary patterns. Coluporins is a well-diversified family including at least 30 proteins, with an overall low sequence similarity but sharing a remarkably conserved actinoporin-like predicted structure. Tracking the evolutionary history of the molluscan porin genes revealed a scattered distribution of this family, which is present in some other lineages of predatory gastropods, including venomous conoidean snails. Comparative transcriptomic analyses highlighted the expansion of porin genes as a lineage-specific feature of colubrariids. Coluporins seem to have evolved from a single ancestral porin gene present in the latest common ancestor of all Caenogastropoda, undergoing massive expansion and diversification in this colubrariid lineage through repeated gene duplication events paired with widespread episodic positive selection. As for other parasites, these findings are congruent with a “one-sided arms race,” equipping the parasite with multiple variants in order to broaden its host spectrum. Overall, our results pinpoint a crucial adaptive role for coluporins in the evolution of the peculiar trophic ecology of vampire snails.
AbstractVariola virus is at risk of re-emergence either through accidental release, bioterrorism, or synthetic biology. The use of phylogenetics and phylogeography to support epidemic field response is expected to grow as sequencing technology becomes miniaturized, cheap, and ubiquitous. In this study, we aimed to explore the use of common VARV diagnostic targets hemagglutinin (HA), cytokine response modifier B (CrmB), and A-type inclusion protein (ATI) for phylogenetic characterization as well as the representativeness of modelling strategies in phylogeography to support epidemic response should smallpox re-emerge. We used Bayesian discrete-trait phylogeography using the most complete data set currently available of whole genome (n = 51) and partially sequenced (n = 20) VARV isolates. We show that multilocus models combining HA, ATI, and CrmB genes may represent a useful heuristic to differentiate between VARV Major and subclades of VARV Minor which have been associated with variable case-fatality rates. Where whole genome sequencing is unavailable, phylogeography models of HA, ATI, and CrmB may provide preliminary but uncertain estimates of transmission, while supplementing whole genome models with additional isolates sequenced only for HA can improve sample representativeness, maintaining similar support for transmission relative to whole genome models. We have also provided empirical evidence delineating historic international VARV transmission using phylogeography. Due to the persistent threat of re-emergence, our results provide important research for smallpox epidemic preparedness in the posteradication era as recommended by the World Health Organisation.
AbstractGenes are “born,” and eventually they “die.” These processes shape the phenotypic evolution of organisms and are hence of great biological interest. If genes die in plants, they generally do so quite rapidly. Here, we describe the fate of GOA-like genes that evolve in a dramatically different manner. GOA-like genes belong to the subfamily of Bsister genes of MIKC-type MADS-box genes. Typical MIKC-type genes encode conserved transcription factors controlling plant development. We show that ABS-like genes, a clade of Bsister genes, are indeed highly conserved in crucifers (Brassicaceae) maintaining the ancestral function of Bsister genes in ovule and seed development. In contrast, their closest paralogs, the GOA-like genes, have been undergoing convergent gene death in Brassicaceae. Intriguingly, erosion of GOA-like genes occurred after millions of years of coexistence with ABS-like genes. We thus describe Delayed Convergent Asymmetric Degeneration, a so far neglected but possibly frequent pattern of duplicate gene evolution that does not fit classical scenarios. Delayed Convergent Asymmetric Degeneration of GOA-like genes may have been initiated by a reduction in the expression of an ancestral GOA-like gene in the stem group of Brassicaceae and driven by dosage subfunctionalization. Our findings have profound implications for gene annotations in genomics, interpreting patterns of gene evolution and using genes in phylogeny reconstructions of species.
Matthew Jobin, Haiko Schurz, and Brenna M. Henn
AbstractThe study of segmental duplications (SDs) and copy-number variants (CNVs) is of great importance in the fields of genomics and evolution. However, SDs and CNVs are usually excluded from genome-wide scans for natural selection. Because of high identity between copies, SDs and CNVs that are not included in reference genomes are prone to be collapsed (that is, mistakenly aligned to the same region) when aligning sequence data from single individuals to the reference. Such collapsed duplications are additionally challenging because concerted evolution between duplications alters their site frequency spectrum and linkage disequilibrium patterns. To investigate the potential effect of collapsed duplications upon natural selection scans we obtained expectations for four summary statistics from simulations of duplications evolving under a range of interlocus gene conversion and crossover rates. We confirm that summary statistics traditionally used to detect the action of natural selection on DNA sequences cannot be applied to SDs and CNVs since in some cases values for known duplications mimic selective signatures. As a proof of concept of the pervasiveness of collapsed duplications, we analyzed data from the 1000 Genomes Project. We find that, within regions identified as variable in copy number, diversity between individuals with the duplication is consistently higher than between individuals without the duplication. Furthermore, the frequency of single nucleotide variants (SNVs) deviating from Hardy–Weinberg Equilibrium is higher in individuals with the duplication, which strongly suggests that higher diversity is a consequence of collapsed duplications and incorrect evaluation of SNVs within these CNV regions.
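A deviation from Hardy–Weinberg Equilibrium of the kind mentioned above can be checked with a standard one-degree-of-freedom chi-square test. The sketch below is a generic textbook test, not the authors' pipeline; the function name `hwe_chi_square` and the genotype counts are hypothetical. The example counts mimic the heterozygote excess one might expect when reads from two paralogous copies are collapsed onto a single reference locus.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: strong heterozygote excess versus counts in exact HWE proportions.
chi2_excess, p_excess = hwe_chi_square(10, 80, 10)
chi2_hwe, p_hwe = hwe_chi_square(25, 50, 25)
print(f"excess: chi2={chi2_excess:.1f}, p={p_excess:.2e}")
print(f"HWE:    chi2={chi2_hwe:.1f}, p={p_hwe:.2f}")
```

For small samples or rare alleles an exact test is usually preferred over the chi-square approximation; the approximation is used here only to keep the sketch short.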
AbstractThe evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
AbstractAlthough Tibetans and Sherpa present several physiological adjustments evolved to cope with selective pressures imposed by the high-altitude environment, especially hypobaric hypoxia, few selective sweeps at a limited number of hypoxia-related genes were confirmed by multiple genomic studies. Nevertheless, variants at these loci were found to be associated only with downregulation of the erythropoietic cascade, which represents an indirect aspect of the considered adaptive phenotype. Accordingly, the genetic basis of Tibetan/Sherpa adaptive traits remains to be fully elucidated, in part due to limitations of selection scans implemented so far and mostly relying on the hard sweep model. In order to overcome this issue, we used whole-genome sequence data and several selection statistics as input for gene network analyses aimed at testing for the occurrence of polygenic adaptation in these high-altitude Himalayan populations. Being able to detect also subtle genomic signatures ascribable to weak positive selection at multiple genes of the same functional subnetwork, this approach allowed us to infer adaptive evolution at loci individually showing small effect sizes, but belonging to highly interconnected biological pathways overall involved in angiogenetic processes. Therefore, these findings pinpointed a series of selective events neglected so far, which likely contributed to the augmented tissue blood perfusion observed in Tibetans and Sherpa, thus uncovering the genetic determinants of a key biological mechanism that underlies their adaptation to high altitude.
AbstractMoraxella catarrhalis is a human-adapted pathogen, and a major cause of otitis media (OM) and exacerbations of chronic obstructive pulmonary disease. The species is comprised of two main phylogenetic lineages, RB1 and RB2/3. Restriction–modification (R-M) systems are among the few lineage-associated genes identified in other bacterial genera and have multiple functions including defense against foreign invading DNA, maintenance of speciation, and epigenetic regulation of gene expression. Here, we define the repertoire of R-M systems in 51 publicly available M. catarrhalis genomes and report their distribution among M. catarrhalis phylogenetic lineages. An association with phylogenetic lineage (RB1 or RB2/3) was observed for six R-M systems, which may contribute to the evolution of the lineages by restricting DNA transformation. In addition, we observed a relationship between a mutually exclusive Type I R-M system and a Type III R-M system at a single locus conserved throughout a geographically and clinically diverse set of M. catarrhalis isolates. The Type III R-M system at this locus contains the phase-variable Type III DNA methyltransferase, modM, which controls a phasevarion (phase-variable regulon). We observed an association between modM presence and OM-associated middle ear isolates, indicating a potential role for ModM-mediated epigenetic regulation in OM pathobiology.
AbstractGenomic data have provided evidence of previously unknown ancient whole genome duplications (WGDs) and highlighted the role of WGDs in the evolution of many eukaryotic lineages. Ancient WGDs often are detected by examining distributions of synonymous substitutions per site (Ks) within a genome, or “Ks plots.” For example, WGDs can be detected from Ks plots by using univariate mixture models to identify peaks in Ks distributions. We performed gene family simulation experiments to evaluate the effects of different Ks estimation methods and mixture models on our ability to detect ancient WGDs from Ks plots. The simulation experiments, which accounted for variation in substitution rates and gene duplication and loss rates across gene families, tested the effects of WGD age and gene retention rates following WGD on inferring WGDs from Ks plots. Our simulations reveal limitations of Ks plot analyses. Strict interpretations of mixture model analyses often overestimate the number of WGD events, and Ks plot analyses typically fail to detect WGDs when ≤10% of the duplicated genes are retained following the WGD. However, WGDs can accurately be characterized over an intermediate range of Ks. The simulation results are supported by empirical analyses of transcriptomic data, which also suggest that biases in gene retention likely affect our ability to detect ancient WGDs. Although our results indicate mixture model results should be interpreted with great caution, using node-averaged Ks estimates and applying more appropriate mixture models can improve the accuracy of detecting WGDs.
AbstractThermosipho species inhabit thermal environments such as marine hydrothermal vents, petroleum reservoirs, and terrestrial hot springs. A 16S rRNA phylogeny of available Thermosipho spp. sequences suggested habitat specialists adapted to living in hydrothermal vents only, and habitat generalists inhabiting oil reservoirs, hydrothermal vents, and hot springs. Comparative genomics of 15 Thermosipho genomes separated them into three distinct species with different habitat distributions: the widely distributed T. africanus and the more specialized T. melanesiensis and T. affectus. Moreover, the species can be differentiated on the basis of genome size (GS), genome content, and immune system composition. For instance, the T. africanus genomes are the largest and contain the most carbohydrate metabolism genes, which could explain why these isolates were obtained from ecologically more divergent habitats. Nonetheless, all the Thermosipho genomes, like other Thermotogae genomes, show evidence of genome streamlining. GS differences between the species could further be correlated to differences in defense capacities against foreign DNA, which influence recombination via HGT. The smallest genomes are found in T. affectus; they contain both CRISPR-cas Type I and III systems, but no RM system genes. We suggest that this has caused these genomes to be almost devoid of mobile elements, in contrast to the genomes of the other two species, which contain a higher abundance of mobile elements combined with different immune system configurations. Taken together, the comparative genomic analyses of Thermosipho spp. revealed genetic variation allowing habitat differentiation within the genus as well as differentiation with respect to invading mobile DNA.
AbstractThe colonization of novel environments often involves changes in gene expression, protein coding sequence, or both. Studies of how populations adapt to novel conditions, however, often focus on only one of these two processes, potentially missing out on the relative importance of different parts of the evolutionary process. In this study, our objectives were 1) to better understand the qualitative concordance between conclusions drawn from analyses of differential expression and changes in genic sequence and 2) to quantitatively test whether differentially expressed genes were enriched for sites putatively under positive selection within gene regions. To achieve this, we compared populations of fish (Poecilia mexicana) that have independently adapted to hydrogen-sulfide-rich environments in southern Mexico to adjacent populations residing in nonsulfidic waters. Specifically, we used RNA-sequencing data to compare both gene expression and DNA sequence differences between populations. Analyzing these two different data types led to similar conclusions about which biochemical pathways (sulfide detoxification and cellular respiration) were involved in adaptation to sulfidic environments. Additionally, we found a greater overlap between genes putatively under selection and differentially expressed genes than expected by chance. We conclude that considering both differential expression and changes in DNA sequence led to a more comprehensive understanding of how these populations adapted to extreme environmental conditions. Our results imply that changes in both gene expression and DNA sequence—sometimes at the same loci—may be involved in adaptation.
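One common way to ask whether two gene sets overlap more than expected by chance, as in the comparison described in the abstract above, is a one-sided hypergeometric test. The sketch below is only an illustration under that assumption: the abstract does not state which test the authors used, and the function name `overlap_p_value` and all counts are hypothetical.

```python
from math import comb

def overlap_p_value(n_total, n_selected, n_de, n_overlap):
    """One-sided hypergeometric p-value for a gene-set overlap.

    n_total    : number of genes tested in both analyses (the background)
    n_selected : genes putatively under positive selection
    n_de       : differentially expressed genes
    n_overlap  : genes present in both sets
    Returns P(overlap >= n_overlap) if the two sets were drawn independently.
    """
    max_overlap = min(n_selected, n_de)
    denominator = comb(n_total, n_de)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_selected, k) * comb(n_total - n_selected, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denominator

# Hypothetical toy example: 4 genes in total, 2 in each set, both shared.
p_toy = overlap_p_value(4, 2, 2, 2)
# An observed overlap of zero is never surprising, so the p-value is 1.
p_none = overlap_p_value(10, 5, 5, 0)
```

The choice of background (`n_total`) matters in practice: restricting it to genes that could have been detected by both analyses avoids inflating the apparent enrichment.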