Molecular Biology and Evolution, Volume 37, Issue 11, November 2020, Pages 3292–3307, https://doi.org/10.1093/molbev/msaa139
Molecular Biology and Evolution, msaa276, https://doi.org/10.1093/molbev/msaa276
Molecular Biology and Evolution, msab028, https://doi.org/10.1093/molbev/msab028
Molecular Biology and Evolution, msab122, https://doi.org/10.1093/molbev/msab122
Abstract: Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly procedures and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here, we present new functionalities and major improvements of the BUSCO software, as well as the renewal and expansion of the underlying data sets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate BUSCO data set for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins to transcriptomes and gene sets.
Abstract: Vestimentiferan tubeworms are iconic animals that present as large habitat-forming chitinized tube bushes in deep-sea chemosynthetic ecosystems. They are gutless and depend entirely on their endosymbiotic sulfide-oxidizing chemoautotrophic bacteria for nutrition. Information on the genomes of several siboglinid endosymbionts has improved our understanding of their nutritional supplies. However, the interactions between tubeworms and their endosymbionts remain largely unclear due to a paucity of host genomes. Here, we report the chromosome-level genome of the vestimentiferan tubeworm Paraescarpia echinospica. We found that the genome has been remodeled to facilitate symbiosis through the expansion of gene families related to substrate transfer and innate immunity, suppression of apoptosis, regulation of lysosomal digestion, and protection against oxidative stress. Furthermore, the genome encodes a programmed cell death pathway that potentially controls the endosymbiont population. Our integrated genomic, transcriptomic, and proteomic analyses uncovered matrix proteins required for the formation of the chitinous tube and revealed gene family expansion and co-option as evolutionary mechanisms driving the acquisition of this unique supporting structure for deep-sea tubeworms. Overall, our study provides novel insights into the host’s support system that has enabled tubeworms to establish symbiosis, thrive in deep-sea hot vents and cold seeps, and produce the unique chitinous tubes in the deep sea.
Abstract: Microorganisms have the unique ability to survive extended periods of time in environments with extremely low levels of exploitable energy. To determine the extent to which energy limitation affects microbial evolution, we examined the molecular evolutionary dynamics of a phylogenetically diverse set of taxa over the course of 1,000 days. We found that periodic exposure to energy limitation affected the rate of molecular evolution, the accumulation of genetic diversity, and the rate of extinction. We then determined the degree to which energy limitation affected the spectrum of mutations as well as the direction of evolution at the gene level. Our results suggest that the initial depletion of energy altered the direction and rate of molecular evolution within each taxon, though after the initial depletion the rate and direction did not substantially change. However, this consistent pattern diminished when comparisons were performed across phylogenetically distant taxa, suggesting that although the dynamics of molecular evolution under energy limitation are highly generalizable across the microbial tree of life, the targets of adaptation are specific to a given taxon.
Abstract: Sensory systems are attractive evolutionary models to address how organisms adapt to local environments that can cause ecological speciation. However, tests of these evolutionary models have focused on visual, auditory, and olfactory senses. Here, we show local adaptation of bitter taste receptor genes in two neighboring populations of a wild mammal (the blind mole rat Spalax galili) that show ecological speciation in divergent soil environments. We found that basalt-type bitter receptors showed higher response intensity and sensitivity compared with chalk-type ones using both genetic and cell-based functional analyses. Such functional changes could help animals adapted to basalt soil select plants with less bitterness from diverse local foods, whereas a weaker reception to bitter taste may allow consumption of a greater range of plants for animals inhabiting chalk soil with a scarcity of food supply. Our study shows divergent selection on food resources through local adaptation of bitter receptors, and suggests that taste plays an important yet underappreciated role in speciation.
Abstract: The date palm, Phoenix dactylifera, has been a cornerstone of Middle Eastern and North African agriculture for millennia. It was first domesticated in the Persian Gulf, and its evolution appears to have been influenced by gene flow from two wild relatives, P. theophrasti, currently restricted to Crete and Turkey, and P. sylvestris, widespread from Bangladesh to the West Himalayas. Genomes of ancient date palm seeds show that gene flow from P. theophrasti to P. dactylifera may have occurred by ∼2,200 years ago, but traces of P. sylvestris could not be detected. We here integrate archeogenomics of a ∼2,100-year-old P. dactylifera leaf from Saqqara (Egypt), molecular-clock dating, and coalescence approaches with population genomic tests, to probe the hybridization between the date palm and its two closest relatives and provide minimum and maximum timestamps for its reticulated evolution. The Saqqara date palm shares a close genetic affinity with North African date palm populations, and we find clear genomic admixture from both P. theophrasti and P. sylvestris, indicating that both had contributed to the date palm genome by 2,100 years ago. Molecular clocks placed the divergence of P. theophrasti from P. dactylifera/P. sylvestris and that of P. dactylifera from P. sylvestris in the Upper Miocene, but strongly supported, conflicting topologies point to older gene flow between P. theophrasti and P. dactylifera, and between P. sylvestris and P. dactylifera. Our work highlights the ancient hybrid origin of the date palms, and prompts the investigation of the functional significance of genetic material introgressed from both close relatives, which in turn could prove useful for modern date palm breeding.
Mutation–selection phylogenetic codon models are grounded on population genetics first principles and represent a principled approach for investigating the intricate interplay between mutation, selection, and drift. In their current form, mutation–selection codon models are entirely characterized by the collection of site-specific amino-acid fitness profiles. However, thus far, they have relied on the assumption of a constant genetic drift, translating into a unique effective population size (Ne) across the phylogeny, clearly an unrealistic assumption. This assumption can be alleviated by introducing variation in Ne between lineages. In addition to Ne, the mutation rate (μ) is susceptible to vary between lineages, and both should covary with life-history traits (LHTs). This suggests that the model should more globally account for the joint evolutionary process followed by all of these lineage-specific variables (Ne, μ, and LHTs). In this direction, we introduce an extended mutation–selection model jointly reconstructing in a Bayesian Monte Carlo framework the fitness landscape across sites and long-term trends in Ne, μ, and LHTs along the phylogeny, from an alignment of DNA coding sequences and a matrix of observed LHTs in extant species. The model was tested against simulated data and applied to empirical data in mammals, isopods, and primates. The reconstructed history of Ne in these groups appears to correlate with LHTs or ecological variables in a way that suggests that the reconstruction is reasonable, at least in its global trends. On the other hand, the range of variation in Ne inferred across species is surprisingly narrow. This last point suggests that some of the assumptions of the model, in particular concerning the assumed absence of epistatic interactions between sites, are potentially problematic.
Abstract: Animals evolved a broad repertoire of innate immune sensors and downstream effector cascades for defense against RNA viruses. Yet, this system varies greatly among different bilaterian animals, masking its ancestral state. In this study, we aimed to characterize the antiviral immune response of the cnidarian Nematostella vectensis and decipher the function of the retinoic acid-inducible gene I (RIG-I)-like receptors (RLRs) known to detect viral double-stranded RNA (dsRNA) in bilaterians but activate different antiviral pathways in vertebrates and nematodes. We show that polyinosinic:polycytidylic acid (poly(I:C)), a mimic of long viral dsRNA and a primary ligand for the vertebrate RLR melanoma differentiation-associated protein 5 (MDA5), triggers a complex antiviral immune response bearing features distinctive for both vertebrate and invertebrate systems. Importantly, a well-characterized agonist of the vertebrate RIG-I receptor does not induce a significant transcriptomic response that bears the signature of the antiviral immune response, which experimentally supports the results of a phylogenetic analysis indicating clustering of the two N. vectensis RLR paralogs (NveRLRa and NveRLRb) with MDA5. Furthermore, the results of affinity assays reveal that NveRLRb binds poly(I:C) and long dsRNA and that its knockdown impairs the expression of putative downstream effector genes, including RNA interference components. Our study provides, for the first time, functional evidence for the conserved role of RLRs in initiating an immune response to dsRNA that originated before the cnidarian–bilaterian split, and lays a strong foundation for future research on the evolution of the immune responses to RNA viruses.
Abstract: The fitness cost of complex pleiotropic mutations is generally difficult to assess. On the one hand, it is necessary to identify which molecular properties are directly altered by the mutation. On the other, this alteration modifies the activity of many genetic targets with uncertain consequences. Here, we examine the possibility of addressing these challenges by identifying unique predictors of these costs. To this aim, we consider mutations in the RNA polymerase (RNAP) in Escherichia coli as a model of complex mutations. Changes in RNAP modify the global program of transcriptional regulation, with many consequences. Among these is the difficulty of decoupling the direct effect of the mutation from the response of the whole system to it, a problem that we solve quantitatively with data from a set of constitutive genes, those on which the global program acts most directly. We provide a statistical framework that incorporates the direct effects and other molecular variables linked to this program as predictors, which reveals that some genes are more suitable for determining costs than others. Therefore, we not only identify which molecular properties best anticipate fitness, but also present the paradoxical result that, despite pleiotropy, specific genes serve as more solid predictors. These results have implications for understanding the architecture of robustness in biological systems.
Abstract: UV irradiation induces the formation of cyclobutane pyrimidine dimers (CPDs) and 6-4 photoproducts in DNA. These two types of lesions can be directly photorepaired by CPD photolyases and 6-4 photolyases, respectively. Recently, a new class of 6-4 photolyases named iron–sulfur bacterial cryptochromes and photolyases (FeS-BCPs) was found, which were considered to be the ancestors of all photolyases and their homologs, the cryptochromes. However, this view is controversial, because 6-4 photoproducts constitute only ∼10–30% of the total UV-induced lesions, so primordial organisms could hardly have survived without a CPD repair enzyme. By extensive phylogenetic analyses, we identified a novel class of proteins, all from eubacteria. They have relatively high similarity to class I/III CPD photolyases, especially in the putative substrate-binding and FAD-binding regions. However, these proteins are shorter, and they lack the “N-terminal α/β domain” of normal photolyases. Therefore, we named them short photolyase-like. Nevertheless, similar to FeS-BCPs, some short photolyase-like proteins also contain four conserved cysteines, which may coordinate an iron–sulfur cluster as in FeS-BCPs. A member from Rhodococcus fascians was cloned and expressed. It was demonstrated that the protein contains a FAD cofactor and an iron–sulfur cluster, and has CPD repair activity. It was speculated that this novel class of photolyases may be the real ancestors of the cryptochrome/photolyase family.
Abstract: Bacteriophages and bacterial toxins are promising antibacterial agents to treat infections caused by multidrug-resistant (MDR) bacteria. In fact, bacteriophages have recently been successfully used to treat life-threatening infections caused by MDR bacteria (Schooley RT, Biswas B, Gill JJ, Hernandez-Morales A, Lancaster J, Lessor L, Barr JJ, Reed SL, Rohwer F, Benler S, et al. 2017. Development and use of personalized bacteriophage-based therapeutic cocktails to treat a patient with a disseminated resistant Acinetobacter baumannii infection. Antimicrob Agents Chemother. 61(10); Chan BK, Turner PE, Kim S, Mojibian HR, Elefteriades JA, Narayan D. 2018. Phage treatment of an aortic graft infected with Pseudomonas aeruginosa. Evol Med Public Health. 2018(1):60–66; Petrovic Fabijan A, Lin RCY, Ho J, Maddocks S, Ben Zakour NL, Iredell JR, Westmead Bacteriophage Therapy Team. 2020. Safety of bacteriophage therapy in severe Staphylococcus aureus infection. Nat Microbiol. 5(3):465–472). One potential problem with using these antibacterial agents is the evolution of resistance against them in the long term. Here, we studied the fitness landscape of the Escherichia coli TolC protein, an outer membrane efflux protein that is exploited by a pore-forming toxin called colicin E1 and by TLS phage (Pagie L, Hogeweg P. 1999. Colicin diversity: a result of eco-evolutionary dynamics. J Theor Biol. 196(2):251–261; Andersen C, Hughes C, Koronakis V. 2000. Chunnel vision. Export and efflux through bacterial channel-tunnels. EMBO Rep. 1(4):313–318; Koronakis V, Andersen C, Hughes C. 2001. Channel-tunnels. Curr Opin Struct Biol. 11(4):403–407; Czaran TL, Hoekstra RF, Pagie L. 2002. Chemical warfare between microbes promotes biodiversity. Proc Natl Acad Sci U S A. 99(2):786–790; Cascales E, Buchanan SK, Duché D, Kleanthous C, Lloubès R, Postle K, Riley M, Slatin S, Cavard D. 2007. Colicin biology. Microbiol Mol Biol Rev. 71(1):158–229). By systematically assessing the distribution of fitness effects of ∼9,000 single amino acid replacements in TolC using either positive (antibiotics and bile salts) or negative (colicin E1 and TLS phage) selection pressures, we quantified the evolvability of TolC. We demonstrated that TolC is highly optimized for the efflux of antibiotics and bile salts. In contrast, under colicin E1 and TLS phage selection, the TolC sequence is very sensitive to mutations. Finally, we have identified a large set of mutations in TolC that increase resistance of E. coli against colicin E1 or TLS phage without changing antibiotic susceptibility of bacterial cells. Our findings suggest that TolC is a highly evolvable target under negative selection, which may limit the potential clinical use of bacteriophages and bacterial toxins if evolutionary aspects are not taken into account.
Abstract: Emergence of resistant bacteria during antimicrobial treatment is one of the most critical and universal health threats. It is known that several stress-induced mutagenesis and heteroresistance mechanisms can enhance microbial adaptation to antibiotics. Here, we demonstrate that the pathogen Bartonella can undergo stress-induced mutagenesis despite lacking error-prone polymerases, the rpoS gene, and functional UV-induced mutagenesis. We demonstrate that Bartonella acquire de novo single mutations during rifampicin exposure at suprainhibitory concentrations at a much higher rate than expected from spontaneous fluctuations, while exhibiting a minimal heteroresistance capacity. The emerged resistant mutants acquired a single rpoB mutation, whereas no other mutations were found in their whole genome. Interestingly, the emergence of resistance in Bartonella occurred only during gradual exposure to the antibiotic, indicating that Bartonella sense and react to the changing environment. Using a mathematical model, we demonstrated that, to reproduce the experimental results, mutation rates should be transiently increased more than 1,000-fold, and a larger population size or greater heteroresistance capacity is required. RNA expression analysis suggests that the increased mutation rate is due to downregulation of key DNA repair genes (mutS, mutY, and recA), associated with DNA breaks caused by massive prophage inductions. These results provide new evidence of the hazard of antibiotic overuse in medicine and agriculture.
Abstract: The frameshift hypothesis is a widely accepted model of bird wing evolution. This hypothesis postulates a shift in positional values, or molecular-developmental identity, that caused a change in digit phenotype. The hypothesis synthesized developmental and paleontological data on wing digit homology. The “most anterior digit” (MAD) hypothesis presents an alternative view based on changes in transcriptional regulation in the limb. The molecular evidence for both hypotheses is that the MAD expresses Hoxd13 but not Hoxd11 and Hoxd12. This digit I “signature” is thought to characterize all amniotes. Here, we studied Hoxd expression patterns in a phylogenetic sample of 18 amniotes. Instead of a conserved molecular signature in digit I, we find wide variation of Hoxd11, Hoxd12, and Hoxd13 expression in digit I. Patterns of apoptosis, and Sox9 expression, a marker of the phalanx-forming region, suggest that phalanges were lost from wing digit IV because of early arrest of the phalanx-forming region followed by cell death. Finally, we show that multiple amniote lineages lost phalanges with no frameshift. Our findings suggest that the bird wing evolved by targeted loss of phalanges under selection. Consistent with our view, some recent phylogenies based on dinosaur fossils eliminate the need to postulate a frameshift in the first place. We suggest that the phenotype of the Archaeopteryx lithographica wing is also consistent with phalanx loss. More broadly, our results support a gradualist model of evolution based on tinkering with developmental gene expression.
Abstract: Dissecting the link between genetic variation and adaptive phenotypes provides outstanding opportunities to understand fundamental evolutionary processes. Here, we use a museomics approach to investigate the genetic basis and evolution of winter coat coloration morphs in least weasels (Mustela nivalis), a repeated adaptation for camouflage in mammals with seasonal pelage color moults across regions with varying winter snow. Whole-genome sequence data were obtained from biological collections and mapped onto a newly assembled reference genome for the species. Sampling represented two replicate transition zones between nivalis and vulgaris coloration morphs in Europe, which typically develop white or brown winter coats, respectively. Population analyses showed that the morph distribution across transition zones is not a by-product of historical structure. Association scans linked a 200-kb genomic region to coloration morph, which was validated by genotyping museum specimens from intermorph experimental crosses. Genotyping the wild populations narrowed down the association to pigmentation gene MC1R and pinpointed a candidate amino acid change cosegregating with coloration morph. This polymorphism replaces an ancestral leucine residue by lysine at the start of the first extracellular loop of the protein in the vulgaris morph. A selective sweep signature overlapped the association region in vulgaris, suggesting that past adaptation favored winter-brown morphs and can anchor future adaptive responses to decreasing winter snow. Using biological collections as valuable resources to study natural adaptations, our study showed a new evolutionary route generating winter color variation in mammals and that seasonal camouflage can be modulated by changes at single key genes.
Abstract: Understanding the evolutionary history of crops, including identifying wild relatives, helps to provide insight for conservation and crop breeding efforts. Cultivated Brassica oleracea has intrigued researchers for centuries due to its wide diversity in forms, which include cabbage, broccoli, cauliflower, kale, kohlrabi, and Brussels sprouts. Yet, the evolutionary history of this species remains understudied. With such different vegetables produced from a single species, B. oleracea is a model organism for understanding the power of artificial selection. Persistent challenges in the study of B. oleracea include conflicting hypotheses regarding domestication and the identity of the closest living wild relative. Using newly generated RNA-seq data for a diversity panel of 224 accessions, which represents 14 different B. oleracea crop types and nine potential wild progenitor species, we integrate phylogenetic and population genetic techniques with ecological niche modeling, archaeological, and literary evidence to examine relationships among cultivars and wild relatives to clarify the origin of this horticulturally important species. Our analyses point to the Aegean endemic B. cretica as the closest living relative of cultivated B. oleracea, supporting an origin of cultivation in the Eastern Mediterranean region. Additionally, we identify several feral lineages, suggesting that cultivated plants of this species can revert to a wild-like state with relative ease. By expanding our understanding of the evolutionary history in B. oleracea, these results contribute to a growing body of knowledge on crop domestication that will facilitate continued breeding efforts including adaptation to changing environmental conditions.
Abstract: The Peranakan Chinese are culturally unique descendants of immigrants from China who settled in the Malay Archipelago ∼300–500 years ago. Today, among large communities in Southeast Asia, the Peranakans have preserved Chinese traditions with strong influence from the local indigenous Malays. Yet, whether or to what extent genetic admixture co-occurred with the cultural mixture has been a topic of ongoing debate. We performed whole-genome sequencing (WGS) on 177 Singapore (SG) Peranakans and analyzed the data jointly with WGS data of Asian and European populations. We estimated that Peranakan Chinese inherited ∼5.62% (95% confidence interval [CI]: 4.76–6.49%) Malay ancestry, much higher than that in SG Chinese (1.08%, 0.65–1.51%), southern Chinese (0.86%, 0.50–1.23%), and northern Chinese (0.25%, 0.18–0.32%). A sex-biased admixture history, in which the Malay ancestry was contributed primarily by females, was supported by X chromosomal variants, and mitochondrial (MT) and Y haplogroups. Finally, we identified an ancient admixture event shared by Peranakan Chinese and SG Chinese ∼1,612 (95% CI: 1,345–1,923) years ago, coinciding with the settlement history of Han Chinese in southern China, apart from the recent admixture event with Malays unique to Peranakan Chinese ∼190 (159–213) years ago. These findings greatly advance our understanding of the dispersal history of Chinese and their interaction with indigenous populations in Southeast Asia.
Abstract: Major changes in chromosome number and structure are linked to a series of evolutionary phenomena, including intrinsic barriers to gene flow or suppression of recombination due to chromosomal rearrangements. However, chromosome rearrangements can also affect the fundamental dynamics of molecular evolution within populations by changing relationships between linked loci and altering rates of recombination. Here, we build a chromosome-level assembly of Eueides isabella and, together with a recent chromosome-level assembly of Dryas iulia, examine the evolutionary consequences of multiple chromosome fusions in Heliconius butterflies. These assemblies pinpoint fusion points on 10 of the 20 autosomal chromosomes and reveal striking differences in the characteristics of fused and unfused chromosomes. The ten smallest autosomes in D. iulia and E. isabella, which have each fused to a longer chromosome in Heliconius, have higher repeat and GC content, and longer introns than predicted by their chromosome length. When fused, these characteristics change to become more in line with chromosome length. The fusions also led to reduced diversity, which likely reflects increased background selection and selection against introgression between diverging populations, following a reduction in per-base recombination rate. We further show that chromosome size and fusion impact turnover rates of functional loci at a macroevolutionary scale. Together, these results provide further evidence that chromosome fusion in Heliconius likely had dramatic effects on population-level processes shaping rates of neutral and adaptive divergence. These effects may have impacted patterns of diversification in Heliconius, a classic example of an adaptive radiation.
Abstract: To investigate novel patterns and processes of protein evolution, we have focused on the metallothioneins (MTs), a singular group of metal-binding, cysteine-rich proteins that, due to their high degree of sequence diversity, still represents a “black hole” in Evolutionary Biology. We have identified and analyzed more than 160 new MTs in nonvertebrate chordates (especially in 37 species of ascidians, 4 thaliaceans, and 3 appendicularians), showing that prototypic tunicate MTs are mono-modular proteins with a pervasive preference for cadmium ions, whereas vertebrate and cephalochordate MTs are bimodular proteins with diverse metal preferences. These structural and functional differences imply a complex evolutionary history of chordate MTs (including de novo emergence of genes and domains, processes of convergent evolution, events of gene gains and losses, and recurrent amplifications of functional domains) that would stand for an unprecedented case in the field of protein evolution.
Abstract: Genetic variation is the raw material upon which selection acts. The majority of environmental conditions change over time and therefore may result in variable selective effects. How temporally fluctuating environments impact the distribution of fitness effects and in turn population diversity is an unresolved question in evolutionary biology. Here, we employed continuous culturing using chemostats to establish environments that switch periodically between different nutrient limitations and compared the dynamics of selection to static conditions. We used the pooled Saccharomyces cerevisiae haploid gene deletion collection as a synthetic model for populations comprising thousands of unique genotypes. Using barcode sequencing, we find that static environments are uniquely characterized by a small number of high-fitness genotypes that rapidly dominate the population leading to dramatic decreases in genetic diversity. By contrast, fluctuating environments are enriched in genotypes with neutral fitness effects and an absence of extreme fitness genotypes contributing to the maintenance of genetic diversity. We also identified a unique class of genotypes whose frequencies oscillate sinusoidally with a period matching the environmental fluctuation. Oscillatory behavior corresponds to large differences in short-term fitness that are not observed across long timescales pointing to the importance of balancing selection in maintaining genetic diversity in fluctuating environments. Our results are consistent with a high degree of environmental specificity in the distribution of fitness effects and the combined effects of reduced and balancing selection in maintaining genetic diversity in the presence of variable selection.
Abstract: Reconstructing the histories of complex adaptations and identifying the evolutionary mechanisms underlying their origins are two of the primary goals of evolutionary biology. Taricha newts, which contain high concentrations of the deadly toxin tetrodotoxin (TTX) as an antipredator defense, have evolved resistance to self-intoxication, which is a complex adaptation requiring changes in six paralogs of the voltage-gated sodium channel (Nav) gene family, the physiological target of TTX. Here, we reconstruct the origins of TTX self-resistance by sequencing the entire Nav gene family in newts and related salamanders. We show that moderate TTX resistance evolved early in the salamander lineage in three of the six Nav paralogs, preceding the proposed appearance of tetrodotoxic newts by ∼100 My. TTX-bearing newts possess additional unique substitutions across the entire Nav gene family that provide physiological TTX resistance. These substitutions coincide with signatures of positive selection and relaxed purifying selection, as well as gene conversion events, that together likely facilitated their evolution. We also identify a novel exon duplication within Nav1.4 encoding an expressed TTX-binding site. Two resistance-conferring changes within newts appear to have spread via nonallelic gene conversion: in one case, one codon was copied between paralogs, and in the second, multiple substitutions were homogenized between the duplicate exons of Nav1.4. Our results demonstrate that gene conversion can accelerate the coordinated evolution of gene families in response to a common selection pressure.
Abstract: Understanding the drivers of spatial patterns of genomic diversity has emerged as a major goal of evolutionary genetics. The flexibility of forward-time simulation makes it especially valuable for these efforts, allowing for the simulation of arbitrarily complex scenarios in a way that mimics how real populations evolve. Here, we present Geonomics, a Python package for performing complex, spatially explicit, landscape genomic simulations with full spatial pedigrees that dramatically reduces user workload yet remains customizable and extensible because it is embedded within a popular, general-purpose language. We show that Geonomics results are consistent with expectations for a variety of validation tests based on classic models in population genetics and then demonstrate its utility and flexibility with a trio of more complex simulation scenarios that feature polygenic selection, selection on multiple traits, simulation on complex landscapes, and nonstationary environmental change. We then discuss runtime, which is primarily sensitive to landscape raster size, memory usage, which is primarily sensitive to maximum population size and recombination rate, and other caveats related to the model’s methods for approximating recombination and movement. Taken together, our tests and demonstrations show that Geonomics provides an efficient and robust platform for population genomic simulations that capture complex spatial and evolutionary dynamics.
AbstractHow consistent are the evolutionary trajectories of sex chromosomes shortly after they form? Insights into the evolution of recombination, differentiation, and degeneration can be provided by comparing closely related species with homologous sex chromosomes. The sex chromosomes of the threespine stickleback (Gasterosteus aculeatus) and its sister species, the Japan Sea stickleback (G. nipponicus), have been well characterized. Little is known, however, about the sex chromosomes of their congener, the blackspotted stickleback (G. wheatlandi). We used pedigrees to obtain experimentally phased whole genome sequences from blackspotted stickleback X and Y chromosomes. Using multispecies gene trees and analysis of shared duplications, we demonstrate that Chromosome 19 is the ancestral sex chromosome and that its oldest stratum evolved in the common ancestor of the genus. After the blackspotted lineage diverged, its sex chromosomes experienced independent and more extensive recombination suppression, greater X–Y differentiation, and a much higher rate of Y degeneration than the other two species. These patterns may result from a smaller effective population size in the blackspotted stickleback. A recent fusion between the ancestral blackspotted stickleback Y chromosome and Chromosome 12, which produced a neo-X and neo-Y, may have been favored by the very small size of the recombining region on the ancestral sex chromosome. We identify six strata on the ancestral and neo-sex chromosomes where recombination between the X and Y ceased at different times. These results confirm that sex chromosomes can evolve large differences within and between species over short evolutionary timescales.
AbstractLivestock farming across the world is constantly threatened by the evolutionary turnover of foot-and-mouth disease virus (FMDV) strains in endemic systems, the underlying dynamics of which remain to be elucidated. Here, we map the eco-evolutionary landscape of cocirculating FMDV lineages within an important endemic virus pool encompassing Western, Central, and parts of Southern Asia, reconstructing the evolutionary history and spatial dynamics over the last 20 years that shape the current epidemiological situation. We demonstrate that new FMDV variants periodically emerge from Southern Asia, precipitating waves of virus incursions that systematically travel in a westerly direction. We evidence how metapopulation dynamics drive the emergence and extinction of spatially structured virus populations, and how transmission in different host species regulates the evolutionary space of virus serotypes. Our work provides the first integrative framework that defines coevolutionary signatures of FMDV in regional contexts to help understand the complex interplay between virus phenotypes, host characteristics, and key epidemiological determinants of transmission that drive FMDV evolution in endemic settings.
AbstractTransposable elements (TEs) are an important source of genetic variation, with dynamics and content that differ greatly across a wide range of species. The origin of intraspecific variation in TE content is not always clear, and little is known about its precise nature. Here, we surveyed the species-wide content of the Ty LTR-retrotransposons in a broad collection of 1,011 Saccharomyces cerevisiae natural isolates to understand what underlies variation in the repertoire, that is, the type and number of Ty elements. We have compiled an exhaustive catalog of all the TE sequence variants present in the S. cerevisiae species by identifying a large set of new sequence variants. The characterization of the TE content in each isolate clearly highlighted that each subpopulation exhibits a unique and specific repertoire, retracing the evolutionary history of the species. Most interestingly, we have shown that ancient interspecific hybridization events had a major impact on the birth of new sequence variants and therefore on the shaping of the TE repertoires. We also investigated the transpositional activity of these elements in a large set of natural isolates, and we found a broad variability related to the level of ploidy as well as the genetic background. Overall, our results indicate that the evolution of the Ty content is deeply impacted by clade-specific events such as introgressions and therefore follows the population structure. In addition, our study lays the foundation for future investigations to better understand the transpositional regulation and more broadly the TE–host interactions.
AbstractA key component of pathogen-specific adaptive immunity in vertebrates is the presentation of pathogen-derived antigenic peptides by major histocompatibility complex (MHC) molecules. The excessive polymorphism observed at MHC genes is widely presumed to result from the need to recognize diverse pathogens, a process called pathogen-driven balancing selection. This process assumes that pathogens differ in their peptidomes (the pool of short peptides derived from the pathogen’s proteome), so that different pathogens select for different MHC variants with distinct peptide-binding properties. Here, we tested this assumption in a comprehensive data set of 51.9 million peptides, derived from the peptidomes of 36 representative human pathogens. Strikingly, we found that 39.7% of the 630 pairwise comparisons among pathogens yielded not a single shared peptide and only 1.8% of pathogen pairs shared more than 1% of their peptides. Indeed, 98.8% of all peptides were unique to a single pathogen species. Using computational binding prediction to characterize the binding specificities of 321 common human MHC class-I variants, we investigated quantitative differences among MHC variants with regard to binding peptides from distinct pathogens. Our analysis showed signatures of specialization toward specific pathogens especially by MHC variants with narrow peptide-binding repertoires. This supports the hypothesis that such fastidious MHC variants might be maintained in the population because they provide an advantage against particular pathogens. Overall, our results establish a key selection factor for the excessive allelic diversity at MHC genes observed in natural populations and illuminate the evolution of variable peptide-binding repertoires among MHC variants.
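As a rough illustration of the pairwise comparison described in this abstract, the following minimal sketch counts peptides shared between pathogen peptidomes. The peptide length of 9, the definition of the shared fraction (relative to the smaller peptidome), and the toy sequences are all assumptions made for illustration; none of them is taken from the study.

```python
from itertools import combinations
from math import comb


def peptides(proteome, k=9):
    """Return the set of all length-k peptides in a list of protein sequences.

    k=9 is an assumed peptide length, chosen only for illustration.
    """
    return {seq[i:i + k] for seq in proteome for i in range(len(seq) - k + 1)}


def shared_fraction(a, b):
    """Fraction of the smaller peptidome that also occurs in the other one
    (an assumed definition, not necessarily the one used in the study)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy proteomes: hypothetical sequences, not real pathogen data.
toy = {
    "pathogen_A": ["MKTAYIAKQRQISFVK"],
    "pathogen_B": ["MKTAYIAKQLLLLLLL"],
    "pathogen_C": ["GGGGSSSSPPPPWWWW"],
}

peptidomes = {name: peptides(seqs) for name, seqs in toy.items()}

# One comparison per unordered pair of pathogens.
sharing = {
    (x, y): shared_fraction(peptidomes[x], peptidomes[y])
    for x, y in combinations(sorted(peptidomes), 2)
}

# 36 pathogens give 36 * 35 / 2 unordered pairs, matching the 630 comparisons.
n_pairwise_36 = comb(36, 2)
```

Storing each peptidome as a set keeps every pairwise comparison to a single set intersection, which is the simplest way to scale this idea to all pathogen pairs.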
AbstractMost empirical studies of linkage disequilibrium (LD) study its magnitude, ignoring its sign. Here, we examine patterns of signed LD in two population genomic data sets, one from Capsella grandiflora and one from Drosophila melanogaster. We consider how processes such as drift, admixture, Hill–Robertson interference, and epistasis may contribute to these patterns. We report that most types of mutations exhibit positive LD, particularly if they are predicted to be less deleterious. We show with simulations that this pattern arises easily in a model of admixture or distance-biased mating, and that genome-wide differences across site types are generally expected due to differences in the strength of purifying selection even in the absence of epistasis. We further explore how signed LD decays on a finer scale, showing that loss-of-function mutations exhibit particularly positive LD across short distances, a pattern consistent with intragenic antagonistic epistasis. Controlling for genomic distance, signed LD in C. grandiflora decays faster within genes, compared with between genes, likely a by-product of frequent recombination in gene promoters known to occur in plant genomes. Finally, we use information from published biological networks to explore whether there is evidence for negative synergistic epistasis between interacting radical missense mutations. In D. melanogaster networks, we find a modest but significant enrichment of negative LD, consistent with the possibility of intranetwork negative synergistic epistasis.
AbstractIdentifying our most distant animal relatives has emerged as one of the most challenging problems in phylogenetics. This debate has major implications for our understanding of the origin of multicellular animals and of the earliest events in animal evolution, including the origin of the nervous system. Some analyses identify sponges as our most distant animal relatives (Porifera-sister hypothesis), and others identify comb jellies (Ctenophora-sister hypothesis). These analyses vary in many respects, making it difficult to interpret previous tests of these hypotheses. To gain insight into why different studies yield different results, an important next step in the ongoing debate, we systematically test these hypotheses by synthesizing 15 previous phylogenomic studies and performing new standardized analyses under consistent conditions with additional models. We find that Ctenophora-sister is recovered across the full range of examined conditions, and Porifera-sister is recovered in some analyses under narrow conditions when most outgroups are excluded and site-heterogeneous CAT models are used. We additionally find that the number of categories in site-heterogeneous models is sufficient to explain the Porifera-sister results. Furthermore, our cross-validation analyses show CAT models that recover Porifera-sister have hundreds of additional categories and fail to fit significantly better than site-heterogeneous models with far fewer categories. Systematic and standardized testing of diverse phylogenetic models suggests that we should be skeptical of Porifera-sister results both because they are recovered under such narrow conditions and because the models in these conditions fit the data no better than other models that recover Ctenophora-sister.
AbstractNatural hybrid zones offer a powerful framework for understanding the genetic basis of speciation in progress because ongoing hybridization continually creates unfavorable gene combinations. Evidence indicates that postzygotic reproductive isolation is often caused by epistatic interactions between mutations in different genes that evolved independently of one another (hybrid incompatibilities). We examined the potential to detect epistatic selection against incompatibilities from genome sequence data using the site frequency spectrum (SFS) of polymorphisms by conducting individual-based simulations in SLiM. We found that the genome-wide SFS in hybrid populations assumes a diagnostic shape, with the continual input of fixed differences between source populations via migration inducing a mass at intermediate allele frequency. Epistatic selection locally distorts the SFS as non-incompatibility alleles rise in frequency in a manner analogous to a selective sweep. Building on these results, we present a statistical method to identify genomic regions containing incompatibility loci that locates departures in the local SFS compared with the genome-wide SFS. Cross-validation studies demonstrate that our method detects recessive and codominant incompatibilities across a range of scenarios varying in the strength of epistatic selection, migration rate, and hybrid zone age. Our approach takes advantage of whole genome sequence data, does not require knowledge of demographic history, and can be applied to any pair of nascent species that forms a hybrid zone.
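To make the site frequency spectrum (SFS) comparison in this abstract concrete, here is a minimal sketch that tabulates an unfolded SFS from a 0/1 haplotype matrix and measures how far a local spectrum departs from a reference spectrum. The distance used (sum of absolute differences between normalized spectra) is a stand-in chosen for illustration; it is not the statistic developed in the study, and the function names and toy data are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a haplotype matrix (rows = samples, columns = sites).

    Entry k-1 of the result is the number of sites at which the derived
    allele (coded 1) is carried by exactly k of the n samples, k = 1..n-1.
    Monomorphic sites (count 0 or n) are excluded.
    """
    n = len(haplotypes)
    counts = Counter(sum(column) for column in zip(*haplotypes))
    return [counts.get(k, 0) for k in range(1, n)]


def normalize(sfs):
    """Convert site counts to proportions (all zeros if the SFS is empty)."""
    total = sum(sfs)
    return [x / total for x in sfs] if total else [0.0] * len(sfs)


def sfs_distance(local, genome_wide):
    """Illustrative departure measure between two spectra of equal length:
    sum of absolute differences of their normalized entries."""
    return sum(abs(p - q) for p, q in zip(normalize(local), normalize(genome_wide)))


# Toy data: 4 sampled haplotypes, 5 sites (hypothetical values).
toy_haplotypes = [
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
]
toy_sfs = site_frequency_spectrum(toy_haplotypes)
```

In a windowed scan, one would compute `site_frequency_spectrum` per genomic window and compare each window against the genome-wide spectrum; windows with large departures would be candidates for closer inspection.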
AbstractTo address the void in the availability of high-quality proteomic data traversing the animal tree, we have implemented a pipeline for generating de novo assemblies based on publicly available data from the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. We have also created the Animal Proteome Database (AniProtDB), a resource providing open access to this collection of high-quality metazoan proteomes, along with information on predicted proteins and protein domains for each taxonomic classification and the ability to perform sequence similarity searches against all proteomes generated using this pipeline. This solution vastly increases the utility of these data by removing the barrier to access for research groups who do not have the expertise or resources to generate these data themselves and enables the use of data from nontraditional research organisms that have the potential to address key questions in biomedicine.
AbstractThe standard genetic code (SGC) has been extensively analyzed for the biological ramifications of its nonrandom structure. For instance, mismatch errors due to point mutation or mistranslation have an overall smaller effect on the amino acid polar requirement under the SGC than under random genetic codes (RGCs). A similar observation was recently made for frameshift errors, prompting the assertion that the SGC has been shaped by natural selection for frameshift-robustness, that is, conservation of certain amino acid properties upon a frameshift mutation or translational frameshift. However, frameshift-robustness confers no benefit because frameshifts usually create premature stop codons that cause nonsense-mediated mRNA decay or production of nonfunctional truncated proteins. We here propose that the frameshift-robustness of the SGC is a byproduct of its mismatch-robustness. Of 564 amino acid properties considered, the SGC exhibits mismatch-robustness in 93–133 properties and frameshift-robustness in 55 properties, and the latter is largely a subset of the former. For each of the 564 real and 564 randomly constructed fake properties of amino acids, there is a positive correlation between mismatch-robustness and frameshift-robustness across one million RGCs; this correlation arises because most amino acid changes resulting from a frameshift are also achievable by a mismatch error. Importantly, the SGC does not show significantly higher frameshift-robustness in any of the 55 properties than RGCs of comparable mismatch-robustness. These findings support that the frameshift-robustness of the SGC need not originate through direct selection and can instead be a side effect of its mismatch-robustness.
AbstractLikelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
AbstractThe effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.
AbstractWhen species are continuously distributed across environmental gradients, the relative strength of selection and gene flow shape spatial patterns of genetic variation, potentially leading to variable levels of differentiation across loci. Determining whether adaptive genetic variation tends to be structured differently than neutral variation along environmental gradients is an open and important question in evolutionary genetics. We performed exome-wide population genomic analysis on deer mice sampled along an elevational gradient of nearly 4,000 m of vertical relief. Using a combination of selection scans, genotype−environment associations, and geographic cline analyses, we found that a large proportion of the exome has experienced a history of altitude-related selection. Elevational clines for nearly 30% of these putatively adaptive loci were shifted significantly up- or downslope of clines for loci that did not bear similar signatures of selection. Many of these selection targets can be plausibly linked to known phenotypic differences between highland and lowland deer mice, although the vast majority of these candidates have not been reported in other studies of highland taxa. Together, these results suggest new hypotheses about the genetic basis of physiological adaptation to high altitude, and the spatial distribution of adaptive genetic variation along environmental gradients.
AbstractTransposable elements (TEs) are ubiquitous and mobile repeated sequences. They are major determinants of host fitness. Here, we characterized the TE content of the spotted wing fly Drosophila suzukii. Using a recently improved genome assembly, we reconstructed TE sequences de novo and found that TEs occupy 47% of the genome and are mostly located in gene-poor regions. The majority of TE insertions segregate at low frequencies, indicating a recent and probably ongoing TE activity. To explore TE dynamics in the context of biological invasions, we studied the variation of TE abundance in genomic data from 16 invasive and six native populations of D. suzukii. We found a large increase of the TE load in invasive populations correlated with a reduced Watterson estimate of genetic diversity, θ̂w, a proxy of effective population size. We did not find any correlation between TE contents and bioclimatic variables, indicating a minor effect of environmentally induced TE activity. A genome-wide association study revealed that ca. 2,000 genomic regions are associated with TE abundance. We did not find, however, any evidence in such regions of an enrichment for genes known to interact with TE activity (e.g., transcription factor encoding genes or genes of the piRNA pathway). Finally, the study of TE insertion frequencies revealed 15 putatively adaptive TE insertions, six of them being likely associated with the recent invasion history of the species.
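For readers unfamiliar with the Watterson estimate mentioned in this abstract, the standard textbook form divides the number of segregating sites S by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled sequences. The sketch below implements that generic formula only; the function name is hypothetical, and the study may have used a different implementation (for example, one adapted to pooled sequencing data).

```python
def watterson_theta(segregating_sites, sample_size):
    """Watterson's estimator: theta_w = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    segregating_sites: number of polymorphic sites S in the sample.
    sample_size: number of sampled sequences n (must be at least 2).
    Returns the estimate for the whole region; divide by the region
    length to obtain a per-site value.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


# Example with made-up numbers: 11 segregating sites among 4 sequences.
# a_4 = 1 + 1/2 + 1/3 = 11/6, so the estimate is 6.
theta_example = watterson_theta(11, 4)
```

Because a_n grows only logarithmically with n, the estimate is driven mainly by S; a population that has lost diversity through a bottleneck shows fewer segregating sites and hence a lower value.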
AbstractBreeding for climate resilience is currently an important goal for sustainable livestock production. Local adaptations exhibited by indigenous livestock allow investigating the genetic control of this resilience. Ecological niche modeling (ENM) provides a powerful avenue to identify the main environmental drivers of selection. Here, we applied an integrative approach combining ENM with genome-wide selection signature analyses (XPEHH and Fst) and genotype−environment association (redundancy analysis), with the aim of identifying the genomic signatures of adaptation in African village chickens. By dissecting 34 agro-climatic variables from the ecosystems of 25 Ethiopian village chicken populations, ENM identified six key drivers of environmental challenges: one temperature variable (strongly correlated with elevation), three precipitation variables as proxies for water availability, and two soil/land cover variables as proxies of food availability for foraging chickens. Genome analyses based on whole-genome sequencing (n = 245) identified a few strongly supported genomic regions under selection for environmental challenges related to altitude, temperature, water scarcity, and food availability. These regions harbor several gene clusters including regulatory genes, suggesting a predominantly oligogenic control of environmental adaptation. The few candidate genes detected in relation to heat stress indicate likely epigenetic regulation of thermo-tolerance for a domestic species originating from a tropical Asian wild ancestor. These results provide possible explanations for the rapid past adaptation of chickens to diverse African agro-ecologies, while also representing new landmarks for sustainable breeding improvement for climate resilience. We show that the pre-identification of key environmental drivers, followed by genomic investigation, provides a powerful new approach for elucidating adaptation in domestic animals.
AbstractThe number of olfactory receptor genes (ORs), which are responsible for detecting diverse odor molecules, varies extensively among mammals as a result of frequent gene gains and losses that contribute to olfactory specialization. However, how OR expansions/contractions in fish are influenced by habitat and feeding habit and which OR subfamilies are important in each ecological niche is unknown. Here, we report a major OR expansion in a freshwater herbivorous fish, Megalobrama amblycephala, using a highly contiguous, chromosome-level assembly. We evaluate the possible contribution of OR expansion to habitat and feeding specialization by comparing the OR repertoire in 28 phylogenetically and ecologically diverse teleosts. In total, we analyzed >4,000 ORs including 3,253 intact, 122 truncated, and 913 pseudogenes. The number of intact ORs is highly variable, ranging from 20 to 279. We estimate that the most recent common ancestor of Osteichthyes had 62 intact ORs, which declined in most lineages except the freshwater Otophysa clade that has a substantial expansion in subfamily β and ε ORs. Across teleosts, we found a strong association between duplications of β and ε ORs and freshwater habitat. Nearly all ORs were expressed in the olfactory epithelium (OE) in three tested fish species. Specifically, all the expanded β and ε ORs were highly expressed in OE of M. amblycephala. Together, we provide molecular and functional evidence for how OR repertoires in fish have undergone gain and loss with respect to ecological factors and highlight the role of β and ε ORs in freshwater adaptation.
AbstractPathogens and associated outbreaks of infectious disease exert selective pressure on human populations, and any changes in allele frequencies that result may be especially evident for genes involved in immunity. In this regard, the 1346–1353 Yersinia pestis-caused Black Death pandemic, with continued plague outbreaks spanning several hundred years, is one of the most devastating recorded in human history. To investigate the potential impact of Y. pestis on human immunity genes, we extracted DNA from 36 plague victims buried in a mass grave in Ellwangen, Germany in the 16th century. We targeted 488 immune-related genes, including HLA, using a novel in-solution hybridization capture approach. In comparison with 50 modern native inhabitants of Ellwangen, we find differences in allele frequencies for variants of the innate immunity proteins Ficolin-2 and NLRP14 at sites involved in determining specificity. We also observed that HLA-DRB1*13 is more than twice as frequent in the modern population, whereas HLA-B alleles encoding an isoleucine at position 80 (I-80+), HLA-C*06:02 and HLA-DPB1 alleles encoding histidine at position 9 are half as frequent in the modern population. Simulations show that natural selection has likely driven these allele frequency changes. Thus, our data suggest that allele frequencies of HLA genes involved in innate and adaptive immunity responsible for extracellular and intracellular responses to pathogenic bacteria, such as Y. pestis, could have been affected by the historical epidemics that occurred in Europe.
AbstractHorizontal gene transfer (HGT) is a major driving force for bacterial evolution. To avoid the deleterious effects due to the unregulated expression of newly acquired foreign genes, bacteria have evolved specific proteins named xenogeneic silencers to recognize foreign DNA sequences and suppress their transcription. As there is considerable diversity in genomic base compositions among bacteria, how xenogeneic silencers distinguish self- from nonself DNA in different bacteria remains poorly understood. This review summarizes the progress in studying the DNA binding preferences and the underlying molecular mechanisms of known xenogeneic silencer families, represented by H-NS of Escherichia coli, Lsr2 of Mycobacterium, MvaT of Pseudomonas, and Rok of Bacillus. Comparative analyses of the published data indicate that the differences in DNA recognition mechanisms give these xenogeneic silencers distinct DNA sequence preferences, which are further correlated with different host genomic features. These correlations provide insights into the mechanisms of how these xenogeneic silencers selectively target foreign DNA in different genomic backgrounds. Furthermore, it is revealed that the genomic AT contents of bacterial species with the same xenogeneic silencer family proteins are distributed in a limited range and are generally lower than those of species without any known xenogeneic silencers in the same phylum/class/genus, indicating that xenogeneic silencers have multifaceted roles in bacterial genome evolution. In addition to regulating horizontal gene transfer, xenogeneic silencers also act as a selective force against the GC to AT mutational bias found in bacterial genomes and help maintain host genomic AT contents at relatively low levels.
AbstractPopulation genetic theory predicts that small effective population sizes (Ne) and restricted gene flow limit the potential for local adaptation. In particular, the probability of evolving similar phenotypes based on shared genetic mechanisms (i.e., parallel evolution), is expected to be reduced. We tested these predictions in a comparative genomic study of two ecologically similar and geographically codistributed stickleback species (viz. Gasterosteus aculeatus and Pungitius pungitius). We found that P. pungitius harbors less genetic diversity and exhibits higher levels of genetic differentiation and isolation-by-distance than G. aculeatus. Conversely, G. aculeatus exhibits a stronger degree of genetic parallelism across freshwater populations than P. pungitius: 2,996 versus 379 single nucleotide polymorphisms located within 26 versus 9 genomic regions show evidence of selection in multiple freshwater populations of G. aculeatus and P. pungitius, respectively. Most regions involved in parallel evolution in G. aculeatus showed increased levels of divergence, suggestive of selection on ancient haplotypes. In contrast, haplotypes involved in freshwater adaptation in P. pungitius were younger. In accordance with theory, the results suggest that connectivity and genetic drift play crucial roles in determining the levels and geographic distribution of standing genetic variation, providing evidence that population subdivision limits local adaptation and therefore also the likelihood of parallel evolution.
AbstractThe origin of the jaw is a long-standing problem in vertebrate evolutionary biology. Classical hypotheses of serial homology propose that the upper and lower jaw evolved through modifications of dorsal and ventral gill arch skeletal elements, respectively. If the jaw and gill arches are derived members of a primitive branchial series, we predict that they would share common developmental patterning mechanisms. Using candidate and RNAseq/differential gene expression analyses, we find broad conservation of dorsoventral (DV) patterning mechanisms within the developing mandibular, hyoid, and gill arches of a cartilaginous fish, the skate (Leucoraja erinacea). Shared features include expression of genes encoding members of the ventralizing BMP and endothelin signaling pathways and their effectors, the joint markers nkx3.2 and gdf5 and prochondrogenic transcription factor barx1, and the dorsal territory marker pou3f3. Additionally, we find that mesenchymal expression of eya1/six1 is an ancestral feature of the mandibular arch of jawed vertebrates, whereas differences in notch signaling distinguish the mandibular and gill arches in skate. Comparative transcriptomic analyses of mandibular and gill arch tissues reveal additional genes differentially expressed along the DV axis of the pharyngeal arches, including scamp5 as a novel marker of the dorsal mandibular arch, as well as distinct transcriptional features of mandibular and gill arch muscle progenitors and developing gill buds. Taken together, our findings reveal conserved patterning mechanisms in the pharyngeal arches of jawed vertebrates, consistent with serial homology of their skeletal derivatives, as well as unique transcriptional features that may underpin distinct jaw and gill arch morphologies.
AbstractPrevious evolutionary reconstructions have concluded that early eukaryotic ancestors including both the last common ancestor of eukaryotes and of all fungi had intron-rich genomes. By contrast, some extant eukaryotes have few introns, underscoring the complex histories of intron–exon structures, and raising the question as to why these few introns are retained. Here, we have used recently available fungal genomes to address a variety of questions related to intron evolution. Evolutionary reconstruction of intron presence and absence using 263 diverse fungal species supports the idea that massive intron reduction through intron loss has occurred in multiple clades. The intron densities estimated in various fungal ancestors range from zero to 7.6 introns per 1 kb of protein-coding sequence. Massive intron loss has occurred not only in microsporidian parasites and saccharomycetous yeasts, but also in diverse smuts and allies. To investigate the roles of the remaining introns in highly reduced species, we have searched for their special characteristics in eight intron-poor fungi. Notably, the introns of ribosome-associated genes RPL7 and NOG2 have conserved positions; both introns contain genes encoding snoRNAs. Furthermore, both the proteins and snoRNAs are involved in ribosome biogenesis, suggesting that the expression of the protein-coding genes and noncoding snoRNAs may be functionally coordinated. Indeed, these introns are also conserved in three-quarters of fungal species. Our study shows that fungal introns have a complex evolutionary history and underappreciated roles in gene expression.
AbstractThe Taiwanese people are composed of diverse indigenous populations and the Taiwanese Han. About 95% of the Taiwanese identify themselves as Taiwanese Han, but this may not be a homogeneous population because they migrated to the island from various regions of continental East Asia over a period of 400 years. Little is known about the underlying patterns of genetic ancestry, population admixture, and evolutionary adaptation in the Taiwanese Han people. Here, we analyzed the whole-genome single-nucleotide polymorphism genotyping data from 14,401 individuals of Taiwanese Han collected by the Taiwan Biobank and the whole-genome sequencing data for a subset of 772 people. We detected four major genetic ancestries with distinct geographic distributions (i.e., Northern, Southeastern, Japonic, and Island Southeast Asian ancestries) and signatures of population mixture contributing to the genomes of Taiwanese Han. We further scanned for signatures of positive natural selection that caused unusually long-range haplotypes and elevations of hitchhiked variants. As a result, we identified 16 candidate loci in which selection signals can be unambiguously localized at five single genes: CTNNA2, LRP1B, CSNK1G3, ASTN2, and NEO1. Statistical associations were examined in 16 metabolic-related traits to further elucidate the functional effects of each candidate gene. All five genes appear to have pleiotropic connections to various types of disease susceptibility and significant associations with at least one metabolic-related trait. Together, our results provide critical insights for understanding the evolutionary history and adaptation of the Taiwanese Han population.
When this manuscript was published online, a previous version of the supplementary material was included. This has been removed and replaced with the correct version of the supplementary material.
Genome Biology and Evolution, Volume 13, Issue 2, evab001, https://doi.org/10.1093/gbe/evab001
“The days of ‘junk DNA’ are over,” according to Christoph Grunau and Christoph Grevelding, the senior authors of a new research article in Genome Biology and Evolution. Their study provides an in-depth look at an enigmatic superfamily of repetitive DNA sequences known as W elements in the genome of the human parasite Schistosoma mansoni (Stitz et al. 2021). Titled “Satellite-like W elements: repetitive, transcribed, and putative mobile genetic factors with potential roles for biology and evolution of Schistosoma mansoni,” the analysis reveals structural, functional, and evolutionary aspects of these elements and shows that, far from being “junk,” they may exert an enduring influence on the biology of S. mansoni.
AbstractThe contribution of gene duplications to the evolution of eukaryotic genomes is well studied. By contrast, studies of gene duplications in prokaryotes are scarce and generally limited to a handful of genes or careful analysis of a few prokaryotic lineages. Systematic broad-scale studies of prokaryotic genomes that sample available data are lacking, leaving gaps in our understanding of the contribution of gene duplications as a source of genetic novelty in the prokaryotic world. Here, we report conservative and robust estimates for the frequency of recent gene duplications within prokaryotic genomes relative to recent lateral gene transfer (LGT), as mechanisms to generate multiple copies of related sequences in the same genome. We obtain our estimates by focusing on evolutionarily recent events among 5,655 prokaryotic genomes, thereby avoiding vagaries of deep phylogenetic inference and confounding effects of ancient events and differential loss. We find that recent, genome-specific gene duplications are at least 50 times less frequent and probably 100 times less frequent than recent, genome-specific, gene acquisitions via LGT. The frequency of gene duplications varies across lineages and functional categories. The findings improve our understanding of genome evolution in prokaryotes and have far-reaching implications for evolutionary models that entail LGT to gene duplications ratio as a parameter.
AbstractThe Himalayan giant honeybee, Apis laboriosa, is the largest individual honeybee with major ecological and economic importance in high-altitude environments. However, our understanding of its environmental adaptations is circumscribed by the paucity of genomic data for this species. Here, we provide a draft genome of wild A. laboriosa, along with a comparison to its closely related species, Apis dorsata. The draft genome of A. laboriosa based on the de novo assembly is 226.1 Mbp in length with a scaffold N50 size of 3.34 Mbp, a GC content of 32.2%, a repeat content of 6.86%, and a gene family number of 8,404. Comparative genomics analysis revealed that the genes in the A. laboriosa genome have undergone stronger positive selection (2.5 times more genes) and more recent duplication/loss events (6.1 times more events) than those in the A. dorsata genome. Our study implies the potential molecular mechanisms underlying the high-altitude adaptation of A. laboriosa and will catalyze future comparative studies to understand the environmental adaptation of modern honeybees.
AbstractMuseum collections contain enormous quantities of insect specimens collected over the past century, covering a period of increased and varied insecticide usage. These historic collections are therefore incredibly valuable as genomic snapshots of organisms before, during, and after exposure to novel selective pressures. However, these samples come with their own challenges compared with present-day collections, as they are fragile and retrievable DNA is low yield and fragmented. In this article, we tested several DNA extraction procedures across pinned historic Diptera specimens from four disease vector genera: Anopheles, Aedes, Culex, and Glossina. We identify an approach that minimizes morphological damage while maximizing DNA retrieval for Illumina library preparation and sequencing that can accommodate the fragmented and low yield nature of historic DNA. We identify several key points in retrieving sufficient DNA while keeping morphological damage to a minimum: an initial rehydration step, a short incubation without agitation in a modified low salt Proteinase K buffer (referred to as “lysis buffer C” throughout), and critical point drying of samples post-extraction to prevent tissue collapse caused by air drying. The suggested method presented here provides a solid foundation for exploring the genomes and morphology of historic Diptera collections.
AbstractThe bluntnose knifefish Brachyhypopomus occidentalis is a primary freshwater fish from north-western South America and Lower Central America. Like other Gymnotiformes, it has an electric organ that generates electric discharges used for both communication and electrolocation. We assembled a high-quality reference genome sequence of B. occidentalis by combining Oxford Nanopore and 10X Genomics linked-reads technologies. We also describe its demographic history in the context of the rise of the Isthmus of Panama. The size of the assembled genome is 540.3 Mb with an N50 scaffold length of 5.4 Mb, which includes 93.8% complete, 0.7% fragmented, and 5.5% missing vertebrate/Actinopterygii Benchmarking Universal Single-Copy Orthologs. Repetitive elements account for 11.04% of the genome, and 34,347 protein-coding genes were predicted, of which 23,935 have been functionally annotated. Demographic analysis suggests a rapid effective population expansion between 3 and 5 Myr ago, corresponding to the final closure of the Isthmus of Panama (2.8–3.5 Myr ago). This event was followed by a sudden and constant population decline during the last 1 Myr, likely associated with strong shifts in both precipitation and sea level during the Pleistocene glacial-interglacial cycles. The de novo genome assembly of B. occidentalis will provide novel insights into the molecular basis of both electric signal production and detection and will be fundamental for understanding the processes that have shaped the diversity of Neotropical freshwater environments.
AbstractSpecies of infraorder Gryllidea, or crickets, are useful invertebrate models for studying developmental biology and neuroscience. They have also attracted attention as alternative protein sources for human food and animal feed. Mitochondrial genomic information on related invertebrates, such as katydids and locusts, has recently become available in an attempt to clarify the controversial classification schemes, although robust phylogenetic relationships with emphasis on crickets remain elusive. Here, we report newly sequenced complete mitochondrial genomes of crickets to study their phylogeny, genomic rearrangements, and adaptive evolution. First, we conducted de novo assembly of mitochondrial genomes from eight cricket species and annotated protein-coding genes and transfer and ribosomal RNAs using automatic annotations and manual curation. Next, by combining newly described protein-coding genes with public data of the complete Gryllidea genomes and gene annotations, we performed phylogenetic analysis and found gene order rearrangements in several branches. We further analyzed genetic signatures of selection in ant-loving crickets (Myrmecophilidae), which are small wingless crickets that inhabit ant nests. Three distinct approaches revealed two positively selected sites in the cox1 gene in these crickets. Protein 3D structural analyses suggested that these selected sites could influence the interaction of respiratory complex proteins, conferring benefits to ant-loving crickets with a unique ecological niche and morphology. These findings enhance our understanding of the genetic basis of cricket evolution without relying on estimates based on a limited number of molecular markers.
AbstractComparison of the androgen-binding protein (Abp) gene regions of six Mus genomes provides insights into the evolutionary history of this large murid rodent gene family. We identified 206 unique Abp sequences and mapped their physical relationships. At least 48 are duplicated and thus present in more than two identical copies. All six taxa have substantially elevated LINE1 densities in Abp regions compared with flanking regions, similar to levels in mouse and rat genomes, although nonallelic homologous recombination seems to have only occurred in Mus musculus domesticus. Phylogenetic and structural relationships support the hypothesis that the extensive Abp expansion began in an ancestor of the genus Mus. We also found duplicated Abpa27’s in two taxa, suggesting that previously reported selection on a27 alleles may have actually detected selection on haplotypes wherein different paralogs were lost in each. Other studies reported that a27 gene and species trees were incongruent, likely because of homoplasy. However, L1MC3 phylogenies, supposed to be homoplasy-free compared with coding regions, support our paralog hypothesis because the L1MC3 phylogeny was congruent with the a27 topology. This paralog hypothesis provides an alternative explanation for the origin of the a27 gene that is suggested to be fixed in the three different subspecies of Mus musculus and to mediate sexual selection and incipient reinforcement between at least two of them. Finally, we ask why there are so many Abp genes, especially given the high frequency of pseudogenes and suggest that relaxed selection operates over a large part of the gene clusters.
AbstractThe plastid genomes of photosynthetic green plants have largely maintained conserved gene content and order as well as structure over hundreds of millions of years of evolution. Several plant lineages, however, have departed from this conservation and contain many plastome structural rearrangements, which have been associated with an abundance of repeated sequences both overall and near rearrangement endpoints. We sequenced the plastomes of 25 taxa of Astragalus L. (Fabaceae), a large genus in the inverted repeat-lacking clade of legumes, to gain a greater understanding of the connection between repeats and plastome inversions. We found plastome repeat structure has a strong phylogenetic signal among these closely related taxa mostly in the New World clade of Astragalus called Neo-Astragalus. Taxa without inversions also do not differ substantially in their overall repeat structure from four taxa each with one large-scale inversion. For two taxa with inversion endpoints between the same pairs of genes, differences in their exact endpoints indicate the inversions occurred independently. Our proposed mechanism for inversion formation suggests the short inverted repeats now found near the endpoints of the four inversions may be there as a result of these inversions rather than their cause. The longer inverted repeats now near endpoints may have allowed the inversions first mediated by shorter microhomologous sequences to propagate, something that should be considered in explaining how any plastome rearrangement becomes fixed regardless of the mechanism of initial formation.
AbstractThe giant black tiger shrimp (Penaeus monodon) is native to the Indo-Pacific and is the second most farmed penaeid shrimp species globally. Understanding genetic structure, connectivity, and local adaptation among Indo-Pacific black tiger shrimp populations is important for informing sustainable fisheries management and aquaculture breeding programs. Population genetic and outlier detection analyses were undertaken using 10,593 genome-wide single nucleotide polymorphisms (SNPs) from 16 geographically disparate Indo-Pacific P. monodon populations. Levels of genetic diversity were highest for Southeast Asian populations and were lowest for Western Indian Ocean (WIO) populations. Both neutral (n = 9,930) and outlier (n = 663) loci datasets revealed a pattern of strong genetic structure of P. monodon corresponding with broad geographical regions and clear genetic breaks among samples within regions. Neutral loci revealed seven genetic clusters and the separation of Fiji and WIO clusters from all other clusters, whereas outlier loci revealed six genetic clusters and high genetic differentiation among populations. The neutral loci dataset estimated five migration events that indicated migration to Southeast Asia from the WIO, with partial connectivity to populations in both oceans. We also identified 26 putatively adaptive SNPs that exhibited significant Pearson correlation (P < 0.05) between minor allele frequency and maximum or minimum sea surface temperature. Matched transcriptome contig annotations suggest that the putatively adaptive SNPs are involved in cellular and metabolic processes, pigmentation, immune response, and currently unknown functions. This study provides novel genome-level insights that have direct implications for P. monodon aquaculture and fishery management practices.
AbstractGlobin-X (GbX) is an enigmatic member of the vertebrate globin gene family with a wide phyletic distribution that spans protostomes and deuterostomes. Unlike canonical globins such as hemoglobins and myoglobins, functional data suggest that GbX does not have a primary respiratory function. Instead, evidence suggests that the monomeric, membrane-bound GbX may play a role in cellular signaling or protection against the oxidation of membrane lipids. Recently released genomes from key vertebrates provide an excellent opportunity to address questions about the early stages of the evolution of GbX in vertebrates. We integrate bioinformatics, synteny, and phylogenetic analyses to characterize the diversity of GbX genes in nonteleost ray-finned fishes, resolve relationships between the GbX genes of cartilaginous fish and bony vertebrates, and demonstrate that the GbX genes of cyclostomes and gnathostomes derive from independent duplications. Our study highlights the role that whole-genome duplications (WGDs) have played in expanding the repertoire of genes in vertebrate genomes. Our results indicate that GbX paralogs have a remarkably high rate of retention following WGDs relative to other globin genes and provide an evolutionary framework for interpreting results of experiments that examine functional properties of GbX and patterns of tissue-specific expression. By identifying GbX paralogs that are products of different WGDs, our results can guide the design of experimental work to explore whether gene duplicates that originate via WGDs have evolved novel functional properties or expression profiles relative to singleton or tandemly duplicated copies of GbX.
AbstractA large portion of animal and plant genomes consists of noncoding DNA. This part includes tandemly repeated sequences and has gained attention because it offers exciting insights into genome biology. We investigated satellite-DNA elements of the platyhelminth Schistosoma mansoni, a parasite with remarkable biological features. Schistosoma mansoni lives in the vasculature of humans causing schistosomiasis, a disease of worldwide importance. Schistosomes are the only trematodes that have evolved separate sexes, and the sexual maturation of the female depends on constant pairing with the male. The schistosome karyotype comprises eight chromosome pairs; males are homogametic (ZZ) and females are heterogametic (ZW). Part of the repetitive DNA of S. mansoni consists of W-elements (WEs), originally discovered as female-specific satellite DNAs in the heterochromatic block of the W-chromosome. Based on new genome and transcriptome data, we performed a reanalysis of the W-element families (WEFs). Besides a new classification of 19 WEFs, we provide first evidence for stage-, sex-, pairing-, gonad-, and strain-specific/preferential transcription of WEs as well as their mobile nature, deduced from autosomal copies of full-length and partial WEs. Structural analyses suggested roles as sources of noncoding RNA-like hammerhead ribozymes, for which we obtained functional evidence. Finally, the variable WEF occurrence in different schistosome species revealed remarkable divergence. From these results, we propose that WEs potentially exert enduring influence on the biology of S. mansoni. Their variable occurrence in different strains, isolates, and species suggests that schistosome WEs may represent genetic factors taking effect on variability and evolution of the family Schistosomatidae.
AbstractOwing to a lag between a deleterious mutation’s appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus’s complex mutational and compositional biases, estimate that the mutation rate is at least 49–67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the “missing” mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why, for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, are also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry, and for the potential for intrahost evolution including vaccine escape.
AbstractMany animals including birds, reptiles, insects, and teleost fishes can see ultraviolet (UV) light (shorter than 400 nm), which has functional importance for foraging and communication. For coral reef fishes, shallow reef environments transmit a broad spectrum of light, rich in UV, driving the evolution of diverse spectral sensitivities. However, the identities and sites of the specific visual genes that underlie vision in reef fishes remain elusive, although they are useful in determining how evolution has tuned vision to suit life on the reef. We investigated the visual systems of 11 anemonefish (Amphiprioninae) species, specifically probing for the molecular pathways that facilitate UV-sensitivity. Searching the genomes of anemonefishes, we identified a total of eight functional opsin genes from all five vertebrate visual opsin subfamilies. We found rare instances of teleost UV-sensitive SWS1 opsin gene duplications that produced two functionally coding paralogs (SWS1α and SWS1β) and a pseudogene. We also found separate green sensitive RH2A opsin gene duplicates not yet reported in the family Pomacentridae. Transcriptome analysis revealed that the false clown anemonefish (Amphiprion ocellaris) expressed one rod opsin (RH1) and six cone opsins (SWS1β, SWS2B, RH2B, RH2A-1, RH2A-2, LWS) in the retina. Fluorescent in situ hybridization highlighted the (co-)expression of SWS1β with SWS2B in single cones, and either RH2B, RH2A, or RH2A together with LWS in different members of double cone photoreceptors (two single cones fused together). Our study provides the first in-depth characterization of visual opsin genes found in anemonefishes and provides a useful basis for the further study of UV-vision in reef fishes.