Molecular Biology and Evolution, Volume 38, Issue 1, January 2021, Pages 16–30, https://doi.org/10.1093/molbev/msaa216
Molecular Biology and Evolution, msab030, https://doi.org/10.1093/molbev/msab030
Abstract: The Pop-Gen Pipeline Platform (PPP) is a software platform for population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, and output generation. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html.
Abstract: Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).
Abstract: Emerging evidence links genes within human-specific segmental duplications (HSDs) to traits and diseases unique to our species. Strikingly, despite being nearly identical by sequence (>98.5%), paralogous HSD genes are differentially expressed across human cell and tissue types, though the underlying mechanisms have not been examined. We compared cross-tissue mRNA levels of 75 HSD genes from 30 families between humans and chimpanzees and found expression patterns consistent with relaxed selection on, or neofunctionalization of, derived paralogs. In general, ancestral paralogs exhibited the greatest expression conservation with chimpanzee orthologs, though exceptions suggest certain derived paralogs may retain or supplant ancestral functions. Concordantly, analysis of long-read isoform sequencing data sets from diverse human tissues and cell lines found that about half of derived paralogs exhibited globally lower expression. To understand mechanisms underlying these differences, we leveraged data from human lymphoblastoid cell lines (LCLs) and found no relationship between paralogous expression divergence and post-transcriptional regulation, sequence divergence, or copy-number variation. Considering cis-regulation, we reanalyzed ENCODE data and recovered hundreds of previously unidentified candidate cis-regulatory elements (CREs) in HSDs. We also generated large-insert ChIP-sequencing data for active chromatin features in an LCL to better distinguish paralogous regions. Some duplicated CREs were sufficient to drive differential reporter activity, suggesting they may contribute to divergent cis-regulation of paralogous genes. This work provides evidence that cis-regulatory divergence contributes to novel expression patterns of recent gene duplicates in humans.
Abstract: The study of domestication contributes to our knowledge of evolution and crop genetic resources. Human selection has shaped wild Brassica rapa into diverse turnip, leafy, and oilseed crops. Despite its worldwide economic importance and potential as a model for understanding diversification under domestication, insights into the number of domestication events and initial crop(s) domesticated in B. rapa have been limited due to a lack of clarity about the wild or feral status of conspecific noncrop relatives. To address this gap and reconstruct the domestication history of B. rapa, we analyzed 68,468 genotyping-by-sequencing-derived single nucleotide polymorphisms for 416 samples in the largest diversity panel of domesticated and weedy B. rapa to date. To further understand the center of origin, we modeled the potential range of wild B. rapa during the mid-Holocene. Our analyses of genetic diversity across B. rapa morphotypes suggest that noncrop samples from the Caucasus, Siberia, and Italy may be truly wild, whereas those occurring in the Americas and much of Europe are feral. Clustering, tree-based analyses, and parameterized demographic inference further indicate that turnips were likely the first crop type domesticated, from which leafy types in East Asia and Europe were selected from distinct lineages. These findings clarify the domestication history and nature of wild crop genetic resources for B. rapa, which provides the first step toward investigating cases of possible parallel selection, the domestication and feralization syndrome, and novel germplasm for Brassica crop improvement.
Abstract: Thailand and Laos, located in the center of Mainland Southeast Asia (MSEA), harbor diverse ethnolinguistic groups encompassing all five language families of MSEA: Tai-Kadai (TK), Austroasiatic (AA), Sino-Tibetan (ST), Hmong-Mien (HM), and Austronesian (AN). Previous genetic studies of Thai/Lao populations have focused almost exclusively on uniparental markers and there is a paucity of genome-wide studies. We therefore generated genome-wide SNP data for 33 ethnolinguistic groups, belonging to the five MSEA language families from Thailand and Laos, and analyzed these together with data from modern Asian populations and SEA ancient samples. Overall, we find genetic structure according to language family, albeit with heterogeneity in the AA-, HM-, and ST-speaking groups, and in the hill tribes, that reflects both population interactions and genetic drift. For the TK-speaking groups, we find localized genetic structure that is driven by different levels of interaction with other groups in the same geographic region. Several Thai groups exhibit admixture from South Asia, which we date to ∼600–1000 years ago, corresponding to a time of intensive international trade networks that had a major cultural impact on Thailand. An AN group from Southern Thailand shows both South Asian admixture and overall affinities with AA-speaking groups in the region, suggesting an impact of cultural diffusion. Overall, we provide the first detailed insights into the genetic profiles of Thai/Lao ethnolinguistic groups, which should be helpful for reconstructing human genetic history in MSEA and selecting populations for participation in ongoing whole genome sequence and biomedical studies.
Abstract: Variation at the ABO locus was one of the earliest sources of data in the study of human population identity and history, and to this day remains widely genotyped due to its importance in blood and tissue transfusions. Here, we look at ABO blood type variants in our archaic relatives: Neanderthals and Denisovans. Our goal is to understand the genetic landscape of the ABO gene in archaic humans, and how it relates to modern human ABO variation. We found two Neanderthal variants of the O allele in the Siberian Neanderthals (O1 and O2); one of these variants is shared with a European Neanderthal, who is heterozygous for this O1 variant and a rare cis-AB variant. The Denisovan individual is heterozygous for two variants of the O1 allele, functionally similar to variants found widely in modern humans. Perhaps more surprisingly, the O2 allele variant found in Siberian Neanderthals can be found at low frequencies in modern Europeans and Southeast Asians, and the O1 allele variant found in Siberian and European Neanderthals is also found at very low frequency in modern East Asians. Our genetic distance analyses suggest both alleles survive in modern humans due to interbreeding with Neanderthals. We find that the sequence backgrounds of the surviving Neanderthal-like O alleles in modern humans retain a higher sequence divergence than other surviving Neanderthal genome fragments, supporting a view of balancing selection operating in the Neanderthal ABO alleles by retaining highly diverse haplotypes compared with portions of the genome evolving neutrally.
Abstract: The high mutational load of mitochondrial genomes combined with their uniparental inheritance and high polyploidy favors the maintenance of deleterious mutations within populations. How cells cope with and adapt to the accumulation of disadvantageous mitochondrial alleles remains unclear. Most harmful changes are likely corrected by purifying selection; however, the intimate collaboration between mitochondria- and nuclear-encoded gene products offers theoretical potential for compensatory adaptive changes. In plants, cytoplasmic male sterilities are known examples of nucleo-mitochondrial coadaptation situations in which nuclear-encoded restorer of fertility (Rf) genes evolve to counteract the effect of mitochondria-encoded cytoplasmic male sterility (CMS) genes and restore fertility. Most cloned Rfs belong to a small monophyletic group, comprising 26 pentatricopeptide repeat genes in Arabidopsis, called Rf-like (RFL). In this analysis, we explored the functional diversity of RFL genes in Arabidopsis and found that the RFL8 gene is not related to CMS suppression but is essential for plant embryo development. In vitro-rescued rfl8 plantlets are deficient in the production of the mitochondrial heme–lyase complex. A complete ensemble of molecular and genetic analyses allowed us to demonstrate that the RFL8 gene has been selected to permit the translation of the mitochondrial ccmFN2 gene encoding a heme–lyase complex subunit which derives from the split of the ccmFN gene, specifically in Brassicaceae plants. This study thus represents a clear case of nuclear compensation for a lineage-specific mitochondrial genomic rearrangement in plants and demonstrates that RFL genes can be selected in response to mitochondrial alterations other than CMS.
Abstract: Bacterial persistence is a potential cause of antibiotic therapy failure. Antibiotic-tolerant persisters originate from phenotypic differentiation within a susceptible population, occurring with a frequency that can be altered by mutations. Recent studies have proven that persistence is a highly evolvable trait and, consequently, an important evolutionary strategy of bacterial populations to adapt to high-dose antibiotic therapy. Yet, the factors that govern the evolutionary dynamics of persistence are currently poorly understood. Theoretical studies predict far-reaching effects of bottlenecking on the evolutionary adaptation of bacterial populations, but these effects have never been investigated in the context of persistence. Bottlenecking events are frequently encountered by infecting pathogens during host-to-host transmission and antibiotic treatment. In this study, we used a combination of experimental evolution and barcoded knockout libraries to examine how population bottlenecking affects the evolutionary dynamics of persistence. In accordance with existing hypotheses, small bottlenecks were found to restrict the adaptive potential of populations and result in more heterogeneous evolutionary outcomes. Evolutionary trajectories followed in small-bottlenecking regimes additionally suggest that the fitness landscape associated with persistence has a rugged topography, with distinct trajectories toward increased persistence that are accessible to evolving populations. Furthermore, sequencing data of evolved populations and knockout libraries after selection reveal various genes that are potentially involved in persistence, including previously known as well as novel targets. Together, our results not only provide experimental evidence for evolutionary theories, but also contribute to a better understanding of the environmental and genetic factors that guide bacterial adaptation to antibiotic treatment.
Abstract: Resolving the genomic basis underlying phenotypic variations is a question of great importance in evolutionary biology. However, understanding how genotypes determine phenotypes is still challenging. Centuries of artificial selective breeding for beauty and aggression resulted in a plethora of colors, long-fin varieties, and hyper-aggressive behavior in the air-breathing Siamese fighting fish (Betta splendens), supplying an excellent system for studying the genomic basis of phenotypic variations. Combining whole-genome sequencing, quantitative trait loci mapping, genome-wide association studies, and genome editing, we investigated the genomic basis of huge morphological variation in fins and striking differences in coloration in the fighting fish. Results revealed that the double tail, elephant ear, albino, and fin spot mutants were each determined by single major-effect loci. The elephant ear phenotype was likely related to differential expression of a potassium ion channel gene, kcnh8. The albinotic phenotype was likely linked to a cis-regulatory element acting on the mitfa gene and the double-tail mutant was suggested to be caused by a deletion in a zic1/zic4 coenhancer. Our data highlight that major loci and cis-regulatory elements play important roles in bringing about phenotypic innovations and establish Bettas as a powerful new model to study the genomic basis of evolved changes.
Abstract: LTR retrotransposons comprise a major component of the genomes of eukaryotes. On occasion, retrotransposon genes can be recruited by their hosts for diverse functions, a process formally referred to as co-option. However, a comprehensive picture of LTR retrotransposon gag gene co-option in eukaryotes is still lacking, with several documented cases exclusively involving Ty3/Gypsy retrotransposons in animals. Here, we use a phylogenomic approach to systematically unearth co-option of retrotransposon gag genes above the family level of taxonomy in 2,011 eukaryotes, namely co-option occurring during the deep evolution of eukaryotes. We identify a total of 14 independent gag gene co-option events across more than 740 eukaryote families, eight of which have not been reported previously. Among these retrotransposon gag gene co-option events, nine, four, and one involve gag genes of Ty3/Gypsy, Ty1/Copia, and Bel-Pao retrotransposons, respectively. Seven, four, and three co-option events occurred in animals, plants, and fungi, respectively. Interestingly, two co-option events took place in the early evolution of angiosperms. Both selective pressure and gene expression analyses further support that these co-opted gag genes might perform diverse cellular functions in their hosts, and several co-opted gag genes might be subject to positive selection. Taken together, our results provide a comprehensive picture of LTR retrotransposon gag gene co-option events that occurred during the deep evolution of eukaryotes and suggest a paucity of such co-option over that timescale.
Abstract: The relationships among the four major embryophyte lineages (mosses, liverworts, hornworts, vascular plants) and the timing of the origin of land plants are enigmatic problems in plant evolution. Here, we resolve the monophyly of bryophytes by improving taxon sampling of hornworts and eliminating the effect of synonymous substitutions. We then estimate the divergence time of crown embryophytes based on three fossil calibration strategies, and reveal that maximum calibration constraints have a major effect on estimating the time of origin of land plants. Moreover, comparison of priors and posteriors provides a guide for evaluating the optimal calibration strategy. By considering the reliability of fossil calibrations and the influences of molecular data, we estimate that land plants originated in the Precambrian (980–682 Ma), much older than widely recognized. Our study highlights the important contribution of molecular data when faced with contentious fossil evidence, and that fossil calibrations used in estimating the timescale of plant evolution require critical scrutiny.
Abstract: Mechanical properties such as substrate stiffness are a ubiquitous feature of a cell’s environment. Many types of animal cells exhibit canonical phenotypic plasticity when grown on substrates of differing stiffness, in vitro and in vivo. Whether such plasticity is a multivariate optimum due to hundreds of millions of years of animal evolution, or instead is a compromise between conflicting selective demands, is unknown. We addressed these questions by means of experimental evolution of populations of mouse fibroblasts propagated for approximately 90 cell generations on soft or stiff substrates. The ancestral cells grow twice as fast on stiff substrate as on soft substrate and exhibit the canonical phenotypic plasticity. Soft-selected lines derived from a genetically diverse ancestral population increased growth rate on soft substrate to the ancestral level on stiff substrate and evolved the same multivariate phenotype. The pattern of plasticity in the soft-selected lines was opposite of the ancestral pattern, suggesting that reverse plasticity underlies the observed rapid evolution. Conversely, growth rate and phenotypes did not change in selected lines derived from clonal cells. Overall, our results suggest that the changes were the result of genetic evolution and not phenotypic plasticity per se. Whole-transcriptome analysis revealed consistent differentiation between ancestral and soft-selected populations, and that both emergent phenotypes and gene expression tended to revert in the soft-selected lines. However, the selected populations appear to have achieved the same phenotypic outcome by means of at least two distinct transcriptional architectures related to mechanotransduction and proliferation.
Abstract: Aging and cancer are two interrelated processes, with aging being a major risk factor for the development of cancer. Parallel epigenetic alterations have been described for both, although differences, especially within the DNA hypomethylation scenario, have also been recently reported. Although many of these observations arise from the use of mouse models, there is a lack of systematic comparisons of human and mouse epigenetic patterns in the context of disease. However, such comparisons are significant as they allow us to establish the extent to which some of the observed similarities or differences arise from pre-existing species-specific epigenetic traits. Here, we have used reduced representation bisulfite sequencing to profile the brain methylomes of young and old, tumoral and nontumoral brain samples from human and mouse. We first characterized the baseline epigenomic patterns of the species and subsequently focused on the DNA methylation alterations associated with cancer and aging. Next, we described the functional genomic and epigenomic context associated with the alterations, and finally, we integrated our data to study interspecies DNA methylation levels at orthologous CpG sites. Globally, we found considerable differences between the characteristics of DNA methylation alterations in cancer and aging in both species. Moreover, we describe robust evidence for the conservation of the specific cancer and aging epigenomic signatures in human and mouse. Our observations point toward the preservation of the functional consequences of these alterations at multiple levels of genomic regulation. Finally, our analyses reveal a role for the genomic context in explaining disease- and species-specific epigenetic traits.
Abstract: The activity of a gene newly integrated into a chromosome depends on the genomic context of the integration site. This “position effect” has been widely reported, although the other side of the coin, that is, how integration affects the local chromosomal environment, has remained largely unexplored, as have the mechanism and phenotypic consequences of this “externality” of the position effect. Here, we examined the transcriptome profiles of approximately 250 Saccharomyces cerevisiae strains, each with GFP integrated into a different locus of the wild-type strain. We found that in genomic regions enriched in essential genes, GFP expression tended to be lower, and the genes near the integration site tended to show greater expression reduction. Further joint analysis with public genome-wide histone modification profiles indicated that this effect was associated with H3K4me2. More importantly, we found that changes in the expression of neighboring genes, but not GFP expression, significantly altered the cellular growth rate. As a result, genomic loci that showed high GFP expression immediately after integration were associated with growth disadvantages caused by elevated expression of neighboring genes, ultimately leading to a low total yield of GFP in the long run. Our results were consistent with competition for transcriptional resources among neighboring genes and revealed a previously unappreciated facet of position effects. This study highlights the impact of position effects on the fate of exogenous gene integration and has significant implications for biological engineering and the pathology of viral integration into the host genome.
Abstract: MicroRNAs (miRNAs) are important gene expression regulators implicated in many biological processes, but we lack a global understanding of how miRNA genes evolve and contribute to developmental canalization and phenotypic diversification. Whole-genome duplication events likely provide a substrate for species divergence and phenotypic change by increasing gene numbers and relaxing evolutionary pressures. To understand the consequences of genome duplication on miRNA evolution, we studied miRNA genes following the teleost genome duplication (TGD). Analysis of miRNA genes in four teleosts and in spotted gar, whose lineage diverged before the TGD, revealed that miRNA genes were retained in ohnologous pairs more frequently than protein-coding genes, and that gene losses occurred rapidly after the TGD. Genomic context influenced retention rates, with clustered miRNA genes retained more often than nonclustered miRNA genes and intergenic miRNA genes retained more frequently than intragenic miRNA genes, which often shared the evolutionary fate of their protein-coding host. Expression analyses revealed both conserved and divergent expression patterns across species in line with miRNA functions in phenotypic canalization and diversification, respectively. Finally, major strands of miRNA genes experienced stronger purifying selection, especially in their seeds and 3′-complementary regions, compared with minor strands, which nonetheless also displayed evolutionary features compatible with constrained function. This study provides the first genome-wide, multispecies analysis of the mechanisms influencing metazoan miRNA evolution after whole-genome duplication.
Abstract: It has been hypothesized that early enzymes were more promiscuous than their extant orthologs. Whether or not this hypothesis applies to the translation machinery, the oldest molecular machine of life, is not known. Efficient protein synthesis relies on a cascade of specific interactions between the ribosome and the translation factors. Here, using elongation factor-Tu (EF-Tu) as a model system, we have explored the evolution of ribosome specificity in translation factors. Employing pre-steady-state fast kinetics using quench flow, we have quantitatively characterized the specificity of two sequence-reconstructed 1.3- to 3.3-Gy-old ancestral EF-Tus toward two unrelated bacterial ribosomes, mesophilic Escherichia coli and thermophilic Thermus thermophilus. Although the modern EF-Tus show clear preference for their respective ribosomes, the ancestral EF-Tus show similar specificity for diverse ribosomes. In addition, despite an increase in catalytic activity with temperature, the ribosome specificity of the thermophilic EF-Tus remains virtually unchanged. Our kinetic analysis thus suggests that EF-Tu proteins likely evolved from catalytically promiscuous, “generalist” ancestors. Furthermore, compatibility of diverse ribosomes with the modern and ancestral EF-Tus suggests that the ribosomal core probably evolved before the diversification of the EF-Tus. This study thus provides important insights regarding the evolution of modern translation machinery.
Abstract: Alternative synonymous codons are often used at unequal frequencies. Classically, studies of such codon usage bias (CUB) attempted to separate the impact of neutral from selective forces by assuming that deviations from a predicted neutral equilibrium capture selection. However, GC-biased gene conversion (gBGC) can also cause deviation from a neutral null. Alternatively, selection has been inferred from CUB in highly expressed genes, but the accuracy of this approach has not been extensively tested, and gBGC can interfere with such extrapolations (e.g., if expression and gene conversion rates covary). It is therefore critical to examine deviations from a mutational null in a species with no gBGC. To achieve this goal, we implement such an analysis in the highly AT-rich genome of Dictyostelium discoideum, where we find no evidence of gBGC. We infer neutral CUB under mutational equilibrium to quantify “adaptive codon preference,” a nontautologous genome-wide quantitative measure of the relative selection strength driving CUB. We observe signatures of purifying selection consistent with selection favoring adaptive codon preference. Preferred codons are not GC-rich, underscoring the independence from gBGC. Expression-associated “preference” largely matches adaptive codon preference but does not wholly capture the influence of selection shaping patterns across all genes, suggesting selective constraints associated specifically with high expression. We observe patterns consistent with effects on mRNA translation and stability shaping adaptive codon preference. Thus, our approach to quantifying adaptive codon preference provides a framework for inferring the sources of selection that shape CUB across different contexts within the genome.
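As a rough illustration of what "unequal frequencies" of synonymous codons means in practice, the sketch below computes relative synonymous codon usage (RSCU), a classical descriptive statistic. It is not the "adaptive codon preference" measure of the abstract, which additionally requires a mutational null. The function name `rscu` and the toy sequence are assumptions made for this example, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def rscu(cds):
    """Return RSCU per codon for an in-frame coding sequence.

    RSCU = observed codon count divided by the count expected if all
    synonymous codons for that amino acid were used equally. A value of
    1.0 means no bias within the synonymous family.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        # Skip unobserved amino acids and single-codon families (Met, Trp).
        if total == 0 or len(codons) == 1:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example: lysine encoded three times by AAA and once by AAG.
toy = rscu("AAAAAAAAAAAG")
print(toy["AAA"], toy["AAG"])
```

In the toy sequence, AAA is used more often than expected under equal usage and AAG less often; whether such a skew reflects mutation bias, gBGC, or selection is exactly what a descriptive statistic like this cannot distinguish.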
Abstract: Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.
Abstract: The main bacterial pathway for inserting proteins into the plasma membrane relies on the signal recognition particle (SRP), composed of the Ffh protein and an associated RNA component, and the SRP-docking protein FtsY. Eukaryotes use an equivalent system of archaeal origin to deliver proteins into the endoplasmic reticulum, whereas a bacteria-derived SRP and FtsY function in the plastid. Here we report on the presence of homologs of the bacterial Ffh and FtsY proteins in various unrelated plastid-lacking unicellular eukaryotes, namely Heterolobosea, Alveida, Goniomonas, and Hemimastigophora. The monophyly of novel eukaryotic Ffh and FtsY groups, predicted mitochondrial localization experimentally confirmed for Naegleria gruberi, and a strong alphaproteobacterial affinity of the Ffh group, collectively suggest that they constitute parts of an ancestral mitochondrial signal peptide-based protein-targeting system inherited from the last eukaryotic common ancestor, but lost from the majority of extant eukaryotes. The ability of putative signal peptides, predicted in a subset of mitochondrial-encoded N. gruberi proteins, to target a reporter fluorescent protein into the endoplasmic reticulum of Trypanosoma brucei, likely through their interaction with the cytosolic SRP, provided further support for this notion. We also illustrate that known mitochondrial ribosome-interacting proteins implicated in membrane protein targeting in opisthokonts (Mba1, Mdm38, and Mrx15) are broadly conserved in eukaryotes and nonredundant with the mitochondrial SRP system. Finally, we identified a novel mitochondrial protein (MAP67) present in diverse eukaryotes and related to the signal peptide-binding domain of Ffh, which may well be a hitherto unrecognized component of the mitochondrial membrane protein-targeting machinery.
Abstract: Antibiotic resistance often generates defects in bacterial growth, called the fitness cost. Understanding the causes of this cost is of paramount importance, as it is one of the main determinants of the prevalence of resistance when antibiotic use is reduced. Here we show that the fitness costs of antibiotic resistance mutations that affect transcription and translation in Escherichia coli strongly correlate with DNA breaks, which are generated via transcription–translation uncoupling, increased formation of RNA–DNA hybrids (R-loops), and elevated replication–transcription conflicts. We also demonstrate that the mechanisms generating DNA breaks are repeatedly targeted by compensatory evolution, and that DNA breaks and the cost of resistance can be increased by targeting RNase HI, which specifically degrades R-loops. We further show that the DNA damage and thus the fitness cost caused by lack of RNase HI function drive resistant clones to extinction in populations with high initial frequency of resistance, both in laboratory conditions and in a mouse model of gut colonization. Thus, RNase HI provides a target specific against resistant bacteria, which we validate using a repurposed drug. In summary, we reveal key mechanisms underlying the fitness cost of antibiotic resistance mutations that can be exploited to specifically eliminate resistant bacteria.
AbstractEvidence is accumulating that gene flow commonly occurs between recently diverged species, despite the existence of barriers to gene flow in their genomes. However, we still know little about what regions of the genome become barriers to gene flow and how such barriers form. Here, we compare genetic differentiation across the genomes of bumblebee species living in sympatry and allopatry to reveal the potential impact of gene flow during species divergence and uncover genetic barrier loci. We first compared the genomes of the alpine bumblebee Bombus sylvicola and a previously unidentified sister species living in sympatry in the Rocky Mountains, revealing prominent islands of elevated genetic divergence in the genome that colocalize with centromeres and regions of low recombination. This same pattern is observed between the genomes of another pair of closely related species living in allopatry (B. bifarius and B. vancouverensis). Strikingly however, the genomic islands exhibit significantly elevated absolute divergence (dXY) in the sympatric, but not the allopatric, comparison indicating that they contain loci that have acted as barriers to historical gene flow in sympatry. Our results suggest that intrinsic barriers to gene flow between species may often accumulate in regions of low recombination and near centromeres through processes such as genetic hitchhiking, and that divergence in these regions is accentuated in the presence of gene flow.
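The absolute divergence statistic (dXY) mentioned in the abstract above is conventionally defined as the average number of pairwise differences per site between sequences drawn from two populations. The following is a minimal sketch of that definition, assuming equal-length, gap-free aligned sequences; the function name `dxy` and the toy inputs are hypothetical and are not taken from the study.

```python
from itertools import product


def dxy(pop_x, pop_y):
    """Average pairwise differences per site between two populations.

    pop_x, pop_y: non-empty lists of aligned sequences (strings) of equal length.
    Illustrative only: gaps and missing data are not handled.
    """
    if not pop_x or not pop_y:
        raise ValueError("both populations need at least one sequence")
    length = len(pop_x[0])
    if length == 0 or any(len(s) != length for s in pop_x + pop_y):
        raise ValueError("sequences must be non-empty and of equal length")
    # Sum the mismatches over every between-population pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(sx, sy))
        for sx, sy in product(pop_x, pop_y)
    )
    # Normalize by the number of pairs and the number of sites.
    return total_diffs / (len(pop_x) * len(pop_y) * length)


if __name__ == "__main__":
    print(dxy(["AAAA", "AAAT"], ["AATT", "TTTT"]))
```

In genome scans of this kind, such a statistic is typically computed in windows along the genome so that regions of elevated divergence can be compared with the genomic background.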
AbstractEvolutionary dynamics at the population level play a central role in creating the diversity of life on our planet. In this study, we sought to understand the origins of such population-level variation in mating systems and defensive acylsugar chemistry in Solanum habrochaites—a wild tomato species found in diverse Andean habitats in Ecuador and Peru. Using Restriction-site-Associated-DNA-Sequencing (RAD-seq) of 50 S. habrochaites accessions, we identified eight population clusters generated via isolation and hybridization dynamics of 4–6 ancestral populations. Detailed characterization of mating systems of these clusters revealed emergence of multiple self-compatible (SC) groups from progenitor self-incompatible populations in the northern part of the species range. Emergence of these SC groups was also associated with fixation of deleterious alleles inactivating acylsugar acetylation. The Amotape-Huancabamba Zone—a geographical landmark in the Andes with high endemism and isolated microhabitats—was identified as a major driver of differentiation in the northern species range, whereas large geographical distances contributed to population structure and evolution of a novel SC group in the central and southern parts of the range, where the species was also inferred to have originated. Findings presented here highlight the role of the diverse ecogeography of Peru and Ecuador in generating population differentiation, and enhance our understanding of the microevolutionary processes that create biological diversity.
AbstractAccurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology—evolutionary relatedness—is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit—from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
AbstractNative cattle breeds represent an important cultural heritage. They are a reservoir of genetic variation useful for properly responding to agricultural needs in light of ongoing climate change. Evolutionary processes that occur in response to extreme environmental conditions could also be better understood using adapted local populations. Herein, the different evolutionary histories of the world's northernmost native cattle breeds from Russia were investigated. The analyses highlighted Kholmogory as typical taurine cattle, whereas Yakut cattle separated from European taurines approximately 5,000 years ago and contain numerous ancestral and some novel genetic variants allowing their adaptation to the harsh conditions of living above the Polar Circle. Scans for selection signatures pointed to several common gene pathways related to adaptation to harsh climates in both breeds. However, the genes affected by selection in these pathways were mostly different. A Yakut cattle breed-specific missense mutation in the highly conserved NRAP gene represents a unique example of a young amino acid residue convergent change shared with at least 16 species of hibernating/cold-adapted mammals from six distinct phylogenetic orders. This suggests a convergent evolution event along the mammalian phylogenetic tree and fast fixation in a single isolated cattle population exposed to a harsh climate.
AbstractUnderstanding and predicting how amino acid substitutions affect proteins are keys to our basic understanding of protein function and evolution. Amino acid changes may affect protein function in a number of ways including direct perturbations of activity or indirect effects on protein folding and stability. We have analyzed 6,749 experimentally determined variant effects from multiplexed assays on abundance and activity in two proteins (NUDT15 and PTEN) to quantify these effects and find that a third of the variants cause loss of function, and about half of loss-of-function variants also have low cellular abundance. We analyze the structural and mechanistic origins of loss of function and use the experimental data to find residues important for enzymatic activity. We performed computational analyses of protein stability and evolutionary conservation and show how we may predict positions where variants cause loss of activity or abundance. In this way, our results link thermodynamic stability and evolutionary conservation to experimental studies of different properties of protein fitness landscapes.
AbstractThe persistence of plasmids in bacterial populations represents a puzzling evolutionary problem with serious clinical implications due to their role in the ongoing antibiotic resistance crisis. Recently, major advancements have been made toward resolving this “plasmid paradox” but mainly in a nonclinical context. Here, we propose an additional explanation for the maintenance of multidrug‐resistance plasmids in clinical Escherichia coli strains. After coevolving two multidrug‐resistance plasmids encoding resistance to last resort carbapenems with an extraintestinal pathogenic E. coli strain, we observed that chromosomal media adaptive mutations in the global regulatory systems CCR (carbon catabolite repression) and ArcAB (aerobic respiration control) pleiotropically improved the maintenance of both plasmids. Mechanistically, a net downregulation of plasmid gene expression reduced the fitness cost. Our results suggest that global chromosomal transcriptional rewiring during bacterial niche adaptation may facilitate plasmid maintenance.
AbstractThe Sox family of transcription factors regulates many processes during metazoan development, including stem cell maintenance and nervous system specification. Characterizing the repertoires and roles of these genes can therefore provide important insights into animal evolution and development. We further characterized the Sox repertoires of several arachnid species with and without an ancestral whole-genome duplication and compared their expression between the spider Parasteatoda tepidariorum and the harvestman Phalangium opilio. We found that most Sox families have been retained as ohnologs after whole-genome duplication, and we found evidence for potential subfunctionalization and/or neofunctionalization events. Our results also suggest that Sox21b-1 likely regulated segmentation ancestrally in arachnids, playing a similar role to the closely related SoxB gene, Dichaete, in insects. We previously showed that Sox21b-1 is required for the simultaneous formation of prosomal segments and sequential addition of opisthosomal segments in P. tepidariorum. We studied the expression and function of Sox21b-1 further in this spider and found that although this gene regulates the generation of both prosomal and opisthosomal segments, it plays different roles in the formation of these tagmata, reflecting their contrasting modes of segmentation and deployment of gene regulatory networks with different architectures.
AbstractUnderstanding how genes interact is a central challenge in biology. Experimental evolution provides a useful, but underutilized, tool for identifying genetic interactions, particularly those that involve non-loss-of-function mutations or mutations in essential genes. We previously identified a strong positive genetic interaction between specific mutations in KEL1 (P344T) and HSL7 (A695fs) that arose in an experimentally evolved Saccharomyces cerevisiae population. Because this genetic interaction is not phenocopied by gene deletion, it was previously unknown. Using “evolutionary replay” experiments, we identified additional mutations that have positive genetic interactions with the kel1-P344T mutation. We replayed the evolution of this population 672 times from six timepoints. We identified 30 populations where the kel1-P344T mutation reached high frequency. We performed whole-genome sequencing on these populations to identify genes in which mutations arose specifically in the kel1-P344T background. We reconstructed mutations in the ancestral and kel1-P344T backgrounds to validate positive genetic interactions. We identify several genetic interactors with KEL1, we validate these interactions by reconstruction experiments, and we show these interactions are not recapitulated by loss-of-function mutations. Our results demonstrate the power of experimental evolution to identify genetic interactions that are positive, allele specific, and not readily detected by other methods, shedding light on an underexplored region of the yeast genetic interaction network.
AbstractThe cichlids of Lake Victoria are a textbook example of adaptive radiation, as >500 endemic species arose in just 14,600 years. The degree of genetic differentiation among species is very low due to the short period of time after the radiation, which allows us to ascertain highly differentiated genes that are strong candidates for driving speciation and adaptation. Previous studies have revealed the critical contribution of vision to speciation by showing the existence of highly differentiated alleles in the visual opsin gene among species with different habitat depths. In contrast, the processes of species-specific adaptation to different ecological backgrounds remain to be investigated. Here, we used genome-wide comparative analyses of three species of Lake Victoria cichlids that inhabit different environments—Haplochromis chilotes, H. sauvagei, and Lithochromis rufus—to elucidate the processes of adaptation by estimating population history and by searching for candidate genes that contribute to adaptation. The patterns of changes in population size were quite distinct among the species according to their habitats. We identified many novel adaptive candidate genes, some of which had surprisingly long divergent haplotypes between species, thus showing the footprint of selective sweep events. Molecular phylogenetic analyses revealed that a large fraction of the allelic diversity among Lake Victoria cichlids was derived from standing genetic variation that originated before the adaptive radiation. Our analyses uncovered the processes of species-specific adaptation of Lake Victoria cichlids and the complexity of the genomic substrate that facilitated this adaptation.
AbstractCichlid fishes exhibit rapid, extensive, and replicative adaptive radiation in feeding morphology. Plasticity of the cichlid jaw has also been well documented, and this combination of iterative evolution and developmental plasticity has led to the proposition that the cichlid feeding apparatus represents a morphological “flexible stem.” Under this scenario, the fixation of environmentally sensitive genetic variation drives evolutionary divergence along a phenotypic axis established by the initial plastic response. Thus, if plasticity is predictable then so too should be the evolutionary response. We set out to explore these ideas at the molecular level by identifying genes that underlie both the evolution and plasticity of the cichlid jaw. As a first step, we fine-mapped an environment-specific quantitative trait locus for lower jaw shape in cichlids, and identified a nonsynonymous mutation in the ciliary rootlet coiled-coil 2 gene (crocc2), which encodes a major structural component of the primary cilium. Given that primary cilia play key roles in skeletal mechanosensing, we reasoned that this gene may confer its effects by regulating the sensitivity of bone to mechanical input. Using both cichlids and zebrafish, we confirmed this prediction through a series of experiments targeting multiple levels of biological organization. Taken together, our results implicate crocc2 as a novel mediator of bone formation, plasticity, and evolution.
AbstractSpatially explicit phylogeographic analyses can be performed with an inference framework that employs relaxed random walks to reconstruct phylogenetic dispersal histories in continuous space. This core model was first implemented 10 years ago and has opened up new opportunities in the field of phylodynamics, allowing researchers to map and analyze the spatial dissemination of rapidly evolving pathogens. We here provide a detailed and step-by-step guide on how to set up, run, and interpret continuous phylogeographic analyses using the programs BEAUti, BEAST, Tracer, and TreeAnnotator.
Evolution is often portrayed as a tree, with new species branching off from existing lineages, never again to meet. The truth however is often much messier. In the case of adaptive radiation, in which species diversify rapidly to fill different ecological niches, it can be difficult to resolve relationships, and the phylogeny (i.e., evolutionary tree) may look more like a web than a tree. This is because lineages may continue to interbreed as new species are established, and/or they may diverge and then rehybridize, resulting in genetically mixed populations (known as admixture). Even after species diverge, the introduction of genes from one species to another (known as introgression) can occur. All of this results in a network of related species, rather than a simple tree. The extent to which these processes occur and their evolutionary and genomic impacts are not well understood, partially due to the “tree-like” assumptions of the models that are used to construct phylogenies. In a new study in Genome Biology and Evolution titled “Rampant genome-wide admixture across the Heliconius radiation,” Krzysztof Kozak of the University of Cambridge and colleagues demonstrate the key role that interspecific gene flow played in the continent-wide adaptive radiation of the Heliconius butterflies (Kozak et al. 2021). This study adds to the rich literature on Heliconius, a genus that provided some of the earliest evidence for the theory of evolution thanks to their distinctive wing patterns and colors, which help warn predators of their toxic nature.
AbstractThe painted lady butterfly, Vanessa cardui, has the longest migration routes, the widest hostplant diversity, and one of the most complex wing patterns of any insect. Due to minimal culturing requirements, easily characterized wing pattern elements, and technical feasibility of CRISPR/Cas9 genome editing, V. cardui is emerging as a functional genomics model for diverse research programs. Here, we report a high-quality, annotated genome assembly of the V. cardui genome, generated using 84× coverage of PacBio long-read data, which we assembled into 205 contigs with a total length of 425.4 Mb (N50 = 10.3 Mb). The genome was very complete (single-copy complete Benchmarking Universal Single-Copy Orthologs [BUSCO] 97%), with contigs assembled into presumptive chromosomes using synteny analyses. Our annotation used embryonic, larval, and pupal transcriptomes, and 20 transcriptomes across five different wing developmental stages. Gene annotations showed a high level of accuracy and completeness, with 14,437 predicted protein-coding genes. This annotated genome assembly constitutes an important resource for diverse functional genomic studies ranging from the developmental genetic basis of butterfly color pattern, to coevolution with diverse hostplants.
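The N50 value quoted in the abstract above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below illustrates that definition only; the function name `n50` and the toy contig lengths are hypothetical and are not data from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Toy example: total = 20, and 6 + 5 = 11 covers at least half.
    print(n50([2, 3, 4, 5, 6]))
```

A larger N50 for the same total length indicates that more of the assembly sits in long contigs, which is why the statistic is usually reported alongside the contig count and total assembly size.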
AbstractEukaryotic DNA replication begins at genomic loci termed origins, which are bound by the origin recognition complex (ORC). Although ORC is conserved across species, the sequence composition of origins is more varied. In the budding yeast Saccharomyces cerevisiae, the ORC-binding motif consists of an A/T-rich 17 bp “extended ACS” sequence adjacent to a B1 element composed of two 3-bp motifs. Similar sequences occur at origins in closely related species, but it is not clear when this type of replication origin arose and whether it predated a whole-genome duplication that occurred around 100 Ma in the budding yeast lineage. To address these questions, we identified the ORC-binding sequences in the nonduplicated species Torulaspora delbrueckii. We used chromatin immunoprecipitation followed by sequencing and identified 190 ORC-binding sites distributed across the eight T. delbrueckii chromosomes. Using these sites, we identified an ORC-binding motif that is nearly identical to the known motif in S. cerevisiae. We also found that the T. delbrueckii ORC-binding sites function as origins in T. delbrueckii when cloned onto a plasmid and that the motif is required for plasmid replication. Finally, we compared an S. cerevisiae origin with two T. delbrueckii ORC-binding sites and found that they conferred similar stabilities to a plasmid. These results reveal that the ORC-binding motif arose prior to the whole-genome duplication and has been maintained for over 100 Myr.
AbstractIn butterflies and moths, which exhibit highly variable sex determination mechanisms, the homogametic Z chromosome is deeply conserved and is featured in many genome assemblies. The evolution and origin of the female W sex chromosome, however, remains mostly unknown. Previous studies have proposed that a ZZ/Z0 sex determination system is ancestral to Lepidoptera, and that W chromosomes may originate from sex-linked B chromosomes. Here, we sequence and assemble the female Dryas iulia genome into 32 highly contiguous ordered and oriented chromosomes, including the Z and W sex chromosomes. We then use sex-specific Hi-C, ATAC-seq, PRO-seq, and whole-genome DNA sequence data sets to test if features of the D. iulia W chromosome are consistent with a hypothesized B chromosome origin. We show that the putative W chromosome displays female-associated DNA sequence, gene expression, and chromatin accessibility to confirm the sex-linked function of the W sequence. In contrast with expectations from studies of homologous sex chromosomes, highly repetitive DNA content on the W chromosome, the sole presence of domesticated repetitive elements in functional DNA, and lack of sequence homology with the Z chromosome or autosomes is most consistent with a B chromosome origin for the W, although it remains challenging to rule out extensive sequence divergence. Synteny analysis of the D. iulia W chromosome with other female lepidopteran genome assemblies shows no homology between W chromosomes and suggests multiple, independent origins of the W chromosome from a B chromosome likely occurred in butterflies.
AbstractColeoptera is the most species-rich insect order, yet is currently underrepresented in genomic databases. An assembly was generated for the ca. 1.7 Gb genome of the leaf beetle Gonioctena quinquepunctata by first assembling long-sequence reads (Oxford Nanopore; ± 27-fold coverage) and subsequently polishing the resulting assembly with short sequence reads (Illumina; ± 85-fold coverage). The unusually large size (most Coleoptera species are associated with a reported size below 1 Gb) was at least partially attributed to the presence of a large fraction of repeated elements (73.8%). The final assembly was characterized by an N50 length of 432 kb and a BUSCO score of 95.5%. The heterozygosity rate was ± 0.6%. Automated genome annotation informed by RNA-Seq resulted in 40,568 predicted proteins, which is much larger than the typical range of 17,000–23,000 predicted for other Coleoptera. However, no evidence of a genome duplication was detected. This new reference genome will contribute to our understanding of genetic variation in the Coleoptera. Among other things, it will also allow exploring reproductive barriers between species, investigating introgression in the nuclear genome, and identifying genes involved in resistance to extreme climate conditions.
AbstractThe Piwi-interacting RNA (piRNA) pathway is a genomic defense system that controls the movement of transposable elements (TEs) through transcriptional and post-transcriptional silencing. Although TE defense is critical to ensuring germline genome integrity, it is equally critical that the piRNA pathway avoids autoimmunity in the form of silencing host genes. Ongoing cycles of selection for expanded control of invading TEs, followed by selection for increased specificity to reduce impacts on host genes, are proposed to explain the frequent signatures of adaptive evolution among piRNA pathway proteins. However, empirical tests of this model remain limited, particularly with regard to selection against genomic autoimmunity. I examined three adaptively evolving piRNA proteins, Rhino, Deadlock, and Cutoff, for evidence of interspecific divergence in autoimmunity between Drosophila melanogaster and Drosophila simulans. I tested a key prediction of the autoimmunity hypothesis that foreign heterospecific piRNA proteins will exhibit enhanced autoimmunity, due to the absence of historical selection against off-target effects. Consistent with this prediction, full-length D. simulans Cutoff, as well as the D. simulans hinge and chromo domains of Rhino, exhibit expanded regulation of D. melanogaster genes. I further demonstrate that this autoimmunity is dependent on known incompatibilities between D. simulans proteins or domains and their interacting partners in D. melanogaster. My observations reveal that the same protein–protein interaction domains that are interfaces of adaptive evolution in Rhino and Cutoff also determine their potential for autoimmunity.
AbstractTransposable elements (TEs) are a major source of genetic and regulatory variation in their host genome and are consequently thought to play important roles in evolution. Many fungal and oomycete plant pathogens have evolved dynamic and TE-rich genomic regions containing genes that are implicated in host colonization and adaptation. TEs embedded in these regions have typically been thought to accelerate the evolution of these genomic compartments, but little is known about their dynamics in strains that harbor them. Here, we used whole-genome sequencing data of 42 strains of the fungal plant pathogen Verticillium dahliae to systematically identify polymorphic TEs that may be implicated in genomic as well as in gene expression variation. We identified 2,523 TE polymorphisms and characterized a subset of 8% of the TEs as polymorphic elements that are evolutionarily younger, less methylated, and more highly expressed when compared with the remaining 92% of the total TE complement. As expected, the polymorphic TEs are enriched in the adaptive genomic regions. In addition, we observed an association of polymorphic TEs with pathogenicity-related genes that localize nearby and that display high expression levels. Collectively, our analyses demonstrate that TE dynamics in V. dahliae contribute to genomic variation, correlate with expression of pathogenicity-related genes, and potentially impact the evolution of adaptive genomic regions.
AbstractOrganellar genomes serve as useful models for genome evolution and contain some of the most widely used phylogenetic markers, but they are poorly characterized in many lineages. Here, we report 20 novel mitochondrial genomes and 16 novel plastid genomes from the brown algae. We focused our efforts on the orders Chordales and Laminariales but also provide the first plastid genomes (plastomes) from Desmarestiales and Sphacelariales, the first mitochondrial genome (mitome) from Ralfsiales and a nearly complete mitome from Sphacelariales. We then compared gene content, sequence evolution rates, shifts in genome structural arrangements, and intron distributions across lineages. We confirm that gene content is largely conserved in both organellar genomes across the brown algal tree of life, with few cases of gene gain or loss. We further show that substitution rates are generally lower in plastid than mitochondrial genes, but plastomes are more variable in gene arrangement, as mitomes tend to be colinear even among distantly related lineages (with exceptions). Patterns of intron distribution across organellar genomes are complex. In particular, the mitomes of several laminarialean species possess group II introns that have T7-like ORFs, found previously only in mitochondrial genomes of Pylaiella spp. (Ectocarpales). The distribution of these mitochondrial introns is inconsistent with vertical transmission and likely reflects invasion by horizontal gene transfer between lineages. In the most extreme case, the mitome of Hedophyllum nigripes is ∼40% larger than the mitomes of close relatives because of these introns. Our results provide substantial insight into organellar evolution across the brown algae.
AbstractMembrane trafficking is an essential process of eukaryotic cells, as it manages vesicular trafficking toward different parts of the cell. In this process, membrane fusions between vesicles and target membranes are mediated by several factors, including the multisubunit tethering complexes. One type of multisubunit tethering complex, the complexes associated with tethering containing helical rods (CATCHR), encompasses the exocyst, COG, GARP, and DSL1 complexes. The CATCHR share similarities at the sequence, structural, and protein-complex organization levels, although their actual relationship is still poorly understood. In this study, we have re-evaluated CATCHR at different levels, demonstrating that gene duplications followed by neofunctionalization were key for their origin. Our results reveal that there are specific homology relationships and parallelism within and between the CATCHR, suggesting that most of these complexes are composed of modular tetramers of four different kinds of proteins, three of them having a clear common origin. The expansion of the CATCHR family occurred concomitantly with the protein family expansions of their molecular partners, such as small GTPases and SNAREs, among others, likely providing functional specificity. Our results provide novel insights into the structural organization and mechanism of action of CATCHR, with implications for the evolution of the endomembrane system of eukaryotes, and promote CATCHR as ideal candidates to study the evolution of multiprotein complexes.
AbstractMarasmius oreades is a basidiomycete fungus that grows in so called “fairy rings,” which are circular, underground mycelia common in lawns across temperate areas of the world. Fairy rings can be thought of as natural, long-term evolutionary experiments. As each ring has a common origin and expands radially outwards over many years, different sectors will independently accumulate mutations during growth. The genotype can be followed to the next generation, as mushrooms producing the sexual spores are formed seasonally at the edge of the ring. Here, we present new genomic data from 95 single-spore isolates of the species, which we used to construct a genetic linkage map and an updated version of the genome assembly. The 44-Mb assembly was anchored to 11 linkage groups, producing chromosome-length scaffolds. Gene annotation revealed 13,891 genes, 55% of which contained a pfam domain. The repetitive fraction of the genome was 22%, and dominated by retrotransposons and DNA elements of the KDZ and Plavaka groups. The level of assembly contiguity we present is so far rare in mushroom-forming fungi, and we expect studies of genomics, transposons, phylogenetics, and evolution to be facilitated by the data we present here of the iconic fairy-ring mushroom.
AbstractAmniotes possess astonishing variability in sex determination ranging from environmental sex determination (ESD) to genotypic sex determination (GSD) with highly differentiated sex chromosomes. Geckos are one of the few amniote groups with substantial variability in sex determination. What makes them special in this respect? We hypothesized that the extraordinary variability of sex determination in geckos can be explained by two alternatives: 1) unusual lability of sex determination, predicting that the current GSD systems were recently formed and are prone to turnovers; and 2) independent transitions from the ancestral ESD to later stable GSD, which assumes that geckos ancestrally possessed ESD, but once sex chromosomes emerged, they remained stable in the long term. Here, based on genomic data, we document that the differentiated ZZ/ZW sex chromosomes evolved within carphodactylid geckos independently from other gekkotan lineages and remained stable in the genera Nephrurus, Underwoodisaurus, and Saltuarius for at least 15 Myr and potentially up to 45 Myr. These results, together with evidence for the stability of sex chromosomes in other gekkotan lineages, lend more support to our second hypothesis, suggesting that the evolutionary transitions in sex determination in geckos do not dramatically differ from those observed in the majority of amniote lineages.
AbstractIn a wide range of taxa, proteins encoded by mitochondrial genomes are involved in adaptation to lifestyles that require oxygen starvation or elevation of metabolic rate. It remains poorly understood to what extent adaptation to similar conditions is associated with parallel changes in these proteins. We search for a genetic signal of parallel or convergent evolution in recurrent molecular adaptation to high altitude, migration, diving, wintering, unusual flight abilities, or loss of flight in mitochondrial genomes of birds. Building on previous work, we design an approach for the detection of recurrent coincident changes in genotype and phenotype, indicative of an association between the two. We describe a number of candidate sites involved in recurrent adaptation in ND genes. However, we find that the majority of convergence events can be explained by random coincidences without invoking adaptation.
AbstractBoth symbiotic and pathogenic bacteria in the family Coxiellaceae cause morbidity and mortality in humans and animals. For instance, Coxiella-like endosymbionts (CLEs) improve the reproductive success of ticks, a major disease vector, while Coxiella burnetii causes human Q fever, and uncharacterized coxiellae infect both animals and humans. To better understand the evolution of pathogenesis and symbiosis in this group of intracellular bacteria, we sequenced the genome of a CLE present in the soft tick Ornithodoros amblus (CLEOA) and compared it to the genomes of other bacteria in the order Legionellales. Our analyses confirmed that CLEOA is more closely related to C. burnetii, the human pathogen, than to CLEs in hard ticks, and showed that most clades of CLEs contain both endosymbionts and pathogens, indicating that several CLE lineages have evolved independently from pathogenic Coxiella. We also determined that the last common ancestor of CLEOA and C. burnetii was equipped to infect macrophages and that even though horizontal gene transfer (HGT) contributed significantly to the evolution of C. burnetii, most acquisition events occurred in ancestors predating the CLEOA–C. burnetii divergence. These discoveries clarify the evolution of C. burnetii, which previously was assumed to have emerged when an avirulent tick endosymbiont recently gained virulence factors via HGT. Finally, we identified several metabolic pathways, including heme biosynthesis, that are likely critical to the intracellular growth of the human pathogen but not the tick symbiont, and show that the use of heme analogs is a promising approach to controlling C. burnetii infections.
AbstractWolbachia is a widespread, vertically transmitted bacterial endosymbiont known for manipulating arthropod reproduction. Its most common form of reproductive manipulation is cytoplasmic incompatibility (CI), observed when a modification in the male sperm leads to embryonic lethality unless a compatible rescue factor is present in the female egg. CI attracts scientific attention due to its implications for host speciation and for the use of Wolbachia in controlling vector-borne diseases. However, our understanding of CI is complicated by the complexity of the phenotype, whose expression depends on both symbiont and host factors. In the present study, we perform a comparative analysis of nine complete Wolbachia genomes with known CI properties in the same genetic host background, Drosophila simulans STC. We describe genetic differences between closely related strains and uncover evidence that phages and other mobile elements contribute to the rapid evolution of both genomes and phenotypes of Wolbachia. Additionally, we identify both known and novel genes associated with the modification and rescue functions of CI. We combine our observations with published phenotypic information and discuss how variability in cif genes, novel CI-associated genes, and Wolbachia titer might contribute to poorly understood aspects of CI such as strength and bidirectional incompatibility. We speculate that high-titer CI strains could be better at invading new hosts already infected with a CI Wolbachia, due to a higher rescue potential, and suggest that titer might thus be a relevant parameter to consider for future strategies using CI Wolbachia in biological control.
AbstractAdaptive radiations are characterized by the diversification and ecological differentiation of species, and replicated cases of this process provide natural experiments for understanding the repeatability and pace of molecular evolution. During adaptive radiation, genes related to ecological specialization may be subject to recurrent positive directional selection. However, it is not clear to what extent patterns of lineage-specific ecological specialization (including phenotypic convergence) are correlated with shared signatures of molecular evolution. To test this, we sequenced whole exomes from a phylogenetically dispersed sample of 38 murine rodent species, a group characterized by multiple, nested adaptive radiations comprising extensive ecological and phenotypic diversity. We found that genes associated with immunity, reproduction, diet, digestion, and taste have been subject to pervasive positive selection during the diversification of murine rodents. We also found a significant correlation between genome-wide positive selection and dietary specialization, with a higher proportion of positively selected codon sites in derived dietary forms (i.e., carnivores and herbivores) than in ancestral forms (i.e., omnivores). Despite striking convergent evolution of skull morphology and dentition in two distantly related worm-eating specialists, we did not detect more genes with shared signatures of positive or relaxed selection than in a nonconvergent species comparison. Although a small number of the genes we detected can be incidentally linked to craniofacial morphology or diet, protein-coding regions are unlikely to be the primary genetic basis of this complex convergent phenotype. Our results suggest a link between positive selection and derived ecological phenotypes, and highlight specific genes and general functional categories that may have played an integral role in the extensive and rapid diversification of murine rodents.
AbstractColor and color pattern are critical for animal camouflage, reproduction, and defense. Few studies, however, have attempted to identify candidate genes for color and color pattern in squamate reptiles, a colorful group with over 10,000 species. We used comparative transcriptomic analyses between white, orange, and yellow skin in a color-polymorphic species of anole lizard to 1) identify candidate color and color-pattern genes in squamates and 2) assess if squamates share an underlying genetic basis for color and color pattern variation with other vertebrates. Squamates have three types of chromatophores that determine color pattern: guanine-filled iridophores, carotenoid- or pteridine-filled xanthophores/erythrophores, and melanin-filled melanophores. We identified 13 best candidate squamate color and color-pattern genes shared with other vertebrates: six genes linked to pigment synthesis pathways, and seven genes linked to chromatophore development and maintenance. In comparisons of expression profiles between pigment-rich and white skin, pigment-rich skin upregulated the pteridine pathway as well as xanthophore/erythrophore development and maintenance genes; in comparisons between orange and yellow skin, orange skin upregulated the pteridine and carotenoid pathways as well as melanophore maintenance genes. Our results corroborate the predictions that squamates can produce similar colors using distinct color-reflecting molecules, and that both color and color-pattern genes are likely conserved across vertebrates. Furthermore, this study provides a concise list of candidate genes for future functional verification, representing a first step in determining the genetic basis of color and color pattern in anoles.
AbstractModern accounts of eukaryogenesis entail an endosymbiotic encounter between an archaeal host and a proteobacterial endosymbiont, with subsequent evolution giving rise to a unicell possessing a single nucleus and mitochondria. The mononucleate state of the last eukaryotic common ancestor (LECA) is seldom, if ever, questioned, even though cells harboring multiple nuclei (syncytia, coenocytes, and polykaryons) are surprisingly common across eukaryotic supergroups. Here, we present a survey of multinucleated forms. Ancestral character state reconstructions for representatives of 106 eukaryotic taxa, using 16 different possible roots and supergroup sister relationships, indicate that LECA, in addition to being mitochondriate, sexual, and meiotic, was multinucleate. LECA exhibited closed mitosis, which is the rule for modern syncytial forms, shedding light on the mechanics of its chromosome segregation. A simple mathematical model shows that within LECA’s multinucleate cytosol, relationships among mitochondria and nuclei were neither one-to-one, nor one-to-many, but many-to-many, placing mitonuclear interactions and cytonuclear compatibility at the evolutionary base of eukaryotic cell origin. Within a syncytium, individual nuclei and individual mitochondria function as the initial lower-level evolutionary units of selection, as opposed to individual cells, during eukaryogenesis. Nuclei within a syncytium rescue each other’s lethal mutations, thereby postponing selection for viable nuclei and cytonuclear compatibility to the generation of spores, buffering transitional bottlenecks at eukaryogenesis. The prokaryote-to-eukaryote transition is traditionally thought to have left no intermediates, yet if eukaryogenesis proceeded via a syncytial common ancestor, intermediate forms have persisted to the present throughout the eukaryotic tree as syncytia but have so far gone unrecognized.
AbstractThe chlorophyte green algae (Chlorophyta) are a species-rich, ancient group ubiquitous in various habitats with high cytological diversity, ranging from microscopic to macroscopic organisms. However, the deep phylogeny within core Chlorophyta remains unresolved, in part due to the relatively sparse taxon and gene sampling in previous studies. Here we contribute new transcriptomic data and reconstruct phylogenetic relationships of core Chlorophyta based on four large data sets of up to 2,698 genes from 70 species, representing 80% of extant orders. The impacts of outgroup choice, missing data, bootstrap-support cutoffs, and model misspecification in phylogenetic inference of core Chlorophyta are examined. The species tree topologies of core Chlorophyta from different analyses are highly congruent, with strong support for many relationships (e.g., the Bryopsidales and the Scotinosphaerales-Dasycladales clade). The monophyly of Chlorophyceae and of Trebouxiophyceae as well as the uncertain placement of Chlorodendrophyceae and Pedinophyceae corroborate results from previous studies. The reconstruction of ancestral scenarios illustrates the evolution of the freshwater-sea and microscopic–macroscopic transition in the Ulvophyceae, and the transformation of unicellular→colonial→multicellular in the chlorophyte green algae. In addition, we provide new evidence that serine is encoded by both canonical codons and noncanonical TAG code in Scotinosphaerales, and that stop-to-sense codon reassignment in the Ulvophyceae has originated independently at least three times. Our robust phylogenetic framework of core Chlorophyta unveils the evolutionary history of phycoplast, cyto-morphology, and noncanonical genetic codes in chlorophyte green algae.
AbstractHow frequent is gene flow between species? The pattern of evolution is typically portrayed as a phylogenetic tree, yet gene flow between good species may be an important mechanism in diversification, spreading adaptive traits and leading to a complex pattern of phylogenetic incongruence. This process has thus far been studied mainly among a few closely related species, or in geographically restricted areas such as islands, but not on the scale of a continental radiation. Using a genomic representation of 40 out of 47 species in the genus, we demonstrate that admixture has played a role throughout the evolution of the charismatic Neotropical butterflies Heliconius. Modeling of phylogenetic networks based on the exome uncovers up to 13 instances of interspecific gene flow. Admixture is detected among the relatives of Heliconius erato, as well as between the ancient lineages leading to modern clades. Interspecific gene flow played a role throughout the evolution of the genus, although the process has been most frequent in the clade of Heliconius melpomene and relatives. We identify Heliconius hecalesia and relatives as putative hybrids, including new evidence for introgression at the loci controlling the mimetic wing patterns. Models accounting for interspecific gene flow yield a more complete picture of the radiation as a network, which will improve our ability to study trait evolution in a realistic comparative framework.
AbstractHow do species respond or adapt to environmental changes? The answer to this depends partly on mitochondrial epigenetics and genetics, new players in promoting adaptation to both short- and long-term environmental changes. In this review, we explore how mitochondrial epigenetic and genetic mechanisms, such as mtDNA methylation, mtDNA-derived noncoding RNAs, micropeptides, mtDNA mutations, and adaptations, can contribute to animal plasticity and adaptation. We also briefly discuss the challenges in assessing mtDNA adaptive evolution. In sum, this review covers new advances in the field of mitochondrial genomics, many of which are still controversial, and discusses processes that remain somewhat obscure, some of which are still quite speculative and require further robust experimentation.
AbstractThe members of the globin superfamily are a classical model system to investigate the evolution and fates of genes as well as the diversity of protein function. One of the best-known globins is myoglobin (Mb), which is mainly expressed in heart muscle and transports oxygen from the sarcolemma to the mitochondria. Most vertebrates harbor a single copy of the myoglobin gene, but some fish species have multiple myoglobin genes. Phylogenetic analyses indicate an independent emergence of multiple myoglobin genes, whereby the origin is mostly the last common ancestor of each order. By analyzing different transcriptome data sets, we found at least 15 myoglobin genes in the polypterid gray bichir (Polypterus senegalus) and reedfish (Erpetoichthys calabaricus). In reedfish, the myoglobin genes are expressed in a broad range of tissues but show very different expression values. In contrast, the Mb genes of the gray bichir show a rather scattered expression pattern; only a few Mb genes were found expressed in the analyzed tissues. Both gray bichir and reedfish possess lungs, which enable them to inhabit shallow and swampy waters throughout tropical Africa with frequently fluctuating and low oxygen concentrations. The myoglobin repertoire probably reflects the molecular adaptation to these conditions. The sequence divergence, the substitution rate, and the different expression patterns of multiple myoglobin genes in gray bichir and reedfish imply different functions, probably through sub- and neofunctionalization during evolution.
AbstractHeliconius butterflies (Lepidoptera: Nymphalidae) are a group of 48 neotropical species widely studied in evolutionary research. Despite the wealth of genomic data generated in past years, chromosomal level genome assemblies currently exist for only two species, Heliconius melpomene and Heliconius erato, each a representative of one of the two major clades of the genus. Here, we use these reference genomes to improve the contiguity of previously published draft genome assemblies of 16 Heliconius species. Using a reference-assisted scaffolding approach, we place and order the scaffolds of these genomes onto chromosomes, resulting in 95.7–99.9% of their genomes anchored to chromosomes. Genome sizes are somewhat variable among species (270–422 Mb) and in one small group of species (Heliconius hecale, Heliconius elevatus, and Heliconius pardalinus) expansions in genome size are driven mainly by repetitive sequences that map to four small regions in the H. melpomene reference genome. Genes from these repeat regions show an increase in exon copy number, an absence of internal stop codons, evidence of constraint on nonsynonymous changes, and increased expression, all of which suggest that at least some of the extra copies are functional. Finally, we conducted a systematic search for inversions and identified five moderately large inversions fixed between the two major Heliconius clades. We infer that one of these inversions was transferred by introgression between the lineages leading to the erato/sara and burneyi/doris clades. These reference-guided assemblies represent a major improvement in Heliconius genomic resources that enable further genetic and evolutionary discoveries in this genus.