Molecular Biology and Evolution (MBE) is pleased to announce that Masatoshi Nei recently published a memoir entitled “My Life as a Molecular Evolutionist” from the Institute of Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA. The purpose of this memoir is to discuss the origin and development of the discipline of molecular evolution from the point of view of a cofounder of MBE. Nei was born in Japan in 1931 and immigrated to the United States in 1969 to work on molecular evolution. Since then, most of his professional work has been done in the United States. In this memoir, he discusses how he grew up in Japan and how he came to work on molecular evolution, including some of the enjoyment and grief involved in conducting that work. He emphasizes the importance of working with collaborators and notes that many of his studies were carried out in such collaborations. The memoir is about 200 pages long and consists of 9 chapters and additional sections (see Box 1).
Abstract: Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about intrahost viral molecular epidemiology. Furthermore, existing analytical pipelines may analyze only the genomic regions involved in drug resistance and thus are not suited for full viral genome analysis. Here, we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to provide high-quality output properly formatted for downstream evolutionary analyses.
Abstract: Organisms face tradeoffs in performing multiple tasks. Identifying the optimal phenotypes that maximize organismal fitness (the Pareto front) and inferring the relevant tasks allows tests of phenotypic adaptation and helps delineate evolutionary constraints, tradeoffs, and critical fitness components, and is therefore of broad interest. It has been proposed that Pareto fronts can be identified from high-dimensional phenotypic data, including molecular phenotypes such as gene expression levels, by fitting polytopes (lines, triangles, tetrahedrons, and so on), and a program named ParTI was recently introduced for this purpose. ParTI has identified Pareto fronts and inferred the phenotypes best suited to individual tasks (archetypes) from numerous data sets, such as the beak morphologies of Darwin’s finches and mRNA concentrations in human tumors, implying evolutionary optimization of the traits involved. Nevertheless, the reliability of these findings is unknown. Using real and simulated data that lack evolutionary optimization, we here report extremely high false-positive rates for ParTI. The errors arise from the phylogenetic relationships or population structures of the organisms analyzed and from the flexibility of data analysis in ParTI, which is equivalent to p-hacking. Because these problems are virtually universal, our findings cast doubt on almost all ParTI-based results and suggest that reliably identifying Pareto fronts and archetypes from high-dimensional phenotypic data is currently difficult in general.
Abstract: Sturgeons and paddlefishes (Acipenseriformes) occupy a basal position among ray-finned fishes, although, like Chondrichthyes, they have cartilaginous skeletons. This evolutionary status and their morphological specializations make them a research focus, but their complex genomes (polyploidy and the presence of microchromosomes) pose obstacles to molecular studies. Here, we generated the first high-quality, chromosome-level genome assembly of the American paddlefish (Polyodon spathula). Comparative genomic analyses revealed a recent species-specific whole-genome duplication event and extensive chromosomal changes, including head-to-head fusions of pairs of intact, large ancestral chromosomes within the paddlefish. We also provide an overview of the paddlefish SCPP (secretory calcium-binding phosphoprotein) repertoire, which is responsible for tissue mineralization, demonstrating that the earliest flourishing of SCPP members occurred no later than the split between Acipenseriformes and teleosts. In summary, this genome assembly provides a genetic resource for understanding chromosomal evolution in polyploid nonteleost fishes and bone mineralization in early vertebrates.
Abstract: Human leukocyte antigen (HLA) genes are among the most polymorphic in our genome, likely as a consequence of balancing selection related to their central role in adaptive immunity. HLA-A and HLA-B genes were recently suggested to evolve under a model of joint divergent asymmetric selection that confers on all human populations, including those with severe loss of diversity, an equivalent immune potential. However, the mechanisms by which these two genes might undergo joint evolution while displaying very distinct allelic profiles in populations are still unknown. To address this issue, we carried out extensive data analyses (including factorial correspondence analysis and linear modeling) on 2,909 common and rare HLA-A, HLA-B, and HLA-C alleles and 200,000 simulated pathogenic peptides, taking into account sequence variation, predicted peptide-binding affinity, and HLA allele frequencies in 123 populations worldwide. Our results show that HLA-A and HLA-B (but not HLA-C) molecules maintain considerable functional divergence in almost all populations, which likely plays an instrumental role in their immune defense. We also provide robust evidence of functional complementarity between HLA-A and HLA-B molecules, which display asymmetric relationships in terms of amino acid diversity at both inter- and intraprotein levels and in terms of promiscuous or fastidious peptide-binding specificities. Like the two wings of a bird in flight, the functional complementarity of HLA-A and HLA-B is a perfect example, in our genome, of duplicated genes that share the capacity to assume common vital functions while being subjected to complex and sometimes distinct environmental pressures.
Abstract: Emerging bacterial pathogens threaten global health and food security, so it is important to ask whether these transitions to pathogenicity share any common features. We present a systematic study of the claim that pathogenicity is associated with genome reduction and gene loss. We combine a comparison of broad-scale patterns across all bacteria with detailed analyses of Streptococcus suis, an emerging zoonotic pathogen of pigs that has undergone multiple transitions between disease and carriage forms. We find that pathogenicity is consistently associated with reduced genome size across three scales of divergence (between species within genera, and between and within genetic clusters of S. suis). Although genome reduction is also found in mutualist and commensal bacterial endosymbionts, genome reduction in pathogens cannot be solely attributed to the ecological features they share with these species, that is, host restriction or intracellularity. Moreover, other typical correlates of genome reduction in endosymbionts (reduced metabolic capacity, reduced GC content, and the transient expansion of nonfunctional elements) are not consistently observed in pathogens. Together, our results indicate that genome reduction is a consistent correlate of pathogenicity in bacteria.
Abstract: The Red Queen hypothesis depicts evolution as a continual struggle to adapt. According to this hypothesis, new genes, especially those originating from nongenic sequences (i.e., de novo genes), are eliminated unless they evolve continually in adaptation to a changing environment. Here, we analyze two Drosophila de novo miRNAs that are expressed in a testis-specific manner and show very high rates of DNA sequence evolution. We knocked out these miRNAs in two sibling species and investigated their contributions to different fitness components. The fitness contribution of miR-975 appears positive in Drosophila simulans, in contrast to its neutral contribution in D. melanogaster, whereas miR-983 appears to contribute negatively in both species, as fitness increases in the knockout mutants. As predicted by the Red Queen hypothesis, the differing fitness contributions of these de novo miRNAs point to different evolutionary fates.
Abstract: Ultraconserved elements (UCEs) are stretches of hundreds of nucleotides with highly conserved cores flanked by variable regions. Although the selective forces responsible for the preservation of UCEs are unknown, UCEs are nonetheless believed to contain phylogenetically meaningful information about deep to shallow divergence events. Phylogenetic applications of UCEs assume that the same degree of rate heterogeneity applies across the entire locus, including the variable flanking regions. We present a Wright–Fisher model of selection on nucleotides (SelON) that includes the effects of mutation, drift, and spatially varying stabilizing selection for an optimal nucleotide sequence. The SelON model assumes that the strength of stabilizing selection follows a position-dependent Gaussian function whose exact shape can vary between UCEs. We evaluate SelON by comparing its performance with that of a simpler, spatially invariant GTR+Γ model, using an empirical data set of 400 vertebrate UCEs used to determine the phylogenetic position of turtles. We observe a substantial improvement in model fit for SelON over the GTR+Γ model, as well as support for turtles as sister to lepidosaurs. Overall, the UCE-specific parameters that SelON estimates provide a compact way of quantifying the strength and variation of selection within and across UCEs. SelON can also be extended to include more realistic mapping functions between sequence and stabilizing selection and to allow greater levels of rate heterogeneity. By modeling the nature of selection on UCEs more explicitly, SelON and similar approaches can be used to better understand the biological mechanisms responsible for their preservation across highly divergent taxa and long evolutionary timescales.
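The following minimal Python sketch illustrates how a position-dependent Gaussian selection strength could translate into spatially varying substitution rates. It is not SelON's implementation. The Gaussian parameters (`center`, `width`, `s_max`), the use of a single scaled selection coefficient S = 4Ns per site, and Kimura's diffusion approximation S / (1 - exp(-S)) for the fixation probability of a new mutant relative to a neutral one are all assumptions made for illustration.

```python
import math


def gaussian_selection_strength(position: float, center: float,
                                width: float, s_max: float) -> float:
    """Hypothetical scaled strength of stabilizing selection at a site.

    Peaks at `center` (the conserved core) and decays as a Gaussian toward
    the variable flanks; all parameter names are illustrative assumptions.
    """
    return s_max * math.exp(-((position - center) ** 2) / (2.0 * width ** 2))


def relative_fixation_rate(scaled_s: float) -> float:
    """Fixation probability of a new mutant relative to a neutral mutant.

    Diffusion approximation S / (1 - exp(-S)), with S assumed to be 4*N*s.
    Negative S means the mutation moves the site away from the optimal
    nucleotide.
    """
    if abs(scaled_s) < 1e-12:
        return 1.0  # neutral limit
    if scaled_s < 0:
        # Algebraically identical form that avoids overflow for strongly
        # deleterious mutations: a * exp(-a) / (1 - exp(-a)) with a = -S.
        a = -scaled_s
        return a * math.exp(-a) / -math.expm1(-a)
    return scaled_s / -math.expm1(-scaled_s)


if __name__ == "__main__":
    # Hypothetical 200-bp UCE whose conserved core is centred on position 100.
    for pos in (0, 50, 80, 100):
        s = gaussian_selection_strength(pos, center=100, width=20, s_max=20.0)
        print(pos, round(s, 4), relative_fixation_rate(-s))
```

With these made-up parameters, substitutions away from the optimum are almost completely suppressed at the core and are nearly neutral in the flanks. This position-structured rate variation is what a spatially invariant rate model does not tie to location within the locus.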
Abstract: Chromosome size and morphology vary within and among species, but little is known about the proximate or ultimate causes of these differences. Cichlid fish species in the tribe Oreochromini share an unusual giant chromosome that is ∼3 times longer than the other chromosomes. This giant chromosome functions as a sex chromosome in some of these species. We test two hypotheses of how this giant sex chromosome may have evolved. The first hypothesis proposes that it evolved by accumulating repetitive elements as recombination was reduced around a dominant sex determination locus, as suggested by canonical models of sex chromosome evolution. An alternative hypothesis is that the giant sex chromosome originated via the fusion of an autosome with a highly repetitive B chromosome, one of which carried a sex determination locus. We test these hypotheses using comparative analysis of chromosome-scale cichlid and teleost genomes. We find that the giant sex chromosome consists of three distinct regions based on patterns of recombination, gene and transposable element content, and synteny to the ancestral autosome. The WZ sex determination locus encompasses the last ∼105 Mb of the 134-Mb giant chromosome. The last 47 Mb of the giant chromosome shares no obvious homology to any ancestral chromosome. Comparisons across 69 teleost genomes reveal that the giant sex chromosome contains unparalleled amounts of endogenous retroviral elements, immunoglobulin genes, and long noncoding RNAs. The results favor the B chromosome fusion hypothesis for the origin of the giant chromosome.
Abstract: The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, whereas methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provides strong statistical support for placing the root on any particular edge of the tree. These results suggest that phylogenetic evidence alone is unlikely to identify the origin of the SARS-CoV-2 virus, and we caution against strong inferences regarding the early spread of the virus based solely on such evidence.
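One common clock-based rooting strategy is root-to-tip regression. For each candidate root, root-to-tip distance is regressed on sampling date, and the root giving the best linear fit with a positive slope is kept. The Python sketch below illustrates only that idea and is a simplification. It scores tree nodes rather than arbitrary points along edges (a degree-2 node such as the hypothetical `R` stands in for a point on an edge), uses the residual sum of squares as the criterion, and runs on a made-up four-tip tree. It does not reproduce the specific methods or uncertainty measures evaluated in the study.

```python
from typing import Dict, List, Optional, Tuple

# Unrooted tree as an adjacency list: node -> [(neighbour, branch length)].
Tree = Dict[str, List[Tuple[str, float]]]


def root_to_tip(tree: Tree, root: str) -> Dict[str, float]:
    """Path length from `root` to every leaf (degree-1 node)."""
    dist = {root: 0.0}
    stack = [root]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        for nbr, length in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                stack.append(nbr)
    return {n: d for n, d in dist.items() if len(tree[n]) == 1}


def ols_slope_rss(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit y = a + b*x; returns (slope b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, rss


def best_clock_root(tree: Tree, dates: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (node, rss) for the internal node whose root-to-tip distances
    fit the sampling dates best with a positive slope (a clock-like signal)."""
    best = None
    for node, nbrs in tree.items():
        if len(nbrs) < 2:
            continue  # leaves are not candidate roots in this sketch
        tips = root_to_tip(tree, node)
        leaves = sorted(tips)
        slope, rss = ols_slope_rss([dates[t] for t in leaves],
                                   [tips[t] for t in leaves])
        if slope > 0 and (best is None or rss < best[1]):
            best = (node, rss)
    return best


if __name__ == "__main__":
    # Made-up tree: R sits on the X-Y edge; tips sampled at times 1..4.
    tree: Tree = {
        "R": [("X", 0.5), ("Y", 1.0)],
        "X": [("R", 0.5), ("A", 0.5), ("B", 1.5)],
        "Y": [("R", 1.0), ("C", 2.0), ("D", 3.0)],
        "A": [("X", 0.5)], "B": [("X", 1.5)],
        "C": [("Y", 2.0)], "D": [("Y", 3.0)],
    }
    dates = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
    print(best_clock_root(tree, dates))  # ('R', 0.0)
```

Because this criterion rewards clock-like behaviour, deviations from a molecular clock directly change which candidate root scores best.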
Abstract: Knowledge of genetic structure is key to understanding species connectivity patterns and to defining the spatiotemporal scales over which conservation management plans should be designed and implemented. The distribution of genetic diversity (within and among populations) greatly influences a species’ ability to cope with and adapt to environmental changes, ultimately determining its long-term resilience to ecological disturbances. Yet the drivers shaping connectivity and structure in marine fish populations remain elusive, as do the effects of fishing activities on genetic subdivision. To investigate these questions, we conducted a meta-analysis and compiled genetic differentiation data (FST/ΦST estimates) for more than 170 fish species from over 200 published studies distributed globally. We modeled the effects of multiple life-history traits, distance metrics, and methodological factors on observed population differentiation indices and specifically tested whether any signal arising from different levels of exposure to fishing exploitation could be detected. Although the myriad variables shaping genetic structure make it challenging to isolate the influence of single drivers, results showed a significant correlation between commercial importance and genetic structure, with widespread lower population differentiation in commercially exploited species. Moreover, the models indicate that variables commonly used as proxies for connectivity, such as pelagic larval duration, might be insufficient, and suggest that deep-sea species may disperse farther. Overall, these results contribute to the growing body of knowledge on marine genetic connectivity and suggest a potential effect of commercial fisheries on the homogenization of genetic diversity, highlighting the need for additional research focused on dispersal ecology to ensure the long-term sustainability of exploited marine species.
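For a single biallelic locus, FST can be computed as (H_T - H_S) / H_T. Here H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The Python sketch below uses that textbook formulation with equally weighted subpopulations and no sample-size correction. The studies compiled in the meta-analysis used a variety of estimators (and ΦST for sequence data), so this illustrates the quantity rather than the authors' procedure.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    `freqs` holds the frequency of one allele in each subpopulation;
    subpopulations are weighted equally and sample-size corrections are ignored.
    """
    k = len(freqs)
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))  # identical populations: 0.0
    print(fst_biallelic([0.0, 1.0]))  # fixed for different alleles: 1.0
    print(fst_biallelic([0.2, 0.8]))  # intermediate differentiation
```

Lower values, such as those reported for commercially exploited species, indicate less genetic differentiation among populations.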
Abstract: Gene duplication is a major mechanism for creating new genes. After gene duplication, some duplicated genes undergo functionalization, whereas others largely maintain redundant functions. Duplicated genes in plants show various degrees of functional diversification. However, the evolutionary fate of high and low diversified duplicates is unclear at the genomic scale. To infer high and low diversified duplicates in the Arabidopsis thaliana genome, we developed a method for predicting whether a pair of duplicate genes was subjected to high or low diversification based on the phenotypes of knock-out mutants. Among 4,017 pairs of recently duplicated A. thaliana genes, 1,052 and 600 are high and low diversified duplicate pairs, respectively. The predictions were validated using the phenotypes of newly generated knock-down transgenic plants. We determined that the high diversified duplicates resulting from tandem duplications tend to have lineage-specific functions, whereas the low diversified duplicates produced by whole-genome duplications are related to essential signaling pathways. To assess the evolutionary impact of high and low diversified duplicates, we compared the retention rates and selection pressures on the orthologs of A. thaliana duplicates in two closely related species. Interestingly, high diversified duplicates resulting from tandem duplications tend to be retained in multiple lineages under positive selection, whereas low diversified duplicates produced by whole-genome duplications tend to be retained in multiple lineages under purifying selection. Taken together, the functional diversification associated with different duplication mechanisms has had distinct effects on plant evolution.
Abstract: The rise and expansion of the Tibetan Empire in the 7th to 9th centuries AD affected the course of history across East Eurasia, but the genetic impact of Tibetans on surrounding populations remains undefined. We sequenced 60 genomes from four populations in Pakistan and Tajikistan to explore their demographic history. We showed that the genomes of Balti people from Baltistan comprised 22.6–26% Tibetan ancestry. We inferred a single admixture event and dated it to about 39–21 generations ago, a period that postdated the conquest of Baltistan by the ancient Tibetan Empire. Analyses of mitochondrial DNA, Y-chromosome, and X-chromosome data indicated that both ancient Tibetan males and females were involved in this dispersal, which was male-biased. Given that the Balti people historically adopted Tibetan language and culture, our study suggests that the impact of the Tibetan Empire on Baltistan involved dominant cultural and minor demic diffusion.
Abstract: Parasites are a major evolutionary force, driving adaptive responses in host populations. Although the link between phenotypic response to parasite-mediated natural selection and the underlying genetic architecture often remains obscure, this link is crucial for understanding the evolution of resistance and predicting associated allele frequency changes in the population. To close this gap, we monitored the response to selection during epidemics of a virulent bacterial pathogen, Pasteuria ramosa, in a natural host population of Daphnia magna. Across two epidemics, we observed a strong increase in the proportion of resistant phenotypes as the epidemics progressed. Field and laboratory experiments confirmed that this increase in resistance was caused by selection from the local parasite. Using a genome-wide association study, we built a genetic model in which two genomic regions with dominance and epistasis control resistance polymorphism in the host. We verified this model by selfing host genotypes with different resistance phenotypes and scoring their F1 for segregation of resistance and associated genetic markers. Such epistatic effects with strong fitness consequences in host–parasite coevolution are believed to be crucial in the Red Queen model for the evolution of genetic recombination.
Abstract: Hybrids between species often show extreme phenotypes, including at the molecular level. In this study, we investigated the phenotypes of an interspecies diploid hybrid in terms of protein–protein interactions inferred from protein correlation profiling. We used two yeast species, Saccharomyces cerevisiae and Saccharomyces uvarum, which are interfertile yet have proteins diverged enough to be distinguished by mass spectrometry. Most protein–protein interactions are similar between the hybrid and its parents and are consistent with the assembly of chimeric complexes, which we validated using an orthogonal approach for the prefoldin complex. We also identified instances of altered protein–protein interactions in the hybrid, for example, in complexes related to proteostasis and in mitochondrial protein complexes. Overall, this study shows that chimeric protein complexes likely form frequently, with few exceptions; these exceptions may result from incompatibilities or imbalances between the parental proteomes.
Abstract: It was long thought that only three different transposable elements (TEs) invaded natural Drosophila melanogaster populations within the last century: the I-element, the P-element, and hobo. By sequencing the “living fossils” of Drosophila research, that is, D. melanogaster strains sampled from natural populations at different time points, we show that a fourth TE, Tirant, invaded D. melanogaster populations during the past century. Tirant likely spread in D. melanogaster populations around 1938, followed by the I-element, hobo, and, lastly, the P-element. In addition to the recent insertions of the canonical Tirant, D. melanogaster strains harbor degraded Tirant sequences in the heterochromatin, which likely stem from an ancient invasion that probably predates the split of D. melanogaster and D. simulans. These degraded insertions produce distinct piRNAs that were unable to prevent the novel Tirant invasion. In contrast to the I-element, P-element, and hobo, we found no evidence that Tirant induces hybrid dysgenesis symptoms. This absence of apparent phenotypic effects may explain the late discovery of the Tirant invasion. Recent Tirant insertions were found in all investigated natural populations. Populations from Tasmania carry distinct Tirant sequences, likely due to a founder effect. By investigating the TE composition of natural populations and strains sampled at different time points, insertion site polymorphisms, piRNAs, and phenotypic effects, we provide a comprehensive study of a natural TE invasion.
Abstract: Although gene duplications provide genetic backup and allow genomic changes under relaxed selection, they may limit gene flow. When different copies of a duplicated gene are pseudofunctionalized in different genotypes, genetic incompatibilities can arise in their hybrid offspring. Although such cases have been reported after manual crosses, it remains unclear whether they occur in nature and how they affect natural populations. Here, we identified four duplicated-gene-based incompatibilities, including one not previously reported, within an artificial Arabidopsis intercross population. Unexpectedly, however, for each of the genetic incompatibilities we also identified the incompatible alleles in natural populations based on the genomes of 1,135 Arabidopsis accessions published by the 1001 Genomes Project. Using the presence of incompatible allele combinations as phenotypes for GWAS, we mapped genomic regions that included additional gene copies that likely rescue the genetic incompatibility. Reconstructing the geographic origins and evolutionary trajectories of the individual alleles suggested that incompatible alleles frequently coexist, even in geographically close regions, and that their effects can be overcome by additional gene copies, which collectively shape the evolutionary dynamics of duplicated genes during population history.
Abstract: Genomic variation in the model plant Arabidopsis thaliana has been extensively used to understand evolutionary processes in natural populations, mainly focusing on single-nucleotide polymorphisms. By contrast, structural variation has been largely ignored despite its potential to dramatically affect phenotype. Here, we identify 155,440 indels and structural variants ranging in size from 1 bp to 10 kb, including presence/absence variants (PAVs), inversions, and tandem duplications, in 1,301 A. thaliana natural accessions from Morocco, Madeira, Europe, Asia, and North America. We show evidence for strong purifying selection on PAVs in genes, in particular for housekeeping genes and homeobox genes, and we find that PAVs are concentrated in defense-related genes (R-genes, secondary metabolism) and F-box genes. This implies the presence of a “core” genome underlying basic cellular processes and a “flexible” genome that includes genes that may be important under spatially or temporally varying selection. Further, we find an excess of intermediate-frequency PAVs in defense response genes in nearly all populations studied, consistent with a history of balancing selection on this class of genes. Finally, we find that PAVs in genes involved in the cold requirement for flowering (vernalization) and drought response are strongly associated with temperature at the sites of origin.
Abstract: Integration of a conjugative plasmid into a bacterial chromosome can promote the transfer of chromosomal DNA to other bacteria. Intraspecies chromosomal conjugation is believed to be responsible for creating the global pathogens Klebsiella pneumoniae ST258 and Escherichia coli ST1193. Interspecies conjugation is also possible, but little is known about the genetic architecture or fitness of such hybrids. To study this, we used conjugation to generate 14 hybrids of E. coli and Salmonella enterica. These species belong to different genera, diverged from a common ancestor >100 Ma, and share a conserved order of orthologous genes with ∼15% nucleotide divergence. Genomic analysis revealed that all but one hybrid had acquired a contiguous segment of donor E. coli DNA, ranging in size from ∼100 to >4,000 kb, that replaced a homologous region of the recipient Salmonella chromosome. Recombination joints occurred in sequences with higher-than-average nucleotide identity. Most hybrid strains suffered a large reduction in growth rate, but the magnitude of this cost did not correlate with the length of foreign DNA. Compensatory evolution to ameliorate the cost in low-fitness hybrids pointed toward disruption of complex genetic networks as a cause. Most interestingly, 4 of the 14 hybrids, in which 45% to 90% of the Salmonella chromosome was replaced with E. coli DNA, showed no significant reduction in growth fitness. These data suggest that the barriers to creating high-fitness interspecies hybrids may be significantly lower than generally appreciated, with implications for the creation of novel species.
Abstract: Odorant receptors (ORs) are essential for plant–insect interactions. However, despite the global impact of Lepidoptera (moths and butterflies) as major herbivores and pollinators, few functional data are available on lepidopteran ORs involved in plant-volatile detection. Here, we first characterized the plant-volatile-sensing function(s) of 44 ORs from the cotton bollworm Helicoverpa armigera and then conducted a large-scale comparative analysis showing that most orthologous ORs have functionally diverged among closely related species, whereas a few rare ORs are functionally conserved. Specifically, our systematic analysis of H. armigera ORs cataloged the wide functional scope of the H. armigera OR repertoire and also showed that HarmOR42 and its Spodoptera littoralis ortholog are functionally conserved. Pursuing this, we characterized the HarmOR42-orthologous ORs from 11 species across the Glossata suborder and confirmed that the HarmOR42 orthologs form a unique OR lineage that has undergone strong purifying selection in Glossata species and whose members are tuned with strong specificity to phenylacetaldehyde, a floral scent component common to most angiosperms. In vivo studies with HarmOR42 knockouts support the conclusion that HarmOR42-related ORs are essential for host detection by sensing phenylacetaldehyde. Our work also supports the view that these ORs coevolved with the tube-like proboscis and have maintained functional stability throughout the long-term coexistence of Lepidoptera with angiosperms. Thus, beyond providing a rich empirical resource for delineating the precise functions of H. armigera ORs, our results enable a comparative analysis of insect ORs that have apparently facilitated, and currently sustain, the intimate adaptations and ecological interactions between nectar-feeding insects and flowering plants.
Abstract: In studies of hominin adaptations to fire use, the role of the aryl hydrocarbon receptor (AHR) in the evolution of detoxification has been highlighted, including claims that the modern human AHR confers a significantly better capacity to deal with toxic smoke components than the Neanderthal AHR. To evaluate this, we compared the AHR-controlled induction of cytochrome P4501A1 (CYP1A1) mRNA in HeLa human cervix epithelial adenocarcinoma cells transfected with an Altai-Neanderthal or a modern human reference AHR expression construct and exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Our comparison used the complete AHR mRNA sequences, including the untranslated regions (UTRs), with the original codon usage retained. We observe no significant difference in CYP1A1 induction by TCDD between Neanderthal and modern human AHR, whereas a 150- to 1,000-fold difference was previously reported in a study of the AHR coding region optimized for mammalian codon usage and expressed in rat cells. Our study illustrates that expression in a homologous cellular background is of major importance for determining (ancient) protein activity. The Neanderthal and modern human dose–response curves almost coincide, except for a slightly higher extrapolated maximum for the Neanderthal AHR, possibly caused by a 5′-UTR G-variant known from modern humans (rs7796976). Our results are strongly at odds with a major role of the modern human AHR in the evolution of hominin detoxification of smoke components and are consistent with our previous study based on 18 relevant genes in addition to AHR, which concluded that efficient detoxification alleles are more prevalent in ancient hominins, chimpanzees, and gorillas than in modern humans.
Abstract: Mutations play a key role in the development of disease in an individual and the evolution of traits within species. Recent work in humans and other primates has clarified the origins and patterns of single-nucleotide variants, showing that most arise in the father’s germline during spermatogenesis. It remains unknown whether larger mutations, such as deletions and duplications of hundreds or thousands of nucleotides, follow similar patterns. Such mutations lead to copy-number variation (CNV) within and between species and can have profound effects by deleting or duplicating genes. Here, we analyze patterns of CNV mutations in 32 rhesus macaque individuals from 14 parent–offspring trios. We find that the rate of CNV mutations per generation is low (less than one per genome), and we observe no correlation between parental age and the number of CNVs passed on to offspring. We also examine segregating CNVs within the rhesus macaque sample and compare them with a similar data set from humans, finding that both species have far more segregating deletions than duplications. We contrast this with long-term patterns of gene copy-number evolution among 17 mammals, where the proportion of deletions that become fixed along the macaque lineage is much smaller than the proportion of segregating deletions. These results suggest that purifying selection acts on deletions, such that the majority of them are removed from the population over time. Rhesus macaques are an important biomedical model organism, so these results will aid our understanding of this species and the disease models it supports.
Abstract: As actors in the global carbon cycle, Agaricomycetes (Basidiomycota) have developed complex enzymatic machineries that allow them to decompose all plant polymers, including lignin. Among them, saprotrophic Agaricales are characterized by an unparalleled diversity of habitats and lifestyles. Comparative analysis of 52 Agaricomycetes genomes (14 of them sequenced de novo) reveals that Agaricales possess a large diversity of hydrolytic and oxidative enzymes for lignocellulose decay. Based on the gene families with the highest predicted evolutionary rates (namely cellulose-binding CBM1, glycoside hydrolase GH43, lytic polysaccharide monooxygenase AA9, class-II peroxidases, glucose–methanol–choline oxidase/dehydrogenases, laccases, and unspecific peroxygenases), we reconstructed the lifestyles of the ancestors that led to the extant lignocellulose-decomposing Agaricomycetes. The changes in the enzymatic toolkit of ancestral Agaricales are correlated with the evolution of their ability to grow not only on wood but also on leaf litter and decayed wood, with grass-litter decomposers as the most recent eco-physiological group. In this context, the above families were analyzed in detail in connection with lifestyle diversity. Peroxidases emerge as a central component of the enzymatic toolkit of saprotrophic Agaricomycetes, consistent with their essential role in lignin degradation and their high evolutionary rates. This includes not only expansions and losses of peroxidase genes common to other basidiomycetes but also the widespread presence in Agaricales (and Russulales) of new peroxidase types not found in wood-rotting Polyporales or other Agaricomycetes orders. We therefore analyzed peroxidase evolution in Agaricomycetes by ancestral-sequence reconstruction, revealing several major evolutionary pathways, and mapped the appearance of the different enzyme types on a time-calibrated species tree.
Abstract
Eusociality is a highly conspicuous and ecologically impactful behavioral syndrome that has evolved independently across multiple animal lineages. So far, comparative genomic analyses of advanced sociality have been mostly limited to insects. Here, we study the only clade of animals known to exhibit eusociality in the marine realm: lineages of socially diverse snapping shrimps in the genus Synalpheus. To investigate the molecular impact of sociality, we assembled the mitochondrial genomes of eight Synalpheus species that represent three independent origins of eusociality and analyzed patterns of molecular evolution in protein-coding genes. Synonymous substitution rates are lower and potential signals of relaxed purifying selection are higher in eusocial relative to noneusocial taxa. Our results suggest that mitochondrial genome evolution was shaped by eusociality-linked traits, namely the extended generation times and reduced effective population sizes that are hallmarks of advanced animal societies. This is the first direct evidence of eusociality impacting genome evolution in marine taxa. Our results also strongly support the idea that eusociality can shape genome evolution through profound changes in life history and demography.
Abstract
We developed dbCNS (http://yamasati.nig.ac.jp/dbcns), a new database for conserved noncoding sequences (CNSs). CNSs exist in many eukaryotes and are thought to be involved in controlling protein expression. Version 1 of dbCNS, introduced here, includes a powerful and precise CNS identification pipeline for multiple vertebrate genomes. Mutations in CNSs may induce morphological changes and cause genetic diseases. For this reason, many vertebrate CNSs have been identified, with special reference to primate genomes. We integrated ∼6.9 million CNSs from many vertebrate genomes into dbCNS, which allows users to extract CNSs near genes of interest using keyword searches. In addition to CNSs, dbCNS contains published genome sequences of 161 species. With purposeful taxonomic sampling of genomes, users can employ CNSs as queries to reconstruct CNS alignments and phylogenetic trees, to evaluate CNS modifications, acquisitions, and losses, and to roughly identify species whose CNSs have accelerated substitution rates. dbCNS also provides links to dbSNP for searching pathogenic single-nucleotide polymorphisms in human CNSs. Thus, dbCNS helps connect morphological changes with genetic diseases. A test analysis using 38 gnathostome genomes was completed within 30 s. dbCNS results can also be used to evaluate CNSs identified by other stand-alone programs using genome-scale data.
Abstract
Malaria has been one of the strongest selective pressures on our species. Many of the best-characterized cases of adaptive evolution in humans are in genes tied to malaria resistance. However, the complex evolutionary patterns at these genes are poorly captured by standard scans for nonneutral evolution. Here, we present three new statistical tests for selection based on population genetic patterns that are observed more than once among key malaria resistance loci. We assess these tests using forward-time evolutionary simulations, showing that they are effective at distinguishing selection from neutrality, and apply them to global whole-genome sequencing data from humans. Each test captures a distinct evolutionary pattern, here called Divergent Haplotypes, Repeated Shifts, and Arrested Sweeps, associated with a particular period of human prehistory. We clarify the selective signatures at known malaria-relevant genes and identify additional genes showing similar adaptive evolutionary patterns. Among our top outliers, we see a particular enrichment for genes involved in erythropoiesis and for genes previously associated with malaria resistance, consistent with a major role for malaria in shaping these patterns of genetic diversity. Polymorphisms at these genes are likely to affect resistance to malaria infection and to contribute to ongoing host–parasite coevolutionary dynamics.
Abstract
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of trees and other model parameters. These approximations are reliable only if the Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, they have focused on simulated data sets or select empirical examples. Therefore, much of what is considered common knowledge about MCMC in empirical systems derives from a relatively small set of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns in these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures of convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and the average branch lengths in an analysis have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the use of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the use of model-averaged substitution models can both improve convergence in some cases, but neither is a panacea.
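The average standard deviation of split frequencies (ASDSF) described above can be sketched as follows. For each split (bipartition) sampled by any chain, take the standard deviation of its frequency across the independent chains, then average over splits. This is a minimal illustration and not the code used in the study or in any particular phylogenetics program. The input format (one dict per chain mapping a split label to its sampled frequency), the use of the population standard deviation, and the 0.1 cutoff for ignoring rare splits are all assumptions.

```python
from statistics import mean, pstdev


def asdsf(chain_split_freqs, min_freq=0.1):
    """Average standard deviation of split frequencies across independent chains.

    chain_split_freqs: one dict per chain, mapping a split label (any hashable,
        e.g. a frozenset of taxa on one side of a bipartition) to the fraction
        of that chain's sampled trees containing the split.
    min_freq: assumed cutoff; splits below it in every chain are ignored.
    """
    # Union of all splits seen in any chain (iterating a dict yields its keys).
    all_splits = set().union(*chain_split_freqs)
    sds = []
    for split in all_splits:
        # A split absent from a chain was sampled with frequency 0 in that chain.
        freqs = [chain.get(split, 0.0) for chain in chain_split_freqs]
        if max(freqs) < min_freq:
            continue
        sds.append(pstdev(freqs))
    return mean(sds) if sds else 0.0


# Hypothetical split frequencies from two chains.
chain_a = {"s1": 0.9, "s2": 0.5, "s3": 0.05}
chain_b = {"s1": 0.7, "s2": 0.5}
print(round(asdsf([chain_a, chain_b]), 3))  # 0.05
```

Values near zero mean the chains sample similar topologies. Larger values flag the kind of topological disagreement between chains that, according to the abstract, this diagnostic was most likely to detect.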
Abstract
Chitinases enzymatically hydrolyze chitin, a highly abundant and widely utilized polymer of N-acetyl-glucosamine. Fungi are a rich source of chitinases; however, the phylogenetic and functional diversity of fungal chitinases is not well understood. We surveyed fungal chitinases from 373 publicly available genomes, characterized their domain architecture, and conducted phylogenetic analyses of the glycoside hydrolase (GH18) domain. This large-scale analysis does not support the previous division of fungal chitinases into three major clades (A, B, C), as chitinases previously assigned to the “C” clade are not resolved as distinct from the “A” clade. Fungal chitinase diversity was partly shaped by horizontal gene transfer, and at least one clade of bacterial origin occurs among chitinases previously assigned to the “B” clade. Furthermore, chitin-binding domains (including the LysM domain) do not define specific clades but are instead found broadly across chitinase clades. To gain insight into the diversity of their biological functions, we characterized all eight chitinases (Cts) from the thermally dimorphic fungus Histoplasma capsulatum: six A-clade chitinases, one B-clade chitinase, and one chitinase formerly classified in the C clade. Expression analyses showed variable induction of chitinase genes in the presence of chitin but preferential expression of CTS3 in the mycelial stage. Activity assays demonstrated that Cts1 (B-I), Cts2 (A-V), Cts3 (A-V), and Cts4 (A-V) have endochitinase activities with varying degrees of chitobiosidase function. Cts6 (C-I) has activity consistent with N-acetyl-glucosaminidase exochitinase function, and Cts8 (A-II) has chitobiase activity. These results suggest that chitinase activity is variable even within subclades and that predicting function requires more sophisticated models.
Abstract
Microbiota can protect their hosts from infection. The short timescales over which microbes can evolve raise the possibility that “protective microbes” can take over from the immune system of longer-lived hosts in the coevolutionary race against pathogens. Here, we found that coevolution between a protective bacterium (Enterococcus faecalis) and a virulent pathogen (Staphylococcus aureus) within an animal population (Caenorhabditis elegans) resulted in more disease suppression than when the protective bacterium adapted to uninfected hosts. At the same time, more protective E. faecalis populations became costlier to harbor and altered the expression of 134 host genes. Many of these genes appear to be related to the mechanism of protection, reactive oxygen species production. Crucially, more protective E. faecalis populations downregulated a key immune gene, , known to be effective against S. aureus infection. These results suggest that a microbial line of defense is favored by microbial coevolution and may cause hosts to plastically divest themselves of their own immunity.
Abstract
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the “ortholog conjecture,” the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis of the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases due to the inclusion of trees with no duplication nodes, the relative ages of speciations and duplications, systematic differences in branch lengths, and non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic methods on empirical gene trees with duplications can be problematic. Controlling for these biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
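For readers unfamiliar with the tissue-specificity index τ, the widely used definition normalizes a gene's expression in each of n tissues by its maximum across tissues and averages the shortfall: τ = Σᵢ (1 − xᵢ / max x) / (n − 1). The sketch below assumes that definition and uses hypothetical inputs. The abstract does not say how expression values were preprocessed (log transformation is common), so that step is left to the caller.

```python
def tau(expression):
    """Tissue-specificity index tau for one gene.

    expression: non-negative expression values, one per tissue.
    Returns 0.0 for perfectly uniform expression and 1.0 when the gene is
    expressed in exactly one tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("tau is undefined for a gene with no expression")
    # Sum of (1 - x_i / max), normalized by n - 1 so the result lies in [0, 1].
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tau([5.0, 0.0, 0.0, 0.0]))  # 1.0
print(tau([3.0, 3.0, 3.0]))       # 0.0
print(tau([4.0, 2.0]))            # 0.5
```

Because τ is computed per gene, comparisons between orthologs and paralogs work with differences in τ between gene pairs or along tree branches. Those comparisons are where the pairwise and phylogenetic approaches in the abstract diverge.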
Abstract
As species struggle to keep pace with the rapidly warming climate, adaptive introgression of beneficial alleles from closely related species or populations provides a possible avenue for rapid adaptation. We investigate the potential for adaptive introgression in the copepod Tigriopus californicus by hybridizing two populations with divergent heat tolerance limits. We subjected hybrids to strong heat selection for 15 generations, followed by whole-genome resequencing. This hybridize, evolve, and resequence (HER) approach allows us to identify loci responding to heat selection through changes in allele frequency. We successfully increased heat tolerance (measured as LT50) in the selected lines, and this increase was coupled with higher frequencies of alleles from the southern (heat-tolerant) population. These repeatable changes in allele frequencies occurred on all 12 chromosomes across all independent selected lines, providing evidence that heat tolerance is polygenic. These loci contained genes with lower protein-coding sequence divergence than the genome-wide average, indicating that they are highly conserved between the two populations. In addition, these loci were enriched in genes whose expression in response to a nonlethal heat shock differed between selected and control lines. We therefore hypothesize that heat tolerance divergence is explained by differential expression of highly conserved genes. The HER approach offers a unique solution for identifying genetic variants that contribute to polygenic traits, especially variants that might be missed by other population genomic approaches.
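The HER logic above predicts that heat-selected lines will repeatedly shift toward alleles from the southern population relative to controls. That idea can be illustrated with a deliberately simplified screen. This sketch is not the statistical test used in the study, which the abstract does not name. The data layout, locus IDs, and the 0.1 threshold are hypothetical, and a real analysis would also need to account for drift and sampling noise.

```python
def consistent_increases(selected, control, min_shift=0.1):
    """Flag loci where the southern-allele frequency rose in every selected line.

    selected, control: dicts mapping a locus ID to a list of southern-allele
        frequencies, one per replicate line, in the same line order.
    min_shift: hypothetical minimum increase required in each line.
    Returns {locus: mean increase across lines} for loci that pass.
    """
    flagged = {}
    for locus, sel_freqs in selected.items():
        ctrl_freqs = control[locus]
        if len(sel_freqs) != len(ctrl_freqs):
            raise ValueError(f"replicate count mismatch at {locus}")
        shifts = [s - c for s, c in zip(sel_freqs, ctrl_freqs)]
        # Require a repeatable response: every replicate shifts the same way.
        if shifts and all(d >= min_shift for d in shifts):
            flagged[locus] = sum(shifts) / len(shifts)
    return flagged


# Hypothetical frequencies for two loci across three replicate lines.
selected = {"chr1:1000": [0.80, 0.75, 0.90], "chr2:5000": [0.60, 0.40, 0.70]}
control = {"chr1:1000": [0.50, 0.50, 0.50], "chr2:5000": [0.50, 0.50, 0.50]}
print(sorted(consistent_increases(selected, control)))  # ['chr1:1000']
```

Requiring the shift in every replicate line, rather than in the average, mirrors the abstract's emphasis on changes that were repeatable across all independent selected lines.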
Abstract
Rapid adaptation to novel environments may drive changes in genomic regions through natural selection. However, the genetic architecture underlying these adaptive changes is still poorly understood. Using population genomic approaches, we investigated the genomic architecture underlying rapid parallel adaptation of Coilia nasus to fresh water by comparing four freshwater-resident populations with their ancestral anadromous population. Linkage disequilibrium network analysis and population genetic analyses revealed two putative large chromosomal inversions on LG6 and LG22, which were enriched for outlier loci and showed parallel association with freshwater adaptation. Drastic frequency shifts and elevated genetic differentiation were observed for the two inversions among populations, suggesting that both inversions have undergone divergent selection between anadromous and resident ecotypes. Genes within the inversions were significantly enriched for functions in metabolic processes, immunoregulation, growth, maturation, osmoregulation, and other processes, which probably underlie differences in morphology, physiology, and behavior between the anadromous and freshwater-resident forms. The availability of beneficial standing genetic variation, a large shift in the optimum between marine and freshwater habitats, and highly efficient selection in large populations could have led to the observed rapid parallel adaptive genomic change. We propose that chromosomal inversions might have played an important role in the rapid parallel ecological divergence of C. nasus in the face of environmental heterogeneity. Our study provides insights into the genomic basis of rapid adaptation of complex traits in novel habitats and highlights the importance of structural genomic variants in analyses of ecological adaptation.
Abstract
Novel coronaviruses, including SARS-CoV-2, SARS-CoV, and MERS-CoV, often originate from recombination events. The mechanism of recombination in RNA viruses is template switching. Coronavirus transcription also involves template switching at specific regions called transcriptional regulatory sequences (TRSs). It has been hypothesized, but not yet verified, that TRS sites are prone to recombination events. Here, we developed a tool called SuPER to systematically identify TRSs in coronavirus genomes and then investigated whether recombination is more common at TRSs. We ran SuPER on 506 coronavirus genomes and identified 465 leader TRSs (TRS-L) and 3,509 body TRSs (TRS-B). We found that the TRS-L core sequence (CS) and the secondary structure of the leader sequence are generally conserved within coronavirus genera but differ between genera. By examining the locations of recombination breakpoints relative to TRS-B CSs, we observed that recombination hotspots are colocated with TRS-B sites more frequently than expected.
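The abstract does not describe SuPER's algorithm. The sketch below is a hedged illustration of the two underlying steps: locating candidate TRS core sequences in a genome, and asking how often recombination breakpoints fall near them. It uses a naive mismatch-tolerant scan and a simple proximity count. The function names, the window size, and the toy sequence are hypothetical. ACGAAC appears only as an example of a coronavirus core sequence, since it is the one commonly reported for SARS-CoV-2. Testing whether colocation exceeds expectation would also require a null model, such as randomly placed breakpoints.

```python
def find_core_sequence(genome, core, max_mismatches=0):
    """Return 0-based start positions where `core` matches `genome` with at most
    `max_mismatches` substitutions; overlapping hits are reported."""
    genome, core = genome.upper(), core.upper()
    k = len(core)
    hits = []
    for start in range(len(genome) - k + 1):
        mismatches = sum(a != b for a, b in zip(genome[start:start + k], core))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def fraction_near(breakpoints, sites, window):
    """Fraction of breakpoint positions lying within `window` nt of any site."""
    if not breakpoints:
        return 0.0
    near = sum(any(abs(b - s) <= window for s in sites) for b in breakpoints)
    return near / len(breakpoints)


toy_genome = "ttACGAACgggACGAACtt"  # hypothetical sequence, not a real genome
sites = find_core_sequence(toy_genome, "ACGAAC")
print(sites)                               # [2, 11]
print(fraction_near([5, 100], sites, 10))  # 0.5
```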
Abstract
Avian genomes are small and lack some genes that are conserved in most other vertebrate genomes, including those of nonavian sauropsids. One hypothesis holds that paralogs may provide biochemical or physiological compensation for certain gene losses; however, no functional evidence has been reported to date. By integrating evolutionary analysis, physiological genomics, and experimental gene interference, we demonstrate functional compensation for gene loss. A large-scale phylogenetic analysis of over 1,400 SLC2 gene sequences identifies six new SLC2 genes from nonmammalian vertebrates and divides the SLC2 gene family into four classes. Vertebrates retain class III SLC2 genes but partially lack the more recent duplicates of classes I and II. Birds appear to have completely lost the SLC2A4 gene, which encodes an important insulin-sensitive GLUT in mammals. We found strong evidence that the N-termini of SLC2A4 and SLC2A12 have undergone diversifying (positive) selection in birds and mammals, and there is a significant correlation between SLC2A12 functionality and basal metabolic rate in endotherms. Physiological genomics revealed that SLC2A12 expression and allelic variants are associated with insulin sensitivity and blood glucose levels in wild birds. Functional tests indicated that SLC2A12 abrogation causes hyperglycemia, insulin resistance, and high relative activity, increasing energy expenditure in a way that resembles a diabetic phenotype. These analyses suggest that the SLC2A12 gene not only functionally compensates for the loss of SLC2A4 in insulin response but also affects daily physical behavior and basal metabolic rate during bird evolution, highlighting that older genes retain a higher level of functional diversification.
Abstract
The Neisseria gonorrhoeae multilocus sequence type (ST) 1901 is among the lineages most commonly associated with treatment failure. Here, we analyze a global collection of ST-1901 genomes to shed light on the emergence and spread of alleles associated with reduced susceptibility to extended-spectrum cephalosporins (ESCs).
The genetic diversity of ST-1901 falls into a minor and a major clade, both of which were inferred to have originated in East Asia. The major clade dispersed from Asia in two separate waves, expanding from ∼1987 and ∼1996, respectively. Both waves first reached North America and from there spread to Europe and Oceania, with multiple secondary reintroductions to Asia.
The ancestor of the second wave acquired the penA 34.001 allele, which significantly reduces susceptibility to ESCs. Our results suggest that acquisition of this allele gave the second wave a fitness advantage at a time when ESCs became the key drug class used to treat gonorrhea. Following its global establishment, the lineage has served as a reservoir for the repeated emergence of clones fully resistant to the ESC ceftriaxone, an essential drug for effective treatment of gonorrhea.
We infer that the effective population sizes of both clades declined as treatment schemes in Europe and the United States shifted from fluoroquinolones, via ESC monotherapy, to dual therapy with ceftriaxone and azithromycin. Despite this inferred recent decline in population size, the short evolutionary path from the penA 34.001 allele to alleles conferring full ceftriaxone resistance is cause for concern.
Abstract
Plant phenotypic plasticity describes the altered phenotypic performance of an individual grown in different environments. Exploring the genetic architecture underlying variation in plant plasticity may help mitigate the detrimental effects of a rapidly changing climate on agriculture, but little research has been done in this area to date. In the present study, we established a population of 976 maize F1 hybrids by crossing 488 diverse inbred lines with two elite testers. A genome-wide association study identified hundreds of quantitative trait loci associated with variation in phenotypic plasticity across diverse F1 hybrids, most of which explained very little variance, in accordance with the polygenic nature of these traits. We identified several quantitative trait locus regions that may have been selected during tropical–temperate adaptation. In addition to the traditional differences in genetic value measured between hybrid and inbred lines, we also observed heterosis in phenotypic plasticity, with a pattern that was affected by genetic background. Our results describe a landscape of phenotypic plasticity in maize that will aid in understanding its genetic architecture, its contribution to adaptation and heterosis, and how it may be exploited for future maize breeding in a rapidly changing environment.
Abstract
Since the start of the COVID-19 pandemic, an unprecedented number of SARS-CoV-2 genomic sequences have been generated and shared with the scientific community. This unparalleled volume of genetic data presents a unique opportunity to gain real-time insights into virus transmission during the pandemic, but also a daunting computational hurdle if analyzed with gold-standard phylogeographic approaches. To tackle this practical limitation, we describe and apply a rapid analytical pipeline to analyze the spatiotemporal dispersal history and dynamics of SARS-CoV-2 lineages. As a proof of concept, we focus on the Belgian epidemic, which has had one of the highest spatial densities of available SARS-CoV-2 genomes. Our pipeline could be quickly applied to other countries or regions, with key benefits in complementing epidemiological analyses that assess the impact of intervention measures or their progressive easing.
Genome Biol. Evol. 13(2); doi:10.1093/gbe/evaa259
In the descriptions of the species Myxococcus llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogochensis sp. nov., Myxococcus vastator sp. nov., and Pyxidicoccus caerfyrddinensis sp. nov., the acronyms of the culture collections in which the type strains were deposited contained typographic errors. The corrected species descriptions are provided below:
Abstract
Research on the genetics of domestication most often focuses on protein-coding exons. However, exons cover only a minor part (1–2%) of the canine genome, and functional mutations may also lie beyond the exome, in regulatory regions. Therefore, a large proportion of the phenotypic differences between dogs and wolves may remain genetically unexplained. In this study, we identified variants with high allele frequency differences (i.e., highly differentiated variants) between wolves and dogs across the canine genome and investigated their potential functionality. We found that the enrichment of highly differentiated variants was substantially higher in promoters than in exons and that such variants were also enriched in enhancers. Several enriched pathways were identified, including oxytocin signaling, carbohydrate digestion and absorption, cancer risk, and facial and body features, many of which reflect phenotypes of potential importance during domestication, including phenotypes of the domestication syndrome. The results highlight the importance of regulatory mutations during dog domestication and motivate the functional annotation of the noncoding part of the canine genome.
Birds have been shaped by evolution in many ways that have made them distinct from their vertebrate cousins. Over millions of years of evolution, our feathered friends have taken to the skies, accompanied by unique changes to their skeleton, musculature, respiration, and even reproductive systems. Recent genomic analyses have identified another unique aspect of the avian lineage: streamlined genomes. Although bird genomes contain roughly the same number of protein-coding genes as other vertebrates, their genomes are smaller, containing less noncoding DNA. Scientists are still exploring the potential consequences of this genome reduction for bird biology. In a new article in Genome Biology and Evolution titled “Genome size reduction and transposon activity impact tRNA gene diversity while ensuring translational stability in birds,” Claudia Kutter and her colleagues reveal that bird genomes also contain surprisingly few tRNA genes, while nonetheless exhibiting the same tRNA usage patterns as other vertebrates (Ottenburghs et al. 2021). As tRNAs are a pivotal part of the cellular machinery that translates messenger RNA (mRNA) into protein, this suggests that birds have evolved to use their limited tRNA repertoire more efficiently.
Genome Biol. Evol. 12(9):1493–1503; doi:10.1093/gbe/evaa138
Abstract
The history of modern humans in the Iberian Peninsula includes a variety of population arrivals, sometimes involving admixture with resident populations. Genetic data from present-day Iberian populations revealed an overall east–west genetic gradient that some authors interpreted as a direct consequence of the Reconquista, during which the Catholic Kingdoms expanded their territories southward while displacing Muslims. However, this interpretation has not been formally evaluated. Here, we present a qualitative analysis of the causes of the current genetic gradient observed in the Iberian Peninsula using extensive spatially explicit computer simulations based on a variety of evolutionary scenarios. Our results indicate that the Neolithic range expansion clearly produced the orientation of the observed genetic gradient. Concerning the Reconquista (including political borders among Catholic Kingdoms and regions with different languages), when modeled on top of a previous Neolithic expansion, it favored the orientation of the observed genetic gradient and shows local isolation of certain regions (i.e., the Basque Country and Galicia). Although additional evolutionary scenarios could be evaluated to more accurately decipher the causes of the Iberian genetic gradient, here we show that this gradient has a more complex explanation than previously hypothesized.
Abstract
The painted urchin Lytechinus pictus is a sea urchin in the family Toxopneustidae and one of several sea urchin species routinely used as experimental research organisms. Recently, L. pictus has emerged as a tractable model system for establishing transgenic sea urchin lines owing to its amenability to long-term laboratory culture. We present the first published genome of L. pictus. This chromosome-level assembly was generated using Illumina sequencing in conjunction with Oxford Nanopore Technologies long-read sequencing and Hi-C chromatin conformation capture sequencing. The 998.9-Mb assembly exhibits high contiguity, with a scaffold N50 of 46.0 Mb and 97% of the sequence assembled into 19 chromosome-length scaffolds. These 19 scaffolds show a high degree of synteny with the 19 chromosomes of the related species Lytechinus variegatus. Ab initio and transcript-evidence gene modeling, combined with sequence homology, identified 28,631 gene models that capture 92% of BUSCO orthologs. This annotation strategy was validated by manual curation of gene models for the ABC transporter superfamily, which confirmed the completeness and accuracy of the annotations. Thus, this genome assembly, together with recent high-contiguity assemblies of related species, positions L. pictus as an exceptional model system for comparative functional genomics, and it will be a key resource for the developmental, toxicological, and ecological biology scientific communities.
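N50 values such as those reported in these assembly abstracts (a 46.0-Mb scaffold N50 for L. pictus and a 5.2-Mb contig N50 for P. macdunnoughii) summarize assembly contiguity. Below is a minimal sketch of the standard calculation, assuming the input is simply a list of scaffold or contig lengths. It is not taken from either study's pipeline.

```python
def n50(lengths):
    """N50: the largest length L such that sequences at least L long together
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6]))  # 5
```

In the example, the 6- and 5-unit sequences already hold 11 of 20 units, so the N50 is 5. Comparing integers with `2 * running >= total` avoids rounding issues when the total is odd.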
Abstract
We report a chromosome-level assembly for Pieris macdunnoughii, a North American butterfly whose involvement in an evolutionary trap imposed by an invasive Eurasian mustard has made it an emerging model system for studying maladaptation in plant–insect interactions. Assembled from nearly 100× coverage of Oxford Nanopore long reads, the contig-level assembly comprised 106 contigs totaling 316,549,294 bases, with an N50 of 5.2 Mb. We polished the assembly with PoolSeq Illumina short-read data, demonstrating for the first time that individual and pooled short reads perform comparably as polishing data sets. Extensive synteny between our contig-level assembly and a published chromosome-level assembly of the European butterfly Pieris napi allowed us to generate a pseudochromosomal assembly of 47 contigs, placing 91.1% of our 317-Mb genome into a chromosomal framework. Additionally, we found support for the Z chromosome arrangement seen in P. napi, showing that the fusion event leading to this rearrangement predates the split between European and North American lineages of Pieris butterflies. This genome assembly and its functional annotation lay the groundwork for future research into the genetic basis of adaptive and maladaptive egg-laying behavior in P. macdunnoughii, contributing to our understanding of the susceptibility and responses of insects to evolutionary traps.
Abstract
Cyanobacteria are prolific producers of natural products, including polyketides and hybrid compounds thereof. Type III polyketide synthases (PKSs) are of particular interest because of their wide substrate specificity and simple reaction mechanism compared with both type I and type II PKSs. Surprisingly, only two type III PKS products, hierridins and (7.7)paracyclophanes, have been isolated from cyanobacteria. Here, we report the mining of 517 cyanobacterial genomes for type III PKS biosynthesis gene clusters. Approximately 17% of the genomes analyzed encoded one or more type III PKSs. We investigated the phylogeny of these enzymes together with already characterized type III PKSs. Our analysis showed that type III PKSs in cyanobacteria evolved into three major lineages, including enzymes associated with 1) (7.7)paracyclophane-like biosynthesis gene clusters, 2) hierridin-like biosynthesis gene clusters, and 3) cytochrome b5 genes. The evolutionary history of these enzymes is complex, with some sequences partitioning primarily according to speciation and others putatively according to their reaction type. Protein modeling showed that cyanobacterial type III PKSs generally have a smaller active-site cavity (mean = 109.035 Å³) than enzymes from other organisms. The size of the active site did not correlate well with substrate size; however, the “Gatekeeper” amino acid residues within the active site were strongly correlated with enzyme phylogeny. Our study provides unprecedented insight into the distribution, diversity, and molecular evolution of cyanobacterial type III PKSs, which could facilitate the discovery, characterization, and exploitation of novel enzymes, biochemical pathways, and specialized metabolites from this biosynthetically talented clade of microorganisms.
Abstract
Understanding how selection shapes population differentiation and local adaptation in marine species remains one of the greatest challenges in evolutionary biology. Selection on genes in response to environment-specific factors and microenvironmental variation often results in chaotic genetic patchiness, which is commonly observed in rocky shore organisms. To identify these genes, we analyzed the expression profiles of the marine gastropod Littoraria flava collected from ten rocky shore sites at four locations in Southeast Brazil. In this first L. flava transcriptome, 250,641 unigenes were generated, and 24% returned hits after functional annotation. Independent paired comparisons between 1) transects, 2) sites within transects, and 3) sites from different transects were performed for differential expression, detecting 8,622 unique differentially expressed genes. Comparisons between the Araçá (AR) and São João (SJ) transects showed the most divergent gene products. To study local adaptation, fitness-related differentially expressed genes were chosen for selection tests. The nine genes under adaptive selection and the 24 genes under purifying selection were mostly related to biomineralization in AR and to chaperones in SJ. The biomineralization genes perlucin and gigasin-6 were positively selected exclusively at the site facing the open ocean in AR, with sequence variants leading to pronounced changes in protein structure. Despite intense gene flow among L. flava populations due to its planktonic larva, gene expression patterns within transects may be the result of selective pressures. Our findings represent a first step toward understanding how microenvironmental genetic variation is maintained in rocky shore populations and the mechanisms underlying local adaptation in marine species.
Abstract
One of the central goals in molecular evolutionary biology is to determine the sources of variation in the rate of sequence evolution among proteins. Gene expression level is widely accepted as the primary determinant of protein evolutionary rate because it scales with the extent of selective constraints imposed on a protein, leading to the well-known negative correlation between expression level and protein evolutionary rate (the E–R anticorrelation). Selective constraints have been hypothesized to entail the maintenance of protein function, the avoidance of cytotoxicity caused by protein misfolding or nonspecific protein–protein interactions, or both. However, empirical tests evaluating the relative importance of these hypotheses remain scarce, likely because of the nontrivial difficulty of distinguishing the effect of a deleterious mutation on a protein’s function from its effect on cytotoxicity. We realized that examining the sequence evolution of viral proteins could overcome this hurdle. This is because purifying selection against mutations in a viral protein that cause cytotoxicity per se is likely relaxed, whereas purifying selection against mutations that impair viral protein function persists. Multiple analyses of SARS-CoV-2 and nine other virus species revealed a complete absence of any E–R anticorrelation. As a control, the E–R anticorrelation does exist in human endogenous retroviruses, in which purifying selection against cytotoxicity is present. Taken together, these observations do not support the maintenance of protein function as the main constraint on protein sequence evolution in cellular organisms.
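An E–R anticorrelation is commonly quantified as a rank correlation between expression level and an evolutionary-rate measure (for example, dN/dS) across proteins. The sketch below computes Spearman's ρ from scratch, giving tied values their average rank. It assumes paired per-protein values and does not reproduce the study's actual pipeline. The numbers in the example are made up. A clearly negative ρ would indicate an E–R anticorrelation, which the abstract reports was absent for viral proteins.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


expression = [120.0, 45.0, 8.0, 300.0, 15.0]  # hypothetical expression levels
rate = [0.05, 0.12, 0.30, 0.02, 0.21]         # hypothetical dN/dS values
print(round(spearman_rho(expression, rate), 3))  # -1.0
```

A rank-based measure is a natural choice here because expression levels span orders of magnitude and rarely relate linearly to evolutionary rates.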
Abstract
Basidiomycete yeasts have recently been reported as stably associated secondary fungal symbionts of many lichens, but their role in the symbiosis remains unknown. Attempts to sequence their genomes have been hampered both by the inability to culture them and by their low abundance in the lichen thallus alongside two dominant eukaryotes (an ascomycete fungus and a chlorophyte alga). Using the lichen Alectoria sarmentosa, we selectively dissolved the cortex layer, in which secondary fungal symbionts are embedded, to enrich yeast cell abundance, and sequenced DNA from the resulting slurries as well as from bulk lichen thallus. In addition to yielding a near-complete genome of the filamentous ascomycete with both methods, metagenomes from cortex slurries yielded a 36- to 84-fold increase in coverage and near-complete genomes for two basidiomycete species, members of the classes Cystobasidiomycetes and Tremellomycetes. The ascomycete possesses the largest gene repertoire of the three. It is enriched in proteases often associated with pathogenicity and harbors the majority of predicted secondary metabolite clusters. The basidiomycete genomes possess ∼35% fewer predicted genes than the ascomycete and have reduced secretomes even compared with close relatives, while exhibiting signs of nutrient limitation and scavenging. Furthermore, both basidiomycetes are enriched in genes encoding enzymes that produce secreted acidic polysaccharides, representing a potential contribution to the shared extracellular matrix. All three fungi retain genes involved in dimorphic switching, even though the ascomycete is not known to have a yeast stage. The basidiomycete genomes are an important new resource for exploring lifestyle and function in fungal–fungal interactions in lichen symbioses.
Abstract
A manually curated set of ohnolog families, encompassing over 2,900 ohnologs, has been assembled for seven species of bony vertebrates; it includes 255 four-member families and 631 three-member families. Across species, the patterns of chromosomes on which the ohnologs reside fall into 17 distinct categories. These 17 paralogons reflect the 17 ancestral chromosomes that existed in our chordate ancestor immediately prior to the two rounds of whole-genome duplication (2R-WGD) that occurred around 600 Ma. Within each paralogon, analysis of the molecular phylogeny of four-member families has now made it possible to assign those pairs of ohnologs that diverged from each other at the first round of duplication. Comparison with another recent analysis has identified four apparently incorrect assignments of post-2R pairings in that study, along with several omissions. By comparing the patterns between paralogons, it has also been possible to identify nine chromosomal fusions that occurred between 1R and 2R and three chromosomal fusions that occurred after 2R, which together generated an ancestral bony-vertebrate karyotype comprising 47 chromosomes. At least 27 of those ancestral bony-vertebrate chromosomes can be shown, in some extant species, not to have undergone any fusion or fission events. Such chromosomes are here termed “archeochromosomes,” and each has survived essentially unchanged in its gene content for some 400 Myr. Their utility lies in their potential for tracking the various fusion and fission events that have occurred in different lineages throughout the expansion of bony vertebrates.
Abstract
Cobalamin (vitamin B12) is one of the water-soluble vitamins and a cofactor in essential metabolic pathways in animals. It is a complex compound synthesized solely by prokaryotes. Cobalamin dependence is scattered across the tree of life. In particular, fungi and plants were deemed devoid of cobalamin. We demonstrate that cobalamin is utilized by all non-Dikarya fungal lineages. This observation is supported by the genomic presence of both B12-dependent enzymes and cobalamin-modifying enzymes. Fungal cobalamin-dependent enzymes are highly similar to their animal homologs. Phylogenetic analyses support a scenario of vertical inheritance of cobalamin usage with several losses. Cobalamin usage was probably lost in Mucorinae and at the base of Dikarya, the clade that includes most fungal model organisms; this loss hindered the discovery of B12-dependent metabolism in fungi. Our results indicate that cobalamin dependence was a widely distributed trait, at least in Opisthokonta and across diverse microbial eukaryotes, and was likely present in the last eukaryotic common ancestor (LECA).
Abstract
Gene order within bacterial chromosomes is rearranged rapidly over evolutionary time, and closely related species can have almost no conservation in long-range gene order. A prominent exception to this rule is a >40-kb cluster of five core operons (secE-rpoBC-str-S10-spc-alpha) and three variable adjacent operons (cysS, tufB, and ecf) that together contain 57 genes of the transcriptional and translational machinery. Previous studies have indicated that at least part of this operon cluster might have been present in the last common ancestor of bacteria and archaea. Using 204 whole-genome sequences, ∼2 Gy of evolution of the operon cluster was reconstructed back to the last common ancestors of the Gammaproteobacteria and of the Bacilli. A total of 163 independent evolutionary events were identified in which the operon cluster was altered. Further examination showed that the disconnection of two operons generally follows the same pattern. First, a small number of genes is inserted between the operons, breaking their concatenation; a second event then fully disconnects the operons. While there is a general trend toward loss of gene synteny over time, there are examples of increased alteration rates at specific branch points or within specific bacterial orders. This indicates recurrent relaxation of selection on gene order within bacterial chromosomes. The analysis of the alteration events indicates that segmental genome duplications and/or transposon-directed recombination play a crucial role in rearrangements of the operon cluster.
Abstract
Detecting natural selection signals in admixed populations can be problematic because the source of the signal typically predates the admixture event. Thanks to developments in ancient DNA (aDNA) over the last decade, it is now possible to study the various source populations of a particular admixture event. However, aDNA availability is limited to certain geographical regions, and the sample sizes and data quality might not be sufficient for selection analysis in many cases. In this study, we explore ways to improve the detection of pre-admixture signals in admixed populations using a local ancestry inference approach. We used masked haplotypes for the population branch statistic (PBS) and, for cross-population extended haplotype homozygosity (XP-EHH), full haplotypes constructed following our approach from Yelmen et al. (2019), using forward simulations to test the power of our analysis. The PBS results on simulated data showed that using masked haplotypes obtained from ancestry deconvolution, instead of the admixed population itself, might improve detection quality. In contrast, XP-EHH results were better when using the admixed population than when using the local ancestry method. We also report correlations of XP-EHH scores between source and admixed populations, suggesting that haplotype-based approaches must be used cautiously for recently admixed populations. Finally, we performed PBS on real South Asian populations masked with local ancestry deconvolution and report here the first possible selection signals on the autochthonous South Asian component of contemporary South Asian populations.
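PBS is usually computed from pairwise F_ST values among a focal population and two reference populations. The sketch below follows the commonly used formulation: each F_ST is first transformed into a branch length T = -ln(1 - F_ST), and PBS for focal population A is then (T_AB + T_AC - T_BC) / 2. It assumes F_ST has already been estimated per SNP or window. The estimator itself and the ancestry-masking step described in the abstract are not shown, and the function names and F_ST values are hypothetical.

```python
import math


def branch_length(fst):
    """Convert a pairwise F_ST into a divergence time, T = -ln(1 - F_ST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("F_ST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def population_branch_statistic(fst_ab, fst_ac, fst_bc):
    """PBS for focal population A, given pairwise F_ST with references B and C.

    Large values indicate that A's branch is long relative to the B-C
    divergence, a pattern consistent with population-specific selection.
    """
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


# Hypothetical per-window F_ST values (A-B, A-C, B-C); the first window
# shows extra differentiation of A from both references.
windows = [(0.40, 0.42, 0.10), (0.12, 0.11, 0.10)]
for fst_ab, fst_ac, fst_bc in windows:
    print(round(population_branch_statistic(fst_ab, fst_ac, fst_bc), 4))
```

In a genome scan, windows are then ranked by PBS, and the extreme upper tail is treated as candidate regions for selection.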
Abstract
The common chaffinch, Fringilla coelebs, is one of the most common, widespread, and well-studied passerines in Europe, with a broad distribution encompassing Western Europe and parts of Asia, North Africa, and the Macaronesian archipelagos. We present a high-quality genome assembly of the common chaffinch generated using Illumina shotgun sequencing in combination with Chicago and Hi-C libraries. The final genome is a 994.87-Mb chromosome-level assembly, with 98% of the sequence data located in chromosome scaffolds and an N50 of 69.73 Mb. Our genome assembly shows high completeness, with a complete BUSCO score of 93.9% using the avian data set. Around 7.8% of the genome consists of interspersed repetitive elements. The structural annotation yielded 17,703 genes, 86.5% of which have a functional annotation, including 7,827 complete universal single-copy orthologs out of the 8,338 genes represented in the BUSCO avian data set. This new annotated genome assembly will be a valuable reference for comparative and population genomic analyses of passerine, avian, and vertebrate evolution.
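The N50 reported above summarizes assembly contiguity. It is the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows. The scaffold lengths are made-up toy values, not the chaffinch scaffolds, and the function name is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (contig or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths in bp (hypothetical).
scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(scaffolds))  # prints 8
```

Comparing `2 * running` with `total` keeps the test in integer arithmetic and avoids halving the total as a float.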
Abstract
Legionella spp. are ubiquitous bacteria found principally in water networks, and ∼20 species are implicated in Legionnaires’ disease. Among them, Legionella pneumophila, an intracellular pathogen of environmental protozoa, is responsible for ∼90% of cases worldwide. Legionella pneumophila regulates its virulence in part through a quorum sensing system named “Legionella quorum sensing,” composed of a signal synthase (LqsA), two membrane histidine kinase receptors (LqsS and LqsT), and a cytoplasmic receptor (LqsR). To date, this communication system had only been found in L. pneumophila. Here, we investigated 58 Legionella genomes for the presence of an lqs cluster or homologous receptors using TBlastN. This analysis revealed three categories of species: 19 harbored a complete lqs cluster, 20 lacked lqsA but retained the receptor gene lqsR and/or lqsS, and 19 had none of the lqs genes. No correlation was observed between pathogenicity and the presence of a quorum sensing system. Using RT-qPCR, we determined that the lqsA gene was expressed in at least four strains belonging to different species available in our laboratory. Furthermore, we showed that the lqs genomic region was conserved even in species possessing only the receptors of the quorum sensing system, indicating an ancestral acquisition and various loss dynamics during evolution. This system could therefore also function in interspecific communication.
Abstract
Transposable elements (TEs) inflict numerous negative effects on health and fitness as they replicate by integrating into new regions of the host genome. Even though organisms employ powerful mechanisms to demobilize TEs, transposons gradually escape repression during aging. The resulting rise in TE activity causes genomic instability and has been implicated in age-dependent neurodegenerative diseases, inflammation, and the determination of lifespan. It is therefore conceivable that long-lived individuals have improved TE silencing mechanisms, resulting in lower TE expression and fewer genomic insertions than in their shorter-lived counterparts. Here, we test this hypothesis by performing the first genome-wide analysis of TE insertions and expression in populations of Drosophila melanogaster selected for longevity through late-life reproduction for 50–170 generations in four independent studies. Contrary to our expectation, TE families were generally more abundant in long-lived populations than in nonselected controls. Although simulations showed that this was not expected under neutrality, we found little evidence that selection drives the differences in TE abundance. Additional RNA-seq analysis revealed a tendency toward reduced TE expression in selected populations, which might be more important for lifespan than regulating genomic insertions. We also found limited evidence of parallel selection on genes related to TE regulation and transposition. However, telomeric TEs were genomically and transcriptionally more abundant in long-lived flies, suggesting improved telomere maintenance as a promising TE-mediated mechanism for prolonging lifespan. Our results provide a novel viewpoint, indicating that reproduction at old age increases the opportunity for TEs to be passed on to the next generation, with little impact on longevity.
Abstract
Contemporary individuals are combinations of genetic fragments inherited from ancestors belonging to multiple populations, as a result of migration and admixture. Isolating and characterizing these layers is crucial to understanding the genetic history of a given population. Ancestry deconvolution approaches rely on a large number of source individuals, which constrains the performance of local ancestry inference when only a few genomes are available from a given population. Here we present WINC, a local ancestry framework that combines the ChromoPainter and non-negative least squares (NNLS) approaches, as a method to retrieve local genetic assignments when only a few reference individuals are available. The framework is aided by a score assignment based on source differentiation that maximizes the number of sequences retrieved, and it can retrieve accurate ancestry assignments when only two individuals per source population are used.
Abstract
Staphylococcus cohnii (SC), a coagulase-negative bacterium, was first isolated from human skin in 1975. Early phenotypic analyses led to the delineation of two subspecies (subsp.), Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). SCC was considered specific to humans, whereas SCU apparently had a wider host range, from lower primates to humans. The type strains ATCC 29974 and ATCC 49330 have been designated for SCC and SCU, respectively. Comparative analysis of 66 complete genome sequences, including a novel SC isolate, revealed unexpected patterns within the SC complex in terms of both genomic sequence identity and gene content, highlighting the presence of three phylogenetically distinct groups. Based on our observations and on the current guidelines for the taxonomic classification of bacterial species, we propose a revision of the SC species complex. We suggest that SCC and SCU be regarded as two distinct species, SC and SU (Staphylococcus urealyticus), and that two distinct subspecies, SCC and SCB (SC subsp. barensis, represented by the novel strain isolated in Bari), be recognized within SC. Furthermore, since large-scale comparative genomics studies recurrently suggest inconsistencies or conflicts in the taxonomic assignments of bacterial species, we believe that the approach proposed here might be considered for more general application.
Abstract
Birds, a highly diverse vertebrate class, have adapted to various ecological systems. How this phenotypic diversity can be explained genetically is intensively debated and is likely grounded in differences in genome content. Larger and more complex genomes could allow for greater genetic regulation, resulting in more phenotypic variety. Surprisingly, avian genomes are much smaller than those of other vertebrates yet contain as many protein-coding genes. This supports the notion that phenotypic diversity is largely determined by selection on non-coding gene sequences. Transfer RNA (tRNA) genes are one such group of non-coding genes. However, the characteristics of tRNA genes across bird genomes have remained largely unexplored. Here, we exhaustively investigated the evolution and functional consequences of these crucial translational regulators within bird species and across vertebrates. Our dense sampling of 55 avian genomes, representing every bird order, revealed an average of 169 tRNA genes, at least 31% of which are actively used. Compared with those of other vertebrates, avian tRNA genes are reduced in number and complexity but are still in line with vertebrate wobble-pairing strategies and mutation-driven codon usage. Our detailed phylogenetic analyses further revealed that new tRNA genes can emerge through multiplication by transposable elements. Together, this study provides the first comprehensive avian and cross-vertebrate analyses of tRNA genes and demonstrates that tRNA gene evolution is flexible, albeit constrained within the functional boundaries of general mechanisms of protein translation.
Abstract
Schlegelella thermodepolymerans is a moderately thermophilic bacterium capable of producing polyhydroxyalkanoates, biodegradable polymers that represent an alternative to conventional plastics. Here, we present the first complete genome of the type strain S. thermodepolymerans DSM 15344, assembled by a hybrid approach using both long (Oxford Nanopore) and short (Illumina) reads. The genome consists of a single circular chromosome of 3,858,501 bp with a GC content of 70.3%. Genome annotation identified 3,650 genes in total, of which 3,598 were protein-coding open reading frames. Functional annotation of the genome and assignment of genes to clusters of orthologous groups revealed a relatively high number (1,013) of genes with unknown function or no assigned cluster of orthologous groups, which reflects how little is known about thermophilic polyhydroxyalkanoate-producing bacteria at the genome level. On the other hand, 270 genes involved in energy conversion and production were detected. This group includes genes involved in catabolic processes, suggesting that S. thermodepolymerans DSM 15344 can utilize and biotechnologically convert various substrates, such as lignocellulose-based saccharides, glycerol, or lipids. Based on its genome, S. thermodepolymerans DSM 15344 appears to be a very interesting, metabolically versatile bacterium with great biotechnological potential.
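GC content, as reported above for the chromosome, is the percentage of guanine and cytosine among the bases of a sequence. The sketch below computes it from FASTA text. It assumes that ambiguous bases such as N are excluded from the denominator, since conventions differ between tools. The function names and the toy record are hypothetical.

```python
def gc_content(seq):
    """Percentage of G+C among the unambiguous bases A, C, G, and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    # Ambiguous symbols (e.g. N) are ignored by this convention.
    return 100.0 * gc / (gc + at)


def gc_content_fasta(text):
    """GC content of all sequence lines in FASTA text, skipping '>' headers."""
    seq = "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )
    return gc_content(seq)


# Toy single-record FASTA (hypothetical sequence).
record = ">chromosome\nGGCCGA\nTTGCNN\n"
print(round(gc_content_fasta(record), 1))  # prints 70.0
```

For a multi-record FASTA, this sketch pools all records into one value. Per-record values would require splitting the text at each header first.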