Grant L. Filowitz, Rajendhran Rajakumar, Katherine L. O'Shaughnessy, and Martin J. Cohn
Abstract: Natural selection works best when the two alleles in a diploid organism are transmitted to offspring at equal frequencies. Despite this, selfish loci known as meiotic drivers that bias their own transmission into gametes are found throughout eukaryotes. Drive is thought to be a powerful evolutionary force, but empirical evolutionary analyses of drive systems are limited by low numbers of identified meiotic drive genes. Here, we analyze the evolution of the wtf gene family of Schizosaccharomyces pombe that contains both killer meiotic drive genes and suppressors of drive. We completed assemblies of all wtf genes for two S. pombe isolates, as well as a subset of wtf genes from over 50 isolates. We find that wtf copy number can vary greatly between isolates and that amino acid substitutions, expansions and contractions of DNA sequence repeats, and nonallelic gene conversion between family members all contribute to dynamic wtf gene evolution. This work demonstrates the power of meiotic drive to foster rapid evolution and identifies a recombination mechanism through which transposons can indirectly mobilize meiotic drivers.
Mol. Biol. Evol. 35(3):549–563, doi:10.1093/molbev/msx247
Abstract: Genomic imprinting is an epigenetic phenomenon where autosomal genes display uniparental expression depending on whether they are maternally or paternally inherited. Genomic imprinting can arise from parental conflicts over resource allocation to the offspring, which could drive imprinted loci to evolve by positive selection. We investigate whether positive selection is associated with genomic imprinting in the inbreeding species Arabidopsis thaliana. Our analysis of 140 genes regulated by genomic imprinting in the A. thaliana seed endosperm demonstrates they are evolving more rapidly than expected. To investigate whether positive selection drives this evolutionary acceleration, we identified orthologs of each imprinted gene across 34 plant species and elucidated their evolutionary trajectories. We tested for increased positive selection by comparing its incidence among imprinted genes with that among nonimprinted controls. Strikingly, we find a statistically significant enrichment of imprinted paternally expressed genes (iPEGs) evolving under positive selection, 50.6% of the total, but no such enrichment for positive selection among imprinted maternally expressed genes (iMEGs). This suggests that maternally and paternally expressed imprinted genes are subject to different selective pressures. Almost all positively selected amino acids were fixed across 80 sequenced A. thaliana accessions, suggestive of selective sweeps in the A. thaliana lineage. The imprinted genes under positive selection are involved in processes important for seed development including auxin biosynthesis and epigenetic regulation. Our findings support a genomic imprinting model for plants where positive selection can affect paternally expressed genes due to continued conflict with maternal sporophyte tissues, even when parental conflict is reduced in predominantly inbreeding species.
Abstract: In species with chromosomal sex determination, X chromosomes are predicted to evolve faster than autosomes because of positive selection on recessive alleles or weak purifying selection. We investigated X chromosome evolution in Stegodyphus spiders that differ in mating system, sex ratio, and population dynamics. We assigned scaffolds to X chromosomes and autosomes using a novel method based on flow cytometry of sperm cells and reduced representation sequencing. We estimated coding substitution patterns (dN/dS) in a subsocial outcrossing species (S. africanus) and its social inbreeding and female-biased sister species (S. mimosarum), and found evidence for faster-X evolution in both species. X chromosome-to-autosome diversity (piX/piA) ratios were estimated in multiple populations. The average piX/piA estimate for S. africanus (0.57 [95% CI: 0.55–0.60]) was lower than the neutral expectation of 0.75, consistent with more hitchhiking events on X-linked loci and/or a lower X chromosome mutation rate, and we provide evidence in support of both. The social species S. mimosarum has a significantly higher piX/piA ratio (0.72 [95% CI: 0.65–0.79]), in agreement with its female-biased sex ratio. Stegodyphus mimosarum also has different piX/piA estimates among populations, which we interpret as evidence for recurrent founder events. Simulations show that recurrent founder events are expected to decrease the piX/piA estimates in S. mimosarum, thus underestimating the true effect of female-biased sex ratios. Finally, we found lower synonymous divergence on X chromosomes in both species, and the male-to-female substitution ratio to be higher than 1, indicating a higher mutation rate in males.
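The neutral expectation of 0.75 quoted above, and the direction in which a female-biased sex ratio shifts it, can be illustrated with the standard textbook formulas for effective population size on autosomes and X chromosomes. This is a minimal sketch, not the authors' simulation: the function name is hypothetical, and it assumes males carry one X copy and females two, equal mutation rates on both chromosome types, and no selection or founder events.

```python
def x_to_autosome_ne_ratio(n_males: float, n_females: float) -> float:
    """Expected neutral Ne_X / Ne_A for given numbers of breeding males and females.

    Assumption (not from the abstract): standard formulas
      Ne_A = 4 * Nm * Nf / (Nm + Nf)
      Ne_X = 9 * Nm * Nf / (4 * Nm + 2 * Nf)
    With equal mutation rates, the diversity ratio piX/piA equals Ne_X / Ne_A.
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must have a positive number of breeders")
    ne_autosome = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_autosome


if __name__ == "__main__":
    # Hypothetical sex ratios, expressed as breeding females per breeding male.
    for females_per_male in (1, 2, 5):
        ratio = x_to_autosome_ne_ratio(1, females_per_male)
        print(f"{females_per_male} female(s) per male: Ne_X/Ne_A = {ratio:.3f}")
```

With an even sex ratio the function returns 0.75, and the ratio rises toward 9/8 as females increasingly outnumber males, which is why a female-biased species is expected to show a higher piX/piA than an outcrossing one under these assumptions.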
Abstract: Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This has not only allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but also increases the risk of inadvertent inclusion of paralogs in the analysis. Although this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events and small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated data sets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but after filtering, convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasizes the importance of quality over quantity, particularly for understanding relationships of poorly sampled taxa.
Abstract: Genomes are dynamic biological units, with processes of gene duplication and loss triggering evolutionary novelty. The mammalian skin provides a remarkable case study on the occurrence of adaptive morphological innovations. Skin sebaceous glands (SGs), for instance, emerged in the ancestor of mammals serving pivotal roles, such as lubrication, waterproofing, immunity, and thermoregulation, through the secretion of sebum, a complex mixture of various neutral lipids such as triacylglycerol, free fatty acids, wax esters, cholesterol, and squalene. Remarkably, SGs are absent in a few mammalian lineages, including the iconic Cetacea. We investigated the evolution of the key molecular components responsible for skin sebum production: Dgat2l6, Awat1, Awat2, Elovl3, Mogat3, and Fabp9. We show that all analyzed genes have been rendered nonfunctional in Cetacea species (toothed and baleen whales). Transcriptomic analysis, including a novel skin transcriptome from blue whale, supports gene inactivation. The conserved mutational pattern found in most analyzed genes indicates that pseudogenization events took place prior to the diversification of modern Cetacea lineages. Genome and skin transcriptome analysis of the common hippopotamus highlighted the convergent loss of a subset of sebum-producing genes, notably Awat1 and Mogat3. Partial loss profiles were also detected in non-Cetacea aquatic mammals, such as the Florida manatee, and in terrestrial mammals displaying specialized skin phenotypes such as the African elephant, white rhinoceros, and pig. Our findings reveal a unique landscape of “gene vestiges” in the Cetacea sebum-producing compartment, with limited gene loss observed in other mammalian lineages, suggestive of specific adaptations or specializations of skin lipids.
Abstract: Extensive European and African admixture coupled with loss of Amerindian lineages makes the reconstruction of pre-Columbian history of Native Americans based on present-day genomes extremely challenging. Open questions remain about the dispersals that occurred throughout the continent after the initial peopling from Beringia, especially concerning the number and dynamics of diffusions into South America. Indeed, although environmental and historical factors contributed to shaping distinct gene pools in the Andes and Amazonia, the origins of this East-West genetic structure and the extent of further interactions between populations residing along this divide are still not well understood. To this end, we generated new high-resolution genome-wide data for 229 individuals representative of one Central and ten South Amerindian ethnic groups from Mexico, Peru, Bolivia, and Argentina. Low levels of European and African admixture in the sampled individuals allowed the application of fine-scale haplotype-based methods and demographic modeling approaches. These analyses revealed highly specific Native American genetic ancestries and great intragroup homogeneity, along with limited traces of gene flow mainly from the Andes into Peruvian Amazonians. The substantial genetic drift differentially experienced by the considered populations underlined distinct patterns of recent inbreeding or prolonged isolation. Overall, our results support the hypothesis that all non-Andean South Americans are compatible with descending from a common lineage, while we found low support for common Mesoamerican ancestors of both Andeans and other South American groups. These findings suggest extensive back-migrations into Central America from non-Andean sources or conceal distinct peopling events into the Southern Continent.
Abstract: One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e., the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we then show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected and infectious are known. The sampling algorithm is available as an R package (STraTUS).
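For the simplest setting named above (one sample per host, every host sampled, a complete bottleneck, and no branch lengths), the partition idea can be sketched in a few lines. This is an illustrative reconstruction written for this listing, not the STraTUS implementation, and all names in it are hypothetical. It relies on one observation: because each host's region must be connected and contain that host's tip, every internal node shares its host with exactly one of its children. Each internal node therefore makes one independent choice among its children, so the number of partitions is the product of the child counts, and choosing uniformly at each node samples partitions uniformly.

```python
import random

# A phylogeny is a nested tuple; a tip is a string naming the sampled host.
# Example: (("A", "B"), "C") has two internal nodes and three tips.
# Assumption: tip labels are distinct (one sample per host).


def count_transmission_trees(tree) -> int:
    """Count node partitions, i.e. transmission trees, for a fully sampled tree."""
    if isinstance(tree, str):
        return 1
    total = len(tree)  # this node inherits its host from one of its children
    for child in tree:
        total *= count_transmission_trees(child)
    return total


def sample_transmission_tree(tree, rng=random) -> dict:
    """Draw one transmission tree uniformly; return {host: infector or None}."""
    infector = {}

    def visit(node) -> str:
        if isinstance(node, str):
            return node
        child_hosts = [visit(child) for child in node]
        host = rng.choice(child_hosts)  # uniform, independent choice per node
        for other in child_hosts:
            if other != host:
                infector[other] = host  # host change on the branch to that child
        return host

    index_case = visit(tree)
    infector[index_case] = None  # the host at the root has no sampled infector
    return infector


if __name__ == "__main__":
    phylogeny = (("A", "B"), ("C", "D"))
    print(count_transmission_trees(phylogeny))
    print(sample_transmission_tree(phylogeny, random.Random(1)))
```

The extensions listed in the abstract (unsampled hosts, multiple samples per host, time trees with infection windows) break the independence of the per-node choices and are not covered by this sketch.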
Abstract: Extracellular matrix (ECM) is considered central to the evolution of metazoan multicellularity; however, the repertoire of ECM proteins in nonbilaterians remains unclear. Thrombospondins (TSPs) are known to be well conserved from cnidarians to vertebrates, yet to date have been considered a unique family, principally studied for matricellular functions in vertebrates. Through searches utilizing the highly conserved C-terminal region of TSPs, we identify undisclosed new families of TSP-related proteins in metazoans, designated mega-TSP, sushi-TSP, and poriferan-TSP, each with a distinctive phylogenetic distribution. These proteins share the TSP C-terminal region domain architecture, as determined by domain composition and analysis of molecular models against known structures. Mega-TSPs, the only form identified in ctenophores, are typically >2,700 aa and are also characterized by N-terminal leucine-rich repeats and central cadherin/immunoglobulin domains. In cnidarians, which have a well-defined ECM, Mega-TSP was expressed throughout embryogenesis in Nematostella vectensis, with dynamic endodermal expression in larvae and primary polyps and widespread ectodermal expression in adult Nematostella vectensis and Hydra magnipapillata polyps. Hydra Mega-TSP was also expressed during regeneration and siRNA-silencing of Mega-TSP in Hydra caused specific blockade of head regeneration. Molecular phylogenetic analyses based on the conserved TSP C-terminal region identified each of the TSP-related groups to form clades distinct from the canonical TSPs. We discuss models for the evolution of the newly defined TSP superfamily by gene duplications, radiation, and gene losses from a debut in the last metazoan common ancestor. Together, the data provide new insight into the evolution of ECM and tissue organization in metazoans.
Abstract: The importance of climate in determining biodiversity patterns has been well documented. However, the relationship between climate and rates of genetic evolution remains controversial. Latitude and elevation have been associated with rates of change in genetic markers such as cytochrome b. What is not known, however, is the strength of such associations and whether patterns found among these genes apply across entire genomes. Here, using bumblebee genetic data from seven subgenera of Bombus, we demonstrate that all species occupying warmer elevations have undergone faster genome-wide evolution than those in the same subgenera occupying cooler elevations. Our findings point to a critical biogeographic role in the relative rates of whole species evolution, potentially influencing global biodiversity patterns.
Abstract: Seasonal influenza viruses undergo frequent mutations on their surface hemagglutinin (HA) proteins to escape the host immune response. Among these mutations, a few key amino acid sites are associated with significant antigenic cluster transitions. To recognize the cluster-transition determining sites of seasonal influenza A/H3N2 and A/H1N1 viruses systematically and quickly, we developed a computational model named RECDS (recognition of cluster-transition determining sites) to evaluate the contribution of a specific amino acid site on the HA protein across the whole history of antigenic evolution. In RECDS, we ranked all of the HA sites by calculating the contribution scores derived from the forest of gradient boosting classifiers trained on various sequence- and structure-based features. With the RECDS model, we found that the sites determining influenza antigenicity were mostly around the receptor-binding domain for both the influenza A/H3N2 and A/H1N1 viruses. Specifically, half of the cluster-transition determining sites of the influenza A/H1N1 virus were located in the vestigial esterase domain and basic path area on the HA, which indicated that the driving force of antigenic evolution in the A/H1N1 virus differs from that in the A/H3N2 virus. Beyond that, the footprints of substitutions responsible for antigenic evolution were inferred according to the phylogenetic trees for the cluster-transition determining sites. Large-scale monitoring of genetic variation at these cluster-transition determining sites in circulating influenza viruses could reduce current assay workloads in influenza surveillance and in the selection of new influenza vaccine strains.
Abstract: The mass application of whole mitogenome (MG) sequencing has great potential for resolving complex phylogeographic patterns that cannot be resolved by partial mitogenomic sequences or nuclear markers. North American periodical cicadas (Magicicada) are well known for their periodical mass emergence at 17- and 13-year intervals in the north and south, respectively. Magicicada comprises three species groups, each containing one 17-year species and one or two 13-year species. Within each life cycle, single-aged cohorts, called broods, of periodical cicadas emerge in different years, and most broods contain members of all three species groups. There are 12 and three extant broods of 17- and 13-year cicadas, respectively. The phylogeographic relationships among the populations and broods within the species groups have not been clearly resolved. We analyzed 125 whole MG sequences from all broods and seven species within three species groups to ascertain the divergence history of the geographic and allochronic populations and their life cycles. Our mitogenomic phylogeny analysis clearly revealed that each of the three species groups had largely similar phylogeographic subdivisions (east, middle, and west) and demographic histories (rapid population expansion after the last glacial period). The mitogenomic phylogeny also partly resolved the brood diversification process, which could be explained by hypothetical temporary life cycle shifts, and showed that none of the 13- and 17-year species within the species groups was monophyletic, possibly due to gene flow between them. Our findings clearly reveal phylogeographic structures in the three Magicicada species groups, demonstrating the advantage of whole MG sequence data in phylogeographic studies.
Abstract: There are numerous sources of variation in the rate of synonymous substitutions inside genes, such as direct selection on the nucleotide sequence, or mutation rate variation. Yet scans for positive selection rely on codon models which incorporate an assumption of effectively neutral synonymous substitution rate, constant between sites of each gene. Here we perform a large-scale comparison of approaches which incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference. More than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of uniform synonymous substitution rate. We propose a new model which is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in the highly recombining regions, and we propose that recombination and mutation rate variation, such as high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than without rate variation, the genes which we detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.
Abstract: We present a method that jointly analyzes the polymorphism and divergence sites in genomic sequences of multiple species to identify the genes under natural selection and pinpoint the occurrence time of selection to a specific lineage of the species phylogeny. This method integrates population genetics models using a Bayesian Poisson random field framework and combines information over all gene loci to boost the power for detecting selection. The method provides posterior distributions of the fitness effects of each gene along with parameters associated with the evolutionary history, including the species divergence time and effective population size of external species. The results of simulations demonstrate that our method achieves a high power to identify genes under positive selection for a wide range of selection intensity and provides reasonably accurate estimates of the population genetic parameters. The proposed method is applied to genomic sequences of humans, chimpanzees, gorillas, and orangutans and identifies a list of lineage-specific targets of positive selection. The positively selected genes in the human lineage are enriched in pathways including gene expression regulation, immune system, and metabolism. Our analysis provides insights into natural evolution in the coding regions of humans and great apes and thus serves as a basis for further molecular and functional studies.
Abstract: The Ashkenazi Jews (AJ) are a population isolate sharing ancestry with both European and Middle Eastern populations that has likely resided in Central Europe since at least the tenth century. Between the 11th and 16th centuries, the AJ population expanded eastward leading to two culturally distinct communities in Western/Central and Eastern Europe. Our aim was to determine whether the western and eastern groups are genetically distinct, and if so, what demographic processes contributed to population differentiation. We used Approximate Bayesian Computation to choose among models of AJ history and to infer demographic parameter values, including divergence times, effective population sizes, and levels of gene flow. For the ABC analysis, we used allele frequency spectrum and identical by descent-based statistics to capture information on a wide timescale. We also mitigated the effects of ascertainment bias when performing ABC on SNP array data by jointly modeling and inferring SNP discovery. We found that the most likely model was population differentiation between Eastern and Western AJ ∼400 years ago. The differentiation between the Eastern and Western AJ could be attributed to more extreme population growth in the Eastern AJ (0.250 per generation) than the Western AJ (0.069 per generation).
Abstract: Pyricularia is a fungal genus comprising several pathogenic species causing the blast disease in monocots. Pyricularia oryzae, the best-known species, infects rice, wheat, finger millet, and other crops. As past comparative and population genomics studies mainly focused on isolates of P. oryzae, the genomes of the other Pyricularia species have not been well explored. In this study, we obtained a chromosomal-level genome assembly of the finger millet isolate P. oryzae MZ5-1-6 and also highly contiguous assemblies of Pyricularia sp. LS, P. grisea, and P. pennisetigena. The differences in the genomic content of repetitive DNA sequences could largely explain the variation in genome size among these new genomes. Moreover, we found extensive gene gains and losses and structural changes among Pyricularia genomes, including a large interchromosomal translocation. We searched for homologs of known blast effectors across fungal taxa and found that most avirulence effectors are specific to Pyricularia, whereas many other effectors share homologs with distant fungal taxa. In particular, we discovered a novel effector family with metalloprotease activity, distinct from the well-known AVR-Pita family. We predicted 751 gene families containing putative effectors in 7 Pyricularia genomes and found that 60 of them showed differential expression in the P. oryzae MZ5-1-6 transcriptomes obtained under experimental conditions mimicking the pathogen infection process. In summary, this study increased our understanding of the structural, functional, and evolutionary genomics of the blast pathogen and identified new potential effector genes, providing useful data for developing crops with durable resistance.
Abstract: As limits on O2 availability during submergence impose severe constraints on aerobic respiration, the oxygen binding globin proteins of marine mammals are expected to have evolved under strong evolutionary pressures during their land-to-sea transition. Here, we address this question for the order Sirenia by retrieving, annotating, and performing detailed selection analyses on the globin repertoire of the extinct Steller’s sea cow (Hydrodamalis gigas), dugong (Dugong dugon), and Florida manatee (Trichechus manatus latirostris) in relation to their closest living terrestrial relatives (elephants and hyraxes). These analyses indicate most loci experienced elevated nucleotide substitution rates during their transition to a fully aquatic lifestyle. While most of these genes evolved under neutrality or strong purifying selection, the rate of nonsynonymous/synonymous replacements increased in two genes (Hbz-T1 and Hba-T1) that encode the α-type chains of hemoglobin (Hb) during each stage of life. Notably, the relaxed evolution of Hba-T1 is temporally coupled with the emergence of a chimeric pseudogene (Hba-T2/Hbq-ps) that contributed to the tandemly linked Hba-T1 of stem sirenians via interparalog gene conversion. Functional tests on recombinant Hb proteins from extant and ancestral sirenians further revealed that the molecular remodeling of Hba-T1 coincided with increased Hb–O2 affinity in early sirenians. Available evidence suggests that this trait evolved to maximize O2 extraction from finite lung stores and suppress tissue O2 offloading, thereby facilitating the low metabolic intensities of extant sirenians. In contrast, the derived reduction in Hb–O2 affinity in (sub)Arctic Steller’s sea cows is consistent with fueling increased thermogenesis by these once colossal marine herbivores.
Abstract: Molecular phylogenetics has neglected polymorphisms within present and ancestral populations for a long time. Recently, multispecies coalescent based methods have increased in popularity; however, their application is limited to a small number of species and individuals. We introduced a polymorphism-aware phylogenetic model (PoMo), which overcomes this limitation and scales well with the increasing amount of sequence data while accounting for present and ancestral polymorphisms. PoMo circumvents handling of gene trees and directly infers species trees from allele frequency data. Here, we extend the PoMo implementation in IQ-TREE and integrate search for the statistically best-fit mutation model, the ability to infer mutation rate variation across sites, and assessment of branch support values. We exemplify an analysis of a hundred species with ten haploid individuals each, showing that PoMo can perform inference on large data sets. While PoMo is more accurate than standard substitution models applied to concatenated alignments, it is almost as fast. We also provide bmm-simulate, a software package that allows simulation of sequences evolving under PoMo. The new options consolidate the value of PoMo for phylogenetic analyses with population data.
Abstract: Transcription regulatory networks (TRNs) are of central importance for both short-term phenotypic adaptation in response to environmental fluctuations and long-term evolutionary adaptation, with global regulatory genes often being targets of natural selection in laboratory experiments. Here, we combined evolution experiments, whole-genome resequencing, and molecular genetics to investigate the driving forces, genetic constraints, and molecular mechanisms that dictate how bacteria can cope with a drastic perturbation of their TRNs. The crp gene, encoding a major global regulator in Escherichia coli, was deleted in four different genetic backgrounds, all derived from the Long-Term Evolution Experiment (LTEE) but with different TRN architectures. We confirmed that crp deletion had a more deleterious effect on growth rate in the LTEE-adapted genotypes; and we showed that the ptsG gene, which encodes the major glucose-PTS transporter, gained CRP (cyclic AMP receptor protein) dependence over time in the LTEE. We then further evolved the four crp-deleted genotypes in glucose minimal medium, and we found that they all quickly recovered from their growth defects by increasing glucose uptake. We showed that this recovery was specific to the selective environment and consistently relied on mutations in the cis-regulatory region of ptsG, regardless of the initial genotype. These mutations affected the interplay of transcription factors acting at the promoters, changed the intrinsic properties of the existing promoters, or produced new transcription initiation sites. Therefore, the plasticity of even a single promoter region can compensate by three different mechanisms for the loss of a key regulatory hub in the E. coli TRN.
Abstract: Sex determination in varanids, Gila monsters, beaded lizards, and other anguimorphan lizards is still poorly understood. Sex chromosomes were reported only in a few species based solely on cytogenetics, which precluded assessment of their homology. We uncovered Z-chromosome-specific genes in varanids from their transcriptomes. Comparison of differences in gene copy numbers between sexes across anguimorphan lizards and outgroups revealed that homologous differentiated ZZ/ZW sex chromosomes are present in Gila monsters, beaded lizards, alligator lizards, and a wide phylogenetic spectrum of varanids. However, these sex chromosomes are not homologous to those known in other amniotes. We conclude that differentiated sex chromosomes were already present in the common ancestor of Anguimorpha living in the early Cretaceous or even in the Jurassic Period, 115–180 Ma, placing anguimorphan sex chromosomes among the oldest known in vertebrates. The analysis of transcriptomes of Komodo dragon (Varanus komodoensis) showed that the expression levels of genes linked to anguimorphan sex chromosomes are not balanced between sexes. Besides expanding our knowledge on vertebrate sex chromosome evolution, our study has important practical relevance for breeding and ecological studies. We introduce the first, widely applicable technique of molecular sexing in varanids, Gila monsters, and beaded lizards, where reliable determination of sex based on external morphology is dubious even in adults.
Genome Biology and Evolution, Volume 11, Issue 7, July 2019, Pages 1712–1722, doi:10.1093/gbe/evz120
Abstract: Mutations are the origin of genetic diversity, and the mutation rate is a fundamental parameter to understand all aspects of molecular evolution. The combination of mutation–accumulation experiments and high-throughput sequencing enabled the estimation of mutation rates in most model organisms, but several major eukaryotic lineages remain unexplored. Here, we report the first estimation of the spontaneous mutation rate in a model unicellular eukaryote from the Stramenopile kingdom, the diatom Phaeodactylum tricornutum (strain RCC2967). We sequenced 36 mutation accumulation lines for an average of 181 generations per line and identified 156 de novo mutations. The base substitution mutation rate per site per generation is μ_bs = 4.77 × 10⁻¹⁰ and the insertion–deletion mutation rate is μ_id = 1.58 × 10⁻¹¹. The mutation rate varies as a function of the nucleotide context and is biased toward an excess of mutations from GC to AT, consistent with previous observations in other species. Interestingly, the mutation rates between the genomes of organelles and the nucleus differ, with a significantly higher mutation rate in the mitochondria. This confirms previous claims based on indirect estimations of the mutation rate in mitochondria of photosynthetic eukaryotes that acquired their plastid through a secondary endosymbiosis. This novel estimate enables us to infer the effective population size of P. tricornutum to be N_e ∼ 8.72 × 10⁶.
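The abstract does not say how the effective population size was derived from the mutation rate. A common route, assumed here for illustration only, is the neutral relationship between nucleotide diversity and mutation rate: π = 4·N_e·μ for a diploid and π = 2·N_e·μ for a haploid. The sketch below uses a hypothetical function name and a made-up diversity value; only the mutation rate comes from the abstract.

```python
def effective_population_size(diversity: float, mutation_rate: float, ploidy: int = 2) -> float:
    """Estimate N_e from neutral nucleotide diversity and a per-site mutation rate.

    Assumption (not stated in the abstract): neutral equilibrium, where
    diversity = 2 * ploidy * N_e * mutation_rate (4*N_e*mu for diploids).
    """
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    if diversity < 0:
        raise ValueError("diversity cannot be negative")
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 (haploid) or 2 (diploid)")
    return diversity / (2 * ploidy * mutation_rate)


if __name__ == "__main__":
    mu_bs = 4.77e-10          # base substitution rate reported in the abstract
    example_diversity = 0.01  # hypothetical neutral diversity, for illustration only
    ne = effective_population_size(example_diversity, mu_bs)
    print(f"N_e estimate for the hypothetical diversity value: {ne:.3g}")
```

Because N_e scales inversely with μ in this relationship, a direct mutation-accumulation estimate of μ removes one of the main uncertainties in converting observed diversity into a population size.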
Abstract: Leeches (Hirudinida) comprise a charismatic, yet often maligned group of organisms. Despite their ecological, economic, and medical importance, a general consensus on the phylogenetic relationships of major hirudinidan lineages is lacking. This absence of a consistent, robust phylogeny of early-diverging lineages has hindered our understanding of the underlying processes that enabled evolutionary diversification of this clade. Here, we used an anchored hybrid enrichment-based phylogenomic approach, capturing hundreds of loci to investigate phylogenetic relationships among major hirudinidan lineages and their closest living relatives. Our results suggest that a dramatic reinterpretation of early leech evolution is warranted. We recovered Branchiobdellida as sister to a clade that includes all major lineages of hirudinidans, but found Acanthobdella to be nested within Oceanobdelliformes. These results cast doubt on the utility of Acanthobdella as a “missing link” used to explain the origin of blood-feeding in hirudineans. Further, our results support a deep divergence between predominantly marine and freshwater lineages, while not supporting the reciprocal monophyly of jawed and proboscis-bearing leeches. To sum up, our phylogenomic resolution of early-diverging leeches provides a necessary foundation for illuminating the evolution of host–symbiont associations and key adaptations that have allowed leeches to colonize a wide diversity of habitats worldwide.
AbstractNature has found many ways to utilize transposable elements (TEs) throughout evolution. Many molecular and cellular processes depend on DNA-binding proteins recognizing hundreds or thousands of similar DNA motifs dispersed throughout the genome that are often provided by TEs. It has been suggested that TEs play an important role in the evolution of such systems, in particular, the rewiring of gene regulatory networks. One mechanism that can further enhance the rewiring of regulatory networks is nonallelic gene conversion between copies of TEs. Here, we will first review evidence for nonallelic gene conversion in TEs. Then, we will illustrate the benefits nonallelic gene conversion provides in rewiring regulatory networks. For instance, nonallelic gene conversion between TE copies offers an alternative mechanism to spread beneficial mutations that improve the network, it allows multiple mutations to be combined and transferred together, and it allows natural selection to work efficiently in spreading beneficial mutations and removing disadvantageous mutations. Future studies examining the role of nonallelic gene conversion in the evolution of TEs should help us to better understand how TEs have contributed to evolution.
AbstractThe phylogeny of Isopoda, a speciose order of crustaceans, remains unresolved, with different data sets (morphological, nuclear, mitochondrial) often producing starkly incongruent phylogenetic hypotheses. We hypothesized that extreme diversity in their life histories might be causing compositional heterogeneity/heterotachy in their mitochondrial genomes, and compromising the phylogenetic reconstruction. We tested the effects of different data sets (mitochondrial, nuclear, nucleotides, amino acids, concatenated genes, individual genes, gene orders), phylogenetic algorithms (assuming data homogeneity, heterogeneity, and heterotachy), and partitioning, and found that almost all of them produced unique topologies. As we also found that mitogenomes of Asellota and two Cymothoida families (Cymothoidae and Corallanidae) possess inverted base (GC) skew patterns in comparison to other isopods, we concluded that inverted skews cause long-branch attraction phylogenetic artifacts between these taxa. These asymmetrical skews are most likely driven by multiple independent inversions of the origin of replication (i.e., nonadaptive mutational pressures). Although the PhyloBayes CAT-GTR algorithm managed to attenuate some of these artifacts (and outperform partitioning), mitochondrial data have limited applicability for reconstructing the phylogeny of Isopoda. Nevertheless, our analyses allowed us to propose solutions to some unresolved phylogenetic debates, and they support Asellota as the most likely candidate for the basal isopod branch. As our findings show that architectural rearrangements might produce major compositional biases even on relatively short evolutionary timescales, the implication is that verifying the suitability of data via composition skew analyses should be a prerequisite for every study that aims to use mitochondrial data for phylogenetic reconstruction, even among closely related taxa.
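The composition skew check recommended above can be sketched with the conventional GC skew statistic, (G − C) / (G + C), computed on one strand. This is only an illustration under that standard definition: the abstract does not say how skews were calculated, and the function names and window parameters here are hypothetical.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C) for a nucleotide string, or 0.0 if it has no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(sequence, window, step):
    """GC skew in consecutive windows, useful for spotting a sign change along a genome."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [gc_skew(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, step)]
```

A mitogenome whose overall skew has the opposite sign to that of related taxa would be a candidate for the kind of inverted skew described in the abstract, and would merit caution before being combined with the others in one analysis.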
AbstractNitrogen fixation in legumes occurs via symbiosis with rhizobia. This process involves packages of symbiotic genes on mobile genetic elements that are readily transferred within or between rhizobial species, furnishing the recipient with the ability to interact with plant hosts. However, it remains unclear whether plant host migration has played a role in shaping the current distribution of genetic variation in symbiotic genes. Herein, we examined the genetic structure and phylogeographic pattern of symbiotic genes in 286 symbiotic strains of Mesorhizobium nodulating black locust (Robinia pseudoacacia), a cross-continental invasive legume species that is native to North America. We conducted detailed phylogeographic analysis and approximate Bayesian computation to unravel the complex demographic history of five key symbiotic genes. The sequencing results indicate an origin of symbiotic genes in Germany rather than North America. Our findings provide strong evidence of prehistoric lineage splitting and spatial expansion events resulting in multiple radiations of descendant clones from founding sequence types worldwide. Estimates of the timescale of divergence in North American and Chinese subclades suggest that black locust-specific symbiotic genes have been present on these continents for many thousands of years before the recent migration of the plant host. Although numerous crop plants, including legumes, have found their centers of origin as centers of evolution and diversity, the number of legume-specific symbiotic genes with a known geographic origin is limited. This work sheds light on the coevolution of legumes and rhizobia.
AbstractEndosymbioses necessitate functional cooperation of cellular compartments to avoid pathway redundancy and streamline the control of biological processes. To gain insight into the metabolic compartmentation in chromerids, phototrophic relatives of apicomplexan parasites, we prepared a reference set of proteins probably localized to mitochondria, cytosol, and the plastid, taking advantage of available genomic and transcriptomic data. Training of prediction algorithms with the reference set now allows a genome-wide analysis of protein localization in Chromera velia and Vitrella brassicaformis. We confirm that the chromerid plastids house enzymatic pathways needed for their maintenance and photosynthetic activity, but for carbon and nitrogen allocation, metabolite exchange is necessary with the cytosol and mitochondria. This suggests that regulatory mechanisms operate in the cytosol to control carbon metabolism based on the availability of both light and nutrients. We argue that this arrangement is largely shared with apicomplexans and dinoflagellates, possibly stemming from a common ancestral metabolic architecture, and that it supports the mixotrophy of the chromerid algae.
AbstractUnderstanding the patterns of genetic diversity and adaptation across a species’ range is crucial to assess its long-term persistence and determine appropriate conservation measures. The impacts of human activities on genetic diversity and genetic adaptation to heterogeneous environments remain poorly understood in the marine realm. The roughskin sculpin (Trachidermus fasciatus) is a small catadromous fish that has been listed as a second-class state protected aquatic animal in China since 1988. To elucidate the underlying mechanism of population genetic structuring and genetic adaptations to local environments, RAD tags were sequenced for 202 individuals in nine populations across the range of T. fasciatus in China. The pairwise FST values over 9,271 filtered SNPs were significant except that between Dongying and Weifang. All genetic clustering analyses revealed significant population structure with high support for eight distinct genetic clusters. Both the minor allele frequency spectra and Ne estimations suggested extremely small Ne in some populations (e.g., Qinhuangdao, Rongcheng, Wendeng, and Qingdao), which might result from a recent population bottleneck. The strong genetic structure can be partly attributed to genetic drift and habitat fragmentation, likely due to anthropogenic activities. Annotations of candidate adaptive loci suggested that genes involved in metabolism, development, and osmoregulation were critical for the adaptation of local populations to spatially heterogeneous environments. In the context of anthropogenic activities and environmental change, the results of the present population genomic work contribute to the understanding of genetic differentiation and adaptation to changing environments.
AbstractTranscription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well recorded, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.
AbstractWe present PopNetD3, a web tool that provides an integrated approach for the network-based visualization of population structure based on the PopNet clustering framework. Users first submit a tab-delimited file that defines the diversity of SNPs across the genome, which is subsequently processed by the PopNet backend to define patterns of conservation at the chromosome level. The resulting population structure is visualized through a dedicated D3-based tool, allowing users to interactively examine chromosomal regions predicted to share ancestry. We illustrate the capabilities of PopNetD3 through an analysis of 16 strains of Neisseria gonorrhoeae. PopNetD3 is capable of processing population data sets consisting of hundreds of individuals and is publicly available online at http://compsysbio.org/popnetd3 (last accessed May 17, 2019).
AbstractSpermatozoa are one of the most strikingly diverse animal cell types. One poorly understood example of this diversity is sperm heteromorphism, where males produce multiple distinct morphs of sperm in a single ejaculate. Typically, only one morph is capable of fertilization and the function of the nonfertilizing morph, called parasperm, remains to be elucidated. Sperm heteromorphism has multiple independent origins, including Lepidoptera (moths and butterflies), where males produce a fertilizing eupyrene sperm and an apyrene parasperm, which lacks a nucleus and nuclear DNA. Here we report a comparative proteomic analysis of eupyrene and apyrene sperm between two distantly related lepidopteran species, the monarch butterfly (Danaus plexippus) and Carolina sphinx moth (Manduca sexta). In both species, we identified ∼700 sperm proteins, with half present in both morphs and the majority of the remainder observed only in eupyrene sperm. Apyrene sperm thus have a distinctly less complex proteome. Gene ontology (GO) analysis revealed proteins shared between morphs tend to be associated with canonical sperm cell structures (e.g., flagellum) and metabolism (e.g., ATP production). GO terms for morph-specific proteins broadly reflect known structural differences, but also suggest a role for apyrene sperm in modulating female neurobiology. Comparative analysis indicates that proteins shared between morphs are most conserved between species as components of sperm, whereas morph-specific proteins turn over more quickly, especially in apyrene sperm. The rapid divergence of apyrene sperm content is consistent with a relaxation of selective constraints associated with fertilization and karyogamy. On the other hand, parasperm generally exhibit greater evolutionary lability, and our observations may therefore reflect adaptive responses to shifting regimes of sexual selection.