The Society for Molecular Biology and Evolution is now accepting abstracts for the 2016 annual meeting, taking place on the Gold Coast, Queensland, Australia, July 3-7, 2016.
Julie Marin and S. Blair Hedges
Abstract: Choanoflagellates and filastereans are the closest known single-celled relatives of Metazoa within Holozoa and provide insight into how animals evolved from their unicellular ancestors. Codon usage bias has been extensively studied in metazoans, with both natural selection and mutation pressure playing important roles in different species. The disparate nature of metazoan codon usage patterns prevents the reconstruction of ancestral traits. However, traits conserved across holozoan protists highlight characteristics in the unicellular ancestors of Metazoa. Presented here are the patterns of codon usage in the choanoflagellates Monosiga brevicollis and Salpingoeca rosetta, as well as the filasterean Capsaspora owczarzaki. Codon usage is shown to be remarkably conserved. Highly biased genes preferentially use GC-ending codons; however, there is limited evidence that this is driven by local mutation pressure. The analyses presented provide strong evidence that natural selection, for both translational accuracy and efficiency, dominates codon usage bias in holozoan protists. In particular, the signature of selection for translational accuracy can be detected even in the most weakly biased genes. Biased codon usage is shown to have coevolved with the tRNA species, with optimal codons showing complementary binding to the highest copy number tRNA genes. Furthermore, tRNA modification is shown to be a common feature for amino acids with higher levels of degeneracy, and highly biased genes show a strong preference for using modified tRNAs in translation. The translationally optimal codons defined here will be of benefit to future transgenics work in holozoan protists, as their use should maximise protein yields from edited transgenes.
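As a rough illustration of the "GC-ending codons" statistic mentioned above, the following minimal sketch computes the fraction of codons in a coding sequence whose third position is G or C (often called GC3). The function name is hypothetical and the calculation is deliberately simple: it does not exclude stop codons or single-codon amino acids, and it is not taken from the study itself.

```python
def gc3_fraction(cds):
    """Fraction of complete codons whose third base is G or C.

    Illustrative only: stop codons and Met/Trp codons are not excluded,
    which a real codon usage analysis would usually do.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


if __name__ == "__main__":
    # Toy sequence, not real data.
    print(gc3_fraction("ATGGCCAAATAA"))
```

Comparing this value between highly and weakly biased genes is one simple way to look for a preference for GC-ending codons.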
Abstract: Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. Its unique characteristic allows building reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.
Abstract: Neuropeptides are neurosecretory signaling molecules in protostomes and deuterostomes (together Nephrozoa). Little, however, is known about the neuropeptide complement of the sister group of Nephrozoa, the Xenacoelomorpha, which together form the Bilateria. Because members of the xenacoelomorph clades Xenoturbella, Nemertodermatida, and Acoela differ extensively in their central nervous system anatomy, the reconstruction of the xenacoelomorph and bilaterian neuropeptide complements may provide insights into the relationship between nervous system evolution and peptidergic signaling. Here, we analyzed transcriptomes of seven acoels, four nemertodermatids, and two Xenoturbella species using motif searches, similarity searches, mass spectrometry and phylogenetic analyses to characterize neuropeptide precursors and neuropeptide receptors. Our comparison of these repertoires with previously reported nephrozoan and cnidarian sequences shows that the majority of annotated neuropeptide GPCRs in cnidarians are not orthologs of specific bilaterian neuropeptide receptors, which suggests that most of the bilaterian neuropeptide systems evolved after the cnidarian–bilaterian evolutionary split. This expansion of more than 20 peptidergic systems in the stem leading to the Bilateria predates the evolution of complex nephrozoan organs and nervous system architectures. From this ancient set of neuropeptides, acoels show frequent losses that correlate with their divergent central nervous system anatomy. We furthermore detected the emergence of novel neuropeptides in xenacoelomorphs and their expansion along the nemertodermatid and acoel lineages, the two clades that evolved nervous system condensations. Together, our study provides fundamental insights into the early evolution of the bilaterian peptidergic systems, which will guide future functional and comparative studies of bilaterian nervous systems.
Abstract: Among mammals, several lineages have independently adapted to a subterranean niche and possess similar phenotypic traits for burrowing (e.g., cylindrical bodies, short limbs, and absent pinnae). Previous research on mole-rats has revealed molecular adaptations for coping with reduced oxygen, elevated carbon dioxide, and the absence of light. In contrast, almost nothing is known regarding molecular adaptations in other subterranean lineages (e.g., true moles and golden moles). Therefore, the extent to which the recurrent phenotypic adaptations of divergent subterranean taxa have arisen via parallel routes of molecular evolution remains untested. To address these issues, we analyzed ∼8,000 loci in 15 representative subterranean taxa of four independent transitions to an underground niche for signatures of positive selection and convergent amino acid substitutions. Complementary analyses were performed in nonsubterranean “control” taxa to assess the biological significance of results. We found comparable numbers of positively selected genes in each of the four subterranean groups; however, correspondence in terms of gene identity between gene sets was low. Furthermore, we did not detect evidence of more convergent amino acids among subterranean species pairs compared with levels found between nonsubterranean controls. Comparisons with nonsubterranean taxa also revealed loci either under positive selection or with convergent substitutions, with similar functional enrichment (e.g., cell adhesion, immune response, and coagulation). Given the limited indication that positive selection and convergence occurred in the same loci, we conclude that selection may have acted on different loci across subterranean mammal lineages to produce similar phenotypes.
Joana Isabel Meier, David Alexander Marques, Catherine Elise Wagner, Laurent Excoffier, and Ole Seehausen
Abstract: Deep coalescence and introgression make it challenging to infer phylogenetic relationships among closely related species that arose through radiative speciation events. Despite numerous phylogenetic analyses and the availability of whole genomes, the phylogeny in the Anopheles gambiae species complex has not been confidently resolved. Here we extract over 80,000 coding and noncoding short segments (called loci) from the genomes of six members of the species complex and use a Bayesian method under the multispecies coalescent model to infer the species tree, which takes into account genealogical heterogeneity across the genome and uncertainty in the gene trees. We obtained a robust estimate of the species tree from the distal region of the X chromosome: (A. merus, ((A. melas, (A. arabiensis, A. quadriannulatus)), (A. gambiae, A. coluzzii))), with A. merus as the earliest-branching species. This species tree agrees with the chromosome inversion phylogeny and provides a parsimonious interpretation of inversion and introgression events. Simulations informed by the real data suggest that the coalescent approach is reliable, while the sliding-window analysis used in a previous phylogenomic study generates artifactual species trees. A likelihood ratio test of gene flow revealed strong evidence of autosomal introgression from A. arabiensis into A. gambiae (at the average rate of ∼0.2 migrants per generation), but not in the opposite direction, and introgression of the 3L chromosomal region from A. merus into A. quadriannulatus. Our results highlight the importance of accommodating incomplete lineage sorting and introgression in phylogenomic analyses of species that arose through recent radiative speciation events.
Abstract: Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2,548 reference RNA virus genomes and found that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction, suggesting there are limitations in our current understanding of RNA virus replication.
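The randomization idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it shuffles individual nucleotides (the published method may randomize differently), treats an open reading frame as any stop-free run of codons, scans only the three forward frames, and uses hypothetical function names and default parameters throughout.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}


def orf_spans(seq, frame):
    """Return (start, length_nt) for each stop-free codon run in one forward frame."""
    spans, start, run = [], frame, 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if run:
                spans.append((start, run))
            start, run = i + 3, 0
        else:
            run += 3
    if run:
        spans.append((start, run))
    return spans


def longest_orf(seq):
    """Longest stop-free run (in nt) over the three forward frames."""
    return max((n for f in range(3) for _, n in orf_spans(seq, f)), default=0)


def length_cutoff(seq, n_shuffles=200, quantile=0.95, seed=0):
    """Null cutoff: a quantile of the longest ORF seen in shuffled copies of seq."""
    rng = random.Random(seed)
    bases = list(seq)
    maxima = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # assumption: simple mononucleotide shuffle
        maxima.append(longest_orf("".join(bases)))
    maxima.sort()
    return maxima[min(int(quantile * n_shuffles), n_shuffles - 1)]


def candidate_orfs(seq, **kwargs):
    """ORFs in the real sequence that are longer than the randomized cutoff."""
    seq = seq.upper()
    cutoff = length_cutoff(seq, **kwargs)
    return [(f, start, n) for f in range(3)
            for start, n in orf_spans(seq, f) if n > cutoff]
```

Each returned tuple is `(frame, start, length)`. Deciding which candidates actually overlap an annotated gene, and scanning the reverse complement for antisense candidates, would be further steps that this sketch leaves out.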
Abstract: Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of nonreference TE insertions under selection.
Abstract: Understanding the relationship between protein sequence, function, and stability is a fundamental problem in biology. The essential function of many proteins that fold into a specific structure is their ability to bind to a ligand, which can be assayed for thousands of mutated variants. However, binding assays do not distinguish whether mutations affect the stability of the binding interface or the overall fold. Here, we introduce a statistical method to infer a detailed energy landscape of how a protein folds and binds to a ligand by combining information from many mutated variants. We fit a thermodynamic model describing the bound, unbound, and unfolded states to high-quality data of protein G domain B1 binding to IgG-Fc. We infer distinct folding and binding energies for each mutation, providing a detailed view of how mutations affect binding and stability across the protein. We accurately infer the folding energy of each variant in physical units, validated by independent data, whereas previous high-throughput methods could only measure indirect changes in stability. While we assume an additive sequence–energy relationship, the binding fraction is epistatic due to its nonlinear relation to energy. Despite having no epistasis in energy, our model explains much of the observed epistasis in binding fraction, with the remaining epistasis identifying conformationally dynamic regions.
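To see why additive energies can still produce epistasis in binding fraction, a minimal three-state Boltzmann sketch may help. The parameterization below is an assumption made for illustration (energies in units of kT, the unfolded state as the zero-energy reference, more negative values meaning more stable), and the function names and example numbers are hypothetical rather than taken from the paper.

```python
import math


def bound_fraction(e_fold, e_bind):
    """Probability of the folded-and-bound state in a three-state model.

    States and assumed energies (in kT): unfolded = 0,
    folded/unbound = e_fold, folded/bound = e_fold + e_bind.
    """
    w_unbound = math.exp(-e_fold)
    w_bound = math.exp(-(e_fold + e_bind))
    return w_bound / (1.0 + w_unbound + w_bound)


def variant_energies(wild_type, effects, mutations):
    """Add per-mutation (d_fold, d_bind) effects to the wild-type energies."""
    e_fold, e_bind = wild_type
    for name in mutations:
        d_fold, d_bind = effects[name]
        e_fold += d_fold
        e_bind += d_bind
    return e_fold, e_bind


if __name__ == "__main__":
    wild_type = (-4.0, -4.0)                      # hypothetical energies
    effects = {"A": (3.0, 0.0), "B": (3.0, 0.0)}  # two destabilizing mutations
    p = {combo: bound_fraction(*variant_energies(wild_type, effects, combo))
         for combo in ("", "A", "B", "AB")}
    additive_guess = p["A"] + p["B"] - p[""]
    print(p, additive_guess)
```

In this toy case each single mutant barely changes the bound fraction, yet the double mutant falls well below the additive guess, even though the energies themselves add exactly.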
Abstract: The rate of recombination affects rates of protein evolution for at least two reasons: it affects the efficacy of selection due to linkage and influences sequence evolution through the process of GC-biased gene conversion (gBGC). We studied how recombination, via gBGC, affects inferences of selection in gene sequences using comparative genomic and population genomic data from the collared flycatcher (Ficedula albicollis). We separately analyzed different mutation categories (“strong-to-weak,” “weak-to-strong,” and GC-conservative changes) and found that gBGC affects the distribution of fitness effects of new mutations and causes the rate of adaptive evolution and the proportion of adaptive mutations among nonsynonymous substitutions to be underestimated by 22–33%. It also biases inferences of demographic history based on the site frequency spectrum. In light of this impact, we suggest that inferences of selection (and demography) in lineages with pronounced gBGC should be based on GC-conservative changes only. Doing so, we estimate that 10% of nonsynonymous mutations are effectively neutral and that 27% of nonsynonymous substitutions have been fixed by positive selection in the flycatcher lineage. We also find that gene expression level, sex-bias in expression, and the number of protein–protein interactions, but not Hill–Robertson interference (HRI), are strong determinants of selective constraint and rate of adaptation of collared flycatcher genes. This study therefore illustrates the importance of disentangling the effects of different evolutionary forces and genetic factors when interpreting sequence data and, from that, inferring the role of natural selection in DNA sequence evolution.
Abstract: The Shine–Dalgarno (SD) sequence motif facilitates translation initiation and is frequently found upstream of bacterial start codons. However, thousands of instances of this motif occur throughout the middle of protein coding genes in a typical bacterial genome. Here, we use comparative evolutionary analysis to test whether SD sequences located within genes are functionally constrained. We measure the conservation of SD sequences across Enterobacteriales, and find that they are significantly less conserved than expected. Further, the strongest SD sequences are the least conserved whereas we find evidence of conservation for the weakest possible SD sequences given amino acid constraints. Our findings indicate that most SD sequences within genes are likely to be deleterious and removed via selection. To illustrate the origin of these deleterious costs, we show that ATG start codons are significantly depleted downstream of SD sequences within genes, highlighting the constraint that these sequences impose on the surrounding nucleotides to minimize the potential for erroneous translation initiation.
Abstract: Gene duplication is an important driver for the evolution of new genes and protein functions. Duplication of DNA-dependent RNA polymerase (Pol) II subunits within plants led to the emergence of RNA Pol IV and V complexes, each of which possesses unique functions necessary for RNA-directed DNA methylation. Comprehensive identification of Pol V subunit orthologs across the monocot radiation revealed a duplication of the largest two subunits within the grasses (Poaceae), including critical cereal crops. These paralogous Pol subunits display sequence conservation within catalytic domains, but their carboxy terminal domains differ in length and character of the Ago-binding platform, suggesting unique functional interactions. Phylogenetic analysis of the catalytic region indicates positive selection on one paralog following duplication, consistent with retention via neofunctionalization. Positive selection on residue pairs that are predicted to interact between subunits suggests that paralogous subunits have evolved specific assembly partners. Additional Pol subunits as well as Pol-interacting proteins also possess grass-specific paralogs, supporting the hypothesis that a novel Pol complex with distinct function has evolved in the grass family, Poaceae.
Abstract: Viral genome integration provides a complex route to biological innovation that has rarely but repeatedly occurred in one of the most diverse lineages of organisms on the planet, parasitoid wasps. We describe a novel endogenous virus in braconid wasps derived from pathogenic alphanudiviruses. Limited to a subset of the genus Fopius, this recent acquisition allows an unprecedented opportunity to examine early endogenization events. Massive amounts of virus-like particles (VLPs) are produced in wasp ovaries. Unlike most endogenous viruses of parasitoid wasps, the VLPs do not contain DNA, translating to major differences in parasitism-promoting strategies. Rapid changes include genomic rearrangement, loss of DNA processing proteins, and wasp control of viral gene expression. These events precede the full development of tissue-specific viral gene expression observed in older associations. These data indicate that viral endogenization can rapidly result in functional and evolutionary changes associated with genomic novelty and adaptation in parasitoids.
Abstract: The multispecies coalescent provides a natural framework for accommodating ancestral genetic polymorphism and coalescent processes that can cause different genomic regions to have different genealogical histories. The Bayesian program BPP includes a full-likelihood implementation of the multispecies coalescent, using transmodel Markov chain Monte Carlo to calculate the posterior probabilities of different species trees. BPP is suitable for analyzing multilocus sequence data sets, and it accommodates the heterogeneity of gene trees (both the topology and branch lengths) among loci and gene tree uncertainties due to limited phylogenetic information at each locus. Here, we provide a practical guide to the use of BPP in species tree estimation. BPP is a command-line program that runs on Linux, macOS, and Windows. This protocol shows how to use both BPP 3.4 (http://abacus.gene.ucl.ac.uk/software/) and BPP 4.0 (https://github.com/bpp/).
Abstract: Expression of transposable elements (TE) is transiently activated during human preimplantation embryogenesis in a developmental stage- and cell type-specific manner, and TE-mediated epigenetic regulation is intrinsically wired in developmental genetic networks in human embryos and embryonic stem cells. However, there are no systematic studies devoted to a comprehensive analysis of the TE transcriptome in human adult organs and tissues, including human neural tissues. To investigate TE expression in the human dorsolateral prefrontal cortex (DLPFC), we developed and validated a straightforward analytical approach to chart quantitative genome-wide expression profiles of all annotated TE loci based on unambiguous mapping of discrete TE-encoded transcripts using a de novo assembly strategy. To initially evaluate the potential regulatory impact of DLPFC-expressed TE, we adopted a comparative evolutionary genomics approach across humans, primates, and rodents to document conservation patterns, lineage-specificity, and colocalizations with transcription factor binding sites mapped within primate- and human-specific TE. We identified 654,665 transcripts expressed from 477,507 distinct loci of different TE classes and families, the majority of which appear to have originated from primate-specific sequences. We discovered 4,687 human-specific and transcriptionally active TEs in DLPFC, of which the prominent majority (80.2%) appears spliced. Our analyses revealed significant associations of DLPFC-expressed TE with primate- and human-specific transcription factor binding sites, suggesting potential cross-talk of concordant regulatory functions. We identified 1,689 TEs differentially expressed in the DLPFC of schizophrenia patients, a majority of which are located within introns of 1,137 protein-coding genes. Our findings imply that identified DLPFC-expressed TEs may affect human brain structures and functions following different evolutionary trajectories. On the one hand, hundreds of thousands of TEs maintained a remarkably high conservation for ∼8 My of primates’ evolution, suggesting that they are likely conveying evolutionary-constrained primate-specific regulatory functions. In parallel, thousands of transcriptionally active human-specific TE loci emerged more recently, suggesting that they could be relevant for human-specific behavioral or cognitive functions.
Abstract: Transcriptome-based exon capture methods provide an approach to recover several hundred markers from genomic DNA, allowing for robust phylogenetic estimation at deep timescales. We applied this method to a highly diverse group of venomous marine snails, Conoidea, for which published phylogenetic trees remain mostly unresolved for the deeper nodes. We targeted 850 protein coding genes (678,322 bp) in ca. 120 samples, spanning all (except one) known families of Conoidea and a broad selection of non-Conoidea neogastropods. The capture was successful for most samples, although capture efficiency decreased when DNA libraries were of insufficient quality and/or quantity (dried samples or low starting DNA concentration) and when targeting the most divergent lineages. An average of 75.4% of proteins was recovered, and the resulting trees, reconstructed using both supermatrix (IQ-tree) and supertree (Astral-II, combined with the Weighted Statistical Binning method) approaches, are almost fully supported. A reconstructed fossil-calibrated tree dates the origin of Conoidea to the Lower Cretaceous. We provide descriptions for two new families. The phylogeny revealed in this study provides a robust framework to reinterpret changes in Conoidea anatomy through time. Finally, we used the phylogeny to test the impact of the venom gland and radular type on diversification rates. Our analyses revealed that repeated losses of the venom gland had no effect on diversification rates, while families with a breadth of radula types showed increases in diversification rates, thus suggesting that trophic ecology may have an impact on the evolution of Conoidea.
Abstract: Bats are excellent models for studying the molecular basis of sensory adaptation. In Chiroptera, a sensory trade-off has been proposed between the visual and auditory systems, though the extent of this association has yet to be fully examined. To investigate whether variation in visual performance is associated with echolocation, we experimentally assayed the dim-light visual pigment rhodopsin from bat species with differing echolocation abilities. While spectral tuning properties were similar among bats, we found that the rate of decay of their light-activated state was significantly slower in a nonecholocating bat relative to species that use distinct echolocation strategies, consistent with a sensory trade-off hypothesis. We also found that these rates of decay were remarkably slower compared with those of other mammals, likely indicating an adaptation to dim light. To examine whether functional changes in rhodopsin are associated with shifts in selection intensity upon bat Rh1 sequences, we implemented selection analyses using codon-based likelihood clade models. While no shifts in selection were identified in response to diverse echolocation abilities of bats, we detected a significant increase in the intensity of evolutionary constraint accompanying the diversification of Chiroptera. Taken together, this suggests that substitutions that modulate the stability of the light-activated rhodopsin state were likely maintained through intensified constraint after bats diversified, being finely tuned in response to novel sensory specializations. Our study demonstrates the power of combining experimental and computational approaches for investigating functional mechanisms underlying the evolution of complex sensory adaptations.
Abstract: Viral evolutionary pathways are determined by the fitness landscape, which maps viral genotype to fitness. However, a quantitative description of the landscape and the evolutionary forces on it remains elusive. Here, we apply a biophysical fitness model based on capsid folding stability and antibody binding affinity to predict the evolutionary pathway of norovirus escaping a neutralizing antibody. The model is validated by experimental evolution in bulk culture and in a drop-based microfluidic system that propagates millions of independent small viral subpopulations. We demonstrate that along the axis of binding affinity, selection for escape variants and drift due to random mutations have the same direction, an atypical case in evolution. However, along folding stability, selection and drift are opposing forces whose balance is tuned by viral population size. Our results demonstrate that predictable epistatic tradeoffs between molecular traits of viral proteins shape viral evolution.
Abstract: Managing the emergence and spread of crop pests and pathogens is essential for global food security. Understanding how organisms have adapted to their native climate is key to predicting the impact of climate change. The potato cyst nematodes Globodera pallida and G. rostochiensis are economically important plant pathogens that cause yield losses of up to 50% in potato. The two species have different thermal optima that may relate to differences in the altitude of their regions of origin in the Andes. Here, we demonstrate that juveniles of G. pallida are less able to recover from heat stress than those of G. rostochiensis. Genome-wide analysis revealed that while both Globodera species respond to heat stress by induction of various protective heat-inducible genes, G. pallida experiences heat stress at lower temperatures. We use C. elegans as a model to demonstrate the dependence of the heat stress response on expression of Heat Shock Factor-1 (HSF-1). Moreover, we show that hsp-110 is induced by heat stress in G. rostochiensis, but not in the less thermotolerant G. pallida. Sequence analysis revealed that this gene and its promoter were duplicated in G. rostochiensis and acquired thermoregulatory properties. We show that hsp-110 is required for recovery from acute thermal stress in both C. elegans and G. rostochiensis. Our findings point towards an underlying molecular mechanism that allows the differential expansion of one species relative to another closely related species under current climate change scenarios. Similar mechanisms may be true of other invertebrate species with pest status.
Abstract: The mutation rate of an organism is influenced by the interaction of evolutionary forces such as natural selection and genetic drift. However, the mutation spectrum (i.e., the frequency distribution of different types of mutations) can be heavily influenced by DNA repair. Using mutation-accumulation lines of the extremophile bacterium Deinococcus radiodurans ΔmutS1 and the model soil bacterium Pseudomonas fluorescens wild-type and MMR− (Methyl-dependent Mismatch Repair-deficient) strains, we report the mutational features of these two important bacteria. We find that P. fluorescens has one of the highest MMR repair efficiencies among tested bacteria. We also discover that MMR of D. radiodurans preferentially repairs deletions, contrary to all other bacteria examined. We then, for the first time, quantify genome-wide efficiency and specificity of MMR in repairing different genomic regions and mutation types, by evaluating the P. fluorescens and D. radiodurans mutation data sets, along with previously reported ones of Bacillus subtilis subsp. subtilis, Escherichia coli, Vibrio cholerae, and V. fischeri. MMR in all six bacteria shares two general features: 1) repair efficiency is influenced by the neighboring base composition for both transitions and transversions, not limited to transversions as previously reported; and 2) MMR only recognizes indels <4 bp in length. This study demonstrates the power of mutation accumulation lines in quantifying DNA repair and mutagenesis patterns.
Abstract: The heterochromatic genome compartment mediates strictly conserved cellular processes such as chromosome segregation, telomere integrity, and genome stability. Paradoxically, heterochromatic DNA sequence is wildly unconserved. Recent reports that many hybrid incompatibility genes encode heterochromatin proteins, together with the observation that interspecies hybrids suffer aberrant heterochromatin-dependent processes, suggest that heterochromatic DNA packaging requires species-specific innovations. Testing this model of coevolution between fast-evolving heterochromatic DNA and its packaging proteins begins with defining the latter. Here we describe many such candidates encoded by the Heterochromatin Protein 1 (HP1) gene family across Diptera, an insect order that encompasses dramatic episodes of heterochromatic sequence turnover. Using BLAST, synteny analysis, and phylogenetic tree building across 64 Diptera genomes, we discovered a staggering 121 HP1 duplication events. In contrast, we observed virtually no gene duplication in gene families that share a common “chromodomain” with HP1s, including Polycomb and Su(var)3-9. The remarkably high number of Dipteran HP1 paralogs arises from distant clades undergoing convergent HP1 family amplifications. These independently derived, young HP1s span diverse ages, domain structures, and rates of molecular evolution, including episodes of positive selection. Moreover, independently derived HP1s exhibit convergent expression evolution. While ancient HP1 parent genes are transcribed ubiquitously, young HP1 paralogs are transcribed primarily in male germline tissue, a pattern typical of young genes. Pervasive gene youth, rapid evolution, and germline specialization implicate heterochromatin-encoded selfish elements driving recurrent HP1 gene family expansions. The 121 young genes offer valuable experimental traction for elucidating the germline processes shaped by Diptera’s many dramatic episodes of heterochromatin turnover.
Nature provides countless examples of evolutionary arms races, in which species develop adaptations and counter-adaptations in a struggle for survival and reproduction. Such arms races are common between predator and prey or between parasite and host. Understanding this coevolutionary process can aid in our ability to develop necessary countermeasures, such as overcoming bacterial resistance to antibiotics.
Abstract: Gene duplication and loss contribute to gene content differences as well as phenotypic divergence across species. However, the extent to which gene content varies among closely related plant species and the factors responsible for such variation remain unclear. Here, using the Solanaceae family as a model and Pfam domain families as a proxy for gene families, we investigated variation in gene family sizes across species and the likely factors contributing to the variation. We found that genes in highly variable families have high turnover rates and tend to be involved in processes that have diverged between Solanaceae species, whereas genes in low-variability families tend to have housekeeping roles. In addition, genes in high- and low-variability gene families tend to be duplicated by tandem and whole genome duplication, respectively. This finding, together with the observation that genes duplicated by different mechanisms experience different selection pressures, suggests that duplication mechanism impacts gene family turnover. We explored using pseudogene number as a proxy for gene loss but discovered that a substantial number of pseudogenes are actually products of pseudogene duplication, contrary to the expectation that most plant pseudogenes are remnants of once-functional duplicates. Our findings reveal complex relationships between variation in gene family size, gene functions, duplication mechanism, and evolutionary rate. The patterns of lineage-specific gene family expansion within the Solanaceae provide the foundation for a better understanding of the genetic basis underlying phenotypic diversity in this economically important family.
AbstractEndogenous viral sequences in eukaryotic genomes, such as those derived from plant pararetroviruses (PRVs), can serve as genomic fossils to study viral macroevolution. Many aspects of viral evolutionary rates are heterogeneous, including substitution rate differences between genes. However, the evolutionary dynamics of this viral gene rate heterogeneity (GRH) have rarely been examined. Characterizing such GRH may help to elucidate viral adaptive evolution. In this study, based on robust phylogenetic analysis, we identified an ancient endogenous PRV group in Oryza genomes estimated to be 2.41–15.00 Myr old. We subsequently used this ancient endogenous PRV group and three younger groups to estimate the GRH of PRVs. Long-term substitution rates for the most conserved gene and a divergent gene were 2.69 × 10⁻⁸ to 8.07 × 10⁻⁸ and 4.72 × 10⁻⁸ to 1.42 × 10⁻⁷ substitutions/site/year, respectively. On the basis of a direct comparison, a long-term GRH of 1.83-fold was identified between these two genes, which is unexpectedly low and lower than the short-term GRH (>3.40-fold) of PRVs calculated using published data. The lower long-term GRH of PRVs was due to the slightly faster rate decay of divergent genes than of conserved genes during evolution. To the best of our knowledge, we quantified for the first time the long-term GRH of viral genes using paleovirological analyses, and proposed that the GRH of PRVs might be heterogeneous across time scales (time-dependent GRH). Our findings provide special insights into viral gene macroevolution and should encourage a more detailed examination of the viral GRH.
AbstractVitellogenin (Vtg) is a glycolipophosphoprotein produced by oviparous and ovoviviparous species and is the precursor protein of the yolk, an essential nutrient reserve for embryonic development and early larval stages. Vtg is encoded by a family of paralogous genes whose number varies among the different vertebrate lineages. Its evolution has been the subject of considerable analysis but remains unclear. In this work, microsyntenic and phylogenetic analyses were performed to increase our knowledge of the evolutionary history of this gene family in vertebrates. Our results support the hypothesis that the vitellogenin gene family expanded from two genes, both present at the beginning of the vertebrate radiation, through multiple independent duplication events that occurred in diverse lineages.
AbstractIt is often unavoidable to combine data from different sequencing centers or sequencing platforms when compiling data sets with a large number of individuals. However, the different data are likely to contain specific systematic errors that will appear as SNPs. Here, we devise a method to detect systematic errors in combined data sets. To measure quality differences between individual genomes, we study pairs of variants that reside on different chromosomes and co-occur in individuals. The abundance of these pairs of variants in different genomes is then used to detect systematic errors due to batch effects. Applying our method to the 1000 Genomes data set, we find that coding regions are enriched for errors, where ∼1% of the higher frequency variants are predicted to be erroneous, whereas errors outside of coding regions are much rarer (<0.001%). As expected, predicted errors are found less often than other variants in a data set that was generated with a different sequencing technology, indicating that many of the candidates are indeed errors. However, predicted 1000 Genomes errors are also found in other large data sets; our observation is thus not specific to the 1000 Genomes data set. Our results show that batch effects can be turned into a virtue by using the resulting variation in large scale data sets to detect systematic errors.
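The idea of counting co-occurring variant pairs on different chromosomes can be illustrated with a toy sketch. This is not the authors' implementation: the function name `cooccurring_pair_counts`, the input layout, the use of identical carrier sets as the definition of "co-occur", and the `min_carriers` threshold are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cooccurring_pair_counts(carriers, chrom, min_carriers=2):
    """Count, per individual, inter-chromosomal variant pairs that co-occur.

    carriers: dict mapping variant id -> set of individuals carrying it.
    chrom:    dict mapping variant id -> chromosome name.
    Assumption (hypothetical): two variants "co-occur" when they are carried
    by exactly the same set of individuals, and that set has at least
    `min_carriers` members.
    """
    # Group variants by their exact carrier set.
    groups = defaultdict(list)
    for variant, individuals in carriers.items():
        if len(individuals) >= min_carriers:
            groups[frozenset(individuals)].append(variant)

    # For each group, count pairs of variants lying on different chromosomes
    # and credit every individual in the shared carrier set.
    counts = defaultdict(int)
    for individuals, variants in groups.items():
        for a, b in combinations(variants, 2):
            if chrom[a] != chrom[b]:
                for individual in individuals:
                    counts[individual] += 1
    return dict(counts)


# Toy data: v1, v2, v3 share carriers {A, B}; only v1/v2 and v2/v3 span chromosomes.
carriers = {
    "v1": {"A", "B"},
    "v2": {"A", "B"},
    "v3": {"A", "B"},
    "v4": {"C", "D"},
    "v5": {"C"},
}
chrom = {"v1": "chr1", "v2": "chr2", "v3": "chr1", "v4": "chr3", "v5": "chr4"}
print(cooccurring_pair_counts(carriers, chrom))
```

In this toy setting, individuals with an unusually high count relative to others would be candidates for closer inspection; how such counts translate into error predictions is described in the article itself, not here.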
AbstractChlamydiae are an example of obligate intracellular bacteria that possess highly reduced, compact genomes (1.0–3.5 Mbp), reflective of their abilities to sequester many essential nutrients from the host that they no longer need to synthesize themselves. The Chlamydiae is a phylum with a very wide host range spanning mammals, birds, fish, invertebrates, and unicellular protists. This ecological and phylogenetic diversity offers ongoing opportunities to study intracellular survival and metabolic pathways and adaptations. Of particular evolutionary significance are Chlamydiae from the recently proposed Ca. Parilichlamydiaceae, the earliest diverging clade in this phylum, species of which are found only in aquatic vertebrates. Gill extracts from three Chlamydiales-positive Australian aquaculture species (Yellowtail kingfish, Striped trumpeter, and Barramundi) were subjected to DNA preparation to deplete host DNA and enrich microbial DNA, prior to metagenome sequencing. We assembled chlamydial genomes corresponding to three Ca. Parilichlamydiaceae species from gill metagenomes, and conducted functional genomics comparisons with diverse members of the phylum. This revealed highly reduced genomes more similar in size to the terrestrial Chlamydiaceae, standing in contrast to members of the Chlamydiae with a demonstrated cosmopolitan host range. We describe a reduction in genes encoding synthesis of nucleotides and amino acids, among other nutrients, and an enrichment of predicted transport proteins. Ca. Parilichlamydiaceae share 342 orthologs with other chlamydial families. We hypothesize that the genome reduction exhibited by Ca. Parilichlamydiaceae and Chlamydiaceae is an example of within-phylum convergent evolution. The factors driving these events remain to be elucidated.
AbstractPrediction of evolutionary trajectories has been an elusive goal, requiring a deep knowledge of underlying mechanisms that relate genotype to phenotype plus understanding how phenotype impacts organismal fitness. We tested our ability to predict molecular regulatory evolution in a bacteriophage (T7) whose RNA polymerase (RNAP) was altered to recognize a heterologous promoter differing by three nucleotides from the wild-type promoter. A mutant of wild-type T7 lacking its RNAP gene was passaged on a bacterial strain providing the novel RNAP in trans. Higher fitness rapidly evolved. Predicting the evolutionary trajectory of this adaptation used measured in vitro transcription rates of the novel RNAP on the six promoter sequences capturing all possible one-step pathways between the wild-type and the heterologous promoter sequences. The predictions captured some of the regulatory evolution but failed both in explaining 1) a set of T7 promoters that consistently failed to evolve and 2) some promoter evolution that fell outside the expected one-step pathways. Had a more comprehensive set of transcription assays been undertaken initially, all promoter evolution would have fallen within predicted bounds, but the lack of evolution in some promoters is unresolved. Overall, this study points toward the increasing feasibility of predicting evolution in well-characterized, simple systems.
AbstractThe Lennoaceae, a small monophyletic plant family of root parasites endemic to the Americas, are one of the last remaining independently evolved lineages of parasitic angiosperms lacking a published plastome. In this study, we present the assembled and annotated plastomes of two species spanning the crown node of Lennoaceae, Lennoa madreporoides and Pholisma arenarium, as well as their close autotrophic relative from the sister family Ehretiaceae, Tiquilia plicata. We find that the plastomes of L. madreporoides and P. arenarium are similar in size and gene content, and substantially reduced compared to T. plicata, consistent with trends seen in other holoparasitic lineages. In particular, most plastid genes involved in photosynthesis have been lost, whereas housekeeping genes (ribosomal protein-coding genes, rRNAs, and tRNAs) are retained. One notable exception is the persistence of an rbcL open reading frame in P. arenarium but not L. madreporoides, suggesting a nonphotosynthetic function for this gene. Of the retained coding genes, dN/dS ratios indicate that some remain under purifying selection, whereas others show relaxed selection. Overall, this study supports the mounting evidence for convergent plastome evolution in flowering plants following the shift to heterotrophy.
AbstractFeather diversity is striking in many aspects. Although feather development has been studied for decades, genetic and genomic studies of feather diversity have begun only recently. Many questions remain to be answered by multidisciplinary approaches. In this review, we discuss three levels of feather diversity: Feather morphotypes, intraspecific variations, and interspecific variations. We summarize recent studies of feather evolution in terms of genetics, genomics, and developmental biology and provide perspectives for future research. Specifically, this review includes the following topics: 1) Diversity of feather morphotype; 2) feather diversity among different breeds of domesticated birds, including variations in pigmentation pattern, in feather length or regional identity, in feather orientation, in feather distribution, and in feather structure; and 3) diversity of feathers among avian species, including plumage color and morph differences between species and the regulatory differences in downy feather development between altricial and precocial birds. Finally, we discuss future research directions.
AbstractAphids are a diverse group of taxa that contain agronomically important species, which vary in their host range and ability to infest crop plants. The genome evolution underlying agriculturally important aphid traits is not well understood. We generated draft genome assemblies for two aphid species: Myzus cerasi (black cherry aphid) and the cereal specialist Rhopalosiphum padi. Using a de novo gene prediction pipeline on both these, and three additional aphid genome assemblies (Acyrthosiphon pisum, Diuraphis noxia, and Myzus persicae), we show that aphid genomes consistently encode similar gene numbers. We compare gene content, gene duplication, synteny, and putative effector repertoires between these five species to understand the genome evolution of globally important plant parasites. Aphid genomes show signs of relatively distant gene duplication, and substantial, relatively recent, gene birth. Putative effector repertoires, originating from duplicated and other loci, have an unusual genomic organization and evolutionary history. We identify a highly conserved effector pair that is tightly physically linked in the genomes of all aphid species tested. In R. padi, this effector pair is tightly transcriptionally linked and shares an unknown transcriptional control mechanism with a subset of ∼50 other putative effectors and secretory proteins. This study extends our current knowledge on the evolution of aphid genomes and reveals evidence for an as-of-yet unknown shared control mechanism, which underlies effector expression, and ultimately plant parasitism.
AbstractThe frequency of horizontal transfers of transposable elements (HTTs) varies among the types of elements according to the transposition mode and the geographical and temporal overlap of the species involved in the transfer. The drosophilid species of the genus Zaprionus and those of the melanogaster, obscura, repleta, and virilis groups of the genus Drosophila investigated in this study shared space and time at some point in their evolutionary history. This is particularly true of the subgenus Zaprionus and the melanogaster subgroup, which overlapped both geographically and temporally in Tropical Africa during their period of origin and diversification. Here, we tested the hypothesis that this overlap may have facilitated the transfer of retrotransposons without long terminal repeats (non-LTRs) between these species. We estimated the HTT frequency of the non-LTRs BS and Helena at the genome-wide scale by using a phylogenetic framework and a vertical and horizontal inheritance consistence analysis (VHICA). An excessively low synonymous divergence among distantly related species and incongruities between the transposable element and species phylogenies allowed us to propose at least four relatively recent HTT events of Helena and BS involving ancestors of the subgroup melanogaster and ancestors of the subgenus Zaprionus during their concomitant diversification in Tropical Africa, along with older possible events between species of the subgenera Drosophila and Sophophora. This study provides the first evidence for HTT of non-LTR retrotransposons between Drosophila and Zaprionus, including an in-depth reconstruction of the time frame and geography of these events.
AbstractPlastid genomes display remarkable organizational stability over evolutionary time. From green algae to angiosperms, most plastid genomes are largely collinear, with only a few cases of inversion, gene loss, or, in extremely rare cases, gene addition. These plastome insertions are mostly clade-specific and are typically of nuclear or mitochondrial origin. Here, we expand on these findings and present the first family-level survey of plastome evolution in ferns, revealing a novel suite of dynamic mobile elements. Comparative plastome analyses of the Pteridaceae expose several mobile open reading frames that vary in sequence length, insertion site, and configuration among sampled taxa. Even between close relatives, the presence and location of these elements is widely variable when viewed in a phylogenetic context. We characterize these elements and refer to them collectively as Mobile Open Reading Frames in Fern Organelles (MORFFO). We further note that the presence of MORFFO is not restricted to Pteridaceae, but is found across ferns and other plant clades. MORFFO elements are regularly associated with inversions, intergenic expansions, and changes to the inverted repeats. They likewise appear to be present in mitochondrial and nuclear genomes of ferns, indicating that they can move between genomic compartments with relative ease. The origins and functions of these mobile elements are unknown, but MORFFO appears to be a major driver of structural genome evolution in the plastomes of ferns, and possibly other groups of plants.
AbstractThe transcriptome of the venom duct of the Atlantic piscivorous cone species Chelyconus ermineus (Born, 1778) was determined. The venom repertoire of this species includes at least 378 conotoxin precursors, which could be ascribed to 33 known and 22 new (unassigned) protein superfamilies, respectively. The most abundant superfamilies were T, W, O1, M, O2, and Z, accounting for 57% of all detected diversity. A total of three individuals were sequenced, showing considerable intraspecific variation: each individual had many exclusive conotoxin precursors, and only 20% of all inferred mature peptides were common to all individuals. Three different regions (distal, medium, and proximal with respect to the venom bulb) of the venom duct were analyzed independently. Diversity (in terms of number of distinct members) of conotoxin precursor superfamilies increased toward the distal region, whereas transcripts detected toward the proximal region showed higher expression levels. Only the superfamilies A and I3 showed statistically significant differential expression across regions of the venom duct. Sequences belonging to the alpha (motor cabal) and kappa (lightning-strike cabal) subfamilies of the superfamily A were mainly detected in the proximal region of the venom duct. The mature peptides of the alpha subfamily had the α4/4 cysteine spacing pattern, which has been shown to selectively target muscle nicotinic-acetylcholine receptors, ultimately producing paralysis. This function is performed by mature peptides having an α3/5 cysteine spacing pattern in piscivorous cone species from the Indo-Pacific region, thereby supporting a convergent evolution of piscivory in cones.
AbstractThis work presents a systematic approach to study the conservation of genes between fruit flies and mammals. We have listed 971 Drosophila genes involved in female reproduction at the ovarian level and systematically looked for orthologs in Ciona, zebrafish, coelacanth, lizard, chicken, and mouse. Depending on the species, the percentage of these Drosophila genes with at least one ortholog varies between 69% and 78%. In comparison, only 42% of all the Drosophila genes have an ortholog in the mouse genome (P < 0.0001), suggesting a dramatically higher evolutionary conservation of ovarian genes. The 177 Drosophila genes that have no ortholog in mice and other vertebrates correspond to genes that are involved in mechanisms of oogenesis that are specific to the fruit fly or the insects. Among the 759 genes with at least one ortholog in the zebrafish, 73 have an expression enriched in the ovary in this species (RNA-seq data). Among the 760 genes that have at least one ortholog in the mouse, 76 and 11 orthologs are reported to be preferentially and exclusively expressed in the mouse ovary, respectively (based on the UniGene expressed sequence tag database). Several of them are already known to play a key role in murine oogenesis and/or to be enriched in the mouse/zebrafish oocyte, whereas others have remained unreported. We have investigated, by RNA-seq and real-time quantitative PCR, the exclusive ovarian expression of 10 genes in fish and mammals. Overall, we have found several novel candidates potentially involved in mammalian oogenesis by an evolutionary approach and using the fruit fly as an animal model.
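As a quick check on how the counts in this abstract relate to the quoted percentages, the arithmetic can be reproduced in a few lines. The helper name `percent` is hypothetical, and only counts stated in the abstract are used.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL_OVARIAN_GENES = 971  # Drosophila genes listed in the abstract

# Share of the 971 genes with at least one ortholog, per species.
print(percent(759, TOTAL_OVARIAN_GENES))  # zebrafish → 78.2
print(percent(760, TOTAL_OVARIAN_GENES))  # mouse → 78.3

# Share of the 971 genes with no ortholog in mice and other vertebrates.
print(percent(177, TOTAL_OVARIAN_GENES))  # → 18.2

# Share of orthologs with ovary-associated expression.
print(percent(73, 759))  # zebrafish, ovary-enriched → 9.6
print(percent(76, 760))  # mouse, preferentially ovarian → 10.0
print(percent(11, 760))  # mouse, exclusively ovarian → 1.4
```

The zebrafish and mouse values sit at the upper end of the 69% to 78% range reported across species.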