Mol. Biol. Evol. 36(3):604-612. 2019 doi:10.1093/molbev/msz002
Abstract: The mammalian inner ear possesses functional and morphological innovations that contribute to its unique hearing capacities. The genetic bases underlying the evolution of this mammalian landmark are poorly understood. We propose that the emergence of morphological and functional innovations in the mammalian inner ear could have been driven by adaptive molecular evolution. In this work, we performed a meta-analysis of available inner ear gene expression data sets in order to identify genes that show signatures of adaptive evolution in the mammalian lineage. We analyzed ∼1,300 inner ear expressed genes and found that 13% show signatures of positive selection in the mammalian lineage. Several of these genes are known to play an important function in the inner ear. In addition, we found that a significant proportion of genes showing signatures of adaptive evolution in mammals have not been previously reported to participate in inner ear development and/or physiology. We focused our analysis on two of these genes, STRIP2 and ABLIM2, by generating null mutant mice and analyzing their auditory function. We found that mice lacking Strip2 displayed a decrease in neural response amplitudes. In addition, we observed a reduction in the number of afferent synapses, suggesting a potential cochlear neuropathy. Thus, this study shows the usefulness of pursuing a high-throughput evolutionary approach followed by functional studies to track down genes that are important for inner ear function. Moreover, this approach sheds light on the genetic bases underlying the evolution of the mammalian inner ear.
With new genome analysis tools, scientists have made significant advances in our understanding of modern humans’ origins and ancient migrations.
Abstract: A general south-north genetic divergence has been observed among Han Chinese in previous studies. However, these studies, especially those on mitochondrial DNA (mtDNA), are based either on partial mtDNA sequences or on limited samples. Given that Han Chinese comprise the world’s largest population and reside throughout China, whether the north-south divergence can be observed after all regional populations are considered remains unknown. Moreover, factors involved in shaping the genetic landscape of Han Chinese need further investigation. In this study, we dissected the matrilineal landscape of Han Chinese by studying 4,004 mtDNA haplogroup-defining variants in 21,668 Han samples from virtually all provinces in China. Our results confirmed the genetic divergence between southern and northern Han populations. However, we found a significant genetic divergence among populations from the three main river systems, that is, the Yangtze, the Yellow, and the Zhujiang (Pearl) rivers, which is largely attributable to the prevalent distribution of haplogroups D4, B4, and M7 in these river valleys. Further analyses based on 4,986 mitogenomes, including 218 newly generated sequences, indicated that this divergence was already established during the early Holocene and may have resulted from population expansion facilitated by ancient agricultures along these rivers. These results imply that the maternal gene pools of the contemporary Han populations have retained the genetic imprint of early Neolithic farmers from different river basins, or that river valleys represented relative migration barriers that facilitated genetic differentiation, thus highlighting the importance of the three ancient agricultures in shaping the genetic landscape of the Han Chinese.
Abstract: Metazoan miRNAs are significantly enriched in clusters. In a previous study (Wang, et al. 2016), we proposed a “functional co-adaptation” model to explain how clustering helps new miRNAs survive and develop functions during long-term evolution. Recently, Marco re-analyzed our data and came to a different conclusion. The major concern Marco raised is whether the observed number of genes targeted by at least two conserved miRNAs with different seeds from the same miRNA clusters is statistically higher than the number obtained under the assumption of randomness. Marco claimed that our approach of shuffling miRNA–target interactions would lead to spuriously low P values. Marco also argued that our observation that clustered miRNAs have more common targets than expected is mostly attributable to seeds with similar sequences. However, we found that his analyses were conducted using an inappropriate approach and do not refute our model. We also provide new evidence to reaffirm our model.
MicroRNAs are often clustered in the genome and it has been hypothesized that clustered microRNAs share common functions. However, statistical support for this hypothesis is lacking. Recently, Wang et al. (2016) stated that clustered microRNAs evolve to co-ordinately regulate common genes, targeting more common genes than expected by chance (P < 0.001; their fig. 3D). I explored their results in detail to identify potential clusters of interest. When their methodology is reproduced, clustered microRNAs had more common targets than expected by chance (P = 0.0350; Marco 2018), but this was due to the presence of two clustered microRNAs with very similar sequences (and therefore similar targets). After removing this cluster, the result was no longer significant (P = 0.2753). In general, the similarity between microRNAs is what determines the number of common targets, and not whether these microRNAs are clustered (Marco 2018).
Abstract: Identifying genomic elements underlying phenotypic adaptations is an important problem in evolutionary biology. Comparative analyses learning from convergent evolution of traits are gaining momentum in accurately detecting such elements. We previously developed a method for predicting phenotypic associations of genetic elements by contrasting patterns of sequence evolution in species showing a phenotype with those that do not. Using this method, we successfully demonstrated convergent evolutionary rate shifts in genetic elements associated with two phenotypic adaptations, namely the independent subterranean and marine transitions of terrestrial mammalian lineages. Our original method calculates gene-specific rates of evolution on branches of phylogenetic trees using linear regression. These rates represent the extent of sequence divergence on a branch after removing the expected divergence on the branch due to background factors. The rates calculated using this regression analysis exhibit an important statistical limitation, namely heteroscedasticity. We observe that the rates on branches that are longer on average show higher variance, and describe how this problem adversely affects the confidence with which we can make inferences about rate shifts. Using a combination of data transformation and weighted regression, we have developed an updated method that corrects this heteroscedasticity in the rates. We additionally illustrate the improved performance offered by the updated method at robust detection of convergent rate shifts in phylogenetic trees of protein-coding genes across mammals, as well as using simulated tree data sets. Overall, we present an important extension to our evolutionary-rates-based method that performs more robustly and consistently at detecting convergent shifts in evolutionary rates.
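The general idea of combining a data transformation with weighted regression can be sketched as follows. This is only an illustration, not the published method: the square-root transform, the weights (here taken as the inverse of the average branch length), the function name `weighted_relative_rates`, and the branch lengths are all assumptions made for the example.

```python
import math

def weighted_relative_rates(avg_branch, gene_branch, weights):
    """Weighted least-squares fit of sqrt(gene branch) on sqrt(average branch).

    Returns (intercept, slope, weighted_residuals). The residuals play the
    role of relative rates: positive means faster than expected on a branch.
    The square-root transform and the weights are illustrative assumptions.
    """
    x = [math.sqrt(v) for v in avg_branch]
    y = [math.sqrt(v) for v in gene_branch]
    total_w = sum(weights)
    # Weighted means of predictor and response.
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    # Closed-form weighted simple linear regression.
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Scale each residual by sqrt(weight) so that branches with noisier
    # estimates do not dominate downstream comparisons.
    residuals = [math.sqrt(w) * (yi - (intercept + slope * xi))
                 for w, xi, yi in zip(weights, x, y)]
    return intercept, slope, residuals

# Hypothetical branch lengths: the gene evolves at twice the average rate,
# except on the last branch, where it is faster still.
avg = [0.01, 0.04, 0.09, 0.16, 0.25]
gene = [2.0 * a for a in avg]
gene[4] = 0.80
weights = [1.0 / a for a in avg]
intercept, slope, rates = weighted_relative_rates(avg, gene, weights)
print(rates)
```

In practice the weights would be estimated from the observed mean-variance trend across many genes rather than fixed in advance as they are here.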
Abstract: The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66–90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3′- and 5′-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
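To give a feel for what a coverage of 2.4 insertions per site means, the short calculation below assumes, purely for illustration, that insertions land uniformly at random so that the count per site is Poisson distributed. That assumption is not made in the abstract, which in fact uses an HMM precisely because real insertions are biased; the function names are hypothetical.

```python
import math

def implied_sites(n_insertions, coverage):
    """Number of genomic sites implied by a total insertion count and coverage."""
    return n_insertions / coverage

def prob_empty_window(coverage, window=1):
    """Chance that `window` consecutive sites receive no insertion.

    Assumes independent Poisson counts with mean `coverage` per site
    (an idealization; real transposon insertion is not uniform).
    """
    return math.exp(-coverage * window)

print(round(implied_sites(31e6, 2.4)))  # sites implied by the reported figures
print(prob_empty_window(2.4))           # single site empty by chance
print(prob_empty_window(2.4, 10))       # ten consecutive sites empty by chance
```

Under this idealization a single empty site is unremarkable (roughly 9% of sites would be empty by chance), whereas a run of empty sites quickly becomes very unlikely, which is why depletion is assessed over regions rather than individual positions.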
Abstract: Advances in sequencing technology have resulted in the expectation that genomic studies will become more representative of organismal diversity. To test this expectation, we explored species representation of nonhuman eukaryotes in the Sequence Read Archive. Though species richness has been increasing steadily, species evenness is decreasing over time. Moreover, the top 1% most studied organisms increasingly represent a larger proportion of total experiments, demonstrating growing bias in favor of a small minority of species. To better understand molecular processes and patterns, genomic studies should reverse current trends by adopting more comparative approaches.
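The contrast between rising richness and falling evenness can be illustrated with a small calculation. The abstract does not say which evenness index was used; the sketch below assumes Pielou's evenness (Shannon diversity divided by the log of richness), and the experiment counts are invented for the example.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H / ln(S), where S is the number of species observed."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon_diversity(counts) / math.log(richness)

# Hypothetical numbers of experiments per species in two time periods.
early = [10, 10, 10, 10]       # 4 species, studied equally
later = [970, 10, 10, 5, 5]    # 5 species, one of them dominant

print(len(early), pielou_evenness(early))
print(len(later), pielou_evenness(later))
```

In this toy case richness goes up from four to five species while evenness drops, the same qualitative pattern the abstract reports for the archive as a whole.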
Abstract: Cetaceans are a clade of highly specialized aquatic mammals that include the largest animals that have ever lived. The largest whales can have ∼1,000× more cells than a human, with long lifespans, leaving them theoretically susceptible to cancer. However, large-bodied and long-lived animals do not suffer higher risks of cancer mortality than humans, an observation known as Peto’s Paradox. To investigate the genomic bases of gigantism and other cetacean adaptations, we generated a de novo genome assembly for the humpback whale (Megaptera novaeangliae) and incorporated the genomes of ten cetacean species in a comparative analysis. We found further evidence that rorquals (family Balaenopteridae) radiated during the Miocene or earlier, and inferred that perturbations in abundance and/or the interocean connectivity of North Atlantic humpback whale populations likely occurred throughout the Pleistocene. Our comparative genomic results suggest that the evolution of cetacean gigantism was accompanied by strong selection on pathways that are directly linked to cancer. Large segmental duplications in whale genomes contained genes controlling the apoptotic pathway, and genes inferred to be under accelerated evolution and positive selection in cetaceans were enriched for biological processes such as cell cycle checkpoint, cell signaling, and proliferation. We also inferred positive selection on genes controlling the mammalian appendicular and cranial skeletal elements in the cetacean lineage, which are relevant to extensive anatomical changes during cetacean evolution. Genomic analyses shed light on the molecular mechanisms underlying cetacean traits, including gigantism, and will contribute to the development of future targets for human cancer therapies.
Abstract: Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).
Abstract: Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death, or behavior change). Birth–death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here, we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth–death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.
Abstract: Multidrug-resistant clinical isolates are common in certain pathogens, but rare in others. This pattern may be due to the fact that mutations shaping resistance have species-specific effects. To investigate this issue, we transferred a range of resistance-conferring mutations and a full resistance gene into Escherichia coli and closely related bacteria. We found that resistance mutations in one bacterial species frequently provide no resistance, and in fact can even yield drug hypersensitivity, in close relatives. In-depth analysis of a key gene involved in aminoglycoside resistance (trkH) indicated that preexisting mutations in other genes (intergenic epistasis) underlie such extreme differences in mutational effects between species. Finally, reconstruction of adaptive landscapes under multiple antibiotic stresses revealed that mutations frequently provide multidrug resistance or elevated drug susceptibility (i.e., collateral sensitivity) only with certain combinations of other resistance mutations. We conclude that resistance and collateral sensitivity are contingent upon the genetic makeup of the bacterial population, and such contingency could shape the long-term fate of resistant bacteria. These results underline the importance of species-specific treatment strategies.
Abstract: Understanding the molecular basis of hybrid incompatibilities is a fundamental pursuit in evolutionary genetics. In crosses between Drosophila melanogaster females and Drosophila simulans males, an interaction between at least three genes is necessary for hybrid male lethality: Hmr mel, Lhr sim, and gfzf sim. Although HMR and LHR physically bind each other and function together in a single complex, the connection between gfzf and either of these proteins remains mysterious. Here, we show that GFZF localizes to many regions of the genome in both D. melanogaster and D. simulans, including at telomeric retrotransposon repeats. We find that GFZF localization at telomeres is significantly different between these two species, reflecting the rapid evolution of telomeric retrotransposon copy number composition between the two species. Next, we show that GFZF and HMR normally do not colocalize in D. melanogaster. In interspecies hybrids, however, HMR shows extensive mis-localization to GFZF sites, thus uncovering a new molecular interaction between these hybrid incompatibility factors. We find that spreading of HMR to GFZF sites requires gfzf sim but not Lhr sim, suggesting distinct roles for these factors in the hybrid incompatibility. Finally, we find that overexpression of HMR and LHR within species is sufficient to mis-localize HMR to GFZF binding sites, indicating that HMR has a natural low affinity for GFZF sites. Together, these studies provide the first insights into the different properties of gfzf between D. melanogaster and D. simulans, and uncover a molecular interaction between gfzf and Hmr in the form of altered protein localization.
Abstract: Aneuploidy is common both in tumor cells responding to chemotherapeutic agents and in fungal cells adapting to antifungal drugs. Because aneuploidy simultaneously affects many genes, it has the potential to confer multiple phenotypes to the same cells. Here, we analyzed the mechanisms by which Candida albicans, the most prevalent human fungal pathogen, acquires the ability to survive both chemotherapeutic agents and antifungal drugs. Strikingly, adaptation to both types of drugs was accompanied by the acquisition of specific whole-chromosome aneuploidies, with some aneuploid karyotypes recovered independently and repeatedly from very different drug conditions. Specifically, strains selected for survival in hydroxyurea, an anticancer drug, acquired cross-adaptation to caspofungin, a first-line antifungal drug, and both acquired traits were attributable to trisomy of the same chromosome: loss of trisomy was accompanied by loss of adaptation to both drugs. Mechanistically, aneuploidy simultaneously altered the copy number of most genes on chromosome 2, yet survival in hydroxyurea or caspofungin required different genes and stress response pathways. Similarly, chromosome 5 monosomy conferred increased tolerance to both fluconazole and caspofungin, antifungals with different mechanisms of action. Thus, the potential for cross-adaptation is not a feature of aneuploidy per se; rather, it is dependent on specific genes harbored on given aneuploid chromosomes. Furthermore, pre-exposure to hydroxyurea increased the frequency of appearance of caspofungin survivors, and hydroxyurea-adapted C. albicans cells were refractory to antifungal drug treatment in a mouse model of systemic candidiasis. This highlights the potential clinical consequences for the management of cancer chemotherapy patients at risk of fungal infections.
Abstract: Transposable elements (TEs) make up a significant portion of eukaryotic genomes and are important drivers of genome evolution. However, the extent to which TEs affect gene expression variation on a genome-wide scale in comparison with other types of variants is still unclear. We characterized TE insertion polymorphisms and their association with gene expression in 124 whole-genome sequences from a single population of Capsella grandiflora, and contrasted this with the effects of single nucleotide polymorphisms (SNPs). Population frequency of insertions was negatively correlated with distance to genes, as well as density of conserved noncoding elements, suggesting that the negative effects of TEs on gene regulation are important in limiting their abundance. Rare TE variants strongly influence gene expression variation, predominantly through downregulation. In contrast, rare SNPs contribute equally to up- and down-regulation, but have a weaker individual effect than TEs. An expression quantitative trait loci (eQTL) analysis shows that a greater proportion of common TEs are eQTLs as opposed to common SNPs, and a third of the genes with TE eQTLs do not have SNP eQTLs. In contrast with rare TE insertions, common insertions are more likely to increase expression, consistent with recent models of cis-regulatory evolution favoring enhancer alleles. Taken together, these results imply that TEs are a significant contributor to gene expression variation and are individually more likely than rare SNPs to cause extreme changes in gene expression.
Abstract: Speciation through homoploid hybridization (HHS) is considered extremely rare in animals. This is mainly because the establishment of reproductive isolation as a product of hybridization is uncommon. Additionally, many traits are underpinned by polygeny and/or incomplete dominance, where the hybrid phenotype is an additive blend of parental characteristics. Phenotypically intermediate hybrids are usually at a fitness disadvantage compared with parental species and tend to vanish through backcrossing with parental population(s). It is therefore unknown whether the additive nature of hybrid traits in itself could lead successfully to HHS. Using a multi-marker genetic data set and a meta-analysis of diet and morphology, we investigated a potential case of HHS in the prions (Pachyptila spp.), seabirds distinguished by their bills, prey choice, and timing of breeding. Using approximate Bayesian computation, we show that the medium-billed Salvin’s prion (Pachyptila salvini) could be a hybrid between the narrow-billed Antarctic prion (Pachyptila desolata) and broad-billed prion (Pachyptila vittata). Remarkably, P. salvini’s intermediate bill width has given it a feeding advantage with respect to the other Pachyptila species, allowing it to consume a broader range of prey, potentially increasing its fitness. Available metadata showed that P. salvini is also intermediate in breeding phenology and, with no overlap in breeding times, it is effectively reproductively isolated from either parental species through allochrony. These results provide evidence for a case of HHS in nature, and show for the first time that additivity of divergent parental traits alone can lead directly to increased hybrid fitness and reproductive isolation.
Abstract: The Cape bee (Apis mellifera capensis) is a subspecies of the honeybee, in which workers commonly lay diploid unfertilized eggs via a process known as thelytoky. A recent study aimed to map the genetic basis of this trait in the progeny of a single capensis queen where workers laid either diploid (thelytokous) or haploid (arrhenotokous) eggs. A nonsynonymous single nucleotide polymorphism (SNP) in a gene of unknown function was reported to be strongly associated with thelytoky in this colony. Here, we analyze genome sequences from a global sample of A. mellifera and identify populations where the proposed thelytoky allele at this SNP is common but thelytoky is absent. We also analyze genome sequences of three capensis queens produced by thelytoky and find that, contrary to predictions, they do not carry the proposed thelytoky allele. The proposed SNP is therefore neither sufficient nor required to produce thelytoky in A. mellifera.
Abstract: The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein-truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation–selection balance, as has been proposed earlier (Cassa et al. 2017). Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC data set (Lek et al. 2016). Additionally, we compared the African and European subset of the ExAC study and analyzed de novo PTVs. We show that the mutation–selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.
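The deterministic mutation–selection balance mentioned here can be made concrete with a small sketch. It assumes the textbook approximation in which rare deleterious alleles are removed at rate s_het per generation and replenished at a per-gene mutation rate U, giving an equilibrium frequency of about U / s_het. The recursion, function names, and parameter values below are illustrative assumptions and ignore drift and demography entirely.

```python
def equilibrium_frequency(u, s_het):
    """Deterministic equilibrium frequency under mutation-selection balance."""
    return u / s_het

def iterate_frequency(u, s_het, generations, q0=0.0):
    """Iterate q' = q * (1 - s_het) + u, a rare-allele approximation.

    u      -- per-gene mutation rate to protein-truncating alleles (assumed)
    s_het  -- selection against heterozygous carriers (assumed)
    """
    q = q0
    for _ in range(generations):
        q = q * (1.0 - s_het) + u
    return q

U = 1e-6  # illustrative value, not taken from the study

# Strong selection: the recursion reaches U / s_het within a few hundred generations.
print(iterate_frequency(U, 0.1, 500), equilibrium_frequency(U, 0.1))

# Weak selection: after the same number of generations the frequency is still
# well below its deterministic equilibrium.
print(iterate_frequency(U, 0.001, 500), equilibrium_frequency(U, 0.001))
```

The second case only shows that the deterministic equilibrium is approached slowly when selection is weak; it does not model the drift and population history that the study itself examines.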
Abstract: One of the major challenges in evolutionary biology is the identification of the genetic basis of postzygotic reproductive isolation. Given its pivotal role in this process, here we explore the drivers that may account for the evolutionary dynamics of the PRDM9 gene between continental and island systems of chromosomal variation in house mice. Using a data set of nearly 400 wild-caught mice of Robertsonian systems, we identify the extent of PRDM9 diversity in natural house mouse populations, determine the phylogeography of PRDM9 at a local and global scale based on a new measure of pairwise genetic divergence, and analyze selective constraints. We find 57 newly described PRDM9 variants, this diversity being especially high on Madeira Island, a result that is contrary to the expectations of reduced variation for island populations. Our analyses suggest that the PRDM9 allelic variability observed in Madeira mice might be influenced by the presence of distinct chromosomal fusions resulting from a complex pattern of introgression or multiple colonization events onto the island. Importantly, we detect a significant reduction in the proportion of PRDM9 heterozygotes in Robertsonian mice, which showed a high degree of similarity in the amino acids responsible for protein–DNA binding. Our results suggest that despite the rapid evolution of PRDM9 and the variability detected in natural populations, functional constraints could facilitate the accumulation of allelic combinations that maintain recombination hotspot symmetry. We anticipate that our study will provide the basis for examining the role of different PRDM9 genetic backgrounds in reproductive isolation in natural populations.
Abstract: Many factors complicate the estimation of time scales for phylogenetic histories, requiring increasingly complex evolutionary models and inference procedures. The widespread application of molecular clock dating has led to the insight that evolutionary rate estimates may vary with the time frame of measurement. This is particularly well established for rapidly evolving viruses that can accumulate sequence divergence over years or even months. However, this rapid evolution stands at odds with a relatively high degree of conservation of viruses or endogenous virus elements over much longer time scales. Building on recent insights into time-dependent evolutionary rates, we develop a formal and flexible Bayesian statistical inference approach that accommodates rate variation through time. We evaluate the novel molecular clock model on a foamy virus cospeciation history and a lentivirus evolutionary history and compare the performance to other molecular clock models. For both virus examples, we estimate a similarly strong time-dependent effect that implies rates varying over four orders of magnitude. The application of an analogous codon substitution model does not implicate long-term purifying selection as the cause of this effect. However, selection does appear to affect divergence time estimates for the less deep evolutionary history of the Ebolavirus genus. Finally, we explore the application of our approach on woolly mammoth ancient DNA data, which shows a much weaker, but still important, time-dependent rate effect that has a noticeable impact on node age estimates. Future developments aimed at incorporating more complex evolutionary processes will further add to the broad applicability of our approach.
Abstract: A recent analysis of evolutionary rates in >500 globular soluble enzymes revealed pervasive conservation gradients toward catalytic residues. By looking at amino acid preference profiles rather than evolutionary rates in the same data set, we quantified the effects of active sites on site-specific constraints for physicochemical traits. We found that conservation gradients respond to constraints for polarity, hydrophobicity, flexibility, rigidity and structure in ways consistent with fold polarity principles; while sites far from active sites seem to experience no physicochemical constraint, rather being highly variable and favoring amino acids of low metabolic cost. Globally, our results highlight that amino acid variation contains finer information about protein structure than usually regarded in evolutionary models, and that this information is retrievable automatically with simple fits. We propose that analyses of the kind presented here incorporated into models of protein evolution should allow for better description of the physical chemistry that underlies molecular evolution.
Abstract: The structure of ligand-binding sites has been shown to profoundly influence the evolution of function in homomeric protein complexes. Complexes with multichain binding sites (MBSs) have more conserved quaternary structure, more similar binding sites and ligands between homologs, and evolve new functions more slowly than homomers with single-chain binding sites (SBSs). Here, using in silico analyses of protein dynamics, we investigate whether ligand-binding-site structure shapes allosteric signal transduction pathways, and whether the structural similarity of binding sites influences the evolution of allostery. Our analyses show that: 1) allostery is more frequent among MBS complexes than in SBS complexes, particularly in homomers; 2) in MBS homomers, semirigid communities and critical residues frequently connect interfaces and thus they are characterized by signal transduction pathways that cross protein–protein interfaces, whereas SBS homomers usually are not; 3) ligand binding alters community structure differently in MBS and SBS homomers; and 4) except for MBS homomers, allosteric proteins are more likely to have homologs with similar binding sites than nonallosteric proteins, suggesting that binding site similarity is an important factor driving the evolution of allostery.
Abstract: Genetic variation in contemporary South Asian populations follows a northwest to southeast decreasing cline of shared West Eurasian ancestry. A growing body of ancient DNA evidence is being used to build increasingly realistic models of demographic changes in the last few thousand years. Through high-quality modern genomes, these models can be tested for gene and genome level deviations. Using local ancestry deconvolution and masking, we reconstructed population-specific surrogates of the two main ancestral components for more than 500 samples from 25 South Asian populations and showed our approach to be robust via coalescent simulations. Our f3 and f4 statistics–based estimates reveal that the reconstructed haplotypes are good proxies for the source populations that admixed in the area and point to complex interpopulation relationships within the West Eurasian component, compatible with multiple waves of arrival, as opposed to a simpler one wave scenario. Our approach also provides reliable local haplotypes for future downstream analyses. As one such example, the local ancestry deconvolution in South Asians reveals opposite selective pressures on two pigmentation genes (SLC45A2 and SLC24A5) that are common or fixed in West Eurasians, suggesting post-admixture purifying and positive selection signals, respectively.
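For readers unfamiliar with the f3 and f4 statistics mentioned here, the sketch below computes their simplest form from per-site allele frequencies: f3(C; A, B) is the average of (c - a)(c - b), and f4(A, B; C, D) is the average of (a - b)(c - d). It omits the sample-size bias correction and the block-jackknife standard errors that dedicated software applies, and the allele frequencies are invented for the example.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from allele frequencies per site."""
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

def f4(pop_a, pop_b, pop_c, pop_d):
    """Naive f4(pop_a, pop_b; pop_c, pop_d) from allele frequencies per site."""
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at three sites.
source_a = [0.1, 0.9, 0.2]
source_b = [0.9, 0.1, 0.8]
admixed = [0.5, 0.5, 0.5]   # sits between the two sources at every site

print(f3(admixed, source_a, source_b))            # negative: consistent with admixture
print(f4(source_a, source_b, admixed, admixed))   # zero: no differential sharing
```

A clearly negative f3 is the classic signature that the target population is a mixture of groups related to the two sources; in real analyses its significance is judged against a resampling-based standard error rather than by sign alone.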
AbstractThe maize stalk borer, Busseola fusca, is an important Lepidopteran pest of cereal crops in Central, East, and Southern Africa. Crop losses due to B. fusca feeding activity vary by region, but can result in total crop loss in areas with high levels of infestation. Genomic resources provide critical insight into the biology of pest species and can allow for the development of effective management tools and strategies to mitigate their impact on agriculture. To this end, we sequenced, assembled, and annotated the genome of B. fusca. The total assembled genome size was 492.9 Mb with 19,417 annotated protein-coding genes. Using a comparative approach, we identified a putative expansion in the Chorion gene family, which is involved in the formation of the egg shell structure. Our analysis revealed high repeat content within the B. fusca genome, with LTR sequences comprising the majority of the repetitive sequence. We hope genomic resources will provide a foundation for future work aimed at developing an integrated pest management strategy to reduce B. fusca’s impact on food security.
AbstractTerpenes are organic compounds that play important roles in plant growth and development as well as in mediating interactions of plants with the environment. Terpene synthases (TPSs) are the key enzymes responsible for the biosynthesis of terpenes. Although the TPS family has been identified and characterized genome-wide in some species, limited information is available regarding the evolution, expansion, and retention mechanisms occurring in this gene family. We performed a genome-wide identification of the TPS family members in 50 sequenced genomes. We also characterized the TPS family from aromatic spearmint and basil plants using RNA-Seq data. No TPSs were identified in algal genomes, but the remaining plant species encoded various numbers of family members, ranging from 2 to 79 full-length TPSs. Some species showed lineage-specific expansion of certain subfamilies, which might have contributed toward species or ecotype divergence or environmental adaptation. A large-scale family expansion was observed mainly in dicot and monocot plants, which was accompanied by frequent domain loss. Both tandem and segmental duplication significantly contributed toward family expansion and expression divergence and played important roles in the survival of these expanded genes. Our data provide new insight into the TPS family expansion and evolution and suggest that TPSs might have originated from isoprenyl diphosphate synthase genes.
AbstractVision is underpinned by phototransduction, a signaling cascade that converts light energy into an electrical signal. Among insects, phototransduction is best understood in Drosophila melanogaster. Comparison of D. melanogaster against three insect species found several phototransduction gene gains and losses; however, lepidopterans were not examined. Diurnal butterflies and nocturnal moths occupy different light environments and have distinct eye morphologies, which might impact the expression of their phototransduction genes. Here we investigated: 1) how phototransduction genes vary in gene gain or loss between D. melanogaster and Lepidoptera, and 2) variations in phototransduction genes between moths and butterflies. To test our prediction of phototransduction differences due to distinct visual ecologies, we used insect reference genomes, phylogenetics, and moth and butterfly head RNA-Seq and transcriptome data. As expected, most phototransduction genes were conserved between D. melanogaster and Lepidoptera, with some exceptions. Notably, we found two lepidopteran opsins lacking a D. melanogaster ortholog. Using antibodies, we found that one of these opsins, a candidate retinochrome, which we refer to as unclassified opsin (UnRh), is expressed in the crystalline cone cells and the pigment cells of the butterfly Heliconius melpomene. Our results also show that butterflies express similar amounts of trp and trpl channel mRNAs, whereas moths express ∼50× less trp, a potential adaptation to darkness. Our findings suggest that while many single-copy D. melanogaster phototransduction genes are conserved in lepidopterans, phototransduction gene expression differences exist between moths and butterflies that may be linked to their visual light environment.
AbstractThe crested ibis (Nipponia nippon) is endangered worldwide. Although a series of conservation measures have markedly increased the population size and distribution area of these birds, the high mortality of embryos and nestlings considerably decreases the survival potential of this bird species. High-throughput sequencing technology was utilized to compare whole genomes between ten samples from dead crested ibises (including six dead embryos and four dead nestlings aged 0–45 days) and 32 samples from living birds. The results indicated that the dead samples all shared the genetic background of a specific ancestral subpopulation. Furthermore, the dead individuals were less genetically diverse and showed higher degrees of inbreeding than the living birds. Several candidate genes (KLHL3, SETDB2, TNNT2, PKP1, AK1, and EXOSC3) associated with detrimental diseases were identified in the genomic regions that differed between the living and dead samples, and these genes are likely responsible for the death of embryos and nestlings. In addition, in these regions, we found several genes involved in the protein catabolic process (UBE4A and LONP1), lipid metabolism (ACOT1), glycan biosynthesis and metabolism (HYAL1 and HYAL4), and the immune system (JAM2) that are likely to promote the normal development of embryos and nestlings. The aberrant conditions of these genes and biological processes may contribute to the death of embryos and nestlings. Our data identify congenital factors underlying the death of embryos and nestlings at the whole genome level, which may be useful for informing more effective conservation efforts for this bird species.
AbstractThe piranha enjoys notoriety due to its infamous predatory behavior but much is still not understood about its evolutionary origins and the underlying molecular mechanisms for its unusual feeding biology. We sequenced and assembled the red-bellied piranha (Pygocentrus nattereri) genome to aid future phenotypic and genetic investigations. The assembled draft genome is similar to other related fishes in repeat composition and gene count. Our evaluation of genes under positive selection suggests candidates for adaptations of piranhas’ feeding behavior in neural functions, behavior, and regulation of energy metabolism. In the fasted brain, we find genes differentially expressed that are involved in lipid metabolism and appetite regulation as well as genes that may control the aggression/boldness behavior of hungry piranhas. Our first analysis of the piranha genome offers new insight and resources for the study of piranha biology and for feeding motivation and starvation in other organisms.
AbstractThe function and evolution of eukaryotic cells depend upon direct molecular interactions between gene products encoded in nuclear and cytoplasmic genomes. Understanding how these cytonuclear interactions drive molecular evolution and generate genetic incompatibilities between isolated populations and species is of central importance to eukaryotic biology. Plants are an outstanding system to investigate such effects because of their two different genomic compartments present in the cytoplasm (mitochondria and plastids) and the extensive resources detailing subcellular targeting of nuclear-encoded proteins. However, the field lacks a consistent classification scheme for mitochondrial- and plastid-targeted proteins based on their molecular interactions with cytoplasmic genomes and gene products, which hinders efforts to standardize and compare results across studies. Here, we take advantage of detailed knowledge about the model angiosperm Arabidopsis thaliana to provide a curated database of plant cytonuclear interactions at the molecular level. CyMIRA (Cytonuclear Molecular Interactions Reference for Arabidopsis) is available at http://cymira.colostate.edu/ and https://github.com/dbsloan/cymira and will serve as a resource to aid researchers in partitioning evolutionary genomic data into functional gene classes based on organelle targeting and direct molecular interaction with cytoplasmic genomes and gene products. It includes 11 categories (and 27 subcategories) of different cytonuclear complexes and types of molecular interactions, and it reports residue-level information for cytonuclear contact sites. We hope that this framework will make it easier to standardize, interpret, and compare studies testing the functional and evolutionary consequences of cytonuclear interactions.
AbstractAs an economically important fish in the southeastern Himalayas, the giant devil catfish (Bagarius yarrelli) is known for its extraordinarily large body size. It can grow up to 2 m, whereas the non-Bagarius sisorids only reach 10–30 cm. Another outstanding characteristic of Bagarius species is the salmonid-like reddish flesh color. Both body size and flesh color are interesting scientific questions and also valuable features in aquaculture that are worthy of in-depth investigation. Bagarius species are therefore ideal materials for studying body size evolution and color deposition in fish muscles, and also potential organisms for extensive utilization in Asian freshwater aquaculture. Using a combination of Illumina and PacBio sequencing technologies, we de novo assembled a 571-Mb genome for the giant devil catfish from a total of 153.4 Gb of clean reads. The scaffold and contig N50 values are 3.1 and 1.6 Mb, respectively. This genome assembly was evaluated as having 93.4% Benchmarking Universal Single-Copy Orthologs completeness and 98% transcript coverage, and as being highly homologous with a chromosome-level genome of channel catfish (Ictalurus punctatus). We detected that 35.26% of the genome assembly is composed of repetitive elements. Employing homology, de novo, and transcriptome-based annotations, we annotated a total of 19,027 protein-coding genes for further use. In summary, we generated the first high-quality genome assembly of the giant devil catfish, which provides an important genomic resource for future studies of its body size and flesh color, and also for facilitating the conservation and utilization of this valuable catfish.
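The scaffold and contig N50 values quoted in this abstract follow a standard definition: the length of the sequence at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation is given below; the function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least half
    of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 290, so the half-way point is 145.
toy_contigs = [100, 80, 50, 30, 20, 10]
print(n50(toy_contigs))  # 100 + 80 = 180 >= 145, so this prints 80
```

A larger N50 indicates that more of the assembly sits in long sequences, which is why it is commonly reported alongside completeness measures.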
AbstractThe relationships of crustaceans and hexapods (Pancrustacea) have been much discussed and partially elucidated following the emergence of phylogenomic data sets. However, major uncertainties still remain regarding the position of iconic taxa such as Branchiopoda, Copepoda, Remipedia, and Cephalocarida, and the sister group relationship of hexapods. We assembled the most taxon-rich phylogenomic pancrustacean data set to date and analyzed it using a variety of methodological approaches. We prioritized low levels of missing data and found that some clades were consistently recovered independently of the analytical approach used. These include, for example, Oligostraca and Altocrustacea. Substantial support was also found for Allotriocarida, with Remipedia as the sister of Hexapoda (i.e., Labiocarida), and Branchiopoda as the sister of Labiocarida, a clade that we name Athalassocarida (= “nonmarine shrimps”). Within Allotriocarida, Cephalocarida was found as the sister of Athalassocarida. Finally, moderate support was found for Hexanauplia (Copepoda as sister to Thecostraca) in alliance with Malacostraca. Mapping key crustacean tagmosis patterns and developmental characters across the revised phylogeny suggests that the ancestral pancrustacean was relatively short-bodied, with extreme body elongation and anamorphic development emerging later in pancrustacean evolution.
AbstractSymbiosis with bacteria is common across insects, resulting in adaptive host phenotypes. The recently described bacterial symbionts Lactobacillus micheneri, Lactobacillus timberlakei, and Lactobacillus quenuiae are found in wild bee pollen provisions, bee guts, and flowers but have small genomes in comparison to other lactobacilli. We sequenced, assembled, and analyzed 27 new L. micheneri clade genomes to identify their possible ecological functions in flower and bee hosts. We determined possible key functions for the L. micheneri clade by identifying genes under positive or balancing selection, genes gained or lost, and population structure. A host adherence factor shows signatures of positive selection, whereas other orthologous copies are variable within the L. micheneri clade. The host adherence factors serve as strong evidence that these lactobacilli are adapted to animal hosts, as their targets are found in the digestive tract of insects. Next, the L. micheneri clade is adapted toward a nutrient-rich environment, corroborating observations of where L. micheneri is most abundant. Additionally, genes involved in osmotolerance, pH tolerance, temperature resistance, detoxification, and oxidative stress response show signatures of selection that allow these bacteria to thrive in pollen and nectar masses in bee nests and in the bee gut. Altogether, these findings suggest that the L. micheneri clade not only is primarily adapted to the wild bee gut but also exhibits genomic features that would be beneficial to survival in flowers.
AbstractDifferences in gene regulation have been suggested to play essential roles in the evolution of phenotypic changes. Although DNA changes in cis-regulatory elements affect only the regulation of their corresponding gene, variations in gene regulatory factors (trans) can have a broader effect, because the expression of many target genes might be affected. Aiming to better understand how natural selection may have shaped the diversity of gene regulatory factors in humans, we assembled a catalog of all proteins involved in controlling gene expression. We found that at least five DNA-binding transcription factor classes are enriched among genes located in candidate regions for selection, suggesting that they might be relevant for understanding regulatory mechanisms involved in human local adaptation. The class of KRAB-ZNFs, zinc-finger (ZNF) genes with a Krüppel-associated box, stands out by, first, having the most genes located in candidate regions for positive selection; second, displaying the most nonsynonymous single nucleotide polymorphisms (SNPs) with high genetic differentiation between populations within these regions; and third, having 27 KRAB-ZNF gene clusters with high extended haplotype homozygosity. Our further characterization of nonsynonymous SNPs in ZNF genes located within candidate regions for selection suggests regulatory modifications that might influence the expression of target genes at the population level. Our detailed investigation of three candidate regions revealed possible explanations for how SNPs may influence the prevalence of schizophrenia, eye development, and fertility in humans, among other phenotypes. The genetic variation we characterized here may be responsible for subtle to rough regulatory changes that could be important for understanding human adaptation.
AbstractTransposable elements (TEs) play major roles in the evolution of genome structure and function. However, because of their repetitive nature, they are difficult to annotate and discovering the specific roles they may play in a lineage can be a daunting task. Heliconiine butterflies are models for the study of multiple evolutionary processes including phenotype evolution and hybridization. We attempted to determine how TEs may play a role in the diversification of genomes within this clade by performing a detailed examination of TE content and accumulation in 19 species whose genomes were recently sequenced. We found that TE content has diverged substantially and rapidly in the time since several subclades shared a common ancestor with each lineage harboring a unique TE repertoire. Several novel SINE lineages have been established that are restricted to a subset of species. Furthermore, the previously described SINE, Metulj, appears to have gone extinct in two subclades while expanding to significant numbers in others. This diversity in TE content and activity has the potential to impact how heliconiine butterflies continue to evolve and diverge.
AbstractThe interface between populations and evolving young species continues to generate much contemporary debate in systematics. The debate depends on the species concept(s) applied but ultimately reduces to the fundamental question of “when do nondiscrete entities become distinct, mutually exclusive evolutionary units?” Species are perceived as critical biological entities, and the discovery and naming of new species is perceived by many authors as a major research aim for assessing current biodiversity before much of it becomes extinct. However, less attention is given to determining whether these names represent valid biological entities because this is perceived as both a laborious chore and an undesirable research outcome. The charismatic spurge hawkmoths (Hyles euphorbiae complex, HEC) offer an opportunity to study this less fashionable aspect of systematics. To address this systematic challenge, we analyzed over 10,000 ddRAD single nucleotide polymorphisms from 62 individuals using coalescent-based and population genomic methodology. These genome-wide data reveal a clear overestimation of (sub)species-level diversity and demonstrate that the HEC taxonomy has been seriously oversplit. We conclude that only one valid species name should be retained for the entire HEC, namely Hyles euphorbiae, and we do not recognize any formal subspecies or other taxonomic subdivisions within it. Although the adoption of genetic tools has frequently revealed morphologically cryptic diversity, the converse, taxonomic oversplitting of species, is generally (and wrongly in our opinion) accepted as rare. Furthermore, taxonomic oversplitting is most likely to have taken place in intensively studied popular and charismatic organisms such as the HEC.