Among the social insects, bees have developed a strong and rich social network in which busy worker bees tend to the queen, who, in turn, controls reproduction for the benefit of the hive.
Cancer first develops when a single cell goes rogue, with mutations that trigger aggressive growth at any cost to the health of the organism. But if cancer cells were accumulating harmful mutations faster than they could be purged, wouldn’t the population eventually die out?
Abstract: Whole-genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analyzing WGS data is usually to define “transmission clusters,” sets of cases that are potentially linked by direct transmission. This is often done by including two cases in the same cluster if they are separated by fewer single-nucleotide polymorphisms (SNPs) than a specified threshold. However, there is little agreement as to what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterize this by combining the number of SNP differences and the length of time over which those differences have accumulated, using information about case timing, molecular clock, and transmission processes. Our framework has the advantage of allowing for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: with British Columbia data by using spatial divisions; with Republic of Moldova data by incorporating antibiotic resistance. Simulation results indicate that our transmission-based method is better at identifying direct transmissions than a SNP threshold, with an average dissimilarity between clusterings of 0.27 bits compared with 0.37 bits for the SNP-threshold method and 0.84 bits for randomly permuted data. These results show that it is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.
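The SNP-threshold rule that the abstract above takes as its baseline amounts to single-linkage clustering on pairwise SNP distances. The following is a minimal Python sketch of that baseline only, not of the transcluster method, which additionally uses case timing and clock information. The function name, the case labels, the distances, and the threshold of 5 are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def snp_threshold_clusters(cases, snp_distance, threshold):
    """Link any two cases separated by fewer than `threshold` SNPs and
    return the resulting connected components (single-linkage clusters)."""
    parent = {c: c for c in cases}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(cases, 2):
        if snp_distance[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for c in cases:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise SNP distances between four cases.
example_cases = ["A", "B", "C", "D"]
example_distances = {
    frozenset(("A", "B")): 2,
    frozenset(("B", "C")): 4,
    frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 20,
    frozenset(("B", "D")): 22,
    frozenset(("C", "D")): 25,
}
example_clusters = snp_threshold_clusters(example_cases, example_distances, 5)
```

In this toy data, A and C end up in the same cluster even though they differ by 6 SNPs, because each is within the threshold of B. This chaining behavior is one reason the choice of threshold matters so much for the baseline rule.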
Molecular Biology and Evolution, msy184, https://doi.org/10.1093/molbev/msy184
Abstract: The Arabian Peninsula (AP) was an important crossroads between Africa, Asia, and Europe, being the cradle of the structure defining these main human population groups, and a continuing path for their admixture. The screening of 741,000 variants in 420 Arabians and 80 Iranians allowed us to quantify the dominant sub-Saharan African admixture in the west of the peninsula, whereas South Asian and Levantine/European influence was stronger in the east, leading to a rift between western and eastern sides of the Peninsula. Dating of the admixture events indicated that Indian Ocean slave trade and Islamization periods were important moments in the genetic makeup of the region. The western–eastern axis was also observable in terms of positive selection of diversity conferring lactose tolerance, with the West AP developing local adaptation and the East AP acquiring the derived allele selected in European populations and existing in South Asia. African selected malaria resistance through the DARC gene was enriched in all Arabian genomes, especially in the western part. Clear European influences associated with skin and eye color were equally frequent across the Peninsula.
Abstract: With the desire to model population genetic processes under increasingly realistic scenarios, forward genetic simulations have become a critical part of the toolbox of modern evolutionary biology. The SLiM forward genetic simulation framework is one of the most powerful and widely used tools in this area. However, its foundation in the Wright–Fisher model has been found to pose an obstacle to implementing many types of models; it is difficult to adapt the Wright–Fisher model, with its many assumptions, to modeling ecologically realistic scenarios such as explicit space, overlapping generations, individual variation in reproduction, density-dependent population regulation, individual variation in dispersal or migration, local extinction and recolonization, mating between subpopulations, age structure, fitness-based survival and hard selection, emergent sex ratios, and so forth. In response to this need, we here introduce SLiM 3, which contains two key advancements aimed at abolishing these limitations. First, the new non-Wright–Fisher or “nonWF” model type provides a much more flexible foundation that allows the easy implementation of all of the above scenarios and many more. Second, SLiM 3 adds support for continuous space, including spatial interactions and spatial maps of environmental variables. We provide a conceptual overview of these new features, and present several example models to illustrate their use.
Abstract: Large genomes with elevated mutation rates are prone to accumulating deleterious mutations more rapidly than natural selection can purge them (Muller’s ratchet). As a consequence, small populations may go extinct. Relative to most unicellular organisms, cancer cells, with a large, nonrecombining genome and a high mutation rate, could be particularly susceptible to such “mutational meltdown.” However, the most common type of mutation in organismal evolution, namely, deleterious mutation, has received relatively little attention in the cancer biology literature. Here, by monitoring single-cell clones from HeLa cell lines, we characterize deleterious mutations that retard the rate of cell proliferation. The main mutation events are copy number variations (CNVs), which, estimated from fitness data, happen at a rate of 0.29 events per cell division on average. The mean fitness reduction, estimated to reach 18% per mutation, is very high. HeLa cell populations therefore have a very substantial genetic load and, at this level, a natural population would likely face mutational meltdown. We suspect that HeLa cell populations may avoid extinction only after the population size becomes large enough. Because CNVs are common in most cell lines and tumor tissues, the observations hint at cancer cells’ vulnerability, which could be exploited by therapeutic strategies.
Abstract: Organismal adaptations to new environments often begin with plastic phenotypic changes followed by genetic phenotypic changes, but the relationship between the two types of changes is controversial. Contrary to the view that plastic changes serve as steppingstones to genetic adaptations, recent transcriptome studies reported that genetic gene expression changes more often reverse than reinforce plastic expression changes in experimental evolution. However, it was pointed out that this trend could be an artifact of the statistical nonindependence between the estimates of plastic and genetic phenotypic changes, because both estimates rely on the phenotypic measure at the plastic stage. Using computer simulation, we show that indeed the nonindependence can cause an apparent excess of expression reversion relative to reinforcement. We propose a parametric bootstrap method and show by simulation that it removes the bias almost entirely. Analyzing transcriptome data from a total of 34 parallel lines in 5 experimental evolution studies of Escherichia coli, yeast, and guppies that are amenable to our method confirms that genetic expression changes tend to reverse plastic changes. Thus, at least for gene expression traits, phenotypic plasticity does not generally facilitate genetic adaptation. Several other comparisons of statistically nonindependent estimates are commonly performed in evolutionary genomics such as that between cis- and trans-effects of mutations on gene expression and that between transcriptional and translational effects on gene expression. It is important to validate previous results from such comparisons, and our proposed statistical analyses can be useful for this purpose.
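The nonindependence problem described in the abstract above can be illustrated with a small null simulation. The Python sketch below is an assumption-laden toy, not the authors' simulation or their parametric bootstrap: true expression is held identical at the original, plastic, and adapted stages, and each stage is measured with independent Gaussian noise. The function name and all parameter values are hypothetical.

```python
import random


def reversion_fraction(n_genes=20000, noise_sd=1.0, seed=1):
    """Fraction of genes whose 'genetic' change has the opposite sign to
    their 'plastic' change when there is no true change at all."""
    rng = random.Random(seed)
    reversed_count = 0
    for _ in range(n_genes):
        # Same true expression (0.0) at every stage; only noise differs.
        original = rng.gauss(0.0, noise_sd)
        plastic = rng.gauss(0.0, noise_sd)
        adapted = rng.gauss(0.0, noise_sd)
        plastic_change = plastic - original   # uses the plastic-stage measure
        genetic_change = adapted - plastic    # uses it again, with opposite sign
        if plastic_change * genetic_change < 0:
            reversed_count += 1
    return reversed_count / n_genes


null_fraction = reversion_fraction()
```

Because the plastic-stage measurement enters the two differences with opposite signs, the two changes have a correlation of -0.5 under this null, and the expected fraction of apparent reversions is about 2/3 rather than 1/2. Any test for an excess of reversion therefore needs a null that accounts for this shared term.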
Abstract: The evolution of altruism in complex insect societies is arguably one of the major transitions in evolution, and inclusive fitness theory plausibly explains why this is an evolutionarily stable strategy. Yet, workers of the South African Cape honey bee (Apis mellifera capensis) can revert to selfish behavior by becoming social parasites and parthenogenetically producing female offspring (thelytoky). Using a joint mapping and population genomics approach, in combination with a time-course transcript abundance dynamics analysis, we show that a single nucleotide polymorphism at the mapped thelytoky locus (Th) is associated with the iconic thelytokous phenotype. Th forms a linkage group with the ecdysis-triggering hormone receptor (Ethr) within a nonrecombining region under strong selection in the genome. A balanced detrimental allele system plausibly explains why the trait is specific to A. m. capensis and cannot easily establish itself into genomes of other honey bee subspecies.
Abstract: Multipartite genomes, containing at least two large replicons, are found in diverse bacteria; however, the advantage of this genome structure remains incompletely understood. Here, we perform comparative genomics of hundreds of finished β-proteobacterial genomes to gain insights into the role and emergence of multipartite genomes. Almost all essential secondary replicons (chromids) of the β-proteobacteria are found in the family Burkholderiaceae. These replicons arose from just two plasmid acquisition events, and they were likely stabilized early in their evolution by the presence of core genes. On average, Burkholderiaceae genera with multipartite genomes had a larger total genome size, but smaller chromosome, than genera without secondary replicons. Pangenome-level functional enrichment analyses suggested that interreplicon functional biases are partially driven by the enrichment of secondary replicons in the accessory pangenome fraction. Nevertheless, the small overlap in orthologous groups present in each replicon’s pangenome indicated a clear functional separation of the replicons. Chromids appeared biased to environmental adaptation, as the functional categories enriched on chromids were also overrepresented on the chromosomes of the environmental genera (Paraburkholderia and Cupriavidus) compared with the pathogenic genera (Burkholderia and Ralstonia). Using ancestral state reconstruction, it was predicted that the rate of accumulation of modern-day genes by chromids was more rapid than the rate of gene accumulation by the chromosomes. Overall, the data are consistent with a model where the primary advantage of secondary replicons is in facilitating increased rates of gene acquisition through horizontal gene transfer, consequently resulting in replicons enriched in genes associated with adaptation to novel environments.
Abstract: Recombination is expected to affect functional sequence evolution in several ways. On the one hand, recombination is thought to improve the efficiency of multilocus selection by dissipating linkage disequilibrium. On the other hand, natural selection can be counteracted by recombination-associated transmission distorters such as GC-biased gene conversion (gBGC), which tends to promote G and C alleles irrespective of their fitness effect in high-recombining regions. It has been suggested that gBGC might impact coding sequence evolution in vertebrates, and particularly the ratio of nonsynonymous to synonymous substitution rates (dN/dS). However, distinctive gBGC patterns have been reported in mammals and birds, possibly reflecting the documented contrasts in evolutionary dynamics of recombination rate between these two taxa. Here, we explore how recombination and gBGC affect coding sequence evolution in mammals and birds by analyzing proteome-wide data in six species of Galloanserae (fowls) and six species of catarrhine primates. We estimated the dN/dS ratio and rates of adaptive and nonadaptive evolution in bins of genes of increasing recombination rate, separately analyzing AT → GC, GC → AT, and G ↔ C/A ↔ T mutations. We show that in both taxa, recombination and gBGC entail a decrease in dN/dS. Our analysis indicates that recombination enhances the efficiency of purifying selection by lowering Hill–Robertson effects, whereas gBGC leads to an overestimation of the adaptive rate of AT → GC mutations. Finally, we report a mutagenic effect of recombination, which is independent of gBGC.
Abstract: Vertebrates have four classes of cone opsin genes derived from two rounds of genome duplication. These are short wavelength sensitive 1 (SWS1), short wavelength sensitive 2 (SWS2), medium wavelength sensitive (RH2), and long wavelength sensitive (LWS). Teleosts had another genome duplication at their origin and it is believed that only one of each cone opsin survived the ancestral teleost duplication event. We tested this by examining the retinal cones of a basal teleost group, the osteoglossomorphs. Surprisingly, this lineage has lost the typical vertebrate green-sensitive RH2 opsin gene and, instead, has a duplicate of the LWS opsin that is green sensitive. This parallels the situation in mammalian evolution in which the RH2 opsin gene was lost in basal mammals and a green-sensitive opsin re-evolved in Old World, and independently in some New World, primates from an LWS opsin gene. Another group of fish, the characins, possess green-sensitive LWS cones. Phylogenetic analysis shows that the evolution of green-sensitive LWS opsins in these two teleost groups derives from a common ancestral LWS opsin that acquired green sensitivity. Additionally, the nocturnally active African weakly electric fish (Mormyroideae), which are osteoglossomorphs, show a loss of the SWS1 opsin gene. In comparison with the independently evolved nocturnally active South American weakly electric fish (Gymnotiformes) with a functionally monochromatic LWS opsin cone retina, the presence of SWS2, LWS, and LWS2 cone opsins in mormyrids suggests the possibility of color vision.
Abstract: Pleiotropy is the well-established idea that a single mutation affects multiple phenotypes. If a mutation has opposite effects on fitness when expressed in different contexts, then genetic conflict arises. Pleiotropic conflict is expected to reduce the efficacy of selection by limiting the fixation of beneficial mutations through adaptation, and the removal of deleterious mutations through purifying selection. Although this has been widely discussed, in particular in the context of a putative “gender load,” it has yet to be systematically quantified. In this work, we empirically estimate the extent to which different pleiotropic regimes impede the efficacy of selection in Drosophila melanogaster. We use whole-genome polymorphism data from a single African population and divergence data from D. simulans to estimate the fraction of adaptive fixations (α), the rate of adaptation (ωA), and the direction of selection (DoS). After controlling for confounding covariates, we find that the different pleiotropic regimes have a relatively small, but significant, effect on selection efficacy. Specifically, our results suggest that pleiotropic sexual antagonism may restrict the efficacy of selection, but that this conflict can be resolved by limiting the expression of genes to the sex where they are beneficial. Intermediate levels of pleiotropy across tissues and life stages can also lead to maladaptation in D. melanogaster, due to inefficient purifying selection combined with a low frequency of mutations that confer a selective advantage. Thus, our study highlights the need to consider the efficacy of selection in the context of antagonistic pleiotropy, and of genetic conflict in general.
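Two of the statistics named in the abstract above have simple textbook forms based on counts of nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps). The Python sketch below uses the basic McDonald–Kreitman-style estimator of α and the standard definition of DoS. The abstract does not say which estimators the authors actually used, and they may well have relied on more elaborate methods, so this is an illustration under that assumption; the function names and the counts are hypothetical.

```python
def alpha_mk(dn, ds, pn, ps):
    """Basic McDonald-Kreitman-style estimate of the fraction of
    adaptive nonsynonymous fixations: 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1.0 - (ds * pn) / (dn * ps)


def direction_of_selection(dn, ds, pn, ps):
    """Direction of selection: Dn / (Dn + Ds) - Pn / (Pn + Ps).
    Positive values suggest adaptive evolution, negative values suggest
    segregating slightly deleterious mutations."""
    return dn / (dn + ds) - pn / (pn + ps)


# Hypothetical counts for a single gene set.
counts = {"dn": 40, "ds": 100, "pn": 20, "ps": 100}
example_alpha = alpha_mk(**counts)
example_dos = direction_of_selection(**counts)
```

With these made-up counts, α is 0.5 and DoS is positive, both pointing the same way. DoS is often preferred for small counts because it stays defined when Dn or Ps is small, provided each of the two sums is nonzero.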
Abstract: The rate of evolution varies among sites within proteins. In enzymes, two rate gradients are observed: rate decreases with increasing local packing and it increases with increasing distance from catalytic residues. The rate-packing gradient would be mainly due to stability constraints and is well reproduced by biophysical models with selection for protein stability. However, stability constraints are unlikely to account for the rate-distance gradient. Here, to explore the mechanistic underpinnings of the rate gradients observed in enzymes, I propose a stability–activity model of enzyme evolution, MSA. This model is based on a two-dimensional fitness function that depends on stability, quantified by ΔG, the enzyme’s folding free energy, and activity, quantified by ΔG*, the activation energy barrier of the enzymatic reaction. I test MSA on a diverse data set of enzymes, comparing it with two simpler models: MS, which depends only on ΔG, and MA, which depends only on ΔG*. I found that MSA clearly outperforms both MS and MA and it accounts for both the rate-packing and rate-distance gradients. Thus, MSA captures the distribution of stability and activity constraints within enzymes, explaining the resulting patterns of rate variation among sites.
Abstract: Long-term suppression of recombination ultimately leads to gene loss, as demonstrated by the depauperate Y and W chromosomes of long-established pairs of XY and ZW chromosomes. The young social supergene of the Solenopsis invicta red fire ant provides a powerful system to examine the effects of suppressed recombination over a shorter timescale. The two variants of this supergene are carried by a pair of heteromorphic chromosomes, referred to as the social B and social b (SB and Sb) chromosomes. The Sb variant of this supergene changes colony social organization and has an inheritance pattern similar to a Y or W chromosome because it is unable to recombine. We used high-resolution optical mapping, k-mer distribution analysis, and quantification of repetitive elements on haploid ants carrying alternate variants of this young supergene region. We find that instead of shrinking, the Sb variant of the supergene has increased in length by more than 30%. Surprisingly, only a portion of this length increase is due to consistent increases in the frequency of particular classes of repetitive elements. Instead, haplotypes of this supergene variant differ dramatically in the amounts of other repetitive elements, indicating that the accumulation of repetitive elements is a heterogeneous and dynamic process. This is the first comprehensive demonstration of degenerative expansion in an animal and shows that it occurs through nonlinear processes during the early evolution of a region of suppressed recombination.
Abstract: Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is considered to be the most abundant enzyme on Earth. Despite this, its full diversity and distribution across the domains of life remain to be determined. Here, we leverage a large set of bacterial, archaeal, and viral genomes recovered from the environment to expand our understanding of existing RuBisCO diversity and the evolutionary processes responsible for its distribution. Specifically, we report a new type of RuBisCO present in Candidate Phyla Radiation (CPR) bacteria that is related to the archaeal Form III enzyme and contains the amino acid residues necessary for carboxylase activity. Genome-level metabolic analyses supported the inference that these RuBisCO function in a CO2-incorporating pathway that consumes nucleotides. Importantly, some Gottesmanbacteria (CPR) also encode a phosphoribulokinase that may augment carbon metabolism through a partial Calvin–Benson–Bassham cycle. Based on the scattered distribution of RuBisCO and its discordant evolutionary history, we conclude that this enzyme has been extensively laterally transferred across the CPR bacteria and DPANN archaea. We also report RuBisCO-like proteins in phage genomes from diverse environments. These sequences cluster with proteins in the Beckwithbacteria (CPR), implicating phage as a possible mechanism of RuBisCO transfer. Finally, we synthesize our metabolic and evolutionary analyses to suggest that lateral gene transfer of RuBisCO may have facilitated major shifts in carbon metabolism in several important bacterial and archaeal lineages.
Abstract: Changes in gene regulation have long been thought to play an important role in primate evolution. However, although a number of studies have compared genome-wide gene expression patterns across primate species, fewer have investigated the gene regulatory mechanisms that underlie such patterns, or the relative contribution of drift versus selection. Here, we profiled genome-scale DNA methylation levels in blood samples from five of the six extant species of the baboon genus Papio (4–14 individuals per species). This radiation presents the opportunity to investigate DNA methylation divergence at both shallow and deeper timescales (0.380–1.4 My). In contrast to studies in human populations, but similar to studies in great apes, DNA methylation profiles clearly mirror genetic and geographic structure. Divergence in DNA methylation proceeds fastest in unannotated regions of the genome and slowest in regions of the genome that are likely more constrained at the sequence level (e.g., gene exons). Both heuristic approaches and Ornstein–Uhlenbeck models suggest that DNA methylation levels at a small set of sites have been affected by positive selection, and that this class is enriched in functionally relevant contexts, including promoters, enhancers, and CpG islands. Our results thus indicate that the rate and distribution of DNA methylation changes across the genome largely mirror genetic structure. However, at some CpG sites, DNA methylation levels themselves may have been a target of positive selection, pointing to loci that could be important in connecting sequence variation to fitness-related traits.
Abstract: Identification of orthologous or paralogous relationships of coding genes is fundamental to all aspects of comparative genomics. For accurate identification of orthologs among deeply diversified bilaterian lineages, precise estimation of gene trees is indispensable, given the complicated histories of genes over millions of years. By estimating gene trees, orthologs can be identified as members of an orthogroup, a set of genes descended from a single gene in the last common ancestor of all the species being considered. In addition to comparisons with a given species tree, purposeful taxonomic sampling increases the accuracy of gene tree estimation and orthogroup identification. Although some major phylogenetic relationships of bilaterians are gradually being unraveled, the scattering of published genomic data among separate web databases is becoming a significant hindrance to identification of orthogroups with appropriate taxonomic sampling. By integrating more than 250 metazoan gene models predicted in genome projects, we developed a web tool called ORTHOSCOPE to identify orthogroups of specific protein-coding genes within major bilaterian lineages. ORTHOSCOPE allows users to employ several sequences of a specific molecule and broadly accepted nodes included in a user-specified species tree as queries and to evaluate the reliability of estimated orthogroups based on topologies and node support values of estimated gene trees. A test analysis using data from 36 bilaterians was accomplished within 140 s. ORTHOSCOPE results can be used to evaluate orthologs identified by other stand-alone programs using genome-scale data. ORTHOSCOPE is freely available at https://www.orthoscope.jp or https://github.com/jun-inoue/orthoscope (last accessed December 28, 2018).
Abstract: The ubiquity of plasmids in all prokaryotic phyla and habitats and their ability to transfer between cells mark them as prominent constituents of prokaryotic genomes. Many plasmids are found in their host cell in multiple copies. This leads to an increased mutational supply of plasmid-encoded genes and genetically heterogeneous plasmid genomes. Nonetheless, the segregation of plasmid copies into daughter cells during cell division is considered to occur in the absence of selection on the plasmid alleles. We investigate the implications of random genetic drift of multicopy plasmids during cell division (termed here “segregational drift”) for plasmid evolution. Performing experimental evolution of low- and high-copy non-mobile plasmids in Escherichia coli, we find that the evolutionary rate of multicopy plasmids does not reflect the increased mutational supply expected according to their copy number. In addition, simulated evolution of multicopy plasmid alleles demonstrates that segregational drift leads to increased loss frequency and extended fixation time of plasmid mutations in comparison to haploid chromosomes. Furthermore, an examination of the experimentally evolved hosts reveals a significant impact of the plasmid type on the host chromosome evolution. Our study demonstrates that segregational drift of multicopy plasmids interferes with the retention and fixation of novel plasmid variants. Depending on the selection pressure on newly emerging variants, plasmid genomes may evolve more slowly than haploid chromosomes, regardless of their higher mutational supply. We suggest that plasmid copy number is an important determinant of plasmid evolvability due to the manifestation of segregational drift.
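The idea of segregational drift in the abstract above can be made concrete with a toy within-cell model. The Python sketch below is not the authors' simulation: it assumes a neutral mutation that starts as a single plasmid copy, exact doubling of every copy before division, and a daughter cell that receives a random half of the copies, and it follows one cell lineage until the mutant is lost or fixed within the cell. All function names and parameter values are hypothetical.

```python
import random


def segregate(mutant_copies, copy_number, rng):
    """Duplicate every plasmid copy, then draw `copy_number` copies at
    random (without replacement) for the daughter cell. Returns the
    number of mutant copies the daughter inherits."""
    pool = [1] * (2 * mutant_copies) + [0] * (2 * (copy_number - mutant_copies))
    return sum(rng.sample(pool, copy_number))


def loss_fraction(copy_number, trials=2000, seed=1):
    """Fraction of cell lineages in which a new neutral plasmid mutation
    (one mutant copy at the start) is lost rather than fixed in the cell."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        mutant_copies = 1
        # Continue while the cell is still heteroplasmic.
        while 0 < mutant_copies < copy_number:
            mutant_copies = segregate(mutant_copies, copy_number, rng)
        if mutant_copies == 0:
            lost += 1
    return lost / trials


single_copy_loss = loss_fraction(1)
ten_copy_loss = loss_fraction(10)
```

Under these assumptions the expected number of mutant copies is unchanged by segregation, so a single new mutant fixes within the lineage with probability 1/n for copy number n and is lost otherwise. A single-copy replicon never loses the mutation at this step, whereas a ten-copy plasmid loses it in roughly nine lineages out of ten, which is the qualitative effect the abstract attributes to segregational drift.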
Abstract: The origin and population history of the endangered golden snub-nosed monkey (Rhinopithecus roxellana) remain largely unavailable and/or controversial. We here integrate analyses of multiple genomic markers, including mitochondrial (mt) genomes, Y-chromosomes, and autosomes of 54 golden monkey individuals from all three geographic populations (SG, QL, and SNJ). Our results reveal contrasting population structures. Mt analyses suggest a division of golden monkeys into five lineages: one in SNJ, two in SG, and two in QL. One of the SG lineages (a mixed SG/QL lineage) is basal to all other lineages. In contrast, autosomal analyses place SNJ as the most basal lineage and identify one QL and three SG lineages. Notably, Y-chromosome analyses bear features similar to mt analyses in placing the SG/QL-mixed lineage as the first diverging lineage and dividing SG into two lineages, while resembling autosomal analyses in identifying one QL lineage. We further find bidirectional gene flow among all three populations at autosomal loci, while asymmetric gene flow is suggested at mt genomes and Y-chromosomes. We propose that different population structures and gene flow scenarios are the result of sex-linked differences in the dispersal pattern of R. roxellana. Moreover, our demographic simulation analyses support an origin hypothesis suggesting that the ancestral R. roxellana population was once widespread and then divided into SNJ and non-SNJ (SG and QL) populations. This differs from previous mt-based “mono-origin (SG is the source population)” and “multiorigin (SG is a fusion of QL and SNJ)” hypotheses. We provide a detailed and refined scenario for the origin and population history of this endangered primate species, which has a broader significance for Chinese biogeography. In addition, this study highlights the importance of investigating multiple genomic markers with different modes of inheritance to trace the complete evolutionary history of a species, especially for those exhibiting differential or mixed patterns of sex dispersal.
Abstract: Untangling the functional basis of divergence between closely related species is a step toward understanding species dynamics within communities at both the evolutionary and ecological scales. We investigated cellular (i.e., growth, domoic acid production, and nutrient consumption) and molecular (transcriptomic analyses) responses to varying nutrient concentrations across several strains belonging to three species of the toxic diatom genus Pseudo-nitzschia. Three main results were obtained. First, strains from the same species displayed similar transcriptomic, but not necessarily cellular, responses to the experimental conditions. This showed the importance of considering intraspecific diversity to investigate functional divergence between species. Second, a major exception to the first finding was a strain recently isolated from the natural environment and displaying contrasting gene expression patterns related to cell motility and domoic acid production. This result illustrated the profound modifications that may occur when transferring a cell from the natural to the in vitro environment and calls for future studies to better understand the influence of culture duration and life cycle on expression patterns. Third, transcriptomic responses were more similar between the two species displaying similar ecology in situ, irrespective of the genetic distance. This was especially true for molecular responses related to the TCA cycle, photosynthesis, and nitrogen metabolism. However, transcripts related to phosphate uptake were variable between species. This highlighted the importance of considering both overall genetic distance and ecological divergence to explain functional divergence between species.
Abstract: Fungal reproduction is regulated by the mating-type (MAT1) locus, which typically comprises two idiomorphic genes. The presence of one or both allelic variants at the locus determines the reproductive strategy in fungi: homothallism versus heterothallism. It has been hypothesized that self-fertility via homothallism is widespread in lichen-forming fungi. To test this hypothesis, we characterized the MAT1 locus of 41 genomes of lichen-forming fungi representing a wide range of growth forms and reproductive strategies in the class Lecanoromycetes, the largest group of lichen-forming fungi. Our results show the complete lack of genetic homothallism, suggesting that lichens evolved from a heterothallic ancestor. We argue that this may be related to the symbiotic lifestyle of these fungi, and may be a key innovation that has contributed to the accelerated diversification rates in this fungal group.
Abstract: The nuclear pore complex (NPC) is a large macromolecular assembly situated within the pores of the nuclear envelope. Through interactions between its subcomplexes and import proteins, the NPC mediates the transport of molecules into and out of the nucleus and facilitates dynamic chromatin regulation and gene expression. Accordingly, the NPC constitutes a highly integrated nuclear component that is ubiquitous and conserved among eukaryotes. Potential exceptions to this are nucleomorphs: highly reduced, relict nuclei that were derived from green and red algae following their endosymbiotic integration into two lineages, the chlorarachniophytes and the cryptophyceans. A previous investigation failed to identify NPC genes in nucleomorph genomes, suggesting that these genes have either been relocated to the host nucleus or lost. Here, we sought to investigate the composition of the NPC in nucleomorphs by using genomic and transcriptomic data to identify and phylogenetically classify NPC proteins in nucleomorph-containing algae. Although we found NPC proteins in all examined lineages, most of those found in chlorarachniophytes and cryptophyceans were single copy, host-related proteins that lacked signal peptides. Two exceptions were Nup98 and Rae1, which had clear nucleomorph-derived homologs. However, these proteins alone are likely insufficient to structure a canonical NPC and previous reports revealed that Nup98 and Rae1 have other nuclear functions. Ultimately, these data indicate that nucleomorphs represent eukaryotic nuclei without a canonical NPC, raising fundamental questions about their structure and function.
Abstract: Wolbachia, an alpha-proteobacterium closely related to Rickettsia, is a maternally transmitted, intracellular symbiont of arthropods and nematodes. Aedes albopictus mosquitoes are naturally infected with Wolbachia strains wAlbA and wAlbB. Cell line Aa23 established from Ae. albopictus embryos retains only wAlbB and is a key model to study host–endosymbiont interactions. We have assembled the complete circular genome of wAlbB from the Aa23 cell line using long-read PacBio sequencing at 500× median coverage. The assembled circular chromosome is 1.48 megabases in size, an increase of more than 300 kb over the published draft wAlbB genome. The annotation of the genome identified 1,205 protein coding genes, 34 tRNA, 3 rRNA, 1 tmRNA, and 3 other ncRNA loci. The long reads enabled sequencing over complex repeat regions which are difficult to resolve with short-read sequencing. Thirteen percent of the genome comprised insertion sequence elements distributed throughout the genome, some of which cause pseudogenization. Prophage WO genes encoding some essential components of phage particle assembly are missing, while the remainder are found in five prophage regions/WO-like islands or scattered around the genome. Orthology analysis identified a core proteome of 535 orthogroups across all completed Wolbachia genomes. The majority of proteins could be annotated using Pfam and eggNOG analyses, including ankyrins and components of the Type IV secretion system. KEGG analysis revealed the absence of five genes in wAlbB which are present in other Wolbachia. The availability of a complete circular chromosome from wAlbB will enable further biochemical, molecular, and genetic analyses on this strain and related Wolbachia.
Abstract: The PSD-95/Dlg-A/ZO-1 (PDZ) domain is highly expanded, diversified, and well distributed across metazoa, where it assembles diverse signaling components by virtue of interactions with other proteins in a sequence-specific manner. In contrast, in the microbial world, PDZ domains are reported to be involved in protein quality control during stress response. The distribution, functions, and origins of PDZ domain-containing proteins in prokaryotic organisms remain largely unexplored. We analyzed 7,852 PDZ domain-containing proteins in 1,474 microbial genomes in this context. PDZ domain-containing proteins from planctomycetes, myxobacteria, and other eubacteria occupying terrestrial and aquatic niches are found in multiple copies within their genomes. Over 93% of the 7,852 PDZ domain-containing proteins were classified into 12 families, including six novel families, based on additional structural and functional domains present in these proteins. The higher PDZ domain encoding capacity of the investigated organisms was observed to be associated with adaptation to the ecological niche where multicellular life might have originated and flourished. Predicted subcellular localization of PDZ domain-containing proteins and their genomic context argue in favor of crucial roles in translation and membrane remodeling during stress response. Based on rigorous sequence, structure, and phylogenetic analyses, we propose that the highly diverse PDZ domain of the uncharacterized Fe–S oxidoreductase superfamily, exclusively found in gladobacteria and several anaerobes and acetogens, might represent the most ancient form among all the existing PDZ domains.
Abstract: In the nucleus of eukaryotic cells, genomic DNA associates with numerous protein complexes and RNAs, forming the chromatin landscape. Through a genome-wide study of chromatin-associated proteins in Drosophila cells, five major chromatin types were identified as a refinement of the traditional binary division into hetero- and euchromatin. These five types were given color names in reference to the Greek word chroma. They are defined by distinct but overlapping combinations of proteins and differ in biological and biochemical properties, including transcriptional activity, replication timing, and histone modifications. In this work, we assess the evolutionary relationships of chromatin-associated proteins and present an integrated view of the evolution and conservation of the fruit fly Drosophila melanogaster chromatin landscape. We combine homology prediction across a wide range of species with gene age inference methods to determine the origin of each chromatin-associated protein. This provides insight into the evolution of the different chromatin types. Our results indicate that for the euchromatic types, YELLOW and RED, young associated proteins are more specialized than old ones; and for genes found in either chromatin type, intron/exon structure is lineage-specific. Next, we provide evidence that a subset of GREEN-associated proteins is involved in a centromere drive in D. melanogaster. Our results on BLUE chromatin support the hypothesis that the emergence of Polycomb Group proteins is linked to eukaryotic multicellularity. In light of these results, we discuss how the regulatory complexification of chromatin links to the origins of eukaryotic multicellularity.
Abstract: Amoebiasis is the third-most common cause of mortality worldwide from a parasitic disease. Although the primary etiological agent of amoebiasis is the obligate human parasite Entamoeba histolytica, other members of the genus Entamoeba can infect humans and may be pathogenic. Here, we present the first annotated reference genome for Entamoeba moshkovskii, a species that has been associated with human infections, and compare the genomes of E. moshkovskii, E. histolytica, the human commensal Entamoeba dispar, and the nonhuman pathogen Entamoeba invadens. Gene clustering and phylogenetic analyses show differences in expansion and contraction of families of proteins associated with host or bacterial interactions. They suggest that surface-bound proteins involved in adhesion to extracellular membranes, such as the Gal/GalNAc lectin and members of the BspA and Ariel1 families, are important to parasitic Entamoeba species. Furthermore, E. dispar is the only one of the four species to lack a functional copy of the key virulence factor cysteine protease CP-A5, whereas the gene’s presence in E. moshkovskii is consistent with the species’ potentially pathogenic nature. Entamoeba moshkovskii was found to be more diverse than E. histolytica across all sequence classes. The former is ∼200 times more diverse than the latter, with the four E. moshkovskii strains tested having a most recent common ancestor nearly 500 times more ancient than the tested E. histolytica strains. A four-haplotype test indicates that these E. moshkovskii strains are not the same species and should be regarded as a species complex.
Abstract: Cancer is a threat to multicellular organisms, yet the molecular evolution of pathways that prevent the accumulation of genetic damage has been largely unexplored. The p53 network regulates how cells respond to DNA-damaging stressors. We know little about p53 network molecular evolution as a whole. In this study, we performed comparative genetic analyses of the p53 network to quantify the number of genes within the network that are rapidly evolving and constrained, and the association between lifespan and the patterns of evolution. Based on our previously published data set, we used genomes and transcriptomes of 34 sauropsids and 32 mammals to analyze the molecular evolution of 45 genes within the p53 network. We found that genes in the network exhibited evidence of positive selection and divergent molecular evolution in mammals and sauropsids. Specifically, we found more evidence of positive selection in sauropsids than in mammals, indicating that sauropsids have different targets of selection. In sauropsids, more genes upstream in the network exhibited positive selection, and this observation is driven by positive selection in squamates, which is consistent with previous work showing rapid divergence and adaptation of metabolic and stress pathways in this group. Finally, we identified a negative correlation between maximum lifespan and the number of genes with evidence of divergent molecular evolution, indicating that species with longer lifespans likely experienced less variation in selection across the network. In summary, our study offers evidence that comparative genomic approaches can provide insights into how molecular networks have evolved across diverse species.
Abstract: Multicellular organisms depend on oxygen-carrying proteins to transport oxygen throughout the body; therefore, proteins such as hemoglobins (Hbs), hemocyanins, and hemerythrins are essential for maintenance of tissues and cellular respiration. Vertebrate Hbs are among the most extensively studied proteins; however, much less is known about invertebrate Hbs. Recent studies of hemocyanins and hemerythrins have demonstrated that they have much wider distributions than previously thought, suggesting that oxygen-binding protein diversity is underestimated across metazoans. Hexagonal bilayer hemoglobin (HBL-Hb), a blood pigment found exclusively in annelids, is a polymer composed of up to 144 extracellular globins and 36 linker chains. To further understand the evolutionary history of this protein complex, we explored the diversity of linkers and extracellular globins from HBL-Hbs using in silico approaches on 319 metazoan and one choanoflagellate transcriptomes. We found 559 extracellular globin and 414 linker genes transcribed in 171 species from ten animal phyla with new records in Echinodermata, Hemichordata, Brachiopoda, Mollusca, Nemertea, Bryozoa, Phoronida, Platyhelminthes, and Priapulida. Contrary to previous suggestions that linkers and extracellular globins emerged in the annelid ancestor, our findings indicate that they have putatively emerged before the protostome–deuterostome split. For the first time, we unveiled the comprehensive evolutionary history of metazoan HBL-Hb components, which consists of multiple episodes of gene gains and losses. Moreover, because our study design surveyed linkers and extracellular globins independently, we were able to cross-validate our results, significantly reducing the rate of false positives. We confirmed that the distribution of HBL-Hb components has until now been underestimated among animals.
Abstract: Human skin is morphologically and physiologically different from the skin of other primates. However, the genetic causes underlying human-specific skin characteristics remain unclear. Here, we quantitatively demonstrate that the epidermis and dermis of human skin are significantly thicker than those of three Old World monkey species. In addition, we show that the topography of the epidermal basement membrane zone shows a rete ridge in humans but is flat in the Old World monkey species examined. Subsequently, we comprehensively compared gene expression levels between human and nonhuman great ape skin using next-generation cDNA sequencing (RNA-Seq). We identified four structural protein genes associated with the epidermal basement membrane zone or elastic fibers in the dermis (COL18A1, LAMB2, CD151, and BGN) that were expressed at significantly higher levels in humans than in nonhuman great apes, suggesting that these differences may be related to the rete ridge and rich elastic fibers present in human skin. The rete ridge may enhance the strength of adhesion between the epidermis and dermis in skin. This ridge, along with a thick epidermis and rich elastic fibers, might contribute to the physical strength of human skin with a low amount of hair. To estimate transcriptional regulatory regions for COL18A1, LAMB2, CD151, and BGN, we examined conserved noncoding regions with histone modifications that can activate transcription in skin cells. Human-specific substitutions in these regions, especially those located in binding sites of transcription factors that function in skin, may alter the gene expression patterns and give rise to the human-specific adaptive skin characteristics.