Does the high gene density in the sponge NK homeobox gene cluster reflect limited regulatory capacity?
Our understanding of the evolutionary transition to metazoan multicellularity is currently being transformed by comparative analyses of whole genomes (e.g., Putnam et al., 2007; King et al., 2008). To understand the events leading to the emergence of the most recent common ancestor to all living animals, comparisons need to be made between metazoans and close relatives within the Holozoa, which include choanoflagellates and members of related opisthokont lineages (e.g., Cavalier-Smith et al., 1996; King and Carroll, 2001; Snell et al., 2001; Lang et al., 2002; Burger et al., 2003; Cavalier-Smith and Chao, 2003; King et al., 2003; Steenkamp et al., 2006). The Metazoa consists of at least two ancient lineages of extant animals, the Eumetazoa (cnidarians, placozoans, and bilaterian phyla) and phylum Porifera (sponges) (e.g., Cavalier-Smith et al., 1996; Borchiellini et al., 2001; Medina et al., 2001; Collins, 2002; Wallberg et al., 2004). The recent sequencing of the genomes of the choanoflagellate Monosiga brevicollis (King et al., 2008), the sponge Amphimedon queenslandica, the cnidarians Nematostella vectensis (Putnam et al., 2007) and Hydra magnipapillata, and the placozoan Trichoplax adhaerens is allowing us to decipher the genomic events underpinning the transition to metazoan multicellularity.
It appears that a raft of genomic innovations occurred in the stem lineage leading to the metazoan last common ancestor. Sponges possess metazoan-specific transcription factors and signaling pathways (Larroux et al., 2006; Nichols et al., 2006; Adamska et al., 2007a, b; Larroux et al., 2007, 2008) whose orthologs populate the developmental gene regulatory networks (GRNs) underlying bilaterian embryogenesis (Davidson, 2006, and references therein). In contrast, the Monosiga brevicollis genome contains only a very small subset of these gene classes, despite encoding many cell adhesion and communication protein domains that are otherwise restricted to metazoans (King and Carroll, 2001; King et al., 2003, 2008; King, 2004; Larroux et al., 2008). Amphimedon possesses a large majority of transcription factor gene classes that have previously been found only in eumetazoans, including homeobox genes belonging to ANTP, prd-like, Pax, POU, LIM-HD, Six, and TALE classes, as well as basic helix-loop-helix (bHLH), Rel, nuclear receptor, Mef2, Ets, Sox, T-box, and Fox genes (Larroux et al., 2006, 2007, 2008; Simionato et al., 2007; Gauthier and Degnan, 2008). Likewise, genes encoding components of all the major eumetazoan developmental signaling pathways are present in the Amphimedon genome (Adamska et al., 2007a, b; Adamska, Richards, Gauthier, and Degnan, unpubl.). Together, these observations indicate that the repertoire of gene classes comprising the developmental regulatory toolkit evolved before the divergence of sponge and eumetazoan lineages. The genesis of many of these metazoan-specific gene classes may have provided the molecular preadaptations that enabled the evolution of metazoan development and multicellularity.
On the basis of the number of developmental gene families represented in the sponge genome, it is not hard to envisage the last common ancestor to all extant metazoans being developmentally and morphologically complex (Adamska et al., 2007a; Degnan and Degnan, 2006; Degnan et al., 2005).
Despite the qualitative conservation of the metazoan developmental "regulome" between sponges and eumetazoans, there must be inherent differences in the genomes of the ancestors that gave rise to these lineages. Why is it that the eumetazoan lineage consists of a wide diversity of body plans while sponge morphologies represent modifications of a unique aquiferous body plan that lacks cellular and morphological features found elsewhere in the animal kingdom (Simpson, 1984)? Why has the sponge body plan remained relatively unchanged since well before the Cambrian (Li et al., 1998)? These fundamental differences in complexity and diversity should be manifested in extant genomes. One obvious difference is that representative eumetazoans have a much larger number of developmental genes compared to Amphimedon, with individual transcription factor and signaling pathway gene classes and families having expanded differentially early in the eumetazoan lineage (Kusserow et al., 2005; Magie et al., 2005; Miller et al., 2005; Chourrout et al., 2006; Kamm et al., 2006; Ryan et al., 2006; Larroux et al., 2007, 2008; Putnam et al., 2007; Yamada et al., 2007). This increase in gene repertoire through duplication and divergence may have allowed for (i) the expansion of ancient GRNs that originally would have underpinned the primary generation and patterning of differentiated cell types in the first metazoan embryos, and (ii) the co-option of duplicates into novel developmental roles. Cnidarians and bilaterians have strikingly similar gene memberships (Kusserow et al., 2005; Magie et al., 2005; Miller et al., 2005; Chourrout et al., 2006; Kamm et al., 2006; Ryan et al., 2006; Larroux et al., 2007; Putnam et al., 2007; Yamada et al., 2007, 2008), suggesting that body plan complexity cannot solely be attributed to the growth in gene class and family size. 
Another possibility is that individual genes expanded their regulatory systems, allowing for their expression in multiple developmental contexts (i.e., individual genes were co-opted into new developmental roles). This increase in regulatory information is likely to manifest as an increase in the length of DNA sequence responsible for the regulation of a particular gene.
Critical cis-regulatory DNA sequences tend to be located in the vicinity of the transcription start site, often just upstream, although other regulatory information can be localized at a great distance from the coding region (reviewed in Davidson, 2006). These sequences act as binding sites for sequence-specific transcription factors that directly control the activation and repression of transcription. Detailed experimental analyses of putative regulatory regions of particular genes in a handful of animals (i.e., sea urchin, ascidian, Caenorhabditis elegans, Drosophila, and model vertebrates) have defined the role of proximal and distant cis-regulatory elements in controlling gene expression. While such analyses currently are not possible in Amphimedon, we can use expression patterns as a proxy to predict the complexity of the regulatory information for a given gene. In bilaterians, most transcription factor genes are used in a number of developmental contexts, with each context requiring its own cis-regulatory module (Davidson, 2006). Here, we explore this concept using a set of NK homeobox genes that are clustered in the genome of Amphimedon (Larroux et al., 2007). NK genes are used at all levels of the bilaterian developmental program, from early germ layer formation to terminal differentiation, and are involved in cell fate determination and differentiation, patterning and morphogenetic processes (reviewed in Banerjee-Basu and Baxevanis, 2001; Jagla et al., 2001; Carroll et al., 2005; Garcia-Fernandez, 2005; Stanfel et al., 2005; Slack, 2006). They are expressed in all three germ layers, and their roles in mesodermal and nervous system development are well studied.
By comparing the upstream intergenic regions of NK genes in Amphimedon, the cnidarian Nematostella vectensis, and various bilaterians covering a range of genome sizes, we show that the Amphimedon genes have markedly smaller intergenic regions. To test whether this trend is specific to developmental genes, we also performed these analyses on five non-homeobox structural genes, which are clustered with the NK genes in Amphimedon. No significant differences were detected for these genes between different organisms, suggesting that the size difference in regulatory regions between sponges and eumetazoans applies specifically to regulatory genes that may belong to developmental GRNs. The relatively simple expression patterns of a selection of Amphimedon NK genes during embryogenesis support the contention that the small intergenic regions may reflect the limited cis-regulatory information associated with these genes. Genome-wide studies in Drosophila and C. elegans have made a similar correlation between gene regulatory complexity and intergenic sequence size (Nelson et al., 2004). Although this case study is restricted to a very small number of genes and thus may not reflect broad genomic principles, it does present a supposition that can be addressed with genome-wide comparative analyses.
Materials and Methods
Analysis of the Amphimedon NK cluster
The Amphimedon cluster of NK homeobox genes was initially assembled using an in-house assembly and scaffolding pipeline as described in Larroux et al. (2007). A draft genome assembly from the US Department of Energy Joint Genome Institute was later consulted to confirm the in-house assembly. We employed the AUGUSTUS gene prediction program to further systematically evaluate this assembly (Stanke et al., 2006) and manually modified models to incorporate regions of homology suggested by BLASTx alignments to sequences in the National Center for Biotechnology Information database. Where available, expressed sequence tags (ESTs) and 5' and 3' RACE sequences (homeobox genes only) were used to confirm coding sequences and to derive 5' and 3' untranslated regions.
Analysis of the expression of NK-related genes
Specimens of Amphimedon queenslandica Hooper and van Soest, 2006 (Porifera, Demospongiae, Haplosclerida, Niphatidae) were collected on Heron Island Reef, Great Barrier Reef, Australia, as described in Leys and Degnan (2002), and in situ hybridization was performed as described in Larroux et al. (2006). Detailed protocols and probe details are available upon request.
Sequence alignments (tBlastn) using homeodomains of proteins belonging to NK2, NK3, NK4, NK5, NK6, NK7, Msx, Hex, and Tlx families were derived from Lottia gigantea, Branchiostoma floridae, and Ciona intestinalis genome contigs available at http://genome.jgi-psf.org/ and from Caenorhabditis elegans and Drosophila melanogaster genomes at WormBase (http://www.wormbase.org/) and FlyBase (http://www.flybase.org/), respectively. Genes belonging to these NK families were previously characterized in the Nematostella vectensis genome by Ryan et al. (2006; see also the phylogenetic analysis in Larroux et al., 2007). Similarly, we searched these genomes (and the Nematostella vectensis genome) for orthologs of kinesin 2 KIF3B/C, tetracyclin resistance, Vacuolar Protein Sorting 8, Inositol Polyphosphate-5-Phosphatase A, and kinesin 5 KIF11. Identified sequences were classified into the different families using BlastP and/or a neighbor-joining phylogenetic tree (data not shown). The length of the upstream intergenic region and the orientation of the upstream gene were then determined through visual inspection of gene models in the genome browsers.
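The upstream measurements in this study were made by visual inspection, but the same quantity can be computed from gene-model coordinates. The following is a minimal sketch only, not the procedure used here: the `Gene` record, the function name `upstream_intergenic`, and the coordinates are all hypothetical, and it assumes non-overlapping gene models on a single scaffold, with the strand of the nearest upstream gene reported as its orientation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene-model record (1-based inclusive coordinates, start < end)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def upstream_intergenic(target: Gene, genes: List[Gene]) -> Optional[Tuple[str, int, str]]:
    """Return (neighbour name, gap in bp, neighbour strand) for the nearest gene
    lying upstream of the target's 5' end, or None if there is none."""
    if target.strand == "+":
        # Upstream of a plus-strand gene means lower coordinates.
        candidates = [g for g in genes if g.end < target.start]
        if not candidates:
            return None
        neighbour = max(candidates, key=lambda g: g.end)
        gap = target.start - neighbour.end - 1
    else:
        # Upstream of a minus-strand gene means higher coordinates.
        candidates = [g for g in genes if g.start > target.end]
        if not candidates:
            return None
        neighbour = min(candidates, key=lambda g: g.start)
        gap = neighbour.start - target.end - 1
    return neighbour.name, gap, neighbour.strand


# Invented coordinates, for illustration only.
gene_a = Gene("geneA", 1000, 2000, "+")
gene_b = Gene("geneB", 2501, 4000, "+")
gene_c = Gene("geneC", 4201, 5000, "-")
gene_d = Gene("geneD", 5301, 6000, "+")
scaffold = [gene_a, gene_b, gene_c, gene_d]

for gene in scaffold:
    print(gene.name, upstream_intergenic(gene, scaffold))
```

Overlapping gene models and genes at scaffold ends would need separate handling, which is one reason manual inspection of the browser tracks may be preferable for a small gene set.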
Results and Discussion
Structure and composition of the Amphimedon NK cluster
The demosponge Amphimedon queenslandica has 8 NK genes in its genome, of which 6 are clustered within a 71-kb stretch of DNA (Fig. 1; Larroux et al., 2007). As this cluster is similar to that found in bilaterians, we infer that this is a metazoan synapomorphy. There is no evidence of other ANTP class homeobox genes--Hox, ParaHox and EHG-box--in the sponge genome, thus we also hypothesize that the eumetazoan ProtoHox gene originated from within the ancestral NK cluster after the divergence of sponge and eumetazoan lineages. The Amphimedon NK cluster consists of NK2/3/4-, Msx-, Hex-, Tlx-, and two NK5/6/7-related genes, one of which encodes two homeodomains and the other one of which has two different splice forms (Fig. 1). This cluster is markedly smaller than the inferred ancestral bilaterian cluster (Luke et al., 2003; Garcia-Fernandez, 2005; Larroux et al., 2007) and has a number of non-homeobox genes located within it. To illustrate this difference in size, we mapped the genomic regions corresponding to two conserved gene linkages, Tlx and NK5/6/7 (Fig. 2A) and Msx and Hex (Fig. 2B) in Amphimedon and in the chordate Branchiostoma floridae. Between Tlx and NK5/6/7, there are more genes in Amphimedon than in Branchiostoma (5 vs. 2) and the distance between the two genes is shorter in Amphimedon than in Branchiostoma (26 kb vs. 86 kb). Branchiostoma has two Hex-Msx clusters probably resulting from duplication of the genomic region. Not only is the genomic region between the two genes larger in this chordate than in Amphimedon (195-220 kb vs. 25 kb), but it also contains many more genes (11-15 vs. 4). A striking feature of the Amphimedon cluster is the high density of genes and the limited amount of intergenic DNA. In terms of the homeobox genes AmqNK2/3/4, AmqMsx, AmqHex, AmqTlx, AmqNK5/6/7A, and AmqNK5/6/7B, the estimated upstream intergenic regions range in size from 33 to 973 bp (Fig. 1; Appendix Table A1). 
This size range is similar for the non-homeobox genes in and flanking the Amphimedon NK cluster (227 to 1685 bp; Fig. 1; Appendix Table A2).
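The difference in gene density between the two conserved linkages can be made explicit with simple arithmetic. The sketch below uses only the gene counts and distances quoted in the text; the helper name `genes_per_kb` is ours, and because the two Branchiostoma Hex-Msx clusters are reported only as ranges, their density is bracketed rather than computed exactly.

```python
def genes_per_kb(n_genes: int, span_kb: float) -> float:
    """Number of intervening genes per kilobase between two linked NK genes."""
    return n_genes / span_kb


# Tlx to NK5/6/7 linkage: 5 genes in 26 kb (Amphimedon) vs. 2 genes in 86 kb (Branchiostoma).
tlx_amq = genes_per_kb(5, 26)
tlx_bf = genes_per_kb(2, 86)
tlx_fold = tlx_amq / tlx_bf

# Msx to Hex linkage: 4 genes in 25 kb (Amphimedon).
msx_hex_amq = genes_per_kb(4, 25)
# Branchiostoma: 11-15 genes in 195-220 kb; lowest and highest possible densities.
msx_hex_bf_low = genes_per_kb(11, 220)
msx_hex_bf_high = genes_per_kb(15, 195)

print(f"Tlx-NK5/6/7: Amq {tlx_amq:.3f}, Bf {tlx_bf:.3f} genes/kb ({tlx_fold:.1f}-fold)")
print(f"Msx-Hex: Amq {msx_hex_amq:.3f}, Bf {msx_hex_bf_low:.3f}-{msx_hex_bf_high:.3f} genes/kb")
```

On these figures the Amphimedon linkages are denser than their Branchiostoma counterparts in both cases, even at the upper bound of the Branchiostoma Hex-Msx range.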
Appendix Table A1 Orientation and length of upstream intergenic regions of NK genes in Amphimedon (Amq), Nematostella (Nv), Lottia (Lg), Drosophila (Dm), C. elegans (Ce), Branchiostoma (Bf), and Ciona (Ci)

                                                      Upstream     Upstream
                                                      gene         intergenic
Gene name         JGI gene model                      orientation  distance (bp)
AmqNK2-4                                              +            190
AmqMsx                                                +            758
AmqHex                                                +            973
AmqTlx                                                +            187
AmqNK5-7a1                                            +            33
AmqNK5-7a2                                            +            558
AmqNK5-7b                                             +            317
NvNK2a            gw.243.78.1                         -            7,000
NvNK2b            estExt_GenewiseH_1.C_2430020        +            10,743
NvNK2c            e_gw.243.57.1                       +            30,984
NvNK2d            e_gw.243.50.1                       -            7,000
NvNK3             fgenesh1_pg.scaffold_87000063       +            3,512
NvNK4             fgenesh1_pg.scaffold_87000064       -            20,976
NvMsx             e_gw.6.245.1                        -            17,077
NvHhex            gw.12.267.1                         +            6,441
NvHD017 (Tlx)     gw.105.4.1                          -            2,562
NvHD023 (Tlx)     gw.124.34.1                         +            10,983
NvHD032 (Tlx)     e_gw.124.100.1                      +            25,454
NvHD042 (Tlx)     gw.124.101.1                        +            3,220
NvHD043 (Tlx)     fgenesh1_pg.scaffold_91000054       +            170
NvHD071 (Tlx)     gw.124.99.1                         +            12,033
NvHD076 (Tlx)     gw.124.32.1                         +            5,447
NvHD102 (Tlx)     e_gw.91.2.1                         +            2,876
NvHD147 (Tlx)     gw.17.372.1                         +            6,716
NvHmx (NK5)       e_gw.6.279.1                        -            18,209
NvNK6             e_gw.464.8.1                        -            3,631
NvNK7             gw.464.14.1                         -            11,289
LgNK2a            gw1.45.125.1                        -            33,879
LgNK2b            gw1.29.306.1                        +            22,920
LgNK3             gw1.21.46.1                         +            47,718
LgNK4             e_gw1.21.263.1                      +            22,579
LgMsxa            gw1.122.88.1                        +            18,352
LgMsxb            e_gw1.122.24.1                      -            22,282
LgMsxc            gw1.122.87.1                        +            19,195
LgHex             e_gw1.63.126.1                      -            996
LgTlxa            fgenesh2_pg.C_sca_40000128          +            14,861
LgTlxb            e_gw1.40.172.1                      +            12,770
LgNK5             gw1.263.3.1                         +            31,538
LgNK6             e_gw1.19.20.1                       +            21,437
LgNK7             gw1.88.121.1                        -            10,222
Dm Dr (Msx)                                           +            39,903
Dm CG7056 (Hex)                                       +            2,353
Dm tin (NK4)                                          -            1,579
Dm bap (NK3)                                          +            6,912
Dm vnd (NK2)                                          +            4,820
Dm scro (NK2)                                         +            471
Dm C15 (Tlx)                                          +            164
Dm Hmx (NK5)                                          -            12,421
Dm HGTX (NK6)                                         -            19,474
Dm NK7.1                                              +            7,432
Ce ceh-22 (NK2)                                       +            1,859
Ce ceh-24 (NK2)                                       -            6,070
Ce ceh-27 (NK2)                                       -            3,378
Ce ceh-28 (NK4)                                       -            6,344
Ce pha-2 (Hex)                                        +            5,526
Ce vab-15 (Msx)                                       -            9,258
Ce mls-2 (NK5)                                        +            5,652
Ce cog-1 (NK6)                                        +            19,763
Ce ceh-9 (NK7)                                        +            6,046
BfNK2a            estExt_fgenesh2 Jcg.C_ 1360002      -            23,622
BfNK2b            fgenesh2_pg.scaffold_136000026      +            2,474
BfNK2c            fgenesh2_kg.scaffold_52000001       +            32,765
BfNK2d            estExt_fgenesh2_pg.C_520073         -            17,035
BfNK3             estExt_fgenesh2_pg.C_1430013        +            9,209
BfNK4             estExt_fgenesh2_pg.C_1430014        -            12,523
BfMsxa            estExt_gwp.C_540071                 -            4,186
BfMsxb            fgenesh2_pg.scaffold_56000154       -            17,323
BfMsxc            fgenesh2_pg.scaffold_56000155       +            382
BfHexa            e_gw.54.243.1                       +            30,834
BfHexb            e_gw.56.215.1                       -            67,144
BfTlx             fgenesh2_pg.scaffold_294000030      -            55,425
BfNK5             e_gw.406.46.1                       +            12,910
BfNK6             estExt_GenewiseH_1.C_2940047        -            21,848
BfNK7             gw.294.35.1                         -            21,848
Ci-TTF1 (NK2)     estExt_genewise1.C_chr_10q0483      +            7,166
Ci-NK4            estExt_fgenesh3_kg.C_chr_08q0171    -            9,808
Ci-msxb           estExt_genewise1.C_chr_02q1758      +            2,376
Ci-Hex            TC58909 (scaff_88)                  -            7,094
Ci-Tlx            ci0100148267 (chr_01q)              +            3,709
Ci-NK5            ci0100137765 (chr_08q)              +            3,439
Ci-NK6            ci0100136284 (chr_05q)              -            4,538

Appendix Table A2 Orientation and length of upstream intergenic regions of structural genes in Amphimedon (Amq), Nematostella (Nv), Lottia (Lg), Drosophila (Dm), C. elegans (Ce), Branchiostoma (Bf), and Ciona (Ci)

                                                            Upstream     Upstream
                                                            gene         intergenic
Gene name *             JGI gene model                      orientation  distance (bp)
AmqKIF3B/C                                                  +            366
AmqTetrRes                                                  +            339
AmqVPS8                                                     +            331
AmqIpp                                                      -            228
AmqKIF11                                                    -            1685
NvKIF3B/C               estExt_gwp.C_790037                 +            1442
NvTetrRes               estExt_GenewiseH_1.C_40135          -            379
NvVPS8                  fgenesh1_pg.scaffold_439000001      +            403
NvIppA                  fgenesh1_pg.scaffold_4000062        -            379
NvKIF11                 estExt_fgenesh1_pg.C_40115          -            341
LgKIF3B/C               estExt_Genewise1Plus.C_sca_1900046  -            2499
LgTetrRes               gw1.21.72.1                         +            10232
LgVPS8                  fgenesh2_pg.C_sca_30000085          -            928
LgIppA                  estExt_Genewise1Plus.C_sca_190206   -            5964
LgKIF11                 estExt_fgenesh2_pg.C_sca_320118     +            2486
Dm Klp68D-PA (KIF3B/C)                                      +            189
Dm TetrRes                                                  -            484
Dm CG10144 (VPS8)                                           -            358
Dm CG31110 (IPP)                                            -            1190
Dm Klp61F-PA (KIF11)                                        +            1127
Ce F20C5.2a (KIF3B/C)                                       -            2823
Ce F10D7.2 (TetrRes)                                        +            3693
Ce C42C1.4a                                                 +            103
Ce ipp-5 (IPP)                                              -            11533
Ce C01B10.3 (IPP)                                           +            3019
Ce bmk-1 (KIF11)                                            -            585
BfKIF3B/Ca              estExt_gwp.C_5080063                -            590
BfKIF3B/Cb              estExt_fgenesh2_pg.CJ960003         -            610
BfTetrResa              estExt_fgenesh2_pg.C_1430029        -            25846
BfTetrResb              estExt_gwp.C_4060030                +            21359
BfVPS8a                 jgi|Brafl1|253940|e_gw.664.1.1      +            4869
BfVPS8b                                                     -            20941
BfIppAa                 fgenesh2jjg.scaffold_294000035      -            797
BfIppAb                 fgenesh2_pg.scaffold_205000005      -            805
BfKIF11a                estExt_fgenesh2_pg.C_1850009        -            142
BfKIF11b                estExt_fgenesh2_pg.C_1670112        -            145
Ci KIF3B                estExt_fgenesh3_kg.C_chr_01q0056    +            21809
Ci TetrRes              fgenesh3_kg.C_chr_03q000092         -            1647
Ci VPS8                 fgenesh3_pg.C_chr_02q001219         -            585
Ci IppA                 fgenesh3_pm.C_chr_05q000005         +            12889
Ci KIF11                e_gw 184.108.40.206                 +            17516

* TetrRes, tetracyclin resistance; Ipp, inositol phosphatase.
A similar high gene density has been observed by Breter et al. (2003) for other clusters of non-homeobox genes in demosponges. These authors deduced that this high gene density was typical of the genome of the sponges they were studying--Suberites domuncula and Geodia cydonium--and inferred that these species' genomes consisted of about 300,000 genes. While we estimate that the Amphimedon genome has at least an order of magnitude fewer genes, we do find that in general it consists of clusters of tightly packed genes (unpubl. data), suggesting that this may be a common feature of demosponge genomes.
[FIGURE 2 OMITTED]
Representative NK-related genes are expressed in simple patterns during Amphimedon embryogenesis
The small intergenic regions of the Amphimedon NK genes suggest that there might be limited regulatory information available to drive complex patterns of developmental gene expression. We addressed this supposition by analyzing the embryonic expression patterns of three representative genes: AmqNK2/3/4, AmqTlx, and AmqNK5/6/7B (Fig. 3). Although patterning processes appear to be less complex in this sponge than in bilaterians (Adamska et al., 2007a,b), gastrulation, patterning, and differentiation events in Amphimedon give rise to a radially symmetrical parenchymella larva with 3 cell layers, at least 11 differentiated cell types, and a tissue-like structure, the photoreceptor pigment ring (Leys and Degnan, 2002). AmqNK2/3/4 transcripts are first detected during cleavage in a small number of micromeres per embryo (Fig. 3A, B). At gastrulation these localize to the inner cell mass, where they remain (Fig. 3C, D). Throughout development, the number of cells that highly express this gene seems to be limited. AmqTlx expression is not detected before the spot stage (late gastrulation) by whole-mount in situ hybridization (Fig. 3E-H). At this stage, expression is predominantly in cells surrounding the ring (Fig. 3E, F). Later in development, there is no evidence of expression around the ring. Instead, AmqTlx transcripts are restricted to cells of the inner cell mass (Fig. 3G, H). AmqNK5/6/7B appears to be expressed in a slightly more complex pattern, with two populations of micromeres expressing this gene through cleavage, gastrulation, and larva formation (Fig. 3I-M). These cells are fated to the outer layer, which contains a limited number of cell types (Leys and Degnan, 2002). During ring formation, AmqNK5/6/7B is activated in cells underlying the ring (Fig. 3J-M) that are clearly distinct from the micromeric expression. From these in situ hybridization data, we estimate that these three NK genes are expressed in 2-4 cell types during embryogenesis.
The expression of AmqNK5/6/7B and AmqTlx in the vicinity of the pigment spot and ring during the formation of the ring suggests that these genes are playing a role in patterning this structure.
[FIGURE 3 OMITTED]
Bilaterian NK genes are expressed in all three germ layers and play a major role in mesoderm and nervous system development (reviewed in Banerjee-Basu and Baxevanis, 2001; Jagla et al., 2001; Carroll et al., 2005; Garcia-Fernandez, 2005; Stanfel et al., 2005; Slack, 2006). Given that Amphimedon NK genes are expressed in cells that do not have clear eumetazoan homologs, it is difficult to infer what the role of these homeobox genes in the last common ancestor to sponges and eumetazoans might have been. Nonetheless, it does appear that the sponge NK genes are expressed in a limited number of cell types and developmental stages, in contrast to what has been observed in representative bilaterians. More generally, expression patterns of multiple Amphimedon transcription factors and signaling molecules suggest that these genes do not have a multitude of roles during Amphimedon development (e.g., they are expressed in what appear to be a limited number of cell lineages throughout embryogenesis; Larroux et al., 2006; Adamska et al., 2007a, b; unpubl. data), indicating that regulatory genes may be less pleiotropic in sponges than in bilaterians.
Although we have suggested that whole-mount in situ hybridization data may be used as a proxy to deduce the complexity of the regulatory information controlling the expression of a given gene, a number of important caveats are associated with this approach. First, we only investigated embryogenesis and thus do not know how these genes are expressed during metamorphosis, in the juvenile, or in the adult. New expression patterns at any of these stages of the Amphimedon life cycle will require additional cis-regulatory modules. Second, without accurate cell lineage data for Amphimedon embryos, we cannot exclude the possibility that the cell type we are inferring to be constant between stages is actually changing. Evidence from the developmental expression of another NK-related gene, AmqBsh (formerly RenBsh), indicates that a single cell type does express the gene throughout embryogenesis (Larroux et al., 2006). In this case, AmqBsh is expressed in cells fated to become sclerocytes, which produce spicules and have a distinctive morphology (Leys and Degnan, 2002) and thus can be easily traced.
Comparative genomics of genes in the sponge NK cluster
To test whether there is a significant difference in the sizes of the upstream intergenic region of the eumetazoan orthologs of homeobox and non-homeobox genes in the Amphimedon NK cluster, we analyzed publicly available gene models for the cnidarian Nematostella vectensis and representative bilaterians: the lophotrochozoan Lottia gigantea, the ecdysozoans Caenorhabditis elegans and Drosophila melanogaster, and the deuterostomes Branchiostoma floridae and Ciona intestinalis (Fig. 4; Appendix Tables A1 and A2). These bilaterian organisms were chosen to span the three major superphyletic lineages: lophotrochozoan, ecdysozoan, and deuterostome. Whereas the genomes of Lottia (360 Mbp), Nematostella (450 Mbp), and Branchiostoma (600 Mbp) are at least twice the size of the Amphimedon genome (~167 Mbp), the Drosophila and Ciona genomes are of a similar size to the sponge genome (170-180 Mbp) and C. elegans has a smaller genome (97 Mbp). We chose these latter three genomes to test whether smaller intergenic regions were merely an effect of smaller genome sizes. Gene duplication in the eumetazoan lineage meant that there were often a number of orthologs in each species for a given Amphimedon NK gene (Appendix Table A1).
We compared the intergenic lengths for each of the NK gene families. Where there is more than one ortholog, averages and standard deviations are shown (Fig. 4A). Comparison of these regions reveals that the Amphimedon NK genes have consistently smaller upstream intergenic regions than the orthologous genes from representative bilaterians. In total, the Amphimedon NK genes have significantly smaller intergenic regions compared to those from eumetazoans (see paired values for the Student's t-test in Table 1). As this is also the case for C. elegans, Drosophila, and Ciona, which have genomes that are smaller than or similar in size to those of Amphimedon, this trend does not appear to be a function of genome size but rather to reflect a fundamental difference between Amphimedon and these eumetazoan representatives. Although the Nematostella NK genes have generally smaller intergenic regions than those of their bilaterian orthologs, in total there is no significant difference (Fig. 4A; see paired t-test values in Table 1).
Table 1 Paired t-test values and significance levels: Amphimedon (Amq), Nematostella (Nv), Lottia (Lg), Drosophila (Dm), C. elegans (Ce), Branchiostoma (Bf), and Ciona (Ci)

Homeodomain
Comparison     t       Degrees of freedom   Significance level
Amq vs. Nv     4.24    25                   0.01
Amq vs. Dm     2.36    15                   0.05 *
Amq vs. Ce     3.86    14                   0.01
Amq vs. Lg     5.85    8                    0.01
Amq vs. Bf     4.42    20                   0.01
Amq vs. Ci     4.98    12                   0.01
Nv vs. Dm      0.91    28                   NS
Nv vs. Ce      0.59    27                   NS
Nv vs. Lg      3.82    31                   0.01
Nv vs. Bf      3.19    33                   0.01
Nv vs. Ci      0.24    25                   NS
Dm vs. Ce      0.58    17                   NS
Dm vs. Lg      1.84    21                   0.1 *
Dm vs. Bf      1.97    23                   0.1 *
Dm vs. Ci      1.03    15                   NS
Ce vs. Lg      3.26    20                   0.01
Ce vs. Bf      2.86    22                   0.01
Ce vs. Ci      0.83    14                   NS
Lg vs. Bf      0.53    26                   NS
Lg vs. Ci      4.04    18                   0.01
Bf vs. Ci      3.31    20                   0.01

Non-Homeodomain
Comparison     t       Degrees of freedom   Significance level
Amq vs. Nv     0.003   8                    NS
Amq vs. Dm     0.23    8                    NS
Amq vs. Ce     1.78    9                    NS
Amq vs. Lg     2.26    8                    0.1
Amq vs. Bf     2.09    13                   0.1
Amq vs. Ci     1.78    8                    NS
Nv vs. Dm      0.27    8                    NS
Nv vs. Ce      1.79    9                    NS
Nv vs. Lg      2.28    8                    0.1
Nv vs. Bf      2.09    13                   0.1
Nv vs. Ci      1.78    8                    NS
Dm vs. Ce      1.74    9                    NS
Dm vs. Lg      2.23    8                    0.1
Dm vs. Bf      2.07    13                   0.1
Dm vs. Ci      1.77    8                    NS
Ce vs. Lg      0.34    9                    NS
Ce vs. Bf      1.06    14                   NS
Ce vs. Ci      1.49    9                    NS
Lg vs. Bf      0.85    13                   NS
Lg vs. Ci      1.42    8                    NS
Bf vs. Ci      1.10    13                   NS

NS, not significant. * Due to high variance in Dm dataset.
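One of the pairwise comparisons in Table 1 can be recomputed from the distances listed in Appendix Table A1. The sketch below rests on an assumption: that each comparison is an unpaired, pooled-variance Student's t-test with n1 + n2 - 2 degrees of freedom. This is our reading of the degrees of freedom in Table 1 (for example, 12 for seven Amphimedon and seven Ciona values), not a stated description of the procedure. The function name `pooled_t` is ours.

```python
from math import sqrt
from statistics import mean, variance

# Upstream intergenic distances (bp) for NK genes, transcribed from Appendix Table A1.
amq = [190, 758, 973, 187, 33, 558, 317]           # Amphimedon
ci = [7166, 9808, 2376, 7094, 3709, 3439, 4538]    # Ciona


def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees of freedom), with t positive when mean(b) > mean(a).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / standard_error, df


t_stat, df = pooled_t(amq, ci)
print(f"Amq vs. Ci: t = {t_stat:.2f}, df = {df}")
```

Under this assumption the Amq vs. Ci statistic comes out close to the tabulated 4.98 on 12 degrees of freedom. Significance levels would additionally require the t distribution, which the Python standard library does not provide.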
On the whole, eumetazoan orthologs of the five structural genes linked to the Amphimedon NK cluster have smaller upstream intergenic distances than the NK orthologs (Fig. 4B; Appendix Table A2). Taken together, the distances of Amphimedon structural genes are not significantly different from those of eumetazoans (Fig. 4; see paired t-test values in Table 1). These data are compatible with a difference in proximal regulatory capacity between sponges and eumetazoans occurring specifically in genes belonging to developmental GRNs rather than being a genome-wide trend. Sponge regulatory genes thus appear to possess simpler proximal cis-regulatory control sequences upstream of their transcript start site than their eumetazoan counterparts. Clearly, the supposition that the Amphimedon genome houses less regulatory information per developmental gene than eumetazoan genomes needs to be investigated on a much larger scale, with the inclusion of data from a wider sample of gene classes and families. At this stage, equally parsimonious interpretations of the data presented here are that (i) there has been a compaction and simplification of upstream regulatory region DNA in the sponge lineage, and (ii) a bulk of cis-regulatory DNA is housed elsewhere in the genome. Whether there has been expansion of intergenic region in the eumetazoan lineage or reduction in the sponge lineage, there appears to be an interesting correlation between the size of intergenic DNA and morphological diversity and complexity. This observation supports genome-wide studies undertaken in Drosophila and C. elegans that reveal that the regulatory complexity of a given gene correlates positively with the amount of surrounding intergenic DNA (Nelson et al., 2004).
[FIGURE 4 OMITTED]
The evolution of a developmental gene regulatory network (GRN) may have been one of the prerequisites for the evolution of multicellularity and embryogenesis. Indeed, comparison of early-branching metazoan genomes reveals that the components required to assemble basic GRNs--that is, conserved transcription factors and signaling pathways--evolved prior to metazoan cladogenesis. The duplication and divergence of genes encoding signaling pathway components and transcription factors early in eumetazoan evolution apparently allowed for ancient networks to expand and evolve, and for new networks to form. In this study, we have emphasized another mechanism by which simple GRNs may have increased in complexity during the transition from an ancestral metazoan to a eumetazoan grade of organization. Increases in the size and complexity of the cis-regulatory regions required to direct expression of GRN components may have allowed for the evolution of developmental gene pleiotropy by increasing the number of ontogenetic contexts in which a given gene is employed. Using a handful of NK genes clustered in the genome of the sponge Amphimedon, we provide support for the proposal that an increase in proximal, upstream cis-regulatory information of genes belonging to developmental GRNs occurred in the stem leading to the eumetazoan last common ancestor. Because this study provides no measure of cis-regulatory information housed elsewhere in the genome, we recognise that direct measures of intergenic region size may be at best a general indicator of regulatory complexity (cf. Nelson et al., 2004). Nonetheless, the relatively simple expression patterns of the Amphimedon NK genes during embryogenesis are compatible with the idea that these genes have limited regulatory information.
Given that this is a single case study that is based on a simple comparison of upstream intergenic sizes, further comparisons and experiments are required before we can include cis-regulatory expansion as a primary force in the early evolution of metazoan morphological complexity.
We gratefully acknowledge the significant contribution and support of The US Department of Energy Joint Genome Institute in the production of Amphimedon (Reniera) genomic and EST sequences used in this study through the Community Sequencing Program. The research was supported by grants from the Australian Research Council to B.M.D.
Bryony Fahey *, Claire Larroux *, Ben J. Woodcroft *, and Bernard M. Degnan †
School of Integrative Biology, University of Queensland, Brisbane QLD 4072, Australia
Received 7 January 2008; accepted 11 March 2008.
* These authors contributed equally to this paper.
† To whom correspondence should be addressed. E-mail: email@example.com
Abbreviations: GRN, gene regulatory network.
Literature Cited
Abril, J. F., and R. Guigo. 2000. gff2ps: visualizing genomic annotations. Bioinformatics 16: 743-744.
Adamska, M., S. M. Degnan, K. M. Green, M. Adamski, A. Craigie, C. Larroux, and B. M. Degnan. 2007a. Wnt and TGF-β expression in the sponge Amphimedon queenslandica and the origin of metazoan embryonic patterning. PLoS ONE 2: e1031.
Adamska, M., D. Q. Matus, M. Adamski, K. M. Green, M. Q. Martindale, and B. M. Degnan. 2007b. The evolutionary origin of hedgehog proteins. Curr. Biol. 17: R836-837.
Banerjee-Basu, S., and A. D. Baxevanis. 2001. Molecular evolution of the homeodomain family of transcription factors. Nucleic Acids Res. 29: 3258-3269.
Borchiellini, C., M. Manuel, E. Alivon, N. Boury-Esnault, J. Vacelet, and Y. Le Parco. 2001. Sponge paraphyly and the origin of Metazoa. J. Evol. Biol. 14: 171-179.
Breter, H. J., V. A. Grebenjuk, A. Skorokhod, and W. E. G. Muller. 2003. Approaches for a sustainable use of the bioactive potential in sponges: analysis of gene clusters, differential display of mRNA and DNA chips. Pp. 199-230 in Sponges (Porifera), W. E. G. Muller, ed. Springer, Berlin.
Burger, G., L. Forget, Y. Zhu, M. W. Gray, and B. F. Lang. 2003. Unique mitochondrial genome architecture in unicellular relatives of animals. Proc. Natl. Acad. Sci. USA 100: 892-897.
Carroll, S. B., J. K. Grenier, and S. D. Weatherbee. 2005. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design. Blackwell Science, Malden, MA.
Cavalier-Smith, T., and E. E. Y. Chao. 2003. Phylogeny of Choanozoa, Apusozoa, and other Protozoa and early eukaryote megaevolution. J. Mol. Evol. 56: 540-563.
Cavalier-Smith, T., E. E. Chao, N. Boury-Esnault, and J. Vacelet. 1996. Sponge phylogeny, animal monophyly, and the origin of the nervous system: 18S rRNA evidence. Can. J. Zool. 74: 2031-2045.
Chourrout, D., F. Delsuc, P. Chourrout, R. B. Edvardsen, F. Rentzsch, E. Renfer, M. F. Jensen, B. Zhu, P. de Jong, R. E. Steele, and U. Technau. 2006. Minimal ProtoHox cluster inferred from bilaterian and cnidarian Hox complements. Nature 442: 684-687.
Collins, A. G. 2002. Phylogeny of Medusozoa and the evolution of cnidarian life cycles. J. Evol. Biol. 15: 418-432.
Davidson, E. H. 2006. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. Academic Press, New York.
Degnan, S. M., and B. M. Degnan. 2006. The origin of the pelagobenthic metazoan life cycle: what's sex got to do with it? Integr. Comp. Biol. 46: 683-690.
Degnan, B. M., S. P. Leys, and C. Larroux. 2005. Sponge development and antiquity of animal pattern formation. Integr. Comp. Biol. 45: 335-341.
Garcia-Fernandez, J. 2005. The genesis and evolution of homeobox gene clusters. Nat. Rev. Genet. 6: 881-892.
Gauthier, M., and B. M. Degnan. 2008. The transcription factor NF-kB in the demosponge Amphimedon queenslandica: insights on the evolutionary origin of the Rel homology domain. Dev. Genes Evol. 218: 23-32.
Jagla, K., M. Bellard, and M. Frasch. 2001. A cluster of Drosophila homeobox genes involved in mesoderm differentiation programs. BioEssays 23: 125-133.
Kamm, K., B. Schierwater, W. Jakob, S. L. Dellaporta, and D. J. Miller. 2006. Axial patterning and diversification in the Cnidaria predate the Hox system. Curr. Biol. 16: 1-7.
King, N. 2004. The unicellular ancestry of animal development. Dev. Cell 7: 313-325.
King, N., and S. B. Carroll. 2001. A receptor tyrosine kinase from choanoflagellates: molecular insights into early animal evolution. Proc. Natl. Acad. Sci. USA 98: 15032-15037.
King, N., C. T. Hittinger, and S. B. Carroll. 2003. Evolution of key cell signaling and adhesion protein families predates animal origins. Science 301: 361-363.
King, N., M. J. Westbrook, S. L. Young, A. Kuo, M. Abedin, J. Chapman, S. Fairclough, U. Hellsten, Y. Isogai, I. Letunic, et al. 2008. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451: 783-788.
Kusserow, A., K. Pang, C. Sturm, M. Hrouda, J. Lentfer, H. A. Schmidt, U. Technau, A. von Haeseler, B. Hobmayer, M. Q. Martindale, and T. W. Holstein. 2005. Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433: 156-160.
Lang, B. F., C. O'Kelly, T. Nerad, M. W. Gray, and G. Burger. 2002. The closest unicellular relatives of animals. Curr. Biol. 12: 1773-1778.
Larroux, C., B. Fahey, D. Liubicich, V. F. Hinman, M. Gauthier, M. Gongora, K. Green, G. Worheide, S. P. Leys, and B. M. Degnan. 2006. Developmental expression of transcription factor genes in a demosponge: insights into the origin of metazoan multicellularity. Evol. Dev. 8: 150-173.
Larroux, C., B. Fahey, S. M. Degnan, M. Adamski, D. S. Rokhsar, and B. M. Degnan. 2007. The NK homeobox gene cluster predates the origin of Hox genes. Curr. Biol. 17: 706-710.
Larroux, C., G. N. Luke, P. Koopman, D. S. Rokhsar, S. M. Shimeld, and B. M. Degnan. 2008. Genesis and expansion of metazoan transcription factor gene classes. Mol. Biol. Evol. 25: 980-996.
Leys, S. P., and B. M. Degnan. 2002. Embryogenesis and metamorphosis in a haplosclerid demosponge: gastrulation and transdifferentiation of larval ciliated cells to choanocytes. Invertebr. Biol. 121: 171-189.
Li, C. W., J. Y. Chen, and T. E. Hua. 1998. Precambrian sponges with cellular structures. Science 279: 879-882.
Luke, G. N., L. F. C. Castro, K. McLay, C. Bird, A. Coulson, and P. W. H. Holland. 2003. Dispersal of NK homeobox gene clusters in amphioxus and humans. Proc. Natl. Acad. Sci. USA 100: 5292-5295.
Magie, C. R., K. Pang, and M. Q. Martindale. 2005. Genomic inventory and expression of Sox and Fox genes in the cnidarian Nematostella vectensis. Dev. Genes Evol. 215: 618-630.
Medina, M., A. G. Collins, J. D. Silberman, and M. L. Sogin. 2001. Evaluating hypotheses of basal animal phylogeny using complete sequences of large and small subunit rRNA. Proc. Natl. Acad. Sci. USA 98: 9707-9712.
Miller, D. J., E. E. Ball, and U. Technau. 2005. Cnidarians and ancestral genetic complexity in the animal kingdom. Trends Genet. 21: 536-539.
Nelson, C. E., B. M. Hersh, and S. B. Carroll. 2004. The regulatory content of intergenic DNA shapes genome architecture. Genome Biol. 5: R25.
Nichols, S. A., W. Dirks, J. S. Pearse, and N. King. 2006. Early evolution of animal cell signaling and adhesion genes. Proc. Natl. Acad. Sci. USA 103: 12451-12456.
Putnam, N., M. Srivastava, U. Hellsten, B. Dirks, J. Chapman, A. Salamov, A. Terry, H. Shapiro, E. Lindquist, V. V. Kapitonov, et al. 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317: 85-94.
Ryan, J. F., P. M. Burton, M. E. Mazza, G. K. Kwong, J. C. Mullikin, and J. R. Finnerty. 2006. The cnidarian-bilaterian ancestor possessed at least 56 homeoboxes. Evidence from the starlet sea anemone, Nematostella vectensis. Genome Biol. 7: R64.
Simionato, E., V. Ledent, G. Richards, M. Thomas-Chollier, P. Kerner, D. Coornaert, B. M. Degnan, and M. Vervoort. 2007. Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics. BMC Evol. Biol. 7: 33.
Simpson, T. L. 1984. The Cell Biology of Sponges. Springer-Verlag, New York.
Slack, J. M. W. 2006. Essential Developmental Biology. Blackwell Publishing, Malden, MA.
Snell, E. A., R. F. Furlong, and P. W. H. Holland. 2001. Hsp70 sequences indicate that choanoflagellates are closely related to animals. Curr. Biol. 11: 967-970.
Stanfel, M. N., K. A. Moses, R. J. Schwartz, and W. E. Zimmer. 2005. Regulation of organ development by the Nkx-homeodomain factors: an Nkx code. Cell. Mol. Biol. 51: OL785-OL799.
Stanke, M., O. Schoffmann, B. Morgenstern, and S. Waack. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7: 62.
Steenkamp, E. T., J. Wright, and S. L. Baldauf. 2006. The protistan origins of animals and fungi. Mol. Biol. Evol. 23: 93-106.
Wallberg, A., M. Thollesson, J. S. Farris, and U. Jondelius. 2004. The phylogenetic position of the comb jellies (Ctenophora) and the importance of taxonomic sampling. Cladistics 20: 558-578.
Yamada, A., K. Pang, M. Q. Martindale, and S. Tochinai. 2007. Surprisingly complex T-box gene complement in diploblastic metazoans. Evol. Dev. 9: 220-230.
|Author:||Fahey, Bryony; Larroux, Claire; Woodcroft, Ben J.; Degnan, Bernard M.|
|Publication:||The Biological Bulletin|
|Date:||Jun 1, 2008|