A demonstration of the H3 trimethylation ChIP-seq analysis of galline follicular mesenchymal cells and male germ cells.
Functional cells of multi-cellular organisms maintain their mutual state by adopting specific gene expression patterns. They hereby fix their transcriptional activities by imprinting a particular chromatin structure into genome to allocate and control target gene regions-- also recognized as the chromatin packaging . In eukaryotic cell, a chromatin consists of several units of 147 base pairs DNA twining around nucleosome formed by histone octamer--which is the molecule responsible for the chromatin packaging process. A histone octamer comprised of four basic core histones--H2A, H2B, H3, and H4 which are targets for histone modification--methylation, phosphorylation and acetylation. When a precursor cell is committed a lineage-specific development, it modifies particular N-termini amino acids of specific core histones. This event thus results in chromatin packaging to control gene a transcription pattern specific to its lineage-committed cell [1,2].
Different histone modification can relax or condense chromatin structures differently. Since each specific chromatin structure can affect the accessibility of gene transcription factors and RNA polymerase II differently at a gene promoter region, they will result in a gene transcription pattern specific to the lineage-committed cell [1-3]. Among well-recognized histone modifications, trimethyl modification of histone 3 (H3) at 4th lysine N-termini (H3K4me3) is very common in active gene transcriptions of several lineage-committed cells [4,5]. This phenomenon, however cannot conclusively comply with several genes in stem cells and germ cells, of which H3K4me3 regions are presented without transcription. Interestingly, such genes always demonstrate not only trimethylation of H3 on 4th lysine (H3K4me3), but also the 27th lysine (H3K27me3) in the same loci of N-termini--the bivalent H3K4m3/ H3K27me3 chromatin or poised chromatin [2,6,7].
In the embryonic stem cells, the presence of H3K4me3 in the poised gene helps marking the gene as the active one. However, the concurrent H3K27me3 repressed the transcription and thus guarantees the cells' pluripotent state [6,7]. Actually, such characteristics were proved to be inherited from germ cells even prior to fertilization. According to this reason, the study of H3K4me3/H3K27me3 bivalent (poised) epigenetic state in male germ cells became a very popular topic in epigenetic evolution studies [2,8].
Chromatin immunoprecipitation (ChIP) assays with sequencing or ChIP-Seq is a sequencing technology generally utilized to study genome-wide H3 trimethylation--H3K4me3 and H3K27me3. In ChIP-Seq, the chromatin regions presented with modified histones of interest are immunoprecipitated and sequenced to study epigenetic control of gene expressions in several cell types [2,5,9,10]. Not only mammals, ChIP-Seq of H3 trimethylation were also conducted in chicken (Gallus gallus) as the representative of galline species [2,11-13]. These studies provided biologists and veterinarians the valuable insights into evolution and cell development of galline species. Of note, an increase of H3K4me3 and H3K27me3 ChIP-seq databases in chicken also allowed both biologists and veterinarians a new opportunity to observe and meta-analyze ChlP-seq data of various cell types acquired from different experiments.
Analytical guidelines are generally concerned in human and mouse ChIP-Seq studies . On the contrary, the issue was limitedly described in H3 trimethylation ChIP-Seq data analysis in galline species [2,12,13]. Since the procedures could affect the analytical results differently, a practical demonstration of such approach should prove beneficial, especially for those who are not acquainted with ChIP-Seq data analysis. Due to such necessity, we therefore demonstrated a chicken ChIP-seq meta-analysis in this study. The H3K4me3 ChIP-Seq data of chicken follicular mesenchymal cells and male germ cells were acquired, and we also included H3K27me3 ChIP-Seq data of chicken germ cells to demonstrate the effect of bivalent H3K4me3/H3K27me3 on gene transcription.
MATERIALS AND METHODS
As far as we knew, the follicular mesenchymal cell was the only lineage-committed cell type with H3 trimethylation ChIP-seq data available in public database. A list of follicular mesenchymal cell and germ cell datasets used in this study is provided (Table 1). Follicular mesenchymal cells used in this study consisted of two populations isolated from median (Med) and lateral mesenchyme (Lat) of feather follicles, accordingly. Male germ cells used in this study derived from two time points during development: pachytene spermatocyte (PS) acquired during prophase of meiosis I, and round spermatid (RS) acquired after completing meiosis. Both germ cell types were representatives of male germ cells to illustrate the poised genes--where their poised states were already known to be retained throughout spermatogenic development . For ChIP-seq data, the sample file and its corresponding control file were provided together. Of note, the control file was not required for RNA-seq data analyses, and thus not provided.
In this study, we categorized our analytical processes into 3 steps--data preprocessing, peak calling with visualization, and integrative genomics viewer (IGV) exploration (Figure 1). In brief, the sample datasets would be aligned to chicken genome in order to acquire ChIP files and RNA-seq files in the data preprocessing step (Figure 1A). The ChIP files were then applied in peak calling step to identify genome regions enriched with H3 trimethylation--the significant peaks (Figure 1B). Only peaks overlapped with gene promoter region would be considered in IGV exploration of which, the enriched peaks along with the mapped sequences ChIP and RNA-seq files were integrated and visualized together in IGV platform (Figure 1C).
Data preprocessing: The sequence read archive (SRA) files of germ cells and follicular mensenchymal ChIP-Seq and RNA-Seq datasets were retrieved from SRA database (https://www. ncbi.nlm.nih.gov/sra) (Table 1), subsequently extracted, and preprocessed as previously described . The ChIP files were aligned to chicken genome (Gallus gaUus 5.0) by Bowtie2 aligner , while STAR aligner was applied with the RNA-seq files [14,16]. The aligned files were converted into "bam" format, indexed by gene annotation, removed for duplicated sequences , and bias corrected for guanine-cytosine base balance  to acquire ChIP files and RNA-seq files. The qualities of both raw and preprocessed datasets were then determined by FastQC as previously described . For ChIP quality determination, normalized strand cross-correlation coefficient (NSC), relative strand cross-correlation coefficient (RSC) and DNA fragment length were additionally included. In more details, NSC is genome-wide correlation between positive and negative strand read counts when shifted by half of the fragment length relative to background. NSC value thus represents enrichment of clustered ChIP fragments around target sites. RSC is the relative enrichment of fragment-length cross-correlation to read-length cross-correlation. The RSC thus represented the clustering of relatively fixed sized fragment around target sites.
Peak calling with visualization: Peak calling was, in case of this study, the computational method to identify genomewide chromatin regions with targeted H3 trimethylation significant to the other regions. The peaks therefore represented the genome area with significant presences of H3K4me3 or H3K27me3 when compared to the same region presented in the corresponding control datasets (Table 1). Model-based analysis of ChIP-seq 2 (MACS2) algorithm  was used in this study. Since the broad peak regions were suggested in the previous quality analysis, we applied broad peak calling function in MACS2 with all ChIP files. Further details for such function calling was already described . Significant peaks were those with false discovery rate (FDR) [less than or equal to] 0.01 and Foldchange [greater than or equal to] 2 presented with chicken gene promoter regions (2,000 base pairs upstream to 200 base pairs downstream of gene promoter) (Peak files in Figure 1B). Peak files were also utilized for counting aggregated DNA within peak region, and counting the DNA fragments distributed from the transcription start site of chicken genes (Figure 2B).
Integrative genomics viewer (IGV) exploration: The peak files along with their corresponding ChIP files and RNA-seq files were integratively observed together in IGV platform (Figure 1C). The Gal_gal5.0 genome (UCSC version galGal5 released in December 2015) was administrated as the reference genome. The gal_Gal5.0 consisted of 34 chromosomes, 1 linkage group and 15,411 unplaced scaffolds. Further details of IGV platform can be acquired online (http://software. broadinstitute.org/software/igv/). In brief, DNA aggregations along the genome of sample ChIP files were expressed by blue heatmap. Similarly, the density of transcript fragments from RNA-seq files acquired from the corresponding cell sample were expressed by red heatmap. Peak regions of each cell type were also included in the figure the aggregated DNA fragments from ChIP files. To demonstrate bivalent H3K4me3/ H3K27me3 chromatin of germ cell samples, cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes were chosen for illustration (Figure 4). We select such genes due to the fact that transcription of these genes was crucial in topological control of chicken feather's generation of follicular mesenchyme .
All sample datasets were successfully preprocessed and aligned to chicken genome
The raw datasets of all cell types were retrieved and preprocessed according to the methodology (Figure 1A). All preprocessed datasets (both ChIP-seq and RNA-seq data in Table 1) acquired quality scores across all bases [greater than or equal to]30 without presence of contaminated adapters. Alignment percentages of ChIP-seq and RNA-seq datasets were [greater than or equal to]85% and [greater than or equal to]70%, accordingly. NSC, RSC, and DNA fragment length of ChIP-seq samples are provided in Table 2.
Follicular mesenchymal cells and germ cells shared several gene promoter regions with H3K4me3 and H3K27me3
As previously described in materials and methods, the peaks were the chromatin loci in gene promoter regions enriched with target H3 trimethylations (H3K4me3 or H3K27me3). When categorized the acquired peaks according to their localized chromosomes, chromosome 1 contained the largest number of peaks comparing to the others (chr1 in Figure 2A).
Most of aggregated DNA fragments of ChIP files were found distributed within 0-1 kilobases from the transcription start sites in all cell types (Figure 2B). The results indicated that that most H3K4me3 peaks (5,704 peaks) were uniquely shared among follicular mesenchymal cells and germ cells (lat_K4, med_K4, PS_K4, and RS_K4), and the second largest
intersected peaks were uniquely shared among all H3K4me3 and H3K27me3 (1,909 peaks) (Figure 2A, 3).
CRABP1, GDF10, and GREM1 genes of chicken germ cells adopted the bivalent H3 trimethylation chromatin pattern with presence of transcription's inhibition
Follicular mesenchymal cells (Lat RNA-seq and Med RNAseq in Figure 4) expressed CRABP1, GDF10, and GREM1 genes higher than germ cells (PS RNA-seq and RS RNA-seq in Figure 4). Of note, the bivalent H3K4me3/H3K27me3 pattern near transcription start site of these gene in germ cells was demonstrated by the significant peak regions (PS_K4 Peak, PS_K27 Peak, RS_K4 Peak, and RS_K27 Peak in Figure 4) with presence of remarkable ChIP-seq DNA aggregations (PS_K4 ChIP-seq, PS_K27 ChIP-seq, RS_K4 ChIP-seq, and RS_K27 ChIP-seq in Figure 4).
Data quality was among the most critical issues in ChIP-seq analysis due to its later effect on measurement and comparison. According to this reason, we preprocessed the ChIP-seq data to acquire adequate quality and alignment percentages. Another crucial issue in ChIP-seq quality was the determination of signal-to-noise ratio represented by NSC and RSC values. ChIP-seq data with few genuine binding site regions usually rendered high NSC ([greater than or equal to]1.5) and RSC ([greater than or equal to]0.8), accordingly. However, marks that tend to be enriched at repeat-like regions with diffused genome-wide patterns like H3 trimethylation usually show lower NSC and RSC values--which also presented in our study [10,11] (Table 2).
Since H3 trimethylation (H3K4me3 or H3K27me3) at gene promoter regions were strongly associated with gene transcription control [4,5,12], only peaks within gene promoter region were hereby considered in this study. Inheritance mechanism of H3K4me3/H3K27me3 bivalent state from germ cells to embryonic stem cells, and activation mechanism of lineage-specific genes in lineage-committed cells were comprehensively described elsewhere , and are not included in this discussion. According to the result, most H3 trimethylation peaks in both follicular mesenchymal cells and germ cells were located in chromosome 1--agreeing with the largest gene numbers in the chromosome (2,162 coding genes and 524 non-coding genes) (chr1 in Figure 2). Interestingly, the largest number of intersected H3K4me3 peaks among ChIPseq samples (orange bar in Figure 3) implied the conserved roleof H3K4me3among chicken germ cells and mesenchymal cells. Of note, shared presented among H3K4me3 and H3K27me3 (red bar in Figure 3) could only partially imply several unique genes controlled by H3K4me3/H3K27me3 bivalent state in chicken germ cells due to the lack of H3K27me3 ChIP-Seq datasets from analysis.
To demonstrate the effect of H3K4me3/H3K27me3 bivalent state on transcription inhibition of germ cells [2,7], we chose CRABP1, GDF10, and GREM1 as candidate genes due to their redundant lineage-specific transcriptions in follicular mesenchymal cells . Visualization by IGV properly demonstrated such lineage-specific characteristics in follicular mesenchymal cells. On the contrary, inhibition of the gene transcription was obvious in germ cells with a significant presence of H3K4me3/H3K27me3 (poised) in the gene promoter regions as expected (Figure 4). As described in the introduction, such a poised state of genes in germ cells allowed them to inherit pluripotency to the later developed embryonoic stem cells. The observatory procedure in this study hereby successfully demonstrated the inhibitory effect of H3K4me3/H3K27me3 bivalent (poised) state on follicular mesenchyme-specific genes in chicken germ cells. By mean of this, other poised genes of interest could also be observed by adopting the same approach.
In conclusion, the current study provided an example for H3 trimethylation ChIP-seq analysis of chicken (Gallus gallus) cells. Since the methodology required no specific process, ChIP-seq data of other histone modifications could also be analyzed using similar approach. There were, however some important limitations that should be noted in this study. These included the lack of H3K27me3 ChIP-seq data of follicular mesenchymal cells to confirm abortion of poised genes, and limited samples per cell type available for valid differential analysis . Hopefully, the rapid growth of galline ChIP-seq database would allow us to demonstrate such analysis in our future study.
CONFLICT OF INTEREST
We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.
The current study was fully supported by Thailand Research Fund (TRF) through New Research Scholar Program (Grant No. TRG5880003).
[1.] Feil R, Khosla S. Genomic imprinting in mammals: an interplay between chromatin and DNA methylation? Trends Genet 1999;15:431-5.
[2.] Lesch BJ, Silber SJ, McCarrey JR, Page DC. Parallel evolution of male germline epigenetic poising and somatic development in animals. Nat Genet 2016;48:888-94.
[3.] Waclawska A, Kurpisz M. Key functional genes of spermatogenesis identified by microarray analysis. Syst Biol Reprod Med 2012;58:229-35.
[4.] Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 2007;130:77-88.
[5.] Zhang B, Zheng H, Huang B, et al. Allelic reprogramming of the histone modification H3K4me3 in early mammalian development. Nature 2016;537(7621):553-7.
[6.] Azuara V, Perry P, Sauer S, et al. Chromatin signatures of pluripotent cell lines. Nat Cell Biol 2006;8:532-8.
[7.] Mikkelsen TS, Ku M, Jaffe DB, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 2007;448(7153):553-60.
[8.] Villar D, Berthelot C, Aldridge S, et al. Enhancer evolution across 20 mammalian species. Cell 2015;160:554-66.
[9.] Eissenberg JC, Shilatifard A. Histone H3 lysine 4 (H3K4) methylation in development and differentiation. Dev Biol 2010;339:240-9.
[10.] Landt S, Marinov G. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 2012;22:1813-31.
[11.] Li A, Figueroa S, Jiang T-X, et al. Diverse feather shape evolution enabled by coupling anisotropic signalling modules with self-organizing branching programme. Nat Commun 2017;8: ncomms14139.
[12.] Jahan S, Xu W, He S, et al. The chicken erythrocyte epigenome. Epigenetics Chromatin 2016;9:19.
[13.] Luo J, Mitra A, Tian F, et al. Histone methylation analysis and pathway predictions in chickens after MDV infection. PLoS One 2012;7:e41849.
[14.] Chokeshaiusaha K, Thanawongnuwech R, Puthier D, Nguyen C. Inspection of C-type lectin superfamily expression profile in chicken and mouse dendritic cells. Thai J Vet Med 2016;46: 443-53.
[15.] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357-9.
[16.] Dobin A, Davis CA, Schlesinger F, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15-21.
[17.] Li H, Handsaker B, Wysoker A, et al. The sequence alignment/ map format and SAMtools. Bioinformatics 2009;25:2078-9.
[18.] Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. Deep Tools: A flexible platform for exploring deep-sequencing data. Nucleic Acids Res 2014;42 (Web Server issue):W187-91.
[19.] Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protoc 2012;7:1728-40.
[20.] Vastenhouw NL, Schier AF. Bivalent histone modifications in early embryogenesis. Curr Opin Cell Biol 2012;24:374-86.
[21.] Ross-Innes CS, Stark R, Teschendorff AE, et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 2012;184(7381):389-93.
Kaj Chokeshaiusaha (1), Denis Puthier (2), Catherine Nguyen (2), and Thanida Sananmuang (1), *
* Corresponding Author: Thanida Sananmuang Tel: + 66-38358137, Fax: +66-38358141,
(1) Faculty of Veterinary Medicine, Rajamangala University of Technology Tawan-OK 43 Moo 6 Bangpra, Sriracha Chonburi 20110, Thailand
(2) Laboratoire TAGC/INSERM U1090, Parc Scientifique de Luminy case 928, 163, avenue de Luminy, Marseille cedex 09 13288, France
Submitted Oct 9, 2017; Revised Nov 23, 2017; Accepted Jan 10, 2018
Caption: Figure 1. Analytical workflow for this study. Analytical processes were categorized into 3 steps--data preprocessing (A), peak calling with visualization (B), and integrative genomics viewer (IGV) exploration (C). The files required and outcome for each analytical step are provided.
Caption: Figure 2. H3 trimethylation loci. The total number of peaks significantly identified were categorized according to their regionalchromosomes(A).Each color bar presented in the legend was described as following: [red], peaks shared among all samples with either K4 or K27 trimethylation (AII_shared_K4_and_K27); [orange], peaks presented only among samples with K4 trimethylation (AII_shared_K4_but_not_K27);, peaks presented only among samples with K27 trimethylation (AII_shared_K27_but_not_K4); [green], peaks presented only in Lat_K4 (Unigue_Lat_K4); [blue], peaks presented only in Med_K4 (Unigue_Med_K4); [pink], peaks presented only in PS_K4 (Unigue_PS_K4); [porchae], peaks presented only in RS_K4 (Unigue_RS_K4);, peaks presented only in PS_K27 (Unigue_PS_K27); , peaks presented only in RS_K27 (Unigue_RS_K27), and [light green], other peaks not in the previously mentioned groups. Distribution of precipitated DNA frarpento acguired front sample ChIP-teg data according to chicken gene transcription start site was also demonstrated (B). Aggregation of fragments within 0-1 kilobases  and 1-3 kilobases  are presented.
Caption: Figure 3. UpSet plot of intersected peaks among cell types. The intersected peaks shared among cell types (Lat_K4, Med_K4, PS_K4, RS_K4, PS_K27, and RS_K27) were presented by UpSet plot. The connected lines among cell types; shown in the lower panel of the plot represented the group of intersected peaks among them. The upper panel bht trapti pf tlte plot presenaed the number of peaks found in eoch group. The rhared peak etmber among all samples with either K4 or K27 trimethylation and only K4 trimethylation are in red (red) and orange (orange) color, accordingly. The horizontal leftmost bar graph (olive green color, (green) show to total number of peaks presented in each cell type.
Caption: Figure 4. Chromatin histone 3 (H3) trimethylation state and transcription of cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes. The heatmaps of CRABP1, GDF10, and GREM1 genes were acquired from the integrative genomics viewer (IGV) interface. The chicken full genome sequences was used to identify the chromatin loci of these genes with transcription direction indicated by the arrow heads presented in the genome along with the associated promoters in the area (galGal5 and Promoter regions at the bottom of the figure). The figure show 3 types of data--ChIP-seq, RNA-seq, and Peak files acquired from each sample cell type. In brief, ChIP-seq data show the density of DNA fragments aligned to a chromatin region (aggregated DNA fragments) in blue heatmap. The RNA-seq data show the density of transcript fragments aligned to the gene regions representing level of gene expressions by red heatmap. Finally, the peak file of each cell sample indicated the chromatin region with significant peaks identifed by peak calling analysis.
Table 1. Raw data files used in this study Cell type's Description abbreviation PS_K4 ChIP-seq of H3K4me3 of pachytene spermatocye PS_K27 ChIP-seq of H3K27me3 of pachytene spermatocyte PS_RNA-seq RNA-seq of pachytene spermatocyte RS_K4 ChIP-seq of H3K4me3 of round spermatid RS_K27 ChIP-seq of H3K27me3 of round spermatid RS_RNA-seq RNA-seq of round spermatid Lat_K4(1) ChIP-seq of H3K4me3 of lateral mesenchyme Lat_RNA-seq(1) RNA-seq of lateral mesenchyme Med_K4(1) ChIP-seq of H3K4me3 of medial mesenchyme Med_RNA-seq(1) RNA-seq of medial mesenchyme Lat_K4(2) ChIP-seq of H3K4me3 of lateral mesenchyme Lat_RNA-seq(2) RNA-seq of lateral mesenchyme Med_K4(2) ChIP-seq of H3K4me3 of medial mesenchyme Med_RNA-seq(2) RNA-seq of medial mesenchyme Cell type's Sample file Control file abbreviation PS_K4 SRR1977505 SRR1977507 PS_K27 SRR1977506 SRR1977507 PS_RNA-seq SRR1977509 -- RS_K4 SRR1977535 SRR1977537 RS_K27 SRR1977536 SRR1977537 RS_RNA-seq SRR1977540 -- Lat_K4(1) SRR4125205 SRR4125207 Lat_RNA-seq(1) SRR4125190 -- Med_K4(1) SRR4125206 SRR4125208 Med_RNA-seq(1) SRR4125192 -- Lat_K4(2) SRR4125209 SRR4125211 Lat_RNA-seq(2) SRR4125191 -- Med_K4(2) SRR4125210 SRR4125212 Med_RNA-seq(2) SRR4125193 -- Table 2. NSC values, RSC values, and DNA fragment length of ChIP-seq samples Cell type Sample file NSC RSC Fragment length PS_K4 SRR1977505 1.10 1.05 170 PS_K27 SRR1977506 1.02 1.28 170 RS_K4 SRR1977535 1.13 1.10 170 RS_K27 SRR1977536 1.03 1.25 170 Lat_K4_1 SRR4125205 1.02 0.72 210 Med_K4_1 SRR4125206 1.01 0.64 200 Lat_K4_2 SRR4125209 1.02 0.71 240 Med_K4_2 SRR4125210 1.01 0.66 195 NSC, normalized strand cross; RSC, relative strand cross.
|Printer friendly Cite/link Email Feedback|
|Author:||Chokeshaiusaha, Kaj; Puthier, Denis; Nguyen, Catherine; Sananmuang, Thanida|
|Publication:||Asian - Australasian Journal of Animal Sciences|
|Date:||Jun 1, 2018|
|Previous Article:||A genome-wide association study of social genetic effects in Landrace pigs.|
|Next Article:||Genetic parameter estimation for milk [beta]-hydroxybutyrate and acetone in early lactation and its association with fat to protein ratio and energy...|