Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle
19-Dec-2014
Molecular Systems Biology (online article), 10: 768. DOI: 10.15252/msb.20145654
DNA replication, transcription and repair involve the recruitment of protein complexes that change their composition as they progress along the genome in a directed or strand‐specific manner. Chromatin immunoprecipitation combined with hidden Markov models (HMMs) has been instrumental in understanding these processes, because HMMs segment the genome into discrete states that can be related to DNA‐associated protein complexes. However, current HMM‐based approaches cannot assign a forward or reverse direction to states, nor can they properly integrate strand‐specific (e.g., RNA expression) with non‐strand‐specific (e.g., ChIP) data; both capabilities are indispensable for accurately characterizing directed processes. To overcome these limitations, we introduce bidirectional HMMs, which infer directed genomic states de novo from occupancy profiles. Applied to RNA polymerase II‐associated factors in yeast and to chromatin modifications in human T cells, bidirectional HMMs recover the majority of transcribed loci, reveal gene‐specific variations in the yeast transcription cycle and indicate the existence of directed chromatin state patterns at transcribed, but not at repressed, regions of the human genome. In yeast, we identify 32 new transcribed loci, a regulated initiation–elongation transition, the absence of the elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. We anticipate that bidirectional HMMs will significantly improve the analysis of genome‐associated directed processes.
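The abstract describes the core idea only at a high level: each directed state has a reverse "twin", strand‐specific tracks are swapped between twins, and non‐strand‐specific tracks are shared. The toy Python sketch below illustrates that idea under our own assumptions. It is not the authors' parametrization or software. Everything in it is hypothetical: three forward states (`init`, `elong`, `bg`), three tracks (ChIP, RNA "+" strand, RNA "-" strand), unit‐variance Gaussian emissions, reverse‐twin transitions obtained by time‐reversing the forward chain, and an ad hoc strand‐switching probability `eps`. Viterbi decoding then assigns a direction to every position.

```python
import math

# Toy bidirectional HMM sketch (hypothetical; NOT the authors' software).
# Each position has tracks (ChIP, RNA "+" strand, RNA "-" strand).
NAMES = ["init", "elong", "bg"]  # hypothetical forward states


def safe_log(x):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def twin_means(fwd_means):
    """Reverse twins keep the ChIP mean and swap the '+'/'-' RNA means."""
    return [(chip, minus, plus) for chip, plus, minus in fwd_means]


def stationary(A, iters=1000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(A)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
    return p


def reversed_chain(A):
    """Time-reversed chain: R[j][i] = p[i] * A[i][j] / p[j]."""
    p = stationary(A)
    n = len(A)
    return [[p[i] * A[i][j] / p[j] for i in range(n)] for j in range(n)]


def bidirectional_transitions(A, eps):
    """2K x 2K matrix: forward block A, reverse block R, ad hoc switch prob eps."""
    K = len(A)
    R = reversed_chain(A)
    T = [[eps / K] * (2 * K) for _ in range(2 * K)]  # cross-strand switches
    for i in range(K):
        for j in range(K):
            T[i][j] = (1 - eps) * A[i][j]          # forward twins
            T[K + i][K + j] = (1 - eps) * R[i][j]  # reverse twins
    return T


def viterbi(obs, means, T):
    """Most likely state path; unit-variance Gaussian emissions, uniform start."""
    n = len(means)

    def emit(s, x):  # Gaussian log-density up to an additive constant
        return -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, means[s]))

    score = [math.log(1.0 / n) + emit(s, obs[0]) for s in range(n)]
    back = []
    for x in obs[1:]:
        prev, ptr, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + safe_log(T[r][s]))
            ptr.append(best)
            score.append(prev[best] + safe_log(T[best][s]) + emit(s, x))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # backtrack through stored pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]


def toy_example():
    K = len(NAMES)
    A = [[0.8, 0.2, 0.0],   # init  -> init / elong
         [0.0, 0.9, 0.1],   # elong -> elong / bg
         [0.1, 0.0, 0.9]]   # bg    -> init / bg
    fwd = [(5, 0, 0), (5, 5, 0), (0, 0, 0)]
    means = fwd + twin_means(fwd)
    T = bidirectional_transitions(A, eps=0.01)
    # A gene on the "-" strand, read left to right (3' end first).
    obs = [(0, 0, 0)] * 2 + [(5, 0, 5)] * 3 + [(5, 0, 0)] * 2 + [(0, 0, 0)] * 2
    path = viterbi(obs, means, T)
    return [NAMES[s % K] + ("+" if s < K else "-") for s in path]


if __name__ == "__main__":
    print(toy_example())
```

On this toy minus‐strand gene the decoder returns `['bg-', 'bg-', 'elong-', 'elong-', 'elong-', 'init-', 'init-', 'bg-', 'bg-']`. The background and initiation positions have identical emissions for both twins, yet they are still assigned the reverse direction because of the transition structure. Time reversal is just one simple way to tie twin transitions together. A real model would estimate these parameters from data.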