Satellite submission
Satellite manuscript internal review page
Welcome to the FANTOM5 Satellite review page. As discussed at the Ume and Koyo meetings, all papers will be visible to consortium members. This is to allow everyone to know what is going on, promote collaboration, carry out due process regarding co-authorship and to avoid competition.
Authorship
The author list will basically be selected by the first author and the corresponding author of each satellite paper on the basis of the scientific contribution to the manuscript. Remember to include an authors contribution statement for all authors named in your manuscript (of the form AB carried out the cell isolation, SB carried out the network predictions etc.).
In addition, the FANTOM5 headquarters will name RIKEN members who should be co-authors for their input on each manuscript and to the entire FANTOM5 project. Those of you who have participated in previous FANTOMs will be familiar with this process; those new to FANTOM, please look at the author lists on the satellite paper collections for FANTOM2-4. FANTOM5 headquarters is currently discussing the policy for RIKEN co-authorship on the FANTOM5 satellites, but basically satellite papers will be considered on a case-by-case basis, taking into account datasets used, intellectual input and facilitating technologies/analyses for each paper.
At this stage, please name any authors from RIKEN that you think should definitely be included as co-authors. In addition, for all satellite submissions, include the term RIKEN_members as an additional author.
Updated flow for the FANTOM Phase 1 and Phase 2 paper submission and manuscript formatting guidelines
With the Phase 1 and 2 main papers published, we have changed the flow for Phase 1 and 2 manuscript submission. We have also consolidated information that was scattered across several e-mails sent to the FANTOM5 mailing list into a single document. We kindly ask everyone to stick to the flow below and the manuscript formatting guidelines linked in this section.
Please CC Erik Arner erik.arner@riken.jp and FANTOM5 Secretariat fantom5-secretariat@gsc.riken.jp in all correspondence connected to your paper.
General flow for satellite paper submission:
At draft stage:
1. All draft manuscripts should be entered into the wiki prior to submission - sharing these avoids internal competition and cases where co-authors are missed.
Phase1 satellite submission page: https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Satellite_submission
Phase2 satellite submission page: https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Time-courses_Satellite_papers
2. A contributions statement should be included on all manuscripts
3. Draft manuscripts should be sent to Erik Arner erik.arner@riken.jp (please CC fantom5-secretariat fantom5-secretariat@gsc.riken.jp as well) to assess RIKEN co-authorships. Each manuscript will be considered on a case-by-case basis, taking into account data production, optimization of mapping, data management, and the establishment of the collaboration and management of the project over the past 4 years. In a few cases we have also recommended additional authors from outside RIKEN if they had provided particularly critical sets of samples or co-ordinated analyses.
4. After authorship consideration by the FANTOM management, you will be notified about the RIKEN authorship and about the RIKEN author who will be the contact person within RIKEN for communicating with the first author.
5. Going through internal review (optional): If you would like your manuscript to undergo internal review, it can be organized within the FANTOM consortium. Please upload the latest copy to the Satellite submission page and contact Erik Arner erik.arner@riken.jp. Please allow at least a week for turnaround.
6. Be sure to follow the consortium manuscript formatting rules regarding authorship, affiliations, acknowledgements etc., as summarized in the below manuscript formatting manual with author checklist:
File:F5 satellite manuscript formatiing rules 150226.pdf
Before submission
7. Before submission to a journal, please be sure to send the manuscript to all the co-authors and get their feedback and consensus about the contents etc.
At the time of the submission/after submission
8. Please keep Erik Arner erik.arner@riken.jp, the RIKEN contact author and FANTOM5 Secretariat fantom5-secretariat@gsc.riken.jp updated about the paper's progress and any change in submission status. In case of revisions or resubmission after rejection, please be sure to send the manuscript to all the authors for confirmation. Please keep your wiki entries up to date.
9. After the paper is accepted, be sure to send the proof to all the authors for checking.
If you have any questions, please contact Erik Arner erik.arner@riken.jp (CC FANTOM5 Secretariat fantom5-secretariat@gsc.riken.jp).
Instructions
Please make a copy of the template below and enter your manuscript details.
If you are not able to edit the wiki yourself, please email the secretariat with the subject line "FANTOM5_satellite", but please understand that these requests will be processed when we can rather than immediately. You must fill in all of the details below and provide both a PDF that contains all figures and a Word document of the main text for reviewers to mark up directly.
Manuscripts
PUBLISHED OR ACCEPTED
Title: High-throughput transcription profiling identifies putative epigenetic regulators of hematopoiesis
ManuscriptID: Phase1_004
Status: PUBLISHED IN BLOOD
Abstract: The hematopoietic differentiation pathway is a complex regulatory program for generating different lineages of blood cell types from multipotent, hematopoietic stem cells. The transcriptional program dictating hematopoietic cell fate and differentiation requires an epigenetic memory function consisting of a network of enzymes controlling DNA methylation, histone posttranslational modifications and chromatin structure. Defective interactions between epigenetic enzymes and transcription factors cause perturbations in blood cell differentiation, which often leads to various types of hematopoietic disorders such as leukemia. To elucidate the contribution of different epigenetic factors in human hematopoiesis, high-throughput Cap Analysis of Gene Expression (CAGE) sequencing was used to build comprehensive transcription profiles of 199 epigenetic factors in a wide range of blood cells. These epigenetic factors include proteins that covalently modify DNA/histones or alter chromatin structure dynamics. Our analysis revealed several epigenetic factors to have expression profiles specific for cell type, lineage type and/or leukemic cell lines. In this report the ‘epigenetic transcriptome’ has been systematically studied to predict their potential functions in the epigenetic regulatory network of human hematopoiesis. The potential of such a comprehensive study is not only to identify putative epigenetic regulators of normal hematopoiesis and postulate their function but also to serve as a resource for the scientific community for further characterization and validation of differentially expressed transcripts.
Authors: Punit Prasad, Michelle Rönnerblad,...FANTOM5, Erik Arner, Karl Ekwall and Andreas Lennartsson
Authors contribution statement: PP and MR have done analysis and written the manuscript. EA has performed the initial CAGE analysis for the epigenetic factors and assisted in writing the manuscript. AL and KE have assisted in writing the manuscript, planned and coordinated the study. The authors declare no conflict of interest.
Datasets used: Helicos CAGE on ...
Target journal(s): Blood or other
Internal submission date: December 06, 2012
Contact by email: Andreas Lennartsson, Erik Arner
Word document version of manuscript for editors: File:Prasad et al Final F5 sec.docx
PDF version for general viewing (including all figs in one PDF): File:Prasad et al Blood 020213 .pdf
Title: Redefinition of the human mast cell transcriptome by deep-CAGE sequencing
ManuscriptID: Phase1_009
Status: PUBLISHED IN BLOOD
Abstract: Despite their haematopoietic origin, mast cells (MCs) mature exclusively in peripheral tissues, hampering research into their developmental and functional programs. Here, we employed deep-CAGE on skin-derived MCs to generate the most comprehensive view of the human MC transcriptome ever reported. A particular advantage is that MCs were embedded in the FANTOM5 project, giving the opportunity to contrast their molecular signature against an extensive panel of human samples. We demonstrate that MCs possess a unique and surprising transcriptional landscape, combining expression of typical haematopoietic genes with those exclusively active in MCs, and genes not previously reported as expressed in MCs. Specifically we found that MCs express functional BMP receptors, which transduce pro-survival and activatory signals. Conversely, several genes frequently studied in MCs were either not or only weakly expressed in direct comparison with other myelocytes. By the parallel use of MCs ex vivo and following culture, we also found that MCs change their transcriptome in in vitro surroundings. Befitting their uniqueness, MCs had no close relative in the haematopoietic network. This rich dataset reveals that our knowledge of human MCs is still fairly limited. It can be anticipated that with this resource novel functional programs of MCs will soon be discovered.
Authors: Efthymios Motakis,1,* Sven Guhl,2,* Yuri Ishizu,1 RIKEN OSC members,1 Torsten Zuberbier,2 Alistair R R Forrest,1¶ Magda Babina2¶
Authors contribution statement: E.M. carried out bioinformatics analyses, S.G. isolated the mast cells and performed most experiments, M.B. performed several experiments, was involved in planning, supervision, and data analysis, and wrote the first draft of the manuscript, E.M., S.G., A.R.R.F. and T.Z. helped with planning, data analysis and manuscript writing.
Datasets used: Helicos CAGE on mast cell samples in comparison to freeze 1 data
Target journal(s): Blood, eBlood
Internal submission date:
Contact by email: Magda Babina, Sven Guhl
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:BLOOD-2013-483792v1-Forrest.pdf
Title: Transcription and enhancer profiling in human monocyte subsets
ManuscriptID: Phase1_011
Status: PUBLISHED IN BLOOD
Abstract: Human blood monocytes comprise at least three subpopulations that differ in phenotype and function. Here we present the first in-depth regulome analysis of classical (CD14++CD16-), intermediate (CD14+CD16+), and nonclassical (CD14dimCD16+) monocytes. Cap Analysis of Gene Expression (CAGE) adapted to Helicos single molecule sequencing was used to map transcription start sites throughout the genome in all three subsets. In addition, global maps of H3K4me1 and H3K27ac deposition were generated for classical and nonclassical monocytes defining enhanceosomes of the two major subsets. We identify differential regulatory elements (including promoters and putative enhancers) that were associated with subset-specific motif signatures corresponding to different transcription factor activities and exemplarily validate a novel downstream enhancer of the CD14 locus. In addition to known subset specific features, pathway analysis revealed marked differences in metabolic gene signatures. While classical monocytes expressed higher levels of genes involved in carbohydrate metabolism priming them for anaerobic energy production, nonclassical monocytes expressed higher levels of oxidative pathway components and showed a higher routine mitochondrial activity. Our findings describe promoter/enhancer landscapes and provide novel insights into the specific biology of human monocyte subsets.
Authors: Christian Schmidl, Kathrin Renner, Ruediger Eder, Katrin Peter, Petra Hoffmann, Reinhard Andreesen, Marina P. Kreutz, RIKEN_OSC_members, Matthias Edinger, Michael Rehli
Authors contribution statement: CS performed experiments and computational analyses and wrote parts of the manuscript, KR performed experiments and contributed to manuscript writing, RE isolated the cells, KP performed experiments, PH, RA, MK, and ME contributed to planning and supervision, RIKEN_OSC_members organized or performed Helicos sequencing and provided aligned data; MR initiated, planned and supervised the study, performed computational analyses, and wrote the manuscript.
Datasets used: Helicos CAGE on monocyte subsets (Regensburg samples)
Target journal(s): Blood, eBlood, other
Internal submission date: September 1, 2012
Contact by email: Michael Rehli, Christian Schmidl
Word document version of manuscript for editors: File:Schmidl MonoSub.docx
PDF version for general viewing (including all figs in one PDF): File:Schmidl MonoSub.pdf
Title: The enhancer and promoter landscape of regulatory and conventional T cell subpopulations
ManuscriptID: Phase1_34
Status: PUBLISHED IN BLOOD
Abstract: CD4+CD25+FOXP3+ human regulatory T cells (Treg) are essential for self-tolerance and immune homeostasis. Here, we describe the promoterome of CD4+CD25highCD45RA+ naïve and CD4+CD25highCD45RA– memory Treg and their CD25– conventional T cell (Tconv) counterparts both before and after in vitro expansion by cap analysis of gene expression adapted to single molecule sequencing (HeliscopeCAGE). We performed comprehensive comparative digital gene expression analyses and revealed new orphan transcription start sites, of which several were validated as alternative promoters of known genes including FOXP3 and CTLA4. For all in vitro expanded subsets, we additionally generated genome-wide maps of poised and active enhancer elements marked by histone H3 lysine 4 monomethylation and histone H3 lysine 27 acetylation. Analysis of cell type-specific regulatory elements revealed a specific enrichment of several transcription factor binding motifs. We validated promising candidates by chromatin immunoprecipitation coupled to next generation sequencing and identified STAT5 and FOXP3 as well as RUNX1 and ETS1 as global regulators of Treg- and Tconv-specific enhancers, respectively. In summary we provide a highly detailed and easily accessible resource of gene expression and -regulation in Treg and Tconv subpopulations.
Authors: R
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): Blood
Internal submission date:
Contact by email: Christian Schmidl, Michael Rehli
Word document version of manuscript for editors: File:121027 FANTOM Treg manuscript.docx
PDF version for general viewing (including all figs in one PDF): File:Schmidl Treg.pdf
Title: Effect of cytosine methylation on transcription factor binding sites and regulation of transcription
ManuscriptID: Phase1_010
Status: PUBLISHED IN BMC GENOMICS
Abstract: Background: DNA methylation in promoters is strongly linked to downstream gene repression. However, the question remains as to whether DNA methylation is a cause or a consequence of gene repression. In the former case, DNA methylation may affect the affinity of transcription factors (TFs) towards their binding sites (TFBSs). In the latter case, gene repression caused by chromatin modification is stabilized by DNA methylation. Until now, the above-mentioned scenarios have been supported only by non-systematic evidence and have not been tested for a wide spectrum of TFs. Although the average promoter methylation is usually used in related studies, recent results suggested that methylation of individual cytosines can also be important.
Results: We found that for 16.6% of cytosines, the methylation profile and the expression profile of neighboring TSSs were significantly anti-correlated. We named CpGs corresponding to such cytosines “traffic lights”. We observed a strong selection against CpG “traffic lights” within TFBSs. The negative selection was stronger for transcriptional repressors as compared to transcriptional activators or multifunctional TFs as well as for core TFBS positions as compared to flanking TFBS positions.
Conclusions: Our results indicate that direct and selective methylation of certain TFBS that prevents TF binding is restricted to only special cases and cannot be considered as a general regulatory mechanism of transcription.
Authors: Yulia A Medvedeva, Abdullah Khamis, Ivan V Kulakovskiy, Wail Ba-Alawi, Md Shariful I Bhuyan, Hideya Kawaji, Timo Lassmann, Matthias Herbers, Alistair RR Forrest, Vladimir B Bajic and the FANTOM consortium
Authors contribution statement: YAM designed the computational experiments, selected and preprocessed the data, produced statistical analysis and wrote the manuscript; AK performed most of the data analysis; WBA and MdSIB contributed RDM models and tools for threshold estimation and mapping; [potential F5 collaborators], IVK performed part of the analysis, contributed to the design of the experiments and writing of the manuscript; VBB contributed to the design of the experiments and writing of the manuscript.
Datasets used: Helicos CAGE on 50 sample types, ENCODE RRBS data for the same samples
Target journal(s): Genome biology
Internal submission date: December, 16
Contact by email: Yulia Medvedeva
Word document version of manuscript for editors: File:Effect of cytosine methylation on transcription factor binding sites and regulation of transcription.doc
PDF version for general viewing (including all figs in one PDF): File:Effect of cytosine methylation on transcription factor binding sites and regulation of transcription.pdf
Additional files for general viewing: File:Effect of cytosine methylation on transcription factor binding sites Additional files.zip
Accepted version: File:Medvedeva et al accepted.zip
Title: The Evolution of Human Cells in terms of Protein Innovation
ManuscriptID: Phase1_013
Status: PUBLISHED IN MOLECULAR BIOLOGY AND EVOLUTION
Abstract: Humans are complex organisms composed of a great many cell types. Since the genomic DNA of each cell is identical, cell type is determined by what is expressed. We examine the evolutionary history of each human cell type at the molecular level via the collective histories of proteins, the principal product of gene expression. Sequence data from the FANTOM5 consortium are used to provide cell-type specific digital expression of protein-coding genes, and the SUPERFAMILY and dcGO resources provide domain and function annotation respectively. Cross-referencing with the domain annotation of all other completely-sequenced genomes provides the evolutionary context for each protein. We combine all of this to generate a description of cellular evolution at the molecular level.
We present a protein domain view of the evolution of cell type. To achieve this we first identify the most recent common ancestor (MRCA) or ‘creation epoch’ of every protein in the repertoire of the human genome. We are then able to use the protein creation epochs to describe the history of the emergence of each cell type over evolution in terms of the collective histories of the proteins expressed in that cell type. Each cell type has an evolutionary profile consisting of a timeline along the lineage from the ancient cellular ancestor to modern day human. The profile of each cell type shows at which epochs along the timeline innovations in protein evolution took place; required to allow the observed expression in that type of cell. By clustering cell types on these profiles, we find groups of cell types that share a parallel protein evolutionary history and thus potentially possess a common progenitor cell type or are evolving in cooperation. A functional enrichment analysis of these clusters reveals key proteins responsible for evolutionary shifts and functional innovations; it also suggests a possible order in which different cells could have emerged during evolution, which we discuss in relation to the human immune system. The structural domain-centric perspective which we employ in this work can also be used as the basis for a comparison of the molecular basis of functional and phenotypic differences between cell types within these evolutionary clusters, exemplified by an inspection of our results on different regions of the brain.
We present a view of the landscape of nature’s innovation of protein structure and architecture required to explain the creation of the different human cell types. This landscape has some important features such as the possibility that the last universal ancestor of life provided most of the innovation for the innate immune system whilst brain cells have been making use of novel proteins that first appeared in opisthokonta (animals and fungi) and continued to do so right up until homo sapiens. The landscape also lends itself to identifying candidate genes for disease by highlighting those that were important in enabling certain phenotypic shifts at key points in evolution.
Authors: Adam J. Sardar, Matt E. Oates, Hai Fang, Alistair R.R. Forrest, Hideya Kawaji, Julian Gough, Owen J.L. Rackham and the FANTOM Consortium
Authors contribution statement: FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to Yoshihide Hayashizaki and a Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to Y.H. We would like to thank all members of the FANTOM5 consortium for contributing to generation of samples and analysis of the dataset and thank GeNAS for data production. A.J.S. and M.E.O. were funded by BCCS studentships from EPSRC [EP/E501214]; another funding source was the BBSRC [BB/G022771/1 to J.G., funding O.J.L.R. and H.F.]. The authors would like to thank David de Lima Morais for useful discussion at the preliminary stages of this work.
Datasets used: Helicos CAGE on all samples
Target journal(s): GR
Internal submission date:
Contact by email: Julian Gough, Owen Rackham
Word document version of manuscript for editors: Rough draft available on request
PDF version for general viewing (including all figs in one PDF): submitted revisions: Full_Manuscript_Sardar_et_al.pdf File:TraP Journal Submission.zip File:GR Submission 3 March Sardar 2013 The Evolution of Human Cells in terms of Protein Innovation.pdf
Title: Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single molecule next generation sequencing
ManuscriptID: Phase1_027
Status: PUBLISHED IN GENOME RESEARCH
Abstract: CAGE (Cap Analysis Gene Expression) and RNA-seq are two major technologies used for transcript quantification. These protocols measure expression from either the 5’ end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified (Illumina, SOLiD, 454, Ion Torrent) 2nd generation sequencing platforms typically employ PCR pre-amplification prior to clonal amplification, while 3rd generation single molecule sequencers can sequence unamplified libraries. While these protocols individually have been demonstrated to be highly reproducible, no systematic comparison has been carried out between the protocols. Here we compare CAGE using both 2nd and 3rd generation sequencers and RNA-seq using a 2nd generation sequencer based on a panel of RNA mixtures from two human cell lines (THP-1 and HeLa, 100%, 50%, 20%, 10%, 5%, 1% and 0% of HeLa RNAs) to examine power to discriminate biological states, to detect differentially expressed genes, linearity of measurements as well as quantification reproducibility. Quantification by CAGE with the 2nd and 3rd generation sequencers (Illumina GA-IIx and HeliScope) was consistent at the gene level; however, we observed several differences, which can be explained by differences in their protocols and sequencing platforms. These include significant bias in the Illumina library, such as GC biases and over-estimation of transcripts harboring internal EcoP15I sites. A poorer correlation at the level of individual TSS positions, which is likely to be due to the higher indel rate in HeliScope, is also found. We found high consistency between HeliScopeCAGE and RNA-seq (Spearman correlation 0.88).
Differences between CAGE and RNA-seq are explained in most cases by incompleteness of existing gene models, where the 5’-ends of gene models do not reflect the actual transcription start site in the profiled cells, or RNA polymerase runs through the polyadenylation site, resulting in fusion of neighboring genes.
Authors: WP3
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used:
Target journal(s): Genome Res.
Internal submission date: 23rd Dec, 2012
Contact by email: KAWAJI Hideya
Submitted PDF: File:130215-PlatformEval-submittedGR.pdf
Title: Differential roles of epigenetic conversion and Foxp3 expression in regulatory T cell-specific transcriptional regulation
ManuscriptID: Phase1_029
Status: PUBLISHED IN PNAS
Abstract: Naturally occurring regulatory T (Treg) cells are engaged in the maintenance of immune tolerance and homeostasis. The development of Treg cells requires both the expression of the transcription factor Foxp3 and the establishment of Treg cell-type DNA hypomethylation pattern. By transcriptional start site (TSS) cluster analysis, we here assessed possible correlation of genome-wide DNA methylation pattern or Foxp3-binding pattern with Treg-specific gene expression. We found that Treg cell-specific DNA hypomethylated regions were closely correlated with Treg-upregulated TSS clusters, whereas Foxp3-binding regions had no significant correlation with either up- or down-regulated clusters, in non-activated Treg cells. On the other hand, in activated Treg cells, Foxp3-binding regions showed a strong correlation with down-regulated clusters. In silico search for transcription factor-binding motifs revealed that the motifs enriched in Foxp3-binding or Treg-specific DNA hypomethylated regions were mostly different. These results collectively indicate that Treg cell-specific DNA hypomethylation is conducive to up-regulation in the steady state Treg cells whereas Foxp3 expression to down-regulation of its target genes in activated Treg cells. Thus, the combination of the two events is required for the establishment of Treg cell-specific gene expression and function.
(185 words)
Authors: Hiromasa Morikawa1,2, Naganari Ohkura1, Alexis Vandenbon3, RIKEN_OSC_members 4, Daron Standley3, Hiroshi Date2, Shimon Sakaguchi1
1. Department of Experimental Immunology, World Premier International Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan
2. Department of Thoracic Surgery, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto, 606-8507, Japan
3. Department of Systems Immunology, World Premier International Immunology Frontier Research Center, Osaka University, Suita 565-0871, Japan
4. RIKEN Omics Center, Yokohama, Japan
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): Genome Research
Internal submission date: 2012/12/18
Contact by email: Hiromasa Morikawa
Word document version of manuscript for editors: Submit130114v3.docx
PDF version for general viewing (including all figs in one PDF): Submit130114v3.pdf
Title: An atlas of active enhancers across human cell types and tissues
ManuscriptID: Phase1_35
Status: PUBLISHED IN NATURE
Abstract: In higher organisms, cellular development and diversity is highly controlled by enhancers, which regulate the correct temporal and cell type-specific activation of gene expression. Despite their obvious importance for development and disease, the exact locations, target genes and mechanisms of enhancers are still poorly defined. Thus, there is an urgent need not only to identify enhancer locations, but also to elucidate their specific usage across the wide diversity of cells within the human body, their impact on regulation in healthy and diseased individuals, and how enhancers interact with target genes. Here, we use the FANTOM5 panel of tissue and primary cell samples covering the majority of human tissues and cell types to define an atlas of active, in vivo bidirectionally transcribed enhancers across the human body. It enables comparison of regulatory programs between different cells and tissues at unprecedented depth, and makes it possible to define distinct subsets of enhancers, including fetal-specific, cell-specific and ubiquitous enhancers – a novel enhancer subtype with distinct properties. We show that known target genes of enhancers can be recaptured using expression correlations and predict many novel enhancer-TSS associations. We present models confirming the utility of multiple redundant enhancers, which explain TSS expression strength rather than expression patterns. We demonstrate that disease-associated functional single nucleotide polymorphisms are over-represented in enhancers and that such enhancers often have disease-relevant expression patterns. The human enhancer atlas can be accessed through an online database and is a unique resource for studies on tissue/cell-specific enhancers and their gene interactions.
Authors: Robin Andersson1#, Claudia Gebhard2#, Irene Miguel-Escalada3, Ilka Hoof1, Xiaobei Zhao1, Christian Schmidl2, Eivind Valen1,4, Kang Li1, Lucia Schwarzfischer2, Dagmar Glatz2, Johanna Raithel2, Yun Chen1, Berit Lilje1, Nicolas Rapin1,5, Frederik Otzen Bagger1,5, Mette Jørgensen1, Mette Boyd1, Jette Bornholdt1, Kenneth Baillie6, Chris Mungall7, Timo Lassmann8, Hideya Kawaji8, Andreas Lennartsson9, Carsten Daub8,9, David Hume6, Peter Heutink10, Alistair Forrest8, Piero Carninci8, Yoshihide Hayashizaki8, Ferenc Müller3, Michael Rehli2*, Albin Sandelin1*
Authors contribution statement: RA, IH, EV, KL, YC, BL, XZ, MJ, HK, TL, KB, CM, NR, FOB, MR, AS performed the computational analysis. TL, HK, CD, AF, PC, YH prepared, mapped and analyzed CAGE libraries. RA, CG, IH, EV, FM, PC, AF, AK, MB, JBL, AL, CD, DH, PH, MR, AS interpreted results. CG, CS, ME, MR performed the blood cell ChIP experiments, methylation assays and in vitro blood cell validations. IME, FM performed zebrafish in vivo validations and interpretations. RA, CG, IH, FM, MR, AS wrote the paper.
Datasets used: phase1 CAGE peaks and raw CAGE mapped data from human, internal ChIP and other validation data
Target journal(s): To be decided
Internal submission date:
Contact by email: Michael Rehli, Albin Sandelin (robin@binf.ku.dk, michael.rehli@klinik.uni-regensburg.de, albin@binf.ku.dk)
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf File:Enhancerome full.pdf
Title: Analysis of DNA methylation and transcription during granulopoiesis reveals timed methylation changes in low CpG areas and regulation of transcription factor expression and motif activity
ManuscriptID: Phase1_001
Status: PUBLISHED IN BLOOD
Abstract: In development epigenetic mechanisms such as DNA methylation have been suggested to provide cellular memory to maintain pluripotency but also stabilize cell fate decisions and direct lineage restriction. In this study we set out to characterize changes in DNA methylation levels and gene expression during granulopoiesis using four distinct cell populations ranging from the oligopotent common myeloid progenitor stage to terminally differentiated neutrophils. We found a general decrease of DNA methylation during granulopoiesis. Methylation levels appear to change at specific differentiation stages and correlate with changes in transcription and motif activity of key hematopoietic transcription factors. Differentially methylated sites (DMSs) are preferentially located in areas distal to CpG islands and shores and are overrepresented in potentially regulatory enhancer elements. Overall this study depicts in detail the epigenetic and transcriptional changes that occur during granulopoiesis and supports the role of DNA methylation as a regulatory mechanism in cell differentiation.
Authors: Michelle Rönnerblad, Tor Olofsson, Sören Lehmann, RIKEN_OSC_members, Karl Ekwall*, Erik Arnér* & Andreas Lennartsson*
Authors contribution statement: MR did most of the practical experiments, the bioinformatic analysis (except the CAGE-related parts) and most of the manuscript writing; TO isolated the cells from bone marrow; SL gave valuable input to the planning and analysis and critically reviewed the manuscript; KE planned and supervised the study and contributed to the manuscript writing; EA supervised the bioinformatic analysis, performed the CAGE-related analyses and contributed to the manuscript writing; AL initiated, planned and supervised the study, contributed to the manuscript writing and did some experiments.
Datasets used: Helicos CAGE on granulo precursor populations
Target journal(s): Blood
Internal submission date: April 7th 2012
Contact by email: Andreas Lennartsson, Karl Ekwall, Erik Arner
Word document version of manuscript for editors: File:Rönnerblad prepub F5 sec.doc
PDF version for general viewing (including all figs in one PDF): File:Rönnerblad Aprl07.pdf
Title: Ceruloplasmin is a Novel Adipokine Which is Overexpressed in Adipose Tissue of Obese Subjects and in Obesity-Associated Cancer Cells
ManuscriptID: Phase1_32
Status: PUBLISHED IN PLOS ONE
Abstract: Obesity confers an increased risk of developing specific cancer forms. Although the mechanisms are unclear, increased fat cell secretion of specific proteins (adipokines) may promote/facilitate development of malignant tumors in obesity by cross-talk between adipose tissues and the tissues prone to develop cancer among obese. This was investigated using expression data from human adipose tissue of obese and non-obese as well as from a large panel of human cancer cell lines and corresponding primary cells and tissues. We identified three previously described adipokines, SERPINE1, SERPINE2 and C3 sharing a common cognate receptor LRP1 which was expressed in all cancer cell lines associated with obesity. Expression and secretion of SERPINE1 and C3 were increased in obese adipose tissue and their plasma levels were elevated in obese subjects. We also identified genes enriched in obesity-associated cancer cells compared to cell lines and corresponding healthy tissues or primary cells. We found expression of ceruloplasmin to be the most enriched in obesity-associated cancer cells. This gene was also significantly up-regulated in adipose tissue of obese subjects. Ceruloplasmin is the body’s main copper carrier and is involved in angiogenesis. We demonstrated that ceruloplasmin was a novel adipokine and that obese adipose tissue contributed markedly (22%) to the total protein level. In summary, we have identified several adipokines, which can serve as endocrine signals facilitating growth of obesity-associated cancer tumors. These adipocyte signals are increased in obesity and may be important for development of cancer associated with excess body fat.
Authors: Erik Arner, Alistair Forrest, Anna Ehrlund, Niklas Mejhert, [Additional RIKEN people?], Jurga Laurencikiene, Mikael Rydén, Peter Arner
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): Cancer Research
Internal submission date:
Contact by email: Erik Arner
Word document version of manuscript for editors: File:Fat cells and cancer draft 120816 EA.docx File:Figs 2012-08-15.ppt
PDF version for general viewing (including all figs in one PDF):
Title: Interactive visualization and analysis of large-scale NGS data-sets using ZENBU
ManuscriptID: Phase1_33
Status: PUBLISHED IN NATURE BIOTECHNOLOGY
Abstract: The world of genome sciences has dramatically changed over the last 5 years. With the advent of next generation sequencers and RNA-expression sequencing, genome science is no longer the domain of a few elite centralized "genome centers" like in the early days of the field. The advance of next-generation sequencers has spurred an ever-growing body of tag-based data allowing the survey of chromatin states and transcriptome dynamics. Visualization of expression levels of genomic regions was achieved by displaying expression levels in various experimental conditions in dedicated tracks allowing investigators a direct comparison of their dynamics. Novel file formats and browser design have allowed for dealing efficiently with the depth of data produced by next-generation sequencer based technologies. Researchers need to interact within global collaborations and need easy ways to process, share and visualize their data in a secured manner prior to publication. To this end we have developed the ZENBU system. ZENBU is a web based system which is a social networking platform for secured data upload and data sharing with collaborators, a data processing system, and a visualization system. ZENBU provides the infrastructure for working with 100s of terabytes of sequence data in the form of BAM sequence alignment files and genome annotation formats like BED and GFF, to efficiently cross-analyze these datasets using a Map-Reduce/autonomous-agent based parallel processing system, and provide fast efficient web services for user interfaces. The user interface for ZENBU is based on Web 2.0 technologies in the form of a new expression-enhanced genome browser, and data manipulation interfaces for data upload, data processing, and data download. ZENBU currently contains the entire FANTOM 3/4/5 datasets, the entire ENCODE datasets, and much of the UCSC genome annotation data. ZENBU is planned to be a cornerstone in the expanding global network of scientific sharing web systems.
Authors: Jessica Severin*, Marina Lizio, Jayson Harshbarger, Hideya Kawaji, Carsten Daub, The FANTOM5 consortium, Yoshihide Hayashizaki, Nicolas Bertin*, Alistair Forrest*
Authors contribution statement: JMS, ML, JH, HK, CD, YH, NB, AL
- JMS, wrote the software/webservices.
- JMS, NB, planned the study.
- NB supervised the study.
- JMS, NB, contributed to the manuscript writing.
- JMS, NB, gave valuable input to the analysis in the manuscript.
- JMS, NB, critically reviewed the manuscript.
- [addition of any other, clearer or more precise statement is very welcome]
Datasets used: phase1 CAGE peaks
Target journal(s): Nature Biotech/Genome Research
Internal submission date:
Contact by email: Jessica Severin, Nicolas Bertin, Alistair Forrest
Word document version of the most up to date manuscript draft: File:ZENBU manuscript.014 (1).docx
Word document version of manuscript for editors: [[]]
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines
ManuscriptID: Phase1_017
Status: PUBLISHED IN BMC GENOMICS
Abstract: Background: Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects. Results: Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin pattern are validated by significant correlations in their global expression profiles. Conclusions: The results confirm the link between chromatin state and cellular function, but they also show that the relationship between chromatin state and transcription is more subtle than previously appreciated.
Authors: Morten Rye, Geir Kjetil Sandve, Finn Drablos
Authors contribution statement: MR, GKS and FD did data analysis and wrote the paper
Datasets used: Helicos CAGE data, ENCODE chromatin ChIP-Seq and DNase HS data
Target journal(s): Genome Biology
Internal submission date: 01.03.2013
Contact by email: Finn Drablos, Morten Rye
Word document version of manuscript for editors: File:Internal submission draft FD GKS MBR 010313.docx
PDF version for general viewing (including all figs in one PDF): File:Main figures 01032015.pdf Supplementary figures: File:All supplem figs 01032013.pdf
Title: Transcriptional profiling by deep CAGE of the human fibrillin/LTBP gene family, key regulators of mesenchymal cell functions.
ManuscriptID: Phase1_014
Status: PUBLISHED IN MOLECULAR GENETICS AND METABOLISM
Abstract: The fibrillins and latent transforming growth factor binding proteins (LTBPs) form a superfamily of extracellular matrix (ECM) proteins characterized by the presence of a unique domain, the 8-cysteine transforming growth factor beta (TGFβ) binding domain (TB domain). These proteins are involved in both maintaining the extracellular matrix and controlling the bioavailability of TGFβ family members. Genes encoding these proteins show differential expression in mesenchymal cell types which synthesise the extracellular matrix and form connective tissues. We have investigated the promoter regions of the seven gene family members using the FANTOM5 CAGE data base for human. Although the protein and nucleotide sequences show considerable homology, the promoter regions were quite diverse. The three fibrillin genes had a single predominant promoter cluster, while LTBP1 and LTBP4 showed promoter switching. Most of the family members were expressed in a range of mesenchymal and other cell types, often associated with use of alternative promoters or transcription start sites within a promoter. FBN3 was the lowest expressed gene, and was expressed only in embryonic and fetal tissues, primarily neurological. There was evidence of enhancer activity likely to be involved in expression of the genes. Each gene showed a unique pattern of transcription factor motifs or activity. This study highlights the role of alternative transcription start sites in regulating the tissue specificity of closely related genes and suggests that this important class of extracellular matrix genes is subject to subtle regulatory variations that explain the differential roles of members of this gene family.
Authors: Margaret R Davis, RIKEN OSC members, Kim M Summers
Authors contribution statement: MRD performed the analysis and contributed to writing the paper, RIKEN OSC did ..., KMS performed the analysis and contributed to writing the paper
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: kim.summers@roslin.ed.ac.uk
Word document version of manuscript for editors: File:Fantom5 FBN paper 22-08-13.doc, File:Supplementary Table 1.pdf, File:Supplementary Table 2.xlsx, File:Supplementary Table 3.xlsx, File:Supplementary Table 4.xls
PDF version for general viewing (including all figs in one PDF): File:Fibrillin-LTBP satellite.pdf
Revised version of paper:
File:Summers Phase1 014 revision 18Apr2013.pdf
Final submitted manuscript File:Summers et al paper revision.docx, File:Summers et al Supplementary figure revision.pdf, File:Summers et al Supplementary Table 1.pdf, File:Summers et al Supplementary Table 2.xlsx, File:Summers et al Supplementary Table 3.xlsx, File:Summers et al Supplementary Table 4.xls
Title: RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE).
ManuscriptID: Phase1_030
Status: PUBLISHED IN BMC GENOMICS
Abstract: Next generation sequencing based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the most 5’ ends of RNA molecules. After mapping the sequenced reads back to a reference genome, CAGE data highlights the transcriptional start sites (TSSs) and their usage at a single nucleotide resolution. We propose a pipeline to group the single nucleotide TSS into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicas. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not. By examining the reproducible fine scaled organization of TSS we can detect many differentially regulated peaks undetected by previous approaches.
Authors: Hiroko Ohmiya, Morana Vitezic, Martin Frith, Yoshihide Hayashizaki, Timo Lassmann and many more
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: FANTOM5 human primary cells and HeLa and THP-1 cells by Kanamori-Katayama et al.
Target journal(s):
Internal submission date:
Contact by email: Timo Lassmann
Word document version of manuscript for editors: File:Manuscript Ohmiya Mar04.doc
PDF version for general viewing (including all figs in one PDF): File:Manuscript Ohmiya et al.pdf
Title: Explaining the correlations among properties of mammalian promoters
ManuscriptID: Phase1_003
Status: PUBLISHED IN NUCLEIC ACIDS RESEARCH
Abstract: Proximal promoters are fundamental genomic elements for gene expression. They vary in terms of: GC percentage, CpG abundance, presence of TATA signal, evolutionary conservation, chromosomal spread of transcription start sites, and breadth of expression across cell types. These properties are correlated, and it has been suggested that there are two classes of promoter: one class with high CpG, widely spread transcription start sites, and broad expression, and another with TATA signals, narrow spread and restricted expression. It has been unclear, however, why these properties are correlated in this way.
We re-examined these features using the deep FANTOM5 CAGE data from hundreds of cell types. Firstly, we point out subtle but important biases in previous definitions of promoters and of expression breadth. Secondly, we show that most promoters are rather non-specifically expressed across many cell types. Thirdly, promoters' expression breadth is independent of maximum expression level, and therefore correlates with average expression level. Fourthly, the data show a more complex picture than two classes, with a network of direct and indirect correlations among promoter properties. By tentatively distinguishing the direct from the indirect correlations, we reveal simple explanations for them.
Authors: M.C. Frith and the FANTOM Consortium
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: All human and mouse Phase1 CTSSs
Target journal(s): Genome Research(?)
Internal submission date: Feb 2013
Contact by email: Martin Frith
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:Mcf-prom-sat.pdf
Supplement: File:Mcf-prom-sat-sup.pdf
Title: CAGExploreR: an R package for the analysis and visualization of promoter dynamics across multiple experiments
ManuscriptID: Phase1_043
Status: PUBLISHED IN BIOINFORMATICS
Abstract: Alternate promoter usage is an important molecular mechanism for generating RNA and protein diversity. Cap Analysis Gene Expression (CAGE) is a powerful approach for revealing the multiplicity of transcription start site (TSS) events across experiments and conditions. An understanding of the dynamics of TSS choice across these conditions requires both sensitive quantification and comparative visualization. We have developed CAGExploreR, an R package to detect and visualize changes in the utilization of specific TSS in wider promoter regions in the context of changes in overall gene expression when comparing different CAGE samples. These changes provide insight into the modification of transcript isoform generation and associated regulatory network alterations associated with cell types and conditions. CAGExploreR is based on the FANTOM5 and MPromDb promoter set definitions but can also work with user-supplied regions. The package compares multiple CAGE libraries simultaneously and does not require replicates. Online supplementary materials describe methods in detail and a vignette demonstrates a workflow with a real data example.
Authors: Emmanuel Dimont, Alistair R. R. Forrest, Hideya Kawaji, the FANTOM Consortium and Winston Hide.
Authors contribution statement: ED developed the method, the R package (software), wrote the paper and supplementary materials plus figures, AF created the original idea and provided data, HK created DPI TSS clusters (promoters), WH formulated the idea, wrote the paper and provided funding, FC provided funding and data.
Datasets used: phase1 DPI clusters, ENCODE CAGE data for MCF7 and A549 cell lines
Target journal(s): Bioinformatics (Application Note)
Internal submission date: October 7th, 2013
Contact by email: Emmanuel Dimont (edimont@mail.harvard.edu)
Word document version of manuscript for editors: The latest version of the manuscript, supplementary methods, R package and vignette can be found at here.
PDF version for general viewing (including all figs in one PDF): see above
NOTE: This is a paper describing what used to be called "SwitchEngine"
ManuscriptID: Phase1_036
Status: PUBLISHED IN STEM CELLS
Abstract: The role of cancer microenvironment is being recognized as one of the critical hallmarks in both cancer progression and metastasis. Mesenchymal Stem/Stromal Cells (MSCs) are the precursors of various cell types that compose both normal and cancer tissue microenvironments. We have isolated MSCs from various High-Grade Serous Ovarian Carcinomas (HG-SOCs), demonstrated their normal genotype, and analyzed their transcriptome with respect to similarly derived normal tissues MSCs (N-MSCs), all embedded in the large comprehensive FANTOM5 sample dataset. An integrative analysis was conducted against the extensive panel of primary cells and tissues of the FANTOM5 project that allowed us to identify a cell-type specific transcriptional activity associated with the HG-SOC-MSCs. In fact the analysis shows that HG-SOC-MSCs retain a specific identity when compared to N-MSCs and are related to the primary mesothelial or mesothelial-derived cells representing the ovarian cellular precursors. Our results support the hypothesis that HG-SOC-MSCs are bona-fide representatives of the ovarian district thus tracing their origin either to the local mesothelium or highlighting the epigenetic conditioning of externally recruited MSCs by the HG-SOC cancer cell compartment.
Authors: Roberto Verardo, Silvano Piazza, Enio Klaric, Yari Ciani, Stefania Marzinotto, Laura Mariuzzi, Daniela Cesselli, Antonio P. Beltrami, Masayoshi Itoh, Hideya Kawaji, Timo Lassmann, Piero Carninci, Yoshihide Hayashizaki, Alistair R.R. Forrest, Carlo A. Beltrami, Claudio Schneider and the FANTOM consortium
Authors contribution statement: R.V., S.P. and C.S. designed research and analyzed all the data; R.V. followed all sample RNA/DNA quality controls; S.P. designed software, carried out statistical tests and bioinformatics analysis; Y.C. implemented part of the software and prepared some figures; E.K. performed molecular biology assays; R.V., S.M., L.M., D.C., and A.P.B. performed cell isolation and characterization, R.V., D.C., A.P.B., C.A.B. and C.S. analyzed cell-biology data; M.I. was responsible for CAGE data production; T.L. was responsible for tag mapping; H.K. managed the data handling; P.C., Y.H. and A.R.R.F. were responsible for FANTOM5 management and concept; CS supervised the whole study; R.V., S.P. and C.S. wrote the manuscript.
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date: October 15th 2012
Contact by email: Claudio Schneider
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:Claudio.pdf
Title: A transient disruption of a fibroblast-specific transcriptional regulatory network potently promotes trans-differentiation
ManuscriptID: Phase1_40
Status: PUBLISHED IN NAR
Abstract: Background: Transcriptional Regulatory Networks (TRNs) coordinate multiple transcription factors (TFs) in concert to maintain homeostasis and cellular function. The re-establishment of TRNs has been previously implicated in direct trans-differentiation studies where the newly introduced TFs switch on a set of key regulatory factors to induce de novo expression and function. However, the extent to which TRNs in starting cell types, such as dermal fibroblasts, protect the cells from undergoing cellular reprogramming remains largely unexplored. Results: In order to identify specific TFs in fibroblasts, we first modeled the TRN of fibroblast cells using a Matrix-RNAi approach where 18 fibroblast-specific TFs were systematically knocked down and profiled. The resulting expression matrix revealed 7 highly interconnected TFs as targetable factors. Interestingly, suppressing 4 out of 7 TFs generated lipid droplets and induced PPARG and CEBPA expression in the presence of adipocyte-inducing medium, while the control knockdown maintained fibroblastic characteristics in the same induction regime. The global gene expression analysis further revealed that the knockdown-induced adipocytes (KDiADP) highly expressed genes associated with lipid metabolism and significantly suppressed fibroblast-specific genes. Conclusion: Overall, this study reveals the critical role of the TRN in protecting cells against aberrant reprogramming, and demonstrates, for the first time, the vulnerability of the TRN, which may be a novel target to induce transgene-free trans-differentiations.
Authors: Yasuhiro Tomaru, Ryota Hasegawa, Jay W. Shin, Takahiro Suzuki, Taiji Sato, Atsutaka Kubosaki, Masanori Suzuki, Yoshihide Hayashizaki and Harukazu Suzuki
Authors contribution statement: YT designed and carried out experiments, analyzed and wrote the paper. RH carried out experiments, supported statistical analysis and wrote the paper. JS generated expression data, analyzed and wrote the paper. TS, TS and AK carried out validation of KDiADP cells. MS carried out editing of the manuscript. YH and HS coordinated all efforts and supervised the project
Datasets used: phase1 CAGE peaks
Target journal(s): Genome Biology
Internal submission date: May 20th, 2013
Contact by email: Harukazu Suzuki, Jay Shin
Word document version of manuscript for editors: File:Manuscript-YT-May17.docx
PDF version for general viewing (including all figs in one PDF): File:Tomaru F5 wiki.pdf
ManuscriptID: Phase1_0X
Status: PUBLISHED IN SCIENTIFIC REPORTS
Abstract: Standard culture of human induced pluripotent stem cells (hiPSCs) requires basic Fibroblast Growth Factor (bFGF) to maintain the pluripotent state, whereas hiPSCs more closely resemble epiblast stem cells than true naïve-state ES cells, which require LIF to maintain pluripotency. Here we show that chemokine (C-C motif) ligand 2 (CCL2) enhances the expression of pluripotent marker genes through the phosphorylation of the signal transducer and activator of transcription 3 (STAT3) protein. Moreover, by comparing the transcriptomes of hiPSCs cultured with CCL2 versus bFGF, we found that CCL2 activates hypoxia-related genes, suggesting that CCL2 enhances pluripotency by inducing a hypoxic-like response. Further, we show that hiPSCs cultured with CCL2 can differentiate at a higher efficiency than those cultured with just bFGF, and that CCL2 can be used in feeder-free conditions in the absence of LIF. Taken together, our findings indicate novel functions of CCL2 in enhancing pluripotency in hiPSCs.
Authors: Yuki Hasegawa, Dave Tang, Naoko Takahashi, Yoshihide Hayashizaki, Alistair R. R. Forrest, the FANTOM consortium & Harukazu Suzuki
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: [3]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Mesencephalic dopaminergic neurons express a repertoire of olfactory receptors and respond to odorant-like molecules
ManuscriptID: Phase1_0X
Status: PUBLISHED IN BMC GENOMICS
Abstract: Background: The mesencephalic dopaminergic (mDA) cell system is composed of two major groups of projecting cells in the Substantia Nigra (SN) (A9 neurons) and the Ventral Tegmental Area (VTA) (A10 cells). Selective degeneration of A9 neurons occurs in Parkinson’s disease (PD) while abnormal function of A10 cells has been linked to schizophrenia, attention deficit and addiction. The molecular basis that underlies selective vulnerability of A9 and A10 neurons is presently unknown. Results: By taking advantage of transgenic labeling, laser capture microdissection coupled to nano Cap-Analysis of Gene Expression (nanoCAGE) technology on isolated A9 and A10 cells, we found that a subset of Olfactory Receptors (OR)s is expressed in mDA neurons. Gene expression analysis was integrated with the FANTOM5 Helicos CAGE sequencing datasets, showing the presence of these ORs in selected tissues and brain areas outside of the olfactory epithelium. OR expression in the mesencephalon was validated by RT-PCR and in situ hybridization. By screening 16 potential ligands on 5 mDA ORs recombinantly expressed in a heterologous in vitro system, we identified carvone enantiomers as agonists at Olfr287 and able to evoke an intracellular Ca2+ increase in solitary mDA neurons. ORs were found expressed in human SN and down-regulated in PD post mortem brains. Conclusions: Our study indicates that mDA neurons express ORs and respond to odor-like molecules providing new opportunities for pharmacological intervention in disease.
Authors: Alice Grison†, Silvia Zucchelli†, Alice Urzì, Ilaria Zamparo, Dejan Lazarevic, Giovanni Pascarella, Paola Roncaglia, Alejandro Giorgetti, Paula Garcia-Esparcia, Christina Vlachouli, Roberto Simone, Francesca Persichetti, Alistair RR Forrest, Yoshihide Hayashizaki, Paolo Carloni, Isidro Ferrer, Claudia Lodovichi, Charles Plessy, the FANTOM Consortium†, Piero Carninci* and Stefano Gustincich
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: [4]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators.
ManuscriptID: Phase1_020
Status: PUBLISHED IN GENOME BIOLOGY
Abstract: Background: Conventional wisdom holds that, owing to the dominance of features such as chromatin level control, the expression of a gene cannot be readily predicted from knowledge of promoter architecture. This is reflected, for example, in a weak or absent correlation between promoter divergence and expression divergence between paralogs. However, an inability to predict may reflect an inability to accurately measure or employment of the wrong parameters. Here we address this issue through integration of two exceptional resources: ENCODE data on transcription factor binding and the FANTOM5 high-resolution expression atlas. Results: Consistent with the notion that in eukaryotes most transcription factors are activating, the number of transcription factors binding a promoter is a strong predictor of expression breadth. In addition, evolutionarily young duplicates have fewer transcription factor binders and narrower expression. Nonetheless, we find several binders and cooperative sets that are disproportionately associated with broad expression, indicating that models more complex than simple correlations should hold more predictive power. Indeed, a machine learning approach improves fit to the data compared with a simple correlation. Machine learning could at best moderately predict tissue of expression of tissue specific genes. Conclusions: We find robust evidence that some expression parameters and paralog expression divergence are strongly predictable with knowledge of transcription factor binding repertoire. While some cooperative complexes can be identified, consistent with the notion that most eukaryotic transcription factors are activating, a simple predictor, the number of binding transcription factors found on a promoter, is a robust predictor of expression breadth.
Authors: Lukasz Huminiecki and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ..., F5 promoter and enhancer datasets, TreeFam8
Target journal(s): Genome Research
Internal submission date: September 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: FANTOM5 reveals the genomic architecture of the genes implicated in Rett Syndrome
ManuscriptID: Phase1_038
Status: PUBLISHED IN BMC GENOMICS
Abstract: Mutations in MECP2, FOXG1 and CDKL5 genes cause Rett Syndrome, a neuro-developmental disorder of the grey matter of the brain that almost exclusively affects females. We analyzed the RNA expression data from the FANTOM5 project in both human and mouse to investigate the genomic architecture of the three genes involved in Rett syndrome. Data from FANTOM 5 provides the unprecedented opportunity to study the expression profile, identify transcription start sites and, in conjunction with the recently released ENCODE dataset, identify the regulatory regions and transcription regulators of the three genes implicated in Rett Syndrome. Even though MECP2 and CDKL5 are expressed ubiquitously, mutations in these genes cause a brain specific phenotype suggesting that their role in brain is distinctly important from their function in other tissues.
Authors: Morana Vitezic, Leonard Lipovitch, Alistair RR Forrest, Piero Carninci, Alka Saxena
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): NAR
Internal submission date: December 2012
Contact by email: Morana Vitezic, Alka Saxena
Word document version of manuscript for editors: File:Rett paper.doc File:Rett paper figures.zip File:Rett paper supplementary.zip
Title: Gateways to the FANTOM5 promoter level mammalian expression atlas
ManuscriptID: Phase1_025
Status: PUBLISHED IN GENOME BIOLOGY
Abstract: Background. Identification and quantification of the RNAs transcribed within different cell types are essential steps toward understanding the full complexity of the functional components of the genome and, ultimately, the entire cellular system. Most previous studies involving the collection of a large set of genome-wide transcription profiles consist of tissues and/or cell lines. In the FANTOM5 project we have investigated transcription activities in more than one thousand human and mouse primary cells, cell lines and tissues using CAGE in combination with a single molecule sequencer (HeliScope™) which allows the highly accurate identification of transcription initiation sites and expression levels without the bias of PCR amplification.
Results. To facilitate the exploration of our large-scale data, we assembled the FANTOM5 expression profiles and subsequent analyses into a centralized data resource with an open-access on-line interface. Descriptions of individual samples are carefully curated manually, and an application ontology that uses classes from established ontologies for cell types, anatomy, and diseases is developed to group related samples systematically based on sample types. Web interface-based databases and visualization tools (SSTAR, ZENBU, BioLayout Express3D, TET, BioMart, UCSC genome browser, and more) are provided in an integrative manner to allow research scientists to search, navigate, and extract data related to samples, genes, promoter activity, and transcription factor gene regulation across the entire FANTOM5 atlas.
Conclusions. This combination of software tools, curated databases and systematic sample annotation gives the scientific community powerful tools to explore, examine, and extract data in multiple ways. Here we introduce the online resources, their underlying data structure, and discuss potential impacts in cell, genome and molecular biology.
Authors: Marina Lizio, Jayson Harshbarger, Takeya Kasukawa, Hisashi Shimoji, Jessica Severin, Serkan Sahin, Christopher J. Mungall, Fumi Hori, Sachi Ishikawa-Kato, Shiro Fukuda, Terrence F. Meehan, Alexander D. Diehl, J. Kenneth Baillie, Tom C. Freeman, Derek Wright, Emmanuel Dimont, Winston Hide, Toshiaki Katayama, Zuotian Tatum, Mark Thompson, Erik A. Schultes, Peter A.C. 't Hoen, Rajaram Kaliyaperumal, Tetsuro Toyoda, Koro Nishikata, Albin Sandelin, Erik Arner, Hidemasa Bono, Hiromasa Ono, Kaori Fujieda, Michael Rehli, Michiel de Hoon, Nicolas Bertin, Timo Lassmann, Carsten Daub, Masayoshi Itoh, Piero Carninci, Yoshihide Hayashizaki, Alistair R.R. Forrest, Hideya Kawaji and the FANTOM Consortium
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors:
PDF version for general viewing (including all figs in one PDF):
package of word, pdf, etc: File:130225-F5web-resource.zip
Title: The Statistical Geometry of Transcriptome Divergence in Cell Type Evolution and Cancer
ManuscriptID: Phase1_045
Status: PUBLISHED IN NATURE COMMUNICATIONS
Abstract: In evolution the complexity of organisms increases, in part, due to an increase in the number of individualized cell types. The number of recognizable cell types varies between metazoan lineages by two to three orders of magnitude, from five in the primitive metazoan Trichoplax to at least 500 in humans. Yet, there is very little understanding of the mechanisms that produce this form of organismal complexity. One model for the origin of novel cell types is the sister cell type model (Arendt, 2008, Nat. Rev. Genet.). According to this model, each new cell type arises together with a sister cell type through specialization from an ancestral cell type. A key prediction of the sister cell type model is that the gene expression patterns of cell types should exhibit a tree structure, i.e. more recently diverged cell types should be more similar in terms of gene expression than more distantly related cell types. Here we present a new statistical model for detecting tree structure (“treeness”) in transcriptomic data based on statistical geometry and apply this method to transcriptomes of twelve normal cells from ENCODE and 168 cells of the FANTOM5 project. The analysis of these data shows that transcriptomic data of cell types harbor substantial amounts of hierarchical structure, consistent with the predictions of the sister cell type model. In contrast, cancer cell lines have much less tree structure, suggesting that the emergence of cancer cells follows different principles from normal cell type evolution. Using replicate data, the method presented here can also be used to test whether different samples belong to distinct cell types or represent variants of the same cell type.
Authors: Cong Liang, the FANTOM5 consortium, Alistair R. R. Forrest, Gunter P. Wagner
Authors contribution statement: CL did the analysis, GPW oversaw the project
Datasets used: phase1 CAGE TPM tables
Target journal(s): Nature Communications
Internal submission date: July 2014
Contact by email: Gunter Wagner and Cong Liang
PDF version for general viewing (including all figs in one PDF): File:Ncomms7066.pdf
Title: Transcription factor, promoter and enhancer utilisation in human myeloid cells
ManuscriptID: Phase1_049
Status: PUBLISHED IN JLB
Abstract: ...
Authors: Anagha Joshi, Christopher Pooley, Tom Freeman, Andreas Lennartsson, Magda Babina, Christian Schmidl, Teunis Geijtenbeek, Tom Michoel, Jessica Severin, Masayoshi Itoh, Timo Lassmann, Hideya Kawaji, Yoshihide Hayashizaki, Piero Carninci, Alistair Forrest, Michael Rehli, and David Hume
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: [5]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Discovery of molecular markers to discriminate corneal endothelial cells in the human body
ManuscriptID: Phase1_046
Status: PUBLISHED IN PLOS ONE
Abstract: Corneal endothelial cells (CECs) are critical in maintaining corneal transparency. Corneal transplantation is the only way to treat severe dysfunction, and generation of CECs from other cell types attracts increasing interest, since current transplantation suffers from donor shortage. Precise identification of CECs is an essential step in these efforts; however, the current markers used to identify CECs are far from satisfactory because they are also expressed in other cell types. In this study, we explored molecular markers discriminating CECs from any other cell types in the human body by integrating published RNA-seq data obtained from CECs with the FANTOM5 promoter-level expression atlas, which represents a diverse range of human cell types (Figure 1). Independent experiments confirmed the resulting six proteins as CEC markers (Figure 2), and surprisingly none of them has been used as a CEC marker so far. Our result paves a clearer way to generate CECs in vitro, contributes to the re-definition of CECs themselves, and indicates novel functions of these proteins in CECs.
Authors: Masahito Yoshihara1,2, Hiroko Ohmiya2,3, Susumu Hara1, Satoshi Kawasaki1, Yoshihide Hayashizaki4, Masayoshi Itoh2,4, Hideya Kawaji2,3,4,$, Motokazu Tsujikawa1, Kohji Nishida1,$
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: [6]
Word document version of manuscript for editors: File:140424-CEC-inquiry.docx
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: A draft network of ligand–receptor-mediated multicellular signalling in human
ManuscriptID: Phase1_048
Status: PUBLISHED IN NATURE COMMUNICATIONS
Abstract: Cell-to-cell communication across multiple cell types and tissues strictly governs proper functioning of metazoans and extensively relies on interactions between secreted ligands and cell-surface receptors. Herein, we present the first large-scale map of cell-to-cell communication between 144 human primary cell types. We reveal that most cells express tens to hundreds of ligands and receptors to create a highly connected signalling network through multiple ligand–receptor paths. We also observe extensive autocrine signalling with approximately two-thirds of partners possibly interacting on the same cell type. We find that plasma membrane and secreted proteins have the highest cell-type specificity, they are evolutionarily younger than intracellular proteins, and that most receptors had evolved before their ligands. We provide an online tool to interactively query and visualize our networks and demonstrate how this tool can reveal novel cell-to-cell interactions with the prediction that mast cells signal to monoblastic lineages via the CSF1–CSF1R interacting pair.
Authors: Jordan A. Ramilowski, Tatyana Goldberg, Edda Kloppman, Jayson Harshbarger, Venkata P. Satagopam, Piero Carninci, the FANTOM consortium, Burkhard Rost, Alistair R.R. Forrest
Authors contribution statement: JAR did ..., TG did ..., KE did ..., JH did ..., AF did ...
Datasets used: phase1 CAGE peaks
Target journal(s): presubmission Nat. Gen.
Internal submission date:
Contact by email: [7]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Complementing tissue characterisation by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium
ManuscriptID: Phase1_042
Status: PUBLISHED IN NAR
Abstract: Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE. Here, we report on the first large-scale complex tissue transcriptome comparison between full-length vs. 5’-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two datasets show high genome-wide correlation (91% agreement), with differences observed for a small number of individual genes indicating the need to update their gene models. Among the identified single-tissue enriched genes, 75% showed consensus of 7 fold enrichment in the same tissue in both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other dataset, likely dependent on the cell type proportions included in each tissue sample. Our results show that RNA-Seq and CAGE tissue transcriptome datasets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.
Authors: Nancy Yu, Björn Hallström, Linn Fagerberg, Fredrik Ponten, Hideya Kawaji, Piero Carninci, Alistair R. R. Forrest, The FANTOM Consortium, Mathias Uhlèn, Carsten O. Daub
Authors contribution statement: NY, CD and MU conceived the idea for this project. BH performed the mapping and quantification of gene expressions. NY designed and performed most of the data analyses and wrote the manuscript. BH and LF contributed to some of the analyses as well as the generation of some of the figures. HK, ARRF, and PC were involved in FANTOM5 data generation and manuscript editing. CD and MU supervised and assisted with manuscript writing. All members contributed to discussions and directions of the project.
Datasets used: phase1 CAGE peaks, HPA RNA-Seq data
Target journal(s): Nucleic Acids Research
Submission date: March 2015
Contact by email: Nancy Yu, Carsten Daub
Word document version of manuscript for editors: File:ManuscriptNAR resubmission revised.doc
PDF version for general viewing (including all figs in one PDF): File:PDF Proof.pdf
Title: The frequent evolutionary birth and death of functional promoters in mouse and human
ManuscriptID: Phase1_042
Status: PUBLISHED IN GENOME RESEARCH
Abstract: Promoters are central to the regulation of gene expression. Changes in gene regulation are thought to underlie much of the adaptive diversification between species and phenotypic variation within populations. In contrast to earlier work emphasizing the importance of enhancer evolution and subtle sequence changes at promoters, we show that dramatic changes such as the complete gain and loss (collectively turnover) of functional promoters are common. Using quantitative measures of transcription initiation in both humans and mice across 52 matched tissues we discriminate promoter sequence gains from losses and resolve the lineage of changes. We also identify expression divergence and functional turnover between orthologous promoters, finding only the latter is associated with local sequence changes. Promoter turnover has occurred at the majority (>56%) of protein-coding genes since humans and mice diverged. Tissue-restricted promoters are the most evolutionarily volatile where retrotransposition is an important, but not the sole source of innovation. There is considerable heterogeneity of turnover rates between promoters in different tissues, but the consistency of these in both lineages suggests the same biological systems are similarly inclined to transcriptional rewiring. The genes affected by promoter turnover show evidence of adaptive evolution. In mice, promoters are primarily lost through deletion of the promoter containing sequence; whereas in humans, many promoters appear to be gradually decaying with weak transcriptional output and relaxed selective constraint. Our results suggest that promoter gain and loss is an important process in the evolutionary rewiring of gene regulation and may be a significant source of phenotypic diversification.
Authors: Robert S. Young1*, Yoshihide Hayashizaki2, Robin Andersson3, Albin Sandelin3, Hideya Kawaji2, Masayoshi Itoh2, Timo Lassmann4, Piero Carninci4, the FANTOM Consortium, Wendy A. Bickmore1, Alistair R. Forrest4,5, Martin S. Taylor1*
Authors contribution statement: ...
Datasets used: phase1 CAGE peaks
Target journal(s): Genome Research
Submission date: March 2015
Contact by email: Martin Taylor
Word document version of manuscript for editors: File:...
PDF version for general viewing (including all figs in one PDF): File:PDF Proof.pdf
Title: The constrained maximal expression level owing to haploidy shapes gene contents on the mammalian X chromosome
ManuscriptID: Phase1_042
Status: PUBLISHED IN PLOS BIOLOGY
Abstract: ...
Authors: Laurence D. Hurst, Avazeh T. Ghanbarian, Alistair RR Forrest, the FANTOM consortium, Lukasz Huminiecki.
Authors contribution statement: ...
Datasets used: phase1 CAGE peaks
Target journal(s): Genome Research
Submission date: March 2015
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:...
PDF version for general viewing (including all figs in one PDF): File:PDF Proof.pdf
Title: Recurrent transcriptome alterations across multiple cancer types.
ManuscriptID: Phase1_41
Status: PUBLISHED IN CANCER RESEARCH
Abstract: The FANTOM5 CAGE data collection of cancer cell lines and corresponding primary cells enables us to study the changes in transcription and gene regulation that occur in cancer and drive its development. CAGE is a 5’ sequence tag technology that provides a snapshot of genome-wide transcription start sites and shows in an unbiased way which parts of the genome are being actively transcribed into RNA in any given biological state. We analysed the CAGE data from 123 cell lines representing 12 different cancer types and compared them to the corresponding normal/primary cells (141 samples). We show the protein-coding genes and non-coding RNAs that are up-regulated or down-regulated across multiple cancer types and are therefore candidates for pan-cancer biomarkers. Furthermore, we show the changes in transcription factor activities and enhancer usage in cancers as well as disruption in gene co-regulation.
Authors: Bogumil Kaczkowski, the FANTOM5 consortium and Alistair Forrest
Authors contribution statement:
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: [8]
Title: Mapping mammalian cell-type-specific transcriptional regulatory networks using KD-CAGE and ChIP-seq data in the TC-YIK cell line.
ManuscriptID: Phase1_047
Status: PUBLISHED IN FRONTIERS IN GENETICS
Abstract: Mammals are composed of hundreds of different cell types with specialized functions. Each of these cellular phenotypes is controlled by different combinations of transcription factors. Using a human non islet cell insulinoma cell line (TC-YIK) which expresses insulin and the majority of known pancreatic beta cell specific genes as an example, we describe a general approach to identify key cell-type-specific transcription factors (TFs) and their direct and indirect targets. By ranking all human TFs by their level of enriched expression in TC-YIK relative to a broad collection of samples (FANTOM5), we confirmed known key regulators of pancreatic function and development. Systematic siRNA-mediated perturbation of these TFs followed by qRT-PCR revealed their interconnections with NEUROD1 at the top of the regulation hierarchy and its depletion drastically reducing insulin levels. For 15 of the TF knock-downs (KD), we then used Cap Analysis of Gene Expression (CAGE) to identify thousands of their targets genome-wide (KD-CAGE). The data confirm NEUROD1 as a key positive regulator in the transcriptional regulatory network (TRN), and ISL1 and PROX1 as antagonists. As a complementary approach we used ChIP-seq on four of these factors to identify NEUROD1, LMX1A, PAX6 and RFX6 binding sites in the human genome. Examining the overlap between genes perturbed in the KD-CAGE experiments and genes with a ChIP-seq peak within 1kb of their promoter, we identified direct transcriptional targets of these TFs. Integration of KD-CAGE and ChIP-seq data shows that both NEUROD1 and LMX1A work as the main transcriptional activators. In the core TRN (i.e. TF-TF only), NEUROD1 directly transcriptionally activates the pancreatic TFs HSF4, INSM1, MLXIPL, MYT1, NKX6-3, ONECUT2, PAX4, PROX1, RFX6, ST18, DACH1 and SHOX2, while LMX1A directly transcriptionally activates DACH1, SHOX2, PAX6 and PDX1.
Analysis of these complementary datasets suggests the need for caution in interpreting ChIP-seq datasets. 1. A large fraction of binding sites are at distal enhancer sites and cannot be directly associated with their targets without chromatin conformation data. 2. Many peaks may be non-functional: even when there is a peak at a promoter, the expression of the gene may not be affected in the matching perturbation experiment.
Authors: R Marina Lizio, Yuri Ishizu, Masayoshi Itoh, Timo Lassmann, Atsutaka Kubosaki, Eri Saijo, Shoko Watanabe, Akiko Saka, Jessica Severin, Hideya Kawaji, Yukio Nakamura, Harukazu Suzuki, Yoshihide Hayashizaki, Alistair Forrest and the FANTOM Consortium
Authors contribution statement: AF designed the study and wrote the manuscript; ML carried out all bioinformatics analyses and wrote the manuscript; YH1,2 set up perturbation assays; YH1,2 and AK carried out chromatin immuno-precipitation experiments; MI, ES, AS and SW provided the CAGE libraries; TL mapped the CAGE data; HO analyzed the CAGE KD data; YN provided the TC-YIK cell lines; HS3 and JS developed the visualization tools SSTAR and ZENBU, respectively; HS, HK, YH3 and AF supervised the project.
Datasets used: phase1 CAGE peaks, TC-YIK cell line
Target journal(s): Genome Biology
Internal submission date:
Contact by email: [9]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Mogrify: Defining Factors For Direct Reprogramming Between All Cell Types
ManuscriptID: Phase1_31
Status: PUBLISHED IN NATURE GENETICS
Abstract: We now know that cellular state is a plastic phenomenon which it is possible to control. There are an increasing number of reports in the literature of induced pluripotency and also induced trans-differentiation from one cell type to another. Each of these experiments has relied heavily on a process of trial and error as well as expert knowledge in order to discover the transcription factors capable of inducing a cell conversion. Here we present a novel network-based method (Mogrify) that can identify the factors required for cell conversion. The method compares differences in expression, as measured by FANTOM5 CAGE data, over interaction networks. It provides candidate combinations of transcription factors for over-expression and knock-down, along with the likelihood score for conversion between any two given cell types.
We show that the method reproduces known reprogramming factors for several successful trans-differentiations from the literature (e.g. between fibroblast and cardiomyocyte, neuron and hepatocyte); we discuss alternative combinations that Mogrify suggests for these conversions and for other conversions which have some experimental data in the literature but for which a fully successful differentiation is yet to be published.
The technique is then run without human intervention on every possible pairwise combination of over 1000 libraries in the FANTOM5 set, assessing possible combinations of factors for perturbation, and associating a likelihood score for success. This information is then used to construct a computational “Waddington landscape”, identifying the best candidate source and target cell types for future cell conversion experiments. This is the first resource of its kind, only made possible by the new FANTOM5 promoterome data, and represents a considerable step forward in computational cell reprogramming.
Authors: Owen and Julian
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks in all samples
Target journal(s):
Internal submission date:
Contact by email: Owen Julian
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:Rackham et al 2015.pdf File:Online Methods.pdf File:Supplementary Materials.pdf
Title: CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs
ManuscriptID: Phase1_52
Status: PUBLISHED IN BIOINFORMATICS
Abstract: Summary: With the emergence of large-scale Cap Analysis of Gene Expression data sets from individual labs and the FANTOM5 consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived cis-regulatory regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The webtool allows for the analysis of CAGE-derived promoters either (i) provided by the user or (ii) coming from ~1,400 mammalian samples from the FANTOM5 project with pre-computed TFBS predicted with JASPAR TF binding profiles. The tool can help power insights into the transcriptional regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions.
Availability and implementation: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL, and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM. The source code is also freely available for download from GitHub at https://github.com/wassermanlab/CAGEd-oPOSSUM.
Authors: David J Arenillas, Alistair R.R. Forrest, Hideya Kawaji, Timo Lassmann, The FANTOM Consortium, Wyeth W. Wasserman, Anthony Mathelier
Authors contribution statement: AM and WWW were responsible for project conception and oversight. DJA implemented the CAGEd-oPOSSUM web tool. TL was responsible for tag mapping. HK managed the data handling. ARRF was responsible for FANTOM5 management and its concept. DJA, WWW, and AM wrote the manuscript.
Datasets used: phase1 CAGE peaks
Target journal(s): Bioinformatics
Internal submission date: ETA Feb 2016
Contact by email: anthony.mathelier@gmail.com
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): http://biorxiv.org/content/early/2016/02/22/040667
Title: FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki.
ManuscriptID: Phase1_026
Status: PUBLISHED IN DATABASE
Abstract: overview and instruction to the resource browser
Authors: Shimoji H, Kawaji H., WP4
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Transcriptome analysis of periodontitis-associated fibroblasts by CAGE sequencing identified DLX5 and RUNX2 long variant as novel regulators involved in periodontitis.
ManuscriptID: Phase1_0xx
Status: PUBLISHED IN SCIENTIFIC REPORTS
Abstract: Periodontitis affects over half of the adult population and represents a major public health problem. Previously, we isolated a subset of gingival fibroblasts (GFs) from periodontitis patients, designated as periodontitis-associated fibroblasts (PAFs), which were highly capable of collagen degradation. To elucidate their molecular profiles, GFs isolated from healthy and periodontitis-affected gingival tissues were analyzed by CAGE-seq and integrated with the FANTOM5 atlas. GFs from healthy gingival tissues displayed distinctive patterns of CAGE profiles as compared to fibroblasts from other organ sites and were characterized by specific expression of developmentally important transcription factors such as BARX1, PAX9, LHX8, and DLX5. In addition, a novel long non-coding RNA associated with LHX8 was described. Furthermore, we identified DLX5 regulating expression of the long variant of RUNX2 transcript, which was specifically active in GFs but not in their periodontitis-affected counterparts. Knockdown of these factors in GFs resulted in altered expression of extracellular matrix (ECM) components. These results indicate that activation of DLX5 and RUNX2 via its distal promoter represents a unique feature of GFs, and is important for ECM regulation. Down-regulation of these transcription factors in PAFs could be associated with their property to degrade collagen, which may impact on the process of periodontitis.
Authors: Horie M1,2,3, Yamaguchi Y4,5, Saito A1,2, Nagase T1, Lizio M3,6, Itoh M3,6,7, Kawaji H3,6,7, Lassmann T3,6, Carninci P3,6, Forrest AR3,6,8, Hayashizaki Y6,7, Suzutani T9, Kappert K10, Micke P11, Ohshima M12.
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: [10]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: The FANTOM5 integrated expression atlas of miRNAs and their promoters
ManuscriptID: Phase1_53
Status: ACCEPTED IN NATURE BIOTECHNOLOGY
Abstract: MicroRNAs (miRNAs) are a family of short non-coding RNAs with key roles in cellular regulation. We provide the first integrated expression atlas of miRNAs and their promoters in mammalian cells, in particular human primary cells, by deep sequencing 492 short RNA libraries, each with a matching CAGE (Cap Analysis Gene Expression) library produced from the same RNA sample. We annotate the cell type specificity of each miRNA based on its expression profile across human primary cells, and identify the miRNA promoter by CAGE analysis followed by manual curation. By using the CAGE expression at each promoter, we extended the miRNA expression atlas to the 1829 human and 1029 mouse CAGE libraries in FANTOM5. We demonstrate the accuracy of the miRNA promoter identification by analyzing regulatory motifs in the miRNA promoter regions. This expression atlas of miRNAs and their promoters establishes the foundation for detailed analysis of the transcriptional control regions that regulate miRNA expression.
Authors: Derek de Rie*, Tanvir Alam, Erik Arner, Peter Arner, Haitham Ashoor, Gaby Åström, Magda Babina, Nicolas Bertin, A. Maxwell Burroughs, Carsten O. Daub, Michael Detmar, Ruslan Deviatiiarov, Alexandre Fort, Dan Goldowitz, Sven Guhl, Jayson Harshbarger, Akira Hasegawa, Kosuke Hashimoto, Peter Heutink, Edward Huang, Peter Klinken, Timo Lassmann, Charles Lecellier, Weonju Lee, Marina Lizio, Vsevolod Makeev, Anthony Mathelier, Yulia A. Medvedeva, Chris Mungall, Shohei Noma, Mitsuhiro Ohshima, Helena Persson, Filip Roudnicky, Pål Sætrom, Jessica Severin, Kim M. Summers, Hiroshi Tarui, Kristoffer Vitting-Seerup, Christine Wells, Louise Winteringham, Yoko Yamaguchi, Hideya Kawaji, Albin Sandelin, Michael Rehli, the FANTOM consortium, Yoshihide Hayashizaki, Piero Carninci, Alistair R. R. Forrest*, Michiel J. L. de Hoon*
Authors contribution statement: PA, GÅ, MB, MD, DG, SG, PH, PK, WL, MO, KMS, CW, LW, YY, AF provided RNA samples; COD selected samples from the FANTOM5 time courses; HT and SN produced the short RNA libraries; ML and HK managed the data; DdR, MJLdH, KVS, AMB, TA, HA, AH, TL, HP, CL, AM, VM, MR carried out the bioinformatics analyses with the help of ML, KH, FR, and JS; CM provided the cell ontology; AF, AM, ARRF, AS, CL, CW, DdR, EH, FR, HP, KVS, AMB, MJLdH, MR, NB, PS, RD, VM, YAM contributed to the manual miRNA promoter annotation; JH created the web visualization tool; DdR, ARRF and MJLdH wrote the manuscript with the help of EA, AS, AMB, KMS, KVS, MR, NB, PC, PS, CW; ARRF and MJLdH designed the study; PC and YH supervised the FANTOM5 project.
Datasets used: Short RNA data (phase 1 and phase 2); Helicos CAGE (phase 1 and phase 2)
Target journal(s): Nature Biotechnology
Internal submission date: ...
Contact by email: Michiel de Hoon
Word document version of manuscript for editors: [[11]]
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
IN PREPARATION OR REVISION
Title: Cell-type specificity and co-expression of regulatory polymorphisms associated with human disease
ManuscriptID: Phase1_002
Status: redraft/revise
Abstract: Our ability to use genetic associations with disease to develop better treatments has been limited by the difficulty of identifying a biological process, or cell type, on which to focus investigation. Most disease-associated polymorphisms do not lie within protein-coding genes, raising the possibility that variation in regulatory sequence plays a critical role in disease phenotypes. We have used genome-scale 5’RACE (CAGE) to identify the location and usage of transcription start sites in 864 human tissues, primary cells and cell lines, and show here that there is a strong enrichment for disease-associated variants within the sequence immediately adjacent to transcription start sites. Using the expression profiles of known variants associated with disease susceptibility, we identify experimentally-available cell types significantly associated with specific diseases and traits. The expression of genes known to be associated with particular diseases was positively correlated. Such co-expression was used to identify unreported candidate disease-associated regulatory regions within published genome-wide association studies (GWAS). The approach was validated by identifying candidate loci in a 2007 GWAS study that were subsequently validated in larger independent datasets. These functional genomics approaches directly inform choices of model system and identify disease- and cell type-specific co-regulated networks for a wide range of common diseases.
Authors: Baillie JK*, Haley CS, Schaefer U, Faulkner GJ, Freeman T, Brown JB, [others...], [Numerous RIKEN authors, order etc. TBC, at least including: Kawaji H, Forrest A, Carninci P]*, Hume DA*
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on Primary Cells
Target journal(s): Nature Genetics
Internal submission date: ...
Contact by email: Kenneth Baillie, David Hume
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Ab Initio Prediction of Tissue-Specific Regulatory Modules in the FANTOM5 Project
ManuscriptID: Phase1_005
Status: SUBMITTED TO GENOME RESEARCH
Abstract: One of the major goals of the FANTOM5 project, the broadest TSS-based promoter-level expression atlas of transcriptional regulatory networks, is the identification of coding and non-coding, annotated and novel transcriptional units being transcribed in a cell-specific mode across the different biological states/samples. In this work we analyzed the FANTOM5 dataset using ScanAll, a newly developed software here described, to ab initio predict the presence of conserved elements in the genomic regions surrounding FANTOM5 promoters. Firstly we aimed at identifying motifs that were conserved in a subset of the selected genomic regions and that possibly corresponded to Transcription Factor Binding Sites (TFBS); we then expanded our analysis to pinpoint the existence of more complex, structured regulatory modules, that is groups of conserved motifs co-occurring in the aforementioned (co-expressed) regions within a fixed distance. We confirmed the sample-specificity of our output by showing that the majority of the obtained combinations of modules were able to divide the specimens into sample-specific groups, thus possibly explaining the peculiarities of regulatory events occurring in each tissue. Among these sites it was possible to confirm the presence of TFBS for known regulators already associated to those samples together with an additional and significant portion of motifs remaining unannotated, thus representing putative novel binding elements. In addition we were able to associate the presence of a significant portion of the identified motifs to distinct families of repeated elements, thus confirming a structural/functional feature of mammalian promoters that is currently emerging as one of the most peculiar regulatory aspects associated to mammalian phylogeny. 
Finally, we were able to identify previously uncharacterized aspects of the regulatory networks occurring in early-development samples thus confirming the significant advantage deriving from our modular approach.
Authors: Emiliano Dalla, Yari Ciani, Marco Zantoni, Alberto Policriti, Hideya Kawaji, Michiel J.L. de Hoon, Timo Lassmann, Alistair R.R. Forrest, Michael Rehli, Ivan Kulakovsky, Claudio Schneider, Silvano Piazza
Authors contribution statement: ED conceived the project, developed part of the software, oversaw implementation, performed some of the analysis and most manuscript writing; YC implemented part of the software, performed some of the analysis and prepared some figures; MZ developed and implemented part of the software; AP developed part of the software and contributed to the manuscript writing; TL was responsible for tag mapping; HK managed the data handling; MR, IK and MJLdH were involved in motif assessment; ARRF was responsible for FANTOM5 management and concept; CS supervised the study; SP developed and implemented part of the software, carried out statistical tests and results interpretation and wrote parts of the manuscript.
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date: June 1st 2012; Update: December 21st 2012; Post Internal Review Update: January 28th 2012
Contact by email: Emiliano Dalla
Word document version of manuscript for editors: File:FANTOM5 PromoteromeSatelliteLNCIB.doc
PDF version for general viewing (including all figs in one PDF): File:FANTOM5 PromoteromeSatelliteLNCIB wFigures.pdf
Title: A high resolution spatial-temporal promoterome of the human brain (was Brain CAGE)
ManuscriptID: Phase1_007
Status: redraft/revise
Abstract: The human brain is an extremely complex organ that governs our abilities for cognition, reasoning and emotions and is the control center for the body. Its morphology and functionality during development have been well studied, but the molecular mechanisms contributing to its function and maintenance later in life remain poorly understood. Complexity at the transcriptional level is likely to play a major role in defining its morphological and functional characteristics. To investigate this we used single molecule CAGE and created a high resolution atlas of transcription start sites for 15 anatomical regions of the human central nervous system, using post-mortem samples derived from infant and aged adult donors. On the transcriptional level the brain is clearly distinguishable from other tissues even if we consider only non-coding genes or expression from genomic regions often described as genomic dark matter. Using these differences we identify a specific set of transcription start sites that characterizes the brain. We show extensive differences in transcription between infant and adult that in some cases can be linked to loci associated with major neurodegenerative diseases. The differential expression across distinct regions correlates well with developmentally and/or functionally related anatomical districts and is reflected by distinct networks of interacting transcription factors, a range of lncRNAs and novel transcripts co-expressed in a regionally biased manner. Overall we provide the scientific community with a powerful expression resource based on post-mortem tissue, particularly highlighting the contribution of non-coding RNAs to the transcriptional complexity of the human central nervous system.
Authors: Margherita Francescatto, Morana Vitezic, Patrizia Rizzu, Javier Simon-Sanchez, Robin Andersson, FANTOM5_RIKEN_OSC_members, Carsten O Daub, Albin Sandelin, Michiel JL de Hoon, Piero Carninci, Alistair RR Forrest, Peter Heutink
Authors contribution statement: MF and MV did the analyses; MF, MV and PH wrote the manuscript, PR selected all samples, evaluated medical and pathological records and isolated RNA, JSS curated the list of disease loci, RA and AS provided the list of enhancers, ARRF, PC and PH designed the study ...
Datasets used: Helicos CAGE on VUMC provided brain samples (adult and newborn); full list of samples presented in Supplementary Table 1
Target journal(s): Genome Research
Internal submission date:
Contact by email: Peter Heutink, Margherita Francescatto, Morana Vitezic
Final version: File:Francescatto and Vitezic manuscript.pdf File:Francescatto and Vitezic figures.pdf File:Francescatto and Vitezic Supplementary Note.pdf
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Evolution of expression patterns in human gene families illustrated by the FANTOM5-CAGE encyclopedia of transcription start sites.
ManuscriptID: Phase1_019
Status: SUBMITTED TO BMC EVOLUTIONARY BIOLOGY
Abstract: Background: Human gene families emerged through consecutive rounds of gene duplication. Here we apply the cutting-edge FANTOM5 single-nucleotide resolution atlas of transcription start sites from 1348 human and mouse libraries to elucidate expression pattern evolution in animal gene families, with stress on comparison between human and mouse, and normal versus cancer cells.
Results: A broad overview of FANTOM5 was obtained with intra-species and inter-species hierarchical clustering of human and mouse samples. In the follow-up, we dated gene duplications by phylogenetic timing, and investigated the rate of expression pattern divergence between duplicates, as well as the tissue-specificity of their expression. Finally, we defined the concept of phylo-expression signatures as strong associations between duplications of certain ages and expression samples in the FANTOM5 atlas. We show how phylo-expression signatures can be used to generate novel hypotheses on the nature of animal evolution, and discuss central nervous system and reproductive tract as two focused examples.
Conclusions: A striking trend for young genes to be narrowly expressed was revealed. Several lines of evidence suggested that emergence of placental mammals was a unique period in the evolution of animal gene families and duplicates dating to that period have broader and more conserved expression patterns, with genes involved in chromatin assembly and epigenetic control driving the trend. A major strength of the FANTOM5 atlas is that it profiles normal tissues, primary cells, and cancer cell lines, and as expected, clustering of expression profiles showed a major divide between leukemias and solid tumors. Where the evolutionary link became apparent was that in cancer cell lines, unlike in tissues and primary cells, recent paralogs lacked the peak of highly correlated pairs. This novel finding suggests that global devolution and loss-of-evolutionary constraints on expression patterns accompany malignant transformation, and provides additional evidence in the debate on use of cancer cell lines as research models.
Authors contribution statement: O.S. and L.H. designed the study, performed all analyses, and wrote the manuscript. A.R.R.F. and C.O.D. were involved in the FANTOM5 concepts and management.
Datasets used: Helicos CAGE, TreeFam8
Target journal(s): Genome Biology
Internal submission date: November 30th
Contact by email: Lukasz Huminiecki, Oxana Sachenkova
Word document version of manuscript for editors: File:The structure of animal expression pattern evolution.doc (only text)
PDF version for general viewing (including all figs in one PDF): File:The structure of animal expression pattern evolution.pdf (this file includes all the figures)
Title: Homotypic clusters of transcription factor binding sites control specific promoter expression
ManuscriptID: Phase1_006
Status: IN REVISION, TO BE RESUBMITTED TO NAR
Abstract: Transcription factors (TFs) regulate gene expression via binding to appropriate DNA sites (TFBS). Regions at transcription start sites often contain homotypic clusters of binding sites (HCBS), or groups of closely located TFBS for the same TF. Presence of HCBS is a pronounced feature of the TSS sequence landscape, but their role in the regulation of promoter expression has never been studied on a comprehensive dataset. We predicted HCBS across the atlas of human TSS provided by the FANTOM5 project. More than half of all studied TFs had HCBS for at least 1% of the entire promoterome. One class of TFs was associated with HCBS specifically at sample-specific promoters; all other TFs had HCBS at both housekeeping and sample-specific promoters. No TFs were identified having HCBS exclusively at housekeeping promoters. TFs expressed in particular cell types tended to have HCBS at promoters active in the same cell types. Classification of cell types based upon specifically expressed TFs agreed with classification based upon TFs with HCBS at specifically expressed promoters and with FANTOM5 sample ontology. This demonstrates that HCBS are an important component of mechanisms specifically regulating gene expression.
Supplementary information: http://line.bioinfolab.net/guest/homo1/Supplementary.zip
Authors: Ivan V. Kulakovskiy (1,2), Yulia A. Medvedeva (2,3,4), Sebastian Schmeier (3,5), Ilya E. Vorontsov (2), Maya S. Polishchuk (1), Alexander V. Favorov (2,6,7), Hideya Kawaji (8), Michiel J.L. de Hoon (8), Timo Lassmann (8), Alistair R.R. Forrest (8), Michael Rehli (9), Vsevolod J. Makeev (1,2,7,10), the FANTOM5 consortium (8)
Authors contribution statement: I.V.K. implemented the software and drafted the manuscript. Y.A.M. carried out statistical tests and results interpretation. M.S.P. developed the homotypic cluster detection algorithm. A.V.F. selected proper statistical tests and performed hierarchical clustering. S.S. provided the housekeeping set of TSS-clusters and processed TF expression data. I.E.V. performed PWM threshold estimation. T.L. was responsible for tag mapping and provided sets of CAGE tag clusters enriched in each particular sample. H.K. managed the data handling. M.R. and M.J.L.dH. were involved in motif assessment. A.R.R.F. was responsible for FANTOM5 management and concept. V.J.M. coordinated the homotypic clusters study. All the authors participated in writing and finalizing the manuscript.
Datasets used: Helicos CAGE - FANTOM5 FREEZE1, "robust" subset
Target journal(s): Nucleic Acids Research, Bioinformatics
Internal submission date: 18 June 2012 / Updated: 12 September 2012 / Minor fixes: 1 December 2012
Contact by email: Vsevolod Makeev, Ivan Kulakovskiy
PDF version for general viewing (including all figs in one PDF): File:HOMOTYPICUS-FANTOMsatellitepaper.r2.pdf
Title: Analysis of antisense transcription in loci associated to neurodegenerative diseases
ManuscriptID: Phase1_022
Status: SUBMITTED TO BMC GENOMICS
Abstract: The FANTOM5 sequencing datasets represent the largest collection of transcriptomes from human cell lines, primary cells and whole tissues of various origin. Transcription starting sites are mapped at high resolution by the use of a modified protocol of Cap-Analysis of Gene Expression (CAGE) for high-throughput single molecule next-generation sequencing with Helicos (hCAGE). We employed the FANTOM5 collection of data to address the role of antisense transcription in neurodegeneration. We focused our analysis exclusively on tissues and primary cells, to avoid artifacts due to cellular transformation in cultured cell lines. Among the >1261 human hCAGE libraries, we selected those of brain origin. Libraries from total blood and selected blood cell populations were also included in the analysis. A total of 66 tissue- and 244 cell-specific libraries were interrogated for the presence of antisense transcription to well-established loci associated to Alzheimer’s disease, Amyotrophic Lateral Sclerosis, Frontotemporal Dementia, Huntington’s and Parkinson’s disease. Almost all analyzed genes display some degree of antisense transcription mainly in their 5’ or 3’ UTRs. 5’ head-to-head divergent antisense transcription appears enriched compared to global distribution of sense/antisense pairs. Identified antisense transcripts may have coding and non-coding capabilities, with lncRNAs being more represented. Expressed transcripts are generally poorly annotated and may contain repetitive elements of the Alu, SINE and LINE families. Antisense transcription was validated for a subset of genes, including amyloid precursor protein, microtubule-associated protein tau, DJ-1, leucine-rich repeat kinase 2 and α-synuclein. The validated transcripts are predicted to have non-coding functions and most of them were not annotated. Quantitative analysis of antisense transcripts in human tissues indicates enrichment in the brain, compatible with FANTOM5 data.
Overall, these results represent the most comprehensive analysis of antisense transcription at loci associated to neurodegeneration and provide evidence for the existence of additional regulation of disease-related genes by previously not-annotated long non-coding RNAs.
Authors: Silvia Zucchelli, Paolo Vatta, Stefania Fedele, Raffaella Calligaris, XXXX (from F5 consortium), Al Forrest, Piero Carninci and Stefano Gustincich
Authors contribution statement: SZ designed the experiments, analyzed the data, wrote the manuscript; PV performed the bioinformatics analysis, prepared some figures; SF designed the experiments, performed the experiments and analyzed the data; RC provided reagents, designed the experiments and analyzed the experiments; SG analyzed the data, wrote the manuscript
Datasets used: Helicos CAGE on human brain and blood samples
Target journal(s): Genome Research, Plos Genetics, Human Molecular Genetics
Internal submission date: beginning of June
Contact by email: Stefano Gustincich, Silvia Zucchelli
Word document version of manuscript for editors: File:Zucchelli FANTOM5 Manuscript 2013 01 22.doc
PDF version for general viewing (including all figs in one PDF):
File:Zucchelli FANTOM5 Figures 2013 01 22.pdf
File:Zucchelli FANTOM5 Supplementary 2013 01 22.pdf
File:Zucchelli FANTOM5 TAbles 2013 01 22.pdf
Title: Identification of miRNA promoters and primary structures
ManuscriptID: Phase1_028
Status: sent presubmission inquiry to Nat. Communications
Abstract: MicroRNAs (miRNAs) are a class of functional small RNAs that play crucial roles by regulating gene expression in a wide range of biological processes. In miRNA biogenesis, hairpin-structured RNAs are cleaved out from long primary miRNAs co-transcriptionally in the nucleus by a microprocessor complex comprising Drosha and Dgcr8, double-stranded ~22nt RNAs are cleaved out from the hairpins by Dicer in the cytosol, and single-stranded small RNAs are loaded into an RNA-induced silencing complex (RISC) to repress their target genes. While these maturation and functional processes have been extensively investigated to date, their transcription has been poorly understood. Here we set out to identify transcription initiation of primary miRNAs by combining a genome-wide survey of transcription starting sites (TSSs) based on a single molecule sequencer and perturbation of the microprocessor complex. We depleted the microprocessor proteins by siRNA knockdown in HeLa cells and isolated their nuclei to enrich primary miRNAs (long RNA) in the RNA extracts. Genome-wide quantification of transcription initiation sites by CAGE (Cap Analysis of Gene Expression) enabled us to quantify whether 5’-ends of primary miRNAs are accumulated at each of the TSS regions identified in the FANTOM5 project (Forrest et al. Nature 507.7493 (2014): 462–470). We found that 5’-ends of capped RNAs corresponding to 120 TSSs located upstream of 86 miRNA hairpins are accumulated as a result of the depletion. Of those, 13 TSSs are located upstream of 15 intergenic miRNA hairpins, and 107 TSSs are located upstream of 71 intragenic miRNA hairpins hosted by other long genes (Table 1). We found that 21 of the 120 TSSs are completely novel and not reported by any gene models or previous analyses. Notably, our approach enabled us to discriminate a group of promoters affected by the microprocessor depletion from non-affected promoters (e.g. Figure 1), and we found that 53 miRNA hairpins (61% of the 86 miRNA hairpins whose TSSs are identified above) are driven by multiple promoters. This suggests that distinct forms of primary transcripts are used to produce the same miRNAs in the same cell type. Unexpectedly, 5’-ends of capped RNAs corresponding to 166 known TSSs located upstream of 61 miRNA hairpins are not accumulated by the microprocessor depletion even though they are actively transcribed in the cells (e.g. Figure 2). Our results suggest that only a subset of TSSs located upstream of microRNAs are carefully selected to produce miRNAs. Our genome-wide approach reveals for the first time the complex regulation of miRNA transcription.
Authors: Kazuhiro Kajiyama, Yoshinari Ando, Mitsuoki Kawano, Alistair R.R. Forrest, the FANTOM consortium and Hideya Kawaji
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks and its original data
Target journal(s): Nat. Communications
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:140501-miRNApromoter.docx
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Identification of novel ciliopathy candidate genes by probing tissue gene expression patterns
ManuscriptID: Phase1_039
Status: active
Abstract: Ciliopathies are a set of complex genetic diseases caused by defects in cilia, antenna-like organelles of the cell that are involved in crucial cellular functions including cell motility, signalling and sensory activities. Using the FANTOM5 human tissue transcriptome dataset, we identified a set of candidate motile cilia genes with similar expression patterns to that of known ciliopathy genes. Our candidate gene list is supported by validations from Human Protein Atlas immunohistochemistry results, protein domain analysis, subcellular localization identification, and comparison to other proteomics studies as well as gene expression patterns in public tissue microarray datasets. Our list of annotated motile cilia candidate genes could be used to probe for gene mutations in ciliopathy patients, as well as for further studies of motile cilia functions. In addition, our method demonstrates the potential of using the FANTOM dataset to identify novel gene candidates for other diseases where the disease markers have a distinct tissue expression pattern.
Authors: Nancy Yu, Anna Lindstrand, Andrea Bieder, Isabel Tapia Paez, Carsten Daub, Juha Kere
Authors contribution statement: JK and CD conceived the idea for the project, NY designed and performed the bioinformatics analyses and validations. AL provided expertise on cilia genes and contributed to the validation of gene lists, NY, CD, AL, AB, IP, JK discussed the direction of the paper. AB and IP may perform some validation experiments.
Datasets used: Phase1 CAGE peaks
Target journal(s): Nucleic Acids Research / BMC Genomics / PLOS ONE
Internal submission date: 2014
Contact by email: Nancy Yu, Carsten Daub
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Pathogen specific monocyte transcriptional responses
ManuscriptID: Phase1_008
Status: Working draft
Abstract:
Authors: Wells
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: Christine Wells, Anthony Beckhouse
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Precise annotation of TSS peaks uncovering complexity in state-dependent activation of transcription initiation
ManuscriptID: Phase1_024
Status: Presubmission inquiry sent to Nat Genet
Abstract: Recent progress in high-throughput sequencing technology, for example CAGE (Cap Analysis Gene Expression), has enabled us to identify transcription starting sites (TSSs) and quantify their activities experimentally across the genome. Given the complexity of transcription (ref: ENCODE transcriptome), empirical annotation of TSSs based on experimental data is an essential baseline to understand transcriptional regulation in individual cells, and investigate them in the transcriptome space. In the FANTOM5 project (Forrest et al. Nature 507, 462–470, 2014), we profiled TSS activities in more than one thousand mammalian samples including a diverse range of primary cells, to understand transcriptome states encoded in the genomes. It was a major challenge to produce TSS annotations capturing all the complexity of transcription in such heterogeneous data. Here we developed a new method to identify TSS peaks empirically from such a diverse range of TSS profiles. The method relies on estimation of underlying TSS profiles by using ICA (independent component analysis). The result successfully discriminated differentially used proximal TSSs, while the other approaches failed (Figure 1). We found that our approach defines TSS peaks precisely, that is, covering known genes extensively with a quite limited amount of genome coverage (~0.5% of the genome, Figure 2). The result was used as a baseline to assess composite promoters (Forrest et al. Nature 507, 462–470, 2014), and we additionally assessed alternative promoters of genes. We found that almost half of the genes are transcribed from multiple differentially activated TSSs (Figure 3). Further, we assessed the accuracy of gene model 5’-ends, to provide an overview of different gene models. The resulting data represent the most advanced baseline to understand the complexity of transcription initiation.
Authors: Kawaji H, et al.
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:140501-Precise annotation of TSS peaks uncovering complexity in state.docx
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: The regulated expression of repetitive elements across human cell types and tissues
ManuscriptID: Phase1_044
Status: in preparation
Abstract: ...
Authors: Dave Tang, the FANTOM5 consortium, and Piero Carninci
Authors contribution statement: DT processed the data, designed the analyses, performed the analyses, interpreted the data, and wrote the manuscript. PC oversaw the project.
Datasets used: All phase 1 BAM files
Target journal(s): Genome Research, Genome Biology, etc.
Internal submission date: soon
Contact by email: Dave Tang and Piero Carninci
Word document version of manuscript for editors: in preparation
PDF version for general viewing (including all figs in one PDF): in preparation
Title: MYEOV as a novel biomarker for human non-small cell lung cancer
ManuscriptID: Phase1_49
Abstract: We explored a biomarker for non-small cell lung cancer (NSCLC) utilizing the functional annotation of the mammalian genome (FANTOM) 5 database containing gene expression profiles for a comprehensive panel of human primary cells and tissues. By comparing normal lung epithelial cells and NSCLC cells, we identified MYEOV as a marker overexpressed exclusively in NSCLC. MYEOV expression was correlated with the demethylation status of NSCLC cells and tissues. We also found that MYEOV expression was more frequent in lung adenocarcinoma harboring the KRAS mutation. Functional studies in NSCLC cells demonstrated that MYEOV promoted cell proliferation, survival, and invasion. The prognostic impact of MYEOV expression or demethylation was screened in NSCLC gene signature data sets, in which MYEOV was found to be a poor prognostic indicator. These findings delineate a clinically distinct subgroup of NSCLC with MYEOV expression and encourage further clinical exploration of MYEOV as a potential biomarker in NSCLC.
Authors: Masafumi Horie, Akira Saito, Hirotaka Matsuzaki, Satoshi Noguchi, Yu Mikami, Masayoshi Itoh, Hideya Kawaji, Timo Lassmann, Piero Carninci, Yoshihide Hayashizaki, Alistair R.R. Forrest, the FANTOM consortium, Daiya Takai, Yoko Yamaguchi, Patrick Micke, Mitsuhiro Ohshima and Takahide Nagase
Authors contribution statement: M.H., and H.M.: data acquisition and analysis; A.S.: writing of the manuscript; M.I.: CAGE data production; H.K.: managed the data handling; T.L.: tag mapping; P.C., Y.H., and A.R.R.F.: FANTOM5 management and concept; S.N., Y.M., and D.T.: technical and material support; Y.Y., M.O., and P.M.: conception and design; T.N.: study supervision.
Datasets used: phase1 CAGE peaks
Target journal(s): Clinical Cancer Research
Internal submission date:
Contact by email: Akira Saito
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Selection of cell-specific enhancers and promoters across the human body using PrESSTo
ManuscriptID: Phase1_50
Abstract: Promoters and distal enhancers are key regulators of gene expression. In the FANTOM5 project, promoter and enhancer usage has been measured across hundreds of tissues and primary cell types covering the vast majority of human cell types [1,2], using 5’ end sequencing of RNAs (CAGE).
These atlases are powerful datasets for understanding human biology and disease. However, navigating the vast data sets is daunting for most users. In particular, the lack of intuitive methods for selection of enhancers/promoters specifically used in tissues/cells of interest is a bottleneck for broad usage of such data sets by the community. Pre-defining such sets is challenging since the choice of thresholds and methods for tissue/cell specificity is typically best defined by users.
To this end, we present PrESSTo (Promoter Enhancer Slider Selector Tool - http://pressto.binf.ku.dk), a tool that uses sliders to select enhancers/promoters with user-customized expression thresholds.
Authors: Hans Ienasescu, Kang Li, Robin Andersson, Morana Vitezic, Yun Chen, Kristoffer Vitting-Seerup, Mette Boyd, Jette Bornholdt, (RIKEN PEOPLE go here), the FANTOM Consortium, Albin Sandelin
Datasets used: phase1 CAGE peaks
Target journal(s): To be decided
Internal submission date:
Contact by email: [12]
Word document version of manuscript for editors: [13]
Title: A role for YY1 in sex-biased transcription revealed through X-linked promoter activity and allelic binding analyses
ManuscriptID: Phase1_51
Abstract: Sex differences in susceptibility and progression have been reported in numerous diseases including cancer, autism, cardiac and autoimmune disorders. Such discrepancy has been attributed to sex chromosomes, sex hormones, and environmental factors. With two copies of the X chromosome in female cells, X-chromosome inactivation imparts mono-allelic gene silencing for dosage compensation; however, a subset of genes escape silencing. These escapee genes are transcribed bi-allelically, resulting in sexual dimorphism. We identified transcription start sites of escapees based on higher transcription levels in female cells (escTSSs) using FANTOM5 CAGE data. Significantly stronger DNA methylation similarity between sexes is consistent with bi-allelic activity at these escTSSs. We observe more transcription factor (TF) ChIP-seq peaks in escTSSs, with over-representation in females. Enrichment of JASPAR TF binding site motifs and ChIP-seq peaks as well as allelic binding analyses highlighted a positive association between YY1 binding and transcriptional activity on the inactive X (Xi) at bi-allelically transcribed escTSSs as well as long non-coding RNAs frequently associated with Xi-specific superloops. Using multiple data types, we revealed unique properties of escapees as well as bias in comparison to other chrX genes, which should be addressed in the studies of sexes. Collectively, our analyses highlight the importance of YY1 on transcriptional activity on Xi in general, and its potential involvement in addition to CTCF and RAD21 in chromatin looping of the Xi in GM12878 cells.
Authors: Chih-yu Chen, Wenqiang Shi, Allison M. Matthews, Yifeng Li, David J. Arenillas, Anthony Mathelier, the FANTOM Consortium, Carolyn J. Brown, Wyeth W. Wasserman
Authors contribution statement: CYC, WWW and CJB designed and conceived experiments. YL (sex classification), AMM (DNAm and XCI), AM (motif and ChIP-seq over-representation) and WS (allelic binding) contributed intellectually within their areas of expertise. CYC conducted all analyses. WS provided the allelic and simulated binding reads. AM and DJA developed and provided support for the CAGEd-oPOSSUM web server. CYC, WWW and CJB wrote the manuscript.
Datasets used: phase1 CAGE peaks
Target journal(s): Genome Biology
Internal submission date: ETA Dec 2015
Contact by email: [wyeth@cmmt.ubc.ca, juliec@cmmt.ubc.ca, amathelier@CMMT.ubc.ca]
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
DEAD/ABANDONED MANUSCRIPTS??
Title: Quantifying the informational complexity of transcriptional regulatory programmes
ManuscriptID: Phase1_015
Status: On-hold. Focussing on the biological results Phase1_016 rather than methods. Hope to return to methods later (phase2).
Abstract: The regulation of gene expression defines cellular identity, it is the basis for organism development and it underlies many cellular responses to the environment. Its disruption is implicated in many diseases and changes in gene regulation appear to underlie many adaptations evident between species. Previously, genes have been grouped and interpreted based on their specificity of expression, for example house-keeping genes that are expressed by all cells in all conditions versus highly tissue restricted genes expressed by only one cell type at a particular developmental time. Although such studies have been informative they fail to capture important aspects of how a gene is regulated or account for the heterogeneous relatedness of samples. The expression pattern of a gene is the output of a regulatory program within the cell. A program that must effect many state changes (on, off, up, down) is likely to require more regulatory information (Kolmogorov complexity) than a program effecting fewer state switches. If we can quantify this "regulatory complexity" we can then start to address deeper questions as to where that regulatory information is encoded, how malleable it is through evolution and how susceptible it is to perturbation by mutation. For example, a greater regulatory complexity could correspond to a higher concentration of cis-regulatory sequences around the gene or alternatively a single binding site for a transcription factor that is the output of an extensive intracellular signalling network. To address these questions we have explored a range of possible measures of regulatory complexity including distance weighted entropies, diversity and richness scores. This leads us to introduce a novel measure of regulatory complexity (CR). It is implemented as a hierarchical Bayesian model parametrised through MCMC.
The CR method can be thought of as a relative measure of the number of gene expression state changes occurring over a tree relating all analysed samples. A by-product of this analysis is a probabilistic scoring of gene expression state switches between all analysed gene expression libaries. CR is weighted to account for the genome wide similarity of gene expression between samples but does not depend on the inference of a fixed underlying tree topology. Note - this is intended as essentially a methods paper, see Phase1_016 for the biological insights paper
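As a rough illustration of the simpler specificity measures mentioned in the abstract (entropy and richness), the sketch below scores a single gene's expression profile across samples. It is not the CR method and ignores both the distance weighting and the relatedness of samples; the function names, the threshold and the toy expression values are assumptions made for this example only.

```python
import math

def expression_entropy(values):
    """Shannon entropy (in bits) of one gene's expression profile across samples.

    Illustrative only: values are normalised to proportions, and samples
    with zero expression contribute nothing to the sum.
    """
    total = sum(values)
    if total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return sum(-p * math.log2(p) for p in probs)

def richness(values, threshold=1.0):
    """Number of samples whose expression meets a (hypothetical) threshold."""
    return sum(1 for v in values if v >= threshold)

# Toy profiles (made-up numbers): one broadly expressed, one restricted gene.
housekeeping = [10.0, 10.0, 10.0, 10.0]
restricted = [0.0, 0.0, 40.0, 0.0]

for name, profile in (("housekeeping", housekeeping), ("restricted", restricted)):
    print(name, expression_entropy(profile), richness(profile))
```

A uniformly expressed gene reaches the maximum entropy for the number of samples, while a gene expressed in a single sample scores zero. Neither score says anything about how many state changes are needed over a tree of related samples, which is the gap the abstract describes.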
Authors: Sarah Baker, Martin Taylor
Authors contribution statement: SB developed and implemented methods and performed general analyses; MT conceived the project and oversaw implementation and performed some of the analysis
Datasets used: Helicos CAGE on primary cells from human and mouse.
Target journal(s): Bioinformatics or Genome Research
Internal submission date: ETA July 2013
Contact by email: Martin Taylor, Sarah Baker
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Cis encoding of the master developmental regulatory programme
ManuscriptID: Phase1_016
Status: Working draft, starting dataset being regenerated to incorporate improved method
Abstract: The regulation of gene expression defines cellular identity; it is the basis for organism development and it underlies many cellular responses to the environment. Its disruption is implicated in many diseases and changes in gene regulation appear to underlie many adaptations evident between species.
Authors: Sarah Baker, Martin Taylor
Authors contribution statement: SB developed and implemented methods and performed general analyses; MT conceived the project and oversaw implementation and performed some of the analysis
Datasets used: Helicos CAGE on primary cells from human and mouse. We may also want to use time course data for this paper (does that push it into phase2?).
Target journal(s): PLoS Biology
Internal submission date: ETA March 2013
Contact by email: Martin Taylor, Sarah Baker
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Promoter specificity in transcription determines cell lineage choice
ManuscriptID: Phase1_018
Status: Delayed (as of September 12th)
Abstract: This paper will use pathprint (pathway fingerprinting) to develop an overall phylogenetic tree of all samples in F5 freeze1. This tree will be used to determine the relative ancestry of samples and to cluster them accordingly. SwitchEngine will be run to find TSS switching at key junctions in differentiation. We will show that TSS dynamics at these informative sites are associated with lineage commitment.
Authors: Emmanuel Dimont, Gabriel Altschuler, Winston Hide
Authors contribution statement: ED did ..., GA did ..., WH did ...
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date:
Contact by email: Winston Hide, Emmanuel Dimont, Gabriel Altschuler
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Gene duplication and promoter divergence in mammals.
ManuscriptID: Phase1_020
Status: Delayed
Abstract:
Authors: Lukasz Huminiecki and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ..., F5 promoter and enhancer datasets, TreeFam8
Target journal(s): Genome Research
Internal submission date: September 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Gene duplication and TF/miRNA regulatory network evolution in mammals.
ManuscriptID: Phase1_021
Status: Delayed
Abstract:
Authors: Lukasz Huminiecki and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ... TreeFam8, miRBase, microRNA target predictions
Target journal(s): Genome Research
Internal submission date: December 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
Title: Investigating tissue-specificity of cancer-causing mutations
ManuscriptID: Phase1_037
Status: Working draft
Abstract: Over the past 10 years an increasing number of mutated genes have been associated with familial predisposition to cancer. Interestingly, for more than half of these genes, involvement in cancer is restricted to only a few cancer types (e.g. BRCA1 mutations in breast and ovarian cancers). Even more interestingly, some of these genes are expressed in all cell types, so we might expect them to cause many more types of cancer, but they do not. This paper will examine how these mutations are tolerated in most cell types but not in others by considering the network of genes expressed in different cell types and how that network determines whether a cell type is susceptible or resistant.
Authors: Jessica Mar, Daniel Carbajo, RIKEN_OSC_members, Alistair Forrest
Authors contribution statement: JM and AF conceived the project, DC conducted the analyses.
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: Jessica Mar
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf