Satellite submission: Difference between revisions

Word document version of manuscript for editors: '''[[Image:XXXYOUR.doc]] <br>'''PDF version for general viewing (including all figs in one PDF): '''[[Image:XXXYOUR.pdf]] [[File:enhancerome_full.pdf]]


----

==Title: Transcriptome dynamics in the mesenchymal stem/stromal cells of the high-grade serous ovarian cancer microenvironment==
'''ManuscriptID: '''Phase1_036 <br>
'''Status: '''Working draft<br>
'''Abstract: '''Recent and accumulating evidence indicates that the cancer microenvironment is one of the most critical hallmarks of both cancer progression and metastasis. Mesenchymal Stem/Stromal Cells (MSCs) are the precursors of various cell types that make up both normal and cancer tissue microenvironments. We isolated MSCs from various High-Grade Serous Ovarian Carcinomas (HG-SOCs), demonstrated their normal genotype and analyzed their transcriptome by deep-CAGE, comparing it with that of similarly derived MSCs from normal tissues and with the FANTOM5 sample dataset.
The integrative analysis against the extensive panel of primary cells and tissues of the FANTOM5 project allowed us to identify cell-type-specific transcriptional activity associated with HG-SOC-MSCs. Hierarchical clustering shows that MSCs derived from HG-SOCs co-cluster with other MSCs while retaining distinct transcriptional features. Their transcriptional activity correlates very strongly with that of primary mesothelial cells, which represent the embryonic cellular origin of serous ovarian cancer. Most importantly, this analysis revealed a specific identity of HG-SOC-MSCs when compared with similarly derived MSCs from normal tissues such as bone marrow, heart and adipose tissue, reinforcing the idea that the environment organized by transformed serous ovarian cancer cells could be responsible for establishing such transcriptional specificity in resident/mobilized stromal precursor cells.
By integrating the identified transcriptional signatures of HG-SOC-MSCs with the gene expression matrices of the publicly available TCGA HG-SOC dataset, we were able to trace the HG-SOC-MSC signature in a fraction of the tumor samples.
Altogether, the reported analysis supports the hypothesis that HG-SOC-MSCs are bona fide representatives of the ovarian district, either reflecting their specific mesothelial origin or highlighting their epigenetic conditioning by the HG-SOC environment.<br>
'''Authors: '''Roberto Verardo, Silvano Piazza, RIKEN_OSC_members, Claudio Schneider <br>
'''Authors contribution statement: '''ED conceived the project, developed part of the software, oversaw implementation, performed some of the analysis and most manuscript writing; SP developed and implemented part of the software, carried out statistical tests and results interpretation and wrote parts of the manuscript; YC implemented part of the software and prepared some figures; MZ developed and implemented part of the software; AP developed part of the software and contributed to the manuscript writing; CS supervised the study <br>
'''Datasets used: '''Helicos CAGE on all of F5freeze1 <br>
'''Target journal(s): '''<br>
'''Internal submission date: '''June 1st 2012 <br>
'''Contact by email: '''[mailto:schneide@lncib.it Claudio Schneider] <br>'''Word document version of manuscript for editors: '''[[File:xxx claudio.doc]] <br>'''PDF version for general viewing (including all figs in one PDF): '''[[File:Claudio.pdf]]

----



Revision as of 06:45, 20 November 2012

Satellite manuscript internal review page

Welcome to the FANTOM5 Satellite review page. As discussed at the Ume and Koyo meetings, all papers will be visible to consortium members. This lets everyone know what is going on, promotes collaboration, supports due process regarding co-authorship and helps avoid competition.

Authorship

The author list will be selected by the first author and the corresponding author of each satellite paper on the basis of scientific contribution to the manuscript. Remember to include an author contribution statement for all authors named in your manuscript (of the form "AB carried out the cell isolation, SB carried out the network predictions", etc.).

In addition, FANTOM5 headquarters will name RIKEN OSC members who should be co-authors for their input on each manuscript and on the entire FANTOM5 project. Those of you who have participated in previous FANTOM projects will be familiar with this process; those new to FANTOM should look at the author lists of the satellite paper collections for FANTOM2-4. FANTOM5 headquarters is currently discussing the policy for RIKEN OSC co-authorship on the FANTOM5 satellites, but satellite papers will be considered on a case-by-case basis, taking into account the datasets used, intellectual input and facilitating technologies/analyses for each paper.

At this stage, please name any authors from the OSC who you think should definitely be included as co-authors. In addition, for all satellite submissions, include the term RIKEN_OSC_members as an additional author.

Instructions

Please make a copy of the template below and enter your manuscript details.

If you are not able to edit the wiki yourself, please email the secretariat with the subject line "FANTOM5_satellite"; please understand that these requests will be processed when we can, rather than immediately. You must fill in all of the details below and provide both a PDF that contains all figures and a Word document of the main text, for reviewers to mark up directly.

Manuscripts

Title: Analysis of DNA methylation and transcription during granulopoiesis reveals timed methylation changes in low CpG areas and regulation of transcription factor expression and motif activity

ManuscriptID: Phase1_001
Status: Good draft
Abstract: During development, epigenetic mechanisms such as DNA methylation have been suggested to provide cellular memory that maintains pluripotency but also stabilizes cell fate decisions and directs lineage restriction. In this study we set out to characterize changes in DNA methylation levels and gene expression during granulopoiesis using four distinct cell populations, ranging from the oligopotent common myeloid progenitor stage to terminally differentiated neutrophils. We found a general decrease of DNA methylation during granulopoiesis. Methylation levels appear to change at specific differentiation stages and correlate with changes in transcription and motif activity of key hematopoietic transcription factors. Differentially methylated sites (DMSs) are preferentially located in areas distal to CpG islands and shores and are overrepresented in potentially regulatory enhancer elements. Overall, this study depicts in detail the epigenetic and transcriptional changes that occur during granulopoiesis and supports the role of DNA methylation as a regulatory mechanism in cell differentiation.
Authors: Michelle Rönnerblad, Tor Olofsson, Sören Lehmann, RIKEN_OSC_members, Karl Ekwall*, Erik Arnér* & Andreas Lennartsson*
Authors contribution statement: MR did most of the practical experiments, the bioinformatic analysis (except that related to CAGE) and most of the manuscript writing; TO isolated the cells from bone marrow; SL gave valuable input to the planning and analysis and critically reviewed the manuscript; KE planned and supervised the study and contributed to the manuscript writing; EA supervised the bioinformatic analysis, performed the CAGE-related analyses and contributed to the manuscript writing; AL initiated, planned and supervised the study, contributed to the manuscript writing and did some experiments.
Datasets used: Helicos CAGE on granulocyte precursor populations
Target journal(s): Blood
Internal submission date: April 7th 2012
Contact by email: Andreas Lennartsson, Karl Ekwall, Erik Arner
Word document version of manuscript for editors: File:Rönnerblad.doc
PDF version for general viewing (including all figs in one PDF): File:Rönnerblad Aprl07.pdf


Title: Cell-type specificity and co-expression of regulatory polymorphisms associated with human disease

ManuscriptID: Phase1_002
Status: Good draft
Abstract: Our ability to use genetic associations with disease to develop better treatments has been limited by the difficulty of identifying a biological process, or cell type, on which to focus investigation. Most disease-associated polymorphisms do not lie within protein-coding genes, raising the possibility that variation in regulatory sequence plays a critical role in disease phenotypes. We have used genome-scale 5’RACE (CAGE) to identify the location and usage of transcription start sites in 864 human tissues, primary cells and cell lines, and show here that there is a strong enrichment for disease-associated variants within the sequence immediately adjacent to transcription start sites. Using the expression profiles of known variants associated with disease susceptibility, we identify experimentally available cell types significantly associated with specific diseases and traits. The expression of genes known to be associated with particular diseases was positively correlated. Such co-expression was used to identify unreported candidate disease-associated regulatory regions within published genome-wide association studies (GWAS). The approach was validated by identifying candidate loci in a 2007 GWAS that were subsequently confirmed in larger independent datasets. These functional genomics approaches directly inform the choice of model system and identify disease- and cell type-specific co-regulated networks for a wide range of common diseases.


Authors: Baillie JK*, Haley CS, Schaefer U, Faulkner GJ, Freeman T, Brown JB, [others...], [Numerous RIKEN authors, order etc. TBC, at least including: Kawaji H, Forrest A, Carninci P]*, Hume DA*
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on Primary Cells
Target journal(s): Nature Genetics
Internal submission date: ...
Contact by email: Kenneth Baillie, David Hume
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
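The enrichment of disease-associated variants next to transcription start sites can be illustrated as a one-sided hypergeometric test. This is a minimal sketch assuming a background set of tested variants, a TSS-adjacent subset of that background and a set of disease-associated variants; the abstract does not state which test was used, and all counts below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_draws: int, n_success: int, n_total: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_draws).

    n_total   -- all variants tested (background set)
    n_success -- background variants lying in TSS-adjacent sequence
    n_draws   -- disease-associated variants
    k         -- disease-associated variants that lie in TSS-adjacent sequence
    """
    if not (0 <= n_success <= n_total and 0 <= n_draws <= n_total and k >= 0):
        raise ValueError("inconsistent counts")
    upper = min(n_draws, n_success)
    # Sum exact integer counts for every outcome at least as extreme as k.
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_draws)


def fold_enrichment(k: int, n_draws: int, n_success: int, n_total: int) -> float:
    """TSS-adjacent fraction among disease variants divided by the background fraction."""
    return (k / n_draws) / (n_success / n_total)


# Hypothetical counts, for illustration only.
p = hypergeom_upper_tail(k=25, n_draws=200, n_success=500, n_total=10_000)
fe = fold_enrichment(k=25, n_draws=200, n_success=500, n_total=10_000)
print(f"fold enrichment = {fe:.2f}, one-sided p = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the tail sum; for genome-scale counts a library implementation of the same distribution would be faster.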


Title: What classes of mammalian promoter are there?

ManuscriptID: Phase1_003
Status: Working draft
Abstract: This study uses the comprehensive FANTOM5 promoter data, and careful methodology, to identify classes of mammalian promoter. In agreement with previous results, we find that promoters fall into two classes with narrow or wide spread of transcription start sites. In stark contrast to previous studies, we find little association between width and either CpG rate or TATA signals. Width correlates with expression level, suggesting that strength of promoter signal is on average proportional to promoter length. The data are consistent with a simple null hypothesis for CpG islands: that they are a passive consequence of expression (and thus cytosine demethylation and reduced CpG mutation) in germ-line cells. Finally, we show that measures of tissue specificity are prone to statistical artifacts, and specificity is not correlated with promoter narrowness, in contrast to previous claims. These results clarify some fundamental properties of mammalian promoters.
Todo: Use Charles's good way of measuring tissue specificity.
Authors: Frith, maybe Drabløs et al., open to others
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: All human Phase1 CTSSs (plan to add mouse)
Target journal(s):
Internal submission date: August 2012?
Contact by email: Martin Frith
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
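Promoter width is not defined in the abstract. A minimal sketch, assuming the commonly used interquantile width (the span holding the 10th to 90th percentile of CAGE tags in a tag cluster); the narrow/broad cut-off `NARROW_MAX_WIDTH` is a hypothetical value chosen only for illustration.

```python
from typing import Sequence, Tuple

# (genomic position, CAGE tag count) for each CTSS in one tag cluster
Ctss = Tuple[int, int]

# Hypothetical cut-off for illustration; not taken from the manuscript.
NARROW_MAX_WIDTH = 10


def quantile_position(ctss: Sequence[Ctss], q: float) -> int:
    """Leftmost position at which the cumulative tag fraction reaches q."""
    total = sum(count for _, count in ctss)
    if total <= 0:
        raise ValueError("cluster has no tags")
    running = 0
    for pos, count in sorted(ctss):
        running += count
        if running >= q * total:
            return pos
    # Only reached if q > 1; fall back to the rightmost CTSS.
    return max(pos for pos, _ in ctss)


def interquantile_width(ctss: Sequence[Ctss], lower: float = 0.1, upper: float = 0.9) -> int:
    """Width in bp of the region holding the central tags (10th to 90th percentile by default)."""
    return quantile_position(ctss, upper) - quantile_position(ctss, lower) + 1


def promoter_class(ctss: Sequence[Ctss]) -> str:
    """Assign a cluster to the hypothetical narrow or broad class by its width."""
    return "narrow" if interquantile_width(ctss) <= NARROW_MAX_WIDTH else "broad"


sharp = [(100, 95), (101, 3), (102, 2)]   # hypothetical single dominant CTSS
broad = [(p, 1) for p in range(100)]      # hypothetical evenly spread CTSSs
print(interquantile_width(sharp), promoter_class(sharp))
print(interquantile_width(broad), promoter_class(broad))
```

Using quantiles rather than the full cluster span keeps a few stray low-count CTSSs from inflating the width.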


Title: Epigenetic factors regulating Hematopoiesis

ManuscriptID: Phase1_004
Status: Good draft
Abstract: The hematopoietic differentiation pathway is a complex regulatory program for generating different lineages of blood cell types from multipotent hematopoietic stem cells. The transcriptional program dictating hematopoietic cell fate and differentiation requires an epigenetic memory function consisting of a network of enzymes controlling DNA methylation, histone posttranslational modifications and chromatin structure. Defective interactions between epigenetic enzymes and transcription factors cause perturbations in blood cell differentiation, which often lead to hematopoietic disorders such as leukemia. To elucidate the contribution of different epigenetic factors to human hematopoiesis, high-throughput Cap Analysis of Gene Expression (CAGE) sequencing was used to build comprehensive transcription profiles of 199 epigenetic factors in a wide range of blood cells. These epigenetic factors include proteins that covalently modify DNA/histones or alter chromatin structure dynamics. Our analysis revealed several epigenetic factors with expression profiles specific for cell type, lineage type and/or leukemic cell lines. In this report, the ‘epigenetic transcriptome’ has been systematically studied to predict the potential functions of these factors in the epigenetic regulatory network of human hematopoiesis. Such a comprehensive study can not only identify putative epigenetic regulators of normal hematopoiesis and postulate their functions but also serve as a resource for the scientific community for further characterization and validation of differentially expressed transcripts.
Authors: Punit Prasad, Michelle Rönnerblad,...FANTOM5, Erik Arner, Karl Ekwall and Andreas Lennartsson
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s): Blood or other
Internal submission date:
Contact by email: Andreas Lennartsson, Erik Arner
Word document version of manuscript for editors: File:Epigenetic factors in hematopoiesis 051012.docx
PDF version for general viewing (including all figs in one PDF): File:Punit el al Fig 091216.pdf


Title: Ab Initio Prediction of Tissue-Specific Regulatory Modules in the FANTOM5 Project

ManuscriptID: Phase1_005
Status: Working draft
Abstract: The FANTOM5 project aims to use single molecule CAGE sequencing (Kanamori-Katayama et al. 2011; Itoh et al. 2012) to fully understand transcriptional regulation in a mammalian system by generating transcriptional regulatory networks that define the majority of mammalian cell types. One of the major outcomes of the project is the identification, in more than 50% [cit from Promoterome] of known transcriptional units, of distinct Transcription Start Sites (TSSs) used in a cell-specific way across the different biological states, with n% more [cit from Promoterome, page 8, lines 11-12] mapping to unannotated exons of known genes or to putative new tissue-specific transcriptional units. In this work, we used ScanAll to predict ab initio the presence of regulatory elements in the genomic regions surrounding the FANTOM5 CAGE tags. We first aimed to identify motifs, possibly corresponding to Transcription Factor Binding Sites (TFBSs), that were conserved in a subset of the selected genomic regions; we then extended our analysis to pinpoint structured regulatory modules, that is, groups of conserved motifs that co-occur within a fixed distance in the aforementioned (co-expressed) regions and may play a role in the regulatory mechanisms driving tissue specificity.
Authors: Emiliano Dalla, Silvano Piazza, Yari Ciani, Marco Zantoni, RIKEN_OSC_members, Alberto Policriti, Claudio Schneider
Authors contribution statement: ED conceived the project, developed part of the software, oversaw implementation, performed some of the analysis and most manuscript writing; SP developed and implemented part of the software, carried out statistical tests and results interpretation and wrote parts of the manuscript; YC implemented part of the software and prepared some figures; MZ developed and implemented part of the software; AP developed part of the software and contributed to the manuscript writing; CS supervised the study
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date: June 1st 2012
Contact by email: Emiliano Dalla
Word document version of manuscript for editors: File:FANTOM5 PromoteromeSatelliteLNCIB.doc
PDF version for general viewing (including all figs in one PDF): File:FANTOM5 PromoteromeSatelliteLNCIB wFigures.pdf
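The regulatory-module step (groups of motifs co-occurring within a fixed distance in co-expressed regions) can be illustrated with a simple pairwise version. This is a generic sketch, not ScanAll: the input format, `max_dist` and `min_regions` are assumptions, and a real module search would also compare these counts against a background.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# (motif identifier, offset of the motif occurrence within the region)
Occurrence = Tuple[str, int]


def cooccurring_motif_pairs(
    regions: Dict[str, List[Occurrence]], max_dist: int, min_regions: int
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each unordered motif pair to the regions where it co-occurs within max_dist."""
    support: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for region_id, occurrences in regions.items():
        for (m1, p1), (m2, p2) in combinations(occurrences, 2):
            # Only distinct motifs form a pair; the distance bound is inclusive.
            if m1 != m2 and abs(p1 - p2) <= max_dist:
                support[frozenset((m1, m2))].add(region_id)
    # Keep pairs seen in enough co-expressed regions.
    return {pair: ids for pair, ids in support.items() if len(ids) >= min_regions}


# Hypothetical motif hits in three co-expressed regions.
regions = {
    "r1": [("A", 10), ("B", 30), ("C", 200)],
    "r2": [("A", 5), ("B", 40)],
    "r3": [("B", 0), ("C", 500)],
}
print(cooccurring_motif_pairs(regions, max_dist=50, min_regions=2))
```

Pairs are the smallest modules; larger modules could be grown from supported pairs, at the cost of a combinatorially larger search.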


Title: Homotypic clusters of transcription factor binding sites in the vicinity of transcription start sites

ManuscriptID: Phase1_006
Status: (?) draft (?)

Abstract:
Background
Transcription factors (TFs) that specifically recognize DNA binding sites (TFBSs) play a key role in the regulation of gene expression. Groups of closely localized TFBSs for a particular TF, so-called homotypic TFBS clusters (HCBSs), were originally detected in yeast and extensively studied in early fruit fly development. Recently, HCBSs were found to be highly important for several human regulatory systems.

Motivation
It is common practice to estimate the enrichment of binding sites in regulatory sequences. However, there are no systematic data on whether HCBSs are common in the promoter regions of human genes. The general properties of HCBSs, as well as their possible relation to the regulation of tissue-specific expression, also remain unclear.

Results
Using data on sample-specific transcription start sites (TSSs) detected in FANTOM5 and high-quality binding models for more than 400 TFs from the HOCOMOCO TFBS model collection, we predicted TFBSs and the corresponding HCBSs in promoter regions surrounding TSSs. TFBS models for most TFs were shown to form statistically significant HCBSs, often composed of separate distant binding sites. For most TFs, we were able to identify samples in which HCBSs were significantly associated with promoters of sample-specific or housekeeping TSSs. Thus, for most TFs we predict putative preferences for sample-specific or housekeeping HCBS activity and provide a genome-wide map of HCBSs near FANTOM5-defined TSSs.

Supplementary information
https://fantom5-collaboration.gsc.riken.jp/webdav/home/vigg/homotypicus/

Authors: I.V. Kulakovskiy, Y.A. Medvedeva, M.S. Polishchuk, A.V. Favorov, S. Schmeier, T. Lassmann, I.E. Vorontsov, RIKEN_OSC_members, V.J. Makeev

Authors contribution statement: IVK implemented the software and drafted the manuscript. YAM carried out statistical tests and results interpretation. MSP developed the homotypic cluster detection algorithm. AVF selected proper statistical tests. SS provided the housekeeping set of TSS-clusters. TL provided the set of sample-specific TSS-clusters. IEV estimated proper thresholds for PWMs used in the study. VJM coordinated the study. All the authors participated in writing and finalizing the manuscript.

Datasets used: Helicos CAGE - FANTOM5 FREEZE1, "robust" subset

Target journal(s): Nucleic Acids Research, Bioinformatics

Internal submission date: 18 June 2012 / Updated: 12 September 2012

Contact by email: Vsevolod Makeev, Ivan Kulakovskiy

Word document version of manuscript for editors: File:HOMOTYPICUS-FANTOMsatellitepaper.doc
PDF version for general viewing (including all figs in one PDF): File:HOMOTYPICUS-FANTOMsatellitepaper.pdf
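A homotypic cluster can be illustrated as a window of fixed width that contains at least a minimum number of predicted sites for one TF. The sketch below is a generic sliding-window detector, not the algorithm developed by MSP; `window` and `min_sites` are hypothetical parameters, and the statistical significance assessed in the manuscript is not computed here.

```python
from typing import List, Tuple


def homotypic_clusters(
    sites: List[int], window: int, min_sites: int
) -> List[Tuple[int, int, int]]:
    """Return (first_site, last_site, n_sites) for merged windows of width `window`
    that contain at least `min_sites` binding sites of a single TF."""
    if window <= 0 or min_sites <= 0:
        raise ValueError("window and min_sites must be positive")
    pos = sorted(sites)
    dense: List[Tuple[int, int]] = []
    j = 0
    for i in range(len(pos)):
        # Advance j to the first site outside the window that starts at pos[i].
        while j < len(pos) and pos[j] - pos[i] < window:
            j += 1
        if j - i >= min_sites:
            dense.append((pos[i], pos[j - 1]))
    # Merge overlapping dense windows into maximal clusters.
    merged: List[List[int]] = []
    for start, end in dense:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, sum(s <= p <= e for p in pos)) for s, e in merged]


# Hypothetical predicted sites for one TF in one promoter region.
print(homotypic_clusters([10, 20, 30, 200, 400, 410, 415, 420], window=50, min_sites=3))
```

The two-pointer scan visits each site a bounded number of times, so the cost is dominated by the initial sort.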


Title: Brain CAGE

ManuscriptID: Phase1_007
Status: Working draft
Abstract:
Many studies of the development of the human nervous system, both at the gene level and genome-wide, have shed light on the differentiation and migration of the cell types that give rise to the structures of the mature nervous system. Although these structures and regions are relatively well characterized in the developed brain, both morphologically and functionally, the molecular mechanisms that contribute to their maintenance are poorly understood.


To address this question we used Cap Analysis of Gene Expression (CAGE) to create a collection of high resolution transcription start sites for 15 regions of the human brain, using post-mortem samples derived from newborn and aged adult donors.


We identified brain-specific transcription, both in terms of genes that are exclusively expressed in brain and genes that display brain-specific isoforms. Interestingly, (1) a significant fraction of these brain-specific genes correspond to loci that lack experimental characterization and (2) an important fraction of these brain-specific signals map to unannotated regions of the genome. Taking advantage of the two age extremes represented in our dataset, we were able to characterize age-specific transcription. Finally, we identified region-specific transcription in both adult and newborn brain: we distinguished and characterized region-specific expression patterns that are already established in the newborn and maintained until late age, as well as region specificity that is established only later in life.


In this work we used CAGE to profile transcription start sites and their expression levels at unprecedented resolution in several regions of early postnatal and aged human brain. We created a catalogue of brain-specific, age-specific and region-specific coding and noncoding transcripts, providing the scientific community with a valuable resource to further study the molecular specificity of brain functions.


Authors: Heutink
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: Peter Heutink, Margherita Francescatto, Morana Vitezic
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Pathogen specific monocyte transcriptional responses

ManuscriptID: Phase1_008
Status: Working draft
Abstract:
Authors: Wells
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: Christine Wells, Anthony Beckhouse
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Transcriptome profiling of human skin mast cells by deep CAGE identifies unexpected gene activity patterns through direct comparison with multiple cell and tissue subsets

ManuscriptID: Phase1_009
Status: Working draft
Abstract: Unlike other haematopoietic cells, mast cells (MCs) complete their maturation in peripheral tissues and do not circulate, which has left them less well characterized than other myelocytes. Here, we employed deep-CAGE on skin-derived MC subsets to generate the most comprehensive view of the MC transcriptome reported to date. A particular advantage of this study is that MCs were embedded in the FANTOM5 project, giving the unique opportunity to contrast them against an extensive panel of primary cells, cell lines and tissues. The study not only confirmed that MCs expressed typical lineage marker genes at the highest levels, but also identified novel bona fide MC signature genes (C1orf150, HERV-FRD, MRGPRX2, RGS13, SIGLEC6). In addition, a plethora of genes (with and without known functions) were expressed at the highest levels by MCs, many of which had not been investigated in MCs before. Another surprising finding was that MCs expressed several marker genes of seemingly unrelated tissues (brain, adipose, kidney, placenta) that were not found in other cells outside that particular tissue. In contrast, genes frequently studied in MCs (e.g. those encoding TLRs) were in reality expressed at low levels when viewed against the background of all samples. Validation studies highlighted the power of this tissue-wide profiling strategy to discover new functional programs of MCs. For example, BMPR1 was revealed to regulate gene expression, degranulation, recovery from refractoriness and survival of MCs. We also show that MCs undergo profound changes in their transcriptome both upon culture (r=0.86 for ex vivo versus cultured MCs) and temporarily upon stimulation (r=0.82 at 2.5 h after FcεRI aggregation). Positioning MCs in the haematopoietic framework emphasizes their uniqueness, as no close relatives can be found at the whole-transcriptome level; the best correlation exists with CD34+ progenitor cells (r=0.77), followed by basophils (r=0.75) and, rather surprisingly, NK cells (r=0.75).
This rich data source reveals that our knowledge of MCs is probably still fairly limited and that their full functional spectrum may be much broader than currently believed. This powerful resource can guide the MC community in developing rational experimental approaches to study MC functions, with the ultimate aim of understanding the overall significance of these cells in human health and disease.
Authors: Sven Guhl, Torsten Zuberbier, Magda Babina, RIKEN_OSC_members
Authors contribution statement: S.G. isolated the mast cells and performed most experiments; M.B. performed several experiments, was involved in planning, supervision and data analysis, and wrote the first draft of the manuscript; S.G. and T.Z. helped with planning, data analysis and manuscript writing.
Datasets used: Helicos CAGE on mast cell samples in comparison to freeze 1 data
Target journal(s): Blood, eBlood
Internal submission date:
Contact by email: Magda Babina, Sven Guhl
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
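The r values quoted in the abstract are correlations between whole-transcriptome profiles. A minimal sketch, assuming Pearson correlation on log2-transformed expression with a pseudocount of 1; the abstract does not state the correlation type, transformation or normalization, and the expression values below are hypothetical.

```python
import math
from typing import List, Sequence


def log_expression(tpm: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log2(x + pseudocount); the pseudocount is an assumption, not from the manuscript."""
    return [math.log2(v + pseudocount) for v in tpm]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles over the same features."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression of five genes in ex vivo and cultured MCs.
ex_vivo = [0, 3, 15, 120, 7]
cultured = [1, 2, 30, 80, 0]
r = pearson_r(log_expression(ex_vivo), log_expression(cultured))
print(f"r = {r:.2f}")
```

The log transform keeps a handful of very highly expressed genes from dominating the correlation, which is why it is a common (but not the only) choice for transcriptome-wide comparisons.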


Title: Effect of cytosine methylation on transcription factor binding sites and regulation of transcription

ManuscriptID: Phase1_010
Status: Working draft
Abstract:
Authors: Yulia Medvedeva et al.
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date: End of September
Contact by email: Yulia Medvedeva
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Transcription and enhancer profiling in human monocyte subsets

ManuscriptID: Phase1_011
Status: Good draft
Abstract: Human blood monocytes comprise at least three subpopulations that differ in phenotype and function. Here we present the first in-depth regulome analysis of classical (CD14++CD16-), intermediate (CD14+CD16+) and nonclassical (CD14dimCD16+) monocytes. Cap Analysis of Gene Expression (CAGE) adapted to Helicos single molecule sequencing was used to map transcription start sites throughout the genome in all three subsets. In addition, global maps of H3K4me1 and H3K27ac deposition were generated for classical and nonclassical monocytes, defining the enhanceosomes of the two major subsets. We identify differential regulatory elements (including promoters and putative enhancers) that are associated with subset-specific motif signatures corresponding to different transcription factor activities, and we validate, as an example, a novel downstream enhancer of the CD14 locus. In addition to known subset-specific features, pathway analysis revealed marked differences in metabolic gene signatures. While classical monocytes expressed higher levels of genes involved in carbohydrate metabolism, priming them for anaerobic energy production, nonclassical monocytes expressed higher levels of oxidative pathway components and showed higher routine mitochondrial activity. Our findings describe promoter/enhancer landscapes and provide novel insights into the specific biology of human monocyte subsets.
Authors: Christian Schmidl, Kathrin Renner, Ruediger Eder, Katrin Peter, Petra Hoffmann, Reinhard Andreesen, Marina P. Kreutz, RIKEN_OSC_members, Matthias Edinger, Michael Rehli
Authors contribution statement: CS performed experiments and computational analyses and wrote parts of the manuscript; KR performed experiments and contributed to manuscript writing; RE isolated the cells; KP performed experiments; PH, RA, MK and ME contributed to planning and supervision; RIKEN_OSC_members organized or performed Helicos sequencing and provided aligned data; MR initiated, planned and supervised the study, performed computational analyses and wrote the manuscript.
Datasets used: Helicos CAGE on monocyte subsets (Regensburg samples)
Target journal(s): Blood, eBlood, other
Internal submission date: September 1, 2012
Contact by email: Michael Rehli, Christian Schmidl
Word document version of manuscript for editors: File:Schmidl MonoSub.docx
PDF version for general viewing (including all figs in one PDF): File:Schmidl MonoSub.pdf  


Title: An atlas of active, transcribed enhancers over 166 human tissues and 495 primary cells

ManuscriptID: Phase1_012
Status: Good draft
Abstract:
Authors: Sandelin
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: Albin Sandelin, Robin Andersson
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: A Protein-Structure Perspective Of The FANTOM Consortium Human Transcriptome

ManuscriptID: Phase1_013
Status: Working draft
Abstract: We compare functional and molecular evolution over different human cell types from a protein-domain perspective. Utilising existing structural bioinformatic tools to analyse CAGE transcription datasets, we annotate expressed sequences with their corresponding structural-domain information.

Cell types express different portions of the genome to produce sets of unique proteins, facilitating cell function. Each of these proteins is constructed from a set of evolutionarily distinct units: protein domains. The order in which these domains appear within a given sequence defines its protein domain architecture and is an extremely sensitive indicator of homology. We use the SUPERFAMILY database of structural-domain architecture assignments in all sequenced cellular genomes to identify the most recent common ancestor (MRCA) of each protein in the human genome. We then use this ancestral information for each of the cell-type expression samples in the FANTOM5 dataset to yield an evolutionary profile of cell development. These profiles identify times in human descent when a given cell type appears to have utilised more or less of the protein innovation of that time than average. By clustering cell types on these profiles, we find groups that share a common protein evolutionary history. This highlights important domain architectures defining evolutionary shifts and functional innovations. It also sheds light on the order in which cell types could have arisen. Finally, we demonstrate how such a structural-domain perspective allows for effective comparisons of the molecular basis of the functional differences between cell types.
Authors: Julian Gough, Owen Rackham, Adam Sardar, Matt Oates + Sample Providers + RIKEN OSC
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on all samples
Target journal(s): Bioinformatics?
Internal submission date:
Contact by email: Julian Gough, Owen Rackham
Word document version of manuscript for editors: Rough draft available on request
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf link to google doc
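The MRCA and evolutionary-profile steps can be sketched as follows, assuming each species is mapped to the node on the human lineage where it diverged from humans, and that a domain architecture dates to the oldest such node among the species that carry it. The lineage labels, species mapping and gene assignments are illustrative assumptions, not SUPERFAMILY data. `epoch_enrichment` compares a sample's expressed genes with the genome-wide distribution across epochs, one possible reading of the "more or less than average" profiles.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Human lineage from oldest to youngest node (illustrative labels only).
HUMAN_LINEAGE: List[str] = [
    "cellular_life", "eukaryota", "metazoa", "vertebrata", "mammalia", "homo_sapiens",
]

# Node on the human lineage where each species diverged from humans (illustrative).
DIVERGENCE_NODE: Dict[str, str] = {
    "e_coli": "cellular_life",
    "s_cerevisiae": "eukaryota",
    "d_melanogaster": "metazoa",
    "d_rerio": "vertebrata",
    "m_musculus": "mammalia",
    "h_sapiens": "homo_sapiens",
}


def architecture_origin(species_with_arch: Iterable[str]) -> str:
    """Oldest human-lineage node shared with any species carrying the architecture."""
    ranks = [HUMAN_LINEAGE.index(DIVERGENCE_NODE[s]) for s in species_with_arch]
    if not ranks:
        raise ValueError("architecture not observed in any species")
    return HUMAN_LINEAGE[min(ranks)]


def epoch_enrichment(expressed: Set[str], gene_epoch: Dict[str, str]) -> Dict[str, float]:
    """Per epoch: fraction of a sample's expressed genes over the genome-wide fraction."""
    genome = Counter(gene_epoch.values())
    sample = Counter(gene_epoch[g] for g in expressed if g in gene_epoch)
    n_genome, n_sample = sum(genome.values()), sum(sample.values())
    if n_sample == 0:
        raise ValueError("no expressed gene has an epoch assignment")
    return {
        epoch: (sample[epoch] / n_sample) / (genome[epoch] / n_genome)
        for epoch in HUMAN_LINEAGE
        if genome[epoch]
    }


# Hypothetical genes with epochs already assigned via their architectures.
gene_epoch = {"g1": "eukaryota", "g2": "eukaryota", "g3": "vertebrata", "g4": "mammalia"}
print(architecture_origin(["h_sapiens", "m_musculus", "d_rerio"]))
print(epoch_enrichment({"g1", "g2"}, gene_epoch))
```

Values above 1 mark epochs whose innovations a sample uses more than the genome as a whole; clustering samples on these vectors would give the groups described in the abstract.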


Title: Promoter definition and differential expression and regulation of mammalian fibrillin/LTBP gene family members using transcriptional profiling by deep CAGE of mesenchymal cell types.

ManuscriptID: Phase1_014
Status: Working draft
Abstract: The fibrillins and latent transforming growth factor beta binding proteins (LTBPs) form a superfamily of extracellular matrix (ECM) proteins characterized by the presence of a unique domain, the 8-cysteine transforming growth factor beta (TGFβ) binding domain (TB domain). These proteins are involved both in maintaining the extracellular matrix and in controlling the bioavailability of TGFβ family members. Genes encoding these proteins show differential expression in mesenchymal cell types, which synthesise the extracellular matrix and give rise to connective tissues. We have investigated the promoter regions of the seven gene family members using the human FANTOM5 CAGE database. Although the protein and nucleotide sequences showed considerable homology (for the protein sequences, maximum homology was 68% between FBN1 and FBN2 and minimum homology was 59% between FBN1 and FBN3), the promoter regions were quite diverse, the most divergent being the FBN3 promoter. The three fibrillin genes each had a single predominant promoter, while two genes in the LTBP group showed tissue-specific alternative promoter usage. Although these genes were expressed in various mesenchymal cell types, there was no overlap of transcription factor motifs or activity in the promoters of the seven genes. This study highlights the redundancy of factors that control both general and highly specific expression and suggests that this important class of extracellular matrix genes is subject to subtle regulatory variations that explain the differential roles of members of this gene family.
Authors: Margaret R Davis, RIKEN OSC members, Kim M Summers
Authors contribution statement: MRD performed most of the analysis and contributed to writing the paper, RIKEN OSC did ..., KMS performed the analysis and contributed to writing the paper
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date:
Contact by email: kim.summers@roslin.ed.ac.uk
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Quantifying the informational complexity of transcriptional regulatory programmes

ManuscriptID: Phase1_015
Status: Unknown
Abstract: The regulation of gene expression defines cellular identity, is the basis for organism development and underlies many cellular responses to the environment. Its disruption is implicated in many diseases, and changes in gene regulation appear to underlie many adaptations evident between species. Previously, genes have been grouped and interpreted based on their specificity of expression: for example, house-keeping genes that are expressed by all cells in all conditions versus highly tissue-restricted genes expressed by only one cell type at a particular developmental time. Although such studies have been informative, they fail to capture important aspects of how a gene is regulated or to account for the heterogeneous relatedness of samples. The expression pattern of a gene is the output of a regulatory program within the cell. A program that must effect many state changes (on, off, up, down) is likely to require more regulatory information (Kolmogorov complexity) than a program effecting fewer state switches. If we can quantify this "regulatory complexity", we can then start to address deeper questions as to where that regulatory information is encoded, how malleable it is through evolution and how susceptible it is to perturbation by mutation. For example, a greater regulatory complexity could correspond to a higher concentration of cis-regulatory sequences around the gene, or alternatively to a single binding site for a transcription factor that is the output of an extensive intracellular signalling network. To address these questions we have explored a range of possible measures of regulatory complexity, including distance-weighted entropies, diversity and richness scores. This leads us to introduce a novel measure of regulatory complexity (CR), implemented as a hierarchical Bayesian model fitted by Markov chain Monte Carlo (MCMC).
The CR method can be thought of as a relative measure of the number of gene expression state changes occurring over a tree relating all analysed samples. A by-product of this analysis is a probabilistic scoring of gene expression state switches between all analysed gene expression libraries. CR is weighted to account for the genome-wide similarity of gene expression between samples but does not depend on the inference of a fixed underlying tree topology. Note: this is intended essentially as a methods paper; see Phase1_016 for the biological insights paper.
Authors: Sarah Baker, Martin Taylor
Authors contribution statement: SB developed and implemented methods and performed general analyses; MT conceived the project and oversaw implementation and performed some of the analysis
Datasets used: Helicos CAGE on primary cells from human and mouse.
Target journal(s): Bioinformatics or Genome Research
Internal submission date: ETA July 2012
Contact by email: Martin Taylor, Sarah Baker
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
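The Phase1_015 abstract lists entropy-based scores among the candidate measures of regulatory complexity. As an illustration of the simplest member of that family, the Python sketch below computes the Shannon entropy of one gene's expression across samples and rescales it to a specificity score between 0 and 1. It is a hypothetical helper, not the CR method: it treats every sample as independent and omits the distance weighting for sample relatedness, which is precisely the limitation the abstract describes. The function names and the normalisation by log2(n) are assumptions made for this example.

```python
import math
from typing import Sequence


def expression_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one gene's expression across samples.

    Values are normalised to proportions p_i = v_i / sum(v).
    Low entropy: expression concentrated in few samples (tissue-restricted).
    High entropy: expression spread evenly (house-keeping-like).
    """
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("expression vector must have a positive total")
    entropy = 0.0
    for v in values:
        if v > 0:  # samples with zero expression contribute 0 (0 * log 0 := 0)
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


def specificity(values: Sequence[float]) -> float:
    """Rescale entropy to [0, 1]: 0 = uniform across samples, 1 = one sample only.

    Unweighted: every sample counts equally, however closely related it is
    to the others (no distance weighting, unlike the measures in the abstract).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    return 1.0 - expression_entropy(values) / math.log2(n)


if __name__ == "__main__":
    print(expression_entropy([10, 10, 10, 10]), specificity([10, 10, 10, 10]))
    print(expression_entropy([0, 0, 40, 0]), specificity([0, 0, 40, 0]))
```

A house-keeping-like profile such as [10, 10, 10, 10] gives an entropy of 2 bits and a specificity of 0, while [0, 0, 40, 0] gives 0 bits and a specificity of 1. Adding relatedness-aware weighting is what separates such a toy score from the measures the manuscript proposes.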


Title: Cis encoding of the master developmental regulatory programme

ManuscriptID: Phase1_016
Status: Unknown
Abstract: The regulation of gene expression defines cellular identity, is the basis for organism development and underlies many cellular responses to the environment. Its disruption is implicated in many diseases, and changes in gene regulation appear to underlie many adaptations evident between species.
Authors: Sarah Baker, Martin Taylor
Authors contribution statement: SB developed and implemented methods and performed general analyses; MT conceived the project and oversaw implementation and performed some of the analysis
Datasets used: Helicos CAGE on primary cells from human and mouse. We may also want to use time course data for this paper (does that push it into phase2?).
Target journal(s): PLoS Biology
Internal submission date: ETA October 2012
Contact by email: Martin Taylor, Sarah Baker
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Correspondence between CAGE clusters and chromatin marks

ManuscriptID: Phase1_017
Status: Delayed -> Moved to PHASE2
Abstract: The paper presents an analysis of the correlation between cell-type-specific CAGE clusters and chromatin marks, using FANTOM CAGE data and ENCODE ChIP-Seq data. It includes comparison of chromatin states between strong and weak clusters and between subclasses of peak distributions in clusters and cluster regions, and properties of clusters depleted of chromatin marks. The analyses show that chromatin marks can be used to identify a reasonable cutoff between low tag counts potentially representing noise, and high tag counts that are likely to represent real transcription. The analyses also show that clusters depleted of chromatin marks are mainly intragenic, and more associated with repeats than non-depleted clusters, and may also show some increased overlap with promoters in the extended ENSEMBL annotation, relative to the RefSeq annotation. (See preliminary reports from January Media:CAGE_cluster_evaluation_Drablos_Rye_13012012.pdf and April Media:CAGE_clusters_and_chromatin_Drablos_Rye_26042012.pdf.)
Authors: Morten Rye, Finn Drablos
Authors contribution statement: MR and FD did data analysis and wrote the paper
Datasets used: Helicos CAGE data, ENCODE chromatin ChIP-Seq and DNase HS data
Target journal(s):
Internal submission date: Most likely October 2012
Contact by email: Finn Drablos, Morten Rye
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Promoter specificity in transcription determines cell lineage choice

ManuscriptID: Phase1_018
Status: Delayed (as of September 12th)
Abstract: This paper will use pathprint (pathway fingerprinting) to build an overall phylogenetic tree of all samples in F5 freeze1. This tree will be used to determine the relative ancestry of samples and to cluster them accordingly. SwitchEngine will be run to find TSS switching at key junctions in differentiation. We will show that TSS dynamics at these informative sites are associated with lineage commitment.
Authors: Emmanuel Dimont, Gabriel Altschuler, Winston Hide
Authors contribution statement: ED did ..., GA did ..., WH did ...
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date: Most likely July-August 2012
Contact by email: Winston Hide, Emmanuel Dimont, Gabriel Altschuler
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Expression pattern evolution in mammals.

ManuscriptID: Phase1_019
Status: Working draft
Abstract: OS_LH_F5_Sat1 Punchlines: P1. The F5 encyclopedia of expression patterns is the most comprehensive expression dataset, facilitating groundbreaking research into the nature of expression pattern evolution in animals. Animals are primarily characterized by the existence of many cell types and tissues, and F5 offers the first comprehensive view of the evolution of tissue-specific expression.
P2. The brain exhibits many unique features in terms of its expression pattern evolution: no new vertebrate genes, but rapid expression pattern divergence between 1:1 orthologs; clustering by developmental stage; and intraspecies clustering of brain sublocations, underlying fundamental divergence in brain expression between species.
P3. New mammalian genes tend to be expressed in the reproductive tract.
P4. Devolution of cancer cell-line expression patterns: (1) as measured by expression pattern divergence between paralogs, (2) aberrant signature in phylogenetically aware expression pattern profiling.
P5. Recently born genes tend to be tissue-specific, and their expression domains correlate with many mammalian evolutionary novelties (except for the brain, where there are no new genes and everything seems to depend on new expression patterns).
P6. Battling the trends: functional classes of genes, gene families and tissues that contradict the general trend of gradual expression divergence over evolutionary time.
P7. A final word on the heated neutralist/selectionist debate in the field of expression pattern evolution: tissues can be divided into three classes in terms of their expression evolution rates: (a) dynamic (neutral or nearly neutral evolution), (b) intermediate and (c) static (strong purifying selection for expression pattern conservation).
Authors: Lukasz Huminiecki, Oxana Sachenkova and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ..., TreeFam8
Target journal(s): Genome Research
Internal submission date: June 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:OS LH F5 Sat1 Figures.pdf


Title: Gene duplication and promoter divergence in mammals.

ManuscriptID: Phase1_020
Status: Unknown
Abstract:
Authors: Lukasz Huminiecki and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ..., F5 promoter and enhancer datasets, TreeFam8
Target journal(s): Genome Research
Internal submission date: September 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Gene duplication and TF/miRNA regulatory network evolution in mammals.

ManuscriptID: Phase1_021
Status: Unknown
Abstract:
Authors: Lukasz Huminiecki and Core RIKEN Authors
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ... and LH did everything else
Datasets used: Helicos CAGE on ..., TreeFam8, miRBase, microRNA target predictions
Target journal(s): Genome Research
Internal submission date: December 1st
Contact by email: Lukasz Huminiecki
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Analysis of antisense transcription in loci associated to neurodegenerative diseases

ManuscriptID: Phase1_022
Status: Working draft
Abstract: The FANTOM5 sequencing datasets represent the largest collection of transcriptomes from human cell lines, primary cells and whole tissues of various origins. Transcription start sites are mapped at high resolution using a modified Cap Analysis of Gene Expression (CAGE) protocol for high-throughput single-molecule next-generation sequencing on the Helicos platform (hCAGE). We employed the FANTOM5 collection of data to address the role of antisense transcription in neurodegeneration. We focused our analysis exclusively on tissues and primary cells, to avoid artifacts due to cellular transformation in cultured cell lines. Among the >1261 human hCAGE libraries, we selected those of brain origin. Libraries from total blood and selected blood cell populations were also included in the analysis. A total of 66 tissue- and 244 cell-specific libraries were interrogated for the presence of antisense transcription at well-established loci associated with Alzheimer’s disease, amyotrophic lateral sclerosis, frontotemporal dementia, Huntington’s disease and Parkinson’s disease. Almost all analyzed genes display some degree of antisense transcription, mainly in their 5’ or 3’ UTRs. 5’ head-to-head divergent antisense transcription appears enriched compared with the global distribution of sense/antisense pairs. The identified antisense transcripts may have coding or non-coding potential, with lncRNAs being more highly represented. Expressed transcripts are generally poorly annotated and may contain repetitive elements of the Alu, SINE and LINE families. Antisense transcription was validated for a subset of genes, including amyloid precursor protein, microtubule-associated protein tau, DJ-1, leucine-rich repeat kinase 2 and α-synuclein. The validated transcripts are predicted to have non-coding functions, and most of them were not annotated. Quantitative analysis of antisense transcripts in human tissues indicates enrichment in the brain, consistent with FANTOM5 data.
Overall, these results represent the most comprehensive analysis of antisense transcription at loci associated with neurodegeneration and provide evidence for additional regulation of disease-related genes by previously unannotated long non-coding RNAs.
Authors: Silvia Zucchelli, Paolo Vatta, Stefania Fedele, Raffaella Calligaris, XXXX (from F5 consortium), Al Forrest, Piero Carninci and Stefano Gustincich
Authors contribution statement: SZ designed the experiments, analyzed the data, wrote the manuscript; PV performed the bioinformatics analysis, prepared some figures; SF designed the experiments, performed the experiments and analyzed the data; RC provided reagents, designed the experiments and analyzed the experiments; SG analyzed the data, wrote the manuscript
Datasets used: Helicos CAGE on human brain and blood samples
Target journal(s): Genome Research, Plos Genetics, Human Molecular Genetics
Internal submission date: beginning of June
Contact by email: Stefano Gustincich, Silvia Zucchelli
Word document version of manuscript for editors: File:Zucchelli FANTOM5 satellite 2012 09 14.doc
PDF version for general viewing (including all figs in one PDF): File:Zucchelli Figures.pdf


Title: Higher order chromatin structure and promoter activity

ManuscriptID: Phase1_023
Status: Delayed -> moved to PHASE2
Abstract:
Authors: Semple CA, Prendergast JG, et al
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on ...
Target journal(s):
Internal submission date: October 2012
Contact by email: Colin Semple, James Prendergast
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Building context-dependent TSS regions from thousands of profiles

ManuscriptID: Phase1_024
Status: Unknown
Abstract: about DPI
Authors: Kawaji H, et al.
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Gateways to the transcriptional snapshots in thousands of mammalian biological states

ManuscriptID: Phase1_025
Status: Unknown
Abstract: overview and instruction to FANTOM5 resource
Authors: WP4
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze, smallRNA, RNA-seq
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Application of Semantic MediaWiki to snapshots of thousands of biological states in transcription

ManuscriptID: Phase1_026
Status: Unknown
Abstract: overview and instruction to the resource browser
Authors: Shimoji H, Kawaji H., WP4
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: Helicos CAGE on phase1 freeze
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing

ManuscriptID: Phase1_027
Status: Working draft
Abstract: CAGE (Cap Analysis of Gene Expression) and RNA-seq are two major technologies used for transcript quantification. These protocols measure expression from either the 5’ ends of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified (Illumina, SOLiD, 454, Ion Torrent) 2nd generation sequencing platforms typically employ PCR pre-amplification prior to clonal amplification, while 3rd generation single-molecule sequencers can sequence unamplified libraries. Although each of these protocols has been shown to be highly reproducible, no systematic comparison between them has been carried out. Here we compare CAGE using both 2nd and 3rd generation sequencers, and RNA-seq using a 2nd generation sequencer, on a panel of RNA mixtures from two human cell lines (THP-1 and HeLa; 100%, 50%, 20%, 10%, 5%, 1% and 0% HeLa RNA) to examine their power to discriminate biological states and to detect differentially expressed genes, the linearity of their measurements and the reproducibility of their quantification. Quantification by CAGE with the 2nd and 3rd generation sequencers (Illumina GA-IIx and HeliScope) was consistent at the gene level; however, we observed several differences, which can be explained by differences in their protocols and sequencing platforms. These include significant biases in the Illumina library, such as GC bias and over-estimation of transcripts harboring internal EcoP15I sites. A poorer correlation at the level of individual TSS positions, which is likely due to the higher indel rate of HeliScope, was also found. We found high consistency between HeliScopeCAGE and RNA-seq (Spearman correlation 0.88).
In most cases, differences between CAGE and RNA-seq are explained by the incompleteness of existing gene models, where the 5’ ends of gene models do not reflect the actual transcription start sites in the profiled cells, or where RNA polymerase reads through the polyadenylation site, resulting in fusion of neighboring genes.
Authors: WP3
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used:
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf
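The Phase1_027 abstract summarizes agreement between HeliScopeCAGE and RNA-seq as a Spearman correlation. As a minimal sketch of what that statistic measures, assuming two vectors of per-gene expression values from the two platforms, the Python code below ranks each vector (tied values receive the mean of their ranks) and takes the Pearson correlation of the ranks. The function names and toy values are hypothetical and do not come from the WP3 analysis; in practice an established implementation such as `scipy.stats.spearmanr` would normally be used.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks of xs; tied values all receive the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one (a tie).
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_a * var_b)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-gene values for five genes measured on two platforms.
    cage = [120.0, 5.0, 0.0, 33.0, 0.0]
    rnaseq = [95.0, 8.0, 1.0, 40.0, 0.0]
    print(round(spearman(cage, rnaseq), 3))
```

Because only ranks enter the calculation, the score is insensitive to monotone differences in scale between platforms, such as CAGE tag counts versus RNA-seq read counts per gene.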


Title: Identification of miRNA promoters and primary structures

ManuscriptID: Phase1_028
Status: Unknown
Abstract: ...
Authors: Kawaji H.
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: KAWAJI Hideya
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: CAGE profiling and epigenetic factors reveal distinct gene network features of regulatory T cells

ManuscriptID: Phase1_029
Status: Working draft
Abstract: Regulatory T cells (Tregs) play an essential role in maintaining immune homeostasis and can suppress excessive immune reactions. Although the transcription factor Foxp3 has been considered a lineage-determination factor for Tregs, its expression is insufficient for the development of Tregs. Moreover, it remains unclear how epigenetic conversion contributes to the establishment of the Treg lineage. To clarify the significance of Treg-specific epigenetic conversion in the development of Tregs, we analyzed the correlation between the genome-wide DNA methylation pattern and Treg-specific gene expression, using methylated DNA immunoprecipitation sequencing and transcription start site (TSS) cluster analysis. We found that thousands of non-annotated RNA transcripts (novel transcripts) were expressed in Tregs, and that Treg-specific hypomethylated regions were frequently detected in the gene body regions of Treg up-regulated genes. Moreover, Treg-specific DNA hypomethylated regions were highly correlated with TSS clusters showing Treg-dominant expression. By searching for transcription factor-binding consensus motifs in silico, we also found that the motifs frequently observed within the upstream regions of TSSs were mostly different from those within the Treg-specific DNA hypomethylated regions, suggesting that the transcription factors assembled on the hypomethylated regions might be different from those for transcriptional initiation. Collectively, these results indicate that Tregs exhibit specific features of gene regulation and specific epigenetic modifications. This epigenome information could facilitate our understanding of the developmental process of Tregs and provide a more suitable marker for defining the Treg cell lineage.
Authors: Hiromasa Morikawa1, Naganari Okura1, Alexis Vandenbon2, Daron M. Standley2, Shimon Sakaguchi1 (1 Experimental Immunology, Immunology Frontier Research Center, Osaka University 2 Systems Immunology, Immunology Frontier Research Center, Osaka University)
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: Hiromasa Morikawa
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: Automated clustering and quality control pipeline for CAGE technologies

ManuscriptID: Phase1_030
Status: Working draft
Abstract: To understand the manner and mechanisms of transcription initiation by RNA polymerase II, different strategies for genome-wide detection of transcription start sites (TSSs) have been developed. We propose a clustering and quality control pipeline suitable for Cap Analysis of Gene Expression (CAGE) sequence tags. The new framework uses parametric clustering at multiple scales and adopts the irreproducible discovery rate (IDR) to measure the reproducibility of each cluster between replicates. Our pipeline reveals that genes have complicated structures of transcription initiation events, and it discovers novel alternative promoters that were not detected by previous approaches.
Authors: Hiroko Ohmiya1, Morana Vitezic1, Martin Frith, Yoshihide Hayashizaki1, Timo Lassmann1 and many more
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used:
Target journal(s):
Internal submission date:
Contact by email: Timo Lassmann
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf



Title: Mogrify: Identifying reprogramming factors using network analysis

ManuscriptID: Phase1_31
Status: Working draft -> PHASE2?
Abstract: We now know that cellular state is plastic and can be controlled. There have been a number of reports in the literature of cells being converted from fully differentiated cell types to pluripotency, and also from one fully differentiated cell type to another. Each of these experiments was guided by trial and error to discover the transcription factors capable of inducing the reprogramming. This work presents a new network-based method, “Mogrify”, for identifying likely candidate transcription factors for cell reprogramming. We show that we are able to predict the known reprogramming factors for 3 successful transdifferentiations (heart, neuron and liver) and then provide evidence for further, as yet untested, transdifferentiations. The technique is then run without human intervention on every combination of cell types in the FANTOM 5 set, providing a list of likely reprogramming factors (with scores) for any transition, plus an evaluation of potentially viable transdifferentiations. This is what we refer to as the reprogramming landscape; although it is likely that not all of these sets would work, it is a first step towards a true map of cell-to-cell transitions.
Authors: Owen Rackham and Julian Gough
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks in all samples
Target journal(s):
Internal submission date:
Contact by email: Owen Rackham, Julian Gough
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:Mogrify.pdf


Title: Obesity-associated cancer is directly linked to fat cells by several adipokines

ManuscriptID: Phase1_32
Status: Good draft
Abstract: Obesity is associated with specific forms of cancer. This may involve increased protein signals (adipokines) from fat cells to the tumors associated with obesity. Using expression data from adipose tissue of obese and non-obese subjects and from a large panel of cancer cell lines and normal cells in FANTOM 5, we identified a receptor, LRP1, that was a target for three adipokines (SERPINE1, SERPINE3 and C3) and was expressed in all examined obesity-associated cancer cells. The expression of these adipokines was increased in obese adipose tissue, as were their adipose secretion and circulating levels. By investigating genes that were enriched in obesity-associated cancers but not in the corresponding tissues or primary cells, we found ceruloplasmin to be a strongly enriched gene, highly expressed in obesity-associated cancer and also significantly up-regulated in adipose tissue of obese subjects. Ceruloplasmin is the body’s main carrier of copper and is involved in angiogenesis. We demonstrated it to be an adipokine that contributes markedly (22%) to the total circulating level in obesity. In summary, we have identified several adipokines that can serve as endocrine or permissive signals from fat cells to tumors. These signals are increased in obesity and may thus represent a direct link between fat cells and the different forms of cancer that are associated with excess body fat.


Authors: Erik Arner, [other KI authors], [other OSC authors], Peter Arner
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): Cancer Research
Internal submission date:
Contact by email: Erik Arner
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: ZENBU

ManuscriptID: Phase1_33
Status: Working draft
Abstract: The world of genome sciences has changed dramatically over the last 5 years. With the advent of next-generation sequencers and RNA-expression sequencing, genome science is no longer the domain of a few elite centralized "genome centers" as in the early days of the field. The advance of next-generation sequencers has spurred an ever-growing body of tag-based data allowing the survey of chromatin states and transcriptome dynamics. Visualization of the expression levels of genomic regions has been achieved by displaying expression levels in various experimental conditions in dedicated tracks, allowing investigators to compare their dynamics directly. Novel file formats and browser designs have made it possible to deal efficiently with the depth of data produced by next-generation sequencer-based technologies. Researchers need to interact within global collaborations and need easy ways to process, share and visualize their data securely prior to publication. To this end we have developed the ZENBU system. ZENBU is a web-based system that combines a social networking platform for secure data upload and sharing with collaborators, a data processing system and a visualization system. ZENBU provides the infrastructure to work with hundreds of terabytes of sequence data in the form of BAM sequence alignment files and genome annotation formats such as BED and GFF, to efficiently cross-analyze these datasets using a Map-Reduce/autonomous-agent-based parallel processing system, and to provide fast, efficient web services for user interfaces. The user interfaces for ZENBU are based on Web 2.0 technologies in the form of a new expression-enhanced genome browser and data manipulation interfaces for data upload, data processing and data download. ZENBU currently contains the entire FANTOM 3/4/5 datasets, the entire ENCODE datasets and much of the UCSC genome annotation data.
ZENBU is planned to be a cornerstone in the expanding global network of scientific sharing web systems.


Authors: Jessica Severin*, Marina Lizio, Jayson Harshbarger, Hideya Kawaji, Carsten Daub, The FANTOM5 consortium, Yoshihide Hayashizaki, Nicolas Bertin*, Alistair Forrest*
Authors contribution statement: JMS, ML, JH, HK, CD, YH, NB, AL

  • JMS wrote the software and web services.
  • JMS and NB planned the study.
  • NB supervised the study.
  • JMS and NB contributed to writing the manuscript.
  • JMS and NB gave valuable input to the analysis in the manuscript.
  • JMS and NB critically reviewed the manuscript.
  • [addition of any other, clearer or more precise statement is very welcome]


Datasets used: phase1 CAGE peaks
Target journal(s): Nature Biotech/Genome Research
Internal submission date:
Contact by email: Jessica Severin, Nicolas Bertin, Alistair Forrest
Word document version of the most up to date manuscript draft: File:ZENBU manuscript.014 (1).docx
Word document version of manuscript for editors:
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf


Title: The enhancer and promoter landscape of regulatory and conventional T cell subpopulations

ManuscriptID: Phase1_34
Status: Almost finished manuscript
Abstract: CD4+CD25+FOXP3+ human regulatory T cells (Treg) are essential for self-tolerance and immune homeostasis. Here, we describe the promoterome of CD4+CD25highCD45RA+ naïve and CD4+CD25highCD45RA– memory Treg and their CD25– conventional T cell (Tconv) counterparts, both before and after in vitro expansion, using cap analysis of gene expression adapted to single-molecule sequencing (HeliscopeCAGE). We performed comprehensive comparative digital gene expression analyses and revealed new orphan transcription start sites, several of which were validated as alternative promoters of known genes, including FOXP3 and CTLA4. For all in vitro expanded subsets, we additionally generated genome-wide maps of poised and active enhancer elements marked by histone H3 lysine 4 monomethylation and histone H3 lysine 27 acetylation. Analysis of cell type-specific regulatory elements revealed specific enrichment of several transcription factor binding motifs. We validated promising candidates by chromatin immunoprecipitation coupled to next-generation sequencing and identified STAT5 and FOXP3 as global regulators of Treg-specific enhancers, and RUNX1 and ETS1 as global regulators of Tconv-specific enhancers. In summary, we provide a highly detailed and easily accessible resource of gene expression and gene regulation in Treg and Tconv subpopulations.
Authors: R
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s): Blood
Internal submission date:
Contact by email: Christian Schmidl, Michael Rehli
Word document version of manuscript for editors: File:121027 FANTOM Treg manuscript.docx
PDF version for general viewing (including all figs in one PDF): File:Schmidl Treg.pdf


Title: Systematic in vivo characterization of active enhancers across the human body

ManuscriptID: Phase1_35
Status: Almost finished manuscript
Abstract: In higher organisms, cellular development and diversity are tightly controlled by enhancers, which regulate the correct temporal and cell type-specific activation of gene expression. Despite their obvious importance for development and disease, the exact locations, target genes and mechanisms of enhancers are still poorly defined. Thus, there is an urgent need not only to identify enhancer locations, but also to elucidate their specific usage across the wide diversity of cells within the human body, their impact on regulation in healthy and diseased individuals, and how enhancers interact with target genes. Here, we use the FANTOM5 panel of tissue and primary cell samples, covering the majority of human tissues and cell types, to define an atlas of active, in vivo bidirectionally transcribed enhancers across the human body. This atlas enables comparison of regulatory programs between different cells and tissues at unprecedented depth, and makes it possible to define distinct subsets of enhancers, including fetal-specific, cell-specific and ubiquitous enhancers, the latter being a novel enhancer subtype with distinct properties. We show that known target genes of enhancers can be recaptured using expression correlations, and we predict many novel enhancer-TSS associations. We present models confirming the utility of multiple redundant enhancers, which explain TSS expression strength rather than expression patterns. We demonstrate that disease-associated functional single nucleotide polymorphisms are over-represented in enhancers and that such enhancers often have disease-relevant expression patterns. The human enhancer atlas can be accessed through an online database and is a unique resource for studies of tissue- and cell-specific enhancers and their gene interactions.
Authors: Robin Andersson1#, Claudia Gebhard2#, Irene Miguel-Escalada3, Ilka Hoof1, Xiaobei Zhao1, Christian Schmidl2, Eivind Valen1,4, Kang Li1, Lucia Schwarzfischer2, Dagmar Glatz2, Johanna Raithel2, Yun Chen1, Berit Lilje1, Nicolas Rapin1,5, Frederik Otzen Bagger1,5, Mette Jørgensen1, Mette Boyd1, Jette Bornholdt1, Kenneth Baillie6, Chris Mungall7, Timo Lassmann8, Hideya Kawaji8, Andreas Lennartsson9, Carsten Daub8,9, David Hume6, Peter Heutink10, Alistair Forrest8, Piero Carninci8, Yoshihide Hayashizaki8, Ferenc Müller3, Michael Rehli2*, Albin Sandelin1*
Authors contribution statement: RA, IH, EV, KL, YC, BL, XZ, MJ, HK, TL, KB, CM, NR, FOB, MR, AS performed the computational analyses. TL, HK, CD, AF, PC, YH prepared, mapped and analyzed CAGE libraries. RA, CG, IH, EV, FM, PC, AF, AK, MB, JBL, AL, CD, DH, PH, MR, AS interpreted results. CG, CS, ME, MR performed the blood cell ChIP experiments, methylation assays and in vitro blood cell validations. IME, FM performed the zebrafish in vivo validations and interpretations. RA, CG, IH, FM, MR, AS wrote the paper.
Datasets used: phase1 CAGE peaks and raw CAGE mapped data from human, internal ChIP and other validation data
Target journal(s): To be decided
Internal submission date:
Contact by email: robin@binf.ku.dk, michael.rehli@klinik.uni-regensburg.de (Michael Rehli), albin@binf.ku.dk (Albin Sandelin)
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf File:Enhancerome full.pdf


Title: Transcriptome dynamics in the mesenchymal stem/stromal cells of high-grade-serous-ovarian cancer microenvironment.

ManuscriptID: Phase1_036
Status: Working draft
Abstract: Recent and accumulating evidence indicates that the cancer microenvironment is one of the most critical hallmarks of both cancer progression and metastasis. Mesenchymal Stem/Stromal Cells (MSCs) are the precursors of various cell types that compose both normal and cancer tissue microenvironments. We isolated MSCs from various High-Grade Serous Ovarian Carcinomas (HG-SOCs), demonstrated their normal genotype, and analyzed their transcriptome using deep-CAGE, comparing them with similarly derived MSCs from normal tissues and with the FANTOM5 sample dataset. The integrative analysis conducted against the extensive panel of primary cells and tissues of the FANTOM5 project allowed us to identify a cell type-specific transcriptional activity associated with the HG-SOC-MSCs. Hierarchical clustering shows that MSCs derived from HG-SOCs co-cluster with other MSCs while retaining distinct transcriptional features. Their transcriptional activity correlates very strongly with that of primary mesothelial cells, which represent the embryonic cellular origin of serous ovarian cancer. Most importantly, this analysis revealed a specific identity of HG-SOC-MSCs when compared with similarly derived MSCs from normal tissues such as bone marrow, heart and adipose tissue, reinforcing the idea that the environment organized by the transformed serous ovarian cancer cells could be responsible for establishing such transcriptional specificity in the resident/mobilized stromal precursor cells. By integrating the identified transcriptional signatures of the HG-SOC-MSCs with the gene expression matrices of the publicly available TCGA HG-SOC dataset, we were able to trace the HG-SOC-MSC signature in a fraction of the tumor samples.
Altogether, the reported analysis supports the hypothesis that HG-SOC-MSCs are bona fide representatives of the ovarian district, either tracing their specific mesothelial origin or highlighting their epigenetic conditioning by the HG-SOC environment.
Authors: Roberto Verardo, Silvano Piazza, RIKEN_OSC_members, Claudio Schneider
Authors contribution statement: ED conceived the project, developed part of the software, oversaw implementation, performed part of the analysis and wrote most of the manuscript; SP developed and implemented part of the software, carried out statistical tests, interpreted results and wrote parts of the manuscript; YC implemented part of the software and prepared some figures; MZ developed and implemented part of the software; AP developed part of the software and contributed to the manuscript writing; CS supervised the study.
Datasets used: Helicos CAGE on all of F5freeze1
Target journal(s):
Internal submission date: June 1st 2012
Contact by email: Claudio Schneider
Word document version of manuscript for editors: File:Xxx claudio.doc
PDF version for general viewing (including all figs in one PDF): File:Claudio.pdf



Manuscript template

NOTE: Make a copy of the format below, paste it above and then edit with your details


Title:COPY THEN EDIT THIS TEMPLATE

ManuscriptID: Phase1_00x (INCREMENT THIS)
Abstract: ...
Authors: R
Authors contribution statement: MR did ..., TO did ..., KE did ..., EA did ..., AL did ...
Datasets used: phase1 CAGE peaks
Target journal(s):
Internal submission date:
Contact by email: CHANGETHIScorresponding1 CHANGETHIScorresponding2
Word document version of manuscript for editors: File:XXXYOUR.doc
PDF version for general viewing (including all figs in one PDF): File:XXXYOUR.pdf