Extend rat gene models with CAGEscan
Revision as of 21:52, 23 March 2011
Background
- Some rat gene models miss the real 5′ ends.
- HelicosCAGE and CAGEscan libraries are available from a “universal” rat RNA preparation.
Data
Sample preparation and sequencing
10009-101B8 is the same RNA as used for CNhs10614, the ‘Universal RNA - Rat Normal Tissues’ HelicosCAGE library.
- NCig10012: 2 × 54 bp CAGEscan library, 6,903,269 reads. Index sequence GCTCAG.
- NCig10071: 2 × 36 bp experimental CAGEscan, 2,893,6176 reads. Index sequences ACAGATGCTATA, ATCGTGGCTATA, CACGATGCTATA, CACTGAGCTATA, CTGACGGCTATA, GAGTGAGCTATA, GTATACGCTATA, TCGAGCGCTATA.
- NChi10001: 2 × 51 bp CAGEscan library (HiSeq test run), 9,662,576 reads. Index sequence GCTCAG.
Bzip2-compressed FASTQ files are available at <https://fantom5-collaboration.gsc.riken.jp/webdav/home/plessy/FASTQ/>. See CAGEscan_mapping_protocol for what to trim from the reads before aligning.
Naming scheme: name_lane_direction.fq.bz2. The sequencer lane is indicated but should not matter. Direction 1 is 5′ and direction 2 is 3′.
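The naming scheme can be split into its three fields with a short helper. This is a minimal sketch: the function name `parse_fastq_name`, the `FastqName` type and the example filename are hypothetical, and it assumes that the lane field contains no underscore and that the direction is always 1 or 2.

```python
import re
from typing import NamedTuple


class FastqName(NamedTuple):
    name: str       # library name (may itself contain underscores)
    lane: str       # sequencer lane, kept as text because its format is not specified
    direction: int  # 1 = 5′, 2 = 3′


# Assumption: the last two underscore-separated fields are lane and direction.
_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lane>[^_]+)_(?P<direction>[12])\.fq\.bz2$"
)


def parse_fastq_name(filename: str) -> FastqName:
    """Split 'name_lane_direction.fq.bz2' into its three fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a name_lane_direction.fq.bz2 filename: {filename!r}")
    return FastqName(match["name"], match["lane"], int(match["direction"]))


# Hypothetical filename, for illustration only.
example = parse_fastq_name("NCig10012_1_2.fq.bz2")
```

Grouping files by `name` while ignoring `lane` is then enough to pool the lanes of one library.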
Mapping
- pending...
- Should Copenhagen align the reads as well?
Goals
Contribute experimental evidence that extends and updates the rat gene models.
Key questions:
- How many rat “RefSeq” TSSs are correct?
- How many rat “RefSeq” TSSs are shifted, either 5′ or 3′?
- What are the functions of the shifted genes? Are they ontologically enriched compared to the correctly annotated ones?
- How many new genes with support(x)?
- Investigate high-confidence TSSs (evidence from both CAGE methods)
- Alternative promoters
- Other
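The first two questions amount to a strand-aware comparison between each annotated TSS and the CAGE-supported one. The sketch below is one possible way to frame it, not a settled definition: the function name `classify_tss`, the category labels and the `tolerance` parameter are assumptions, and the tolerance value itself is still an open question for the project.

```python
def classify_tss(annotated: int, observed: int, strand: str, tolerance: int = 0) -> str:
    """Classify where the CAGE-supported TSS lies relative to the annotated one.

    Coordinates are genomic positions on the same chromosome.
    'tolerance' is a hypothetical maximum distance (in bp) still counted as correct.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Signed offset in transcript orientation:
    # negative = observed TSS is upstream (5′) of the annotation.
    offset = observed - annotated if strand == "+" else annotated - observed
    if abs(offset) <= tolerance:
        return "correct"
    return "shifted_5prime" if offset < 0 else "shifted_3prime"


# Illustrative coordinates only, not taken from the data.
label = classify_tss(annotated=1000, observed=950, strand="+", tolerance=10)
```

Counting the labels over all annotated TSSs would give the numbers asked for above, and the gene lists per label could feed the ontology comparison.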
Project outline
- Set up a scripting repository
- Get an overview of data
- (Mapping), processing, filtering
- How to deal with triplicates. Variance filtering?
- Tag clustering
- Aggregation of existing gene models.
- Define support(x)
- Integration of existing rat RNA-seq data from SRA/GEO?
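For the triplicate question, one candidate for "variance filtering" is a cutoff on the coefficient of variation of the per-replicate tag counts. This is only a sketch of that one option: the function names and the `max_cv` threshold are hypothetical, and it assumes one count per replicate for each tag cluster.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(counts: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (infinite if the mean is 0)."""
    average = mean(counts)
    if average == 0:
        return float("inf")
    return stdev(counts) / average


def passes_variance_filter(counts: Sequence[float], max_cv: float = 0.5) -> bool:
    """Keep a cluster only if its replicate counts are consistent enough.

    'max_cv' is an arbitrary placeholder, not a recommended value.
    """
    return coefficient_of_variation(counts) <= max_cv


# Illustrative counts only.
consistent = passes_variance_filter([10, 12, 11])
inconsistent = passes_variance_filter([0, 0, 30])
```

Raw counts would probably need normalising by library size first, since the replicates may differ in sequencing depth.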
Definitions
- Correct annotation
- Alternative promoter
- Criteria for belonging to a known promoter: what is the maximum shift allowed? Must the second CAGEscan tag fall within the known gene?
- Filter: seen with both CAGEscan and HelicosCAGE (hCAGE)
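The "seen with both" filter could be expressed as an overlap test between tag clusters from the two methods. The following is a minimal sketch under stated assumptions: the `Cluster` type, the half-open coordinate convention and the function names are hypothetical, and "seen with both" is read here as any same-strand overlap of at least one base.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Cluster, b: Cluster) -> bool:
    """True if both clusters share at least one base on the same chromosome and strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def seen_with_both(cagescan: List[Cluster], hcage: List[Cluster]) -> List[Cluster]:
    """Return the CAGEscan clusters that overlap at least one HelicosCAGE cluster."""
    return [c for c in cagescan if any(overlaps(c, h) for h in hcage)]


# Illustrative clusters only.
cagescan_clusters = [Cluster("chr1", "+", 100, 120), Cluster("chr1", "-", 100, 120)]
hcage_clusters = [Cluster("chr1", "+", 110, 130)]
supported = seen_with_both(cagescan_clusters, hcage_clusters)
```

The pairwise comparison is quadratic; for genome-wide cluster sets a sorted sweep or an interval tree would be more practical.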