cuffdiff with reference gtf

vschulz


10-18-2012, 05:35 AM

I have a question about running cuffdiff with only a reference GTF annotation file. The gene start/stop coordinates in the cuffdiff output do not always match the reference GTF that I provide. Here is an outline of what I am doing:

- map reads with tophat2 to the iGenomes UCSC mm9 FASTA sequence, using the iGenomes UCSC mm9 GTF file from ftp.illumina.com/Mus_musculus/UCSC/mm9/Mus_musculus_UCSC_mm9.tar.gz
- run cuffdiff 2.0.2 with the same iGenomes UCSC mm9 GTF file, using a command like:
$cuffdir/cuffdiff -p 4 --upper-quartile-norm --multi-read-correct -M $rRNAgtf --frag-bias-correct $genome -o $datadir/cuffdiffRefConVsPrimed3 -L Control,Primed $gtfFile \
$datadir/Control_7/tophat2_out/accepted_hitsSD.bam,$datadir/Control_11/tophat2_out/accepted_hitsSD.bam \
$datadir/Primed_4/tophat2_out/accepted_hitsSD.bam,$datadir/Primed_9/tophat2_out/accepted_hitsSD.bam

An example:
grep 0610005C13Rik $gtfFile
chr7 unknown exon 52823165 52823749 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52823165 52823749 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52826356 52826562 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52829783 52829892 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52829783 52829892 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52829978 52830147 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52829978 52830147 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52830497 52830546 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";
chr7 unknown exon 52830497 52830546 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";

grep 0610005C13Rik gene_exp.diff
0610005C13Rik 0610005C13Rik 0610005C13Rik chr7:52823164-52845080 Control Primed NOTEST 0 0 0 0 1 1 no

Note that one end, 52823164, is correct (off by one from 52823165, perhaps due to a coordinate convention?), but the other end is not: 52845080 vs. 52830546. Where did that end coordinate come from? It seems to come from the overlapping Bcat2 gene:
grep Bcat2 ../cuffdiffRefConVsPrimed2/gene_exp.diff
Bcat2 Bcat2 Bcat2 chr7:52823164-52845080 Control Primed OK 42.9202 96.0403 1.16198 -0.909377 0.363151 0.999861 no
Are the data for the two genes somehow separated out, so that I don't need to worry about the strange start/end points?
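
To make the comparison above concrete, here is a minimal Python sketch (not part of cuffdiff) that computes the exon-based span of a gene from GTF lines and compares it with the locus string reported in gene_exp.diff. The helper names gene_spans and parse_locus are hypothetical, chosen only for illustration; the sample exon lines and the locus string are copied from the example above. The sketch only quantifies the mismatch. It makes no claim about how cuffdiff derives the locus.

```python
import re

# Exon lines copied from the grep output above (first and last exons included).
GTF_LINES = [
    'chr7 unknown exon 52823165 52823749 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";',
    'chr7 unknown exon 52826356 52826562 . - . gene_id "0610005C13Rik"; transcript_id "NR_038166"; gene_name "0610005C13Rik"; tss_id "TSS24565";',
    'chr7 unknown exon 52829783 52829892 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";',
    'chr7 unknown exon 52830497 52830546 . - . gene_id "0610005C13Rik"; transcript_id "NR_038165"; gene_name "0610005C13Rik"; tss_id "TSS24565";',
]


def gene_spans(lines):
    """Return {(chrom, gene_id): (min exon start, max exon end)} in GTF (1-based) coordinates."""
    spans = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first 8 GTF fields contain no whitespace; the 9th is the attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], match.group(1))
        start, end = int(fields[3]), int(fields[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


def parse_locus(locus):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, coords = locus.split(":")
    start, end = coords.split("-")
    return chrom, int(start), int(end)


spans = gene_spans(GTF_LINES)
gtf_start, gtf_end = spans[("chr7", "0610005C13Rik")]

# Locus string copied from the gene_exp.diff line above.
chrom, locus_start, locus_end = parse_locus("chr7:52823164-52845080")

start_offset = gtf_start - locus_start  # how far the GTF start lies past the reported start
end_excess = locus_end - gtf_end        # how far the reported end lies past the last exon

print("GTF span:      %s:%d-%d" % (chrom, gtf_start, gtf_end))
print("Reported span: %s:%d-%d" % (chrom, locus_start, locus_end))
print("start offset: %d, end excess: %d" % (start_offset, end_excess))
```

For this gene the start differs by 1, which would be consistent with a 0-based start in the locus column (an assumption, not something confirmed here), while the reported end lies 14534 bp beyond the last annotated exon.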

Thanks,

Vince
Tags: cuffdiff gtf
