Questions on cutoff setting of FPKM value & know genes filtering in Cuffmerge result

zhlingl

Junior Member

Join Date: Mar 2011

Posts: 1
- Share
- Tweet
#1

Questions on cutoff setting of FPKM value & know genes filtering in Cuffmerge result

04-05-2012, 04:31 AM

Hello~

I am running the Tuxedo protocol and trying to discover novel transcripts from RNA-seq data of several samples of mouse. As said in the protocol, I mapped the reads for each sample to the reference genome using Tophat (with -G parameter specified to guide the mapping process), and then assembled transcripts for each sample using Cufflinks (with -b and -u parameters specified to enable bias correction).

After that I ran Cuffmerge on all sample assembies to create a merged transcriptome (with -g and -s parameters specified). I would like to set a cutoff on the FPKM value to filter low expression transcripts(or background noise) for further investigation, but the FPKM values in the Cuffmerge output "transcripts.gtf" file seem to range from 0 to 1, even though the corresponding FPKM values in each separate sample assembly (the Cufflinks output "transcript.gtf" file) may present at the level of hundreds or even thousands. Did Cuffmerge go through some kind of normalization? Information on cuffmerge output in Cufflinks official website is very limited:

cuffmerge Output

cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies.

<outprefix>/merged.gtf

So if I would stick to my plan and run the cuffmerge result through the FPKM filter, what value would be a appropriate threshold? Or should I apply the filter on each sample assembly (which will lead to another question that whether to keep or to leave out a transcript that is high expressed in one sample and low expressed in another)? Or should I use the combined.gtf from cuffcompare output instead?

Another thing is puzzling me is that if I want to filer out known genes(those annotated in UCSC,for example), can I feed the transcripts.gtf file previously built by cuffmerge and a GTF file that contain information on these genes to cuffcompare, and simply cross out transcripts marked with "class code" =, c, j, e in the resulting <outprefix>.tracking file(or otherwise keep those with "class code" u) ?

Class Codes

If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
Priority Code Description
1 = Complete match of intron chain
2 c Contained
3 j Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4 e Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
5 i A transfrag falling entirely within a reference intron
6 o Generic exonic overlap with a reference transcript
7 p Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8 r Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
9 u Unknown, intergenic transcript
10 x Exonic overlap with reference on the opposite strand
11 s An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
12 . (.tracking file only, indicates multiple classifications)

Any suggestion would be greatly appreciated~
Tags: cuffcompare, cufflinks, cuffmerge, tophat

Previous template Next

Recent Advances in Sequencing Analysis Tools

by seqadmin

The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
- Channel: Articles
Today, 07:48 AM
Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM

Topics	Statistics	Last Post
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, Today, 07:17 AM	0 responses 11 views 0 likes	Last Post by seqadmin Today, 07:17 AM
A Close Examination at Probiotic-Related Bacteremia by seqadmin Started by seqadmin, 05-02-2024, 08:06 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-02-2024, 08:06 AM
Expanded Genetic Insights into Blood Pressure Regulation by seqadmin Started by seqadmin, 04-30-2024, 12:17 PM	0 responses 20 views 0 likes	Last Post by seqadmin 04-30-2024, 12:17 PM
The Role of Enhancers in Defining Cell Fate by seqadmin Started by seqadmin, 04-29-2024, 10:49 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-29-2024, 10:49 AM

Seqanswers Leaderboard Ad

Announcement

Questions on cutoff setting of FPKM value & know genes filtering in Cuffmerge result

Latest Articles

ad_right_rmr

News