
  • Questions on setting an FPKM cutoff & filtering known genes in Cuffmerge results

    Hello~

    I am running the Tuxedo protocol and trying to discover novel transcripts from RNA-seq data of several mouse samples. Following the protocol, I mapped the reads for each sample to the reference genome using TopHat (with the -G parameter specified to guide the mapping process), and then assembled transcripts for each sample using Cufflinks (with the -b and -u parameters specified to enable bias correction and multi-read correction).

    After that I ran Cuffmerge on all sample assemblies to create a merged transcriptome (with the -g and -s parameters specified). I would like to set a cutoff on the FPKM value to filter out low-expression transcripts (or background noise) before further investigation, but the FPKM values in the Cuffmerge output "transcripts.gtf" file seem to range from 0 to 1, even though the corresponding FPKM values in each separate sample assembly (the Cufflinks output "transcripts.gtf" file) can be in the hundreds or even thousands. Does Cuffmerge apply some kind of normalization? The information on cuffmerge output on the official Cufflinks website is very limited:

    cuffmerge Output

    cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies.

    <outprefix>/merged.gtf
    So if I stick to my plan and run the cuffmerge result through the FPKM filter, what value would be an appropriate threshold? Or should I apply the filter to each sample assembly (which leads to another question: whether to keep or drop a transcript that is highly expressed in one sample and lowly expressed in another)? Or should I use the combined.gtf from the cuffcompare output instead?
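    The per-sample option mentioned above can be sketched as follows. This is only an illustration, not part of the Tuxedo protocol: the function name `filter_gtf_by_fpkm`, the demo records, and the threshold of 1.0 are all placeholders, and it assumes each per-sample Cufflinks GTF carries an `FPKM "..."` attribute on its `transcript` records. Filtering each sample before merging means a transcript that is highly expressed in at least one sample still reaches the merge through that sample.

```python
import re

TID_RE = re.compile(r'transcript_id "([^"]+)"')
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def filter_gtf_by_fpkm(lines, min_fpkm):
    """Keep every GTF record whose transcript has FPKM >= min_fpkm.

    `lines` is an iterable of GTF lines from ONE sample assembly.
    The cutoff is a caller-supplied assumption, not a recommended value.
    """
    lines = list(lines)
    keep = set()
    # Pass 1: collect IDs of transcripts that pass the cutoff.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "transcript":
            tid = TID_RE.search(fields[8])
            fpkm = FPKM_RE.search(fields[8])
            if tid and fpkm and float(fpkm.group(1)) >= min_fpkm:
                keep.add(tid.group(1))
    # Pass 2: keep transcript and exon records of the passing transcripts.
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            tid = TID_RE.search(fields[8])
            if tid and tid.group(1) in keep:
                kept.append(line)
    return kept


# Made-up records for illustration only.
DEMO = [
    'chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "250.5";\n',
    'chr1\tCufflinks\texon\t100\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "250.5";\n',
    'chr1\tCufflinks\ttranscript\t900\t1200\t1\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "0.3";\n',
    'chr1\tCufflinks\texon\t900\t1200\t1\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "0.3";\n',
]

if __name__ == "__main__":
    for record in filter_gtf_by_fpkm(DEMO, min_fpkm=1.0):
        print(record, end="")
```

    The filtered per-sample files would then be listed in the assembly list given to cuffmerge in place of the unfiltered ones.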


    Another thing puzzling me: if I want to filter out known genes (those annotated in UCSC, for example), can I feed the transcripts.gtf file previously built by cuffmerge, together with a GTF file containing information on these genes, to cuffcompare, and simply cross out transcripts marked with class code =, c, j, or e in the resulting <outprefix>.tracking file (or, alternatively, keep only those with class code u)?

    Class Codes

    If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
    Priority  Code  Description
    1         =     Complete match of intron chain
    2         c     Contained
    3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
    4         e     Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
    5         i     A transfrag falling entirely within a reference intron
    6         o     Generic exonic overlap with a reference transcript
    7         p     Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
    8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
    9         u     Unknown, intergenic transcript
    10        x     Exonic overlap with reference on the opposite strand
    11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
    12        .     (.tracking file only, indicates multiple classifications)
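    Selecting transfrags by class code, as described in the question, could look roughly like the sketch below. It assumes the .tracking file is tab-delimited with the transfrag ID in column 1, the locus ID in column 2, the reference ID in column 3, and the class code in column 4; please verify that layout against your own file. The function name `select_by_class_code` and the demo rows are made up for illustration.

```python
def select_by_class_code(tracking_lines, keep_codes=frozenset({"u"})):
    """Return transfrag IDs whose class code is in `keep_codes`.

    Assumed column layout (check your own .tracking file):
    transfrag ID, locus ID, reference ID, class code, per-sample columns.
    """
    selected = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        transfrag_id, class_code = fields[0], fields[3]
        if class_code in keep_codes:
            selected.append(transfrag_id)
    return selected


# Made-up rows for illustration only.
DEMO_TRACKING = [
    "TCONS_00000001\tXLOC_000001\tgeneA|txA\t=\tq1:CUFF.1|CUFF.1.1\n",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:CUFF.2|CUFF.2.1\n",
    "TCONS_00000003\tXLOC_000003\tgeneB|txB\tj\tq1:CUFF.3|CUFF.3.1\n",
]

if __name__ == "__main__":
    # Keep only intergenic transfrags (class code u).
    print(select_by_class_code(DEMO_TRACKING))
```

    Keeping only `u` is the stricter of the two approaches in the question. Per the table above, `j` marks potentially novel isoforms, so whether to exclude it depends on whether new isoforms of known genes count as novel for your purposes.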
    Any suggestions would be greatly appreciated~
