Hi everyone,
I have been trying to wrap my head around the following problems. Apologies if someone has already posted these exact questions.
a) Does Cuffdiff 2 handle a .bam file that contains a mixture of singletons and paired-end fragments?
I am not sure whether it does. It certainly does not throw an error.
[I ask because the manual talks about fragments and how they are used first to assemble isoforms and then to estimate the abundance of the most likely isoforms, meaning the isoforms from which the reads/fragments were most likely generated.]
b) How does Cuffdiff calculate the raw fragment counts, the internally scaled counts, and the externally scaled counts reported in files such as genes.count_tracking, isoforms.count_tracking, genes.read_group_tracking, and isoforms.read_group_tracking? I am looking for the actual calculation (data, numbers, and formulas) rather than a statistical explanation.
c) My experiment has 2 replicates for control and 2 for the experimental condition. I used Cuffdiff to find the differentially expressed genes.
I ran Cuffdiff with -c 10 (--min-alignment-count).
For a single-exon gene with no alternative isoforms, I get NOTEST even though, by my own count, there are clearly enough reads in both experimental replicates (q1_exp=28, q2_exp=20).
However, genes.count_tracking shows 8.9382, and the raw fragment counts are q1_exp=15 and q2_exp=2. The scaled counts are:
          internal_scaled_frags   external_scaled_frags
q1_exp    15.8613                 15.8613
q2_exp    2.01515                 2.01515
This is a small genome, and around 95% of the genes in the GTF file have no alternative isoforms, so I don't think any isoform switching is going on here.
I just want to know where these numbers come from, because the raw counts differ from my own read counts, which I obtained as follows:
samtools view *.bam scaffold_3:236757-237675 |cut -f1|sort|uniq -c |wc -l
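To make the comparison with Cuffdiff's per-replicate numbers easier, the same count could also be done one BAM file at a time. This is only a minimal sketch, assuming one BAM file per replicate: samtools view takes a single input file, so any additional files matched by *.bam would be interpreted as region names. The helper name count_fragments is hypothetical, and the count is of distinct read names, so both mates of a pair count once.

```shell
# Count distinct read names (one per fragment, since both mates of a
# pair share a name) in SAM records read from stdin.
# count_fragments is a hypothetical helper name, not a samtools command.
count_fragments() {
    cut -f1 | sort -u | wc -l | tr -d ' '
}

# Assumed usage, one BAM file per replicate (requires samtools and an
# indexed BAM; left commented so the helper can be sourced on its own):
# for bam in *.bam; do
#     printf '%s\t' "$bam"
#     samtools view "$bam" scaffold_3:236757-237675 | count_fragments
# done
```

Reads that only partially overlap the region, multi-mapped reads, and singletons are all included in this count, so it need not match the way Cuffdiff assigns fragments.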
I hope this makes sense. Please help.
I would be very grateful for any assistance.