Hello,
can someone point me to comprehensive documentation of what all the values in this file mean (or answer my questions here)?
RUM feature quantification output (see the example below):
1) The transcript Ucount is the number of all uniquely mapped reads on this transcript, and the sum of the exon Ucounts is higher because some reads map to two exons (i.e. they are split) and are counted for each of them. True?
2) Reads that map to introns are also counted for the transcript. True?
3) Reads that map to several transcripts of one gene are counted for each of those transcripts, which is why the numbers for the isoforms are similar (160 and 149 in the example below). True?
4) How exactly are the min and max values calculated? From the Ucount and NUcount values (of exons and introns?), following some statistical rationale (e.g. the mapping quality of multiply mapped reads)? If so, they would be influenced by the maximum number of reported mappings per read (100 by default).
5) Would it be possible to get a robust count for each transcript, like the one htseq-count gives, and for each gene (as the sum over all its transcripts) rather than only for each transcript?
6) Why does the script <featurequant2geneprofiles.pl> always give me the max values, and not the Ucount numbers for the transcripts, when I use -cnt and -sformat?
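A quick way to test the guess in question 4 against the example below: if min and max were simply length-normalized counts with one shared scaling factor k (RPKM-like), then min × Length / Ucount and max × Length / (Ucount + NUcount) should give the same k for every feature with non-zero counts. This is my own check under that assumption, not something taken from the RUM documentation; the rows are copied from the example and the function name is made up.

```python
# Rows copied from the example: (feature, Ucount, NUcount, Length, min, max).
# Features with Ucount == 0 are left out to avoid dividing by zero.
ROWS = [
    ("ENST00000461688 transcript", 160, 8, 870, 2.3635, 2.4816),
    ("ENST00000461688 exon 1", 64, 0, 240, 3.427, 3.427),
    ("ENST00000461688 intron 1", 19, 0, 357, 0.6839, 0.6839),
    ("ENST00000461688 intron 2", 30, 0, 290, 1.3294, 1.3294),
    ("ENST00000461688 exon 4", 61, 8, 122, 6.4257, 7.2685),
    ("ENST00000461688 exon 6", 31, 8, 25, 15.9359, 20.0484),
    ("ENST00000368601 transcript", 149, 8, 950, 2.0156, 2.1238),
    ("ENST00000368601 exon 5", 61, 8, 329, 2.3828, 2.6953),
]


def implied_constants(rows):
    """Return (feature, k_min, k_max) for each row.

    Assumed (unconfirmed) model: min = k * Ucount / Length and
    max = k * (Ucount + NUcount) / Length, with one k for the whole file.
    """
    result = []
    for name, ucount, nucount, length, vmin, vmax in rows:
        k_min = vmin * length / ucount
        k_max = vmax * length / (ucount + nucount)
        result.append((name, k_min, k_max))
    return result


if __name__ == "__main__":
    for name, k_min, k_max in implied_constants(ROWS):
        print(f"{name:28s} k_min={k_min:.3f} k_max={k_max:.3f}")
```

With these rows every ratio lands at about 12.85, so min seems to use only the unique counts and max to add the non-unique ones. If k were 10^9 divided by a total read count, as in RPKM, that total would be roughly 78 million; I have not verified which total, if any, RUM normalizes by.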
CREB3L4_ENST00000461688(ensembl) 0
Type       Location               min      max      Ucount  NUcount  Length
transcript 1:153940401-153945244  2.3635   2.4816   160     8        870
exon 1     1:153940401-153940640  3.427    3.427    64      0        240
intron 1   1:153940641-153940997  0.6839   0.6839   19      0        357
exon 2     1:153940998-153941115  2.8317   2.8317   26      0        118
intron 2   1:153941116-153941405  1.3294   1.3294   30      0        290
exon 3     1:153941406-153941652  4.5266   4.5266   87      0        247
intron 3   1:153941653-153941809  0        0        0       0        157
exon 4     1:153941810-153941931  6.4257   7.2685   61      8        122
intron 4   1:153941932-153942111  0        0        0       0        180
exon 5     1:153942112-153942229  0        0        0       0        118
intron 5   1:153942230-153945219  0        0        0       0        2990
exon 6     1:153945220-153945244  15.9359  20.0484  31      8        25
--------------------------------------------------------------------
CREB3L4_ENST00000368601(ensembl) 0
Type       Location               min      max      Ucount  NUcount  Length
transcript 1:153940713-153945548  2.0156   2.1238   149     8        950
exon 1     1:153940713-153940786  0        0        0       0        74
intron 1   1:153940787-153940997  0        0        0       0        211
exon 2     1:153940998-153941175  3.0323   3.0323   42      0        178
intron 2   1:153941176-153941405  0        0        0       0        230
exon 3     1:153941406-153941652  4.5266   4.5266   87      0        247
intron 3   1:153941653-153941809  0        0        0       0        157
exon 4     1:153941810-153941931  6.4257   7.2685   61      8        122
intron 4   1:153941932-153945219  0        0        0       0        3288
exon 5     1:153945220-153945548  2.3828   2.6953   61      8        329
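The counting questions (1 to 3) can be checked on the example above with a small parser. This is my own ad hoc script for the block layout shown in this post (a name line, a Type header, then transcript/exon/intron rows); it is not part of RUM, and the function and field names are made up. On the example it reports that the transcript Length equals the sum of the exon lengths (870 and 950), so the transcript length seems to cover exons only; that the exon Ucounts add up to more than the transcript Ucount (269 vs 160 and 251 vs 149), which would fit the split-read explanation in question 1, although the script cannot tell which reads are split; and that the two isoforms share exactly two exons with identical counts.

```python
import sys
from pathlib import Path
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str      # "transcript", "exon" or "intron"
    location: str  # "chrom:start-end" as printed in the file
    vmin: float
    vmax: float
    ucount: int
    nucount: int
    length: int


def parse_feature_quant(text):
    """Map each transcript name line to its Feature rows (ad hoc parser)."""
    blocks, name = {}, None
    for raw in text.splitlines():
        tokens = raw.split()
        # Skip blank lines, column headers and dashed separators.
        if not tokens or tokens[0] == "Type" or set(tokens[0]) == {"-"}:
            continue
        if tokens[0] == "transcript":
            rest = tokens[1:]
        elif tokens[0] in ("exon", "intron"):
            rest = tokens[2:]  # drop the exon/intron number
        else:
            name = tokens[0]  # e.g. "CREB3L4_ENST00000461688(ensembl)"
            blocks[name] = []
            continue
        loc, vmin, vmax, u, nu, length = rest
        blocks[name].append(Feature(tokens[0], loc, float(vmin), float(vmax),
                                    int(u), int(nu), int(length)))
    return blocks


def summarize(features):
    """Compare the transcript row with sums over its exon and intron rows."""
    tx = next(f for f in features if f.kind == "transcript")
    exons = [f for f in features if f.kind == "exon"]
    return {
        "transcript_ucount": tx.ucount,
        "exon_ucount_sum": sum(f.ucount for f in exons),
        "intron_ucount_sum": sum(f.ucount for f in features if f.kind == "intron"),
        "transcript_length": tx.length,
        "exon_length_sum": sum(f.length for f in exons),
    }


def shared_exons(a, b):
    """Exons with identical coordinates in two isoforms -> (Ucount in a, in b)."""
    in_a = {f.location: f.ucount for f in a if f.kind == "exon"}
    return {f.location: (in_a[f.location], f.ucount)
            for f in b if f.kind == "exon" and f.location in in_a}


# The example from the post (whitespace-separated columns).
SAMPLE = """\
CREB3L4_ENST00000461688(ensembl) 0
Type Location min max Ucount NUcount Length
transcript 1:153940401-153945244 2.3635 2.4816 160 8 870
exon 1 1:153940401-153940640 3.427 3.427 64 0 240
intron 1 1:153940641-153940997 0.6839 0.6839 19 0 357
exon 2 1:153940998-153941115 2.8317 2.8317 26 0 118
intron 2 1:153941116-153941405 1.3294 1.3294 30 0 290
exon 3 1:153941406-153941652 4.5266 4.5266 87 0 247
intron 3 1:153941653-153941809 0 0 0 0 157
exon 4 1:153941810-153941931 6.4257 7.2685 61 8 122
intron 4 1:153941932-153942111 0 0 0 0 180
exon 5 1:153942112-153942229 0 0 0 0 118
intron 5 1:153942230-153945219 0 0 0 0 2990
exon 6 1:153945220-153945244 15.9359 20.0484 31 8 25
--------------------------------------------------------------------
CREB3L4_ENST00000368601(ensembl) 0
Type Location min max Ucount NUcount Length
transcript 1:153940713-153945548 2.0156 2.1238 149 8 950
exon 1 1:153940713-153940786 0 0 0 0 74
intron 1 1:153940787-153940997 0 0 0 0 211
exon 2 1:153940998-153941175 3.0323 3.0323 42 0 178
intron 2 1:153941176-153941405 0 0 0 0 230
exon 3 1:153941406-153941652 4.5266 4.5266 87 0 247
intron 3 1:153941653-153941809 0 0 0 0 157
exon 4 1:153941810-153941931 6.4257 7.2685 61 8 122
intron 4 1:153941932-153945219 0 0 0 0 3288
exon 5 1:153945220-153945548 2.3828 2.6953 61 8 329
"""

if __name__ == "__main__":
    text = Path(sys.argv[1]).read_text() if len(sys.argv) > 1 else SAMPLE
    blocks = parse_feature_quant(text)
    for name, features in blocks.items():
        print(name, summarize(features))
    if len(blocks) == 2:
        print("shared exons:", shared_exons(*blocks.values()))
```

To run it on a whole file, pass the path as the first argument (for example: python check_featurequant.py my_feature_quantifications.txt, where both names are placeholders). Note that summing the transcript Ucounts of the two isoforms (160 + 149) would count the reads on the shared exons twice, which is why a gene-level number as asked in question 5 cannot simply be the sum over transcripts.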