Dear colleagues,
I am building a pipeline to estimate gene abundance (expression) from RNA-seq data, and I am wondering whether my plan is reasonable:
a) Map reads with bowtie using -m 10 (for example), allowing up to 10 hits per read
a1) Here I don't understand how the MAPQ values will be set in the SAM output. My understanding is that when only single hits are allowed (-m 1), all MAPQ values will be 255.
b) Take only the reads left unmapped in a) and map them with tophat
b1) Again the same question: with -g 40 (default), what are the MAPQ values in the SAM result?
OK, now I have alignments and I also have a GTF file for my organism (my.gtf).
c) Join the alignments from steps a) and b) into one sorted SAM file (a_b.sam)
d) cufflinks -G my.gtf a_b.sam
* How will cufflinks take the MAPQ values from the SAM file into account and use them to "weight" the multiple hits (giving more weight to single hits, etc.)?
* Why is cufflinks always mentioned together with tophat and not with bowtie as well?
Thank you for your answers,
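To check questions a1) and b1) empirically, I could tally the MAPQ column of each SAM file myself. Below is a minimal sketch in Python, assuming plain-text, tab-separated SAM input where MAPQ is the fifth column and bit 0x4 of the FLAG marks an unmapped read. The function name mapq_histogram and the example records are my own placeholders, not part of bowtie, tophat or cufflinks.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count how often each MAPQ value occurs among mapped SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory columns)
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped, its MAPQ carries no information
        counts[int(fields[4])] += 1  # column 5 is MAPQ
    return counts


# Made-up records purely for illustration (not real aligner output).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for mapq, n in sorted(mapq_histogram(example).items()):
    print(mapq, n, sep="\t")
```

For a real file, passing an open file handle (for example open("a_b.sam")) instead of the example list should work the same way, since the function only iterates over lines.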
Gregor