Dear All,
I am working on RNA-seq data generated with the TruSeq Stranded HT kit.
I have removed the adapters and aligned the reads using STAR in 2-pass mode.
I then ran featureCounts (Subread 1.5.1) three times, with -s set to 0, 1 and 2. The most reads were assigned to features with -s 0, but RSeQC infers the library as reverse stranded.
Here are the commands I used:
featureCounts -t exon -s 1 -T 12 -g gene_id -a genes.gtf -o featurecounts.txt sample.starAligned.sortedByCoord.out.bam
sum of counts : 34133722
featureCounts -t exon -s 2 -T 12 -g gene_id -a genes.gtf -o featurecounts.txt sample.starAligned.sortedByCoord.out.bam
sum of counts : 34029447
featureCounts -t exon -s 0 -T 12 -g gene_id -a genes.gtf -o featurecounts.txt sample.starAligned.sortedByCoord.out.bam
sum of counts : 63157326
Am I using the parameters correctly? I obtained each total by summing the counts in the 7th column of the featureCounts output. Is this the right way to determine strandedness?
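A minimal sketch of how such a column-7 sum could be computed is given here for reference. It assumes the default featureCounts table layout for a single BAM file: one comment line starting with "#", one header line starting with "Geneid", and the counts in column 7. The function name sum_counts and the file name in the usage comment are hypothetical.

```shell
# Sum the per-gene counts in a featureCounts table.
# Assumption: default single-sample layout, i.e. a '#' comment line,
# a header row whose first field is "Geneid", and counts in column 7.
sum_counts() {
    awk -F'\t' '
        /^#/            { next }          # skip the program/command comment line
        $1 == "Geneid"  { next }          # skip the column header
                        { total += $7 }   # accumulate the count column
        END             { printf "%.0f\n", total + 0 }
    ' "$1"
}

# Usage (hypothetical file name):
# sum_counts featurecounts_s2.txt
```

Note that the three commands above all write to the same featurecounts.txt, so the sum has to be taken after each run, or each run given its own output name.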
A few observations:
FastQC reports a %GC of 49-55% and a few overrepresented sequences that show as "No Hit". Of the 95% of reads that were properly paired, only 70% were reported as uniquely mapped and 20-30% as multimapped.
Of the mapped reads, only 20-30% were assigned by featureCounts in -s 2 (reverse) mode.
Since our data come from a TruSeq stranded library, they should be reverse stranded.
For strand-specific data, we should not get a high number of counts with both -s 1 and -s 2.
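To put a number on that expectation, the two stranded totals reported above can be turned into a reverse-strand fraction. This is only a rough sketch: the function name reverse_fraction is hypothetical, and the inputs are the -s 1 and -s 2 sums from the runs listed earlier.

```shell
# Fraction of stranded counts that fall in the -s 2 (reverse) run.
# Arguments: total from the -s 1 run, total from the -s 2 run.
reverse_fraction() {
    awk -v fwd="$1" -v rev="$2" 'BEGIN {
        total = fwd + rev
        if (total == 0) { print "NA" }            # avoid division by zero
        else            { printf "%.3f\n", rev / total }
    }'
}

# Totals from the -s 1 and -s 2 runs above.
reverse_fraction 34133722 34029447   # prints 0.499
```

A clearly reverse-stranded result would be expected to sit close to 1 rather than near 0.5, which is the discrepancy being asked about.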
Could you please suggest how I can resolve this issue?
Thanks in advance,
Fazulur Rehaman