I am working with human Illumina paired-end RNA-Seq data. The purpose of my
analysis is to get isoform-level expression, not SNP calling.
When I used FastQC (0.94) to examine my RNA-Seq data, I found a very high
duplication level: FastQC reports about 70% duplication. So I tried to use
Picard (1.50) to remove the duplicate reads.
The command is:
java -Xmx4g -jar ~/bin/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true
INPUT=accepted_hits.bam OUTPUT=remove_accepted_hits.bam
METRICS_FILE=dup.txt
After running Picard, I used FastQC to check again. The result is better, but
there is still a high duplication level (63% duplication). Does this mean
Picard does not work well, or that the FastQC report has a problem?
I looked at the output from Picard.
In the METRICS_FILE, the PERCENT_DUPLICATION is 0.312927,
but FastQC gives a duplication level of 70%.
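In case it helps to compare the two numbers on the same scale: as far as I understand, Picard writes PERCENT_DUPLICATION as a fraction, so 0.312927 would be about 31.3%. Below is a minimal Python sketch of how that value could be read from dup.txt. It assumes the usual Picard metrics layout (comment lines starting with "#", one tab-separated header row, one row per library, then a blank line before the histogram). The function name and the trimmed sample text are only for illustration; a real metrics file has more columns.

```python
import io


def read_percent_duplication(handle):
    """Return {library: PERCENT_DUPLICATION} from a Picard metrics file.

    Assumes the usual layout: '#' comment lines, a tab-separated header
    row, one data row per library, then a blank line before the histogram.
    """
    header = None
    result = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            if header is not None:
                break  # end of the metrics table, histogram follows
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment line is the column header
            continue
        row = dict(zip(header, fields))
        result[row["LIBRARY"]] = float(row["PERCENT_DUPLICATION"])
    return result


# Trimmed, illustrative sample (not my real file, only two columns kept).
sample = (
    "## METRICS CLASS\texample\n"
    "LIBRARY\tPERCENT_DUPLICATION\n"
    "Unknown Library\t0.312927\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Double\n"
    "BIN\tVALUE\n"
)

for library, fraction in read_percent_duplication(io.StringIO(sample)).items():
    # Picard reports a fraction; multiply by 100 to compare with FastQC.
    print(f"{library}: {fraction * 100:.1f}%")
```

For the real file, the same function could be called on open("dup.txt") instead of the sample text.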
Why is there this difference?
Thanks.