Hi guys!
I have a weird problem to solve.
I recently ran an RNA-seq pipeline on human cell line samples (polyA-enriched, 75 bp, paired-end, ~30 million reads each).
12 out of 16 samples showed great alignment rates and library QC.
The remaining 4, however, have a ridiculously low proportion of properly paired reads, like this one:
64544673 in total
0 QC failure
0 duplicates
64544673 mapped (100.00%)
64544673 paired in sequencing
33238851 read1
31305822 read2
14676 properly paired (0.02%)
57821112 with itself and mate mapped
6723561 singletons (10.42%)
54777510 with mate mapped to a different chr
44604000 with mate mapped to a different chr (mapQ>=5)
(from Sample_TK1_002_flagstat.txt)
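To make the odd parts of these numbers explicit, here is a minimal Python sketch that parses the flagstat text above and flags two things: read1 and read2 counts that differ, and a very low properly paired fraction. The function names and the 0.5 threshold are only illustrative assumptions, not part of samtools or of my pipeline.

```python
import re

# flagstat output copied from the post (Sample_TK1_002)
FLAGSTAT = """\
64544673 in total
0 QC failure
0 duplicates
64544673 mapped (100.00%)
64544673 paired in sequencing
33238851 read1
31305822 read2
14676 properly paired (0.02%)
57821112 with itself and mate mapped
6723561 singletons (10.42%)
54777510 with mate mapped to a different chr
44604000 with mate mapped to a different chr (mapQ>=5)
"""

def parse_flagstat(text):
    """Return {label: count}; a trailing '(xx.xx%)' is dropped from the label."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"^(\d+)\s+(.+?)(?:\s+\([\d.]+%\))?$", line.strip())
        if m:  # lines that do not start with a number are skipped
            counts[m.group(2)] = int(m.group(1))
    return counts

def check_pairing(counts, min_proper_fraction=0.5):
    """Return a list of warnings; the threshold is an arbitrary assumption."""
    warnings = []
    r1, r2 = counts["read1"], counts["read2"]
    if r1 != r2:
        # In an intact paired-end file every read1 should have a read2.
        warnings.append(f"read1/read2 imbalance: {r1} vs {r2} (diff {abs(r1 - r2)})")
    proper = counts["properly paired"] / counts["paired in sequencing"]
    if proper < min_proper_fraction:
        warnings.append(f"properly paired fraction is only {proper:.4%}")
    return warnings

if __name__ == "__main__":
    for w in check_pairing(parse_flagstat(FLAGSTAT)):
        print("WARNING:", w)
```

The read1/read2 imbalance is what stands out to me: the two counts should be equal if the pairs were intact when they went into the aligner.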
Initially I thought it was a trivial issue and proceeded with the differential expression analysis. Unfortunately, I found that these 4 samples cluster together even though they belong to 3 different cohorts.
Then I re-checked my workflow and realized that the Genome Center that performed the library prep and sequencing had to fund 2/3 sequencing for these samples, as they were giving a low yield of reads.
Another thing to add: when trying to run RNA-SeQC on GenePattern, I cannot run MarkDuplicates for these samples either. The error message is:
net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: Flowcell 1 Older_P3:BFC08P1:257:C38GUACXX:8:1108:18368:35984
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:294)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
at org.genepattern.sam.MarkDuplicatesWrapper.main(MarkDuplicatesWrapper.java:83)
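One possible reading of this exception (an assumption on my part, not something I have verified) is that the same read name appears more than once for the same end of a pair among the primary alignments. A minimal Python sketch to test that idea on SAM text is below; the function name is hypothetical, and it keeps every read name in memory, so for ~64 million records it would need a lot of RAM or a subsample.

```python
import sys
from collections import Counter

def duplicated_read_ends(sam_lines):
    """Return {(read_name, end): n} for primary records seen more than once.

    `end` is 1 for first-in-pair (flag 0x40), 2 for second-in-pair (0x80),
    and 0 if neither bit is set.
    """
    seen = Counter()
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete SAM record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue                    # skip secondary/supplementary records
        end = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        seen[(name, end)] += 1
    return {key: n for key, n in seen.items() if n > 1}

if __name__ == "__main__":
    # Reads SAM text from standard input.
    for (name, end), n in duplicated_read_ends(sys.stdin).items():
        print(f"{name}\tend={end}\tseen {n} times")
```

It could be fed with something like `samtools view sample.bam | python check_dups.py` (the file names are placeholders). If it reports the read named in the exception, that would point to duplicated read names rather than to the mate fields alone.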
I am sorry if all this sounds confusing... I am quite desperate to understand what's going on.
I know there is a samtools command for fixing mate pairing... would that be the solution?
Thanks to all of you!
Manu