Hello there,
I obtained PE data from Illumina (chr21 subset of NA18507 - ftp://webdata:[email protected]..._100_chr21.bam).
After a lot of misery and systematic LENIENT use of picard I could manage to use the data to extract fastQ paired reads from it and remap them to another reference build.
BUT
I discovered (among many other serious SAM compliancy problems) that not all reads are present in that file and many pairs have lost one end (which probably did not map on chr21 and was filtered out uncleanly!)
My question is how to clean such SAM/BAM where FLAGS indicate paired reads but one read of the pair is not present anymore.
Below is the head of the original name-sorted reads showing the absence of one of the 'EAS51_0210:7:33:5109:13959' reads (many 1000's like that)
Thanks for your suggestions on which command to use and how to eliminate these reads to obtain a fully paired file on which bam2fastq will run smoothly. Preferentially using picard and not with some fancy perl code keeping only '*' in the 7th column .
Thanks a lot for your lights,
Stephane
I obtained PE data from Illumina (chr21 subset of NA18507 - ftp://webdata:[email protected]..._100_chr21.bam).
After a lot of misery and systematic LENIENT use of picard I could manage to use the data to extract fastQ paired reads from it and remap them to another reference build.
BUT
I discovered (among many other serious SAM compliancy problems) that not all reads are present in that file and many pairs have lost one end (which probably did not map on chr21 and was filtered out uncleanly!)
My question is how to clean such SAM/BAM where FLAGS indicate paired reads but one read of the pair is not present anymore.
- I cannot use the SAM flags to do it because they are erroneous
- I could not fix the flags to reflect the true paired status
Below is the head of the original name-sorted reads showing the absence of one of the 'EAS51_0210:7:33:5109:13959' reads (many 1000's like that)
Thanks for your suggestions on which command to use and how to eliminate these reads to obtain a fully paired file on which bam2fastq will run smoothly. Preferentially using picard and not with some fancy perl code keeping only '*' in the 7th column .
Thanks a lot for your lights,
Stephane
Code:
EAS51_0210:3:6:3797:7459 165 chr21 9719702 255 * * 0 0 AACCTTTGTTTGGATGGAGCAGTTTGTAAACAATCCTTTTGTAGAATCTGCAAAGGTATATTTCTGAGCCCATTGAGGCCTATGGTGAAATACGAAATAT GGGGGGGGGGGGGFGGFEGGGGEGGGEGGGFDFBGGEFEFGEEGEGFEGGEGEEED?EEEGEEGBEBDGEEEEED=DCCCEBEEEEEEEAAC@DDB:CCC H0:i:0 H1:i:0 H2:i:2 SM:i:-1 AS:i:0 EAS51_0210:3:6:3797:7459 89 chr21 9719702 73 100M * 0 0 ATATTTGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATAAAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTC DDC@BEEEEEEGEEBFGEGG@EEDDBEEGEGGGGFFFFGEECGFGGEGGGGGGGGDFGGGFFGGGGFGFEGGGFGGGGGGBGGEGGFGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN34 SM:i:73 AS:i:0 EAS51_0210:7:33:5109:13959 145 chr21 9719707 254 100M chrY 10653706 0 TGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATAAAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTC FEGDGEGEEEEGFEGEEGEEDFEGGGGGGEFGFGFAFFFEGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T2 SM:i:461 AS:i:0 EAS25_0078:8:23:14907:11377 165 chr21 9719708 255 * * 0 0 CTTTTGTAGAATCTGCAAAGGTATATTTCTGAGCCCATTGAGGCCTATGGTGAAATACGAAATATCTTCCCATAAAAACTAGACAGAAGGTTTCTAAGAA GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGEFGGGGGGGGGEGGGGGGGGGGFGFGDGGGGGFDFBFGCEBFFFEGFF H0:i:1 H1:i:6 H2:i:60 SM:i:-1 AS:i:0 EAS25_0078:8:23:14907:11377 89 chr21 9719708 254 100M * 0 0 TGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATGAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCA EGEEEEGEBGFGGEEEEEGGDEBGFGGGGGGGGGGAGFGGGEGGGGGGGGGGGGGGGGGGGFGGGGGGGGEEGGGGGGGGGGGGGGGGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T3 SM:i:461 AS:i:0 EAS25_0078:3:4:1830:9254 101 chr21 9719709 255 * * 0 0 TTGAACCTTTGTTTGGATGGAGCAGTTTGTAAACAATCCTTTTGTAGAATCTGCAAAGGTATATTTTTGAGCCCATTGAGGCCTATGGTGAAATACGAAA GGGGGGGGGGGGGGGGGGFGBGGGGGGGFEGGGGEGGGGGGGGGEGGGGGGEFGGEFGGEFFFFF/&8?@EEECCFGFGGFDFGFEGF?DEEDEFEEFEE H0:i:0 H1:i:0 H2:i:3 SM:i:-1 AS:i:0 EAS25_0078:3:4:1830:9254 153 chr21 9719709 254 100M * 0 0 GGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATGAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCAC EEBE?EEEEEEEEGEEBEEEBGGGEGGGEFAFFECEEE=EDGGFGGGGDGFFGGGGGGGGGEEGGGGDGGGEGFGFGFGGGGGGGGGDGGGEGGFGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T4 SM:i:461 AS:i:0
Comment