Hello all,
I am working with a bisulfite-converted library and I would like to extract the original genomic sequences for my reads (in other words, I would like to convert the bisulfite sequences back to regular genomic sequences). I have been using Bismark to align my reads to a reference genome. Bismark outputs a BAM file with the following fields (copied from the user guide):
1. QNAME*(seq-ID)
2. FLAG*(this flag tries to take the strand a bisulfite read originated from into account (this is different from ordinary DNA alignment flags!))
3. RNAME*(chromosome)
4. POS*(start position)
5. MAPQ*(only calculated for Bowtie 2, always 255 for Bowtie)
6. CIGAR
7. RNEXT
8. PNEXT
9. TLEN
10. SEQ
11. QUAL*(Phred33 scale)
12. NM-tag*(edit distance to the reference)
13. MD-tag*(base-by-base mismatches to the reference)
14. XM-tag (methylation call string)
15. XR-tag*(read conversion state for the alignment)
16. XG-tag (genome conversion state for the alignment)
Field 10 is the actual read, i.e., the bisulfite read. Field 14 specifies the methylation call at each position, so my understanding is that those fields could be used together to infer the original sequence. What I am looking for is either:
1. A way to have Bismark output the original genomic sequence (the older version actually did this - see note below),
2. A script that will use the BAM output to convert the bisulfite sequences to genomic sequences, or
3. Another bisulfite aligner that offers this functionality.
Note: The original version of Bismark actually outputs a tab-delimited text file that contains the information I want - field 7 is the "original bisulfite read sequence" and field 8 is the "equivalent genomic sequence." Bismark allows users to request this output using the --vanilla call, however, it uses the older version of Bismark, which is only compatible with bowtie1. I am getting much better alignments with the newer version which uses bowtie2 and outputs BAM files that do not contain the genomic sequence, so I would prefer not to use the --vanilla call.
Any help would be greatly appreciated.
Thanks,
Tricia
I am working with a bisulfite-converted library and I would like to extract the original genomic sequences for my reads (in other words, I would like to convert the bisulfite sequences back to regular genomic sequences). I have been using Bismark to align my reads to a reference genome. Bismark outputs a BAM file with the following fields (copied from the user guide):
1. QNAME*(seq-ID)
2. FLAG*(this flag tries to take the strand a bisulfite read originated from into account (this is different from ordinary DNA alignment flags!))
3. RNAME*(chromosome)
4. POS*(start position)
5. MAPQ*(only calculated for Bowtie 2, always 255 for Bowtie)
6. CIGAR
7. RNEXT
8. PNEXT
9. TLEN
10. SEQ
11. QUAL*(Phred33 scale)
12. NM-tag*(edit distance to the reference)
13. MD-tag*(base-by-base mismatches to the reference)
14. XM-tag (methylation call string)
15. XR-tag*(read conversion state for the alignment)
16. XG-tag (genome conversion state for the alignment)
Field 10 is the actual read, i.e., the bisulfite read. Field 14 specifies the methylation call at each position, so my understanding is that those fields could be used together to infer the original sequence. What I am looking for is either:
1. A way to have Bismark output the original genomic sequence (the older version actually did this - see note below),
2. A script that will use the BAM output to convert the bisulfite sequences to genomic sequences, or
3. Another bisulfite aligner that offers this functionality.
Note: The original version of Bismark actually outputs a tab-delimited text file that contains the information I want - field 7 is the "original bisulfite read sequence" and field 8 is the "equivalent genomic sequence." Bismark allows users to request this output using the --vanilla call, however, it uses the older version of Bismark, which is only compatible with bowtie1. I am getting much better alignments with the newer version which uses bowtie2 and outputs BAM files that do not contain the genomic sequence, so I would prefer not to use the --vanilla call.
Any help would be greatly appreciated.
Thanks,
Tricia
Comment