Hi, guys
I used bowtie2 to align paired-end reads to the hg19 genome and got SAM output. When I then tried to convert the SAM file to BAM, I got this error:
Code:
[samopen] SAM header is present: 25 sequences.
Line 817574, sequence length 124 vs 125 from CIGAR
Parse error at line 817574: CIGAR and sequence length are inconsistent
So I looked into the SAM file and found that line 817574 reads:
Code:
HISEQ04:185:C62CTANXX:3:1101:8017:19749 99 chr2 184686606 42 125M = 184686727 246 TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGAATGGTAAGAAAGGATTATTA BBBBBFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFBFBFFFFFFFFFFBFFFFFFFFFFFFBFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFBFFFF<F7F<FFFF<BF<F AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:NM:i:0 MD:Z:125 YS:i:0 YT:Z:CP
The CIGAR is 125M, which means the alignment covers 125 read bases, each a match or mismatch, but the read sequence in this record is only 124 bp. As far as I know, my paired-end reads were sequenced at 125 bp. I suspected that bowtie2 had edited the read sequence, so I compared the sequence in the original FASTQ file with the one in the bowtie2 alignment:
Code:
Seq in Fastq:             TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGGCACGATGAAGTTAAGGCATAG
Seq in Bowtie2 alignment: TAGAAAAACTAAACAATGAACCGATAAAAAAACTACAACAACTTTTTAAGACGTAGATAATATAATACAATGTAAATAGAATCAACAAAAATTTTTTAAAATGAATGGTAAGAAAGGATTATTA
This is quite strange: the tail of the sequence in the alignment (AATGGTAAGAAAGGATTATTA, right after the shared ...TTTTTTAAAATG prefix) is completely different from the FASTQ read. Does anybody know how this can happen? Does bowtie2 edit the read sequence when it aligns the read to the reference?
If you have any idea, please let me know. Thanks~
wisense
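For reference, the consistency check samtools is complaining about can be reproduced outside samtools. Below is a minimal Python sketch, assuming a plain, tab-delimited SAM file; `cigar_query_length`, `check_sam_line` and `scan_sam` are hypothetical helper names, not part of samtools or bowtie2. Per the SAM specification, only the M, I, S, = and X operations consume read bases, so their lengths should add up to the length of SEQ; for the record at line 817574, 125M implies 125 bases while SEQ holds 124.

```python
import re

# Per the SAM spec, these CIGAR operations consume read (query) bases;
# D, N, H and P do not.
QUERY_CONSUMING = frozenset("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Number of read bases implied by a CIGAR string ('*' means unavailable)."""
    if cigar == "*":
        return 0
    ops = CIGAR_OP.findall(cigar)
    # Reject anything the regex did not fully account for, e.g. "12Q".
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def check_sam_line(line: str):
    """Return (cigar_len, seq_len) if they disagree, otherwise None."""
    fields = line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]  # SAM columns 6 (CIGAR) and 10 (SEQ)
    if cigar == "*" or seq == "*":
        return None  # nothing to compare
    expected = cigar_query_length(cigar)
    return (expected, len(seq)) if expected != len(seq) else None


def scan_sam(path: str) -> None:
    """Print every alignment line whose CIGAR and SEQ lengths disagree."""
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.startswith("@"):  # skip header lines
                continue
            bad = check_sam_line(line)
            if bad:
                print(f"line {lineno}: CIGAR implies {bad[0]} bases, SEQ has {bad[1]}")
```

Running something like `scan_sam("aln.sam")` (file name hypothetical) lists every record with the same problem, which helps tell whether line 817574 is a one-off or whether many records are affected.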
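When comparing a FASTQ read with the SEQ field, keep in mind that SAM stores SEQ reverse-complemented whenever FLAG bit 0x10 is set. The record above has FLAG 99 (0x1 + 0x2 + 0x20 + 0x40), so 0x10 is not set and SEQ should match the FASTQ read base for base. A minimal Python sketch of that comparison follows; `revcomp` and `first_divergence` are hypothetical names, and the code assumes the sequences contain only A/C/G/T/N (either case).

```python
# Complement table for A/C/G/T/N in either case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
REVERSE_FLAG = 0x10  # SAM FLAG bit: SEQ is stored reverse-complemented


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_divergence(fastq_seq: str, sam_seq: str, flag: int):
    """0-based index of the first position where the read and SEQ differ,
    or None if they are identical. A length difference counts as a
    divergence at the end of the shorter sequence."""
    if flag & REVERSE_FLAG:
        sam_seq = revcomp(sam_seq)  # undo SAM's reverse-complementing
    for i, (a, b) in enumerate(zip(fastq_seq, sam_seq)):
        if a != b:
            return i
    if len(fastq_seq) != len(sam_seq):
        return min(len(fastq_seq), len(sam_seq))
    return None
```

For this read, `first_divergence(fastq_seq, sam_seq, 99)` should point just past the shared ...AAAATG prefix, where the differing tail AATGGTAAGAAAGGATTATTA begins.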