Hi everyone,
I'm trying to analyze small RNA data from SOLiD using both Lifescope and a different pipeline which uses miRDeep2.
I converted the XSQ files into FASTQ files with the XSQ tools, then I also ran Lifescope and got the BAM file with the mapped sequences.
However, when I try to compare the reads from the FASTQ file and the reads extracted from the BAM file, they are completely different.
For example, the two following entries are from FASTQ and BAM, respectively:
@Library43:559_548_933/1
CGGTGCAGGGACGAAATACAGTTAGACATATCTC
+
@@@@@@@@@@@@@@@@@6@@@@6@@@@/@@;@/@
559_548_933 0 chr9 23209018 1 21M14H * 0 0 CAGATCAAGAGGTCCCCGGTT JJJJJJJJJJJJJJJJJJJJJ RG:Z:Library43_11 NH:i:10 CM:i:0 NM:i:0 CQ:Z:@@@@@@@@@@@@@@@@@@6@@@@6@@@@/@@;@/@ CS:Z:T21223210222012000301023302010303131
The IDs are the same (559_548_933), but the sequence in the FASTQ file (CGGTGCAGGGACGAAATACAGTTAGACATATCTC) is completely different from the one in the BAM file (CAGATCAAGAGGTCCCCGGTT). It's not just a matter of trimming the adaptor sequences; the sequences are different overall.
Also, when I map the reads extracted from the BAM file with either miRDeep or TopHat, a high percentage of them map, but when I try the same thing with the FASTQ file, 0% of the sequences map.
Does anyone know why there is such a difference between reads with the same ID and what the FASTQ file reads actually are?
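For reference, here is a minimal sketch of a check that can be run on the record above. It assumes the standard SOLiD color code (each color is the XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3) and assumes the FASTQ might be "double-encoded" color space, i.e. colors 0-3 written as A, C, G, T with the primer base and first color dropped. The function names are made up for illustration, and whether the XSQ tools use exactly this convention is an assumption, not something confirmed by their documentation.

```python
# Assumption: standard SOLiD color code, color = XOR of 2-bit base codes.
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def decode_colorspace(cs):
    """Translate a CS string (primer base followed by colors) into base space."""
    prev = BASES.index(cs[0])  # primer base, not part of the read itself
    out = []
    for color in cs[1:]:
        prev ^= int(color)     # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


def double_encode(cs):
    """Write colors 0-3 as A, C, G, T, dropping the primer base and first color.

    Hypothetical helper: this mirrors a common double-encoding convention,
    assumed here rather than taken from the XSQ tools documentation.
    """
    return "".join(BASES[int(c)] for c in cs[2:])


# CS tag copied from the BAM record in the post.
cs = "T21223210222012000301023302010303131"

decoded = decode_colorspace(cs)
pseudo = double_encode(cs)

# The BAM record is 21M14H, so only the first 21 decoded bases are kept there.
print(decoded[:21] == "CAGATCAAGAGGTCCCCGGTT")
print(pseudo == "CGGTGCAGGGACGAAATACAGTTAGACATATCTC")
```

If both comparisons hold, the two entries would be the same read in different encodings: the BAM SEQ field in base space, and the FASTQ sequence as color calls written with nucleotide letters. That would also be consistent with the FASTQ reads failing to map against a base-space reference.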