Hello everyone,
I created a bowtie index from a FASTA file that was merged from 4 individual FASTA files: one from miRBase, two from RFAM corresponding to tRNA and other non-coding RNA, and the last one is the RepeatMasker FASTA sequences downloaded from UCSC.
Then I aligned the reads using bowtie2 in local mode against the index created from the above FASTA files, with the following options:
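Because the reference is a concatenation of four separate FASTA files, one thing that might be worth ruling out is that two records ended up with the same name. The name (the header text up to the first whitespace) becomes the reference name in the SAM header, and repeated names are a plausible source of trouble for downstream tools. A minimal sketch of such a check, using only standard grep/awk/sort/uniq; the function name `duplicate_fasta_names` and the file name `superfasta.fa` in the usage line are hypothetical, since the merged file's actual name is not given above:

```shell
# List sequence names that occur more than once in a FASTA file.
# The name is taken as the header text after '>' up to the first whitespace.
# (Hypothetical helper; not part of bowtie2 or samtools.)
duplicate_fasta_names() {
    grep '^>' "$1" | awk '{ sub(/^>/, "", $1); print $1 }' | sort | uniq -d
}
```

Usage would be something like `duplicate_fasta_names superfasta.fa`; no output means no repeated names were found.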
bowtie2 --local -N 1 -L 20 -p 20 -x ~/Downloads/index/superfasta.index -f -U input_E1.fa -S output.sam
After the run I got the following alignment rate:
2599066 reads; of these:
2599066 (100.00%) were unpaired; of these:
1730189 (66.57%) aligned 0 times
265716 (10.22%) aligned exactly 1 time
603161 (23.21%) aligned >1 times
33.43% overall alignment rate
Then I converted the SAM file to BAM using:
samtools view -bS output_E1.sam > output_E1.bam
Then I checked the flagstat output using:
samtools flagstat output_E1.bam
and got the following results:
0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The flagstat output says that none of the sequences mapped, whereas it should be 33.43% as shown in the first output.
Then I also checked the number of sequences that mapped using this command:
samtools view -F 4 output.bam | wc -l
and again I got 0.
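The same count can also be taken directly from the SAM text, without going through samtools, which may help narrow down whether the problem is in the SAM file or in the conversion. The sketch below is only an illustration: `count_sam_mapped` is a hypothetical helper, and it assumes one alignment record per read (which is what I would expect from the bowtie2 command above, since -k and -a were not used). It tests the 0x4 "unmapped" bit of the FLAG column with plain awk arithmetic.

```shell
# Count alignment records in a SAM file by the "unmapped" bit (0x4) of FLAG.
# Header lines (starting with '@') are skipped.
# Prints: mapped=<n> unmapped=<n>
# (Hypothetical helper; not part of samtools.)
count_sam_mapped() {
    awk -F '\t' '
        /^@/ { next }
        { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}
```

Run as `count_sam_mapped output.sam`. If the SAM file is intact, the bowtie2 summary above suggests roughly mapped=868877 (265716 + 603161) and unmapped=1730189.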
When I looked at the SAM file with tail, everything seemed fine: I can see flag values of 0, 4 or 16. So why is samtools showing these weird results? Is there an issue with the SAM file? I ran the same analysis on another genome with another file of reads, and there samtools worked fine and all the numbers were concordant. So the most likely explanation seems to be a problem with the reference index, or with the FASTA file used to create it. There was some whitespace in the headers of a few FASTA sequences; I removed it and tried again with the new index, but the problem persists.
What could be behind this? Does the SAM file have a malformed line, or could it be something else? I have a fast-approaching deadline to complete the data analysis and I am stuck here, so if anyone could suggest a solution, that would be great.
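One more thing that can be checked quickly: the commands as written above name output.sam in the bowtie2 step but output_E1.sam in the conversion step, and output_E1.bam for flagstat but output.bam for the count. That may just be a copy-paste difference in this post, but confirming that each file exists and is non-empty costs little. A minimal sketch, with `report_files` being a hypothetical helper built only on standard shell tests:

```shell
# Report, for each path given, whether it exists and how large it is.
# (Hypothetical helper for a quick sanity check.)
report_files() {
    for f in "$@"; do
        if [ -s "$f" ]; then
            printf '%s\tok\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
        elif [ -e "$f" ]; then
            printf '%s\tEMPTY\n' "$f"
        else
            printf '%s\tMISSING\n' "$f"
        fi
    done
}
```

For the files mentioned in this post that would be `report_files output.sam output_E1.sam output.bam output_E1.bam`.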
Thanks!