I have a fasta file with 16S sequences from many organisms. I want to find all occurrences of a certain ~20 bp sequence in this fasta file. I could do a simple text search, but I would prefer to allow some flexibility in the matches (a few mismatches or small indels).
For each match I would like the following information:
1) fasta entry name
2) position in the sequence
3) CIGAR string or some other representation of the alignment
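A minimal pure-Python sketch of one way to get those three fields, assuming that edit distance (each mismatch, inserted base, or deleted base costs 1) is an acceptable notion of "flexibility". It runs a semi-global alignment of the pattern against each record, so the whole pattern must align but it may start and end anywhere in the sequence. It searches the forward strand only, and IUPAC ambiguity codes (N, Y, R, ...) are compared literally. The function names `read_fasta`, `find_approx` and `search_fasta` are hypothetical, not from any existing tool.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence); name is the header text up to the first whitespace."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def compress_cigar(ops: List[str]) -> str:
    """Run-length encode alignment operations, e.g. ['M', 'M', 'D'] -> '2M1D'."""
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))


def find_approx(pattern: str, text: str, max_edits: int) -> List[Tuple[int, int, str]]:
    """Return (1-based start, edit distance, CIGAR) for each hit of pattern in text."""
    p, t = pattern.upper(), text.upper()
    m, n = len(p), len(t)
    # D[i][j] = fewest edits aligning p[:i] to some substring of t ending at j.
    # Row 0 is all zeros, so an alignment may start anywhere in t.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost, D[i - 1][j] + 1, D[i][j - 1] + 1)

    # Adjacent end positions usually describe the same hit, so keep only the
    # best-scoring end in each run of qualifying positions (j = n + 1 flushes).
    ends, run = [], []
    for j in range(1, n + 2):
        if j <= n and D[m][j] <= max_edits:
            run.append(j)
        elif run:
            ends.append(min(run, key=lambda e: D[m][e]))
            run = []

    hits, seen = [], set()
    for end in ends:
        i, j, ops = m, end, []
        # Trace back to row 0. M = match/mismatch, I = pattern base absent
        # from the reference, D = reference base skipped by the pattern.
        while i > 0:
            if j > 0 and D[i][j] == D[i - 1][j - 1] + (p[i - 1] != t[j - 1]):
                ops.append("M")
                i, j = i - 1, j - 1
            elif D[i][j] == D[i - 1][j] + 1:
                ops.append("I")
                i -= 1
            else:
                ops.append("D")
                j -= 1
        if j + 1 not in seen:  # two ends can trace back to the same start
            seen.add(j + 1)
            hits.append((j + 1, D[m][end], compress_cigar(ops[::-1])))
    return hits


def search_fasta(path: str, pattern: str, max_edits: int) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (entry name, 1-based start, edit distance, CIGAR) for every hit."""
    with open(path) as fh:
        for name, seq in read_fasta(fh):
            for start, dist, cigar in find_approx(pattern, seq, max_edits):
                yield name, start, dist, cigar
```

For example, `for row in search_fasta("16S.fasta", PRIMER, 2): print(*row, sep="\t")` (file name and `PRIMER` are placeholders) prints one tab-separated line per hit. The cost is proportional to pattern length × sequence length per record, which is manageable for a ~20 bp query against 16S-sized sequences but much slower than a dedicated aligner; to cover the other strand, run it again with the reverse complement of the pattern.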
A SAM file would be fine. I tried using bowtie2 with "-a", but it never seemed to finish. Through trial and error I found that "-k 150" worked fine but "-k 200" did not, which suggests to me that there may be an upper limit on the number of alignments per query that it can report.
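If SAM output is the goal, a small hypothetical helper such as `to_sam` could turn a list of hits, from whatever search produces them, into a SAM document. This is a sketch under these assumptions: the fasta entries are the references (each needs an `@SQ` line with its length), the ~20 bp sequence is the single query read, and the first hit is written as the primary alignment with the rest flagged secondary (FLAG 256), because SAM allows only one primary record per read. MAPQ is set to 255 (unavailable), QUAL to `*`, and the edit distance goes in the standard `NM` tag.

```python
from typing import Dict, List, Tuple

# Hypothetical hit layout: (reference name, 1-based position, CIGAR, edit distance)
Hit = Tuple[str, int, str, int]


def to_sam(query_name: str, query_seq: str, ref_lengths: Dict[str, int], hits: List[Hit]) -> str:
    """Render all hits of one short query as SAM text."""
    lines = ["@HD\tVN:1.6\tSO:unsorted"]
    # One @SQ header line per reference (fasta entry) with its length.
    for ref, length in ref_lengths.items():
        lines.append(f"@SQ\tSN:{ref}\tLN:{length}")
    for idx, (ref, pos, cigar, nm) in enumerate(hits):
        flag = 0 if idx == 0 else 256  # first hit primary, the rest secondary
        fields = [query_name, str(flag), ref, str(pos), "255", cigar,
                  "*", "0", "0", query_seq, "*", f"NM:i:{nm}"]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

The output can be written to a `.sam` file and inspected or converted with standard SAM tooling.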
I'm sure this is something many people on this site do regularly. What is the easiest/best way to go about it?
Thanks so much.