
  • Find all occurrences of a sequence in a fasta file

    I have a fasta file with 16S sequences from many organisms. I want to find all occurrences of a certain ~20 bp sequence in this fasta file. I could do a simple text search but I would prefer to allow some flexibility in the matches.

    For each match I would like the following information
    1) fasta entry name
    2) position in the sequence
    3) CIGAR string or some other representation of the alignment

    A SAM file would be fine. I tried using bowtie2 with "-a" but it never seemed to finish. Through trial and error I found that setting "-k" to 150 worked fine but setting "-k" to 200 did not, indicating to me that there is probably some upper limit to the number of matches per query that it can report.

    I am certain that what I want to do is commonly done by many people here on the site. What is the easiest/best way to go about it?

    Thanks so much.
    Doug
    www.sharedproteomics.com

  • #2
    Have you tried BLAST or BLAT? Those tools are designed for searching for a small number of sequences in a large database of many different sequences.



    • #3
      You can use the BLAST program and specify the parameter -W 20 (that way BLAST will report just the hits with at least 20 bp of similarity). Also, an important thing: deactivate the low-complexity filter (-F F).

      Thanks,
      André



      • #4
        I would suggest primer_match. It is simple to use and flexible in its output.

        The primer in this specific case would be your 20 bp sequence. The program in turn lets you tailor the output to display position, entry name, or counts.



        (I am in no way associated with the Edwards lab, just a frequent user.)
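
For readers who want a dependency-free starting point, here is a minimal Python sketch of the kind of search asked about at the top of the thread. It is not how BLAST, BLAT, bowtie2, or primer_match work internally; it simply slides the query along each FASTA entry and keeps windows with at most a chosen number of mismatches. The function names (`read_fasta`, `compact_cigar`, `find_matches`) and the `max_mismatches` parameter are made up for illustration. It assumes forward-strand matches only and substitutions only (no indels), and it reports the alignment with the `=` (match) and `X` (mismatch) CIGAR operators from the SAM format.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first word of the header as the entry name.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def compact_cigar(ops: str) -> str:
    """Run-length encode per-base operations, e.g. '==X=' becomes '2=1X1='."""
    out = []
    run_start = 0
    for i in range(1, len(ops) + 1):
        if i == len(ops) or ops[i] != ops[run_start]:
            out.append(f"{i - run_start}{ops[run_start]}")
            run_start = i
    return "".join(out)


def find_matches(
    query: str,
    records: Iterable[Tuple[str, str]],
    max_mismatches: int = 2,
) -> List[Tuple[str, int, int, str]]:
    """Return (entry_name, 1-based position, mismatches, CIGAR) for each hit.

    Assumption: forward strand only, substitutions only (no gaps).
    """
    query = query.upper()
    hits = []
    for name, seq in records:
        for start in range(len(seq) - len(query) + 1):
            window = seq[start:start + len(query)]
            ops = "".join("=" if q == s else "X" for q, s in zip(query, window))
            mismatches = ops.count("X")
            if mismatches <= max_mismatches:
                hits.append((name, start + 1, mismatches, compact_cigar(ops)))
    return hits
```

To run it on a real file, pass an open file handle to `read_fasta`, for example `find_matches(query, read_fasta(open("16S.fasta")))`, where `16S.fasta` is a placeholder filename. The scan costs roughly (total bases) x (query length) comparisons, which should be manageable for a 16S collection but is far slower than an indexed aligner on large databases. Searching the reverse strand would require running the reverse complement of the query as a second pass.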
