  • Bowtie: more reference sequences, fewer aligned reads

    Hi All,

    I use Bowtie1 (version 1.0.0 for MacOSX).

    To discard unwanted reads, I map my reads against several reference sequences and remove the reads that align to them.

    My problem is that Bowtie reports fewer aligned reads when I use more reference sequences.

    To be specific:
    I want to discard reads matching 21 sequences in total. They fall into three groups of 7 sequences each.

    Group A: A1,A2,A3,A4,A5,A6,A7. -> similarity:53%~99%, seq length: 1550nt
    Group B: B1,B2,B3,B4,B5,B6,B7. -> similarity:49%~99%, seq length: 2900nt
    Group C: C1,C2,C3,C4,C5,C6,C7. -> similarity:51%~99%, seq length: 120nt
    ====> Major targets are A1 and B1

    Using only the two major sequences, A1 and B1, I built an index and then ran Bowtie1.
    Its log file reports that:
    10.00% reads were reported as aligned reads,
    00.01% reads were reported as suppressed reads, and
    89.99% reads were reported as failed reads.

    After that, I repeated the same process with all 21 sequences: I built an index and ran Bowtie1.
    I expected this run to report more aligned reads than the first one. However, that was completely wrong!

    The second log file reports that:
    00.20% reads were reported as aligned reads,
    11.00% reads were reported as suppressed reads, and
    88.80% reads were reported as failed reads.

    I cannot understand why more reference sequences give fewer aligned reads.
    At the very least, the second run should report as many aligned reads as the first, or more.
    Thankfully, the percentages of reads that failed to align are similar in both runs.

    I used these options:
    bowtie `INDEX` -5 1 -n 0 -n 0 -k 1 -m 1 -l 20 --best --phred33-quals --un `UNMAPPED` -q `INPUT` -S `OUT` 2>> `LOG` -t

    Thank you!

    Jiyoung

  • #2
    I'm not an expert at Bowtie, but a couple of things stand out to me. First, you have -n 0 -n 0 (-n 0 repeated), so is there an option missing where you wrote the second -n 0?

    But the main issue is tied to the -m 1 option. You are telling Bowtie to report only reads that have a single valid alignment and to suppress all others. When you include all 21 sequences in the index, the sequences within each group are highly similar, so it becomes very likely that Bowtie finds more than one valid alignment for a read and suppresses it.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
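To make the -m 1 behaviour concrete, here is a minimal Python sketch of the filtering rule described above. It is a toy model, not Bowtie's algorithm: a "valid alignment" is assumed to be an exact, forward-strand substring match, and the sequences, the read, and the function names are all made up for illustration.

```python
def find_hits(read, references):
    """Return the names of references that contain the read exactly.

    Toy stand-in for alignment: exact substring match, forward strand only.
    """
    return [name for name, seq in references.items() if read in seq]


def classify(read, references, m=1):
    """Mimic the three log categories under an -m style limit."""
    hits = find_hits(read, references)
    if not hits:
        return "failed"        # no valid alignment at all
    if len(hits) > m:
        return "suppressed"    # more than m valid alignments
    return "aligned"           # between 1 and m valid alignments


# Hypothetical sequences: A2 differs from A1 only in its last base.
small_index = {"A1": "ACGTACGTTTGACCA", "B1": "GGCATTCAGGATCCA"}
large_index = dict(small_index, A2="ACGTACGTTTGACCT")

read = "ACGTACGTTTGA"
print(classify(read, small_index))  # aligned
print(classify(read, large_index))  # suppressed
```

The same read moves from "aligned" to "suppressed" purely because a near-identical reference was added to the index, which is the pattern seen in the two log files.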


    • #3
      Makes sense! But why were so many reads suppressed?

      SNPsaurus, thanks!

      Yes, your explanation makes sense. So the second index, with more reference sequences, gave slightly fewer failed reads.

      But it is still unclear to me why so many reads were suppressed.
      Okay, it will be helpful to compare the two output files! Thank you!

      Jiyoung
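One possible way to compare the two output files is sketched below in Python. It assumes both runs wrote SAM output (as with -S in the command above) and that reads which were not reported are either absent from the file or carry the SAM "unmapped" flag bit 0x4. The function names and the example records are hypothetical.

```python
def aligned_read_names(sam_lines):
    """Collect names of reads whose SAM flag does not have the 0x4 (unmapped) bit set."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:           # skip blank or malformed lines
            continue
        flag = int(fields[1])
        if not flag & 0x4:
            names.add(fields[0])
    return names


def lost_reads(first_sam_lines, second_sam_lines):
    """Reads reported as aligned in the first run but not in the second."""
    return aligned_read_names(first_sam_lines) - aligned_read_names(second_sam_lines)


# Made-up records: read1 aligns in run 1 but is not reported in run 2.
run1 = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tA1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
run2 = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(sorted(lost_reads(run1, run2)))  # ['read1']
```

With real data, the two arguments could be open file handles for the two SAM files, since the functions only iterate over lines.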


      • #4
        No, the suppressed reads are the ones that are not reported because of your -m 1 option. In your first try (using A1 and B1), very few reads are suppressed because very few reads align to both A1 and B1. In the second try, many more are suppressed because nearly every read that aligns does so to A1 and A2 and A3 through A7, or to B1 and B2 and B3 through B7. When a read aligns to multiple index sequences, it fails the -m 1 filter and is suppressed.
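The percentages from the two log files can be checked against this explanation with a little arithmetic. The Python sketch below uses only the numbers quoted in the first post and treats "aligned + suppressed" as the share of reads with at least one valid alignment, which follows from the description of -m 1 above. The dictionary keys and the function name are just labels chosen for illustration.

```python
# Percentages copied from the two log files quoted in the first post.
runs = {
    "A1 + B1 (2 sequences)": {"aligned": 10.00, "suppressed": 0.01, "failed": 89.99},
    "all 21 sequences": {"aligned": 0.20, "suppressed": 11.00, "failed": 88.80},
}


def matched_percent(run):
    """Share of reads with at least one valid alignment (reported or suppressed)."""
    return round(run["aligned"] + run["suppressed"], 2)


for name, run in runs.items():
    print(f"{name}: {matched_percent(run):.2f}% of reads had a valid alignment")
# A1 + B1 (2 sequences): 10.01% of reads had a valid alignment
# all 21 sequences: 11.20% of reads had a valid alignment
```

So the larger index did match more reads (11.20% versus 10.01%), which agrees with the drop in failed reads from 89.99% to 88.80%. Almost all of those matches simply moved from the "aligned" column to the "suppressed" column.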