  • dpryan
    replied
    Just to update everyone, Ben Langmead replied to my report of this here (I won't copy and paste the whole reply or bug report).


  • tongsandiego
    replied
    Thank you everyone. I really appreciate your help.


  • dpryan
    replied
    While I haven't traced things completely through the code, I can't see that bowtie2 reliably follows the -N option. It sets the value internally and does some computation that depends on it, but it doesn't seem to flag a read as unalignable when -N 0 is used and there are no perfect seeds. Presumably the easiest fix would be to flag a read as unmapped in multiseedSearchWorker if bestmin > 0 and multiseedMms == 0. Either way, this is a bug and should be reported (in fact, I've just done so).


  • dpryan
    replied
    Yeah, if the read were long enough that the mismatch could not be in the seed then that would make total sense. There are no Ns in the reference in that area (the sequence there is "ttaaaggaaccctgagagatatttca"). My guess at the moment is that either the scoring matrix that's fed to al.exactSweep() isn't set properly or the output of that (which contains whether a seed maps with 0, 1, or 2 mismatches) just isn't being dealt with properly. I guess it'd be faster to just email Ben Langmead :P


  • gringer
    replied
    Originally posted by dpryan
    Yeah, but the mismatch is in the seed region.
    Bowtie2 seeds across the entire read length:

    Bowtie 2 begins by extracting substrings ("seeds") from the read and its reverse complement and aligning them in an ungapped fashion with the help of the FM Index. This is "multiseed alignment" and it is similar to what Bowtie 1 does, except Bowtie 1 attempts to align the entire read this way.
    Although now I notice that you've got a 26bp read, and a 22bp seed, so any seed will overlap with the mismatch. Thinking again about jp.'s question, perhaps there is an N (or other ambiguous base) at that position in the reference sequence. Otherwise, yes, very odd.
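
The overlap argument above can be checked with a few lines of Python. This is only a sketch: it enumerates every contiguous 22 bp window in a 26 bp read, which is a superset of the seeds bowtie2 actually extracts, and it assumes the mismatch sits at 0-based offset 12, taken from the MD value 12T13 quoted elsewhere in this thread. The function names are made up for illustration.

```python
def seed_windows(read_len, seed_len):
    """All half-open [start, end) windows of seed_len that fit in the read."""
    return [(s, s + seed_len) for s in range(read_len - seed_len + 1)]


def positions_in_every_window(read_len, seed_len):
    """Read offsets (0-based) covered by every possible seed window."""
    windows = seed_windows(read_len, seed_len)
    if not windows:  # seed longer than the read: nothing to report
        return []
    return [p for p in range(read_len)
            if all(start <= p < end for start, end in windows)]


READ_LEN = 26         # read length mentioned in the thread
SEED_LEN = 22         # seed length mentioned in the thread
MISMATCH_OFFSET = 12  # assumption: 0-based offset implied by MD 12T13

always_covered = positions_in_every_window(READ_LEN, SEED_LEN)
print(len(seed_windows(READ_LEN, SEED_LEN)), "possible windows")
print("mismatch inside every window:", MISMATCH_OFFSET in always_covered)
```

With these numbers only offsets 0 to 3 and 22 to 25 can escape a seed, so a mismatch in the middle of the read cannot be avoided by any seed placement.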


  • dpryan
    replied
    Originally posted by jp.
    Can bowtie handle N sequences in .fq files and remove them, since these will not match hg19 and so will be removed automatically? If so, why trim and remove Ns using another program? If we just remove the adaptor sequence, will that be okay... or just define the sequence length in bowtie, in which case we don't even need trimming?
    I am totally confused, since I haven't touched this field for a year.
    Would somebody like to answer?
    Thanks in advance,
    jp.
    See the --np and --n-ceil options for how bowtie2 handles Ns. By default, Ns decrease the alignment score and reads with too many Ns will be skipped altogether. If you have Ns at one end of a read, then you might as well trim them off.
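
As a rough illustration of the N ceiling, here is a sketch that evaluates a bowtie2-style function string of the form `<type>,<constant>,<coefficient>` for a given read length. The helper name is hypothetical, and the `L,0,0.15` value used for --n-ceil is the default as I recall it from the Bowtie 2 manual, so check the manual for your version. How bowtie2 rounds the result internally is not modeled here.

```python
import math


def evaluate_function(spec, read_len):
    """Evaluate a function string such as 'L,0,0.15' for one read length.

    Types: C = constant, L = linear, S = square root, G = natural log.
    """
    parts = spec.split(",")
    kind = parts[0].upper()
    numbers = [float(x) for x in parts[1:]]
    const = numbers[0] if numbers else 0.0
    coeff = numbers[1] if len(numbers) > 1 else 0.0
    if kind == "C":
        return const
    if kind == "L":
        return const + coeff * read_len
    if kind == "S":
        return const + coeff * math.sqrt(read_len)
    if kind == "G":
        return const + coeff * math.log(read_len)
    raise ValueError(f"unknown function type: {kind!r}")


# Assumed default for --n-ceil; a 26 bp read gives a ceiling of about 3.9 Ns.
print(evaluate_function("L,0,0.15", 26))
# The --score-min C,0,0 setting suggested in this thread evaluates to 0.
print(evaluate_function("C,0,0", 26))
```

The same function syntax is used by --score-min, which is why the second call is included.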


  • dpryan
    replied
    Originally posted by gringer
    Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
    Yeah, but the mismatch is in the seed region.


  • gringer
    replied
    Can bowtie handle N sequences in .fq files and remove them, since these will not match hg19 and so will be removed automatically?
    This question should really be posted in a new thread, but given that it's marginally related...

    Bowtie2 can handle Ns in the map index and in the reads, and will happily align any base at that location. They're not removed, but are probably treated in a similar way to a read with a very low Q score. It may also "correct" a read mapping to a non-N position for the read record in the SAM output.

    [FWIW, Bowtie v1 can't handle Ns. I think it will replace Ns with As when doing indexing and alignment]


  • jp.
    replied
    Can bowtie handle N sequences in .fq files and remove them, since these will not match hg19 and so will be removed automatically? If so, why trim and remove Ns using another program? If we just remove the adaptor sequence, will that be okay... or just define the sequence length in bowtie, in which case we don't even need trimming?
    I am totally confused, since I haven't touched this field for a year.
    Would somebody like to answer?
    Thanks in advance,
    jp.

    Originally posted by gringer
    Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.


  • gringer
    replied
    Seed mismatches are different from sequence mismatches. The seed mismatch only tells bowtie2 how to start looking for sequences, not how to deal with sequences when it finds a matching seed. If you don't want any sequence mismatches, then you need to set the minimum score to 0 (--score-min C,0,0) and use end-to-end mode, or filter on XM/NM.
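
For the "filter on XM/NM" route, a minimal sketch in plain Python might look like the following. It keeps only aligned records whose XM and NM tags are both 0. The three records are hypothetical placeholders built for the example (read names, positions, and sequences are not from this thread, apart from the 26M CIGAR and the 12T13 MD value), and the helper names are made up.

```python
def optional_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def is_perfect_alignment(sam_line):
    """True only for an aligned record reporting XM:i:0 and NM:i:0."""
    flag = int(sam_line.split("\t")[1])
    if flag & 0x4:  # FLAG bit 0x4 means the read is unmapped
        return False
    tags = optional_tags(sam_line)
    # A missing tag is treated as "unknown" and rejected.
    return tags.get("XM") == "0" and tags.get("NM") == "0"


def make_record(name, flag, extra_tags):
    """Build a hypothetical 26 bp record; all values are placeholders."""
    core = [name, str(flag), "chr1", "1000", "42", "26M", "*", "0", "0",
            "A" * 26, "I" * 26]
    return "\t".join(core + extra_tags)


one_mismatch = make_record("read1", 0, ["XM:i:1", "NM:i:1", "MD:Z:12T13"])
exact = make_record("read2", 0, ["XM:i:0", "NM:i:0", "MD:Z:26"])
unmapped = make_record("read3", 4, [])

kept = [r.split("\t")[0] for r in (one_mismatch, exact, unmapped)
        if is_perfect_alignment(r)]
print(kept)  # only read2 passes the filter
```

On a real file you would skip header lines starting with "@" before calling `is_perfect_alignment` on each record.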


  • tongsandiego
    replied
    Yes. It is human sequence. And I used hg19


  • dpryan
    replied
    This is human sequence I take it? I might play around with the bowtie2 source code tomorrow to see why this is happening if no one comes up with the reason beforehand. I imagine that this sort of issue affects more than a few people, especially since even the default settings shouldn't allow this!


  • westerman
    replied
    Ah, so correct. Must be the end of a long day. I'm getting dangerous in not thinking fast enough. Anyway I am as mystified as you are. If I have time (hah!) I'll try out your command myself and see if 'playing around' reveals anything. Once again thanks for the correction.


  • tongsandiego
    replied
    26M in the CIGAR string means 26 matches or mismatches, so the CIGAR string is consistent with the MD field.
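
The consistency can be checked mechanically. The sketch below compares the number of aligned positions in the CIGAR string (M, = and X operations) with the number described by the MD field (match runs plus single mismatched reference bases, ignoring deletions). The values 26M and 12T13 are the ones quoted in this thread; the function names are my own.

```python
import re


def cigar_aligned_length(cigar):
    """Sum the lengths of M, = and X operations in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op in "M=X")


def md_aligned_length(md):
    """Count matches plus mismatches in an MD string, skipping deletions."""
    total = 0
    for run, deletion, mismatch in re.findall(
            r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])", md):
        if run:
            total += int(run)   # run of matching bases
        elif mismatch:
            total += 1          # one mismatched reference base
        # deleted reference bases (^...) consume no read positions
    return total


def md_mismatch_count(md):
    """Number of mismatched reference bases listed in an MD string."""
    without_deletions = re.sub(r"\^[A-Za-z]+", "", md)
    return len(re.findall(r"[A-Za-z]", without_deletions))


print(cigar_aligned_length("26M"), md_aligned_length("12T13"))
print("mismatches:", md_mismatch_count("12T13"))
```

Both lengths come out as 26, with one mismatch where the reference has a T after 12 matching bases.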


  • westerman
    replied
    No idea.

    Ignore the following, I just left it in for historical purposes and to remind myself how stupid I can be ... It is interesting that the MD field is different from the CIGAR field. As per the Bowtie2 manual: The MD field ought to match the CIGAR string. Which it obviously does not. '12T13' vs 26M.

    Out of stupidity mode, the rest of my original comment ....

    Out of curiosity, and perhaps to help troubleshooting, what does the reference look like at the match position?
    Last edited by westerman; 12-05-2013, 01:17 PM. Reason: Stupidity
