
  • What_Da_Seq
    replied
    How does Bowtie handle ambiguous bases in the refgenome

    Does anybody have experience preparing a Bowtie index from a reference in which certain bases have been replaced with ambiguity codes such as "Y" (which stands for "C" or "T")? If so, will these positions be called matches or mismatches when the Solexa read being aligned has either a "C" or a "T" at that position?

    Thanks


  • Ben Langmead
    replied
    Originally posted by ieuanclay
    My worry is that I will lose alignments which have perfect alignments if I also have other near matches. I guess I can make -n smaller (1 say). But I suppose the caveat is that if there is an exact match and a near match, you cannot say which is the correct one - sequencing error etc... - and so it is conservative to reject the read as having multiple alignments?
    I see your concern. In your case, you may want to consider running Bowtie with -a --nostrata (or -k <some int> --nostrata) and then postprocessing the results in whatever way you think is appropriate for your application. If you'd like to reject reads on the basis of the number of alignments found in the *best* match stratum (as opposed to all strata), you can do that with a script.
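
    A minimal sketch of that kind of postprocessing script, in Python. It assumes Bowtie's default tab-separated output with the read name in column 1 and comma-separated mismatch descriptors (e.g. 5:A>C) in column 8, so check that against the output of your version. It also uses the total mismatch count as a stand-in for the stratum, which may not match Bowtie's own stratum definition in -n mode. The function names are made up for illustration.

```python
from collections import defaultdict


def count_mismatches(descriptor_field):
    """Count comma-separated mismatch descriptors such as '5:A>C,10:G>T'."""
    field = descriptor_field.strip()
    return len(field.split(",")) if field else 0


def unique_in_best_stratum(lines):
    """Return {read_name: line} for reads with exactly one hit in their best stratum."""
    by_read = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        # Assumed layout: name, strand, reference, offset, sequence,
        # qualities, other-hit count, mismatch descriptors (may be absent).
        mismatches = count_mismatches(cols[7]) if len(cols) > 7 else 0
        by_read[cols[0]].append((mismatches, line))

    kept = {}
    for name, hits in by_read.items():
        best = min(m for m, _ in hits)
        best_hits = [hit for m, hit in hits if m == best]
        if len(best_hits) == 1:  # reject reads with ties in the best stratum
            kept[name] = best_hits[0]
    return kept
```

    You would feed it the lines of a -a --nostrata (or -k <some int> --nostrata) run, e.g. unique_in_best_stratum(open("hits.map")), where hits.map is a hypothetical file name.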

    Another alternative is to do multiple Bowtie runs with decreasingly stringent alignment policies (e.g. -n 0, then -n 1, etc). The input to each run might be the --unfq reads from the run before.
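
    A rough sketch of how such a cascade could be scripted. It only builds the command lines and does not run anything. The file names (passN.map, passN_unaligned.fq) and the function name are hypothetical, the argument order follows the usual bowtie <index> <reads> <hits> layout, and the name of the unaligned-reads option may differ between Bowtie releases, so check your manual.

```python
def cascade_commands(index, reads, policies, bowtie="bowtie"):
    """Build one argv list per pass; each pass reads what the previous one left unaligned."""
    commands = []
    current = reads
    for i, policy in enumerate(policies, start=1):
        unaligned = f"pass{i}_unaligned.fq"  # hypothetical file name
        hits = f"pass{i}.map"                # hypothetical file name
        commands.append([bowtie, *policy, "--unfq", unaligned, index, current, hits])
        current = unaligned  # next, less stringent pass starts from the leftovers
    return commands


# Strictest policy first, then progressively looser ones.
for cmd in cascade_commands("h_sapiens", "reads.fq", [["-n", "0"], ["-n", "1"], ["-n", "2"]]):
    print(" ".join(cmd))
```

    Each list can be passed to subprocess.run once you are happy with the options.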

    Originally posted by ieuanclay
    I guess what I really want to know is: how stringent are the default alignment settings? Can I make them more stringent without losing a lot of 'true' (but imperfect) alignments?
    The default alignment policy is -n 2 -l 28 -e 70, which mimics Maq's defaults (with the caveat that Maq actually lets through some alignments with 3 mismatches in the seed). Whether you can make the policy more stringent without losing true alignments depends on how different your query organism is from the reference. Intuitively, the default policy has no problem finding alignments where there are 2 SNPs very close together, but might have a problem finding alignments where there are 3 SNPs very close together. The same goes for -n 1 and 1 SNP vs. 2 SNPs. It's up to you to determine how well those policies fit your problem.
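
    To make the policy concrete, here is a simplified sketch of the validity test implied by -n/-l/-e: at most n mismatches within the first l bases (the seed), and the quality values at all mismatched positions summing to at most e. It assumes Phred-scaled qualities and ignores any quality rounding Bowtie may apply, so treat it as an illustration rather than a reimplementation. The function name is made up.

```python
def is_valid_alignment(mismatch_positions, qualities, n=2, l=28, e=70):
    """Check one alignment against a simplified -n/-l/-e policy.

    mismatch_positions: 0-based offsets from the 5' end of the read.
    qualities: Phred quality value for every read position.
    """
    seed_mismatches = sum(1 for p in mismatch_positions if p < l)
    quality_sum = sum(qualities[p] for p in mismatch_positions)
    return seed_mismatches <= n and quality_sum <= e


quals = [30] * 36                                    # 36 bp read, uniform quality 30
print(is_valid_alignment([3, 5], quals))             # 2 seed mismatches, quality sum 60
print(is_valid_alignment([3, 5, 7], quals))          # 3 seed mismatches
print(is_valid_alignment([3, 30, 33], quals))        # 1 seed mismatch, quality sum 90
print(is_valid_alignment([3, 5], quals, n=1))        # same pair under -n 1
```

    Under these assumptions the first call passes and the other three fail, which is the "2 SNPs close together vs. 3" situation described above.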

    Hope that helps,
    Ben


  • ieuanclay
    replied
    Hi Ben,

    Thanks for your help so far - I am relatively new to mapping (but not so new that I am not impressed by bowtie!), so please excuse any dopey questions...

    My worry is that I will lose alignments which have perfect alignments if I also have other near matches. I guess I can make -n smaller (1 say). But I suppose the caveat is that if there is an exact match and a near match, you cannot say which is the correct one - sequencing error etc... - and so it is conservative to reject the read as having multiple alignments?

    For what I work on, it is really important to be very sure about where the reads map, so maybe it would be good to keep -n at 1 and be more confident about the reads? I don't want to have to refer back to alignment confidences in later analyses; I would rather be able to say that beyond a certain confidence threshold I am happy with them all. If I am going to reduce -n, should I also reduce -l to 20 or 25?

    I guess what I really want to know is: how stringent are the default alignment settings? Can I make them more stringent without losing a lot of 'true' (but imperfect) alignments?

    Thanks again,

    Ieuan


  • Ben Langmead
    replied
    Originally posted by ieuanclay
    so if a read has multiple valid alignments, one of which is better than the others (fewer mismatches, though the others are still valid), and i specify -k 1 -m 1, will the best alignment be given, or will it be pumped into --maxfa ?
    In that situation, no alignments will be printed and the read will go into the --maxfa/--maxfq file(s).

    Originally posted by ieuanclay
    If I am worried about this sort of situation, should i specify --best?
    That won't help in this case because --best doesn't change which alignments are considered valid; rather, it changes which valid alignments are reported by Bowtie. The -v/-n/-l/-e options are the only ones that change which alignments are considered valid by Bowtie. If the set of valid alignments happens to be stratified (e.g., there's an exact hit and a bunch of 1-mismatch hits), the existence of the better alignments doesn't invalidate the worse ones.

    If this poses a problem, I'd be interested to hear more about what you're looking for...

    Thanks,
    Ben


  • ieuanclay
    replied
    Sorry to keep on about this, I just want to get it clear.

    By default:
    -k is 1, so only one (the best according to -n/-v/-l/-e) alignment is reported.
    -m is unlimited

    so if a read has multiple valid alignments, one of which is better than the others (fewer mismatches, though the others are still valid), and i specify -k 1 -m 1, will the best alignment be given, or will it be pumped into --maxfa ?

    If I am worried about this sort of situation, should i specify --best?


  • ieuanclay
    replied
    Yes - great thank you!

    Ieuan


  • Ben Langmead
    replied
    Yes - if you specify -k > 1 or -a, Bowtie will output the appropriate number of hits per read for reads with >=1 hit. If -m <int> is also specified, Bowtie will output no alignments for reads with > <int> alignments and, if --maxfa/--maxfq is specified, will dump those reads (the reads, not the alignments) to the specified file. For reads with <= <int> alignments, Bowtie behaves the same as if -m were not specified.
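
    The reporting rule can be written down as a few lines of Python. This is only a model of the behaviour described above, not Bowtie's code: the function name is made up, k=None stands in for -a, m=None means -m was not given, and "goes to the max file" only has an effect if --maxfa/--maxfq is actually specified.

```python
def report(alignments, k=1, m=None):
    """Return (reported_alignments, goes_to_max_file) for one read's valid alignments."""
    if m is not None and len(alignments) > m:
        # Too many valid alignments: print nothing, dump the read itself.
        return [], True
    limit = len(alignments) if k is None else k
    return alignments[:limit], False


print(report(["exact", "1-mismatch"], k=1, m=1))  # more than m valid hits: suppressed
print(report(["exact"], k=1, m=1))                # within the limit: reported as usual
print(report(["a", "b", "c"], k=None))            # -a without -m: everything reported
```

    Note that the first call suppresses the read even though one alignment is better than the other, since both count as valid.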

    I hope that helps.


  • ieuanclay
    replied
    Doh... Your patience obviously exceeds mine...

    I was confused by the default for -m being unlimited - does this mean that without --maxfa being set, your mapped output will include sequences with multiple hits?
    Last edited by ieuanclay; 03-10-2009, 08:02 AM.


  • Ben Langmead
    replied
    Thanks!

    Have you checked out the -m and --maxfa/--maxfq options?


  • ieuanclay
    replied
    Hi Ben, Just to say bowtie is great work. Far outstrips any pipeline we have used previously!
    One question though - is there a way to output reads with multiple hits to a separate file? We work on repetitive regions and with a little massaging, this data may still be useful to us.


  • danielsbrewer
    replied
    Is Bowtie suitable for miRNA detection

    I am just playing around with bowtie along with other software (maq,novoalign) and was wondering whether bowtie is suitable for use with an miRNA detection experiment. In a previous post Ben states that:
    Originally posted by Ben Langmead
    First, let me reemphasize that I think of Bowtie's target application as mammalian resequencing - that's how I characterize it in the manual and that's what we spend our time trying to optimize it for.
    That hints to me that the default options might not be the best for experiments to compare miRNAs between samples. Does anyone have an opinion as to what the best options to use are?

    I would think that you want to know all the alignments for each read above a certain quality threshold. At the moment I am thinking of using "--best -k 100", as if there are more than 100 hits then it is probably not a "real" alignment.

    Any thoughts?


  • Ben Langmead
    replied
    That sounds like an issue with how the read file is formatted. Can you share that file with me, e.g. via email (langmead at umd dot edu)? I can take a quick look.


  • danielsbrewer
    replied
    bowtie error

    I am just starting out with bowtie and I am getting the following error:

    Code:
    $ ./bowtie -p 4  -t h_sapiens ../GDB1.fastq GDB1.map
    Time loading forward index: 00:00:01
    Time loading mirror index: 00:00:02
    Error: Read (Error: Read (Error: Read (Error: Read (HHHHWWWWIIII----EEEEAAAASSSS222266669999BBBB::::5555::::1464::::711344088362:84:1::1818164431573) is less than 2 characters long7) is less than 2 characters long) is less than 2 characters long) is less than 2 characters long
    Has anyone seen anything similar, or does anyone know what it's actually saying? I am pretty sure that the smallest read size is something like 10.


  • doxologist
    replied
    Originally posted by danielsbrewer
    Does anyone know whether bowtie supports aligning multiple read lengths?

    I am doing small RNA Solexa sequencing and so after the adapter has been removed I end up with variable length reads. With MAQ it appears that you have to run it multiple times for the different lengths, is bowtie the same?
    We had a similar discussion in another thread: http://seqanswers.com/forums/showthread.php?p=3505

    Basically, ELAND doesn't allow for different lengths and Bowtie does.


  • ewingad
    replied
    Originally posted by danielsbrewer
    Does anyone know whether bowtie supports aligning multiple read lengths?

    I am doing small RNA Solexa sequencing and so after the adapter has been removed I end up with variable length reads. With MAQ it appears that you have to run it multiple times for the different lengths, is bowtie the same?
    Yes it does!

    -Adam
