
  • Ben Langmead
    replied
    Originally posted by bloomfi1
    Hi,

    I just updated bowtie from version 0.11.3 to 0.12.2. With version 0.11.3 I was able to run the command "bowtie -m 25 -a -n 15 --un <file> -p 4 <ebwt> <infile> <outfile>". When I run this command in version 0.12.2, I get error "-n/--seedmms arg must be at least 0 and at most 3". Am I missing something in the change log about this parameter? Is the behavior of -n in version 0.11.3 accurate?

    Thank you.

    EDIT: I just realized that while version 0.11.3 will let me give -n greater than 3, it is still capped at -n 3. Is it possible to align with more than 3 mismatches? I am using bowtie to align 75bp reads to a genomic model (coding regions only) with the ultimate goal of calculating RPKM for each of the models. Is bowtie simply the wrong tool for this purpose?
    Hi,

    Yes, the problem was that versions < 0.12.2 were failing to check for a too-high input for -n and -v. The manual and the usage message both said max=3, but bowtie erroneously didn't enforce it.

    Note that the -n option only constrains the number of mismatches in the seed, not in the entire alignment. The key is to set -n, -l and -e to reasonable values given your data. Since your reads are 75bp, I would suggest trying a few different settings, perhaps starting with -l 28 (the default), -n 2 and -e 180, and then adjusting all three until you're getting your desired mix of speed and sensitivity.
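
As a rough illustration of how the three options interact, the sketch below models the rule described in this post: at most -n mismatches within the first -l bases (the seed, at the high-quality end of the read), and a sum of qualities at mismatched positions of at most -e across the whole read. It is a simplified model and not bowtie's implementation. The function names are hypothetical, positions are 0-based, and the Maq-like rounding (nearest 10, capped at 30) follows the bowtie manual's description, with the handling of ties at 5 being an assumption.

```python
from typing import Sequence


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and cap it at 30.

    Assumption: values ending in 5 round up.
    """
    return min(30, ((q + 5) // 10) * 10)


def passes_seed_policy(
    mismatch_positions: Sequence[int],
    quals: Sequence[int],
    seed_len: int = 28,       # -l
    max_seed_mm: int = 2,     # -n
    max_qual_sum: int = 70,   # -e
    round_quals: bool = True,
) -> bool:
    """Return True if a candidate alignment satisfies the simplified -n/-l/-e rule."""
    # -n only counts mismatches that fall inside the seed.
    seed_mm = sum(1 for p in mismatch_positions if p < seed_len)
    # -e sums qualities at every mismatched position, seed or not.
    qual_sum = sum(
        maq_round(quals[p]) if round_quals else quals[p]
        for p in mismatch_positions
    )
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum


if __name__ == "__main__":
    quals = [30] * 75                # a 75bp read with uniform quality 30
    mismatches = [3, 20, 40, 60, 70]  # two in the seed, three outside it
    print(passes_seed_policy(mismatches, quals))                    # default -e 70
    print(passes_seed_policy(mismatches, quals, max_qual_sum=180))  # -e 180
```

In this toy case the five mismatches contribute a quality sum of 150, so the read is rejected at the default -e 70 but accepted at -e 180, even though the seed holds only two mismatches in both runs. That is why raising -e, rather than -n, is the way to tolerate more mismatches in long reads.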

    Thanks,
    Ben

  • lcollado
    replied
    I guess that you should trim your data and try to align your sequences again. Also, I don't think that "bowtie figures it out", though I'm no expert.

  • thinkRNA
    replied
    If you are using reads of length 75, should you change the seed length, or does bowtie figure that out?

    I can only align around 50% of my single-read Illumina data from this paper using bowtie's default settings: http://www.nature.com/nmeth/journal/...meth.1226.html

    Does anyone know which parameters to tweak to get more sequences aligned?

  • bloomfi1
    replied
    Both -v and -n have a maximum size of 3. What is the reason for this restriction?

  • Xi Wang
    replied
    -v <int>            report end-to-end hits w/ <=v mismatches; ignore qualities
    or
    -n/--seedmms <int>  max mismatches in seed (can be 0-3, default: -n 2)
    -e/--maqerr <int>   max sum of mismatch quals across alignment for -n (def: 70)
    -l/--seedlen <int>  seed length for -n (default: 28)

    -v limits end-to-end mismatches.
    -n only limits mismatches in the seed region, and you can specify the seed length with -l.

  • bloomfi1
    replied
    Hi,

    I just updated bowtie from version 0.11.3 to 0.12.2. With version 0.11.3 I was able to run the command "bowtie -m 25 -a -n 15 --un <file> -p 4 <ebwt> <infile> <outfile>". When I run this command in version 0.12.2, I get error "-n/--seedmms arg must be at least 0 and at most 3". Am I missing something in the change log about this parameter? Is the behavior of -n in version 0.11.3 accurate?

    Thank you.

    EDIT: I just realized that while version 0.11.3 will let me give -n greater than 3, it is still capped at -n 3. Is it possible to align with more than 3 mismatches? I am using bowtie to align 75bp reads to a genomic model (coding regions only) with the ultimate goal of calculating RPKM for each of the models. Is bowtie simply the wrong tool for this purpose?
    Last edited by bloomfi1; 02-15-2010, 05:34 PM.

  • Chipper
    replied
    Originally posted by jlmlj

    The result is as below:
    Reads uniquely aligned: ~45%,
    Reads aligned to multiple locations: ~6%,
    Reads that failed to align: ~49%.
    51% aligned is not too bad, but you could also try without the -v parameter to allow more mismatches in the 3' end.

  • Xi Wang
    replied
    Originally posted by jlmlj
    Thanks a lot for the reply, Xi. I checked the human genome that I used, it was un-masked version, and I did not mask repeat regions when mapping, so it may not be the case...
    I was also referring to the 'N's that exist in the human reference genome. Our group has observed many cases where lots of reads pile up next to 'N' regions.
    Hope this helps.

  • jlmlj
    replied
    "Maybe the rate is within the normal range. If the ChIP-seq reads are from the repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome."

    Thanks a lot for the reply, Xi. I checked the human genome that I used; it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

    I am thinking of trying a couple of parameters, such as --strata; however, it looks a bit tricky and I am not sure how to handle it yet.

  • Xi Wang
    replied
    There are two questions that bother me:
    1. My reads are 76bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?
    Bowtie has another set of parameters for handling mismatches when mapping reads back to the reference genome: -n, -e and -l.

    2. About 45-49% of my reads failed to align (no repeats included) to the human reference genome, which seems too high to accept. Do you have any idea why this happens?
    Maybe the rate is within the normal range. If the ChIP-seq reads are from the repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome.

  • jlmlj
    replied
    Hi Dr. Langmead,

    I am doing data analysis for ChIP-seq experiments on transcription factor binding sites. I have 5 million raw reads (76 bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

    The command I used for one high-quality alignment was:
    ~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

    The result is as below:
    Reads uniquely aligned: ~45%,
    Reads aligned to multiple locations: ~6%,
    Reads that failed to align: ~49%.

    Then I increased the number of mismatches to 3 (-v 3) and trimmed the low-quality end (--trim3 22). However, ~45% of reads still failed to align.

    There are two questions that bother me:
    1. My reads are 76bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?

    2. About 45-49% of my reads failed to align (no repeats included) to the human reference genome, which seems too high to accept. Do you have any idea why this happens?

    Many thanks for your help,
    jlmlj

  • Ben Langmead
    replied
    I'm working on this now. I don't have any time estimates.

    Thanks,
    Ben

  • amaer
    replied
    Originally posted by Ben Langmead
    Hi amaer,

    Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

    Thanks,
    Ben
    Hi Ben,

    What's the status of doing gapped alignments? Do you have an estimated date?

    thanks, and keep up the great work!

  • malcook
    replied
    bowtie: should I mask the pseudoautosomal segments of human genome

    What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome prior to running bowtie on an RNA-Seq project?

    Since the pseudoautosomal portions of human chromosomes X and Y are identical in sequence, any alignment strategy that uses only unique alignments will discard all alignments to these regions, as each aligning read will have two matches. Thus the 24 known genes won't be counted.

    I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') the following regions of chrY prior to building the bowtie indices:

    chrY:10001-2649520
    chrY:59034050-59363566
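
The masking step itself is simple enough to sketch in Python. The example below is not EMBOSS `maskseq`; it is a hypothetical helper, `hard_mask`, that replaces the given intervals with 'N'. It assumes the coordinates are 1-based and inclusive (as the chrY ranges above appear to be) and that the chromosome sequence has already been loaded as a single string with no FASTA header or line breaks.

```python
from typing import Iterable, Tuple

# Pseudoautosomal intervals on chrY, copied from the post above.
# Assumption: 1-based, inclusive coordinates.
CHRY_PAR_REGIONS = [(10001, 2649520), (59034050, 59363566)]


def hard_mask(seq: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Return a copy of seq with every (start, end) interval replaced by 'N'."""
    buf = bytearray(seq, "ascii")  # mutable buffer, cheaper than a list of chars
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid region: {start}-{end}")
        lo = start - 1             # convert to a 0-based, half-open slice
        hi = min(end, len(buf))    # clip regions that run past the sequence end
        if lo < hi:
            buf[lo:hi] = b"N" * (hi - lo)
    return buf.decode("ascii")


if __name__ == "__main__":
    # Toy sequence; a real run would pass the chrY sequence and CHRY_PAR_REGIONS.
    print(hard_mask("ACGTACGTAC", [(3, 5)]))
```

The masked sequence would then be written back to FASTA and passed to bowtie-build in place of the original chrY, so that reads from these regions can only align to chrX.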

    Does anyone see a problem with this approach?

    I see that the description of the `--ntoa` option in the bowtie manual explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs?

    Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

    Thanks for thinking!

    Malcolm Cook
    Stowers Institute for Medical Research

  • bekkari
    replied
    Hi Ben,
    Can someone please let me know whether bowtie works with longer inserts (~20kb) between mate pairs?

    Thanks
