  • mediator
    replied
    Hi All,
    I am running bowtie on a paired-end alignment (Illumina HiSeq). Here are the command and the output I got:
    bowtie -p 4 -v 2 -k 11 -m 10 -t --best /bowtie/indexes/hg19 -1 /data/rna_seq/0916_1.fq -2 /data/rna_seq/0916_2.fq /data/rna_seq/0916.SAM -S

    Time loading forward index: 00:00:08
    Time loading mirror index: 00:00:08
    Time loading reference: 00:00:03
    End-to-end 2/3-mismatch full-index search: 04:48:55
    # reads processed: 114497412
    # reads with at least one reported alignment: 64037326 (55.93%)
    # reads that failed to align: 50290127 (43.92%)
    # reads with alignments suppressed due to -m: 169959 (0.15%)
    Reported 94715801 paired-end alignments to 1 output stream(s)
    Time searching: 04:49:14
    Overall time: 04:49:14
    I'm not sure why so many reads fail to align. Any ideas?
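A quick way to sanity-check the summary is to recompute each category as a fraction of the reads processed. The Python sketch below does that from the four "# ..." lines copied from the output above; parse_summary and breakdown are hypothetical helper names, not part of bowtie. It assumes (not confirmed here) that with -1/-2 input each "read" in these counts is a mate pair. The three categories add up exactly to the 114497412 reads processed, so no reads are unaccounted for: the 43.92% are reads bowtie could not align under the options given.

```python
import re

# Summary lines copied from the bowtie run above.
SUMMARY = """\
# reads processed: 114497412
# reads with at least one reported alignment: 64037326 (55.93%)
# reads that failed to align: 50290127 (43.92%)
# reads with alignments suppressed due to -m: 169959 (0.15%)
"""

# Matches "# <label>: <count>" and ignores the trailing "(xx.xx%)".
LINE_RE = re.compile(r"^# (?P<label>[^:]+): (?P<count>\d+)")


def parse_summary(text):
    """Return {label: count} for every '# ...: N' line in a bowtie summary."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group("label")] = int(m.group("count"))
    return counts


def breakdown(counts):
    """Recompute each category as a percentage of the reads processed."""
    total = counts["reads processed"]
    return {
        label: round(100.0 * n / total, 2)
        for label, n in counts.items()
        if label != "reads processed"
    }


if __name__ == "__main__":
    counts = parse_summary(SUMMARY)
    for label, pct in breakdown(counts).items():
        print(f"{label}: {pct}%")
    # The three categories should account for every processed read.
    parts = sum(n for label, n in counts.items() if label != "reads processed")
    print("categories sum to total:", parts == counts["reads processed"])
```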


  • rahilsethi
    replied
    Re: Extra parameter(s) specified error

    The reason I did not give any value to -n and --maxbts is that I am trying to use their default values. If I didn't specify -n, how would bowtie know whether I want to map in -n or -v mode? I will try giving numbers to all of them, though I don't think leaving out the values for -n and --maxbts should cause any problem.


  • westerman
    replied
    The error message makes me think that you are missing some parameter values. In particular, '-n' should have a number after it (e.g., '-n 2'), as should '--maxbts'.

    What I think is happening in the successful command, where you have '-n --maxbts', is that '-n' reads '--maxbts' as the number to use. Thus there is no problem.

    Whereas in the failing command you have '-n -l 20 --maxbts --chunkmbs', with the result that '-n' swallows (uses) '-l', '20' is skipped, and '--maxbts' swallows '--chunkmbs', which then throws off the rest of the command line.

    Anyway, that is my guess. Please try your command either with numbers after '-n' and '--maxbts', or just get rid of those two parameters.
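westerman's guess can be tested with Python's getopt.gnu_getopt, which uses GNU-style scanning (options and positional arguments may be intermixed), like getopt_long. Assuming bowtie 0.12.7 parses its command line the same way (not verified against its source here), the sketch below models only the four relevant options; the index path is shortened to hg19 and the other flags are left out for brevity. In this model '-n' consumes '-l', '--maxbts' consumes '--chunkmbs', and '20' and '1000' end up as positional arguments. With three positional slots (index, reads, output), the two surplus arguments are sample.csfasta and 50_mapping.sam, which matches the error rahilsethi reported.

```python
import getopt

# Hypothetical subset of bowtie's options: "-n" and "-l" take a value,
# as do "--maxbts" and "--chunkmbs".
SHORT_OPTS = "n:l:"
LONG_OPTS = ["maxbts=", "chunkmbs="]


def parse(argv):
    """Parse argv with GNU-style scanning (options may follow operands)."""
    return getopt.gnu_getopt(argv, SHORT_OPTS, LONG_OPTS)


failing = ["-n", "-l", "20", "--maxbts", "--chunkmbs", "1000",
           "hg19", "sample.csfasta", "50_mapping.sam"]
working = ["-n", "--maxbts", "--chunkmbs", "1000",
           "hg19", "sample.csfasta", "50_mapping.sam"]
fixed = ["-n", "2", "-l", "20", "--chunkmbs", "1000",
         "hg19", "sample.csfasta", "50_mapping.sam"]

for name, argv in [("failing", failing), ("working", working), ("fixed", fixed)]:
    opts, operands = parse(argv)
    print(name, opts, operands)
```

Giving -n an explicit value, or dropping -n and --maxbts altogether, leaves exactly three positional arguments.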


  • rahilsethi
    replied
    Extra parameter(s) specified error

    I am running bowtie version 0.12.7 to map SOLiD data (colorspace, 50 bp read length) against the human genome (hg19) on a Linux platform (CentOS). When I run it with the following parameters:

    $bowtie -C -f -Q sample_QV.qual -a --best --strata -n -l 20 --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam
    it gives me the following error:

    Extra parameter(s) specified: "sample.csfasta", "50_mapping.sam"

    When I run it with the default seed length (-l), i.e. without specifying -l 20:


    $bowtie -C -f -Q sample_QV.qual -a --best --strata -n --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam
    it runs successfully and prints the numbers of mapped and unmapped reads to the screen.

    So how can I run bowtie with a different seed length, given that, as shown above, it fails whenever I specify a seed length within the permissible range (i.e. 20 > 5 for a 50 bp read length)?


  • oxydeepu
    replied
    Hi all,

    I am running bowtie and have a question: is there any way to specify that mismatches should be at a particular end, say the 3' end?
    Waiting for a reply.
    Thank you,

    Deepak
    Last edited by oxydeepu; 10-09-2011, 01:58 AM. Reason: did not get any reply
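For reference, the bowtie manual describes -n mode this way: -n limits the number of mismatches in the seed, the first -l bases at the read's 5' end, while mismatches in the rest of the read (toward the 3' end) are limited only by the quality-sum threshold -e. So -n 0 with a chosen -l keeps the seed mismatch-free and leaves any mismatches in the 3' portion. The Python sketch below illustrates just that seed rule in read coordinates; the function names are hypothetical, and it ignores -e, reverse-strand alignment, and N handling.

```python
def mismatch_positions(read, ref):
    """0-based offsets, counted from the read's 5' end, where read and ref differ."""
    if len(read) != len(ref):
        raise ValueError("read and reference window must have the same length")
    return [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]


def seed_ok(read, ref, seed_len, max_seed_mm):
    """-n/-l style rule: at most max_seed_mm mismatches within the first seed_len bases.

    Mismatches at offsets >= seed_len (the 3' part of the read) are ignored here;
    in bowtie they are limited by the -e quality threshold instead.
    """
    seed_mm = [i for i in mismatch_positions(read, ref) if i < seed_len]
    return len(seed_mm) <= max_seed_mm


# Two mismatches at offsets 8 and 9, both 3' of a 6-base seed: accepted with -n 0.
print(seed_ok("ACGTACGTAC", "ACGTACGTTT", seed_len=6, max_seed_mm=0))  # True
# One mismatch at offset 0, inside the seed: rejected with -n 0.
print(seed_ok("TCGTACGTAC", "ACGTACGTAC", seed_len=6, max_seed_mm=0))  # False
```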


  • nemesis
    replied
    bowtie -e (--maqerr) parameter

    Hi all,

    According to the bowtie manual and some posts I've read here, the -e/--maqerr <int> option indicates the maximum sum of quality scores allowed at the mismatched bases throughout the entire alignment and as such can control the total number of mismatches over the entire read length.

    I understand that the higher this value is, the more alignments I will obtain, but I still have trouble understanding the logic behind this parameter. Let's say I set -e 70 with --nomaqround.
    A read with overall high quality (e.g., each of its bases has a Phred score of 38) and 3 mismatches against the reference sequence will be excluded from the alignment, since (38 * 3) > 70, while another read with overall poor quality (e.g., each of its bases has a Phred score of 10) and 5 mismatches will be kept, since (10 * 5) < 70. But if we suppose that low-quality bases are more likely to be sequencing errors than true variations, I'd rather exclude the latter read and keep the former one... (no?)

    If anyone could help me understand this parameter and its usage I would be very grateful.

    Cheers
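nemesis's arithmetic can be written out directly. The sketch below sums the Phred values at the mismatched positions and compares the total with -e, assuming --nomaqround (raw qualities, no Maq-style rounding) and assuming a total equal to -e is still accepted; the helper names, the 50-base read length, and the mismatch positions are hypothetical. As I understand the Maq-style rationale, the weighting is intentional: a mismatch at a low-quality base is cheap because it is probably a sequencing error, whereas a mismatch at a high-quality base is strong evidence that the read does not come from that locus.

```python
def quality_sum_at_mismatches(quals, mismatch_pos):
    """Sum of the Phred qualities at the mismatched read positions."""
    return sum(quals[i] for i in mismatch_pos)


def passes_maqerr(quals, mismatch_pos, e=70):
    """-e style test, assuming --nomaqround and that a sum equal to e is allowed."""
    return quality_sum_at_mismatches(quals, mismatch_pos) <= e


high_quality = [38] * 50  # every base Q38, as in the first example
low_quality = [10] * 50   # every base Q10, as in the second example

print(quality_sum_at_mismatches(high_quality, [5, 20, 40]),
      passes_maqerr(high_quality, [5, 20, 40]))      # 114 False
print(quality_sum_at_mismatches(low_quality, [1, 2, 3, 4, 5]),
      passes_maqerr(low_quality, [1, 2, 3, 4, 5]))   # 50 True
```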


  • belmax
    replied
    bowtie 0.12.7 & SOLiD PE reads

    Hi all,
    There is a problem with bowtie 0.12.7 and SOLiD mate-pair reads.
    bowtie (-C -f -I 1000 -X 4000 --ff <ebwt> -1 F3.csfasta -2 R3.csfasta) maps 0.0%, while SOLiD's BioScope maps about 70%.
    The insert size is about 2500.
    The colorspace index is OK: synthetic csfasta reads are mapped well by bowtie, and F3 or R3 reads on their own also map well.
    What could be wrong? Is the problem with bowtie or with the mate-pair reads?

    cheers
    Last edited by belmax; 09-30-2011, 12:50 AM.
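As I read the bowtie manual, -I and -X bound the outer span of a pair, from the leftmost to the rightmost aligned base; its own example is two 20-bp mates with a 20-bp gap between them, which count as a 60-bp insert. One way to narrow down the 0.0% result would be to align F3 and R3 separately and check where the mates of each pair land. The Python sketch below only computes the span and the -I/-X test; coordinates are assumed to be 0-based and half-open, the mate positions in the usage lines are invented, and strand/orientation (--ff) is not modelled.

```python
def outer_span(start1, end1, start2, end2):
    """Leftmost to rightmost aligned base of a pair (0-based, half-open coordinates)."""
    return max(end1, end2) - min(start1, start2)


def within_insert_limits(span, min_ins, max_ins):
    """-I/-X style test: min_ins <= span <= max_ins."""
    return min_ins <= span <= max_ins


# The manual's example: two 20-bp mates separated by a 20-bp gap span 60 bp.
print(outer_span(0, 20, 40, 60))  # 60
# Hypothetical 50-bp mates about 2500 bp apart, tested against -I 1000 -X 4000.
print(within_insert_limits(outer_span(2450, 2500, 0, 50), 1000, 4000))  # True
```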


  • phatjoe
    replied
    BOWTIE, shortreads with different length

    Hi,

    I just tried out Bowtie today. Does Bowtie support mapping short reads of different lengths? (E.g., for r1/#1 I have 96 bp, whereas for r1/#2 I have 86 bp.) The short reads were trimmed with different software prior to alignment.

    My bowtie version is 0.12.7

    Thanks in advance!
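As far as I know, bowtie accepts reads of different lengths, including mates of unequal length. What trimming more often breaks is the pairing itself, because the -1 and -2 files must list mates in the same order. The Python sketch below is a hypothetical pre-flight check, not a bowtie feature: it walks two FASTQ files in parallel, compares read names with a trailing /1 or /2 stripped, and records each mate's length. The function names are illustrative only.

```python
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield (name, sequence) pairs from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {chunk!r}")
        yield chunk[0][1:], chunk[1]


def mate_base(name):
    """Read name without a trailing /1 or /2 (and without any comment field)."""
    name = name.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_mates(lines1, lines2):
    """Return (name, len1, len2) per pair; raise ValueError if the files are out of sync."""
    pairs = []
    for r1, r2 in zip_longest(fastq_records(lines1), fastq_records(lines2)):
        if r1 is None or r2 is None:
            raise ValueError("mate files contain different numbers of reads")
        if mate_base(r1[0]) != mate_base(r2[0]):
            raise ValueError(f"mates out of order: {r1[0]} vs {r2[0]}")
        pairs.append((mate_base(r1[0]), len(r1[1]), len(r2[1])))
    return pairs
```

With real files, open objects can be passed directly, e.g. check_mates(open("reads_1.fq"), open("reads_2.fq")) (file names hypothetical); unequal mate lengths such as 96 and 86 bp are simply reported, not treated as errors.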


  • [mic]
    replied
    Originally posted by Xi Wang
    If you are good at programming, you can check the source code of bowtie-build.
    I already tried, but the code is deeply nested, which makes it difficult for me to get the overall picture. I would be grateful if someone could help me.
    Last edited by [mic]; 09-19-2011, 05:55 AM.


  • Xi Wang
    replied
    Originally posted by [mic]
    How do I get the BWT?
    How is it stored? (I think either as uint32 or uint64.)
    How do I "read" the nc values (0, 1, 2, 3) from it?
    If you are good at programming, you can check the source code of bowtie-build.


  • [mic]
    replied
    Additional Index information

    Hi,

    I am trying to adapt Bowtie to GPGPUs using CUDA. Besides the limited hardware resources, I have one big problem: Bowtie seems to rely on structs that use C++ data types (please correct me if I'm wrong), but I need C-compatible data types to copy them to device memory (the global memory of the graphics card) and to work with them there.
    During my walkthrough I noticed that the first bytes are used to store extra information for the ebwt_params struct, but:

    How do I get the BWT?
    How is it stored? (I think either as uint32 or uint64.)
    How do I "read" the nc values (0, 1, 2, 3) from it?

    Is there any additional information available on how the files are built? (Any documents, slides, etc. are welcome.)

    The plan:
    read the index file with my own code, store it in C-compatible data types, copy them to the device, and try to perform exact alignment on the GPU.

    Thank you
    mic
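I can't describe the exact .ebwt layout from memory, so this is only a generic illustration of the last question: extracting 2-bit codes 0-3 from packed bytes. It assumes four codes per byte with the first code in the two least-significant bits, and the conventional A=0, C=1, G=2, T=3 mapping; bowtie's actual bit order, its block layout, and any counts stored alongside the BWT would have to be checked against the bowtie-build source. All function names are hypothetical.

```python
def unpack_2bit(data, count):
    """Return `count` 2-bit codes (0..3) from `data`, four codes per byte.

    Assumed layout (hypothetical, not checked against bowtie): code i sits in
    byte i // 4 at bit offset 2 * (i % 4), least-significant bits first.
    """
    if count > len(data) * 4:
        raise ValueError("not enough bytes for the requested number of codes")
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


def pack_2bit(codes):
    """Inverse of unpack_2bit under the same assumed layout (useful for testing)."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError("codes must be in the range 0..3")
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def to_bases(codes, alphabet="ACGT"):
    """Map codes to letters, assuming the conventional A=0, C=1, G=2, T=3 order."""
    return "".join(alphabet[c] for c in codes)


packed = pack_2bit([0, 1, 2, 3, 3, 2])
print(list(packed), to_bases(unpack_2bit(packed, 6)))  # [228, 11] ACGTTG
```

If the same data were read as little-endian uint32 words instead of bytes, this assumed layout would put code i at bit offset 2 * (i % 16) of word i // 16.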


  • vebaev
    replied
    Hi again,
    As I said before, I'm trying to map my cleaned reads to hg19.

    If I use -a -v 0, my output is about 2 GB, and I see that many sequences with low read counts (like 1 or 2) align tens of thousands of times to the human genome?! It is messy...

    I could use -k 100 -v 0, but if I want to know how many times a sequence maps to the genome, how can I be sure when I am artificially imposing a threshold?
    Since I also want to annotate repeat-associated and other RNAs, how can I do that and escape from the mess above?

    Or is it better to discard these with -m 100?

    Best
    Last edited by vebaev; 08-11-2011, 09:45 AM.
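The trade-off between -a, -k and -m can be modelled per read. Per the bowtie manual, -k reports up to that many valid alignments, and -m suppresses all alignments of a read that has more than that many reportable alignments. One option when you only need to know whether a sequence exceeds some threshold T is to run with -k T+1: a read that reports T+1 alignments is known to map more than T times, without writing out all of its hits. The sketch below is a toy model of that logic, not bowtie's search; reported and classify_with_k are hypothetical names, and the 25000-hit read is invented.

```python
def reported(n_valid, k=None, m=None):
    """Number of alignments reported for one read under -k/-m style limits.

    n_valid: how many valid alignments the read actually has.
    k: report at most k of them (None models -a, report all).
    m: if the read has more than m alignments, report none (models -m).
    """
    if m is not None and n_valid > m:
        return 0
    return n_valid if k is None else min(n_valid, k)


def classify_with_k(n_reported, threshold):
    """Interpret a count obtained with -k threshold+1 (and no -m)."""
    if n_reported == 0:
        return "unaligned"
    if n_reported > threshold:
        return f"more than {threshold} hits"
    return f"{n_reported} hit(s)"


repeat_read = 25000  # a sequence with tens of thousands of hits
print(reported(repeat_read), reported(repeat_read, k=100), reported(repeat_read, m=100))
print(classify_with_k(reported(repeat_read, k=101), 100))  # more than 100 hits
print(classify_with_k(reported(3, k=101), 100))            # 3 hit(s)
```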


  • vebaev
    replied
    Hi cswarth,
    You are quite right!
    My main concern is, for example, this case:
    I want to annotate where in the genome 2 reads map. If I do not allow mismatches, the first read has 1 hit in an intron and the second does not align to the genome at all. With 1 mismatch allowed, the first read maps perfectly to the intron and also to an intergenic region with 1 mismatch; on the other hand, the second read can now map to one place in the genome because a mismatch is allowed.
    In the second scenario we are happy because the second read can align, but then how do we annotate the first read, whose number of hits has increased?

    If you follow me, my point is that if I want to map more of the reads that cannot map with zero mismatches, I lose the "sensitivity" of the reads that are already mapped.

    I hope you get it.
    Last edited by vebaev; 08-10-2011, 04:06 PM.
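vebaev's two-read scenario can be reproduced with a brute-force toy: an end-to-end search that allows up to max_mm mismatches, in the spirit of -v, on an invented 56-bp "genome" made of three 8-bp blocks separated by runs of N. Block 1 stands in for the intron, block 2 for the intergenic copy of read 1 with one mismatch, and block 3 for the only site of read 2, also with one mismatch. Only the forward strand is searched, and all sequences and labels are hypothetical.

```python
def hits(read, genome, max_mm):
    """(offset, mismatches) for every forward-strand end-to-end hit with <= max_mm mismatches."""
    found = []
    for pos in range(len(genome) - len(read) + 1):
        mm = 0
        for a, b in zip(read, genome[pos:pos + len(read)]):
            if a != b:
                mm += 1
                if mm > max_mm:
                    break  # too many mismatches, stop comparing this window
        if mm <= max_mm:
            found.append((pos, mm))
    return found


SPACER = "N" * 8                  # filler that never matches A/C/G/T
READ1 = "ACGTTGCA"
READ2 = "GGGCCCAA"
GENOME = (SPACER + "ACGTTGCA"     # block 1 at offset 8: 'intron', exact copy of READ1
          + SPACER + "ACGATGCA"   # block 2 at offset 24: 'intergenic', READ1 with 1 mismatch
          + SPACER + "GGGCTCAA"   # block 3 at offset 40: READ2 with 1 mismatch
          + SPACER)

for max_mm in (0, 1):
    print(max_mm, hits(READ1, GENOME, max_mm), hits(READ2, GENOME, max_mm))
```

With max_mm=0, read 1 has one hit and read 2 none; with max_mm=1, read 2 gains a hit but read 1 becomes ambiguous, which is the trade-off described above.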


  • cswarth
    replied
    I am new to this, but it seems to me that if you allow mismatches, you absolutely can get alignments that aren't real. You can also get alignments that aren't real if you don't allow mismatches!

    There are several sources of false-positive and false-negative alignments. The reference sequence you are aligning to is probably a consensus from many replicates of a particular lineage of the organism. Your experimental sequences may come from a slightly different lineage with a slightly different genome. If you do not allow mismatches, you will miss valid alignments that differ only at an expected polymorphic site.

    There are also several sources of error in the sequencing itself. If you're using an Illumina machine, there are at least four sources of error that may miscall a base in the sequence. If you don't allow mismatches, reads that contain a sequencing error might not align to your genome at all.

    On the other hand, if you allow mismatches, your reads may align to several places in the genome, and how do you know which one is valid? There is really no good answer. You could do some further processing and only consider reads that land inside exons of known genes. Or maybe you want to allow mismatches but only use those reads that match a single place in the genome.

    In our experiment we are starting with the most conservative assumptions and slowly loosening the criteria as we gain more confidence in our methodology. So we only consider reads that match the mm9 genome perfectly and that fall inside known exons with a coverage of at least 10 reads. We'll start to loosen the criteria and see how that affects our results.
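cswarth's conservative criteria translate into a simple filter. In the sketch below, which assumes alignments have already been parsed into per-read hit lists, a read is kept only if it has exactly one hit with zero mismatches, it is counted for every known exon that fully contains it, and an exon is reported only when at least min_coverage reads (10 here) pass. The input structures, names, and coordinates in the usage lines are hypothetical, and the exon lookup is a plain linear scan for clarity rather than an interval index.

```python
from collections import defaultdict


def conservative_filter(alignments, exons, min_coverage=10):
    """Exon -> supporting reads, keeping only unique, perfect, fully exonic hits.

    alignments: {read_id: [(chrom, start, end, mismatches), ...]}, one tuple per hit
    exons: {exon_id: (chrom, start, end)}; all coordinates 0-based, half-open
    """
    per_exon = defaultdict(list)
    for read_id, read_hits in alignments.items():
        if len(read_hits) != 1:
            continue  # unaligned or multi-mapping reads are dropped
        chrom, start, end, mismatches = read_hits[0]
        if mismatches != 0:
            continue  # perfect matches only
        for exon_id, (e_chrom, e_start, e_end) in exons.items():
            if chrom == e_chrom and e_start <= start and end <= e_end:
                per_exon[exon_id].append(read_id)
    # Report only exons supported by at least min_coverage reads.
    return {ex: reads for ex, reads in per_exon.items() if len(reads) >= min_coverage}


exons = {"e1": ("chr1", 100, 200), "e2": ("chr1", 500, 600)}
aln = {f"u{i}": [("chr1", 110 + i, 146 + i, 0)] for i in range(12)}   # 12 reads in e1
aln.update({f"v{i}": [("chr1", 510 + i, 546 + i, 0)] for i in range(3)})  # 3 reads in e2
aln["multi"] = [("chr1", 120, 156, 0), ("chr2", 5, 41, 0)]  # multi-mapper, dropped
aln["mm"] = [("chr1", 130, 166, 1)]                          # 1 mismatch, dropped
print({ex: len(reads) for ex, reads in conservative_filter(aln, exons).items()})
```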
