
  • ShaunMahony
    replied
    This has probably been answered already, so apologies in advance.

    Does anyone know if Bowtie by default filters the input on the basis of quality? I'm getting a strange result. When I perfectly sample random 32mers from the mouse genome, and then align them back to the same genome, most aligners align ~83% uniquely. However, Bowtie is only aligning ~77%.

    Where are the missing reads going? It can't be mismatch qualities, since there are no mismatches in the sampled 'reads'. These are the options I'm using:

    ./bowtie -q --solexa-quals -m 2 --best -p 2
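
A minimal sketch of the sampling step described above, assuming the reference sequence is already in memory as a plain string. The names `sample_kmers` and `to_fastq`, the read-name scheme, and the placeholder quality character are hypothetical and chosen only for illustration; the right quality character depends on the encoding the aligner is told to expect.

```python
import random


def sample_kmers(genome, k=32, n=5, seed=0, max_tries=10_000):
    """Return up to n (offset, kmer) pairs drawn uniformly from a sequence."""
    rng = random.Random(seed)
    last_start = len(genome) - k
    if last_start < 0:
        raise ValueError("sequence is shorter than k")
    picks = []
    for _ in range(max_tries):
        if len(picks) == n:
            break
        start = rng.randint(0, last_start)
        kmer = genome[start:start + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            picks.append((start, kmer))
    return picks


def to_fastq(picks, qual_char="I"):
    """Format sampled k-mers as FASTQ text; qual_char is an arbitrary placeholder."""
    records = []
    for i, (start, kmer) in enumerate(picks):
        records.append(f"@sim{i}_{start}\n{kmer}\n+\n{qual_char * len(kmer)}\n")
    return "".join(records)
```

Because each read is an exact substring of the reference, any read that fails to align cannot be explained by mismatches.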


  • dara
    replied
    yes that makes sense. Thank you


  • Ben Langmead
    replied
    Hi dara,

    It complained that the total sequence length of all the reference strings was too big to fit in a single index, right? I didn't mean to imply that you can't feed multiple fasta files to bowtie-build; you certainly can. But if the total length of all the sequence you're supplying is too big, you'll have to break the input up into chunks somehow and build separate indexes for each chunk. You might try feeding the fasta files in smaller bundles, or you might redistribute sequences throughout the fasta files, or both. If you've got chromosomes, you probably just want to try bundling together as many chromosome fasta files as you can get away with in a single invocation of bowtie-build.

    Does that make sense?

    Thanks,
    Ben
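
The bundling idea can be sketched roughly as follows. This is only an illustration: `bundle_by_length`, `build_commands`, and the `chunk` index prefix are hypothetical names, the per-file lengths are assumed to be known already, and the limit is whatever total length bowtie-build accepts for a single index. The command layout (comma-separated inputs followed by an index name) follows the make_h_sapiens_asm.sh line quoted elsewhere in this thread.

```python
def bundle_by_length(lengths, limit):
    """Greedily group files, in the order given, so each bundle's total length stays <= limit.

    lengths: list of (filename, sequence_length) pairs.
    """
    bundles, current, total = [], [], 0
    for name, length in lengths:
        if length > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        # Start a new bundle when adding this file would overflow the current one.
        if current and total + length > limit:
            bundles.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        bundles.append(current)
    return bundles


def build_commands(bundles, prefix="chunk"):
    """Return one bowtie-build argument list per bundle (inputs joined by commas)."""
    return [
        ["bowtie-build", ",".join(bundle), f"{prefix}{i}"]
        for i, bundle in enumerate(bundles, 1)
    ]
```

The greedy pass keeps files in their original order, so it will not always find the fewest bundles, but it is easy to reason about.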


  • dara
    replied
    Hello Ben,

    Thank you for your quick response. However, I'm a little puzzled because I was looking at the script that comes along with genome index on the Bowtie website (make_h_sapiens_asm.sh) and it seems to build just one index by providing all the chunks to the bowtie-build executable at once. Here's the line I'm talking about:

    INPUTS=hs_ref_chr1.fa,hs_ref_chr2.fa,hs_ref_chr3.fa,hs_ref_chr4.fa,hs_ref_chr5.fa,hs_ref_chr6.fa,hs_ref_chr7.fa,hs_ref_chr8.fa,hs_ref_chr9.fa,hs_ref_chr10.fa,hs_ref_chr11.fa,hs_ref_chr12.fa,hs_ref_chr13.fa,hs_ref_chr14.fa,hs_ref_chr15.fa,hs_ref_chr16.fa,hs_ref_chr17.fa,hs_ref_chr18.fa,hs_ref_chr19.fa,hs_ref_chr20.fa,hs_ref_chr21.fa,hs_ref_chr22.fa,hs_ref_chrMT.fa,hs_ref_chrX.fa,hs_ref_chrY.fa

    ${BOWTIE_BUILD_EXE} ${INPUTS} h_sapiens_asm

    I was trying the same thing, providing individual chromosome splits to the indexer, and it complained.

    Thanks again


  • Ben Langmead
    replied
    Now that paired-end is substantially done, we'll be embarking on gapped alignment soon. I'll probably start on that in June. Hopefully by the end of the summer you'll see at least initial gapped-alignment support. That's a guess, though.

    Thanks,
    Ben

    Originally posted by dara:
    Also another question for you:

    Any updates on plans for bowtie supporting gapped alignment?

    thanks


  • Ben Langmead
    replied
    Hi dara,

    Yes, you have to build separate index files and query them separately. You'll have to synthesize the per-index results into an overall set of results, e.g., with some scripts. Bowtie doesn't currently know how to query multiple indexes as part of a single alignment run.

    Thanks,
    Ben
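
A rough sketch of the kind of script that could combine per-index results. It assumes tab-separated alignment lines with the read name in column 1, the reference name in column 3, and comma-separated mismatch descriptors in column 8 (empty when there are none); that layout is an assumption here and should be checked against the output of the Bowtie version in use. `count_mismatches` and `merge_hits` are hypothetical helper names.

```python
def count_mismatches(fields):
    """Count comma-separated mismatch descriptors in column 8 (assumed layout)."""
    desc = fields[7] if len(fields) > 7 else ""
    return len(desc.split(",")) if desc else 0


def merge_hits(per_index_lines):
    """Keep, for each read, the hit(s) with the fewest mismatches across all indexes.

    per_index_lines: one iterable of output lines per index.
    Returns {read_name: (mismatch_count, [fields, ...])}.
    """
    best = {}
    for lines in per_index_lines:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            read = fields[0]
            mm = count_mismatches(fields)
            prev = best.get(read)
            if prev is None or mm < prev[0]:
                best[read] = (mm, [fields])
            elif mm == prev[0]:
                # Equally good hit in another index: the read is no longer unique.
                prev[1].append(fields)
    return best
```

A read that was unique within each index can still end up with several equally good hits after merging, so uniqueness has to be re-checked on the merged result.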


  • dara
    replied
    Also another question for you:

    Any updates on plans for bowtie supporting gapped alignment?

    thanks


  • dara
    replied
    Hi Ben,

    Once the reference file has been split into chunks, do they have to be made into separate indexes? So, for example, if I've split the reference into chrom1, chrom2 and chrom3, would I need to do:

    ./bowtie-build -f chrom1 indexchrom1
    ./bowtie-build -f chrom2 indexchrom2
    ./bowtie-build -f chrom3 indexchrom3

    If I build separate indexes, how would I call all of them when mapping with my reads file?

    Thanks for your help
    Last edited by dara; 05-07-2009, 06:25 AM. Reason: name


  • dara
    replied
    Hi Ben,

    Thank you for your response. The file is a human genome download from BLAST. It's about 8.3 GB in size, and I was using the default 32-bit version of bowtie-build. All right, I will try what you suggested: I will split the genome (by chromosome, maybe) and then feed those splits to bowtie-build.

    I will let you know if that causes any issues.

    Thanks


  • Ben Langmead
    replied
    Hi dara,

    How large is the human_genomic.fa file? Are you using 32-bit or 64-bit bowtie-build? I've not seen this before. Most versions of Linux and glibc can handle very large files with no problem.

    I suspect that once you fix this problem, you'll run into the problem that Bowtie can only index reference sequences in chunks of about 3.6 Gbases or so. When you try to feed bowtie-build an input with too much sequence, it will say "Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently." This is because Bowtie uses 32-bit ints internally to refer to offsets in the index. We may fix this some day, but until then you'll have to work around this by indexing your reference in chunks.

    Ben
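
To check whether a reference needs chunking before running bowtie-build, one could count its sequence characters first. A minimal sketch, assuming plain FASTA input with `>` header lines; `count_bases` and `needs_chunking` are hypothetical names, and the two constants simply restate the figures quoted in the post above.

```python
MAX_OFFSET = 2**32 - 1        # largest value a 32-bit unsigned offset can hold
CHUNK_LIMIT = 3_600_000_000   # "about 3.6 billion characters" per the post


def count_bases(lines):
    """Count sequence characters in FASTA text, skipping header lines and whitespace."""
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


def needs_chunking(n_bases, limit=CHUNK_LIMIT):
    """True when a reference of n_bases is too long for a single index under this limit."""
    return n_bases > limit
```

For a real file, pass an open file handle as `lines`, which streams the FASTA line by line instead of loading it into memory.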


  • dara
    replied
    BOWTIE_BUILD: Problems when using with large reference genomes?

    Hi all,

    I've been trying to run Bowtie using the human_genomic.fa file from the BLAST db as the reference. When I attempt to use bowtie-build to build an index from this large file, I keep getting an 'Error: could not open human_genomic.fa' message.
    I tried creating a file with just the first 10000 lines of the human genome and that works fine. I thought Bowtie could easily handle such big reference files. Has anyone else faced this issue? Any suggestions on how to overcome it?

    Here's what I did: ./bowtie-build -f human_genomic.fa human_genom

    thanks


  • Ben Langmead
    replied
    Hi Ieuan,

    Originally posted by ieuanclay:
    0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!

    --best mode got an overhaul in 0.9.9.2 such that --best now conducts a best-first search, rather than a depth-first search with buffering and flushing of results, as before. My suspicion is that the old approach was, for some reads, buffering a huge number of results and exhausting memory. I'll take a harder look, though.

    Thanks,
    Ben
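
As a toy illustration of the best-first idea only, and not of Bowtie's actual index-based search, the sketch below scans a plain string and keeps partial alignments in a priority queue ordered by mismatch count. Hits are produced in order of increasing mismatches, so nothing has to be buffered and sorted afterwards. `best_first_hits` is a hypothetical name.

```python
import heapq


def best_first_hits(read, ref, max_mismatches):
    """Yield (mismatches, offset) pairs in nondecreasing mismatch order."""
    # Each heap entry is a partial alignment:
    # (mismatches so far, reference offset, next read position to compare).
    heap = [(0, off, 0) for off in range(len(ref) - len(read) + 1)]
    heapq.heapify(heap)
    while heap:
        mm, off, pos = heapq.heappop(heap)
        if pos == len(read):
            # A complete alignment popped from the heap is at least as good
            # as anything still waiting, so it can be reported immediately.
            yield mm, off
            continue
        mm += read[pos] != ref[off + pos]
        if mm <= max_mismatches:
            heapq.heappush(heap, (mm, off, pos + 1))
```

Since the generator is lazy, a caller that wants only the best hit can stop after the first result, and the remaining candidates are never extended.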


  • thondeboer
    replied
    Hi Ben,

    You can read more on our read structure on our website and on this forum as well:

    Sequencing technologies without a commercially released platform (Oxford Nanopore, Halcyon Molecular, etc.)




    But basically we have a gapped read structure of 5 + 10 + 10 + 10 (times two) bases.
    The first gap is "negative" that is, has overlap between the 5 and 10 base reads.
    The other gaps are positive, that is, gaps in the more classical sense.

    You won't know the negative gap value (the overlap can vary from 1 to 3 bases) unless you map the data onto the reference genome, or unless there is only one way to overlap.

    Good to hear you are in support of SAM/BAM. We are considering this as our export format as well...

    Thon
    Complete Genomics
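
The "only one way to overlap" case can be sketched as a check on the read alone, assuming the overlapping bases appear at the end of the 5-base segment and again at the start of the following 10-base segment. `consistent_overlaps` is a hypothetical name and the sequences in the usage note are made up.

```python
def consistent_overlaps(first, second, min_overlap=1, max_overlap=3):
    """Return every overlap length k for which the last k bases of `first`
    equal the first k bases of `second`."""
    return [
        k
        for k in range(min_overlap, max_overlap + 1)
        if first[-k:] == second[:k]
    ]
```

For example, `consistent_overlaps("ACGTA", "TACCGGTTAA")` returns `[2]`, so the overlap is determined without mapping, while `consistent_overlaps("AAAAA", "AAAAAAAAAA")` returns `[1, 2, 3]` and can only be resolved against the reference.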


  • Ben Langmead
    replied
    Hey Thon,

    We haven't tried implementing gapped alignment yet, though tools like BWA and SOAP2 show it's doable in this framework. Can you describe the "unusual read structure"?

    Yes, we would certainly like to support SAM/BAM output eventually. It's on the TODO list!

    Thanks,
    Ben


  • thondeboer
    replied
    Hi Ben,

    Complete Genomics here....
    Have you tried using our gapped read structure with Bowtie yet? As you may know, we have quite an unusual read structure, so most mapping software cannot use it effectively and we have built our own. But our customers would probably want to use other mapping software as well, if only to compare our mapping to theirs...

    The data is available in the SRA under number SRA008092

    ftp://ftp.ncbi.nlm.nih.gov/sra/Submi...008/SRA008092/

    You can also get a sample data set which is part of the API we have released.



    We are considering changing to the SAM/BAM format as the export of our mapping data...Are you considering supporting SAM/BAM as an output format as well?

    Thanks!

    Thon
