BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

lucila replied

08-16-2017, 10:35 AM
Originally posted by Brian Bushnell

Hi Lucila,

No, it does not, but I have been considering adding gff output, at least for exon boundaries, if not full transcripts. For generating a reference transcriptome, you can do either mapping or assembly; so, assemblers like Trinity are also an option.

I have not tried it, but you may also want to examine this:
https://github.com/enormandeau/gawn

Thank you, Brian, for the information!
Cheers,
Lucila.
Brian Bushnell replied

08-15-2017, 12:20 PM
Originally posted by lucila

Hi Brian,
Thank you so much for this useful tool. I would like to ask you a question. Does BBMap generate a list of transcripts as a result of mapping the reads? That is, does it produce a FASTA file of reconstructed transcripts based on the reference genome and the reads used for mapping?
I need this because the genome I am using is not annotated (I do not have a GFF), and I want to generate a reference transcriptome using my reads and the genome.

Hi Lucila,

No, it does not, but I have been considering adding gff output, at least for exon boundaries, if not full transcripts. For generating a reference transcriptome, you can do either mapping or assembly; so, assemblers like Trinity are also an option.

I have not tried it, but you may also want to examine this:

https://github.com/enormandeau/gawn (GAWN: Genome Annotation Without Nightmares)
lucila replied

08-15-2017, 07:18 AM
Hi Brian,
Thank you so much for this useful tool. I would like to ask you a question. Does BBMap generate a list of transcripts as a result of mapping the reads? That is, does it produce a FASTA file of reconstructed transcripts based on the reference genome and the reads used for mapping?
I need this because the genome I am using is not annotated (I do not have a GFF), and I want to generate a reference transcriptome using my reads and the genome.

Thank you again!
Best,
Lucila.
Brian Bushnell replied

08-09-2017, 02:42 PM
Originally posted by darthsequencer

Thanks, that's helpful. On the note of loading references: is there a way to use wildcards with the input and output of bbwrap?

Ah, sorry, but nope. You can, however, do something like...

Code:

ls *.fastq.gz > ls.txt

...then replace '\n' with ',' using a text editor (like Notepad++). I'm sure there's a simpler sed/awk solution, too.
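A command-line alternative to the text-editor step, sketched under the assumption that the goal is simply a comma-separated list of the matching files. The scratch directory and the three placeholder file names are made up for the demonstration; `paste -s -d,` is the standard way to join lines with a delimiter.

```shell
# Demo setup (hypothetical): a scratch directory with three empty placeholder files.
workdir=$(mktemp -d)
cd "$workdir"
touch a.fastq.gz b.fastq.gz c.fastq.gz

# Print one matching file name per line, then join the lines with commas.
# paste -s merges all input lines into one; -d, sets the delimiter to ','.
inputs=$(printf '%s\n' *.fastq.gz | paste -sd, -)

echo "$inputs"
```

The resulting string could then be supplied as in="$inputs" to bbwrap, following the comma-separated approach described above; that step is not run here.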
darthsequencer replied

08-09-2017, 02:24 PM
Originally posted by Brian Bushnell

To maximize speed when you are not looking for low-identity matches, "fast" (plus your identity threshold) is generally adequate. You can speed it up further by reducing "maxindel" ("fast" sets it to 80). Quality-trimming and adapter-trimming generally increase alignment speed as well.

With a large reference, you may be able to increase speed with "k=14" instead of the default "k=13". This increases reference loading time and memory usage but speeds up mapping, so whether the whole process gets faster or slower depends on how long the reference takes to load relative to how much data you have to map. Turning off mate rescue ("rescue=f") or reducing "rescuedist" ("fast" defaults to rescuedist=800) can also increase speed slightly. Note that all of these options reduce sensitivity (aside from trimming, which increases it), but at 97% identity you only need very low sensitivity anyway.

Thanks, that's helpful. On the note of loading references: is there a way to use wildcards with the input and output of bbwrap?
darthsequencer replied

08-09-2017, 02:04 PM
Originally posted by GenoMax

How long are the query sequences?

They range from 50 bp single-end to 2 x 250 bp.
Brian Bushnell replied

08-09-2017, 10:47 AM
To maximize speed when you are not looking for low-identity matches, "fast" (plus your identity threshold) is generally adequate. You can speed it up further by reducing "maxindel" ("fast" sets it to 80). Quality-trimming and adapter-trimming generally increase alignment speed as well.

With a large reference, you may be able to increase speed with "k=14" instead of the default "k=13". This increases reference loading time and memory usage but speeds up mapping, so whether the whole process gets faster or slower depends on how long the reference takes to load relative to how much data you have to map. Turning off mate rescue ("rescue=f") or reducing "rescuedist" ("fast" defaults to rescuedist=800) can also increase speed slightly. Note that all of these options reduce sensitivity (aside from trimming, which increases it), but at 97% identity you only need very low sensitivity anyway.
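As a rough illustration, the options above might be combined on one command line as follows. The file names are placeholders, and maxindel=20 is an arbitrary value below the 80 that "fast" sets. The sketch only assembles and prints the command so it can be inspected first; how an explicit maxindel interacts with the "fast" macro is worth confirming against BBMap's own documentation.

```shell
# Hypothetical file names; substitute your own reference and reads.
ref=refs.fa
reads=reads.fq.gz
out=mapped.sam.gz

# Speed-oriented settings taken from the post above:
#   minid=0.97   identity threshold from the original question
#   fast         the speed macro
#   maxindel=20  illustrative value, lower than the 80 set by "fast"
#   rescue=f     turn off mate rescue
#   k=14         longer kmer; costs load time and memory on a large reference
cmd="bbmap.sh ref=$ref in=$reads out=$out minid=0.97 fast maxindel=20 rescue=f k=14"

# Print the command for review; run it yourself once BBMap is on the PATH.
echo "$cmd"
```

Since k=14 trades loading time for mapping speed, it may be worth timing a run with and without it on a subset of the data before committing to it.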

Last edited by Brian Bushnell; 08-09-2017, 11:01 AM.
GenoMax replied

08-09-2017, 10:39 AM
Originally posted by darthsequencer

Hi Brian,
I have a lot of reference sequences I'm mapping to (~11 million)

How long are the query sequences?
darthsequencer replied

08-09-2017, 10:29 AM
bbmap fast macro?

Hi Brian,
I have a lot of reference sequences I'm mapping to (~11 million) and want to eke out as much speed as possible.

I'm mostly looking for close matches; e.g., I set minid to 0.97. Will setting "fast" still find matches like that? Any other thoughts on what I can set to get more speed?

Thanks a bunch!
darthsequencer replied

08-09-2017, 10:23 AM
Originally posted by Brian Bushnell

It is tied to the number of threads defined for BBMap; for some reason I had capped it at a max of 8 even if the main process was allowed to use more, probably to conserve memory. I've increased the cap to 64.

Thanks, that helps a lot!
Brian Bushnell replied

08-02-2017, 10:08 AM
Originally posted by GenoMax

How about tying the number to the number of threads specified for BBMap? That way we know that many threads are available.

It is tied to the number of threads defined for BBMap; for some reason I had capped it at a max of 8 even if the main process was allowed to use more, probably to conserve memory. I've increased the cap to 64.
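For illustration, a run that requests more than 8 threads and writes BAM directly might look like the sketch below. The file names and the thread count are placeholders, and it assumes a BBMap release that includes the raised cap plus samtools on the PATH. The command is only printed here, not executed.

```shell
# Hypothetical inputs; substitute your own.
ref=ref.fa
reads=reads.fq.gz
out=mapped.bam
threads=32

# threads= sets the thread count for BBMap; per the post above, the samtools
# thread count used for BAM output is tied to it, up to the cap.
cmd="bbmap.sh ref=$ref in=$reads out=$out threads=$threads"

echo "$cmd"
```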
GenoMax replied

08-02-2017, 09:52 AM
Originally posted by Brian Bushnell

Oh, yep, for some reason I capped it at 8 threads. I wonder why? I'll eliminate that cap in the next release, which will probably be sometime today.

How about tying the number to the number of threads specified for BBMap? That way we know that many threads are available.

Last edited by GenoMax; 08-02-2017, 10:00 AM.
Brian Bushnell replied

08-02-2017, 09:47 AM
Oh, yep, for some reason I capped it at 8 threads. I wonder why? I'll eliminate that cap in the next release, which will probably be sometime today.
darthsequencer replied

07-31-2017, 07:51 PM
Hi, I love that BBMap and its tools can directly make BAM files. I noticed that it's using samtools with 8 threads. Is there a way to increase the number of threads?

Thanks
jweger1988 replied

07-17-2017, 11:20 PM
Thanks for the reply. That is correct.

I have a virus into which I introduced some degenerate nucleotides to track bottlenecks.

I suppose I could just reformat to the area of the read I'm interested in and then just convert to fasta and use that.
