Seqanswers Leaderboard Ad

**Brian Bushnell** · 07-11-2015, 07:26 PM

Thanks for noting this. Some tools (like reformat) support the "extin" and "extout" flags which let you override the default, so you could do this:

reformat.sh in=file.dat out=file2.dat extin=.sam extout=.fq

But, BBMap doesn't support that right now. I'll add it. And I don't particularly recommend sam -> fastq conversion because the names change, since in sam format read 1 and read 2 must have identical names, whereas in fastq format they will typically have "/1" and "/2" or similar to differentiate them. Though you can do that conversion if you want.

I have not used Galaxy and don't know what's possible, but until I make this change, I would suggest one of these:

1) Use BBDuk for this filtering; its default output format is fastq and it's probably faster than BBMap anyway in this case. The syntax is very similar. On the command line, it would be something like "bbduk.sh in=reads.fq outu=clean.fq ref=ecoli.fasta".

2) Tell BBMap "outu=stdout.fq" and pipe that to a file, if Galaxy supports pipes.

As for your question about pairing, the normal behavior in paired-mapping mode is:

"out=" will get everything.
"outm=" will get all pairs in which either of the reads mapped to the reference.
"outu=" will get all pairs in which neither read mapped to the reference.

For BBDuk, it's slightly different but essentially the same:
"out=" is the same as "outu=".
"outu", aka "out", will get all pairs in which neither had a kmer match to the reference.
"outm" will get all pairs in which either had a kmer match to the reference.
For BBDuk, this behavior can be changed with the "reib" (removeIfEitherBad) flag. The assumption of that flag's name is that the reference is contaminants being filtered against, so the default "reib=true" means any pair where either matches the contaminant is removed.

So, for both tools, if the input data is paired, the output data will also be paired - pairs are always kept together in all streams.

**Thowell** · 07-11-2015, 08:32 PM

Thank you for the quick reply Brian.

I was able to get things working with a pipe. I'm guessing the reads have to be interleaved with this method, but that will work fine for me until you can implement the alternate output flag.

Thanks again!

**mendezg** · 09-16-2015, 08:32 AM

How would you like us to cite bbduk in papers?

**GenoMax** · 09-18-2015, 04:23 PM

Originally posted by mendezg View Post

How would you like us to cite bbduk in papers?

In the past Brian has asked people to link to the project site on source forge.

**Brian Bushnell** · 09-20-2015, 07:18 PM

Hi - sorry I somehow missed this question! Yes, as Genomax stated, please just cite it something like this (altered according the format of the journal):

"BBDuk - Bushnell B. - sourceforge.net/projects/bbmap/"

**vmikk** · 09-24-2015, 04:42 AM

Hello!
Is it possible to use cutprimers.sh to cut the sequence AND to preserve the primer sites around?

**Brian Bushnell** · 09-24-2015, 08:57 AM

Not currently... but I'll plan to add a flag for that.

**Brian Bushnell** · 09-28-2015, 09:56 AM

I added the "include" flag to cutprimers. Default is "include=f". If you set "include=t" the primers will be retained for the output.

**vmikk** · 09-28-2015, 10:37 PM

Hello Brian! Thanks a lot for the implementation of this feature!
Meanwhile I thought to modify sam files from msa.sh, but the out of the box functionality is much more convenient!
Thanks again!

**dkainer** · 09-30-2015, 06:36 PM

Brian,

is there a way with the BB Suite to demultiplex paired-end reads based on inline barcodes, like Flexbar does?

I can see it can be done one barcode at a time by outputting matching reads based on the first 6 left bases. But can it be done in one command to demultiplex for multiple barcodes?

cheers
DK

**Brian Bushnell** · 09-30-2015, 06:52 PM

It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

That would require a file "barcodes.fa" like this:
>AACTGA
AACTGA
>GGCCTT
GGCCTT

etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?

However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.

**dkainer** · 09-30-2015, 07:04 PM

i hadn't noticed the Seal command. Thanks for responding so fast!

So i assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try and match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?

Additionally, would seal let you left trim off the barcode bases from the R1 read?

**Brian Bushnell** · 09-30-2015, 07:21 PM

Originally posted by dkainer View Post

i hadn't noticed the Seal command. Thanks for responding so fast!

So i assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try and match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?

That's correct.

Additionally, would seal let you left trim off the barcode bases from the R1 read?

Yes, it has a flag "ftl" (forcetrimleft) for doing that... "ftl=6" would remove the first 6 bases of all reads. Unfortunately it would do that for both read 1 and read 2. So... if you have reads in 2 files, that's fine; you just process the read1 file with "ftl=6". If they are interleaved it's more complicated - you'd have to split them first (for example, reformat.sh in=reads.fq out=read#.fq). I'll consider adding that the ability to only do all operations on left or right reads... it seems useful.

**habib** · 01-24-2016, 06:59 PM

Hi Brian,

I produced the following graph using khist.sh for my 100bp PE Illumina reads. Could you help me interpret the graph please? What is the difference between raw_count and unique_kmers?

https://raw.githubusercontent.com/ha...hist.sh_output

**westerman** · 01-25-2016, 08:58 AM

Try dividing the raw count by the depth and see that the result equals unique_kmers. That might give you a clue as to what everything means.

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 48 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News