**Mizzou55** · 09-22-2016, 01:23 PM

Hi Brian,

In this case I'm trying to trim vector and low qual from Sanger (BAC End Seq) reads (~850bp) to run through SSPACE. Will your script have problems with this?

**Brian Bushnell** · 09-22-2016, 03:01 PM

No, that will work... but generally I would recommend quality-trimming using "qtrim=r trimq=15" or similar, using the vector sequence as the reference with "ktrim=r" or "kmask=N" for vector trimming. "ref=vector kmask=N k=31 edist=1" would, for example, mask all of the vector sequence with Ns, which you could then remove via subsequent quality-trimming on both ends with "qtrim=rl trimq=2" (Ns have quality 0).
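To make the mask-then-trim idea concrete, here is a rough Python sketch. It is not BBDuk's code: it uses exact k-mer matches only (so no "edist=1" tolerance and no reverse-complement matching), and it stands in for "qtrim=rl trimq=2" by simply stripping Ns from both ends. The function names and the toy sequences are made up for illustration.

```python
def mask_vector(read, vector, k=31):
    """Replace every base covered by a k-mer shared with `vector` by 'N'.

    Simplification: exact matches only, forward strand only.
    """
    vector_kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    masked = list(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in vector_kmers:
            masked[i:i + k] = "N" * k
    return "".join(masked)


def trim_masked_ends(read):
    """Drop runs of N from both ends (Ns carry quality 0, so an end
    quality trim with a low threshold removes them)."""
    return read.strip("N")


# Toy example with a short k so it fits on a line; real runs would use k=31.
vector = "ACGTACGTAC"
insert = "TTGGCCAATTGGCCAA"
read = "ACGTACGT" + insert + "GTACGTAC"   # vector on both ends of the insert

masked = mask_vector(read, vector, k=5)
print(masked)
print(trim_masked_ends(masked))
```

Note that masked vector in the middle of a read would stay as Ns after the end trim; only flanking vector is removed this way.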

If you just want to remove the first and last 125bp the command would be "ftl=125 ftr2=125".
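In Python slicing terms, and assuming my reading of the two flags is right ("ftl=125" keeps bases from 0-based position 125 onward, "ftr2=125" drops the last 125 bases), the effect on a single read looks like this. The helper name is hypothetical and this is only a model of the result, not of how BBDuk does it.

```python
def force_trim(seq, left=0, right=0):
    """Remove `left` bases from the start and `right` bases from the end.

    Returns an empty string if the read is too short to survive both trims.
    """
    end = len(seq) - right
    return seq[left:end] if end > left else ""


# An 850 bp toy read: 125 bp of "vector", 600 bp of insert, 125 bp of "vector".
read = "A" * 125 + "C" * 600 + "G" * 125
trimmed = force_trim(read, left=125, right=125)
print(len(read), len(trimmed))
```

The same trim would be applied to the quality string so that bases and qualities stay the same length.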

**germelcar** · 09-24-2016, 06:58 AM

Hi Brian:

I ran your "repair.sh" script yesterday and it generated only the "singletons.fq" file; the "r1.fq" and "r2.fq" files are empty. The command I ran was the same one you mention on page 2:

cat SRR867646_1.fastq SRR867646_2.fastq | repair.sh -Xmx4g in=stdin.fq out1=r1.fq out2=r2.fq outs=singletons.fq

I ran it on SRR867646's reads, which seem to have been trimmed before being uploaded to the SRA, and in fact I am not able to do an alignment because R1 has 10919266 sequences and R2 has 2177589 sequences.

But I don't know how to interpret that. Why are the r1.fq and r2.fq files empty while singletons.fq is not? Should I treat the reads as single-end using the singletons.fq file?

Also, the singletons.fq file contains 13096855 sequences, the same number as R1+R2 from the original reads.

How should I handle that?
What recommendations do you have?

Thanks in advance.
~g
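For readers following along, the idea behind re-pairing can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how repair.sh is implemented and not a diagnosis of this dataset: mates are matched on the first whitespace-delimited token of the header, with a trailing "/1" or "/2" removed. The header strings below are invented.

```python
def pair_key(header):
    """Name used to match mates: first token of the header, minus '@'
    and minus a trailing /1 or /2 (an assumption of this sketch)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs(r1_headers, r2_headers):
    """Return (paired names, singleton names) for two lists of FASTQ headers."""
    keys1 = {pair_key(h) for h in r1_headers}
    keys2 = {pair_key(h) for h in r2_headers}
    paired = keys1 & keys2
    singletons = keys1 ^ keys2
    return sorted(paired), sorted(singletons)


# Invented headers: read.2 lost its mate, read.3 exists only in file 2.
r1 = ["@read.1/1", "@read.2/1"]
r2 = ["@read.1/2", "@read.3/2"]
paired, singletons = split_pairs(r1, r2)
print(paired, singletons)

# The counts quoted in the post are consistent with every read being a singleton.
print(10919266 + 2177589)
```

In this toy version, "everything is a singleton" is what comes out when no name in one file matches any name in the other, so comparing a few headers from each file by eye may be a reasonable first check.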

**GenoMax** · 09-25-2016, 07:07 AM

@germelcar's question has been answered over at Biostars.

**Brian Bushnell** · 09-26-2016, 09:27 AM

Thanks, Genomax!

**JVGen** · 11-02-2016, 08:09 AM

BBDuk - Remove singletons?

Hi Brian,

I'm using BBDuk to adapter and quality trim Illumina reads that were prepped with the Nextera kit. I'm using the Plugin within Geneious, and I have a few questions:

1) I'm getting a different number of reads in the output files for each pair. I tried using removeifeitherbad=t, but still get a different number of reads in each file. Would this command normally result in an equal number of reads for each file-pair? If so, I'll inquire with Geneious.

2) When trimming adapters, what is the benefit of selecting only the right or left end? I'm guessing my reads will only have adapter sequence on the 5' end, correct? This assumes that the insert size is longer than the read length, as I think in principle I should have adapters on both ends? Why not just trim both ends to be safe? Does it result in non-specific trimming of sample sequence?

Next, I intend to pair and merge the reads, and then de novo assemble.

Thanks for any help.

Jake

**Brian Bushnell** · 11-02-2016, 09:18 AM

Hi Jake,

1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

2) Adapter-trimming should only be done on the right (3') end for fragment libraries. Left-trimming is only for special circumstances like specific long-mate-pair protocols and amplicons with custom inline barcodes. For fragment libraries, the original molecule has adapters on both ends, but reading starts just after the adapter so the reads have no adapter sequence on the left end, and only have adapter sequence on the right end if the insert size was shorter than read length. If you left-trim a fragment library after right-trimming it, nothing will happen except that, as you note, you will get occasional random trimming of genomic sequence, though that will be very rare. Also note that BBDuk can't do left and right trimming simultaneously, as based on how it does trimming (when a reference kmer is found, trim that kmer and everything to the left or right) it would trim all bases in the entire read.
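The "trim that kmer and everything to the left or right" rule can be modeled roughly like this in Python. It is a simplified sketch with exact matches and a toy adapter and k, not BBDuk's implementation, and the function name is hypothetical.

```python
def ktrim(read, ref_kmers, k, side):
    """Trim at reference k-mer hits.

    side="r": drop the leftmost matching k-mer and everything to its right.
    side="l": drop the rightmost matching k-mer and everything to its left.
    """
    hits = [i for i in range(len(read) - k + 1) if read[i:i + k] in ref_kmers]
    if not hits:
        return read
    if side == "r":
        return read[:hits[0]]
    if side == "l":
        return read[hits[-1] + k:]
    raise ValueError("side must be 'r' or 'l'")


k = 8
adapter = "AGATCGGAAGAGC"            # toy adapter for illustration
ref_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}

genomic = "TTGACCATGCAATGCCGTA"
read = genomic + adapter             # short insert: read runs into the adapter

right_trimmed = ktrim(read, ref_kmers, k, "r")
print(right_trimmed)                              # genomic part survives
print(ktrim(right_trimmed, ref_kmers, k, "l"))    # left-trim afterwards: no change
print(repr(ktrim(read, ref_kmers, k, "l")))       # left-trim on the same hit: nothing left
```

The last line shows why applying both directions to the same hit is destructive: the right trim removes the adapter and everything after it, while the left trim removes the adapter and everything before it.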

**JVGen** · 11-02-2016, 10:05 AM

Originally posted by Brian Bushnell:

Hi Jake,

1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

Thanks Brian. I found that the unequal read numbers following trimming were being generated by the read length cut-off. I was specifying a minimum read length of 30, but it was removing only the short read and not its mate. It could be an issue with Geneious. I can quality- and adapter-trim and get the same read numbers in each file. I'll just dictate read length at a later step.

Jake
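A small Python model of the effect described here, assuming reads are represented as plain strings: filtering each file on its own gives unequal counts, while filtering pair-wise (the behavior one would expect from "removeifeitherbad=t" when both files are processed together) keeps the files in sync. The function names and toy reads are made up; this is not Geneious's or BBDuk's code.

```python
def filter_independently(reads, min_len):
    """Length filter applied to one file with no knowledge of mates."""
    return [r for r in reads if len(r) >= min_len]


def filter_pairs(pairs, min_len):
    """Length filter applied to pairs: drop both reads if either is too short."""
    return [(r1, r2) for r1, r2 in pairs
            if len(r1) >= min_len and len(r2) >= min_len]


# Three toy pairs; the second and third have a short read 2.
pairs = [("A" * 50, "C" * 50), ("A" * 50, "C" * 20), ("A" * 45, "C" * 25)]
r1 = [p[0] for p in pairs]
r2 = [p[1] for p in pairs]

print(len(filter_independently(r1, 30)), len(filter_independently(r2, 30)))
print(len(filter_pairs(pairs, 30)))
```

With a cut-off of 30, the independent filter leaves three reads in one file and one in the other, whereas the pair-aware filter leaves exactly one intact pair.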

**Brian Bushnell** · 11-02-2016, 10:19 AM

Hi Jake,

It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.
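For anyone scripting this outside Geneious, a minimal sketch of assembling a single paired BBDuk call in Python might look like the following. The file names are placeholders, the helper is hypothetical, the command is only printed (not run), and writing the flag as "ordered=t" assumes the usual BBTools t/f boolean style.

```python
import shlex


def bbduk_paired_command(in1, in2, out1, out2, extra=()):
    """Build one bbduk.sh invocation that takes both mate files together."""
    return ["bbduk.sh", f"in1={in1}", f"in2={in2}",
            f"out1={out1}", f"out2={out2}", *extra]


cmd = bbduk_paired_command(
    "reads_R1.fq", "reads_R2.fq",          # placeholder input names
    "trimmed_R1.fq", "trimmed_R2.fq",      # placeholder output names
    extra=["qtrim=r", "trimq=15", "ordered=t"],
)
print(shlex.join(cmd))
```

The point of the sketch is the shape of the call: one process, both inputs, both outputs, rather than two separate runs whose outputs may not line up.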

**JVGen** · 11-02-2016, 10:45 AM

Hi Brian, I'll report the issue to Geneious. I'm using the "keep order" feature in Geneious. I've attached a screenshot. The check boxes and insert fields that Geneious created are nice for those of us that don't code, but only if they actually do what they say :P

Jake

Never mind, forums have some pretty stringent rules on picture attachment dimension...painful.

**JVGen** · 11-03-2016, 09:16 AM

Originally posted by Brian Bushnell:

Hi Jake,

It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.

Hey Brian. I heard back from Geneious. Turns out that I had to pair the reads before running BBDuk. After pairing I'm left with a single file in Geneious, though the reads have not been merged. Running BBDuk on the 'combined' file results in the removal of both reads in a pair, yay!

**JVGen** · 11-07-2016, 11:32 AM

Hi Brian,

I just recently started using BBDuk to adapter and quality trim. I then use the reads to assemble in Spades, but ran into an issue. The error correction software that Spades uses cannot recognize the read names.

The input read format is:
M00281:69:000000000-D22HU:1:1101:15164:1363 1:N:0:53

The output read format is:
M00281:69:000000000-D22HU:1:1101:15164:1363_1:N:0:53

The underscore isn't recognized by BWA, which is activated when the "careful" mode is used in Spades. Any way to switch that _ back to a space?

Thanks
Jake

**Brian Bushnell** · 11-07-2016, 11:53 AM

Hi Jake,

BBDuk does not add an underscore to read names. Reformat can, if you run it with the flag "addunderscore", but doesn't by default. Can you list all of the steps you are doing prior to running Spades and verify that the underscores are not present for the BBDuk input?
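One way to check a file for this, and to undo it if needed, is sketched below in Python. It assumes the usual Illumina header layout in which the part after the space looks like `read:filtered:control:index` (as in the `1:N:0:53` example above) and that FASTQ records are exactly four lines each; the function names are hypothetical.

```python
import re

# Underscore immediately followed by an Illumina-style comment at end of line.
JOINED_COMMENT = re.compile(r"_([12]:[YN]:\d+:\S*)$")


def headers_with_underscore(fastq_text):
    """Return header lines whose comment field is joined by an underscore."""
    lines = fastq_text.splitlines()
    return [h for h in lines[0::4] if JOINED_COMMENT.search(h)]


def restore_space(header):
    """Turn '..._1:N:0:53' back into '... 1:N:0:53'; other headers are unchanged."""
    return JOINED_COMMENT.sub(r" \1", header)


good = "@M00281:69:000000000-D22HU:1:1101:15164:1363 1:N:0:53"
bad = "@M00281:69:000000000-D22HU:1:1101:15164:1363_1:N:0:53"
record = bad + "\nACGT\n+\nIIII\n"

print(headers_with_underscore(record))
print(restore_space(bad))
```

Running the check on the file that goes into BBDuk and on the file that comes out should show at which step the underscore appears.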

**JVGen** · 11-07-2016, 12:02 PM

Hi Brian,

You are correct. While the read names were correct in the program interface, upon export the "_" were introduced. I apologize for not opening up the fastq file in a text editor to be sure.

Jake

**Brian Bushnell** · 11-07-2016, 01:30 PM

No problem!
