Originally posted by pmiguel
Mathematically, if you have two completely independent observations, with perfectly accurate quality scores, the correct formula would be to add the scores together if the bases agree, and subtract them if they differ. However, I don't really like that approach since the quality scores are not perfectly accurate and the observations are not completely independent. For one thing, if a cluster is close enough to another cluster that there is interference during read 1, there will also be interference during read 2, with a similar nonrandom bias. Or if a cluster is near the edge of a lane for read 1 and thus slightly out of focus, it will be for read 2 as well. So even if the quality scores are accurate, both are affected by a similar bias rather than unrelated random biases, and thus strictly adding them is not appropriate.
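That idealized independent-observation rule could be sketched roughly like this (a hedged illustration only, not the exact code of any tool; the function name and the cap of 41 are my own illustrative choices):

```python
def merge_qual(base1, q1, base2, q2, cap=41):
    """Combine two overlapping base calls, assuming (unrealistically)
    that the observations are independent and the Phred scores exact.
    The cap is an illustrative guard against inflated merged scores."""
    if base1 == base2:
        # Agreeing independent observations: error probabilities multiply,
        # so Phred scores (being log-scaled) add.
        return base1, min(q1 + q2, cap)
    # Disagreeing calls: keep the more confident base and subtract the
    # weaker score from the stronger one.
    if q1 >= q2:
        return base1, q1 - q2
    return base2, q2 - q1

print(merge_qual("A", 30, "A", 30))  # agreeing calls -> ('A', 41) with the cap
print(merge_qual("A", 30, "C", 20))  # disagreeing calls -> ('A', 10)
```

Without a cap, two agreeing Q40 calls would come out as Q80, which is exactly the problem described below.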
Furthermore, a lot of downstream programs or analyses may be calibrated to assume that the dominant source of error is sequencer base-calling error, and to ignore other error modes, as they tend to be smaller. Thus Q40 reads may yield Q40 variants. But with a Q80 read, the error mode will be dominated by other things like PCR errors or unwanted chemical reactions - there's no way merging two Q40 reads will yield bases with a 1/100,000,000 error rate, even if you could absolutely ensure that the overlap frame was correct, which you can't.
There is probably a better way to derive the quality than the simple equation I am using, but I like it because it is simple and gives useful results that can generally be consumed by tools designed for raw Illumina quality scores. Deriving an equation that correctly models all of these factors, or does a substantially better job overall, while still being simple enough that a person can understand the relationship between input and output, would be extremely difficult, I think.