As GenoMax says, trimming to Q30 is not beneficial before merging reads. BBMerge has some internal quality-trimming options, so it can try to merge, then quality-trim if it is unsuccessful, then try to merge again, etc. That can slightly increase the merge rate. But typically I just use the whole untrimmed reads as input. The longer the input reads are, the less likely it is for BBMerge to make an accidental incorrect merge, and it does take quality scores into account, so I do not recommend quality-trimming prior to BBMerge. Adapter-trimming is fine though.
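For anyone who wants to try this, here is a minimal sketch of "adapter-trim, then merge the otherwise untrimmed reads". The file names and the `adapters.fa` path are placeholders, and the `run` helper is only an illustrative wrapper that prints each command unless `RUN=1` is set. The adapter-trimming flags follow the usual BBDuk usage guide; the retry-with-trimming behaviour mentioned above is, as far as I know, controlled by BBMerge's `qtrim2` and `trimq` options, so check `bbmerge.sh --help` for your version before relying on that.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Step 1: adapter-trim only, with no quality trimming.
# adapters.fa is a placeholder for the adapter file shipped with BBTools.
run bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz \
    out1=adapter_trimmed_R1.fq.gz out2=adapter_trimmed_R2.fq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

# Step 2: merge the full-length reads; pairs that do not merge go to outu1/outu2.
run bbmerge.sh in1=adapter_trimmed_R1.fq.gz in2=adapter_trimmed_R2.fq.gz \
    out=merged.fq.gz outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz
```

Run it once as-is to inspect the commands, then with `RUN=1` once BBTools is on your `PATH`.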
-
Merge pairs before normalisation?
Hello, I'm building a pipeline for metagenomics.
I follow the BBTools user guide and do:
- normalization with bbnorm
- error correction with tadpole
- merge (with extension) with bbmerge
I want to increase the merge rate to get a better assembly.
I suspect that many reads that could be merged are thrown away during normalisation.
Wouldn't it be better to merge (without extension) first, then take primarily the merged reads, normalize, error-correct, and merge with extension?
What is the best way of normalising paired-end reads together with merged pairs or singletons in bbnorm?
For now I do two rounds of bbnorm and supply the other reads via the `extra` parameter. Is there a better way to do this?
-
Hi,
I have shotgun data: paired-end reads, 100 bp each end. I want to run MetaPhlAn2 next to get a general taxonomic profile.
So I am considering merging the reads before MetaPhlAn2. However, I do not know whether I need to run bbmap first for quality control, or run bbmerge first to merge the reads. Any suggestions?
Thanks in advance
-
@chloe - It's normally simplest and most effective to do QC first on the raw data, then anything else (such as merging) later.
@silask - the way you are doing it is currently the most effective way. It's a little bit annoying to have to run BBNorm twice, but that's the only way to process both paired and unpaired reads.
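A rough sketch of that two-pass approach, in case it helps others reading this thread. The file names are placeholders, `target=100 min=5` are just commonly used example settings rather than a recommendation from this thread, and the `run` helper is an illustrative wrapper that only prints each command unless `RUN=1` is set. Passing a comma-delimited list to `extra` is an assumption on my part; check `bbnorm.sh --help` if it is rejected.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Pass 1: normalize the paired reads. The merged/unpaired reads are supplied
# via extra= so they contribute to the depth estimate but are not written out.
run bbnorm.sh in=unmerged_R1.fq.gz in2=unmerged_R2.fq.gz \
    out=norm_R1.fq.gz out2=norm_R2.fq.gz \
    extra=merged.fq.gz target=100 min=5

# Pass 2: normalize the merged/unpaired reads, this time supplying the pairs
# via extra= (comma-delimited list assumed to be accepted).
run bbnorm.sh in=merged.fq.gz out=norm_merged.fq.gz \
    extra=unmerged_R1.fq.gz,unmerged_R2.fq.gz target=100 min=5
```

Both passes see the same total set of reads, so the depth estimates should be comparable between the paired and unpaired outputs.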
-
Hi, Brian,
Thanks for the reply. However, I have tried the QC. I used
bbduk.sh in=R1.fastq.gz out=filter_R1.fq maq=30
bbduk.sh in=R2.fastq.gz out=filter_R2.fq maq=30
(no reads in R1R2 will be trimmed)
bbduk.sh in=R1.fastq.gz out=clean_R1.fq trimq=30
bbduk.sh in=R2.fastq.gz out=clean_R2.fq trimq=30
(it will trim 50% of reverse reads, but no forward reads)
bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=R1_001.fq out2=R2.fq outm=fail.fq bhist=hist_base.txt qhist=hist_q.txt aqhist=hist_aq.txt bqhist=hist_bq.txt ecco=t
(Also no reads will be trimmed)
But when I run BBMerge, only 32.268% of the reads can be joined.
Do you have any suggestions?
Thanks in advance.
-
@chloe1005: It is possible that only 32% of your read pairs have inserts short enough for the two reads to overlap and merge.
`trimq=30` is too severe a bar for trimming. If you have a reference genome then not doing any trimming for quality works fine. If you are doing any de novo work then you may want to trim at Q20 or Q25.
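As an illustration only, a gentler trim at Q20 on both mates together might look like the sketch below. File names are placeholders and the `run` helper is an illustrative wrapper that prints the command unless `RUN=1` is set. Two things worth knowing about BBDuk here: `trimq` only sets the threshold, and trimming is switched on by `qtrim` (for example `qtrim=rl` for both ends); and giving both files in one call via `in1`/`in2` keeps the pairs in sync, which separate runs on R1 and R2 do not guarantee.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Quality trimming: cut low-quality bases off both read ends at Q20.
# (Quality filtering, by contrast, would discard whole reads, e.g. with maq=.)
run bbduk.sh in1=R1.fastq.gz in2=R2.fastq.gz \
    out1=qtrimmed_R1.fq.gz out2=qtrimmed_R2.fq.gz \
    qtrim=rl trimq=20
```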
-
Hi,
I am still confused about quality trimming versus quality filtering. What is the difference between them?
May I also ask how to get a reference genome? I saw it mentioned earlier in this thread.
I look forward to your answer.
-
RQCFilter Norm and EC
Hi Brian,
I am trying to trim and filter my data with RQCFilter, but I cannot find an option for normalisation or error correction. Does this package have any such parameters? There is also a parameter called -merge. Does it do merging? Should I set it to false and try normalising and error-correcting first?
-
Source: https://jgi.doe.gov/data-and-tools/b...preprocessing/
"These steps replicate the QA protocol implemented at JGI for Illumina reads. There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants."
It is in the bbtools files.
Never mind!
1) Is it a good plan to normalise and error-correct first, BEFORE merging?
2) Do I need to follow a different approach for trimming and filtering short vs. long mate-pair reads (Nextera)?
Last edited by kokyriakidis; 07-08-2018, 12:15 PM.
-
Since the notes on the page you linked say this:
"There is a program “RQCFilter” which implements them as a pipeline, but that is not publically available because it has numerous hard-coded paths to reference datasets of contaminants."
you should follow the steps listed on that page to replicate that functionality.
In general @Brian has recommended merging reads before doing any additional manipulations.
-
Originally posted by GenoMax:
Since notes on the page you linked say this:
You should follow the steps that are denoted to replicate that functionality on the linked page.
In general @Brian has recommended merging reads before doing any additional manipulations.