The BBMerge guide recommends trimming adapters before merging, but elsewhere it also recommends providing the adapter sequences to BBMerge. Which is best?
-
Program ran out of memory on large dataset: Need some tips
Hi folks,
We have a shotgun metagenomic dataset (approx. 120 GB compressed). I want to merge the paired-end reads, since longer reads should improve assembly performance. I tried this on a small subset of the data, and it markedly increased N50 and scaffold length.
Now I want to merge the full dataset (approx. 120 GB compressed) for subsequent assembly. We have a system with 32 threads and 120 GB of memory. After going through the tips on the BBTools page, I tried the following command and ran out of memory (error message: "This program ran out of memory. Try increasing the -Xmx flag and using tool-specific memory-related parameters").
bbmerge-auto.sh in1=in_R1.fastq.gz in2=in_R2.fastq.gz out=merged.fastq.gz outu1=1_um.fastq.gz outu2=2_um.fastq.gz outa=adapters.txt ihist=insert_histogram.txt k=62 vstrict rem extend2=50 ecct mininsert=150 -Xmx80g minprob=0.8 prefilter=2 prealloc ziplevel=5
My questions are:
1. Are there any other parameters that would make this command manageable on the server described above?
2. Can I subset the data using the partition.sh BBTools wrapper and run the command on each part? As I understand it, subsetting the data will reduce the number of reads that merge. Is that true?
Any tips/advice in this case is appreciated.
Thanks
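On question 2, here is a rough sketch of what splitting a paired dataset into chunks could look like, so each chunk can be merged in a separate run. It is not how partition.sh is implemented; the function names `read_fastq` and `partition_pairs` and the chunk file names are made up for illustration, and it assumes gzipped FASTQ with exactly four lines per record and mates in the same order in both files. As far as I understand, overlap-based merging looks at one pair at a time, so chunking should not change it, whereas the k-mer-based options in the command above (rem, extend2, ecct) rely on coverage from the reads in the run, so each chunk would see less of it.

```python
import gzip
import itertools
from pathlib import Path


def read_fastq(path):
    """Yield 4-line FASTQ records as tuples of lines (gzipped input assumed)."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = tuple(itertools.islice(handle, 4))
            if not record:
                return
            if len(record) != 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            yield record


def partition_pairs(r1_path, r2_path, out_dir, ways):
    """Deal read pairs round-robin into `ways` chunks, keeping mates in sync."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical output naming: chunk0_R1.fastq.gz, chunk0_R2.fastq.gz, ...
    names = [
        (out_dir / f"chunk{i}_R1.fastq.gz", out_dir / f"chunk{i}_R2.fastq.gz")
        for i in range(ways)
    ]
    outs = [(gzip.open(a, "wt"), gzip.open(b, "wt")) for a, b in names]
    try:
        pairs = zip(read_fastq(r1_path), read_fastq(r2_path))
        for index, (rec1, rec2) in enumerate(pairs):
            out1, out2 = outs[index % ways]
            out1.writelines(rec1)
            out2.writelines(rec2)
    finally:
        for out1, out2 in outs:
            out1.close()
            out2.close()
    return names
```

Each returned pair of chunk files can then be passed as in1/in2 to a separate merge run, and the merged outputs concatenated afterwards.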
Comment
-
Can we merge two forward reads with this tool?
Hi Brian,
I am really new to bioinformatics data analysis and just found this wonderful tool. I have a question: I have several environmental samples (A, B, and C). I sequenced them (shotgun metagenomics sequencing; paired-end) and found that the sequencing depth for sample B is not high enough, so I asked the sequencing center to sequence sample B again. In the end, I got two sets of sequencing results for sample B: B.R1, B.R2, B.2nd.R1, and B.2nd.R2. For my downstream analysis (e.g., co-assembly), do you think I should merge B.R1 and B.2nd.R1 first? If so, how can I use BBMerge to do that? Based on my understanding, BBMerge is designed to merge R1 and R2. Can it be used to merge two sets of R1s (from two separate sequencing runs)? Or is that merging even necessary?
Thanks a lot!
Yours,
Comment
-
If you have two separate sequencing runs, you can't "merge" the reads from one run with the reads from the other, since they were not sequenced from the same fragment. The reason you can (in some cases) merge R1 and R2 into a longer representation is that both reads come from the same fragment.
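If the goal is simply to pool both runs of sample B before co-assembly, a common approach is to concatenate the files rather than merge reads. The sketch below relies on the fact that gzip files appended byte-for-byte still form a valid gzip file; the function name `concatenate_runs` and the `.fastq.gz` file names in the usage note are assumptions for illustration.

```python
import shutil


def concatenate_runs(inputs, output):
    """Append files byte-for-byte; concatenated gzip members stay a valid gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `concatenate_runs(["B.R1.fastq.gz", "B.2nd.R1.fastq.gz"], "B.all.R1.fastq.gz")`, followed by the same call for the R2 files with the runs listed in the same order, so that the mates stay paired.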
Comment
-
Hi Brian,
I'm trying to use your BBMerge program on my trimmed miRNA PE reads, but I am getting a very low merge rate. I looked at the files that had sequences unable to merge to try to understand what the problem could be, but I'm confused because there were sequences that match and could have been merged. (Please refer to the below comparison of the R1 and R2 sequences from the unmergeable files.) Could you provide some insight as to why this might be happening?
[login001: ~]$ head mirna4Merged/14343_003_R1_fastx_trimmer_NOT_MERGED_output.fastq
@A00672:72:HNTG5DSX2:4:1101:24198:13369 1:N:0:AAGTACAG
TTCAAGTAATCCAGGATAGGC
+
FFFFFFFFFFFFFFFFFFFFF
@A00672:72:HNTG5DSX2:4:1101:24795:13369 1:N:0:AAGTACAG
TGAGGTAGTAGGTTGTGTGGTTT
+
FFFFFFFFFFFFFFFFFFFFFFF
@A00672:72:HNTG5DSX2:4:1101:29351:13369 1:N:0:AAGTACAG
TATTGCACTCGTCCCGGCCTCC
[login001: ~]$
[login001: ~]$
[login001: ~]$
[login001: ~]$ head mirna4Merged/14343_003_R2_fastx_trimmer_NOT_MERGED_output.fastq
@A00672:72:HNTG5DSX2:4:1101:24198:13369 2:N:0:AAGTACAG
GCCTATCCTGGATTACTTGAA
+
FFFFFFFFFFFFFFFFFFFFF
@A00672:72:HNTG5DSX2:4:1101:24795:13369 2:N:0:AAGTACAG
AAACCACACAACCTACTACCTCA
+
FFFFFFFFFFFFFFFFFFFFFFF
@A00672:72:HNTG5DSX2:4:1101:29351:13369 2:N:0:AAGTACAG
GGAGGCCGGGACGAGTGCAATA
Thank you for your time!
Emily Shrimpton
Edit: RESOLVED
Last edited by Emily Shrimpton; 02-10-2023, 08:40 PM.
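For anyone checking similar data, the short sketch below confirms that each R2 shown above is the exact reverse complement of its R1, so the pairs overlap over their full length of 21 to 23 bases. The thread does not record what resolved the issue; with reads this short, length-related settings such as the mininsert parameter seen in the command earlier in this thread are one thing worth checking, but that is a guess rather than a confirmed cause.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# R1/R2 sequences copied from the unmerged files quoted above.
pairs = [
    ("TTCAAGTAATCCAGGATAGGC", "GCCTATCCTGGATTACTTGAA"),
    ("TGAGGTAGTAGGTTGTGTGGTTT", "AAACCACACAACCTACTACCTCA"),
    ("TATTGCACTCGTCCCGGCCTCC", "GGAGGCCGGGACGAGTGCAATA"),
]

for r1, r2 in pairs:
    # True when the two reads cover exactly the same fragment end to end.
    print(len(r1), reverse_complement(r2) == r1)
```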
Comment
-
Hi all --
I'm having trouble with BBMerge, from 38.90...
Given two fastq reads, adapter trimmed, pre-merge:
Code:
r1:
@M08540:24:000000000-KVP7W:1:1101:14796:2333 1:N:0:AGCGATAG+TAATCTTA
TACTTTGCGAGATGCCCTAAGCTGGCGGGACTCTGGGGTTCGCGACACTGGCAGAGCATTACGCCCTGCAGGTAATACGACTCACTATAGGGGATAGATGTCCACGAggtctctATCATGCGGCTTTTAACAATCGTACGCTGCAGGTCGACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGATAG
+
CCCCCFFFCCCCGGGGGGGGGGHHHGGGGGGHHHHHHGEEHGGGGGGGFHHHHGGHHHHHHHGGGGGHHHHHHHHHHHHFEGGHHHHHHHHHGGGHHHHHHHHHHGGGGGHHHHHHHHHHHGGGGGHHHHHHHHHGHHHGGGGGHHHHHGGGGGGGGGGGGFGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFF
r2:
@M08540:24:000000000-KVP7W:1:1101:14796:2333 2:N:0:AGCGATAG+TAATCTTA
GTCGACCTGCAGCGTACGATTGTTAAAAGCCGCATGATAGAGACCTCGTGGACATCTATCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATGCTCTGCCAGTGTCGCGAACCCCAGAGTCCCGCCAGCTTAGGGCATCTCGCAAAGTAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTTAAGATTAGTGTAGATCTCGGGGGTAGCTGGAGCATTAAAAAAGAAAAAGAGATGAGAGTAGAAG
+
BBBB@DBBFFFFGGGGGGCGGGHHHHHHGHHGGGGGHHGHHHHHGHHGHHGGGHHHHHHHHHGHGHHGHHHHHHGGHGHHHGHHHHHHGGGGGGGHHGHHHHHHHHHHHGHGFGGGGGGGHHHHHHGGGGGGHHHHHHHGHHHHHGGGGHGFHHHHHHGGGGHHHGGGGGG.BCFGC:9CCFCFFGGGGFFFGGB0CBFB00;/;.@@D-.;//;.9.9:9//;///-.::/.....9/////::;////
Code:
r1: tactttgcgagatgccctaagctggcgggactctggggttcgcgacactggcagagcattacgccctgcaggtaatacgactcactataggggatagatgtccacgaggtctctatcatgcggcttttaacaatcgtacgctgcaggtcgacAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGATAG
r2(rc): CTTCTACTCTCATCTCTTTTTCTTTTTTAATGCTCCAGCTACCCCCGAGATCTACACTAATCTTAACACTCTTTCCCTACACGACGCTCTTCCGATCTtactttgcgagatgccctaagctggcgggactctggggttcgcgacactggcagagcattacgccctgcaggtaatacgactcactataggggatagatgtccacgaggtctctatcatgcggcttttaacaatcgtacgctgcaggtcgac
Code:
CTTCTACTCTCATCTCTTTTTCTTTTTTAATGCTCCAGCTACCCCCGAGATCTACACTAATCTTAACACTCTTTCCCTACACGACGCTCTTCCGATCTtactttgcgagatgccctaagctggcgggactctggggttcgcgacactggcagagcattacgccctgcaggtaatacgactcactataggggatagatgtccacgaggtctctatcatgcggcttttaacaatcgtacgctgcaggtcgacAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGCGATAG
qtrim=f qtrim2=f trimq=0 tbo=f tno=f usequality=f forcetrimleft=0 forcetrimright=0 forcetrimright2=0 forcetrimmod=0 forcemerge=t
produces this merged output:
Code:
>M08540:24:000000000-KVP7W:1:1101:14796:2333 1:N:0:AGCGATAG+TAATCTTA
TACTTTGCGAGATGCCCTAAGCTGGCGGGACTCTGGGGTTCGCGACACTGGCAGAGCATTACGCCCTGCA
GGTAATACGACTCACTATAGGGGATAGATGTCCACGAGGTCTCTATCATGCGGCTTTTAACAATCGTACG
CTGCAGGTCGAC
Are there settings I can use that will produce the desired output (a 292 bp sequence, with no trimming on either end)?
- Likes 1
Comment
-
Hi, I am new to programming and bioinformatics. I am having trouble merging my fastq files in a way that keeps the overhangs (the non-overlapping parts of the reads) at either end. I tried PEAR before, which gave me a final consensus covering only the overlapping region. I tried BBMerge and it gave me the same thing. I have not been able to figure out what script I should use to get a final consensus that includes the overhangs at both ends, so that I have the full consensus sequence. For example, I have the following fastq reads. Could any expert help me with a custom script for this? Thanks.
@S1_A01_015_F
ATTTATTTTTGGTGCTTTTTCTGGTGTAGTAGGAACTACATTATCTGTTTTAATTAGAATGGAATTAGCACAACCCGGTAATCAAATTTTTGCTGGGAATCATCATTTATATAATGTTGTTGTTACAGCACATGCATTTATTATGATTTTTTTTATGGTTATGCCTGTTTTAATAGGTGGTTTTGGTAATTGGTTTGTACCTTTAATGATTGGTGCTCCAGATATGGCTTTTCCTCGTATGAATAATATAAGTTTTTGGTTATTACCACCATCATTATTATTATTAGTTTCTTCAGCTATTGTTGAATCAGGTGCAGGTACTGGTTGGACTGTATATCCTCCTTTATCAAGTGTACAAGCACATTCAGGTCCTTCAGTAGATTTAGCTATTTTTAGTTTACATTTATCAGGTATTTCTTCTTTATTAGGTGCTATTAATTTTTTATCTACTATTTATAATATGAGAGCTCCAGGTTTAAGTTTTCATAGATTACCTTTATTTGTTTGGGCTATATTTATTACTGCTTTTTTATTATTATTAACTTTACCTGTATTAGCTGGTGCAATTACTATGTTATTAACTGATAGAAACTTAAATACATCTTTTTACGATCCATCAGGCGGAGGAGATCCTGTATTATACCAACATTTATTTTGGTTTTTCGGCAACCCCGGAAG
+
9>*%**ROOAB*,78K[[[[W>W:0G6J@RP_J__Y_TPK_W_MRP\\__\\__\\\_W;W___\_\W___\__________\___W____\\_W____\___\_________WK__RW____\__\__________RKWWW__RW__W_WRW________________WWWW_\YW___WWWW______WW_WWW_____W_________________\S______LRRW_____Y____________\_W_________________________________\__________________W________________________________WRRWWW___________________________________________________\_____________________________________________RRR_____________________O_____________________________________________________[[[[R[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[WW[[[[[[[[[[[[[[[[[[[[[R[[[[[[W[T[[[[[[[[[[[[R[[[[[[W[[R[[[MQ[[SO[W[LJB[[HMLWRT[[UPRSSSSJ=ISLM[ISIHKSS>@[IDD''&&&&*<A5-*((5
@S1_A01_015_R
CTCCTCGCCTGATGGATCGTAAAAGATGTATTTAAGTTTCTATCAGTTAATAACATAGTAATTGCACCAGCTAATACAGGTAAAGTTAATAATAATAAAAAAGCAGTAATAAATATAGCCCAAACAAATAAAGGTAATCTATGAAAACTTAAACCTGGAGCTCTCATATTATAAATAGTAGATAAAAAATTAATAGCACCTAATAAAGAAGAAATACCTGATAAATGTAAACTAAAAATAGCTAAATCTACTGAAGGACCTGAATGTGCTTGTACACTTGATAAAGGAGGATATACAGTCCAACCAGTACCTGCACCTGATTCAACAATAGCTGAAGAAACTAATAATAATAATGATGGTGGTAATAACCAAAAACTTATATTATTCATACGAGGAAAAGCCATATCTGGAGCACCAATCATTAAAGGTACAAACCAATTACCAAAACCACCTATTAAAACAGGCATAACCATAAAAAAAATCATAATAAATGCATGTGCTGTAACAACAACATTATATAAATGATGATTCCCAGCAAAAATTTGATTACCGGGTTGTGCTAATTCCATTCTAATTAAAACAGATAATGTAGTTCCTACTACACCAGAAAAAGCACCAAAAATTAAATATAAAGTACCTATGTCTTTATGATTTGTTGAAAA
+
B*+'2+(0C2W:0[C4'L4(*'*)/=1:?HJ[J<=7____HH_GJI_FS_Y__PP_R___TN_\\T_\M\_\_\\_\OY__\_\_______RR__\_________\__W______\OSYYW_Y___\\\\YCLW_____W_\WW\______\\\\Y_YWYWW___Y\_______W_\SAG<KQY_WRW_____YWQWY\_O\\EYYQN_WC_\Y___\Y_\\__\___\\\_\_\_____Y__\_\__\__YWW_Y_\__\_WQY_Y_Y___\\_\_Y__KYYWLLW_SWWO\___OY\_\Y____\_______\____\__________YYLLWQE_\\___R_____\\\YMYYOOY___________YYYYY\\OOY\\\________N_SRFG__Y\\YYY_YORR_Y__Y__YR___YYYOY__________Y____LYY_OO_Y_____L_Y___YY_________O______________YYLJOYH?=OL___________MU_U___________[NUNMOUURWOQUUUPIIW[OMWOU7?OUUUPWU[[[UUN[WNLW[[WWUWNMU[[U[U[UDLC[QSHRWRSPWQMPQS[VTS[JJPPS[TTTT[[[PQR[[[[[RRJHMRTQQRR[W[R[[LQQSW@MLA9J1QQKL
Could any expert provide a custom script to merge these two fastq files so that I get the full-length consensus?
I used the following command:
# Construct the BBMerge command with options to not trim the overhangs
bbmerge_cmd = f'bbmerge.sh in1={forward_file} in2={reverse_file} out={output_file} outu={output_file}_unmerged.fastq ' \
f'qtrim=f qtrim2=f trimq=0 tbo=f tno=f usequality=f forcetrimleft=0 forcetrimright=0 forcetrimright2=0 forcetrimmod=0'
But it still trimmed the result to the overlapping region, giving the output below:
@S1_A01_015
TTTAATTTTTGGTGCTTTTTCTGGTGTAGTAGGAACTACATTATCTGTTTTAATTAGAATGGAATTAGCACAACCCGGTAATCAAATTTTTGCTGGGAATCATCATTTATATAATGTTGTTGTTACAGCACATGCATTTATTATGATTTTTTTTATGGTTATGCCTGTTTTAATAGGTGGTTTTGGTAATTGGTTTGTACCTTTAATGATTGGTGCTCCAGATATGGCTTTTCCTCGTATGAATAATATAAGTTTTTGGTTATTACCACCATCATTATTATTATTAGTTTCTTCAGCTATTGTTGAATCAGGTGCAGGTACTGGTTGGACTGTATATCCTCCTTTATCAAGTGTACAAGCACATTCAGGTCCTTCAGTAGATTTAGCTATTTTTAGTTTACATTTATCAGGTATTTCTTCTTTATTAGGTGCTATTAATTTTTTATCTACTATTTATAATATGAGAGCTCCAGGTTTAAGTTTTCATAGATTACCTTTATTTGTTTGGGCTATATTTATTACTGCTTTTTTATTATTATTAACTTTACCTGTATTAGCTGGTGCAATTACTATGTTATTAACTGATAGAAACTTAAATACATCTTTTTANGATCCANCAGGCGGAGG
But I wanted the consensus to be:
TTTTCAACAAATCATAAAGACATAGGTACTTTATATTTAATTTTTGGTGCTTTTTCTGGTGTAGTAGGAACTACATTATCTGTTTTAATTAGAATGGAATTAGCACAACCCGGTAATCAAATTTTTGCTGGGAATCATCATTTATATAATGTTGTTGTTACAGCACATGCATTTATTATGATTTTTTTTATGGTTATGCCTGTTTTAATAGGTGGTTTTGGTAATTGGTTTGTACCTTTAATGATTGGTGCTCCAGATATGGCTTTTCCTCGTATGAATAATATAAGTTTTTGGTTATTACCACCATCATTATTATTATTAGTTTCTTCAGCTATTGTTGAATCAGGTGCAGGTACTGGTTGGACTGTATATCCTCCTTTATCAAGTGTACAAGCACATTCAGGTCCTTCAGTAGATTTAGCTATTTTTAGTTTACATTTATCAGGTATTTCTTCTTTATTAGGTGCTATTAATTTTTTATCTACTATTTATAATATGAGAGCTCCAGGTTTAAGTTTTCATAGATTACCTTTATTTGTTTGGGCTATATTTATTACTGCTTTTTTATTATTATTAACTTTACCTGTATTAGCTGGTGCAATTACTATGTTATTAACTGATAGAAACTTAAATACATCTTTTTANGATCCANCAGGCGGAGGATCCTGTATTATACCAACATTTATTTTGGTTTTTCGGCAACCCCGGAAG
Last edited by ghimbikal; 02-21-2024, 05:54 AM.
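A minimal sketch of the kind of custom script asked for here: it reverse-complements R2, finds the best ungapped overlap with R1, and writes out the union of the two reads so the overhangs on both ends are kept. This is not what BBMerge or PEAR do internally. The function names, the scoring rule, and the defaults `min_overlap=20` and `max_mismatch_rate=0.1` are assumptions chosen for illustration; it ignores quality scores and cannot handle insertions or deletions, so noisy read ends may need a proper aligner instead.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_offset(a, b, min_overlap=20, max_mismatch_rate=0.1):
    """Return where b starts relative to a for the best ungapped overlap, or None."""
    best = None
    for offset in range(-(len(b) - min_overlap), len(a) - min_overlap + 1):
        a_start, b_start = max(0, offset), max(0, -offset)
        length = min(len(a) - a_start, len(b) - b_start)
        if length < min_overlap:
            continue
        mismatches = sum(
            x != y
            for x, y in zip(a[a_start:a_start + length], b[b_start:b_start + length])
        )
        if mismatches > max_mismatch_rate * length:
            continue
        score = length - 2 * mismatches  # favor long overlaps with few mismatches
        if best is None or score > best[0]:
            best = (score, offset)
    return None if best is None else best[1]


def merge_with_overhangs(r1, r2, **kwargs):
    """Merge R1 with reverse-complemented R2, keeping the non-overlapping ends."""
    r2rc = reverse_complement(r2)
    offset = best_offset(r1, r2rc, **kwargs)
    if offset is None:
        return None
    # Positions are relative to the start of R1; r2rc starts at `offset`.
    merged = []
    for pos in range(min(0, offset), max(len(r1), offset + len(r2rc))):
        base1 = r1[pos] if 0 <= pos < len(r1) else None
        j = pos - offset
        base2 = r2rc[j] if 0 <= j < len(r2rc) else None
        if base1 is None:
            merged.append(base2)
        elif base2 is None or base1 == base2:
            merged.append(base1)
        else:
            merged.append("N")  # disagreement; a real tool would weigh qualities
    return "".join(merged)
```

Calling `merge_with_overhangs(r1_sequence, r2_sequence)` returns the full-length sequence, or `None` when no overlap passes the thresholds. Disagreeing positions come out as `N` here, which is a deliberately simple choice for a sketch.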
- Likes 1
Comment
-
I tripped over something to get around the merging problem in bbmerge described above...
Merging fastq reads with bbmerge does trim the merge, resulting in only retaining the overlapping part, however -- if I first turn the fastq reads into fasta reads and interleave them, it produces the desired output:
Code:
Checking output:
First two reads from interleaved file:
>LH00315.67.22H3LVLT3.1.1101.5558.1048.R1 1:N:0:ACTTCCGG+GACCAATT
GGGACTGTGGCCGTCGACCTGCAGCGTACGGGAATACCTGTTGATATTTAAGAGACCTCGTGGACATCTA
TCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATG
>LH00315.67.22H3LVLT3.1.1101.5558.1048.R2 2:N:0:ACTTCCGG+GACCAATT
TCGGACCAACTAAGCTGGCGGGACTCTGGGGTTCGCGACACTGGCAGAGCATTACGCCCTGCAGGTAATA
CGACTCACTATAGGGGATAGATGTCCACGAGGTCTCTT
>2nd read RC
AAGAGACCTCGTGGACATCTATCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATGCTCTGCCAGTGT
CGCGAACCCCAGAGTCCCGCCAGCTTAGTTGGTCCGA
Blast alignment:
Query 50 AAGAGACCTCGTGGACATCTATCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATG 108
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 108 AAGAGACCTCGTGGACATCTATCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATG 50
Manual alignment:
GGGACTGTGGCCGTCGACCTGCAGCGTACGGGAATACCTGTTGATATTTaagagacctcgtggacatctatcccctatagtgagtcgtattacctgcagggcgtaatg
aagagacctcgtggacatctatcccctatagtgagtcgtattacctgcagggcgtaatgCTCTGCCAGTGTCGCGAACCCCAGAGTCCCGCCAGCTTAGTTGGTCCGA
Expected sequence (157 bp):
GGGACTGTGGCCGTCGACCTGCAGCGTACGGGAATACCTGTTGATATTTaagagacctcgtggacatctatcccctatagtgagtcgtattacctgcagggcgtaatgCTCTGCCAGTGTCGCGAACCCCAGAGTCCCGCCAGCTTAGTTGGTCCGA
BBMerge output (157 bp):
>LH00315.67.22H3LVLT3.1.1101.5558.1048.R1 1:N:0:ACTTCCGG+GACCAATT
GGGACTGTGGCCGTCGACCTGCAGCGTACGGGAATACCTGTTGATATTTAAGAGACCTCGTGGACATCTA
TCCCCTATAGTGAGTCGTATTACCTGCAGGGCGTAATGCTCTGCCAGTGTCGCGAACCCCAGAGTCCCGC
CAGCTTAGTTGGTCCGA
I'm not sure whether combining the R1 and R2 fastq files into a single interleaved file before running BBMerge, or producing two fasta files rather than one interleaved fasta file, would have the same effect.
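For anyone wanting to try the same conversion, here is a small sketch that turns paired FASTQ streams into one interleaved FASTA stream. The function names are made up for illustration, and it assumes four lines per FASTQ record with mates in the same order. Whether the FASTA format itself is what changes the result is not established in this thread. One difference worth noting: in the example above each read is 108 bases and the overlap is 59 bases, so the 157 bp merge is longer than either read, whereas in the earlier 292 bp example the overlapping region (152 bases) is shorter than the reads.

```python
import itertools


def fastq_records(handle):
    """Yield (name, sequence) from a FASTQ stream with four lines per record."""
    while True:
        lines = list(itertools.islice(handle, 4))
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        yield lines[0][1:].rstrip("\n"), lines[1].rstrip("\n")


def interleave_as_fasta(r1_handle, r2_handle, out_handle):
    """Write R1/R2 mates alternately as FASTA records, dropping quality scores."""
    count = 0
    pairs = zip(fastq_records(r1_handle), fastq_records(r2_handle))
    for (name1, seq1), (name2, seq2) in pairs:
        out_handle.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        count += 1
    return count
```

The handles can be ordinary text files opened with `open(...)`, or `gzip.open(..., "rt")` for compressed input; the return value is the number of pairs written.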
Comment