Introducing BBMerge: A paired-end read merger


  • The BBMerge guide recommends trimming adapters before merging, but elsewhere it recommends providing the adapter sequences to BBMerge. Which is better?



    • Program ran out of memory on large dataset: Need some tips

      Hi folks,

      We have a shotgun metagenomic dataset (approx. 120 GB compressed). I want to merge the paired-end reads, since longer reads should improve assembly performance. I tried this on a small subset of the data, and it markedly increased N50 and scaffold length.

      Now I want to merge the full dataset (approx. 120 GB compressed) for subsequent assembly. We have a system with 32 threads and 120 GB of memory. After going through the tips on the BBTools page, I tried the following command and ran out of memory (Error message: This program ran out of memory.
      Try increasing the -Xmx flag and using tool-specific memory-related parameters).

      bbmerge-auto.sh in1=in_R1.fastq.gz in2=in_R2.fastq.gz out=merged.fastq.gz outu1=1_um.fastq.gz outu2=2_um.fastq.gz outa=adapters.txt ihist=insert_histogram.txt k=62 vstrict rem extend2=50 ecct mininsert=150 -Xmx80g minprob=0.8 prefilter=2 prealloc ziplevel=5

      My questions are:

      1. Are there any other specific parameters that would make this command manageable on a server with the configuration described above?

      2. Can I subset the data using the partition.sh BBTools wrapper and then run the command on each subset? As I understand it, sub-setting the data will reduce the number of reads that merge. Is that true?

      Any tips/advice in this case is appreciated.

      Thanks



      • @Shriram369: As long as the reads are in the proper order in your files, it would be fine to sub-set the data into manageable chunks and then do the merging.
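
A minimal sketch of what "in proper order" means when chunking by hand: R1 and R2 are read in lockstep, so every chunk pair holds the same reads in the same order. The function name `split_paired_fastq`, the output naming scheme, and the chunk size are hypothetical choices for illustration, and the sketch assumes plain 4-line FASTQ records in gzipped files. It is not how partition.sh works internally, only an illustration of the ordering requirement.

```python
import gzip
from itertools import islice


def read_name(header):
    """Return the read name from a FASTQ header, without a trailing /1 or /2."""
    fields = header.split()
    name = fields[0] if fields else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def split_paired_fastq(r1_path, r2_path, out_prefix, pairs_per_chunk):
    """Split gzipped R1/R2 FASTQ files into numbered chunk pairs, in lockstep.

    Each chunk is held in memory while it is written, so keep
    pairs_per_chunk modest. Returns a list of (r1_chunk, r2_chunk) paths.
    """
    lines_per_chunk = 4 * pairs_per_chunk  # assumes 4 lines per record
    chunk_paths = []
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        index = 0
        while True:
            block1 = list(islice(r1, lines_per_chunk))
            block2 = list(islice(r2, lines_per_chunk))
            if not block1 and not block2:
                break
            if len(block1) != len(block2):
                raise ValueError("R1 and R2 contain different numbers of reads")
            # Header lines are every 4th line; the names must agree pairwise.
            for h1, h2 in zip(block1[0::4], block2[0::4]):
                if read_name(h1) != read_name(h2):
                    raise ValueError(f"read order differs: {h1.strip()} vs {h2.strip()}")
            out1 = f"{out_prefix}_{index:04d}_R1.fastq.gz"
            out2 = f"{out_prefix}_{index:04d}_R2.fastq.gz"
            with gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
                o1.writelines(block1)
                o2.writelines(block2)
            chunk_paths.append((out1, out2))
            index += 1
    return chunk_paths
```

Each returned pair of chunk files could then be passed to the merging command as `in1=` and `in2=`. If the name check raises an error, the input files are not in matching order and should be repaired before any merging is attempted.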



        • Can we merge two forward reads with this tool?

          Hi Brian,

          I am really new to bioinformatics data analysis and just found this wonderful tool. I have a question: I have several environmental samples (A, B, and C). I sequenced them (shotgun metagenomics sequencing; paired-end) and found that the sequencing depth for sample B was not high enough, so I asked the sequencing center to sequence sample B again. In the end, I got two sets of sequencing results for sample B: B.R1, B.R2, B.2nd.R1, and B.2nd.R2. For my downstream analysis (e.g., co-assembly), do you think I should merge B.R1 and B.2nd.R1 first? If so, how can I use BBMerge to do that? Based on my understanding, BBMerge is designed to merge R1 and R2. Can it be used to merge two sets of R1s (from two separate sequencing runs)? Or is that merging even necessary?

          Thanks a lot!

          Yours,



          • If you have two separate sequencing runs, you can't "merge" the reads from them, since they were not sequenced from the same fragment. The reason you can (in some cases) merge R1 and R2 into a longer sequence is that both reads come from the same fragment.
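
Pooling two runs of the same sample is a different operation from merging read pairs. One common approach, offered here as an assumption and not something stated in the reply above, is to concatenate the R1 files with each other and the R2 files with each other, in the same run order, so that pairs stay aligned. The sketch below does this at the byte level; the function name `concatenate_files` and the `.fastq.gz` file names in the usage comment are hypothetical. Concatenated gzip files remain a valid gzip stream, so no decompression is needed.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append the input files to output_path, byte for byte, in the given order."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Hypothetical usage: keep the run order identical for R1 and R2.
# concatenate_files(["B.R1.fastq.gz", "B.2nd.R1.fastq.gz"], "B.pooled.R1.fastq.gz")
# concatenate_files(["B.R2.fastq.gz", "B.2nd.R2.fastq.gz"], "B.pooled.R2.fastq.gz")
```

The pooled R1 and R2 files can then be treated like any other paired-end input.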



            • Hi Brian,

              I'm trying to use your BBMerge program on my trimmed miRNA PE reads, but I am getting a very low merge rate. I looked at the files containing the reads that failed to merge to try to understand what the problem could be, but I'm confused because they include pairs whose sequences match and could have been merged. (Please refer to the comparison below of the R1 and R2 sequences from the unmerged files.) Could you provide some insight as to why this might be happening?

              [login001: ~]$ head mirna4Merged/14343_003_R1_fastx_trimmer_NOT_MERGED_output.fastq
              @A00672:72:HNTG5DSX2:4:1101:24198:13369 1:N:0:AAGTACAG
              TTCAAGTAATCCAGGATAGGC
              +
              FFFFFFFFFFFFFFFFFFFFF
              @A00672:72:HNTG5DSX2:4:1101:24795:13369 1:N:0:AAGTACAG
              TGAGGTAGTAGGTTGTGTGGTTT
              +
              FFFFFFFFFFFFFFFFFFFFFFF
              @A00672:72:HNTG5DSX2:4:1101:29351:13369 1:N:0:AAGTACAG
              TATTGCACTCGTCCCGGCCTCC
              [login001: ~]$
              [login001: ~]$
              [login001: ~]$
              [login001: ~]$ head mirna4Merged/14343_003_R2_fastx_trimmer_NOT_MERGED_output.fastq
              @A00672:72:HNTG5DSX2:4:1101:24198:13369 2:N:0:AAGTACAG
              GCCTATCCTGGATTACTTGAA
              +
              FFFFFFFFFFFFFFFFFFFFF
              @A00672:72:HNTG5DSX2:4:1101:24795:13369 2:N:0:AAGTACAG
              AAACCACACAACCTACTACCTCA
              +
              FFFFFFFFFFFFFFFFFFFFFFF
              @A00672:72:HNTG5DSX2:4:1101:29351:13369 2:N:0:AAGTACAG
              GGAGGCCGGGACGAGTGCAATA

              Thank you for your time!
              Emily Shrimpton
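
A quick check on the three pairs quoted above shows that each R2 is the exact reverse complement of its R1, meaning the two reads overlap along their entire length and the insert is only as long as the read itself (21 to 23 nt). That points away from sequence mismatches as the cause; a minimum insert size or minimum overlap setting would be one thing to check, though that is an assumption and the thread does not establish which parameter is responsible. The helper names below are hypothetical.

```python
# Translation table for DNA bases, including N and lowercase letters.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_full_overlap(r1_seq, r2_seq):
    """True when R2 is exactly the reverse complement of R1."""
    return reverse_complement(r1_seq) == r2_seq


# The three unmerged pairs copied from the post above.
pairs = [
    ("TTCAAGTAATCCAGGATAGGC", "GCCTATCCTGGATTACTTGAA"),
    ("TGAGGTAGTAGGTTGTGTGGTTT", "AAACCACACAACCTACTACCTCA"),
    ("TATTGCACTCGTCCCGGCCTCC", "GGAGGCCGGGACGAGTGCAATA"),
]

for r1, r2 in pairs:
    print(len(r1), is_full_overlap(r1, r2))
# prints:
# 21 True
# 23 True
# 22 True
```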

