  • DNA Sorcerer
    Member
    • Mar 2010
    • 24

    #61
    testformat says: illumina fastq raw single-ended 108bp

    As far as I remember this was a HiSeq run.

    I tried the reformat line suggested by Brian, but the process stops after a while with errors, apparently because it runs out of memory. I will try to fix that and run it again.
    Hi there

    Comment

    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #62
      Hmm... can you post the errors? Reformat by default uses very little memory, which is all that should be needed for a correctly-formatted file containing reads. It will run out of memory if you use it without increasing the default memory allocation on extremely long sequences (over tens of megabases) such as the human genome. It will never run out of memory on a correctly-formatted Illumina fastq file.

      So, it would also be helpful if you could post the results of "head" (the first 10 lines of the file).

      Comment

      • DNA Sorcerer
        Member
        • Mar 2010
        • 24

        #63
        See below. I ran it on only one of the files because doing both would exceed my storage quota.

        java -da -Xmx200m -cp /home/cslamovi/CLARKSCV1.2.2-b/bbmap/current/ jgi.ReformatReads -da ibq qin=33 in=scratch/s_3_1_sequence.fastq out=scratch/fixed_1.fq
        Executing jgi.ReformatReads [-da, ibq, qin=33, in=scratch/s_3_1_sequence.fastq, out=scratch/fixed_1.fq]

        Input is being processed as unpaired
        java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at fileIO.ByteFile1.fillBuffer(ByteFile1.java:180)
        at fileIO.ByteFile1.nextLine(ByteFile1.java:136)
        at stream.FASTQ.toReadList(FASTQ.java:648)
        at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
        at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
        at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
        at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
        Input: 32775200 reads 3539721600 bases
        Output: 32775200 reads (100.00%) 3539721600 bases (100.00%)

        Time: 807.191 seconds.
        Reads Processed: 32775k 40.60k reads/sec
        Bases Processed: 3539m 4.39m bases/sec
        Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
        at jgi.ReformatReads.process(ReformatReads.java:1032)
        at jgi.ReformatReads.main(ReformatReads.java:45)
        Hi there

        Comment

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #64
          Looks like you only ran this with 200MB of RAM. Can you try with -Xmx2g?

          How old is this data BTW (in years)?

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #65
            Reformat should never run out of memory with the default settings and short (<200kbp) reads. I think the input file is corrupt, and should be re-downloaded. The corruption probably occurs somewhere around the 32.77 millionth read, but it's hard to be sure...
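To narrow down where a file stops being valid FASTQ, something like the following could scan for the first malformed record. This is only a sketch: the function name is hypothetical, and it assumes plain four-line records (no wrapped sequences).

```python
def first_bad_record(lines):
    """Return (record_number, reason) for the first malformed 4-line FASTQ
    record, or None if every record looks well-formed."""
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        count += 1
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@"):
            return count, "header does not start with '@'"
        if not plus.startswith("+"):
            return count, "third line does not start with '+'"
        if len(seq) != len(qual):
            return count, "sequence and quality lengths differ"
    if record:
        # Fewer than four lines were left over at the end of the input.
        return count + 1, "file ends in the middle of a record"
    return None
```

It accepts any iterable of lines, so an open file handle works directly. Note that a corrupt region with no line breaks would still be read as one very long line, so this check can itself be slow or memory-hungry on such a file.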

            Comment

            • FridaJoh
              Junior Member
              • Jan 2016
              • 1

              #66
              Hi

              I came across this when searching for a way to demultiplex non-overlapping paired-end reads that were sequenced using combinatorial barcodes. I don't suppose there is a way of doing that using Seal (or other tools)?

              Originally posted by Brian Bushnell
              It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

              seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

              That would require a file "barcodes.fa" like this:
              >AACTGA
              AACTGA
              >GGCCTT
              GGCCTT

              etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?

              However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.
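A "barcodes.fa" in the layout quoted above, where each entry is named after its own sequence, could be generated from a plain list of barcodes. A minimal sketch, with a hypothetical function name and the two barcodes from the example as input:

```python
def barcodes_to_fasta(barcodes):
    """Format each barcode as a FASTA entry whose name is the barcode itself."""
    entries = []
    for barcode in barcodes:
        barcode = barcode.strip()
        if barcode:  # skip blank lines in the input list
            entries.append(f">{barcode}\n{barcode}\n")
    return "".join(entries)


# Example: the two barcodes shown in the quoted post.
fasta_text = barcodes_to_fasta(["AACTGA", "GGCCTT"])
```

Writing `fasta_text` to a file named barcodes.fa would give the reference file that the quoted seal.sh command expects.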

              Comment

              • mcauchy
                Junior Member
                • Oct 2015
                • 5

                #67
                Newbie here! I have unzipped and untarred bbmap, but it won't run any commands. I am running Linux in VirtualBox on Windows 10. Am I missing some software needed to use BBMap?

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #68
                  Originally posted by mcauchy
                  Newbie here! I have unzipped and untarred bbmap, but it won't run any commands. I am running Linux in VirtualBox on Windows 10. Am I missing some software needed to use BBMap?
                  What do you mean by "it won't run any commands"? Can you see the shell scripts in the "bbmap" folder? Change to the bbmap directory, then try the following command and see whether it prints help output on screen.

                  Code:
                  $ ./bbmap.sh

                  Comment

                  • mcauchy
                    Junior Member
                    • Oct 2015
                    • 5

                    #69
                    What I mean is I run:
                    $ ./repair.sh in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

                    ...and get:
                    java -ea -Xmx-211m -cp /media/sf_D_DRIVE/bbmap/current/ jgi.SplitPairsAndSingles rp in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
                    Invalid maximum heap size: -Xmx-211m
                    Error: Could not create the Java Virtual Machine.
                    Error: A fatal exception has occurred. Program will exit.

                    Comment

                    • GenoMax
                      Senior Member
                      • Feb 2008
                      • 7142

                      #70
                      How much memory have you allocated to the VM? You should have at least 2 GB so that enough is available for programs to run.

                      Comment

                      • mcauchy
                        Junior Member
                        • Oct 2015
                        • 5

                        #71
                        I have allocated 2.9 GB, which is all I have to give. It seems that is not enough. Thanks for your help.

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #72
                          Originally posted by mcauchy
                          I have allocated 2.9 GB, which is all I have to give. It seems that is not enough. Thanks for your help.
                          That may be true, but in case BBMap was not able to allocate RAM correctly, can you try running the command as follows:

                          Code:
                          $ ./repair.sh -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

                          Comment

                          • westerman
                            Rick Westerman
                            • Jun 2008
                            • 1104

                            #73
                            Also it seems to me that '-Xmx-211m' is odd. Why the negative 211? I am not sure that makes a difference but it might.
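A negative heap size suggests that the automatic memory detection came up with an unusable number on this VM (that is a guess; the thread does not confirm how the value was computed). One way around it is to choose an explicit value from what Linux reports as available. The sketch below is not how repair.sh does it; the function name and the 85 percent default are arbitrary assumptions.

```python
def suggest_xmx(meminfo_text, percent=85):
    """Return a -Xmx flag sized from the MemAvailable line (in kB) of
    /proc/meminfo-style text, refusing values that are not positive."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            available_kb = int(line.split()[1])
            # Integer arithmetic only, to avoid float rounding surprises.
            heap_mb = available_kb * percent // 100 // 1024
            if heap_mb <= 0:
                raise ValueError("no memory available for the Java heap")
            return f"-Xmx{heap_mb}m"
    raise ValueError("MemAvailable line not found")
```

On a Linux guest the input would be the contents of /proc/meminfo, and the returned flag can be passed to the shell script in the same way as -Xmx2g.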

                            Comment

                            • mcauchy
                              Junior Member
                              • Oct 2015
                              • 5

                              #74
                              Didn't run for very long....

                              $ ./repair.sh -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

                              java -ea -Xmx2g -cp /media/sf_D_DRIVE/bbmap/current/ jgi.SplitPairsAndSingles rp -Xmx2g in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
                              Executing jgi.SplitPairsAndSingles [rp, -Xmx2g, in1=/media/sf_D_DRIVE/champagnefastqs/srr1290816_1.fastq, in2=/media/sf_D_DRIVE/champagnefastqs/srr1290816_2.fastq, out1=fixed1.fq, out2=fixed2.fq, outsingle=single.fq]

                              Set INTERLEAVED to false
                              Started output stream.
                              Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
                              at java.util.HashMap.resize(HashMap.java:580)
                              at java.util.HashMap.addEntry(HashMap.java:879)
                              at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
                              at java.util.HashMap.put(HashMap.java:505)
                              at jgi.SplitPairsAndSingles.repair(SplitPairsAndSingles.java:751)
                              at jgi.SplitPairsAndSingles.process3_repair(SplitPairsAndSingles.java:538)
                              at jgi.SplitPairsAndSingles.process2(SplitPairsAndSingles.java:304)
                              at jgi.SplitPairsAndSingles.process(SplitPairsAndSingles.java:230)
                              at jgi.SplitPairsAndSingles.main(SplitPairsAndSingles.java:45)
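The trace shows the heap filling up during a HashMap.put call in the repair step. As a rough illustration only (this is not the BBTools code, and the function name is made up), re-pairing by read name can be sketched with a dictionary that holds reads until their mate turns up, which is why memory use can approach the size of a whole input file. Read names are assumed to be identical for both mates, with any /1 or /2 suffix already removed.

```python
def repair_pairs(reads1, reads2):
    """Re-pair two iterables of (name, sequence) records by read name.

    Returns (pairs, singletons). In this simplified version every read from
    the first input is held in memory until its mate is seen in the second.
    """
    waiting = dict(reads1)  # name -> sequence, for reads still lacking a mate
    pairs = []
    singletons = []
    for name, seq2 in reads2:
        seq1 = waiting.pop(name, None)
        if seq1 is None:
            singletons.append((name, seq2))    # no mate in the first input
        else:
            pairs.append((name, seq1, seq2))   # mate found, release it
    singletons.extend(waiting.items())         # leftovers from the first input
    return pairs, singletons
```

With tens of millions of reads per file, a structure like `waiting` would not fit in a 2 GB heap, which is consistent with the error above.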

                              Comment

                              • GenoMax
                                Senior Member
                                • Feb 2008
                                • 7142

                                #75
                                How about setting the VM aside and running BBMap directly on Windows 10? How much RAM is there on the machine? BBMap is written in Java and will run there, but you would need to adapt the command lines to Windows.
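Since the shell scripts only assemble a java command, the line that repair.sh printed earlier in the thread can be rebuilt by hand on Windows. A small sketch that assembles the same argument list follows; the function name and the example classpath are hypothetical, and the class name and "rp" argument are copied from the printed command.

```python
def build_repair_command(classpath, heap="2g", **params):
    """Build the java command that repair.sh printed, as an argument list."""
    command = ["java", "-ea", f"-Xmx{heap}", "-cp", classpath,
               "jgi.SplitPairsAndSingles", "rp"]
    # Each keyword becomes a key=value argument, e.g. in1=reads_1.fastq.
    command += [f"{key}={value}" for key, value in params.items()]
    return command


# Hypothetical Windows location of the unpacked bbmap "current" folder.
example = build_repair_command("C:\\bbmap\\current", in1="r1.fq", in2="r2.fq")
```

The resulting list can be handed to `subprocess.run(example, check=True)`, or joined with spaces and typed at the Windows command prompt.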

                                Comment
