Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • mjm
    Junior Member
    • Mar 2015
    • 4

    Jumping library

    Hi all,

    I'm trying to merge 2 genome assemblies using metassembler (http://sourceforge.net/projects/metassembler/).

    Unfortunately, I don't have jumping libraries for my assemblies but a colleague told me that I could "simulate" them consistently using the reads I have.
    As a computer scientist, I don't know much about jumping libs, and I have no idea on how to simulate them.

    The genome assemblies I have are in fasta format, and reads in fq.

    It would be very nice to get any help on the matter, even starting with references to get a better understanding of jumping libs (I've already read the wikipedia page several times ).

    Thanks a lot !

    mjm
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    You can simulate a 4000bp jump library from your existing data like this (using BBTools):

    cat assembly1.fa assembly2.fa > combined.fa
    bbmap.sh ref=combined.fa
    randomreads.sh reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect=1 q=35 out=jump.fq.gz


    That may not help much, since it is only recycling information that is already present in your assemblies, unlike a real jump library.

    The orientation will come out forward/reverse; if you need them in a different orientation you can reverse-complement one or both reads with the associated "reformat.sh" tool.
    Last edited by Brian Bushnell; 03-27-2015, 09:24 AM.

    Comment

    • mjm
      Junior Member
      • Mar 2015
      • 4

      #3
      Hi Brian,

      Thanks a lot for your answer!
      I'm trying to run BBMap but I get the following error message:
      ==
      mjm@mining:~/sequence_assembly/bbmap> bbmap.sh ref=A.fa
      java -Djava.library.path=/home/mjm/sequence_assembly/bbmap/jni/ -ea -Xmx39107m -cp /home/mjm/sequence_assembly/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=A.fa
      Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=A.fa]

      BBMap version 34.72
      Retaining first best site only for ambiguous mappings.
      No output file.
      NOTE: Deleting contents of ref/genome/1 because reference is specified and overwrite=true
      Writing reference.
      Executing dna.FastaToChromArrays2 [A.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

      Set genScaffoldInfo=true
      Writing chunk 1
      Set genome to 1

      Loaded Reference: 0.015 seconds.
      Loading index for chunk 1-1, build 1
      No index available; generating from reference genome: /home/mjm/sequence_assembly/bbmap/ref/index/1/chr1_index_k13_c5_b1.block
      Indexing threads started for block 0-1
      Indexing threads finished for block 0-1
      Generated Index: 9.051 seconds.
      No reads to process; quitting.

      Total time: 10.184 seconds.

      mjm@mining:~/sequence_assembly/bbmap> randomreads.sh reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect q=35 out=jump.fq.gz
      java -ea -Xmx39106m -cp /home/mjm/sequence_assembly/bbmap/current/ align2.RandomReads3 build=1 reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect q=35 out=jump.fq.gz
      Exception in thread "main" java.lang.NumberFormatException: For input string: "true"
      at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
      at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
      at java.lang.Float.parseFloat(Float.java:451)
      at align2.RandomReads3.main(RandomReads3.java:296)

      My config is as follows:
      mjm@mining:~/sequence_assembly/bbmap> uname -a
      Linux mining 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

      mjm@mining:~/sequence_assembly/bbmap> java -version
      java version "1.8.0_25"
      Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
      Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

      Any idea why it crashes?

      mjm

      Comment

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Oh, sorry - instead of "perfect" it should be "perfect=1". That's the fraction of reads that are generated with no errors, and for your usage, you don't want any errors.

        randomreads.sh reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect=1 q=35 out=jump.fq.gz

        Comment

        • mjm
          Junior Member
          • Mar 2015
          • 4

          #5
          Hi Brian,

          Thanks a lot for your answer. Now it works fine!
          I'm wondering if I could switch the quality values to a given numerical value?
          For instance, I'd like to replace:
          ====================
          @0_chr1_1_22737261_22737360_185068_scaffold_2
          TATCCAGGCGTTCCTCGATACGGAGACCGATGGCACCTGCAAGCGCAATGCATTCGCGGCCCTCATGTCCATCAGCCACCAAAAGGCCCTGGAGTACCTG
          +
          BGEBEFEHHHJJIGJJJIEJJGIJHGDHGGJJHJHJIHHJJGGJFGJJGIJGHHIIJJHJJHHHIJJHIDIJHHJJJFIGJEIIFHJJIIGIFHCCBBBC
          ====================

          by

          ====================
          @0_chr1_1_22737261_22737360_185068_scaffold_2
          TATCCAGGCGTTCCTCGATACGGAGACCGATGGCACCTGCAAGCGCAATGCATTCGCGGCCCTCATGTCCATCAGCCACCAAAAGGCCCTGGAGTACCTG
          +
          2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
          ====================

          I've tried bbmap.sh ref=A.fa fakefastaquality=2 but it doesn't seem to be the solution.

          Many thanks for your help,

          mjm

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            mjm,

            I'm not sure why you would want to do that. Are you aware that the symbol "2" indicates the quality value 17, which is fairly low?

            Anyway, you could do it like this, assuming your fake LMP reads are in "lmp.fq":

            reformat.sh in=lmp.fq out=lmp.fa
            reformat.sh in=lmp.fa out=fixed.fq qfake=17


            A better value for qfake on synthetic data would be something like 30.

            -Brian

            Comment

            • mjm
              Junior Member
              • Mar 2015
              • 4

              #7
              Hi Brian,

              Thanks for your solution.
              I'm still trying to use the jumping libs I created with bbmap for running matassembler, and I wanted the libs to be as close as possible to the samples provided by metassember.
              I'll update the thread once it works...

              mjm

              Comment

              • malm
                Junior Member
                • Jan 2015
                • 2

                #8
                Issues...

                Hello,

                I'm trying to run this exact pipeline on my genomes, and I run into this

                "java.lang.OutOfMemoryError: Java heap space" errors, even when using
                "-Xmx3200m"

                However, if i hit "ctrl+c" while it's still running (but not doing anything), it continues, and quite quickly produces the jump.fq.gz file.

                And the file consist of the right amount of reads, but I'm worried because I had to force it through, and also because all the quality values are all either just A's, B's....G's like this:

                @0_chr2_0_77814904_77815003_17861_flattened_line_6732
                AGGATCTCCAGCCGCTCCTCGCTCTCCTCCTGATGCTCTCTCTCGCTCCTCGCCAGCTGTGACGGCCGCGACCTGGCCTCCTTACGGGACATCTCCTGCC
                +
                FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

                Is this OK, or is there a way to fix this?

                I'm running it on a quite large cluster, so there should be no problem allocating the needed memory resources...

                Martin

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  So.... first off, that's really strange about the ctrl+c thing. Though I suppose that maybe you are allocating enough memory for the reference but not BBMap's index, which is not really necessary for randomreads.

                  How much memory is on the computer, and how big is the reference you are using?

                  I suggest deleting the "ref" folder, which may be corrupt, and then doing something like this (with the specific numbers adjusted as needed):

                  randomreads.sh ref=reference.fa -Xmx25g reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect=1 q=35 out=jump.fq.gz

                  As for the fixed quality values, you can remove "q=35" and reads will get random quality values, but that won't affect anything other than making the output file bigger due to worse compression. -Xmx25g will allocate 25 gigabytes of memory; that's probably way more than you need. It should require something like 1.5x the size of the reference file.

                  Please let me know if this still causes problems.

                  Comment

                  • malm
                    Junior Member
                    • Jan 2015
                    • 2

                    #10
                    Hi Brian!

                    That worked perfectly! I tried up to 10g, as I figured that that should be more than enough, but apparently 25g did the trick.

                    Thanks a lot!

                    Best regards,
                    Martin

                    Comment

                    Latest Articles

                    Collapse

                    • SEQadmin2
                      From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                      by SEQadmin2


                      Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                      The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                      ...
                      06-02-2026, 10:05 AM
                    • SEQadmin2
                      Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                      by SEQadmin2


                      With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                      Introduction

                      Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                      05-22-2026, 06:42 AM
                    • SEQadmin2
                      Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                      by SEQadmin2

                      Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                      Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                      05-06-2026, 09:04 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by SEQadmin2, Yesterday, 08:59 AM
                    0 responses
                    14 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 12:03 PM
                    0 responses
                    22 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 11:40 AM
                    0 responses
                    19 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 05-28-2026, 11:40 AM
                    0 responses
                    32 views
                    0 reactions
                    Last Post SEQadmin2  
                    Working...