  • bbmap aborts after mapping some reads

    Hello Brian,

    we are using bbmap to see how far it is possible to quantify gene expression by mapping Illumina RNA-seq reads to the genome of a closely related species, e.g. mapping chimpanzee reads to human or, as in this example, macaque reads.

    To this end, we generated macaque Illumina SE reads using flux-simulator and mapped them to
    hg38 and, for comparison, also to Mmul8, downloaded from Ensembl (wget ftp://ftp.ensembl.org/pub/release-92...toplevel.fa.gz).

    Everything mapped fine to hg38, but not to Mmul8.

    Exception in thread "Thread-12" java.lang.AssertionError
    at align2.BBIndex.extendScore(BBIndex.java:2612)
    at align2.BBIndex.slowWalk3(BBIndex.java:1389)
    at align2.BBIndex.find(BBIndex.java:777)
    at align2.BBIndex.find(BBIndex.java:623)
    at align2.BBIndex.findAdvanced(BBIndex.java:400)
    at align2.AbstractMapThread.quickMap(AbstractMapThread.java:750)
    at align2.BBMapThread.processRead(BBMapThread.java:408)
    at align2.AbstractMapThread.run(AbstractMapThread.java:508)


    I tried to run on one thread, increased memory to 101G, removed small contigs of <100kb ... but the error message remains the same.

    We are running a Debian system with java version "1.8.0_181" and have BBMap version 38.02 -- the detailed error output is in the attached file.

    The false mapping rates of bbmap are so much better than those of STAR and GSNAP that we definitely want to use bbmap for our paper. We are nearly done with all other species (marmoset, gorilla, chimpanzee and orangutan) and the simulations ran through; the only missing piece is the mapping to Mmul8.

    Any help would be greatly appreciated.

    Best, Ines
    Attached Files

    Comment


    • bbmap for demultiplexing dual barcodes.

      Hello,
      I need to use dual indexes, if possible.

      For example (the dual barcode is the first four bases of each read):

      #R1 read
      @SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/1
      GACTAACCGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG
      +
      BMMQNTWSWWb_____b_bb__________Y_________YYYYY[[[Y[__________XXRWXVVVVTYYYYYT

      #R2 read
      @SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/2
      CTGAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
      +
      ghgaggfghhhhhhhhhhghhhhhhhhhhfhhhghfWffch[hhgahhedffddR[^W^Zc^_cac[Wb]^W^

      Here are the 16 possible barcode pairs in the file I am working on.
      TCAG-TCAG
      CTGA-CTGA
      TCAG-GACT
      GACT-GACT
      AGTC-AGTC
      GACT-TCAG
      GACT-AGTC
      GACT-CTGA
      TCAG-CTGA
      AGTC-TCAG
      AGTC-GACT
      CTGA-AGTC
      CTGA-GACT
      AGTC-CTGA
      TCAG-AGTC
      CTGA-TCAG

      The first four nucleotides of each read are the barcode, so the example above would go to:
      GACT-CTGA_R1.fq
      GACT-CTGA_R2.fq

      But you would need both reads to tell you that it's GACT-CTGA and not something else.
      What would the command look like for this? Does this demux script do the dual barcoding?
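
The binning described in this post can be sketched in plain Python. This is not a BBTools command and says nothing about what the demux script supports; it is a minimal illustration that assumes the barcode is the first four bases of each mate and that R1 and R2 list reads in the same order. The function names `read_fastq` and `demux_dual` are made up for this example, and the demo reads are truncated copies of the pair above with placeholder quality strings.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def demux_dual(r1_handle, r2_handle, barcode_len=4):
    """Group read pairs by the leading bases of R1 and R2.

    Returns a dict mapping "BC1-BC2" to a list of (r1_record, r2_record).
    Barcodes are left on the reads; trimming them is a separate decision.
    """
    bins = {}
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        key = rec1[1][:barcode_len] + "-" + rec2[1][:barcode_len]
        bins.setdefault(key, []).append((rec1, rec2))
    return bins


# Truncated copies of the posted read pair, with placeholder qualities.
r1 = io.StringIO(
    "@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/1\nGACTAACCGGAT\n+\nIIIIIIIIIIII\n"
)
r2 = io.StringIO(
    "@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/2\nCTGAAGGGTTGC\n+\nIIIIIIIIIIII\n"
)
bins = demux_dual(r1, r2)
for key, pairs in bins.items():
    print(key, len(pairs))  # barcode pair and number of read pairs in that bin
```

Each bin could then be written to `<key>_R1.fq` and `<key>_R2.fq`. For real data it would be better to write records as they are read instead of holding every pair in memory.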

      Comment


      • ref input for BBMap and paired ends

        I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

        My reference is a scaffold-level genome assembly, and the reads are paired-end...

        Comment


        • Originally posted by juanita View Post
          I am sorry if this question is very basic, but I am getting a low percentage of reads mapping to the reference genome: only about 36% of reads mapped. Any clue why this is the case?

          My reference is a scaffold-level genome assembly, and the reads are paired-end...
          Have you trimmed adapters away from the reads? Short fragments will create reads that are part genomic and part adapter, and these may not map. You could use the related BBMap tool sendsketch to get a sense of what is in your reads (after trimming). When we do genotyping of samples, many samples have contaminating species, so using sendsketch can help figure out what is in there. You can input the entire fastq file with sendsketch, or go to read mode and get a result on a per-read basis.

          You can also grab 100 reads, turn them into fasta format and do blastn with them (if online use the blastn rather than megablast option) and see read by read what is in there.

          Other possibilities: your sample is not closely related to the reference; the reference is incomplete and missing regions; or the reference lacks high-copy content such as mtDNA or chloroplast sequence, and many reads come from those.
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
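
The "grab 100 reads and turn them into fasta" step mentioned in this reply can be done in a few lines of Python. This is only a sketch: it assumes an uncompressed four-line-per-record FASTQ, the function name `fastq_head_to_fasta` is made up for this example, and the demo reads are invented.

```python
import io
from itertools import islice


def fastq_head_to_fasta(fastq_handle, n_reads=100):
    """Return the first n_reads FASTQ records as a FASTA-formatted string."""
    lines = []
    for _ in range(n_reads):
        record = list(islice(fastq_handle, 4))
        if len(record) < 4:
            break  # ran out of complete records
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        lines.append(">" + header[1:])  # replace the leading "@" with ">"
        lines.append(sequence)
    return "\n".join(lines) + "\n" if lines else ""


# Three invented reads; only the first two are requested.
demo = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
    "@read3\nGGCC\n+\nIIII\n"
)
fasta = fastq_head_to_fasta(demo, n_reads=2)
print(fasta, end="")
```

For a real file, open it with `open(path)` (or `gzip.open(path, "rt")` if it is gzipped) and paste the resulting FASTA into blastn.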

          Comment


          • How to use usejni with latest version (

            I just installed the latest version of the BBTools (38.26), and I can't seem to get the usejni flag to work. The java .so file compiles fine, but then I get error messages like this when I run, e.g., bbmap.sh with usejni=t:
            Code:
            Native library can not be found in java.library.path.
            I found this in the changelog:
            Removed JNI path flag from BBMerge, BBMap, and RQCFilter shell scripts.
            and this in docs/compiling.txt:
            3) C code. This was developed by Jonathan Rood to accelerate BBMap, BBMerge, and Dedupe, but is currently disabled.
            And sure enough, it is commented out in the bbmap.sh code:
            Code:
                    #local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
                    local CMD="java $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
            If I revert to the previous version of the CMD, with the java.library.path set, then the command runs with the C code fine.

            Why was this disabled? Does this affect previous analyses that used this C code? Or was this purely a performance issue?

            Sorry if I've missed this posted somewhere else, and thanks in advance for any help.

            Chris

            Comment


            • usejni and compiled C code in BBTools

              I just installed the latest version of the BBTools (38.26), and I notice that the C code provided by the usejni=t flag for some tools has been deprecated / disabled.

              I found this in the changelog:
              Removed JNI path flag from BBMerge, BBMap, and RQCFilter shell scripts.
              and this in docs/compiling.txt:
              3) C code. This was developed by Jonathan Rood to accelerate BBMap, BBMerge, and Dedupe, but is currently disabled.
              Sure enough, it is commented out in the bbmap.sh code:
              Code:
                      #local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
                      local CMD="java $EA $z -cp $CP align2.BBMap build=1 overwrite=true fastareadlen=500 $@"
              If I revert to the previous version of the CMD, with the java.library.path set, then the command runs with the compiled C code just fine.

              Why was this disabled? Does this affect previous analyses that used this C code? That is, does the C code contain an error that means usejni=t in previous versions will produce different output than the java-only code? Or was this purely a performance or compatibility issue, or something else?

              Sorry if I've missed this already posted somewhere, and thanks in advance for any help.

              Chris

              Comment


                • Hi Brian & all,
                  I'm using BBMap 38.26 with a very large reference genome, and one of its chromosomes is long enough to break the BBMap reference-building step.

                  Here is the fasta index of this reference:
                  Chr01 301019445 7 60 61
                  Chr02 163962470 306036450 60 61
                  Chr03 261511374 472731635 60 61
                  Chr04 215701946 738601539 60 61
                  Chr05 217274494 957898525 60 61
                  Chr06 219521584 1178794268 60 61
                  Chr07 222112641 1401974553 60 61
                  Chr08 153299543 1627789079 60 61
                  Chr09 238794889 1783643622 60 61
                  Chr10 205736368 2026418433 60 61
                  Chr11 220335243 2235583748 60 61
                  Chr12 229934170 2459591253 60 61
                  Chr00 714758103 2693357667 60 61
                  As you can see, the longest chromosome (Chr00, 714758103 bp) exceeds 536670912, which causes a problem like this:
                  bbmap-38.26/bbmap.sh ref=ref.fasta rebuild=t usemodulo=t -Xmx60g
                  java -ea -Xmx60g -cp /home/sn/software/bbmap-38.26/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/home/yangjy/16T4/Genome/GEN181516HEB/_db/Capsicum.annuum.L_Zunla-1_Release_2.0.fasta rebuild=t usemodulo=t -Xmx60g
                  Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/home/yangjy/16T4/Genome/GEN181516HEB/_db/Capsicum.annuum.L_Zunla-1_Release_2.0.fasta, rebuild=t, usemodulo=t, -Xmx60g]
                  Version 38.26

                  No output file.
                  Writing reference.
                  Executing dna.FastaToChromArrays2 [/home/yangjy/16T4/Genome/GEN181516HEB/_db/Capsicum.annuum.L_Zunla-1_Release_2.0.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

                  Set genScaffoldInfo=true
                  Writing chunk 1
                  Writing chunk 2
                  Writing chunk 3
                  Writing chunk 4
                  Writing chunk 5
                  Writing chunk 6
                  Exception in thread "main" java.lang.AssertionError: 714758103, 8000, 7999, 536670912
                  at dna.FastaToChromArrays2.makeNextChrom(FastaToChromArrays2.java:440)
                  at dna.FastaToChromArrays2.makeChroms(FastaToChromArrays2.java:343)
                  at dna.FastaToChromArrays2.main2(FastaToChromArrays2.java:151)
                  at align2.RefToIndex.makeIndex(RefToIndex.java:147)
                  at align2.BBMap.setup(BBMap.java:278)
                  at align2.AbstractMapper.<init>(AbstractMapper.java:57)
                  at align2.BBMap.<init>(BBMap.java:43)
                  at align2.BBMap.main(BBMap.java:31)
                  I'm pretty sure the 'maxlen' argument of dna.FastaToChromArrays2 does not fit my situation, but I'm not sure how to fix this.

                  Has anyone dealt with this kind of thing before? Any suggestions or discussion would help! >_<
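
To see in advance which sequences would exceed the limit, the fasta index can be scanned with a short Python sketch. It assumes the failure comes from a sequence longer than the `maxlen=536670912` value printed in the log, which matches the numbers in the assertion but is not confirmed in this thread. The function name `sequences_over_limit` is made up for this example, and only three of the index lines above are included in the demo.

```python
# Value of maxlen= in the FastaToChromArrays2 log line quoted in the post.
MAXLEN = 536670912


def sequences_over_limit(fai_lines, limit=MAXLEN):
    """Return (name, length) for every .fai entry longer than limit."""
    over = []
    for line in fai_lines:
        fields = line.split()  # handles tab- or space-separated columns
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length > limit:
            over.append((name, length))
    return over


# Three lines copied from the fasta index in the post.
fai = [
    "Chr01 301019445 7 60 61",
    "Chr02 163962470 306036450 60 61",
    "Chr00 714758103 2693357667 60 61",
]
too_long = sequences_over_limit(fai)
print(too_long)
```

With a real index, pass `open("ref.fasta.fai")` instead of the list. Here only Chr00 is flagged, which agrees with the first number in the assertion message.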

                  Comment


                  • I see that you are assigning 60G of RAM. Have you tried to assign more and see if it helps?

                    Comment


                      • Originally posted by GenoMax View Post
                        I see that you are assigning 60G of RAM. Have you tried to assign more and see if it helps?
                        Thanks for your reply. In my tests I raised the Java RAM upper limit from 20G all the way to 400G, yet neither the "maxlen" argument of dna.FastaToChromArrays2 nor the error message changed.

                        Comment


                        • @1989sn1027: Brian has not been participating on SA for the last few months. You could try creating a ticket at SourceForge and see if he responds to this report.

                          Comment


                          • Originally posted by GenoMax View Post
                            @1989sn1027: Brian has not been participating on SA for the last few months. You could try creating a ticket at SourceForge and see if he responds to this report.
                            Thanks for the pointer. I'll give that a shot.

                            Comment


                            • Hi Brian, could you please answer my questions posted here at your convenience? http://seqanswers.com/forums/showthread.php?t=85967

                              Thanks in advance.

                              Comment


                              • problem with output

                                Hi, I am having trouble running bbwrap. Also, how can I get a file that reports the percentage of reads that mapped and did not map?

                                This is what I am running:

                                #!/bin/bash
                                cd /space/home/aguilar/Ofav_temp/Trim
                                /space/home/aguilar/Programs/bbmap/bbwrap.sh t=40 in=\
                                S1_F_paired_1.fq,S10_F_paired_1.fq,S11_F_paired_1.fq,S12_F_paired_1.fq,S13_F_paired_1.fq,S14_F_paired_1.fq,S15_F_paired_1.fq,S16_F_paired_1.fq,S17_F_paired_1.fq,S18_F_paired_1.fq,S19_F_paired_1.fq,S2_F_paired_1.fq,S20_F_paired_1.fq,S21_F_paired_1.fq,S22_F_paired_1.fq,S23_F_paired_1.fq,S24_F_paired_1.fq,S25_F_paired_1.fq,S26_F_paired_1.fq,S27_F_paired_1.fq,S28_F_paired_1.fq,S29_F_paired_1.fq,S3_F_paired_1.fq,S30_F_paired_1.fq,S31_F_paired_1.fq,S32_F_paired_1.fq,S33_F_paired_1.fq,S34_F_paired_1.fq,S35_F_paired_1.fq,S36_F_paired_1.fq,S37_F_paired_1.fq,S38_F_paired_1.fq,S39_F_paired_1.fq,S4_F_paired_1.fq,S40_F_paired_1.fq,S41_F_paired_1.fq,S42_F_paired_1.fq,S43_F_paired_1.fq,S44_F_paired_1.fq,S45_F_paired_1.fq,S46_F_paired_1.fq,S47_F_paired_1.fq,S48_F_paired_1.fq,S5_F_paired_1.fq,S6_F_paired_1.fq,S7_F_paired_1.fq,S8_F_paired_1.fq,S9_F_paired_1.fq \
                                in2=S1_R_paired_2.fq,S10_R_paired_2.fq,S11_R_paired_2.fq,S12_R_paired_2.fq,S13_R_paired_2.fq,S14_R_paired_2.fq,S15_R_paired_2.fq,S16_R_paired_2.fq,S17_R_paired_2.fq,S18_R_paired_2.fq,S19_R_paired_2.fq,S2_R_paired_2.fq,S20_R_paired_2.fq,S21_R_paired_2.fq,S22_R_paired_2.fq,S23_R_paired_2.fq,S24_R_paired_2.fq,S25_R_paired_2.fq,S26_R_paired_2.fq,S27_R_paired_2.fq,S28_R_paired_2.fq,S29_R_paired_2.fq,S3_R_paired_2.fq,S30_R_paired_2.fq,S31_R_paired_2.fq,S32_R_paired_2.fq,S33_R_paired_2.fq,S34_R_paired_2.fq,S35_R_paired_2.fq,S36_R_paired_2.fq,S37_R_paired_2.fq,S38_R_paired_2.fq,S39_R_paired_2.fq,S4_R_paired_2.fq,S40_R_paired_2.fq,S41_R_paired_2.fq,S42_R_paired_2.fq,S43_R_paired_2.fq,S44_R_paired_2.fq,S45_R_paired_2.fq,S46_R_paired_2.fq,S47_R_paired_2.fq,S48_R_paired_2.fq,S5_R_paired_2.fq,S6_R_paired_2.fq,S7_R_paired_2.fq,S8_R_paired_2.fq,S9_R_paired_2.fq \
                                ref=/space/home/aguilar/Ofav_temp/Genomes/Orbicella_faveolata_v2_scaffolds.fa \
                                outu=/space/home/aguilar/Ofav_temp/bbmap/ReadsUnm.R1.fastq.gz \
                                outu2=/space/home/aguilar/Ofav_temp/bbmap/ReadsUnmR2.fastq.gz \
                                outm=/space/home/aguilar/Ofav_temp/bbmap/ReadsMappedR1.fastq.gz \
                                outm2=/space/home/aguilar/Ofav_temp/bbmap/ReadsMappedR2.fastq.gz

                                And I am getting this message:

                                Retaining first best site only for ambiguous mappings.
                                No output file.
                                Exception in thread "main" java.lang.AssertionError: ASCII encoding for quality (currently ASCII-33) appears to be wrong.
                                [a line of unprintable binary characters followed in the original output]

                                Thanks
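
The unprintable characters after the assertion suggest that at least one input is not plain-text FASTQ. One possibility, which is only a guess and is not confirmed anywhere in this thread, is a gzip-compressed file carrying a `.fq` name. A quick check in Python is to look at the first two bytes; the helper names `looks_gzipped` and `sniff` are made up for this sketch.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(first_bytes):
    """True if the given bytes begin with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def sniff(path):
    """Read the first two bytes of a file and test for the gzip magic number."""
    with open(path, "rb") as handle:
        return looks_gzipped(handle.read(2))


# In-memory demo: an invented FASTQ record, plain and gzip-compressed.
plain = b"@read1\nACGT\n+\nIIII\n"
compressed = gzip.compress(plain)
print(looks_gzipped(plain), looks_gzipped(compressed))
```

Running `sniff()` over each file named in `in=` and `in2=` would show whether any of them is compressed. A plain FASTQ file should instead begin with `@`.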

                                Comment
