  • Problems running BBDuk with class path

    Hi all,

    I'm trying to run bbduk.sh on macOS High Sierra 10.13.1, and it fails with a classpath problem. I've already added the directory containing bbduk.sh to both the PATH and the classpath, but it still doesn't work.

    This is the input in terminal:

    Sisi$ bbduk.sh -Xmx24g in=B1_sub_R1.fq in2=B1_sub_R2.fq \
    out=B1_sub_R1_trimmed.fq.gz out2=B1_sub_R2_trimmed.fq.gz \
    literal=GTGCCAGCMGCCGCGGTAA,GGACTACHVGGGTWTCTAAT k=10 ordered=t mink=2 \
    ktrim=l rcomp=f minlength=220 maxlength=280 tbo tpe

    And this is the output:

    java -ea -Xmx24g -Xms24g -cp /usr/local/bin/current/ jgi.BBDukF -Xmx24g in=B1_sub_R1.fq in2=B1_sub_R2.fq out=B1_sub_R1_trimmed.fq.gz out2=B1_sub_R2_trimmed.fq.gz literal=GTGCCAGCMGCCGCGGTAA,GGACTACHVGGGTWTCTAAT k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=220 maxlength=280 tbo tpe
    Error: Could not find or load main class jgi.BBDukF
    Caused by: java.lang.ClassNotFoundException: jgi.BBDukF

    By the way, I couldn't find a BBDukF file.

    Any ideas?

    Thanks a lot,

    Comment


    • What's in your /bbmap/current/ directory? I have /bbmap/current/jgi/BBDukF.class
      Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

      Comment


      • Hi, sorry to come back to this. I have:
        /Users/owner/bbmap/current/jgi/BBDuk.class
        /Users/owner/bbmap/current/jgi/BBDuk2.class
        but no file named BBDukF.class

        The bbduk.sh script, which is at /Users/owner/bbmap/bbduk.sh,
        has these lines at the end:
        fi
        local CMD="java -Djava.library.path=$NATIVELIBDIR $EA $z $z2 -cp $CP jgi.BBDuk $@"
        local CMD="java $EA $z $z2 -cp $CP jgi.BBDuk $@"
        if [[ $silent == 0 ]] && [[ $json == 0 ]]; then
        echo $CMD >&2
        I don't know which of the two CMD lines should be active and which should not.
        Perhaps that's the problem.

        Any idea?

        Thanks
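On the question of which `CMD` line is active: in a shell function, a second assignment to the same variable simply replaces the first, so as quoted the line without `-Djava.library.path` is the one that takes effect. A minimal sketch of that behaviour follows; `build_cmd` and the placeholder paths are hypothetical and are not taken from the real bbduk.sh.

```shell
# Hypothetical stand-in for the end of a launcher script (not the real bbduk.sh).
build_cmd() {
  local CP="/path/to/bbmap/current/"   # placeholder classpath
  # First assignment: variant with a native library path.
  local CMD="java -Djava.library.path=/path/to/jni -cp $CP jgi.BBDuk $*"
  # Second assignment: overwrites the first, so this is the command that is used.
  local CMD="java -cp $CP jgi.BBDuk $*"
  echo "$CMD"
}

build_cmd in=reads.fq
```

Having both lines present is therefore harmless for class lookup; it does not by itself explain a ClassNotFoundException.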

        Comment


        • I am not sure why you are having this problem. All you should need to do is download the BBMap software on your Mac, unpack the tar-gzipped archive, and extend your PATH to include the "bbmap" directory (export PATH=$PATH:/path_to_bbmap_dir). Don't move individual contents out of the bbmap directory. Move the entire directory to whatever location you want and then amend $PATH.
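The PATH step described above can be sketched as a small helper that avoids adding the same directory twice. `path_append` is a hypothetical name, and `/path_to_bbmap_dir` is the placeholder from the post, to be replaced with the real location of the unpacked bbmap directory.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already present, nothing to do
    *) PATH="$PATH:$1" ;;       # otherwise append at the end
  esac
  export PATH
}

# Placeholder directory from the post above; substitute the real bbmap location.
path_append /path_to_bbmap_dir
```

Putting the call in a shell startup file such as ~/.bash_profile would make the change persist across terminal sessions.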

          Comment


          • Yes, that is what I've done.

            The path to bbmap is /Users/owner/bbmap, and it is included in $PATH.

            I think the problem is related to the way the script searches for the classes. None of the .sh files manage to find the path. For example, executing bbduk.sh returns:

            java -ea -Xmx24g -Xms24g -cp /usr/local/bin/current/ jgi.BBDuk ......
            Error: Could not find or load main class jgi.BBDuk
            Caused by: java.lang.ClassNotFoundException: jgi.BBDuk

            I've checked, and I don't have any "current" directory in /usr/local/bin, so I don't know how to tell the script to go directly to the bbmap directory.

            Any idea is greatly appreciated; I'm stuck on this.
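The `-cp /usr/local/bin/current/` in the echoed command suggests that the bbduk.sh actually being executed sits in /usr/local/bin rather than in /Users/owner/bbmap. This assumes the launcher builds its classpath from its own location, which is an inference from the output above and is not confirmed in this thread. A minimal sketch for checking this follows; `check_classpath` is a hypothetical helper, not part of BBTools, and it does not follow symlinks.

```shell
# Hypothetical helper (not part of BBTools): report whether the class file for
# a Java class exists under the "current/" directory next to a launcher script.
# Arguments: path to the launcher script, fully qualified class name.
check_classpath() {
  local script="$1"
  local class="$2"
  local dir class_file
  # Absolute directory that contains the launcher script.
  dir="$(cd "$(dirname "$script")" && pwd)"
  # jgi.BBDuk -> current/jgi/BBDuk.class
  class_file="$dir/current/$(printf '%s' "$class" | tr '.' '/').class"
  if [ -f "$class_file" ]; then
    echo "OK: $class_file"
  else
    echo "MISSING: $class_file"
    return 1
  fi
}
```

Calling `check_classpath "$(command -v bbduk.sh)" jgi.BBDuk` would show which copy of the script the shell finds first and whether the class file sits beside it. If the copy found is a stray one in /usr/local/bin, removing it so that the one in the bbmap directory is used may be enough.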

            Comment


            • On the Mac I tested this on, nothing else needed to be done. What happens if you just run "bbmap.sh"? Does that produce the in-line bbmap help output?

              Which Java version are you using on your Mac?

              Comment


              • Thanks for the patience, answering your questions:
                1. If I run "bbmap.sh" or "bbduk.sh", for instance, the content of the file is shown (parameters, flags, etc.).

                If I run the command with the parameters, like:

                $ bbduk.sh in=sample14_S14_R1.fastq.gz in2=sample14_S14_R2.fastq.gz out=sample14_S14_R1_btrimmed.fastq.gz out2=sample14_S14_R1_btrimmed.fastq.gz literal=GTACACAMCGCCCGTCGC,TGATCCTTCTGCVGGTTCWCCTACG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=50 maxlength=155 tbo tpe


                Max memory cannot be determined. Attempting to use 1400 MB.
                If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
                or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
                java -ea -Xmx1400m -Xms1400m -cp /usr/local/bin/current/ jgi.BBDuk in=sample14_S14_R1.fastq.gz in2=sample14_S14_R2.fastq.gz out=sample14_S14_R1_btrimmed.fastq.gz out2=sample14_S14_R1_btrimmed.fastq.gz literal=GTACACAMCGCCCGTCGC,TGATCCTTCTGCVGGTTCWCCTACG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=50 maxlength=155 tbo tpe
                Error: Could not find or load main class jgi.BBDuk
                Caused by: java.lang.ClassNotFoundException: jgi.BBDuk

                2. My current Java version is Java 8 Update 191.

                Comment


                • We are making progress!

                  How much memory do you have on this Mac? With BBTools I suggest using 85% of the maximum memory you have available. You also need to specify "in1=" and "out1=" to go with "in2=" and "out2=". Since you are using IUPAC bases in your literal sequence, you also need to turn this option on (it is off by default):

                  copyundefined=f (cu) Process non-AGCT IUPAC reference bases by making all
                  possible unambiguous copies.

                  You are also trimming to the left side of the read (ktrim=l). Is that correct?

                  Can you try this command?

                  Code:
                  bbduk.sh -Xmx4g in1=sample14_S14_R1.fastq.gz in2=sample14_S14_R2.fastq.gz out1=sample14_S14_R1_btrimmed.fastq.gz out2=sample14_S14_R2_btrimmed.fastq.gz literal=GTACACAMCGCCCGTCGC,TGATCCTTCTGCVGGTTCWCCTACG k=10 ordered=t mink=2 ktrim=l rcomp=f minlength=50 maxlength=155 copyundefined=t tbo tpe
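The 85% suggestion above can be turned into a `-Xmx` value with a little shell arithmetic. This is a sketch under stated assumptions: `total_mem_bytes` and `xmx_flag` are hypothetical helper names, and memory detection assumes `sysctl -n hw.memsize` on macOS or /proc/meminfo on Linux.

```shell
# Hypothetical helpers, not part of BBTools.

# Print total physical memory in bytes (macOS or Linux only).
total_mem_bytes() {
  case "$(uname -s)" in
    Darwin) sysctl -n hw.memsize ;;
    Linux)  echo $(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 )) ;;
    *)      echo "unsupported OS" >&2; return 1 ;;
  esac
}

# Turn a byte count into a -Xmx flag set to 85% of that amount, in megabytes.
xmx_flag() {
  echo "-Xmx$(( $1 / 1048576 * 85 / 100 ))m"
}

xmx_flag 17179869184   # 16 GiB -> -Xmx13926m
```

On a real machine the flag could be passed as `bbduk.sh "$(xmx_flag "$(total_mem_bytes)")" in1=... in2=...`, with the remaining arguments as in the command above.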

                  Comment


                  • Hello,
                    I was trying to filter out rRNA reads using a command like this:
                    bbduk.sh -Xmx3g in=in.fastq out=nonribo.fastq outm=ribo.fastq ref=ribokmers.fa.gz k=31 minlen=3
                    where ribokmers.fa.gz is taken from Brian's Google Drive link posted at https://www.biostars.org/p/159959/
                    I noticed that my most abundant rRNA reads (CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTT) are not filtered out. Can anyone explain how this ribokmers.fa was created? What would be the difference if I used, for example, the "Human ribosomal DNA complete repeating unit" from GenBank (U13369)? Is there any other recommended source of rRNA sequences for this purpose?
                    Last edited by aushev; 02-25-2019, 12:08 PM.

                    Comment


                    • @aushev: That k-mer file is likely meant for non-human genomes, since it was made from the SILVA database.

                      You could use the U13369 FASTA sequence and then bin the reads that map to it using bbsplit.sh.

                      Comment


                      • Originally posted by GenoMax
                        @aushev: That k-mers file is likely for non-human genomes since it was made from SILVA database.

                        You could use U13369 fasta sequence and then bin the reads that map to it using bbsplit.sh.
                        Thank you @GenoMax!
                        What would be the main advantage of using bbsplit instead of bbduk? As I understand, BBSplit internally uses BBMap, unlike BBDuk - but what would it practically mean? In my scenario, I want to filter out all rRNA reads before doing any further mapping.

                        Comment


                        • Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?

                          Comment


                          • Originally posted by GenoMax
                            Any reads that align to the ribosomal repeat will be identified and separated in a file. Isn't that what you are looking to do?
                            Yes, that's what I wanted, but I also wanted to understand the difference between bbduk and bbsplit for this purpose.

                            Comment


                            • Sorry for another dumb question, but I really want to understand how bbduk works, and currently I'm having trouble with that. Below I list an example of 11 reads containing the adapter sequence, all of which I expected to be detected with the following parameters:
                              Code:
                              bbduk.sh in=falseneg.fastq literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ktrim=r mink=10 hdist=1 edist=1 hdist2=1 edist2=1
                              So, reference adapter sequence is `AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC`, and all those reads have a match of at least 10 nt (mink=10) and no more than 1 mismatch:

                              ***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
                              1_______________________________________AGATCGGAAGAG_ACACGTCTGAACTCCAGTCACTCGAAGATCTCGTATGC
                              2_______________AGCAGCATTGTACAGGGCTATGACAGATCGGAAGAGCACACGTC_GAACTC
                              3________AGCAGTTGAACATGGGTCAGTCGGTCCTGAGAGATCGGAAGAGCACACAT
                              4______________________________CCTGAGGCTAGATCGGAAGAGCACACGTCTGAAC_CCAGTCACTCGAAGAT
                              5__CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAGATCGGAAGAGT
                              6_________GCATGGGTGGTTCAGTGGTAGAATTCTCGCAGATCGGAAGAGCACACCGT
                              7_______GCATTGGTGGTTCAGTGGTAGAATTCTCGCCTAGATCGGAAGAGCACTCG
                              8________________TAGCTTATCAGACTGATGTTGACAGATCGGAAGAGCACACGTCTGA_CTCC
                              9______TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACAG
                              10______TCCCTGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACGCG
                              11_______________TCGGATCCGTCTGAGCTTGGCTAAGATCGGAAGAGCACACGTCTGGACTC
                              ***_(ref)_______________________________AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC


                              Can you give a hint why none of those reads are matched?
                              Thanks in advance!

                              P.S. Adding qhdist=1 made correct matching, but I still don't understand why edist=1 did not work...
                              Last edited by aushev; 02-25-2019, 06:47 PM.

                              Comment


                              • @aushev: Unfortunately Brian no longer has time to participate on this forum. He would really be the only person who can authoritatively answer your questions. You could try creating a ticket on the SourceForge site to see if he responds.

                                The "edist" directive is for indels, so perhaps that is the reason it did not work. I have never needed to use that directive. Many options in BBTools programs apply only to very specific use cases, so unless you know for sure that you need an option, I would go with the defaults. That is all I can offer.

                                Use the smallest/core sequence you are trying to match if you intend to remove all sequence to the right of where that core sequence is found.

                                Comment
