Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Divide by 0 error in randomreads.sh

    I am having an issue with randomreads.sh that I cannot make sense of myself.

    I am using this tool to try to extract a random subset of a genome. Most tools subset by selecting some proportion of sequences, but I want to randomly sample pieces of randomly-sampled sequences. So read simulators seem to be the better option for this.

    In this case, I'm trying to sample a (giant!) salamander genome from NCBI. For now I just have some arbitrary length/number settings, as follows:

    randomreads.sh ref=GCA_002915635.2_ASM291563v2_genomic.fna out=test.fq reads=100 minlength=50000 maxlength=500000 seed=5 banns=t adderrors=f

    As the command shows, I do not want variants or errors added in at all; the sequences should be identical to the reference genome.

    Here is the output I'm getting, which indicates some sort of 'divide by 0' error. Hopefully someone can help me diagnose and overcome this issue.

    Executing align2.RandomReads3 [build=1, ref=GCA_002915635.2_ASM291563v2_genomic.fna, out=test.fq, reads=100, minlength=50000, maxlength=500000, seed=5, banns=t, adderrors=f]

    Writing reference.
    Executing dna.FastaToChromArrays2 [GCA_002915635.2_ASM291563v2_genomic.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]

    Set genScaffoldInfo=true
    Exception in thread "main" java.lang.ArithmeticException: / by zero
    at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1758)
    at align2.RandomReads3.<init>(RandomReads3.java:585)
    at align2.RandomReads3.main(RandomReads3.java:389)

    Thanks!
    Daren

    Comment


    • Has anyone used bbmap for QuantSeq Data Analysis (more precisely the QuantSeq FWD protocol). The Lexogen website recommends bbduk for quality trimming and suggests use of STAR for downstream analysis.

      Is it possible to do something similar (to STAR) with bbmap? In other words, is there an analogous bbmap command similar to how one does mapping with STAR (using the genome index and gtf together)?

      Thanks in advance.

      Comment


      • reformat.sh Failed to auto-detect quality coding

        Hi,
        when I process some fastq files from SRA using the following command



        reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

        reformat crashes with the following error:


        Set INTERLEAVED to false
        Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
        Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
        Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
        at stream.FASTQ.testQuality(FASTQ.java:218)
        at stream.FASTQ.isInterleaved(FASTQ.java:129)
        at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
        at jgi.ReformatReads.process(ReformatReads.java:377)
        m(ConcurrentReadInputStream.java:55)
        at jgi.ReformatReads.process(ReformatReads.java:377)
        at jgi.ReformatReads.main(ReformatReads.java:45)

        Any suggestions on how to address this problem?
        I would prefer to avoid checking each file individually with a different tool and setting the qin value for each. This is part of a bigger pipeline that is intented to be applied to 100s of samples.

        Thanks in advance for your help

        Comment


        • reformat.sh Failed to auto-detect quality coding

          Hi,
          when I am running reformat.sh on a set of fastq files downloaded from SRA

          reformat.sh maxcalledquality=40 out=SRR1_R1.fq.gz qout=33 mincalledquality=2 out2=SRR1_R2.fq.gz qin=auto fixjunk=t in=/RNA-Seq/Raw/fastq.20190712/SRR1_1.fastq.gz in2=/RNA-Seq/Raw/fastq.20190712/SRR1_2.fastq.gz

          The program crashes with the following error:

          Set INTERLEAVED to false
          Changed from ASCII-33 to ASCII-64 on input quality 64 for base N while prescanning.
          Changed from ASCII-64 to ASCII-33 on input quality 35 while prescanning.
          Exception in thread "main" java.lang.AssertionError: Failed to auto-detect quality coding; quitting. Please manually set qin=33 or qin=64.
          at stream.FASTQ.testQuality(FASTQ.java:218)
          at stream.FASTQ.isInterleaved(FASTQ.java:129)
          at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentRea at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:119)
          at jgi.ReformatReads.process(ReformatReads.java:377)
          m(ConcurrentReadInputStream.java:55)
          at jgi.ReformatReads.process(ReformatReads.java:377)
          at jgi.ReformatReads.main(ReformatReads.java:45)
          The version of BBMAP used is 37.64.
          Any advice on how to address this issue?
          This is part of a bigger pipeline and the intention is to apply it on 100s of files.
          Thanks in advance for your help.

          Comment


          • Did you try the suggested "Please manually set qin=33 or qin=64"?
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

            Comment


            • Change in minid implementation?

              Hello, I recently noticed that newer versions >=38.68 do not seem to implement minid in the same way as previous versions.

              For the same command (same minid, 99), same reads, same reference, with versions <38.68 I get 0 mated pairs mapped, 4 Read 1 mapped, and 6 Read 2 mapped.

              Code:
              bbmap.sh nodisk minid=99 mappedonly=T threads=4 ref=parcu18S.fasta in=LKH421_FPE_q24_minlen100.fastq.gz in2=LKH421_RPE_q24_minlen100.fastq.gz out=stdout.sam | reformat.sh in=stdin.sam out=stdout.sam minlength=80 | samtools view -h -b -S | samtools sort >67_99parcu.bam
              For versions >=38.68 I get 94 mated pairs mapped, 94 Read 1 reads mapped, and 94 Read 2 reads mapped.

              Is the minid command just not working in newer versions or is there a new detail I am missing?
              Thanks!

              Comment


              • @Robinsleith: I suggest that you post this as a ticket on BBMap SF page. Brian no longer visits SeqAnswers regularly. He would be the only person who can answer this.

                Comment


                • mapPacBio strange behavior

                  Hi there, I really like this tool for Illumina reads and am trying it on some PacBio/Nanopore reads using the mapPacBio.sh function.

                  I am "successfully" getting aligned bam files out the other end, but the log file lists >99% of reads as Low-Q Discards. I also get unmapped fastq files out (I do this for my own sanity) and these are usually larger than the input files themselves.

                  Should I be suspicious of this behavior? I have also tried using the low-quality data suggestion (setting the key to a lower value, adjusting the minimum score ratio, etc), but this did not result in any improvement. I know the input files themselves - subread fastq.gz - are "good enough" for Canu assembly but for I'd also like to align the reads for consensus sequence generation.

                  Thanks!

                  Comment


                  • Is mapping stochastic?

                    Hey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:

                    1st run:

                    >one: 47,699
                    >two: 330

                    2nd run:

                    >three: 47,688
                    >four: 338

                    BBmap options were all default, just minidentity = 90 and T=12

                    How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?

                    Just curious, my apologies if this was addressed elsewhere!

                    cheers

                    Comment


                    • Originally posted by pck0 View Post
                      Hey, did a little test run mapping sequences against a reference fasta that contains two identical sequences called >one and >two, then repeated the process with the sequences renamed to >three and >four. The amount of reads mapping to each were as follows:

                      1st run:

                      >one: 47,699
                      >two: 330

                      2nd run:

                      >three: 47,688
                      >four: 338

                      BBmap options were all default, just minidentity = 90 and T=12

                      How come that (1) BBmap apparently misses some reads that map on the first sequence and then maps them on the second, identical sequence, and (2) how come the runs give different results?

                      Just curious, my apologies if this was addressed elsewhere!

                      cheers
                      Most NGS aligners are non-deterministic i.e. they will not produce exactly identical results if run multiple times.

                      Fortunately, BBMap does have an option to run in deterministic mode.
                      Code:
                      deterministic=f         Run in deterministic mode.  In this case it is good
                                              to set averagepairdist.  BBMap is deterministic
                                              without this flag if using single-ended reads,
                                              or run singlethreaded.
                      You could also run the analysis using just a single thread.

                      Comment


                      • bbmap.sh unfished job

                        Hello, Brian,

                        I am wondering whether bbmap.sh could resume an unfished job.

                        mewu3

                        Comment


                        • @mewu3: Unfortunately it can't. You will need to restart the job.

                          Originally posted by mewu3 View Post
                          Hello, Brian,

                          I am wondering whether bbmap.sh could resume an unfished job.

                          mewu3

                          Comment


                          • [help]

                            Hello,

                            I am using bbmap on HPC and I get the fallowing message :

                            Aligning C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001 reads to fasta file ...
                            java -Djava.library.path=/opt/apps/bbtools-37.97/jni/ -ea -Xmx50G -cp /opt/apps/bbtools-37.97/current/ align2.BBWrap build=1 overwrite=true fastareadlen=500 build=1 in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null trimreaddescriptions=t outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam threads=8 pairlen=1000 pairedonly=t minid=0.9 mdtag=t xstag=fs nmtag=t sam=1.3 ambiguous=best secondary=t saa=f maxsites=10 -Xmx50G
                            Executing align2.BBWrap [build=1, overwrite=true, fastareadlen=500, build=1, in1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_se_trim.fastq.gz, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz,null, trimreaddescriptions=t, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, outu1=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam,C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_se.sam, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, -Xmx50G]

                            Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]
                            Version 37.97 [build=1, overwrite=true, fastareadlen=500, build=1, trimreaddescriptions=t, threads=8, pairlen=1000, pairedonly=t, minid=0.9, mdtag=t, xstag=fs, nmtag=t, sam=1.3, ambiguous=best, secondary=t, saa=f, maxsites=10, in=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R1_trim.fastq.gz, outu=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R1.sam, outm=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_aligned.sam, in2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_R2_trim.fastq.gz, outu2=C_CGTCTGCG-ATTGTGAA-AHGKC2BBXY_L001_unaligned_R2.sam]

                            Set threads to 8
                            Retaining first best site only for ambiguous mappings.
                            Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.816
                            Set genome to 1

                            Loaded Reference: 3.463 seconds.
                            Loading index for chunk 1-1, build 1
                            Generated Index: 2.556 seconds.
                            Analyzed Index: 7.028 seconds.
                            Started output stream: 0.043 seconds.
                            Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files.
                            at stream.ReadStreamWriter.<init>(ReadStreamWriter.java:51)
                            at stream.ReadStreamByteWriter.<init>(ReadStreamByteWriter.java:17)
                            at stream.ConcurrentGenericReadOutputStream.<init>(ConcurrentGenericReadOutputStream.java:40)
                            at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:52)
                            at stream.ConcurrentReadOutputStream.getStream(ConcurrentReadOutputStream.java:29)
                            at align2.AbstractMapper.openStreams(AbstractMapper.java:873)
                            at align2.BBMap.testSpeed(BBMap.java:437)
                            at align2.BBMap.main(BBMap.java:34)
                            at align2.BBWrap.execute(BBWrap.java:144)
                            at align2.BBWrap.main(BBWrap.java:22)
                            I get the impression that bbmap is stuck on something and i don't know what's wrong with it. Please help !

                            mewu3

                            Comment


                            • "Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."

                              Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.

                              Comment


                              • Thank you very so much !

                                I downloaded it
                                Hello from France

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Exploring the Dynamics of the Tumor Microenvironment
                                  by seqadmin




                                  The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                                  07-08-2024, 03:19 PM
                                • seqadmin
                                  Exploring Human Diversity Through Large-Scale Omics
                                  by seqadmin


                                  In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                                  06-25-2024, 06:43 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 07-19-2024, 07:20 AM
                                0 responses
                                25 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-16-2024, 05:49 AM
                                0 responses
                                39 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-15-2024, 06:53 AM
                                0 responses
                                45 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 07-10-2024, 07:30 AM
                                0 responses
                                42 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X