  • #31
    java.lang.AssertionError: Seems odd so I added this assertion. I don't see anywhere it was being used. Use -da flag to override.
    Wow, that's not a very good error message; I will examine it carefully this weekend. Anyway, you can try running with the -da flag, which may fix things. It looks like I made some changes to the code related to processing custom read names since I designed that workflow. The -da flag will let you get past that crash, but I'm not certain whether you will hit a different problem later (though that should become apparent quickly).

    Alternatively, you can go here:
    https://sourceforge.net/projects/bbmap/files/

    and download this version:
    BBMap_33.41_java7.tar.gz
    ...which was released closest in time to that post. I don't generally recommend using old versions, but the custom naming schema has changed substantially since then, so I'm not sure whether the current version supports that mode. But I'll fix it if possible.

    -Brian



    • #32
      Hi Brian, your -da flag seemed to do the trick. One thing I noticed is that the majority of my read 1 data didn't seem to map very well to the reference archaeal sequence (I picked one of the three genera of ammonia-oxidizing archaea in the phylum Thaumarchaeota).

      For instance, the read 1 and read 2 files were each about 250 MB. The mapped read 1 output was about 10 MB, compared to about 180 MB for read 2. Could that be because of the reference file? If so, is it possible to add multiple references? Or does the workflow have to be run separately for each reference I am interested in (each of the 3 genera)?

      Despite most of my reads being unmapped, I was able to import the merged-by-mapping output into Geneious and demultiplex the dual-indexed samples as if they had been merged using the normal overlap-based methods. So that is actually great news.

      You mentioned concatenating the merged-by-mapping and merged-by-overlap sequences. How exactly do I do that? I know Geneious can do it, but it asks me which order they should be in. Thoughts on this?

      I think your tool would definitely be a huge asset for processing paired-end reads that do not overlap. Ideally they would overlap, but there are obvious limitations on read length for Illumina sequencing; with the v3 2x300 kits, the upper limit for the insert is probably around 550 bp. The workflow works. Anyhow, I'd like to hear your thoughts on the points above.

      Thanks again!!
      Last edited by Illusive Man; 03-09-2016, 11:31 PM.



      • #33
        For concatenating, the order does not matter in this case. However, the results are strange. When mapping paired reads, BBMap displays the mapping and error rates for read 1 and read 2 independently; normally, the mapping rate is higher for read 1. Can you post the screen output of the mapping phase?
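A minimal shell sketch of that concatenation, assuming the two merged outputs are plain (uncompressed, not line-wrapped) FASTQ files. The names merged_overlap.fq, merged_mapping.fq and merged_all.fq are hypothetical placeholders, and the toy reads only stand in for real data. Counting records before and after is a cheap way to confirm nothing was dropped:

```shell
#!/usr/bin/env bash
# Sketch: combine merged-by-overlap and merged-by-mapping reads into one FASTQ.
# File names are hypothetical placeholders; the toy reads stand in for real output.
set -euo pipefail

# count_fastq_records FILE: number of 4-line FASTQ records (assumes no line wrapping).
count_fastq_records() {
  echo $(( $(wc -l < "$1") / 4 ))
}

workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-ins for the two merged outputs.
printf '@m1\nACGTACGT\n+\nIIIIIIII\n@m2\nTTGACCGT\n+\nIIIIIIII\n' > merged_overlap.fq
printf '@m3\nGGCATTACGA\n+\nIIIIIIIIII\n' > merged_mapping.fq

# Order does not matter here: cat writes the first file, then appends the second.
cat merged_overlap.fq merged_mapping.fq > merged_all.fq

n_overlap=$(count_fastq_records merged_overlap.fq)
n_mapping=$(count_fastq_records merged_mapping.fq)
n_all=$(count_fastq_records merged_all.fq)
echo "overlap=$n_overlap mapping=$n_mapping combined=$n_all"
```

Swapping the two cat arguments only changes which reads come first in merged_all.fq; the combined count should equal the sum either way.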



        • #34
          Bubbas-Mac-Pro:bbmap bubba$ bash bbmap.sh -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da
          java -Djava.library.path=/Users/bubba/Downloads/bbmap/jni/ -da -Xmx10g -cp /Users/bubba/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da
          Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, -Xmx10g, threads=4, ref=amoaref1.fasta, in=all_reads.fastq, outm=mapped_ref1.fq, outu=unmapped.fq, nodisk, po, int, rbm, don, -da]

          BBMap version 35.85
          Set threads to 4
          Set INTERLEAVED to true
          Retaining first best site only for ambiguous mappings.
          Executing dna.FastaToChromArrays2 [amoaref1.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=false, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=true]

          Set genScaffoldInfo=true
          Set genome to 1

          Loaded Reference: 0.005 seconds.
          Loading index for chunk 1-1, build 1
          Indexing threads started for block 0-1
          Indexing threads finished for block 0-1
          Generated Index: 0.539 seconds.
          Analyzed Index: 4.642 seconds.
          Started output stream: 0.057 seconds.
          Started output stream: 0.049 seconds.
          Cleared Memory: 0.154 seconds.
          Processing reads in paired-ended mode.
          Started read stream.
          Started 4 mapping threads.
          Detecting finished threads: 0, 1, 2, 3

          ------------------ Results ------------------

          Genome: 1
          Key Length: 13
          Max Indel: 16000
          Minimum Score Ratio: 0.56
          Mapping Mode: normal
          Reads Used: 424758 (125427725 bases)

          Mapping: 323.766 seconds.
          Reads/sec: 1311.93
          kBases/sec: 387.40


          Pairing data:         pct reads  num reads  pct bases  num bases

          mated pairs:            0.0245%         52    0.0250%      31304
          bad pairs:              0.0000%          0    0.0000%          0
          insert size avg:         648.12


          Read 1 data:          pct reads  num reads  pct bases  num bases

          mapped:                 0.0245%         52    0.0249%      15652
          unambiguous:            0.0245%         52    0.0249%      15652
          ambiguous:              0.0000%          0    0.0000%          0
          low-Q discards:         0.0057%         12    0.0007%        420

          perfect best site:      0.0000%          0    0.0000%          0
          semiperfect site:       0.0000%          0    0.0000%          0
          rescued:                0.0000%          0

          Match Rate:                  NA         NA   79.1619%      12392
          Error Rate:             0.1757%         52   18.8450%       2950
          Sub Rate:               0.1757%         52   18.8322%       2948
          Del Rate:               0.0068%          2    0.0128%          2
          Ins Rate:               0.0000%          0    0.0000%          0
          N Rate:                 0.1757%         52    1.9931%        312


          Read 2 data:          pct reads  num reads  pct bases  num bases

          mapped:                 0.0245%         52    0.0250%      15634
          unambiguous:            0.0245%         52    0.0250%      15634
          ambiguous:              0.0000%          0    0.0000%          0
          low-Q discards:         0.2067%        439    0.0245%      15365

          perfect best site:      0.0000%          0    0.0000%          0
          semiperfect site:       0.0000%          0    0.0000%          0
          rescued:                0.0000%          0

          Match Rate:                  NA         NA   76.9157%      12025
          Error Rate:            98.1132%         52   23.0459%       3603
          Sub Rate:              98.1132%         52   23.0459%       3603
          Del Rate:               0.0000%          0    0.0000%          0
          Ins Rate:               0.0000%          0    0.0000%          0
          N Rate:                11.3208%          6    0.0384%          6

          Total time: 329.537 seconds.



          • #35
            Brian or anyone else -


            Can you tell me whether I can use more than one reference sequence to map my reads to? There are a total of 3 possible genera of ammonia-oxidizing archaea.



            • #36
              Yes, you can. Make a multi-FASTA file with your references and create an index to align against, or you may also be able to do ref="ref1.fa,ref2.fa,ref3.fa".
              Last edited by GenoMax; 03-16-2016, 02:03 PM.



              • #37
                What do you mean by "make an index"?

                Do you mean a file containing multiple reference sequences that looks like:

                >seq1 TAAATGA

                >seq2 CCGTTAAA

                If that's what you meant, then I'm set, as I already ran that file.


                I tried using ref="ref1.fa, ref2.fa" and it returned the following error:

                Bobbys-Mac-Pro:bbmap twpierson$ bash bbmap.sh -Xmx10g threads=4 ref1="amoa_1.fas,amoa_2.fas" in=all_reads.fastq outm=mapped_ref_1.fq outu=unmapped.fq nodisk po int rbm don -da
                java -Djava.library.path=/Users/tbobby/Downloads/bbmap/jni/ -da -Xmx10g -cp /Users/twpierson/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xmx10g threads=4 ref1=amoa_1.fas,amoa_2.fas in=all_reads.fastq outm=mapped_ref_1.fq outu=unmapped.fq nodisk po int rbm don -da
                Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, -Xmx10g, threads=4, ref1=amoa_1.fas,amoa_2.fas, in=all_reads.fastq, outm=mapped_ref_1.fq, outu=unmapped.fq, nodisk, po, int, rbm, don, -da]

                BBMap version 35.85
                Set threads to 4
                Exception in thread "main" java.lang.RuntimeException: Unknown parameter: ref1=amoa_1.fas,amoa_2.fas
                at align2.AbstractMapper.parse(AbstractMapper.java:627)
                at align2.AbstractMapper.<init>(AbstractMapper.java:51)
                at align2.BBMap.<init>(BBMap.java:41)
                at align2.BBMap.main(BBMap.java:29)



                • #38
                  You had a slight syntax error there: "ref1=" should be "ref=". But BBMap won't accept that format (although some other tools do). You have to concatenate them first:

                  cat amoa_1.fas amoa_2.fas > all.fasta

                  Then align:

                  bbmap.sh ref=all.fasta other parameters

                  BBSplit does accept comma-delimited references, though its usage syntax is a bit different.
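The concatenate-then-align step can be sketched in shell as below. It is a minimal example using toy reference files named as in this thread (amoa_1.fas, amoa_2.fas) with made-up sequences. The record-count and duplicate-name checks are an extra sanity step I'm assuming is useful, not a documented BBMap requirement:

```shell
#!/usr/bin/env bash
# Sketch: merge per-genus reference files into one multi-FASTA for bbmap.sh ref=all.fasta.
# Toy sequences only; the duplicate-name check is an extra sanity step (an assumption,
# not a documented BBMap requirement).
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-ins for the per-genus reference files.
printf '>amoA_genus1\nACGTACGTAC\n' > amoa_1.fas
printf '>amoA_genus2\nTTGACCGTAA\n>amoA_genus3\nGGCATTACGA\n' > amoa_2.fas

# Concatenate: the result is a single multi-FASTA file.
cat amoa_1.fas amoa_2.fas > all.fasta

# Every input record should appear exactly once in the combined file.
n_in=$(cat amoa_1.fas amoa_2.fas | grep -c '^>')
n_out=$(grep -c '^>' all.fasta)
echo "records in: $n_in, records out: $n_out"

# List any sequence names (text up to the first space) that occur more than once.
dups=$(grep '^>' all.fasta | cut -d' ' -f1 | sort | uniq -d)
if [ -n "$dups" ]; then
  echo "duplicate names: $dups"
else
  echo "no duplicate names"
fi

# Then align against the combined file with your usual options, e.g.:
#   bash bbmap.sh ref=all.fasta in=all_reads.fastq outm=mapped.fq outu=unmapped.fq
```

One pitfall worth knowing: if an input file does not end with a newline, cat will glue its last sequence line onto the next file's header line, so it is worth checking the combined file rather than assuming the concatenation was clean.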

                  There is definitely something wrong here, as only 52 of roughly 212,000 read pairs (424,758 reads) aligned to the reference, which is even lower than the first run. Hard to say what it is... did you expect most of the reads to align?

                  And have you BLASTed some of these reads to see what they are?
                  Last edited by Brian Bushnell; 03-16-2016, 08:43 PM.



                  • #39
                    I have tried the second command you listed:

                    bash bbmap.sh ref=all.fasta other parameters
                    Max memory cannot be determined. Attempting to use 3200 MB.
                    If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
                    or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
                    java -Djava.library.path=/Users/twpierson/Downloads/bbmap/jni/ -ea -Xmx3200m -cp /Users/twpierson/Downloads/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=all.fasta other parameters
                    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=all.fasta, other, parameters]

                    BBMap version 35.85
                    Exception in thread "main" java.lang.RuntimeException: Unknown parameter: other
                    at align2.AbstractMapper.parse(AbstractMapper.java:627)
                    at align2.AbstractMapper.<init>(AbstractMapper.java:51)
                    at align2.BBMap.<init>(BBMap.java:41)
                    at align2.BBMap.main(BBMap.java:29)

                    I have tried BLASTing the reads that mapped, and they look correct: mostly archaeal amoA gene clones. This is actually a friend's sequencing data. Maybe the issue has something to do with the primers amplifying more than archaea. I'm not exactly sure myself.

                    Thanks again guys!



                    • #40
                      @Illusive Man: A multi-FASTA file looks like this:
                      Code:
                      >Genome_1
                      ACGATCTAGC
                      >Genome_2
                      ACGCCTAGCTAGCGCTA
                      >Genome_3
                      CGCTCGATCGATCGA
                      You get the idea.

                      The cat command @Brian provided combines your genomes into a single genome file in multi-FASTA format.
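In GenoMax's example each name line is followed by its sequence on the next line. A FASTA reader generally treats everything after ">" up to the end of the line as the name and description, so an entry written on one line as ">seq1 TAAATGA" carries no sequence at all. If a file was built that way, a small awk sketch like the one below can split such lines. It assumes (hypothetically) that a header with exactly two fields, the second made only of A/C/G/T/N, is a name followed by its sequence, and it leaves every other line untouched:

```shell
#!/usr/bin/env bash
# Sketch: split one-line ">name SEQUENCE" entries into proper two-line FASTA records.
# Assumption: a header with exactly two fields, the second made only of A/C/G/T/N,
# is a name plus its sequence; all other lines (including real descriptions) pass through.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Toy input: two one-line entries plus one correctly formatted record with a description.
printf '>seq1 TAAATGA\n>seq2 CCGTTAAA\n>ref3 amoA partial cds\nACGTTGCA\n' > oneline.fasta

awk '/^>/ && NF == 2 && $2 ~ /^[ACGTNacgtn]+$/ { print $1; print $2; next }
     { print }' oneline.fasta > fixed.fasta

cat fixed.fasta
```

The base-letter check is there so that an ordinary header with a one-word description is not mistaken for a sequence.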

                      As for the other command, you literally ran what @Brian wrote. What he meant was:

                      Code:
                      $ bash bbmap.sh ref=all.fasta <other parameters>
                      Replace "other parameters" with the optional BBMap parameters you want to use for the alignment.



                      • #41

                        I think I did exactly what you both described in the earlier commands I posted. My reference file was already a multi-FASTA file containing the three sequences concatenated. When I performed the alignment, my command looked something like this:

                        bash bbmap.sh -Xmx10g threads=4 ref=amoaref2.fasta in=all_reads.fastq outm=mapped.fq outu=unmapped.fq nodisk po int rbm don -da
                        So I am fairly certain we are saying the same thing. That still leaves the question of why so few read 1s are aligning to the reference sequences. Perhaps I need to add some of the unclassified sequences from the archaeal amoA NCBI database to my reference multi-FASTA file (amoaref2.fasta).

                        Thanks!



                        • #42
                          Your command:

                          bash bbmap.sh -Xmx10g threads=4 ref=amoaref1.fasta in=all_reads.fastq outm=mapped_ref1.fq outu=unmapped.fq nodisk po int rbm don -da

                          ...is syntactically correct.

                          It looks like everything is OK except for the low mapping rate. Note that read 1 and read 2 mapped at similar (low) rates in all cases, so basically they just don't match the reference, and both mates match it equally badly.
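To compare the two mates directly, the "mapped:" lines can be pulled out of saved BBMap screen output. This is a minimal sketch that assumes the output was saved to a file (bbmap_log.txt is a hypothetical name) and that the section headers look like the output posted in this thread; the excerpt embedded below is copied from that post. It also recomputes the read 1 percentage. The posted numbers are consistent with per-mate percentages being computed over half of "Reads Used" (424,758 / 2 = 212,379 pairs), but that is an inference from this log, not documented behavior:

```shell
#!/usr/bin/env bash
# Sketch: summarize per-mate "mapped:" lines from saved BBMap screen output.
# bbmap_log.txt is a hypothetical file name; the excerpt is copied from the output above.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

cat > bbmap_log.txt <<'EOF'
Reads Used: 424758 (125427725 bases)
Read 1 data: pct reads num reads pct bases num bases
mapped: 0.0245% 52 0.0249% 15652
Read 2 data: pct reads num reads pct bases num bases
mapped: 0.0245% 52 0.0250% 15634
EOF

awk '
  /^Reads Used:/  { reads = $3 }
  /^Read 1 data:/ { mate = "read1" }
  /^Read 2 data:/ { mate = "read2" }
  /^mapped:/ && mate != "" {
    n[mate] = $3
    printf "%s mapped: %s (%s reads)\n", mate, $2, $3
  }
  END {
    # Assumption: per-mate percentages are out of reads/2 (one read 1 and one read 2 per pair).
    pairs = reads / 2
    printf "pairs: %d\n", pairs
    printf "read1 recomputed: %.4f%%\n", 100 * n["read1"] / pairs
  }
' bbmap_log.txt > summary.txt

cat summary.txt
```

Seeing the same count (52) and percentage for both mates side by side is what supports the point above: the problem is not specific to read 1.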



                          • #43
                            Originally posted by Illusive Man
                            That still leaves the question of why so few of read1 is aligning to the reference sequences. Perhaps I need to add some of the unclassified sequences in the archaea amoa NCBI database to my reference multi-fasta file (amoaref2.fasta).

                            Thanks!
                            Perhaps. But you should also take a sample of the reads that do not align (BBMap can write those out easily, e.g. with outu=) and BLAST them at NCBI. That would rule out sample contamination.
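A minimal sketch of pulling a handful of unmapped reads for BLAST, assuming they were written with outu=unmapped.fq as in the commands above and that the file is plain 4-line FASTQ without line wrapping. fastq_head_to_fasta is a hypothetical helper name; it takes the first N records and rewrites them as FASTA, a format NCBI BLAST accepts as query input. If unmapped.fq is interleaved (as it likely is for the int runs above), consecutive records are mates, so N records cover N/2 pairs:

```shell
#!/usr/bin/env bash
# Sketch: convert the first N reads of an unmapped-reads FASTQ to FASTA for a BLAST query.
# Assumes plain 4-line FASTQ records; fastq_head_to_fasta is a hypothetical helper.
set -euo pipefail

# fastq_head_to_fasta FILE N: print the first N records of FILE as FASTA.
fastq_head_to_fasta() {
  awk -v n="$2" '
    NR > 4 * n  { exit }                      # stop after N records
    NR % 4 == 1 { print ">" substr($0, 2) }   # "@name" header becomes ">name"
    NR % 4 == 2 { print }                     # sequence line is kept as-is
  ' "$1"
}

workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-in for unmapped.fq (three reads).
printf '@u1\nACGTTGCA\n+\nIIIIIIII\n@u2\nGGGCCCAA\n+\nIIIIIIII\n@u3\nTTTTAAAA\n+\nIIIIIIII\n' > unmapped.fq

fastq_head_to_fasta unmapped.fq 2 > unmapped_sample.fasta
cat unmapped_sample.fasta
```

The first N reads are not a random sample; if the file is ordered in some way, a random subsample would be more representative (BBTools' reformat.sh has sampling options for this; check its usage text for the exact parameters).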
