Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Dinab
    Junior Member
    • Apr 2017
    • 3

    Quality distribution graph

    A bit late, but hope you can help me:
    I want to create an average quality distribution graph of multiple sequences from fastq file. Is it possible with reformat?
    Also, if I use reformat.sh with maq=XX, does it simply calculates the avvarge quality based on base quality and checks the error rate (2% max)?

    Thank you!

    Comment

    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      Originally posted by Dinab View Post
      A bit late, but hope you can help me:
      I want to create an average quality distribution graph of multiple sequences from fastq file. Is it possible with reformat?
      Yes, the histogram section of the help covers that. Depending on how you want to plot it, you can use qhist, qchist, aqhist, or bqhist. The most generic is aqhist, which plots the number of reads per average quality score.

      Also, if I use reformat.sh with maq=XX, does it simply calculates the average quality based on base quality and checks the error rate (2% max)?
      "maq" transforms the qualities into probability space, calculates the average error rate, and then transforms that back into the phred scale. So, a 2% maximum error rate would be approximately maq=17.

      Comment

      • Dinab
        Junior Member
        • Apr 2017
        • 3

        Perfect, Thank you!

        Comment

        • Dinab
          Junior Member
          • Apr 2017
          • 3

          Another question

          Hi Brian,
          Your previous response worked perfectly, thank you again.
          I now have additional complexity to add: is it possible to get a plot of the mean quality of each sequence as variable of the length?

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            Originally posted by Dinab View Post
            I now have additional complexity to add: is it possible to get a plot of the mean quality of each sequence as variable of the length?
            No, I've never implemented something like that... the closest you could get with the existing package is to use a loop for each length with something like...

            Code:
            reformat.sh in=reads.fq minlen=147 maxlen=147 out=stdout.fq int=f | reformat.sh in=stdin.fq aqhist=aqhist_147.txt

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              Originally posted by Dinab View Post
              is it possible to get a plot of the mean quality of each sequence as variable of the length?
              FastQC gives you an average per sequence quality score.

              Comment

              • blsfoxfox
                Member
                • Jan 2013
                • 12

                fastq format error

                Hi Brian,

                I found there is some format error with my fastq. By using reformat.sh in=in.fastq out=out.fastq, I get following error:

                Input is being processed as unpaired
                java.lang.AssertionError:
                Error in in.fastq, line 4241767, with these 4 lines:
                >m150430_235943_42146_c100804572550000001823173810081565_s1_p0/108988/0_3288 RQ=0.868
                CTCTTTCCGGTAACACGCGCCTAGATACTAACAACCTACCTTATACTAATCAGCATGCCTAATACCAACA
                ACCTACTTTATACCTAATAACAATGCCTTGATAACTAACAACATACCTTAATACCTAATACACGCTTAAT
                ACCTAACAACAATACCTTAATACCTAACAACATACTTTAATACCTAACAACCTACCTTAATAACTTAATA

                at stream.FASTQ.quadToRead(FASTQ.java:779)
                at stream.FASTQ.toReadList(FASTQ.java:710)
                at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:110)
                at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:95)
                at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
                at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
                Input: 1060400 reads 9378503777 bases
                Output: 1060400 reads (100.00%) 9378503777 bases (100.00%)

                Time: 75.193 seconds.
                Reads Processed: 1060k 14.10k reads/sec
                Bases Processed: 9378m 124.73m bases/sec
                Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
                at jgi.ReformatReads.process(ReformatReads.java:1130)
                at jgi.ReformatReads.main(ReformatReads.java:45)

                Could you please provide any suggestions on which steps shoud I follow to find the exact error of that read?

                Thanks!

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  The problem here is that you are using a fasta sequence named "in.fastq". BBTools is sensitive to filenames, and *.fastq will be processed as a fastq file. If the file has no extension it will usually look at the contents to try to figure out what it is, but when there is a known extension, it assumes it is correct.

                  So, just rename the input file to "in.fasta" or add the flag "extin=.fasta" to override filename. Although for most uses of PacBio data I do recommend that you go back and get the original fastq file and use that, because the quality scores are often useful.
                  Last edited by Brian Bushnell; 05-07-2017, 10:35 PM.

                  Comment

                  • blsfoxfox
                    Member
                    • Jan 2013
                    • 12

                    Hi Brian,

                    Thanks for your quick response!

                    Actually, I am doing that on purpose. I found there is format error in my fastq file while using another software, but I don't know where is it. By using reformat.sh with in=in.fastq out=out.fastq, bbmap will report where is that error. In this case, read >m150430_235943_42146_c100804572550000001823173810081565_s1_p0/108988/0_3288 RQ=0.868 causes a corruption of the output. However, I still don't know what's wrong and the only solution I can think of now is using filterbyname tool to exclude that read from my fastq.
                    Thus, I really appreciate if you could provide any suggestion on how could I identify and fix that format error.

                    Thanks again for your time,

                    Comment

                    • blsfoxfox
                      Member
                      • Jan 2013
                      • 12

                      Originally posted by Brian Bushnell View Post
                      The problem here is that you are using a fasta sequence named "in.fastq". BBTools is sensitive to filenames, and *.fastq will be processed as a fastq file. If the file has no extension it will usually look at the contents to try to figure out what it is, but when there is a known extension, it assumes it is correct.

                      So, just rename the input file to "in.fasta" or add the flag "extin=.fasta" to override filename. Although for most uses of PacBio data I do recommend that you go back and get the original fastq file and use that, because the quality scores are often useful.
                      Hi Brian,

                      Sorry for that I should double checked my input.fastq. After grep multiple lines after that read, I find some fasta sequences in that fastq file. As they are raw sequences, I don't have a 'clean' backup for it. So my problem now is to find a tool which could extract fasta sequences from fastq.

                      Thanks,

                      Comment

                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        Code:
                        reformat.sh in=seq.fq out=seq.fa
                        will convert fastq sequences to fasta. If your problem is malformed fastq records then you need to find and delete those records manually.

                        Comment

                        • blsfoxfox
                          Member
                          • Jan 2013
                          • 12

                          Originally posted by GenoMax View Post
                          Code:
                          reformat.sh in=seq.fq out=seq.fa
                          will convert fastq sequences to fasta. If your problem is malformed fastq records then you need to find and delete those records manually.
                          Hi,

                          Thanks for your response!

                          My problem is that my file is mixed of fasta and fastq, and I don't know how to identify those fasta sequences and extract them.

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            How did that happen?

                            Unless you have a defined reason I would suggest not trusting that file.

                            Comment

                            • skatrinli
                              Junior Member
                              • May 2017
                              • 1

                              Hey Brian,

                              Thanks for this great tool but I could not properly download it. I installed the latest version 37.25 but when i try to test the installation with the command
                              $ (installation directory)/stats.sh in=(installation directory)/resources/phix174_ill.ref.fa.gz
                              it comes this error:
                              Exception in thread "main" java.lang.RuntimeException: Unknown parameter Downloads/bbmap/resources/phix174_ill.ref.fa.gz
                              at jgi.AssemblyStats2.<init>(AssemblyStats2.java:166)
                              at jgi.AssemblyStats2.main(AssemblyStats2.java:39)

                              What did I do wrong?

                              Comment

                              • GenoMax
                                Senior Member
                                • Feb 2008
                                • 7142

                                There is no "installation" needed for BBMap. Just uncompress and run (as long as you have Java available for your OS). Are you using Java 1.7 or 1.8?
                                @Brian no longer tests against Java 1.6 (which is what you may be using) if I recall.

                                I get the following when I run stats.sh.

                                Code:
                                $ stats.sh in=/path_to/bbmap/resources/phix174_ill.ref.fa.gz 
                                A       C       G       T       N       IUPAC   Other   GC      GC_stdev
                                0.2399  0.2144  0.2326  0.3130  0.0000  0.0000  0.0000  0.4471  0.0000
                                
                                Main genome scaffold total:             1
                                Main genome contig total:               1
                                Main genome scaffold sequence total:    0.005 MB
                                Main genome contig sequence total:      0.005 MB        0.000% gap
                                Main genome scaffold N/L50:             1/5.386 KB
                                Main genome contig N/L50:               1/5.386 KB
                                Main genome scaffold N/L90:             1/5.386 KB
                                Main genome contig N/L90:               1/5.386 KB
                                Max scaffold length:                    5.386 KB
                                Max contig length:                      5.386 KB
                                Number of scaffolds > 50 KB:            0
                                % main genome in scaffolds > 50 KB:     0.00%
                                
                                
                                Minimum         Number          Number          Total           Total           Scaffold
                                Scaffold        of              of              Scaffold        Contig          Contig  
                                Length          Scaffolds       Contigs         Length          Length          Coverage
                                --------        --------------  --------------  --------------  --------------  --------
                                    All                      1               1           5,386           5,386   100.00%
                                   5 KB                      1               1           5,386           5,386   100.00%

                                Comment

                                Latest Articles

                                Collapse

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Today, 10:09 AM
                                0 responses
                                8 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, Yesterday, 08:59 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 12:03 PM
                                0 responses
                                23 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 11:40 AM
                                0 responses
                                20 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...