  • SOAPdenovo pregraph read counts

    I am trying to run SOAPdenovo for the first time. I launched it a few days ago and it seems to be running fine. There are no errors. This was the command:
    Code:
    SOAPdenovo-63mer all \
    -s config.config \
    -p 10 \
    -R \
    -o graph \
    -V hawkeye \
    1>ass.log.txt 2>ass.err.txt
    I feel like it's taking too long. What really worries me is the progress meter in the log file:
    Code:
    Version 2.04: released on July 13th, 2012
    Compile Jul  9 2013	11:57:16
    
    ********************
    Pregraph
    ********************
    
    Parameters: pregraph -s config.config -p 10 -R -o graph 
    
    In config.config, 1 lib(s), maximum read length 150, maximum name length 256.
    
    10 thread(s) initialized.
    Import reads from file:
     FASTQ/lane_NoIndex_L000_R1_001.fastq.gz
    Import reads from file:
     FASTQ/lane_NoIndex_L000_R2_001.fastq.gz
    --- 100000000th reads.
    --- 200000000th reads.
    --- 300000000th reads.
    ...
    ...
    --- 34500000000th reads.
    --- 34600000000th reads.
    --- 34700000000th reads.
    The last update is for 34.7 billion reads. However, the input FASTQs only contain 370 million reads combined. What would cause such a discrepancy? Are the reads reported not the same thing as the reads from the FASTQs?
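A quick way to confirm how many reads the inputs really hold, so the figure can be compared against the pregraph progress meter. This is only a sketch: it assumes standard four-line FASTQ records (no wrapped sequence lines), and the function name `count_fastq_reads` is made up for illustration.

```shell
# Count records in a FASTQ file, plain or gzip-compressed.
# Assumes exactly four lines per record (header, sequence, "+", qualities).
count_fastq_reads() {
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    lines=$(gzip -cdf "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

Running `count_fastq_reads FASTQ/lane_NoIndex_L000_R1_001.fastq.gz` on each input and adding the results gives the total to compare with the log. A count in the log far above that total suggests the reader is not parsing the file the way you expect.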

  • #2
    Did you solve this problem? I'm experiencing the same issue.
    Thanks!


    • #3
      I am having the same issue as well. I am at 2.2 billion reads, although my two inputs are around 4 million reads each.


      • #4
        I fixed this by performing adapter and quality trimming before attempting assembly.


        • #5
          What program would you recommend for trimming?

          I've seen people mention Trimmomatic as an option, but I don't have any experience with it.


          • #6
            Originally posted by samzorn1
            What program would you recommend for trimming?

            I've seen people mention Trimmomatic as an option, but I don't have any experience with it.

            I use Trimmomatic. It does both adapter and quality trimming. The manual is very comprehensive, so you shouldn't really run into any questions.
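For anyone who has not run it before, here is a sketch of a paired-end Trimmomatic call, written as a dry run that only prints the command so it can be checked first. The trimming steps and thresholds follow the example in the Trimmomatic manual. The jar location, the adapter file, the output naming and the function name `trimmomatic_pe_cmd` are assumptions to adjust for your own install and library prep.

```shell
# Print (do not run) a paired-end Trimmomatic command.
# Arguments: forward reads, reverse reads, output prefix.
# TRIMMOMATIC_JAR and ADAPTERS_FA are hypothetical environment overrides;
# the defaults below are placeholders, not guaranteed paths.
trimmomatic_pe_cmd() {
    jar="${TRIMMOMATIC_JAR:-trimmomatic.jar}"
    adapters="${ADAPTERS_FA:-TruSeq3-PE.fa}"
    # Four outputs: paired and unpaired survivors for each input file
    echo java -jar "$jar" PE "$1" "$2" \
        "${3}_R1_paired.fastq.gz" "${3}_R1_unpaired.fastq.gz" \
        "${3}_R2_paired.fastq.gz" "${3}_R2_unpaired.fastq.gz" \
        "ILLUMINACLIP:${adapters}:2:30:10" \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}
```

Once the printed command looks right, paste it into the shell to run it. Only the two `_paired` outputs should go into the paired-end library for assembly.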


            • #7
              Thanks for the suggestion.
              I have two nearly identical FASTQ pairs: same genome, same Illumina sequencing parameters.
              But Trimmomatic does very different things with them.
              One of them gets reduced from 1Gb to about 80Mb and runs through SOAP beautifully.
              The other goes to about 997Mb and runs through SOAP, but doesn't assemble.

              Any idea why the same parameters on nearly identical data sets would give such different results?

              I am running the default parameters given, with -phred64.


              • #8
                Originally posted by samzorn1
                Thanks for the suggestion.
                I have two nearly identical FASTQ pairs: same genome, same Illumina sequencing parameters.
                But Trimmomatic does very different things with them.
                One of them gets reduced from 1Gb to about 80Mb and runs through SOAP beautifully.
                The other goes to about 997Mb and runs through SOAP, but doesn't assemble.

                Any idea why the same parameters on nearly identical data sets would give such different results?

                I am running the default parameters given, with -phred64.

                I wouldn't set -phred64, since Trimmomatic can now detect quality automatically.

                The fact that you are losing more than 90% of your reads for one of your libraries indicates that there is a serious problem with it. I would also check your FASTQs with something like FastQC to see if that shows anything odd.
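If you want a rough check of which quality offset a file uses before deciding on -phred64, the sketch below looks at the lowest quality character in the first 10,000 reads. It is a heuristic, not a definitive test: any character below ASCII 59 can only occur in Phred+33 data, but a Phred+33 file containing only high qualities would be misreported as 64. The function name `guess_phred_offset` and the 10,000-read sample size are arbitrary choices for illustration.

```shell
# Guess the FASTQ quality offset (33 or 64) from the smallest quality byte.
# Assumes four-line FASTQ records; accepts plain or gzip-compressed input.
guess_phred_offset() {
    min=$(gzip -cdf "$1" \
        | awk 'NR % 4 == 0' \
        | head -n 10000 \
        | tr -d '\n' \
        | od -An -v -tu1 \
        | tr -s ' ' '\n' \
        | grep -v '^$' \
        | sort -n \
        | head -n 1)
    # No quality lines found: report the problem instead of guessing
    [ -n "$min" ] || { echo "no quality lines found in $1" >&2; return 1; }
    if [ "$min" -lt 59 ]; then
        echo 33   # characters this low do not occur in Phred+64 files
    else
        echo 64   # consistent with Phred+64, but not proof of it
    fi
}
```

If the two nearly identical libraries report different offsets, that alone could explain why the same trimming parameters behave so differently on them.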


                • #9
                  I had the same problem: SOAP would keep loading sequences for days. In case anybody else runs into the same issue, I'll leave my fix here.

                  The problem in my case was the FASTA file of single reads I provided. It looked like this, as output by many programs, with each sequence wrapped across several lines of a fixed length (e.g. 70 bases):
                  >T2_SSU
                  GGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAGCAACTA
                  TACGGTGAAACTGCGAATGGCTCATTAAATCAGTTATCGTTTATTTGATAGTACCTTACTACATGGATAC
                  CTGTGGTAATTCTAGAGCTAATACATGCTAAAAACCCCGACTTCGGGAGGGGTGTATTTATTAGATAAAA
                  AACCAATGCCCTTCGGGGCTCCTTGGTGAATCATAATAACTTAACGAATCGCATGGCCTTGCGCCGGCGA
                  TGGTTCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGTGGCCTACCATGGTAGCAACGGGT
                  AACGGGGAATTAGGGTTCGACTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCA
                  GGCGCGCAAA
                  However, it seems that SOAPdenovo does not handle sequences that are wrapped across multiple lines. Putting each sequence on a single line solved the problem, so that SOAP reads this:
                  >T2_SSU
                  GGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAGCAACTATACGGTGAAACTGCGAATGGCTCATTAAATCAGTTATCGTTTATTTGATAGTACCTTACTACATGGATAC
                  * I shortened the examples
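The unwrapping can be done with awk. This is a minimal sketch that assumes a plain-text FASTA file with Unix line endings; the function name `unwrap_fasta` is just for illustration.

```shell
# Join wrapped sequence lines so each FASTA record is one header line
# followed by exactly one sequence line. Writes to standard output.
unwrap_fasta() {
    awk '
        /^>/ {                          # a header line starts a new record
            if (seq != "") print seq    # flush the previous sequence
            print
            seq = ""
            next
        }
        { seq = seq $0 }                # append this line to the current sequence
        END { if (seq != "") print seq }
    ' "$1"
}
```

For example, `unwrap_fasta reads.fa > reads.single_line.fa` (both file names are placeholders) writes a copy with one line per sequence and leaves the original untouched.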


                  • #10
                    Same unknown issue resulting in weirdly huge pregraph read counts

                    My data has been trimmed and error-corrected, and I am still experiencing this problem. Are there any known solutions?

                    I have 2.4 billion Illumina reads, but pregraph counts up to 300 billion before I give up.
