  • Many Ns in SOAPdenovo assembly results

    Hi,

    I used SOAPdenovo2 to assemble a plant genome, but the scaffolds contain many Ns and the N50 is only about 15,000 bp. Below are some of the results. I don't know why this happens and hope somebody can help me. Thank you very much!

    Size_includeN 1339592585
    Size_withoutN 305042437
    Scaffold_Num 975033
    Mean_Size 1373
    Median_Size 120
    Longest_Seq 221407
    Shortest_Seq 100
    Singleton_Num 803540
    Average_length_of_break(N)_in_scaffold 1061

    Known_genome_size NaN
    Total_scaffold_length_as_percentage_of_known_genome_size NaN

    scaffolds>100 842372 86.39%
    scaffolds>500 119625 12.27%
    scaffolds>1K 93419 9.58%
    scaffolds>10K 35907 3.68%
    scaffolds>100K 187 0.02%
    scaffolds>1M 0 0.00%

    Nucleotide_A 85568194 6.39%
    Nucleotide_C 68439481 5.11%
    Nucleotide_G 67296842 5.02%
    Nucleotide_T 83737920 6.25%
    GapContent_N 1034550148 77.23%
    Non_ACGTN 0 0.00%
    GC_Content 44.50% (G+C)/(A+C+G+T)

    N10 59139 1741
    N20 43739 4409
    N30 34619 7868
    N40 27474 12218
    N50 21467 17745
    N60 15647 25064
    N70 10191 35422
    N80 7208 51584
    N90 931 95139



    Regards,
    huily
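The key numbers in this report are linked by simple arithmetic: Size_includeN minus GapContent_N is 1,339,592,585 - 1,034,550,148 = 305,042,437, exactly Size_withoutN, so about 77% of the scaffold sequence is gap. The minimal sketch below, assuming the scaffolds have already been read into Python strings (FASTA parsing is omitted), shows how gap content and N50 in a report like this are typically computed; it is not the script that produced the report. Because scaffold lengths include the N gaps, an N50 computed this way counts gap sequence too.

```python
def assembly_stats(seqs):
    """Return total length, non-N length, N fraction and N50 (gaps included).

    Assumes sequences contain only A/C/G/T/N (the report shows Non_ACGTN = 0),
    so the non-N length is simply total length minus the N count."""
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)
    n_count = sum(s.upper().count("N") for s in seqs)

    # N50: length of the scaffold at which the running total, taken from the
    # longest scaffold down, first reaches half of the total assembly size.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "size_include_n": total,
        "size_without_n": total - n_count,
        "gap_content_n": n_count / total if total else 0.0,
        "n50": n50,
    }


stats = assembly_stats(["ACGTNNNNAC", "ACG", "NNNNNNNAAA"])
print(stats)
```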

  • #2
    Looks like the scaffolding is rather poor, so a ton of N-filled gaps is to be expected. What insert size libraries were used for the assembly?



    • #3
      Two paired-end libraries, each with an insert length of about 250 bp, and one mate-pair library with an insert length of 7 kb. I have tried many times with Velvet and SOAPdenovo2, but none of the results look good.



      • #4
        Have you done any pre-processing to clean your reads prior to assembly? SOAPdenovo has issues with chimeric mate-pair reads, which affect proper scaffolding if they are not removed prior to assembly. Also, specifying the mate-pair reads in the config file is rather tricky.

        Here's some helpful discussion if you haven't seen it already: https://www.biostars.org/p/13142/



        • #5
          Dear vivek,

          Thank you very much! I trimmed and normalized my raw reads before assembly. Here is my config file for SOAPdenovo:

          max_rd_len=100
          [LIB]
          avg_ins=250
          reverse_seq=0
          asm_flags=3
          rank=1
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          [LIB]
          avg_ins=7000
          reverse_seq=0
          asm_flags=3
          rank=2
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq

          My script is: SOAPdenovo-63mer all -s /diag/home/goosegrass_genome/SOAPdenovo/config -K 31 -R -o /diag/home/genome/SOAPdenovo/graph_prefix 1>scaff.log 2>scaff.err

          Do you have any suggestions for improving my script? Also, I don't know whether my mate-pair reads are chimeric or not. How can I tell?

          Thanks a lot.
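As an illustration of the config layout above, here is a hypothetical Python helper (`Library` and `render_config` are names made up for this sketch) that writes the same keys. The comments on reverse_seq, asm_flags and rank paraphrase the SOAPdenovo2 manual's description of the [LIB] section; check them against the manual for the version you run. The FASTQ paths in the example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    avg_ins: int      # average insert size in bp
    reverse_seq: int  # 0 = forward-reverse pairs, 1 = reverse-forward (reads get reverse-complemented)
    asm_flags: int    # 1 = contigs only, 2 = scaffolding only, 3 = both, 4 = gap closure only
    rank: int         # scaffolding uses libraries in ascending rank; equal ranks are used together
    pairs: list = field(default_factory=list)  # (q1, q2) FASTQ path tuples


def render_config(max_rd_len, libraries):
    """Return SOAPdenovo2 config text with one [LIB] section per library."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        lines += [
            "[LIB]",
            f"avg_ins={lib.avg_ins}",
            f"reverse_seq={lib.reverse_seq}",
            f"asm_flags={lib.asm_flags}",
            f"rank={lib.rank}",
        ]
        for q1, q2 in lib.pairs:
            lines += [f"q1={q1}", f"q2={q2}"]
    return "\n".join(lines) + "\n"


# Same settings as the config in the post, with placeholder file names.
config = render_config(100, [
    Library(250, 0, 3, 1, [("pe_1.fq", "pe_2.fq")]),
    Library(7000, 0, 3, 2, [("mp_1.fq", "mp_2.fq")]),
])
print(config)
```

Generating the file this way makes it easy to change a single setting, such as reverse_seq for the 7 kb library, and rerun without hand-editing the long paths.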



          • #6
            Someone in the thread I linked suggested setting reverse_seq=1 for the mate-pair library.

            Other than that, something I did for a similar issue, albeit a long while ago, was to align the 7 kb library reads to the draft genome you currently have and look at the insert size distribution you observe.

            If some reads are chimeric, you'll see mate pairs with insert sizes much smaller than 7 kb in the alignment results. You can discard those pairs and redo the assembly with the remaining ones to see whether the scaffolding improves.
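The check described above can be sketched in Python, assuming the 7 kb mate-pair reads have already been aligned to the draft scaffolds with any short-read aligner that writes SAM. The function names, the demo reads and the 3,000 bp cutoff are illustrative assumptions, not values from this thread. Pairs whose mates land on different scaffolds have no observable insert size and are skipped, so on a fragmented draft like this one only part of the library can be measured.

```python
import statistics

# SAM FLAG bits (SAM format specification)
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
FIRST_IN_PAIR, SECONDARY, SUPPLEMENTARY = 0x40, 0x100, 0x800


def insert_sizes(sam_lines):
    """Yield (read name, |TLEN|) once per pair whose mates map to the same scaffold."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, rnext, tlen = (
            fields[0], int(fields[1]), fields[2], fields[6], int(fields[8]))
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
            continue  # need both mates aligned
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & FIRST_IN_PAIR:
            continue  # count each pair once, from the primary record of read 1
        if rnext not in ("=", rname) or tlen == 0:
            continue  # mates on different scaffolds: insert size not observable
        yield qname, abs(tlen)


def short_insert_pairs(sam_lines, min_insert=3000):
    """Return names of pairs with insert < min_insert, plus all observed sizes (sorted)."""
    sizes = dict(insert_sizes(sam_lines))
    short = {name for name, size in sizes.items() if size < min_insert}
    return short, sorted(sizes.values())


# Tiny synthetic example (hypothetical read and scaffold names)
demo = [
    "@HD\tVN:1.6",
    "p1\t65\tscaf1\t100\t60\t100M\t=\t7100\t7100\t*\t*",    # ~7 kb: as expected
    "p1\t129\tscaf1\t7100\t60\t100M\t=\t100\t-7100\t*\t*",  # read 2 of p1, not counted again
    "p2\t97\tscaf1\t500\t60\t100M\t=\t750\t350\t*\t*",      # 350 bp: suspiciously short
    "p3\t65\tscaf1\t10\t60\t100M\tscaf2\t50\t0\t*\t*",      # mates on different scaffolds
    "p4\t73\tscaf1\t10\t60\t100M\t=\t10\t0\t*\t*",          # mate unmapped
]
short, sizes = short_insert_pairs(demo)
print(sorted(short), statistics.median(sizes))  # prints ['p2'] 3725.0
```

To actually drop the flagged pairs you would still need to remove those read names from both FASTQ files before reassembling; that step is not shown.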



            • #7
              I reversed the mate-pair sequences using CLC before assembly, so I set reverse_seq=0.
              I also assembled with Velvet using multiple k-mers; there seem to be no Ns in the scaffolds, but the N50 is only 11,199 bp. I don't know why. I will try aligning the mate-pair reads to the draft genome to check for chimeras. Thanks!
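Illumina mate-pair reads are typically in reverse-forward (RF) orientation; reverse-complementing both reads of every pair turns them into forward-reverse (FR) pairs, which is why reverse_seq=0 is the matching setting once that has been done (setting reverse_seq=1 as well would presumably flip them back). The minimal sketch below, assuming "reversed" in the post means reverse-complemented and that the input is plain 4-line FASTQ, shows what that transformation does to a record; it would be applied to both the _1 and _2 files.

```python
# Complement table for DNA bases; any other character is only reversed, not complemented.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_fastq(lines):
    """Yield the lines of reverse-complemented 4-line FASTQ records.

    Base qualities are reversed (not complemented) so each quality
    character stays aligned with its base."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n")
        yield reverse_complement(seq.rstrip("\n"))
        yield plus.rstrip("\n")
        yield qual.rstrip("\n")[::-1]


record = ["@r1/1\n", "AACGTN\n", "+\n", "ABCDEF\n"]
print(list(revcomp_fastq(record)))  # prints ['@r1/1', 'NACGTT', '+', 'FEDCBA']
```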

