  • Many Ns in SOAPdenovo assembly results

    Hi,

    I used SOAPdenovo2 to assemble a plant genome, but there are many Ns in the scaffolds and the N50 is only 15,000 bp. Some of the results are below. I don't know why this happens; I hope somebody can help me. Thank you very much!

    Size_includeN 1339592585
    Size_withoutN 305042437
    Scaffold_Num 975033
    Mean_Size 1373
    Median_Size 120
    Longest_Seq 221407
    Shortest_Seq 100
    Singleton_Num 803540
    Average_length_of_break(N)_in_scaffold 1061

    Known_genome_size NaN
    Total_scaffold_length_as_percentage_of_known_genome_size NaN

    scaffolds>100 842372 86.39%
    scaffolds>500 119625 12.27%
    scaffolds>1K 93419 9.58%
    scaffolds>10K 35907 3.68%
    scaffolds>100K 187 0.02%
    scaffolds>1M 0 0.00%

    Nucleotide_A 85568194 6.39%
    Nucleotide_C 68439481 5.11%
    Nucleotide_G 67296842 5.02%
    Nucleotide_T 83737920 6.25%
    GapContent_N 1034550148 77.23%
    Non_ACGTN 0 0.00%
    GC_Content 44.50% (G+C)/(A+C+G+T)

    N10 59139 1741
    N20 43739 4409
    N30 34619 7868
    N40 27474 12218
    N50 21467 17745
    N60 15647 25064
    N70 10191 35422
    N80 7208 51584
    N90 931 95139
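The size and gap lines above are related by simple arithmetic: 1,339,592,585 − 1,034,550,148 = 305,042,437 bp of real sequence, so about three quarters of the scaffold length is gap. As a minimal sketch for recomputing such numbers from a scaffold FASTA, the Python below counts gap Ns and computes Nx values. It assumes scaffold lengths include their gap Ns and that only the letter N marks gaps; it is not SOAPdenovo's own statistics code, so small differences from the table are possible, and the function names are illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def nx(lengths: List[int], x: float) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of the total)."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= total * x / 100:
            return length, count
    return 0, 0  # empty input


def scaffold_stats(seqs: List[str]) -> Dict[str, float]:
    """Recompute a few of the summary lines: total size, size without N, gap %, N50."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    n_count = sum(s.upper().count("N") for s in seqs)  # gap bases
    n50_len, n50_num = nx(lengths, 50)
    return {
        "Size_includeN": total,
        "Size_withoutN": total - n_count,
        "GapContent_N_pct": 100.0 * n_count / total if total else 0.0,
        "N50": n50_len,
        "N50_num": n50_num,
    }
```

For example, the sequences from the assembly's `.scafSeq` file can be collected with `[s for _, s in read_fasta(open(path))]` and passed to `scaffold_stats`.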



    Regards,
    huily

  • #2
    Looks like the scaffolding is rather poor, so it's kind of expected to see a ton of gaps filled with Ns. What insert-size libraries were used for the assembly?



    • #3
      Two paired-end libraries, each with an insert length of about 250 bp, and one mate-pair library with an insert length of 7 kb. I have tried many times with Velvet and SOAPdenovo2, but none of the results look good.



      • #4
        Have you done any pre-processing to clean your reads prior to assembly? SOAPdenovo has issues with chimeric mate-pair reads, which affect proper scaffolding if they are not removed before assembly. Also, specifying the mate-pair reads in the config file is rather tricky.

        Here's some helpful discussion if you haven't seen it already: https://www.biostars.org/p/13142/



        • #5
          Dear vivek,

          Thank you very much! I trimmed and normalized my raw reads before assembly. Here is my config for SOAPdenovo:

          max_rd_len=100
          [LIB]
          avg_ins=250
          reverse_seq=0
          asm_flags=3
          rank=1
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          [LIB]
          avg_ins=7000
          reverse_seq=0
          asm_flags=3
          rank=2
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
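A quick way to double-check a config like this is to parse it and list each [LIB] section's settings and read-file counts. The following is a hypothetical Python helper (`parse_soap_config`, `check_pairs` and `summarize` are illustrative names, not part of SOAPdenovo). It only understands `key=value` lines, `#` comments and the paired q1/q2 and f1/f2 file keys, and it does not check the values against SOAPdenovo's own rules.

```python
from typing import Any, Dict, List

# Keys that name read files; they may repeat within one [LIB] section.
# Only paired FASTQ (q1/q2) and paired FASTA (f1/f2) keys are handled here.
FILE_KEYS = ("q1", "q2", "f1", "f2")


def parse_soap_config(text: str) -> Dict[str, Any]:
    """Split a SOAPdenovo-style config into global settings and [LIB] sections."""
    settings: Dict[str, str] = {}
    libs: List[Dict[str, Any]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "[LIB]":
            current = {key: [] for key in FILE_KEYS}
            libs.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if current is None:
            settings[key] = value  # e.g. max_rd_len, before the first [LIB]
        elif key in FILE_KEYS:
            current[key].append(value)
        else:
            current[key] = value  # avg_ins, reverse_seq, asm_flags, rank, ...
    return {"global": settings, "libs": libs}


def check_pairs(cfg: Dict[str, Any]) -> List[str]:
    """Report [LIB] sections whose forward and reverse file counts differ."""
    problems = []
    for i, lib in enumerate(cfg["libs"], start=1):
        for fwd, rev in (("q1", "q2"), ("f1", "f2")):
            if len(lib[fwd]) != len(lib[rev]):
                problems.append(
                    f"LIB {i}: {len(lib[fwd])} {fwd} file(s) but {len(lib[rev])} {rev} file(s)"
                )
    return problems


def summarize(cfg: Dict[str, Any]) -> List[str]:
    """One line per [LIB] section with its main settings and number of file pairs."""
    return [
        f"LIB {i}: avg_ins={lib.get('avg_ins')} reverse_seq={lib.get('reverse_seq')} "
        f"asm_flags={lib.get('asm_flags')} rank={lib.get('rank')} "
        f"file_pairs={len(lib['q1']) + len(lib['f1'])}"
        for i, lib in enumerate(cfg["libs"], start=1)
    ]
```

Printing `summarize(parse_soap_config(open(path).read()))` puts the short-insert and 7 kb libraries side by side, which makes an unintended `reverse_seq` or `asm_flags` value easier to spot.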

          My script is: SOAPdenovo-63mer all -s /diag/home/goosegrass_genome/SOAPdenovo/config -K 31 -R -o /diag/home/genome/SOAPdenovo/graph_prefix 1>scaff.log 2>scaff.err

          Do you have any suggestions for improving my script? Also, I don't know whether my mate-pair reads are chimeric or not. How can I tell?

          Thanks a lot.



          • #6
            Someone in the thread I linked suggested setting reverse_seq=1 for the mate-pair library.

            Other than that, something I did for a similar issue, albeit a long while ago, was to align the 7 kb library reads to the draft genome you currently have and look at the insert-size distribution you observe.
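One way to look at that distribution, as a minimal sketch: read the SAM records of the 7 kb reads aligned to the draft (for example the output of `samtools view` on the BAM; the aligner is up to you) and bin the template lengths (TLEN, column 9). Only primary alignments where both mates land on the same scaffold are counted, since TLEN is only meaningful then; the bin width and function names are illustrative assumptions.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List

# SAM FLAG bits for records we do not want to count:
# unmapped, mate unmapped, secondary, supplementary.
SKIP_FLAGS = 0x4 | 0x8 | 0x100 | 0x800


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect one positive TLEN per pair whose mates map to the same scaffold."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, rnext, tlen = int(fields[1]), fields[2], fields[6], int(fields[8])
        if flag & SKIP_FLAGS or rnext not in ("=", rname):
            continue
        if tlen > 0:  # the leftmost mate carries the positive TLEN
            sizes.append(tlen)
    return sizes


def size_histogram(sizes: List[int], bin_width: int = 500) -> Counter:
    """Count insert sizes in fixed-width bins keyed by each bin's lower edge."""
    return Counter((size // bin_width) * bin_width for size in sizes)


def describe_sizes(sizes: List[int]) -> str:
    """Short text summary: number of measurable pairs and their median insert."""
    if not sizes:
        return "no measurable pairs"
    return f"{len(sizes)} pairs, median insert {median(sizes):.0f} bp"
```

With a clean 7 kb library most pairs should fall in bins near 7,000; a second peak at a few hundred bp would point to contaminating short-insert pairs. With a fragmented draft, many pairs will span two scaffolds and simply not be counted.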

            If some reads are chimeric, you'll see mate pairs with insert sizes much smaller than 7 kb in the alignment results. You can then discard those pairs and redo the assembly with the remaining ones to see if the scaffolding improves.
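The discard step could look like this minimal sketch, assuming the alignment is available as SAM text and the reads as uncompressed FASTQ. The 3,000 bp cutoff is a hypothetical value (pick it from the observed distribution), and pairs whose mates map to different scaffolds cannot be measured, so they are left in rather than thrown away. Function names are illustrative.

```python
from typing import Iterable, Iterator, Set, Tuple

# unmapped, mate unmapped, secondary, supplementary
SKIP_FLAGS = 0x4 | 0x8 | 0x100 | 0x800


def split_by_insert(sam_lines: Iterable[str], min_insert: int = 3000) -> Tuple[Set[str], Set[str]]:
    """Return (short, ok) pair names, judged by |TLEN| on the same scaffold.

    Pairs on different scaffolds, or with an unmapped mate, are in neither set.
    """
    short, ok = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, rname, rnext = fields[0], int(fields[1]), fields[2], fields[6]
        tlen = abs(int(fields[8]))
        if flag & SKIP_FLAGS or rnext not in ("=", rname) or tlen == 0:
            continue
        (short if tlen < min_insert else ok).add(name)
    return short, ok


def filter_fastq(lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield FASTQ lines, skipping records whose read name is in `drop`."""
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):  # 4 lines per record
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # SAM QNAMEs usually lack the /1 and /2 suffix
        if name not in drop:
            yield from (header, seq, plus, qual)
```

Running `filter_fastq` on both the R1 and R2 files with the same `short` set keeps the two files in sync, so the cleaned pair can go straight back into the mate-pair [LIB] section.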



            • #7
              I reversed the mate-pair sequences with CLC before assembly, so I set reverse_seq=0.
              I also assembled with Velvet using multiple k-mers; there seem to be no Ns in the scaffolds, but the N50 is only 11,199 bp. I don't know why. I will try aligning the mate-pair reads to the draft genome to check for chimeras. Thanks!
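My understanding of the SOAPdenovo2 manual is that reverse_seq=1 asks the assembler to reverse-complement a library's reads, which is usually needed for large-insert mate-pair libraries but not for short-insert paired-end ones. Assuming "reversed" above means reverse-complementing every mate-pair read beforehand, that pre-processing amounts to the minimal sketch below, after which reverse_seq=0 is the consistent setting (setting both would flip the reads back). Note that the quality string is reversed but not complemented. The function names are illustrative.

```python
from typing import Tuple

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_fastq(record: Tuple[str, str, str, str]) -> Tuple[str, str, str, str]:
    """Reverse-complement one FASTQ record given as (header, seq, plus, qual)."""
    header, seq, plus, qual = record
    # Qualities follow their bases, so they are only reversed.
    return header, reverse_complement(seq), plus, qual[::-1]
```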
