  • Many Ns in SOAPdenovo assembly results

    Hi,

    I used SOAPdenovo2 to assemble a plant genome, but there are many Ns in the scaffolds and the N50 is only 15000 bp. Below are some of the results. I don't know why this happens; I hope somebody can help me. Thank you very much!

    Size_includeN 1339592585
    Size_withoutN 305042437
    Scaffold_Num 975033
    Mean_Size 1373
    Median_Size 120
    Longest_Seq 221407
    Shortest_Seq 100
    Singleton_Num 803540
    Average_length_of_break(N)_in_scaffold 1061

    Known_genome_size NaN
    Total_scaffold_length_as_percentage_of_known_genome_size NaN

    scaffolds>100 842372 86.39%
    scaffolds>500 119625 12.27%
    scaffolds>1K 93419 9.58%
    scaffolds>10K 35907 3.68%
    scaffolds>100K 187 0.02%
    scaffolds>1M 0 0.00%

    Nucleotide_A 85568194 6.39%
    Nucleotide_C 68439481 5.11%
    Nucleotide_G 67296842 5.02%
    Nucleotide_T 83737920 6.25%
    GapContent_N 1034550148 77.23%
    Non_ACGTN 0 0.00%
    GC_Content 44.50% (G+C)/(A+C+G+T)

    N10 59139 1741
    N20 43739 4409
    N30 34619 7868
    N40 27474 12218
    N50 21467 17745
    N60 15647 25064
    N70 10191 35422
    N80 7208 51584
    N90 931 95139
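    The key figure above is the gap fraction: GapContent_N / Size_includeN = 1034550148 / 1339592585, or about 77%, so roughly three quarters of the scaffold sequence is N. A minimal sketch for recomputing these numbers from a scaffold FASTA (for example the .scafSeq file SOAPdenovo writes) is below. It assumes N50 is computed on full scaffold lengths including Ns; whether SOAPdenovo's own statistics script uses exactly this definition is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (length, number_of_Ns) for each record in a FASTA stream."""
    records = []
    length = n_count = 0
    in_record = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                records.append((length, n_count))
            length = n_count = 0
            in_record = True
        else:
            length += len(line)
            n_count += line.upper().count("N")
    if in_record:
        records.append((length, n_count))
    return records


def nx(lengths: List[int], fraction: float) -> int:
    """Length L such that sequences of length >= L cover `fraction` of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Total size with/without Ns, percentage of Ns, and N50."""
    records = read_fasta_lengths(lines)
    lengths = [length for length, _ in records]
    total = sum(lengths)
    n_total = sum(n for _, n in records)
    return {
        "Size_includeN": total,
        "Size_withoutN": total - n_total,
        "GapContent_N_pct": 100.0 * n_total / total if total else 0.0,
        "N50": nx(lengths, 0.5),
    }
```

    Usage would be along the lines of `with open("graph_prefix.scafSeq") as fh: print(summarize(fh))`, where the file name is hypothetical.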



    Regards,
    huily

  • #2
    Looks like the scaffolding is rather poor, so it's somewhat expected to see a lot of gaps filled with Ns. What insert size libraries were used for the assembly?


    • #3
      Two paired-end libraries, each with an insert length of about 250 bp, and one mate-pair library with an insert length of 7 kb. I have tried many times with Velvet and SOAPdenovo2, but none of the results seem good.


      • #4
        Have you done any pre-processing to clean your reads prior to assembly? SOAPdenovo has issues with chimeric mate-pair reads, which affect proper scaffolding if they are not removed prior to assembly. Also, specifying the mate-pair reads in the config file is rather tricky.

        Here's some helpful discussion if you haven't seen it already: https://www.biostars.org/p/13142/


        • #5
          Dear vivek,

          Thank you very much! I trimmed and normalized my raw reads before assembly. Here is my config for SOAPdenovo:

          max_rd_len=100
          [LIB]
          avg_ins=250
          reverse_seq=0
          asm_flags=3
          rank=1
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
          [LIB]
          avg_ins=7000
          reverse_seq=0
          asm_flags=3
          rank=2
          q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
          q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
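          Because hand-editing [LIB] sections is easy to get wrong, one option is to generate the config from a small description of each library. The sketch below only emits the keys already used in the config above (max_rd_len, avg_ins, reverse_seq, asm_flags, rank, q1, q2); the Library class and render_config function are hypothetical helpers, not part of SOAPdenovo. Per the SOAPdenovo documentation, reverse_seq=0 means forward-reverse pairs and reverse_seq=1 means reverse-forward pairs, as typically produced by circularized mate-pair libraries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Library:
    """One [LIB] section of a SOAPdenovo config (hypothetical helper)."""
    avg_ins: int
    rank: int
    reverse_seq: int = 0  # 0 = forward-reverse, 1 = reverse-forward
    asm_flags: int = 3    # 3 = use reads for both contig and scaffold assembly
    pairs: List[Tuple[str, str]] = field(default_factory=list)  # (q1, q2) FASTQ paths


def render_config(max_rd_len: int, libraries: List[Library]) -> str:
    """Render the global setting followed by one [LIB] block per library."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        lines += [
            "[LIB]",
            f"avg_ins={lib.avg_ins}",
            f"reverse_seq={lib.reverse_seq}",
            f"asm_flags={lib.asm_flags}",
            f"rank={lib.rank}",
        ]
        for q1, q2 in lib.pairs:
            lines += [f"q1={q1}", f"q2={q2}"]
    return "\n".join(lines) + "\n"
```

          For example, `render_config(100, [Library(250, 1, pairs=[("pe_1.fq", "pe_2.fq")]), Library(7000, 2, reverse_seq=1, pairs=[("mp_1.fq", "mp_2.fq")])])` produces a two-library config; the file names are placeholders.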

          My command is: SOAPdenovo-63mer all -s /diag/home/goosegrass_genome/SOAPdenovo/config -K 31 -R -o /diag/home/genome/SOAPdenovo/graph_prefix 1>scaff.log 2>scaff.err
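          In the command above, `1>scaff.log` and `2>scaff.err` must be separated from the output prefix by spaces; otherwise the shell treats them as part of the file name. If the run is driven from a script, a minimal Python sketch that keeps stdout and stderr in separate files might look like this. It uses only the options already present in the command (all, -s, -K, -R, -o); build_soap_cmd and run_soap are hypothetical helper names.

```python
import subprocess
from typing import List


def build_soap_cmd(binary: str, config: str, kmer: int, prefix: str,
                   resolve_repeats: bool = True) -> List[str]:
    """Argument list for a one-step 'all' run, mirroring the shell command."""
    cmd = [binary, "all", "-s", config, "-K", str(kmer)]
    if resolve_repeats:
        cmd.append("-R")
    cmd += ["-o", prefix]
    return cmd


def run_soap(cmd: List[str], log_path: str, err_path: str) -> int:
    """Run the command with stdout -> log_path and stderr -> err_path."""
    with open(log_path, "w") as log, open(err_path, "w") as err:
        return subprocess.run(cmd, stdout=log, stderr=err).returncode
```

          Passing an argument list (rather than one shell string) avoids the quoting and redirection mistakes entirely.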

          Do you have any suggestions for improving my command? Also, I don't know whether my mate-pair reads are chimeric or not. How can I tell?

          Thanks a lot.


          • #6
            Someone in the thread I linked suggested setting reverse_seq=1 for the mate-pair library.

            Other than that, something I did for a similar issue, albeit a long while ago, was to align the 7 kb library reads to the draft genome you currently have and see what kind of insert size distribution you observe.
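            Once the 7 kb reads are aligned and written as SAM (the thread doesn't name an aligner, so that choice is an assumption), the insert size distribution can be read from the TLEN column. The sketch below counts each pair once via its read-1 record, skips secondary/supplementary alignments and pairs with an unmapped mate, and ignores pairs split across scaffolds because TLEN is 0 for those. It deliberately does not require the proper-pair flag, since chimeric pairs are exactly the ones an aligner tends to mark as improper. Function names and the 1000 bp bin width are illustrative choices.

```python
from collections import Counter
from typing import Iterable, List


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Absolute TLEN of each pair, counted once via its first-in-pair record."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rnext, tlen = fields[6], int(fields[8])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800)
        if not (flag & 0x40) or (flag & 0xC):
            continue  # not read 1, or read/mate unmapped (0x4 / 0x8)
        if rnext == "=" and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


def histogram(sizes: Iterable[int], bin_width: int = 1000) -> Counter:
    """Count insert sizes per bin; keys are the lower edge of each bin."""
    return Counter((size // bin_width) * bin_width for size in sizes)
```

            A clean 7 kb library should give one dominant peak near 7000; a large pile-up in the lowest bins is the signal described in the next paragraph.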

            If some reads are chimeric, you'll see mate-pair reads with insert sizes much smaller than 7 kb in the alignment results. You can then discard those and redo the assembly with the remaining reads to see whether scaffolding improves.
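            The discard step can be sketched as follows, assuming you already have (read name, insert size) pairs from the alignment, with names matching the SAM QNAME (FASTQ headers stripped of a trailing /1 or /2). The 2000 bp cutoff in the usage line is a hypothetical value, not a recommendation from the thread. All helpers are illustrative, and the sketch reads whole files into memory, so a real run on full-size libraries would need a streaming version.

```python
from typing import Iterable, List, Set, Tuple


def short_pair_names(pairs: Iterable[Tuple[str, int]], min_insert: int) -> Set[str]:
    """Names of read pairs whose observed insert size is below min_insert."""
    return {name for name, size in pairs if size < min_insert}


def fastq_records(lines: Iterable[str]) -> List[List[str]]:
    """Split FASTQ text into 4-line records (header, sequence, '+', quality)."""
    rows = [line.rstrip("\n") for line in lines]
    while rows and rows[-1] == "":
        rows.pop()  # ignore trailing blank lines
    if len(rows) % 4:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return [rows[i:i + 4] for i in range(0, len(rows), 4)]


def read_name(header: str) -> str:
    """'@name/1' or '@name 1:N:0:...' -> 'name' (SAM QNAME style)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                 discard: Set[str]) -> Tuple[List[str], List[str]]:
    """Drop pairs whose name is in `discard`, keeping R1 and R2 in sync."""
    recs1, recs2 = fastq_records(r1_lines), fastq_records(r2_lines)
    if len(recs1) != len(recs2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    kept1, kept2 = [], []
    for rec1, rec2 in zip(recs1, recs2):
        name = read_name(rec1[0])
        if name != read_name(rec2[0]):
            raise ValueError(f"R1/R2 out of sync at {name}")
        if name not in discard:
            kept1.extend(rec1)
            kept2.extend(rec2)
    return kept1, kept2
```

            For example, `filter_pairs(r1, r2, short_pair_names(named_sizes, min_insert=2000))` keeps only pairs at or above the cutoff; write the kept lines back out as the new q1/q2 files for the rank 2 library.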


            • #7
              I reversed the mate-pair reads using CLC before assembly, so I set reverse_seq=0.
              I also assembled with Velvet using multiple k-mers; there seem to be no Ns in the scaffolds, but the N50 is only 11199 bp. I don't know why. I will try aligning the mate-pair reads to the draft genome to check for chimeras. Thanks!
