  • bioman1
    Member
    • May 2012
    • 80

    Low-coverage genome assembly: suggestions

    Hello,

    We have sequenced the genome of a fruit crop on a HiSeq 2000. The data are Illumina paired-end FASTQ reads. The raw reads were filtered with Trimmomatic using default settings and checked with FastQC. After filtering, the total number of sequences dropped from 505956290 to 418812062, with 38% GC and a sequence length of 101. The data passed all FastQC tests, with warnings for per base sequence content and sequence duplication levels.

    My single filtered FASTQ file is 108 GB, and the genome size predicted by KmerGenie and SGA preqc is around 2 Gbp, so the coverage is below 20x. Which genome assembler works well at low coverage? How can I improve my genome assembly computationally? Please let me know your suggestions, along with pointers to any journal papers that succeeded in assembling a low-coverage plant genome.
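
As a rough sanity check on the coverage figure, fold coverage can be estimated as total sequenced bases divided by genome size. The sketch below uses the read count and read length quoted in the post and the ~2 Gbp k-mer based genome size estimate. It assumes every filtered read is still the full 101 bp, which will not hold if trimming shortened some reads, so the result is an upper bound. The function name `estimate_coverage` is only illustrative.

```python
def estimate_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return approximate fold coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Figures quoted in the post.
reads_after_filtering = 418_812_062
read_length = 101              # assumption: no read was shortened by trimming
genome_size = 2_000_000_000    # ~2 Gbp, the k-mer based estimate (approximate)

coverage = estimate_coverage(reads_after_filtering, read_length, genome_size)
print(f"Estimated coverage (upper bound): {coverage:.1f}x")
```

With these inputs the estimate lands at roughly 21x. Trimmed reads and any error in the genome size estimate would shift it, which fits the "around or below 20x" described in the thread.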
  • lorendarith

    #2
    Do you have only a single short-fragment library making up this 20x coverage, or is it the sum of several libraries?

    If you want to assemble genomes with short-read technologies, it is crucial to have several libraries and library types with different insert (and maybe also read) lengths.

    Is it not possible for you to sequence more, or do you really need to make something out of this 20x?


    • zatelmar
      Junior Member
      • Apr 2013
      • 1

      #3
      Hi - I would first consider running error correction on your reads (e.g. using Musket). Are your reads paired? That would be important for improving the assembly. SOAPdenovo could be a good starting point; you could also try ABySS or Velvet, which are also relatively easy to install and run, though Velvet could be quite memory-demanding for a dataset as big as yours. It is important to optimise the k-mer size; KmerGenie should have suggested one already. However, with the coverage you have, you cannot expect a really high N50. Hope this helps.
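
For comparing assemblies produced with different assemblers or k-mer sizes, N50 is the usual summary statistic: the contig length at which contigs of that length or longer together hold at least half of the assembled bases. A minimal sketch of that calculation is below. The function name `n50` and the example lengths are made up for illustration and are not taken from any of the tools mentioned.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

N50 says nothing about correctness, so a higher value from one k-mer setting is a hint about contiguity only, not proof of a better assembly.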


      • bioman1
        Member
        • May 2012
        • 80

        #4
        I have paired-end reads (read1.fastq, read2.fastq), which I interleaved into a single file, single.fastq. This file has 38% GC, a sequence length of 101, and about 20x coverage. I am not able to sequence more because of my boss's budget, so I would like to make something out of these reads for a publication. Any suggestions for producing a draft genome for publication?
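
For readers unsure what interleaving involves, here is a minimal sketch of one way to merge two paired FASTQ files into a single interleaved file. It is not necessarily how the poster did it. It assumes plain-text, four-line FASTQ records (no wrapped sequence lines) and that both files list mates in the same order. The helper names `read_records` and `interleave_fastq` are hypothetical.

```python
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Make sure the final line of the file also ends with a newline.
        yield [ln if ln.endswith("\n") else ln + "\n" for ln in record]


def interleave_fastq(r1_path, r2_path, out_path):
    """Write read1/read2 records alternately to out_path; return the pair count."""
    pairs = 0
    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
        mates = read_records(r2)
        for rec1 in read_records(r1):
            rec2 = next(mates, None)
            if rec2 is None:
                raise ValueError("read2 file has fewer records than read1")
            out.writelines(rec1)
            out.writelines(rec2)
            pairs += 1
        if next(mates, None) is not None:
            raise ValueError("read2 file has more records than read1")
    return pairs
```

A call such as `interleave_fastq("read1.fastq", "read2.fastq", "single.fastq")` would use the file names from the post. The record-count check matters because filtering that drops one mate of a pair leaves the two files out of step, and an interleaved file built from them would mispair reads.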
