  • Large K-mer Velvet

    Hi Folks,
    I am using Velvet to assemble a number of genes from 75 bp reads. One issue is that some of these genes are the result of duplications, where the parent and duplicate gene are very similar. Am I right in thinking that a high k-mer length will reduce the chances of an assembly error (smaller k-mers being merged as one contig despite coming from reads generated from duplicates)? I realize sequencing errors may be unavoidable; hopefully good coverage will help avoid these. If longer k-mers are better for duplicates, would it be better to generate longer reads?

  • #2
    Originally posted by NGS_user
    Hi Folks,
    I am using Velvet to assemble a number of genes where the reads are of 75bp length.
    Are the reads paired?

    Originally posted by NGS_user

    One issue is that some of these genes are the result of duplications, where the parent and duplicate gene are very similar. Am I right in thinking that a high k-mer length will reduce the chances of an assembly error (smaller k-mers being merged as one contig despite coming from reads generated from duplicates)?
    Surely this will account for some differences in assemblers using bubble merging or bubble popping approaches such as Velvet or ABySS.

    In general, increasing the k-mer length increases the uniqueness of k-mers in the resulting graph.
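
To make this concrete, here is a minimal sketch that counts the distinct k-mers shared by two near-identical sequences at several values of k. The two toy sequences (differing at one base) and the `shared_kmers` helper are made up for illustration; they stand in for a parent gene and its duplicate and are not part of Velvet, ABySS, or Ray.

```shell
# Count the distinct k-mers that occur in both sequences.
# usage: shared_kmers SEQ_A SEQ_B K
shared_kmers() {
    awk -v a="$1" -v b="$2" -v k="$3" 'BEGIN {
        # Record every k-mer of the first sequence.
        for (i = 1; i + k - 1 <= length(a); i++) seen[substr(a, i, k)] = 1
        n = 0
        # Count each k-mer of the second sequence that was seen, once only.
        for (i = 1; i + k - 1 <= length(b); i++) {
            s = substr(b, i, k)
            if (s in seen) { n++; delete seen[s] }
        }
        print n
    }'
}

parent="AACCGGTTAC"      # hypothetical parent gene fragment
duplicate="AACCGCTTAC"   # same fragment with a single G->C difference

for k in 2 4 6; do
    echo "k=$k shared=$(shared_kmers "$parent" "$duplicate" "$k")"
done
```

With these toy inputs the shared count drops from 6 at k=2 to 3 at k=4 and 0 at k=6: once k is large enough that every k-mer spans a difference, the two copies no longer share any vertex in the graph.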

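    Two things limit how large the k-mer length can be. The first is obviously the read length: a k-mer cannot be longer than the read it comes from. The second is the error rate: the longer the k-mer, the more likely it is to contain at least one sequencing error.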
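
Both limits can be put into rough numbers. The sketch below assumes independent per-base errors, so a k-mer is error-free with probability (1 - e)^k, and a read of length L contains L - k + 1 k-mers. The 1% error rate and the helper names are assumptions chosen for illustration, not measured values for any instrument.

```shell
# Number of k-mers in one read of length L: L - k + 1 (0 if k > L).
# usage: kmers_per_read READ_LENGTH K
kmers_per_read() {
    if [ "$2" -gt "$1" ]; then
        echo 0
    else
        echo $(( $1 - $2 + 1 ))
    fi
}

# Probability that a k-mer contains no sequencing error,
# assuming independent per-base errors at rate e: (1 - e)^k.
# usage: error_free_fraction ERROR_RATE K
error_free_fraction() {
    awk -v e="$1" -v k="$2" 'BEGIN { printf "%.3f\n", (1 - e) ^ k }'
}

# 75 bp reads, as in the question; 0.01 is an assumed per-base error rate.
for k in 21 31 51 71; do
    echo "k=$k kmers_per_read=$(kmers_per_read 75 "$k") error_free=$(error_free_fraction 0.01 "$k")"
done
```

As k approaches the read length, each read contributes fewer k-mers and a smaller share of them are error-free, so the effective k-mer coverage falls on both counts.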


    Originally posted by NGS_user
    I realize sequencing errors may be unavoidable; hopefully good coverage will help avoid these.
    If sequencing errors occur randomly, they won't stack and therefore can be weeded out to some extent. Different assemblers will do that in different manners.

    For example, in Ray (see http://denovoassembler.sf.net; I am the author), these errors are just avoided, but are not removed from the graph.
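
One generic way to picture why non-stacking errors can be weeded out is a k-mer count threshold: a k-mer created by a random error tends to appear once, while a true k-mer appears roughly as often as the coverage. The sketch below is only that generic filter, not what Ray or Velvet actually implements; the `solid_kmers` name, the toy reads, and the threshold of 2 are assumptions for illustration.

```shell
# Print k-mers seen at least MIN_COUNT times, with their counts.
# usage: solid_kmers K MIN_COUNT < reads.txt   (one read sequence per line)
solid_kmers() {
    awk -v k="$1" -v min="$2" '
        # Tally every k-mer of every read.
        { for (i = 1; i + k - 1 <= length($0); i++) count[substr($0, i, k)]++ }
        # Keep only k-mers that reach the threshold.
        END { for (s in count) if (count[s] >= min) print s, count[s] }
    ' | LC_ALL=C sort
}

# Three toy reads; the last one ends in an error (A instead of T).
printf 'ACGT\nACGT\nACGA\n' | solid_kmers 3 2
```

Here the erroneous k-mer CGA occurs once and is dropped, while ACG and CGT pass the threshold.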


    Originally posted by NGS_user
    If longer k-mers are better for duplicates, would it be better to generate longer reads?
    Longer reads are always better, provided the throughput scales as well.

    This is one of the goals that Pacific Biosciences aims to achieve -- longer reads.


    Maybe you can try Ray on your dataset. Ray does not merge similar paths in the assembly process so that might help.


    seb
    Last edited by seb567; 05-31-2011, 09:29 AM. Reason: fixed link



    • #3
      The reads are single-end, but if I generate new data I could have paired-end reads of either 100 or 150 bp (GAII). I am just concerned that the high error rate will affect my assemblies, as I am not assembling a genome but rather a family of mammalian genes.



      • #4
        Originally posted by NGS_user
        The reads are single-end, but if I generate new data I could have paired-end reads of either 100 or 150 bp (GAII). I am just concerned that the high error rate will affect my assemblies, as I am not assembling a genome but rather a family of mammalian genes.
        Perhaps you could first perform simulations on those genes (if they are known) or on closely-related or similar genes.

        You can do that with Ray right away.

        First, you need these packages (available in all GNU/Linux distros):

        make
        g++
        open-mpi
        git (to get the development version of Ray)
        boost (to compile the read simulator shipped with Ray)


        What follows is the workflow you could use.

        Install Ray and VirtualNextGenSequencer

        Code:
        git clone git@github.com:sebhtml/ray.git
        cd ray
        make PREFIX=build MAXKMERLENGTH=128 VIRTUAL_SEQUENCER=y
        make install

        Sequence your genes in silico


        Code:
        N=600000 #number of pairs of reads
        readLength=75
        errorRate=0.005 # 0.5%
        ref=~/nuccore/genes.fasta
        mean=400 # average insert size
        sd=40 # standard deviation
        
        ./build/VirtualNextGenSequencer $ref $errorRate \
        $mean $sd $N $readLength L1_1.fasta L1_2.fasta

        Build an assembly

        Code:
        mpirun -np 64 ./build/Ray -k 70 -p L1_1.fasta L1_2.fasta \
         -o GeneBuild
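
Before running the simulation it can help to check what depth of coverage the chosen number of pairs implies. The sketch below computes the total reference length from a FASTA file and the mean depth as 2 x pairs x read length / reference length. The helper names and the 1 Mb reference length in the usage line are assumptions for illustration; substitute the real length of your genes file.

```shell
# Total number of bases in a FASTA file (header lines starting with ">" are skipped).
# usage: fasta_length FILE
fasta_length() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Mean depth of coverage for paired reads: 2 * pairs * read length / reference length.
# usage: mean_depth PAIRS READ_LENGTH REF_LENGTH
mean_depth() {
    awk -v n="$1" -v len="$2" -v ref="$3" 'BEGIN { printf "%.1f\n", 2 * n * len / ref }'
}

# 600000 pairs of 75 bp reads, as above; 1000000 is an assumed reference length.
mean_depth 600000 75 1000000
```

With a real reference, the last argument can be taken from the file itself, for example `mean_depth 600000 75 "$(fasta_length ~/nuccore/genes.fasta)"`. If the resulting depth is far higher than needed, N can be lowered to shorten the run.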
