  • Increasing contig lengths

    I'm working on a project to identify viruses infecting grapevines by sequence.

    I have single-end Illumina reads (50 bp) and have been trying to assemble them with a combination of Velvet and PRICE. With Velvet alone I get a maximum contig length of around 1500 bp and an N50 of 46. Running that output through PRICE raises the N50 to 195, but I have not been able to extend the contigs any further. Do you have any advice on contig extension with single-end reads?

  • #2
    Have you tried varying the kmer length when assembling? Also, it would be helpful to know more about your data, like the read length and total amount, and quality metrics.

    I encourage you to read this thread:
    http://seqanswers.com/forums/showthread.php?t=42555


    • #3
      ... and the amount of contaminating sequences?

      Originally posted by Brian Bushnell:
      Have you tried varying the kmer length when assembling? Also, it would be helpful to know more about your data, like the read length and total amount, and quality metrics.

      I encourage you to read this thread:
      http://seqanswers.com/forums/showthread.php?t=42555


      • #4
        Thanks for the replies.

        I used VelvetOptimiser to determine optimal k-mer length. Our data contains a mixture of grape and virus reads, but we removed the reads that aligned to the grape reference genome. Our read length is 50 bp and we have 7,764,190 reads after filtering out the grape reads.

        Here is the quast output from the optimal velvet run:

        All statistics are based on contigs of size >= 100 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

        Assembly                     contigs
        # contigs (>= 0 bp)          3547
        # contigs (>= 1000 bp)       1
        Total length (>= 0 bp)       326445
        Total length (>= 1000 bp)    1073
        # contigs                    941
        Largest contig               1073
        Total length                 156559
        GC (%)                       46.84
        N50                          162
        N75                          122
        L50                          305
        L75                          584
        # N's per 100 kbp            0.00
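
For anyone reading the table above without a QUAST background, here is a minimal sketch of how N50/L50 (and N75/L75) are usually defined. The function name `nx_lx` and the toy contig lengths are made up for illustration; this is not QUAST's own code, and QUAST applies its minimum-contig-length filter before computing these values.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running total, taken from
    the longest contig downwards, first reaches `fraction` of the total
    assembly length. Lx is how many contigs were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs given")


# Hypothetical contig lengths, not taken from the assembly in this thread.
toy_lengths = [100, 200, 300, 400, 500]
n50, l50 = nx_lx(toy_lengths)
n75, l75 = nx_lx(toy_lengths, fraction=0.75)
print(n50, l50, n75, l75)
```

A low N50 together with a high L50, as in the Velvet run, means most of the assembled bases sit in many short contigs.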


        • #5
          Are you trying to assemble genomic data or transcriptomic data?

          What is the expected genome size of the virus genome you are trying to assemble?

          What kmer length have you used?

          As Brian already mentioned above, I would try a range of k-mer lengths
          with Velvet to see which one gives you the best N50.

          Have you done any QC, adapter trimming or quality trimming on your reads?
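
A k-mer sweep like the one suggested here can be scripted. The sketch below only prints the Velvet command lines for each odd k below the read length; it does not run anything. The file name `reads.fa`, the output prefix `velvet`, and the starting value of 19 are assumptions for illustration, and the follow-up step of scoring each run (for example with QUAST) is left out.

```python
import shlex


def odd_kmers(read_length, min_k=19):
    """Odd k values from min_k up to, but not including, read_length."""
    start = min_k if min_k % 2 == 1 else min_k + 1
    return list(range(start, read_length, 2))


def velvet_commands(reads_fasta, out_prefix, k):
    """Build velveth/velvetg argument lists for one k (single-end FASTA reads)."""
    out_dir = f"{out_prefix}_k{k}"
    return [
        ["velveth", out_dir, str(k), "-fasta", "-short", reads_fasta],
        ["velvetg", out_dir],
    ]


# Assumed inputs: 50 bp reads in a hypothetical file called reads.fa.
# Note: a stock Velvet build may cap k at 31 unless it was compiled with a
# larger MAXKMERLENGTH, so check your build before going higher.
for k in odd_kmers(50):
    for cmd in velvet_commands("reads.fa", "velvet", k):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to inspect the sweep before committing compute time; each line could then be run via `subprocess.run` or pasted into a shell script.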


          • #6
            Do you expect the viral genome to be divergent within a sample because of replication errors? That could complicate assembly if there are many related k-mers at a location, rather than just one or two alleles plus a low level of sequencing error.
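
To make the "related k-mers" point concrete, here is a small sketch with two made-up sequences that differ at a single base. Every k-mer overlapping that position differs between the two, so one substitution adds up to k private k-mers per variant to the assembly graph. The sequences and the choice of k = 7 are purely illustrative.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def private_kmers(a, b, k):
    """K-mers found only in a, and k-mers found only in b."""
    ka, kb = kmers(a, k), kmers(b, k)
    return ka - kb, kb - ka


# Made-up sequences: `variant` differs from `reference` at index 15 (C -> T).
reference = "ACGTTGCAAGGCTTACGATCGGATCCATGCA"
variant = reference[:15] + "T" + reference[16:]

k = 7
only_ref, only_var = private_kmers(reference, variant, k)
# Each set holds at most k k-mers, one per window covering the changed base.
print(len(only_ref), len(only_var))
```

With many co-existing variants in a viral population, these private k-mers multiply and form bubbles and branches that an assembler has to resolve, which tends to fragment contigs.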


            • #7
              I was able to assemble my data using IDBA_UD. I set it to cycle through k-mer lengths shorter than my read length, and it produced a contig of about 15,000 bp:

              idba_ud -r ../data/trimmed-reads/LV89-02.fa -o ../results/contigs/008 --mink 19 --maxk 49 --step 2


              • #8
                Could you post the QUAST quality stats for the assembly?


                • #9
                  All statistics are based on contigs of size >= 100 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

                  Assembly                     contig
                  # contigs (>= 0 bp)          295
                  # contigs (>= 1000 bp)       37
                  Total length (>= 0 bp)       199556
                  Total length (>= 1000 bp)    86293
                  # contigs                    295
                  Largest contig               15124
                  Total length                 199556
                  GC (%)                       46.27
                  N50                          759
                  N75                          426
                  L50                          53
                  L75                          141
                  # N's per 100 kbp            0.00
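
One quick way to compare the two QUAST reports in this thread is the share of assembled bases that sit in contigs of at least 1000 bp. The sketch below uses the numbers posted above (the "Total length" row as denominator, which counts contigs >= 100 bp); the dictionary layout and the helper name are just illustrative. It works out to roughly 0.7% for the Velvet run versus about 43% for IDBA_UD.

```python
# Values copied from the two QUAST reports posted in this thread.
reports = {
    "velvet": {"total_bp": 156559, "long_bp": 1073, "n50": 162},
    "idba_ud": {"total_bp": 199556, "long_bp": 86293, "n50": 759},
}


def long_contig_fraction(report):
    """Fraction of assembled bases in contigs of at least 1000 bp."""
    return report["long_bp"] / report["total_bp"]


for name, report in reports.items():
    print(f"{name}: {long_contig_fraction(report):.1%} of bases in contigs >= 1000 bp")

# Ratio of the two N50 values (IDBA_UD relative to Velvet).
n50_ratio = reports["idba_ud"]["n50"] / reports["velvet"]["n50"]
print(f"N50 ratio: {n50_ratio:.1f}x")
```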