  • Puzzling result from Illumina 150bp PE reads

    Hi all,

    We are using Illumina 150bp paired-end reads to perform de novo assembly for a bacterial genome (~5Mb). Our procedure goes like this:
    1. merge the paired-end reads into a single file
    2. trim the reads using Q20 as the cutoff (i.e., remove all positions following the first low quality base)
    3. discard reads that are <70bp after trimming
    4. separate the reads into two files, one for paired-end reads and one for single-end reads (i.e., one of the PE reads was removed in the previous step)
    5. feed the two files to Velvet (v1.1.02), test all possible k-mer values, and pick the one that produces the best N50/max
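Steps 2 to 4 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used in the thread: qualities are assumed to be Phred+33 encoded (older Illumina pipelines used Phred+64, so `offset` may need changing), the first low-quality base is assumed to be removed along with everything after it, and the names `trim_at_first_low_quality` and `classify_pair` are made up for this example.

```python
from typing import List, Tuple

# A read is (sequence, quality string); the quality encoding is an assumption.
Read = Tuple[str, str]


def trim_at_first_low_quality(seq: str, qual: str,
                              min_q: int = 20, offset: int = 33) -> Read:
    """Cut the read at the first base whose Phred score is below min_q.

    Assumption: the low-quality base itself is dropped too.
    """
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


def classify_pair(r1: Read, r2: Read,
                  min_len: int = 70) -> Tuple[str, List[Read]]:
    """Trim both mates, then decide which output file the survivors go to.

    Returns ("paired", [mate1, mate2]), ("single", [survivor]) or
    ("dropped", []).
    """
    t1 = trim_at_first_low_quality(*r1)
    t2 = trim_at_first_low_quality(*r2)
    keep1 = len(t1[0]) >= min_len
    keep2 = len(t2[0]) >= min_len
    if keep1 and keep2:
        return "paired", [t1, t2]
    if keep1:
        return "single", [t1]
    if keep2:
        return "single", [t2]
    return "dropped", []
```

Reads labelled "paired" would go to the paired-end file and reads labelled "single" to the single-end file before being passed to the assembler.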

    The initial result looks reasonably good. However, when we tried to simulate the effect of using shorter reads by first trimming all reads to 100bp, we found that the assembly actually became much better! The N50 increased from ~175kb to ~341kb and the max increased from ~512kb to ~937kb (the total genome size and the number of reads used didn't change much). Blastn confirmed that the improvement comes from the merging of contigs.
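For readers unfamiliar with the two statistics compared above, here is a minimal sketch of how N50 and the maximum contig length can be computed from a list of contig lengths. It uses the common definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the summed length); the assembler's own report may count lengths differently, so treat this as an assumption rather than a reproduction of its output.

```python
from typing import List, Tuple


def n50_and_max(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, longest contig) for a non-empty list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length covers half the total.
        if 2 * running >= total:
            return length, ordered[0]
    raise AssertionError("unreachable for positive lengths")
```

When contigs merge, the total stays roughly the same while fewer, longer contigs cover half of it, which is why both numbers can rise together.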

    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Last edited by chkuo; 06-06-2011, 10:54 PM. Reason: typo

  • #2
    Originally posted by chkuo
    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Adapters perhaps? Unless you're very strict with the size separation, it's easy to have some fraction of the library with <150 base fragments (especially if you're making a <300bp library). When you sequence these short fragments, you read into the adapters on the 'other' end of the read.

    You're right that longer reads (if they are correct) should help in general.

    • #3
      Originally posted by tonybolger
      Adapters perhaps?
      Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.

      • #4
        Originally posted by chkuo
        Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.
        Lies, damned lies and library length statistics

        I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.

        • #5
          Originally posted by tonybolger
          I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.
          Any pointer on a quick and easy way to check for adapters? Many thanks!

          • #6
            Originally posted by chkuo
            Any pointer on a quick and easy way to check for adapters? Many thanks!
            Have you tried the FASTX-Toolkit? http://hannonlab.cshl.edu/fastx_toolkit/
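Independently of any particular toolkit, a very rough first check for adapter read-through is to count how many reads contain the start of the adapter and where it begins. The sketch below is an assumption-laden illustration, not how any existing tool works: it uses exact matching only (so reads with sequencing errors in the adapter are missed), the function name `adapter_seed_hits` is made up, and the adapter sequence must come from your sequencing facility or kit documentation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def adapter_seed_hits(reads: Iterable[str], adapter: str,
                      seed_len: int = 12) -> Tuple[float, Dict[int, int]]:
    """Return (fraction of reads containing the adapter seed,
    histogram of the position where the seed starts).

    The seed is the first seed_len bases of the adapter; exact match only.
    """
    if len(adapter) < seed_len:
        raise ValueError("adapter is shorter than seed_len")
    seed = adapter[:seed_len]
    positions: Counter = Counter()
    n_reads = 0
    for read in reads:
        n_reads += 1
        pos = read.find(seed)  # -1 when the seed is absent
        if pos >= 0:
            positions[pos] += 1
    if n_reads == 0:
        return 0.0, {}
    return sum(positions.values()) / n_reads, dict(positions)
```

If a noticeable fraction of reads contains the seed, and the start positions are spread over the read rather than piled up at a single offset, that is consistent with short fragments being read through into the adapter.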

            • #7
              Please keep us updated on whether adapter removal solved the problem!

              • #8
                I will need to talk with the sequencing facility to find out the adapter sequence before I can do the trimming. In the meantime, I've tried different length cutoffs for the trimming, and 100bp performed better than the longer ones.

                • #9
                  Originally posted by chkuo
                  Any pointer on a quick and easy way to check for adapters? Many thanks!
                  I've created a tool, Trimmomatic, that does the various pre-processing steps for Illumina data - you can find it here.

                  You'll need to make a fasta file of all the adapter sequences though - we're not allowed to distribute them, which is rather annoying. If you're having problems, email me at the link on the Trimmomatic page.

                  BTW, you can find links to the adapter sequences in a sticky on the Illumina board.
