  • Ancient DNA adaptor removal and read merging

    Hi everyone,

    I've seen a few people doing ancient DNA work on this forum and I thought they might be able to help me out on a Bioinformatics question related to the analysis of an ancient DNA sample sequenced using Illumina HiSeq. Ive been following a protocol written by Kircher et al. (http://www.ncbi.nlm.nih.gov/pubmed/22237537) regarding the merging of paired end sequence data for small insert libraries. I have a library insert size of between 180-300bp and I'm working with read lengths of 100bp. Is this protocol appropriate for this project?

    Alternatively, is it easier to just remove adaptors, quality trim, and do single-read mapping rather than merging?

    Cheers

  • #2
    I have not read that paper, but could you describe the wet lab steps in more detail, say how many lanes of data you have, and explain what your reservations are about following what is done in that paper?

    • #3
      It's a protocols paper, so nothing biologically relevant. I was asking for advice on whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

      I only have one lane, about 158 million pairs. The wet lab steps aren't my area, sorry.


      • #4
        Originally posted by jimmybee:
        It's a protocols paper, so nothing biologically relevant. I was asking for advice on whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

        I only have one lane, about 158 million pairs. The wet lab steps aren't my area, sorry.

        Oh, I see, you mean merging the two paired ends, if they overlap, to yield one single-end read? I thought you meant merging multiple lanes of data. I'll look at that paper when I get to work and can access it, but my gut instinct is that since some of your insert sizes are >200 bp, there is no point in merging; I'd rather align all of the data identically. I'll glance at that paper in a couple of hours.
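
For anyone following the arithmetic behind that gut instinct, here is a minimal sketch. It assumes "insert size" means the fragment length without adaptors, and that both reads have the same length. The function names and the `min_overlap` threshold are hypothetical choices for illustration, not values taken from the Kircher et al. protocol.

```python
def expected_overlap(read_len: int, insert_size: int) -> int:
    """Number of bases covered by both reads of a pair.

    Positive: the reads overlap by that many bases.
    Zero or negative: the reads do not overlap (a gap of -value bases).
    """
    return 2 * read_len - insert_size


def classify(read_len: int, insert_size: int, min_overlap: int = 11) -> str:
    """Rough category for one fragment (min_overlap is an assumed threshold)."""
    if insert_size < read_len:
        # The fragment is shorter than a read, so each read runs into adaptor.
        return "read-through"
    if expected_overlap(read_len, insert_size) >= min_overlap:
        return "mergeable"
    return "not mergeable"


if __name__ == "__main__":
    # The numbers from this thread: 100 bp reads, inserts of 180 to 300 bp.
    for insert in (180, 200, 250, 300):
        print(insert, expected_overlap(100, insert), classify(100, insert))
```

With 100 bp reads, only fragments shorter than 200 bp give overlapping pairs, so the upper part of a 180 to 300 bp insert range cannot be merged and would have to be mapped as ordinary pairs.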


        • #5
          So glancing through the paper I think it's probably fine to follow it. If I had more time I'd read it in detail so perhaps somebody else will chime in.


          • #6
            Thanks mate. Looks like a good way to go, especially considering my lack of experience with QC of small-insert libraries in paired-end sequencing.


            • #7
              I actually wrote something similar for my master's project, although it wasn't as intelligent as this in choosing the most likely overlap. Alas, I don't have the final code (it's buried on a UCL fileserver that I can no longer access). However, it was based on this: http://almlab.mit.edu/vibrioGenomes/SHERA_temp/ which might be worth a look.


              • #8
                OK, no problem, I'll have a look at it. All good ideas.


                • #9
                  Check out pandaseq, works pretty well:

                  Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

                  Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

                  Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
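
To make the "naïve assembly" baseline mentioned in that abstract concrete, here is a rough sketch of overlap merging without quality scores. This is not PANDAseq's algorithm (which uses base qualities to correct mismatches); the function names, `min_overlap`, and `max_mismatch_frac` are hypothetical and the defaults are arbitrary. It also assumes the fragment is at least as long as a read, so adaptor read-through is not handled.

```python
# Translation table for complementing A/C/G/T/N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def naive_merge(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one sequence, or return None if no overlap is found.

    The reverse read is reverse-complemented, then the longest suffix of the
    forward read that matches a prefix of it (within the mismatch tolerance)
    is taken as the overlap. Mismatches are resolved in favour of the forward
    read, which is a simplification.
    """
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    # Try the longest candidate overlap first, down to min_overlap.
    for olen in range(longest, min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            return fwd + rc[olen:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATCGTT"      # toy 30 bp fragment
    fwd = fragment[:20]                              # forward read, 20 bp
    rev = reverse_complement(fragment[10:])          # reverse read, 20 bp
    print(naive_merge(fwd, rev, min_overlap=5, max_mismatch_frac=0.0))
```

Taking the longest acceptable overlap first is a design shortcut; on low-complexity sequence it can pick a spurious overlap, which is one reason the dedicated tools score candidate overlaps instead.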
