  • Ancient DNA adaptor removal and read merging

    Hi everyone,

    I've seen a few people doing ancient DNA work on this forum, and I thought they might be able to help me with a bioinformatics question about analysing an ancient DNA sample sequenced on an Illumina HiSeq. I've been following a protocol by Kircher et al. (http://www.ncbi.nlm.nih.gov/pubmed/22237537) for merging paired-end sequence data from small-insert libraries. My library insert size is between 180 and 300 bp, and I'm working with 100 bp reads. Is this protocol appropriate for this project?

    Alternatively, is it just easier to remove adaptors, quality-trim, and do single-read mapping rather than merging?

    Cheers

  • #2
    I have not read that paper, but could you describe the wet-lab steps in more detail, say how many lanes of data you have, and explain what your reservations are about following what is done in that paper?


    • #3
      It's a protocols paper, so nothing biologically relevant. I was asking whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

      I only have one lane, about 158 million pairs. Wet-lab steps aren't my area, sorry.


      • #4
        Originally posted by jimmybee:
        It's a protocols paper, so nothing biologically relevant. I was asking whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

        I only have one lane, about 158 million pairs. Wet-lab steps aren't my area, sorry.

        Oh, I see, you mean merging the two paired ends, if they overlap, to yield one single-end read? I thought you meant merging multiple lanes of data. I'll look at that paper when I get to work and can access it, but my gut instinct is that since some of your insert sizes are >200 bp, where two 100 bp reads cannot overlap, there is no point in merging; I'd rather align all of the data identically. But I'll glance at that paper in a couple of hours.
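
A quick back-of-the-envelope check of the overlap argument above. This is only a sketch: it assumes "insert size" means the sequenced fragment length without adaptors (if the 180-300 bp figure includes adaptors, the real inserts are shorter and more pairs will overlap), and the function name and the 100 bp default are illustrative, not taken from the Kircher et al. protocol.

```python
def expected_overlap(insert_size: int, read_length: int = 100) -> int:
    """Number of bases covered by both reads of a pair.

    A negative value is the size of the unsequenced gap between the reads.
    If the insert is shorter than a read, the whole insert is covered twice
    (and both reads run on into adaptor sequence).
    """
    return min(insert_size, 2 * read_length - insert_size)


if __name__ == "__main__":
    for insert in (180, 200, 300):
        print(insert, expected_overlap(insert))
    # 180 bp inserts share 20 bases, 200 bp inserts share none,
    # and 300 bp inserts leave a 100 bp gap between the reads.
```

Under that assumption, only the shortest inserts in a 180-300 bp library would be mergeable at all, and the 20-base overlap at 180 bp is at the short end of what overlap-based mergers are usually asked to handle.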


        • #5
          So glancing through the paper I think it's probably fine to follow it. If I had more time I'd read it in detail so perhaps somebody else will chime in.


          • #6
            Thanks, mate. Looks like a good way to go, especially considering my lack of experience in the QC of small-insert libraries in paired-end sequencing.


            • #7
              I actually wrote something similar for my master's project; unfortunately, it wasn't as intelligent as this in choosing the most likely overlap. Alas, I no longer have the final code (it's buried on a UCL fileserver that I can no longer access). However, it was based on this: http://almlab.mit.edu/vibrioGenomes/SHERA_temp/ which might be worth a look.


              • #8
                OK, no problem, I'll have a look at it. All good ideas.


                • #9
                  Check out PANDAseq; it works pretty well:

                  Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

                  Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

                  Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
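
For anyone wondering what "assembling overlapping paired-end reads" boils down to, here is a deliberately naive sketch. It is not PANDAseq's, SHERA's, or the Kircher et al. algorithm: it simply reverse-complements read 2, takes the longest suffix/prefix overlap that stays under a mismatch threshold, and keeps the higher-quality base at each overlapping position. The function names, `min_overlap=11`, and `max_mismatch_frac=0.1` are illustrative assumptions, and it assumes adaptors are already trimmed and the insert is at least as long as a read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual,
               min_overlap=11, max_mismatch_frac=0.1):
    """Merge one read pair; return (sequence, quality) or None.

    Quality strings are compared character by character, which is valid
    as long as both reads use the same Phred offset.
    """
    rev_rc = reverse_complement(rev)
    rev_rc_qual = rev_qual[::-1]

    # Try the longest candidate overlap first: suffix of the forward read
    # against the prefix of the reverse-complemented reverse read.
    for olen in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        a, b = fwd[-olen:], rev_rc[:olen]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        if mismatches > max_mismatch_frac * olen:
            continue

        # Build the consensus over the overlap, preferring the base
        # with the higher quality score (ties go to the forward read).
        aq, bq = fwd_qual[-olen:], rev_rc_qual[:olen]
        bases, quals = [], []
        for x, qx, y, qy in zip(a, aq, b, bq):
            if qx >= qy:
                bases.append(x)
                quals.append(qx)
            else:
                bases.append(y)
                quals.append(qy)

        seq = fwd[:-olen] + "".join(bases) + rev_rc[olen:]
        qual = fwd_qual[:-olen] + "".join(quals) + rev_rc_qual[olen:]
        return seq, qual

    return None  # no acceptable overlap: keep the pair unmerged
```

Real mergers score every candidate overlap statistically rather than taking the first acceptable one, which matters for short overlaps and for damaged ancient DNA, so treat this only as a way to see the moving parts.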
