  • Trimming Nextera XT Sequence Data

    I've got some MiSeq data from Nextera XT prepped libraries. I figure that it will be necessary to trim adapters and/or transposase sequences from the data and I'm hoping someone can assist me. I'd like to use Scythe for trimming.

    Scythe requires an adapters/contaminants file (FASTA format) as input, and I'm confused as to how to construct this file. Illumina provides the following information on the sequences:

    Nextera® transposase sequences
      (a) Read 1 -->         5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
      (d) Read 2 -->         5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
    Nextera® Index Kit - PCR primers
      (c) i5 Index read -->  5’ AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC
      (b) <-- i7 Index read  5’ CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG

    Could I simply use the following adapter FASTA file content? (I've combined the adapter and transposase sequences into a single string using the overlapping region.) What do I replace the [i5] and [i7] barcode tags with?

    >FWD_Adapter
    AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
    >REV_Adapter
    CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
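A minimal Python sketch of the merge described above: each index primer is joined to its transposase sequence on the longest exact suffix/prefix overlap. The `[i5]`/`[i7]` tags are carried through as literal text and would still have to be replaced with the real index sequence for each sample. The function name `merge_on_overlap` and the 10-base minimum overlap are illustrative choices, not part of any tool.

```python
def merge_on_overlap(left: str, right: str, min_overlap: int = 10) -> str:
    """Join two sequences on the longest suffix of `left` that equals a prefix of `right`."""
    # Try the longest possible overlap first so the merge is as tight as possible.
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return left + right[n:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


# Sequences as listed by Illumina in the question above.
I5_PRIMER = "AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC"
I7_PRIMER = "CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG"
READ1_TRANSPOSASE = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
READ2_TRANSPOSASE = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

records = {
    "FWD_Adapter": merge_on_overlap(I5_PRIMER, READ1_TRANSPOSASE),
    "REV_Adapter": merge_on_overlap(I7_PRIMER, READ2_TRANSPOSASE),
}

if __name__ == "__main__":
    # Print the merged sequences as FASTA records.
    for name, seq in records.items():
        print(f">{name}\n{seq}")
```

The overlaps found are 14 bases (TCGTCGGCAGCGTC) and 15 bases (GTCTCGTGGGCTCGG), and the printed records match the two sequences in the question.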

    Thanks for any help you can provide!

  • #2
    A list of all Nextera adapter sequences, complete with barcodes, is packaged with BBTools in FASTA format, in the resources directory (nextera.fa.gz).
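For a quick look at what a gzipped adapter file like this contains, a small Python reader is enough. The record names inside nextera.fa.gz are not described here, so the sketch simply returns whatever headers it finds; `read_fasta_gz` is a hypothetical helper, not part of BBTools.

```python
import gzip
from typing import Dict, List, Optional


def read_fasta_gz(path: str) -> Dict[str, str]:
    """Parse a gzip-compressed FASTA file into {header: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as handle:  # "rt" decompresses and decodes as text
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record.
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

Usage would be something like `read_fasta_gz("bbmap/resources/nextera.fa.gz")`, where the `bbmap/` install directory is an assumption to adjust to your own setup.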

    I recommend using BBDuk rather than Scythe, but of course, I'm biased. Anyway, the adapter sequences will work with any adapter-trimming program.



    • #3
      Very helpful. Thanks Brian!

      Another question. From reading this forum and elsewhere, I've noticed that many people do not use the entire adapter/barcode/whatever-contaminant sequence when trimming. Often they put just the first 8, 10, or 12 bases into their contaminant file. Is there an advantage or disadvantage to using a truncated sequence rather than the full sequence(s)? Do most trimmers simply look for the specified sequence and trim it and everything after it?
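To illustrate the "look for the specified sequence and trim it and everything after" behaviour asked about, here is a naive Python sketch; it is not how Scythe, BBDuk or any particular trimmer works. The probe is the first `probe_len` bases of CTGTCTCTTATACACATCT, the reverse complement of the 19-bp tail shared by both transposase sequences above (AGATGTGTATAAGAGACAG), which is what typically appears at the 3' end of a read whose insert is shorter than the read. The insert sequence is made up for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_at_probe(read: str, adapter: str, probe_len: int, max_mismatches: int = 0) -> str:
    """Cut the read at the first full-length match of the adapter's first `probe_len` bases.

    Everything from the match onward is discarded. Partial matches at the very
    end of the read (fewer than `probe_len` bases left) are not detected.
    """
    probe = adapter[:probe_len]
    for start in range(len(read) - probe_len + 1):
        if hamming(read[start:start + probe_len], probe) <= max_mismatches:
            return read[:start]
    return read


NEXTERA_READTHROUGH = "CTGTCTCTTATACACATCT"  # reverse complement of AGATGTGTATAAGAGACAG
insert = "ACGTTGCAAGGCTTACCGAT"  # made-up 20-bp insert for illustration
read = insert + NEXTERA_READTHROUGH
print(trim_at_probe(read, NEXTERA_READTHROUGH, probe_len=8))  # ACGTTGCAAGGCTTACCGAT
```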

      EDIT: middle of the night grammar



      • #4
        I am not really sure what most trimmers do, but when looking for a full-sequence match in the presence of sequencing errors, you will trim more adapters with an 8bp sequence than with a 12bp sequence. Of course, you will also incur more false positives.
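The trade-off can be put into rough numbers with a back-of-the-envelope Python calculation, assuming independent per-base sequencing errors at rate `e` (1% here is an arbitrary example value) and uniformly random genomic sequence. An exact n-bp probe finds a real adapter only if all n bases were read correctly, probability (1 - e)^n, while a random genomic position matches by chance with probability 4^-n.

```python
def p_adapter_detected(n: int, error_rate: float) -> float:
    """Chance that an n-bp adapter probe is read with no errors (exact match required)."""
    return (1.0 - error_rate) ** n


def p_spurious_match(n: int) -> float:
    """Chance that a random genomic n-mer equals the probe exactly."""
    return 4.0 ** -n


for n in (8, 12):
    print(f"{n:2d} bp: detected {p_adapter_detected(n, 0.01):.4f}, "
          f"spurious per position {p_spurious_match(n):.2e}")
```

Under these assumptions the 8bp probe catches roughly 92% of adapters versus roughly 89% for 12bp, but its chance-match rate is 4^4 = 256 times higher.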

        Some people trim adapter matches as short as 1bp, allowing up to 1 mismatch. That will, of course, shorten every read by at least 1 base... and I think that's a bad approach unless a single adapter base is devastating, in which case I think it's better to reorganize your experiment so that a single adapter base will not be devastating.
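Why a 1bp minimum with 1 allowed mismatch shortens every read can be seen in this Python sketch of terminal matching: it compares the last n bases of the read with the first n bases of the adapter, from the longest possible overlap down to `min_overlap`. With `min_overlap=1` and `max_mismatches=1`, the 1-base comparison can never fail, so at least one base is always removed. The function and parameter names are illustrative, not any trimmer's options, and the read is made up.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_terminal(read: str, adapter: str, min_overlap: int, max_mismatches: int) -> str:
    """Remove the longest read suffix that matches an adapter prefix within the mismatch limit."""
    for n in range(min(len(read), len(adapter)), min_overlap - 1, -1):
        if hamming(read[-n:], adapter[:n]) <= max_mismatches:
            return read[:-n]
    return read


ADAPTER = "CTGTCTCTTATACACATCT"
clean_read = "ACGTTGCAAGGCTTACCGAT"  # made-up read with no adapter at all

print(trim_terminal(clean_read, ADAPTER, min_overlap=11, max_mismatches=1))  # unchanged
print(len(trim_terminal(clean_read, ADAPTER, min_overlap=1, max_mismatches=1)))  # 18
```

In the second call the read loses two bases because "AT" matches "CT" with one mismatch, even though the read contains no adapter.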

        BBDuk matches full-length kmers in the middle of the read; at the very end of the read, where fewer than K bases remain, it matches shorter kmers from the ends of adapters, down to the length set by "mink". So providing longer adapter sequences is generally advantageous. You can set "mink" to 8bp if you want, which gives sensitivity similar to using an 8bp adapter sequence but with better specificity.
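The idea can be sketched in Python as a simplified, exact-match illustration of kmer-based right-trimming; this is not BBDuk's implementation, which also allows mismatches and by default searches reverse complements, both omitted here. Inside the read it looks for any full-length k-mer taken from the adapter; in the last k-1 bases, where no full k-mer fits, it checks whether the read ends with an adapter prefix of length `mink` or more. The adapter is the reverse complement of the Read 2 transposase sequence from the question; the insert is made up.

```python
from typing import Iterable, Set, Tuple


def build_index(adapters: Iterable[str], k: int, mink: int) -> Tuple[Set[str], Set[str]]:
    """Collect full-length adapter k-mers plus adapter prefixes of length mink..k-1."""
    full: Set[str] = set()
    prefixes: Set[str] = set()
    for adapter in adapters:
        for i in range(len(adapter) - k + 1):
            full.add(adapter[i:i + k])
        for n in range(mink, k):
            prefixes.add(adapter[:n])  # a read tip holds the start of the adapter
    return full, prefixes


def kmer_trim_right(read: str, full: Set[str], prefixes: Set[str], k: int, mink: int) -> str:
    """Trim from the first adapter k-mer; otherwise trim a shorter adapter prefix at the read tip."""
    for i in range(len(read) - k + 1):
        if read[i:i + k] in full:
            return read[:i]
    for n in range(min(k - 1, len(read)), mink - 1, -1):
        if read[-n:] in prefixes:
            return read[:-n]
    return read


ADAPTER = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
K, MINK = 23, 11
full, prefixes = build_index([ADAPTER], K, MINK)
insert = "ACGTTGCAAGGCTTACCGAT"
print(kmer_trim_right(insert + ADAPTER[:15], full, prefixes, K, MINK))  # ACGTTGCAAGGCTTACCGAT
print(kmer_trim_right(insert + ADAPTER, full, prefixes, K, MINK))  # ACGTTGCAAGGCTTACCGAT
```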

        Generally, though, I recommend 11bp as a minimum for mink (meaning a match for the last 11bp of a read, with a hamming distance of 0 or 1) and a kmer length of 23 for nonterminal kmers. If you use 8bp adapter sequences and trim wherever you see a match for them, you have a 1/4^8 = 1/2^16 = 1/65,536 chance of a spurious match at every position, even if you require an exact match. That means that for assembly, reads will be falsely trimmed roughly every 64kbp of genome, so on average you will not get contigs longer than about 64kbp! Which is terrible.
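The arithmetic behind that estimate, as a small Python sketch under the simplifying assumption of uniformly random sequence on one strand: an exact n-bp probe matches by chance about once every 4^n positions, and allowing up to d mismatches multiplies the number of matching n-mers by the sum of C(n, i)·3^i for i = 0..d, shortening the spacing accordingly.

```python
from math import comb


def expected_spacing(n: int, max_mismatches: int = 0) -> float:
    """Average bases between chance matches of an n-bp probe in random sequence (one strand)."""
    # n-mers within the allowed Hamming distance of the probe: C(n, d) * 3^d for each d
    matching_kmers = sum(comb(n, d) * 3 ** d for d in range(max_mismatches + 1))
    return 4 ** n / matching_kmers


for n, mm in ((8, 0), (11, 1), (23, 1)):
    print(f"{n} bp, <= {mm} mismatch: one chance match per ~{expected_spacing(n, mm):,.0f} bp")
```

For an exact 8bp probe this reproduces the 65,536 figure above.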

        It depends on what your goal is, though. When looking for super-rare 1/100 rate mutations, trimming adapters as much as possible may be wise, even if you lose data in a biased way.

        P.S. I forgot to mention that BBDuk's "tbo" flag will allow you to trim reads with even 1bp of adapter sequence, with very little risk of false positives, by looking for where the paired reads overlap. It requires paired reads, but it works even with unknown adapter sequences.
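A simplified Python sketch of the overlap idea (not BBDuk's actual "tbo" code): when the insert is shorter than the reads, read 1 is the insert followed by adapter, and the reverse complement of read 2 is adapter followed by the insert. The insert length therefore shows up as the longest prefix of read 1 matching a suffix of reverse-complemented read 2, and both reads can be cut to that length without knowing the adapter. `min_overlap` and `max_mismatches` are illustrative parameters; a real tool needs more safeguards against chance overlaps. The insert is made up.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_by_overlap(read1: str, read2: str, min_overlap: int = 12, max_mismatches: int = 1):
    """Cut both reads to the inferred insert length, or return them unchanged if no short-insert overlap is found."""
    rc2 = reverse_complement(read2)
    # Candidate insert lengths shorter than the reads, longest first.
    for length in range(min(len(read1), len(read2)) - 1, min_overlap - 1, -1):
        if hamming(read1[:length], rc2[-length:]) <= max_mismatches:
            return read1[:length], read2[:length]
    return read1, read2


insert = "ACGTTGCAAGGCTTACCGAT"  # made-up 20-bp insert
readthrough = "CTGTCTCTTATACACATCT"  # adapter read-through seen on both Nextera reads
r1 = insert + readthrough[:10]
r2 = reverse_complement(insert) + readthrough[:10]
print(trim_by_overlap(r1, r2))
```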
        Last edited by Brian Bushnell; 02-19-2015, 06:34 PM.
