  • Trimming Nextera XT Sequence Data

    I've got some MiSeq data from Nextera XT prepped libraries. I figure that it will be necessary to trim adapters and/or transposase sequences from the data and I'm hoping someone can assist me. I'd like to use Scythe for trimming.

    Scythe requires an adapters/contaminants file (FASTA format) as input, and I'm confused as to how to construct this file. Illumina provides the following information on the sequences:

    Nextera® transposase sequences
    5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
    (a) Read 1 -->
    5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
    (d) Read 2 -->
    Nextera® Index Kit - PCR primers
    5’ AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC
    (c) i5 Index read -->
    5’ CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG
    <-- i7 Index read (b)

    Could I simply use the following adapter FASTA file content? (I've combined the adapter and transposase sequences into a single string using the overlapping region.) What do I replace the [i5] and [i7] barcode tags with?

    >FWD_Adapter
    AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
    >REV_Adapter
    CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

    Thanks for any help you can provide!

  • #2
    A list of all Nextera adapter sequences, complete with bar codes, is packaged with BBTools in fasta format, in the resources directory (nextera.fa.gz).

    I recommend using BBDuk rather than Scythe, but of course, I'm biased. Anyway, the adapter sequences will work with any adapter-trimming program.
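For anyone who would rather build the file by hand, here is a minimal sketch of filling the [i5]/[i7] placeholders from the first post with index sequences and writing FASTA records. The index sequences and record names below are hypothetical placeholders, not real Nextera indexes; substitute the ones from your own kit or sample sheet. The sketch also does not address which orientation of these sequences actually shows up in your reads.

```python
# Templates copied from the first post; [i5]/[i7] mark where the index goes.
TEMPLATES = {
    "FWD_Adapter": "AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "REV_Adapter": "CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
}

# HYPOTHETICAL index sequences, for illustration only (not real kit indexes).
I5_INDEXES = {"i5_example": "AAAACCCC"}
I7_INDEXES = {"i7_example": "GGGGTTTT"}


def expand(name, template, tag, indexes):
    """Return one (record_name, sequence) pair per index sequence."""
    return [
        (f"{name}_{index_name}", template.replace(tag, index_seq))
        for index_name, index_seq in indexes.items()
    ]


def build_fasta():
    """Build the contaminant FASTA text with every placeholder filled in."""
    records = expand("FWD_Adapter", TEMPLATES["FWD_Adapter"], "[i5]", I5_INDEXES)
    records += expand("REV_Adapter", TEMPLATES["REV_Adapter"], "[i7]", I7_INDEXES)
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    print(build_fasta(), end="")
```

With several indexes per dictionary, this produces one record per index, which is the same idea as a prebuilt list of adapters complete with bar codes.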


    • #3
      Very helpful. Thanks Brian!

      Another question. I notice from various reading I have done on this forum and elsewhere that many people do not use the entire adapter/barcode/whatever-contaminant sequence when trimming. Often I see them put just the first 8, 10, 12 bases into their contaminant file. Is there an advantage/disadvantage to using the truncated sequence over the full sequence(s)? Do most trimmers simply look for the specified sequence and trim that and everything after?

      EDIT: middle of the night grammar


      • #4
        I am not really sure what most trimmers do, but when looking for a full-sequence match, in the presence of error, you will trim more adapters with an 8bp sequence than a 12bp sequence. Of course, you will also incur more false-positives.

        Some people trim as little as 1bp, allowing up to 1 mismatch. That will, of course, shorten all reads by a minimum of 1... and I think that's a bad approach unless a single adapter base is devastating - in which case, I think it's better to reorganize your experiment so that a single adapter base will not be devastating.

        BBDuk matches full-length kmers in the middle of the read, and at the very end of the read, when there are fewer than K bases left, it will match kmers from the ends of adapters down to the "mink" setting. So, providing longer adapter sequences is generally advantageous. You can set "mink" to 8bp if you want, which will give similar sensitivity to, but better specificity than, using an 8-bp adapter sequence.
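The idea in the paragraph above can be sketched roughly as follows. This is a simplified, exact-match illustration of "full kmers in the middle, shorter kmers at the read end", not BBDuk's actual implementation. The function name `trim_right` is made up, and it assumes the adapter's leading bases are what appears at the read's 3' end. The example adapter string is taken from the first post purely as test data.

```python
def trim_right(read, adapter, k=23, mink=11):
    """Remove the adapter and everything after it from the right end of a read.

    Simplified sketch: exact matches only, single adapter, no quality handling.
    """
    # 1) Full-length k-mers: look for any adapter k-mer anywhere in the read.
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for i in range(len(read) - k + 1):
        if read[i:i + k] in adapter_kmers:
            return read[:i]

    # 2) Read end: fewer than k adapter bases fit, so compare the read's tail
    #    with the adapter's start, using lengths from k-1 down to mink.
    longest = min(k - 1, len(read), len(adapter))
    for length in range(longest, mink - 1, -1):
        if read[-length:] == adapter[:length]:
            return read[:-length]

    return read


if __name__ == "__main__":
    adapter = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"  # example string from the first post
    insert = "ACGT" * 10
    print(trim_right(insert + adapter[:30], adapter))  # full k-mer found mid-read
    print(trim_right(insert + adapter[:12], adapter))  # 12 adapter bases at the end
    print(trim_right(insert + adapter[:8], adapter))   # 8 bases: below mink, untouched
```

Lowering `mink` to 8 lets the last case be trimmed as well, while the match still has to sit at the very end of the read, which is why it is more specific than searching for an 8bp sequence everywhere.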

        Generally, though, I recommend 11bp as a minimum for mink - meaning, a match for the last 11bp of a read (with a hamming distance of 0 or 1), and a kmer length of 23 for nonterminal kmers. If you use 8bp adapter sequences and trim wherever you see a match for them, you have a 1/4^8 = 1/2^16 = 1/65536 chance of a spurious match at any given position, even if you require an exact match. That means that for assembly, you will on average not get any contigs longer than about 64kbp! Which is terrible.
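The arithmetic behind those numbers can be checked with a short sketch. It assumes uniformly random, independent bases (real genomes are not, so treat the results as rough estimates), and the function names are made up for this example.

```python
from math import comb


def sequences_within_distance(k, max_mismatches=0):
    """Count the k-mers within the given Hamming distance of one fixed k-mer."""
    return sum(comb(k, d) * 3 ** d for d in range(max_mismatches + 1))


def spurious_match_probability(k, max_mismatches=0):
    """Chance that a random k-mer matches a fixed k-mer (assumes uniform bases)."""
    return sequences_within_distance(k, max_mismatches) / 4 ** k


def expected_spacing(k, max_mismatches=0):
    """Average number of positions between spurious matches."""
    return 1 / spurious_match_probability(k, max_mismatches)


if __name__ == "__main__":
    for k, mismatches in [(8, 0), (11, 0), (11, 1), (23, 0)]:
        p = spurious_match_probability(k, mismatches)
        print(f"k={k:2d} mismatches={mismatches}: p={p:.3g}, "
              f"one spurious match per ~{expected_spacing(k, mismatches):,.0f} positions")
```

For k=8 with exact matching, this gives one spurious match per 65,536 positions, which is where the roughly 64kbp figure comes from.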

        It depends on what your goal is, though. When looking for super-rare 1/100 rate mutations, trimming adapters as much as possible may be wise, even if you lose data in a biased way.

        P.S. I forgot to mention, BBDuk's "tbo" flag will allow you to trim reads with even 1bp of adapter sequence with very little risk of false-positives, by looking for where the reads overlap. It requires paired reads, but it will work even with unknown adapter sequences.
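A rough sketch of the overlap idea, for illustration only: if the insert is shorter than the read length, the first n bases of read 1 are the reverse complement of the first n bases of read 2, and anything past n is adapter. This is an exact-match toy version and not how BBDuk implements "tbo". The function names, the `min_insert` parameter, and the 3bp "adapter" in the example are all made up.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_by_overlap(read1, read2, min_insert=10):
    """Trim both reads to the insert length implied by their overlap.

    Tries candidate insert lengths from longest to shortest and trims at the
    first one where read 1 and read 2 agree. No adapter sequence is needed.
    """
    longest = min(len(read1), len(read2))
    for n in range(longest, min_insert - 1, -1):
        if read1[:n] == reverse_complement(read2[:n]):
            return read1[:n], read2[:n]
    return read1, read2  # no consistent overlap found: leave reads untouched


if __name__ == "__main__":
    insert = "ACGTTGCAAGGCTTAGC"              # 17bp insert, shorter than the 20bp reads
    r1 = insert + "CTG"                        # made-up 3bp adapter read-through
    r2 = reverse_complement(insert) + "CTG"
    print(trim_by_overlap(r1, r2))
```

Because the whole insert has to agree between the two reads, even a single base of read-through can be detected without matching it against any adapter list.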
        Last edited by Brian Bushnell; 02-19-2015, 06:34 PM.
