  • Process to remove primers, adapters, etc. from Illumina data

    Hi all,

    I have some Illumina paired-end (100 bp) read data and have seen this very useful page: http://intron.ccam.uchc.edu/groups/t...Sequences.html

    It lists some adapter and primer sequences. My question is: do I have to remove any sequences beyond the ones on this page and the primers I used to amplify my cDNA, such as the index primers? Is the sequence for the index primers the same as the PE sequencing or PCR primers given in the link above, with just the index tag added?

    Should I worry about reverse complementing all of these and removing those sequences as well?

    Liz

  • #2
    It is always good to start with QC on your data. You will find there are several tools to do this.

    FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) are good ones to start with. The FASTX-Toolkit includes utilities that will help you remove adapters if they are present in your sequences. Another alternative is cutadapt (http://code.google.com/p/cutadapt/).
    Last edited by GenoMax; 01-27-2012, 05:45 AM.



    • #3
      Hi, I'm aware of the FASTX-Toolkit and the other tools mentioned. My question is not how to remove adapters and primers, but whether I also need to reverse complement the adapters and primers and remove the reverse-complemented sequences as well.



      • #4
        We find that using the first 13 bp of the Illumina adapter ('AGATCGGAAGAGC') efficiently removes adapter contamination from both paired-end files (the adapters on both sides share this sequence before they fork, and any of the Illumina multiplex barcodes should be further downstream of that).
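On the reverse-complement question, a minimal Python sketch can at least produce the reverse-complemented string of an adapter, so that both orientations can be checked against the reads. The helper name `reverse_complement` is made up for illustration; the only sequence used is the 13-mer quoted above. Whether the reverse-complemented copy actually needs trimming depends on the library prep, which this sketch does not settle.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


ADAPTER_13MER = "AGATCGGAAGAGC"  # the 13-mer quoted in the post above

print(reverse_complement(ADAPTER_13MER))
# GCTCTTCCGATCT
```

Either string can then be searched for in a handful of reads to see which orientation, if any, actually occurs in the data.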

        A typical command for Cutadapt could be

        ./cutadapt -f fastq -O $stringency -q 20 -a AGATCGGAAGAGC input_file.fastq

        $stringency defines the minimum overlap with the adapter that is required before sequence is removed from the end of a read; the default is 3, I believe. This command removes poor-quality sequence as well as adapters from your FastQ file.
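To illustrate what a minimum-overlap setting does, here is a simplified Python sketch. It is not Cutadapt's algorithm (Cutadapt tolerates mismatches; this uses exact matching only), and the function name `trim_3prime_adapter` and the example reads are invented for illustration.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an exact 3' adapter match, full or partial, from a read."""
    # Full adapter inside the read: cut from its first occurrence onward.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for a prefix of the adapter at the very end of the read,
    # longest first, but never shorter than min_overlap bases.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAGAGC"

# The read ends with the first 6 adapter bases (AGATCG).
print(trim_3prime_adapter("ACGTACGTAGATCG", ADAPTER, min_overlap=3))
# ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATCG", ADAPTER, min_overlap=8))
# ACGTACGTAGATCG
```

With a low minimum overlap, a read that merely happens to end in the first few adapter bases is trimmed too, which is the trade-off the stringency value controls.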

        Just be careful with the option that discards sequences once they become too short, because this can throw off the read-by-read order of paired-end files, which many aligners require.
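One way to avoid that problem, sketched here in Python as an assumption rather than something described in this thread, is to apply the length filter to both mates at once and drop the whole pair if either mate is too short. The function name `filter_pairs`, the read names, and the length cutoff are all hypothetical.

```python
def filter_pairs(pairs, min_length=20):
    """Keep a pair only if both (already trimmed) mates are long enough."""
    kept = []
    for (name1, seq1), (name2, seq2) in pairs:
        if len(seq1) >= min_length and len(seq2) >= min_length:
            kept.append(((name1, seq1), (name2, seq2)))
    return kept


# Toy trimmed read pairs as ((name, sequence), (name, sequence)).
pairs = [
    (("pair1/1", "ACGTACGTAC"), ("pair1/2", "TTGCATGCAA")),
    (("pair2/1", "ACG"), ("pair2/2", "GGCATTACGA")),  # mate 1 is too short
    (("pair3/1", "GATTACAGAT"), ("pair3/2", "CCATGGTACC")),
]

kept = filter_pairs(pairs, min_length=5)
print([mate1[0] for mate1, _ in kept])
# ['pair1/1', 'pair3/1']
```

Because mates are dropped together, writing the first element of each kept pair to one file and the second to another leaves the two files in the same order.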

        I hope this helps



        • #5
          Thanks! I'm trying to figure out how to QC data before trying to use it in Velvet and Trinity.



          • #6
            I've tried the 13-mer adapter end sequence with the FASTX-Toolkit (Clip), and it didn't remove any reads. However, when I use the full primer sequences, reads are clipped and removed. I'm going to try clipping the 13-mer sequence with Cutadapt, but I thought I would mention it in case anyone can explain the difference in how these programs work.



            • #7
              Hi,
              You could also try a simple grep to get a rough idea of how often the adapter sequences occur.
              grep -c "^GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG$" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              You can also try the 13-mer sequence.
              Thanks,
              Rahul
              Rahul Sharma,
              Ph.D
              Frankfurt am Main, Germany
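The grep counts in this post scan every line of the file, including headers and quality strings. A rough Python equivalent that looks only at the sequence line of each record might look like the sketch below. It assumes plain four-line FASTQ records with no line wrapping; the function name `count_adapter_reads` and the toy reads are made up for illustration.

```python
import io


def count_adapter_reads(handle, adapter):
    """Count reads with the adapter at the start, at the end, or anywhere."""
    counts = {"reads": 0, "start": 0, "end": 0, "anywhere": 0}
    for line_number, line in enumerate(handle):
        # The sequence is the second line of every four-line FASTQ record.
        if line_number % 4 != 1:
            continue
        seq = line.strip()
        counts["reads"] += 1
        counts["start"] += seq.startswith(adapter)
        counts["end"] += seq.endswith(adapter)
        counts["anywhere"] += adapter in seq
    return counts


ADAPTER = "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"  # sequence from the grep commands

# Three toy reads: adapter at the start, adapter at the end, no adapter.
reads = [ADAPTER + "AC", "ACGT" + ADAPTER, "ACGTACGTACGT"]
fastq_text = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads, 1)
)

print(count_adapter_reads(io.StringIO(fastq_text), ADAPTER))
# {'reads': 3, 'start': 1, 'end': 1, 'anywhere': 2}
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object, and the 13-mer can be swapped in for `ADAPTER`.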

