  • Adapter trimming with cutadapt

    Hi,
    could you please give me some advice? I have paired-end data from amplicon sequencing using TruSeq adaptors (with some barcoding, using the Illumina Myeloid panel kit).

    Using FastQC, I have noticed that many reads in R2, but not in R1, are contaminated by the Illumina universal adapter. I do not quite understand why there is such a disproportion. Maybe I am missing something, such as the need to reverse complement the R1 reads or the adapter sequence.

    However, knowing that the adapter sequence starts with AGATCGGAAGAG (I guess for both adapters), is it sufficient to remove the adapters with this command?

    cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o out.1.fastq -p out.2.fastq IN_R1_001.fastq IN_R2_001.fastq

    Thank you,
    Vojtech.
    Data analysis www.persmed.eu
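
On the reverse-complement question above: a minimal shell sketch, assuming a POSIX shell with awk and tr, for reverse-complementing a candidate adapter so that both orientations can be searched for in R1 and R2. The function name `revcomp` is made up for illustration and is not part of cutadapt or any other tool mentioned here.

```shell
# Reverse-complement a DNA sequence given as the first argument.
# Only A, C, G, T (either case) are complemented; other symbols are left unchanged.
revcomp() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

revcomp AGATCGGAAGAG   # prints CTCTTCCGATCT
```

Searching the reads for both the sequence and its reverse complement is one way to tell whether an orientation mix-up explains a difference between R1 and R2.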

  • #2
    Adapter sequence for fragment libraries should always appear at the exact same location in R1 as in R2 for a given pair. Thus adapter-trimmed paired reads should come out with R1 and R2 being the same length.
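
One way to check this on trimmed output is sketched below. It assumes uncompressed four-line FASTQ records, the same read order in both files, equal read lengths before trimming, and no additional quality trimming. The function name is hypothetical, and the lengths of all R1 reads are held in memory, so this is only meant for modest file sizes.

```shell
# Count read pairs whose R1 and R2 sequence lengths differ.
# $1 = trimmed R1 FASTQ, $2 = trimmed R2 FASTQ (same read order in both).
count_length_mismatches() {
    awk '
        # First file: remember the length of every sequence line.
        NR == FNR { if (FNR % 4 == 2) len[FNR] = length($0); next }
        # Second file: compare each sequence line with its mate.
        FNR % 4 == 2 && length($0) != len[FNR] { n++ }
        END { print n + 0 }
    ' "$1" "$2"
}
```

For example, `count_length_mismatches out.1.fastq out.2.fastq` on the output of the command in the first post should print 0 if every pair was trimmed at the same position.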

    If you are not sure what your adapter sequence is, you can find out with BBMerge (post #48). 12bp is kind of short for adapter removal, particularly if mismatches are allowed.
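
To put a rough number on the mismatch concern: the sketch below assumes the trimmer allows floor(matched length x maximum error rate) errors, which is how I understand cutadapt to behave, with a maximum error rate of 10% as the assumed default. The function name is made up for illustration.

```shell
# Number of errors tolerated for an adapter match of a given length.
# $1 = matched length in bp, $2 = maximum error rate in percent (10 assumed if omitted).
allowed_errors() {
    echo $(( $1 * ${2:-10} / 100 ))
}

allowed_errors 12   # prints 1
allowed_errors 33   # prints 3
```

Under these assumptions a 12bp adapter is accepted with one error, so only 11 matching bases are needed, which makes chance hits inside real amplicon sequence more likely than with a 33bp adapter.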

    • #3
      Thank you,
      you pointed me in a good direction.

      After an extensive analysis of the supplied files, I have found the adapter sequences:
      Read1: GCGAATTTCGACGATCGTTGCATTAACTCGCGA
      Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

      resulting in the command
      cutadapt -a GCGAATTTCGACGATCGTTGCATTAACTCGCGA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o /dev/null -p /dev/null R1.fastq R2.fastq

      I am also considering using trimmomatic.

      I have barcoded sequences, so automatic detection would be good. The problem with my data is that Illumina (or my sequencing facility, on their behalf) refused to confirm the adapter sequences. I know that the data come from http://www.illumina.com/products/trusight-myeloid.html and are barcoded using Illumina barcodes. Adaptor contamination is present in only some of my files; in others it is absent.
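
To see which files are affected, a per-file count of reads containing a given adapter can be sketched as follows. This assumes uncompressed four-line FASTQ records and counts exact substring matches only, so it will undercount adapters that carry sequencing errors or are cut off at the read end. The function name is hypothetical.

```shell
# Report how many reads contain the adapter as an exact substring.
# $1 = adapter sequence, $2 = FASTQ file; prints "hits/total".
count_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 2 { total++; if (index($0, adapter) > 0) hits++ }
        END { printf "%d/%d\n", hits, total }
    ' "$2"
}

# Example loop over R2 files (file name pattern is an assumption):
# for f in *_R2_001.fastq; do printf '%s\t' "$f"; count_adapter_reads AGATCGGAAGAG "$f"; done
```

A short adapter prefix is used in the example loop because an exact match over the full adapter would miss reads where only the start of the adapter was sequenced.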

      I promise to test your tool on my data and check whether the detection is correct. The problem is that I do not understand how to supply the first-pair reads and the second-pair reads, or how your tool will figure out where to put N's for the barcode when I supply only one sample, barcoded with only one barcode. I have approximately 96 samples sequenced in one multiplexed run. All samples (same library) were sequenced on two lanes as a paired-end experiment, so I have 4 files for each sample. If you are interested in testing your program on these data, PM me.

      Another issue will be removing primer sequences from the reads; after that it would be nice to use FASTX or your software to cover the whole amplicon with a read. But I doubt that such tools are of any benefit while primer sequences are still present in the data.

      Vojtech.
      Data analysis www.persmed.eu


      • #4
        Vojtech: Both BBDuk (part of BBMap) and Trimmomatic include sequence files for standard Illumina adapters.

        With BBDuk if you're not sure which adapters are used, you can add "ref=truseq.fa.gz,truseq_rna.fa.gz,nextera.fa.gz" to your command line and get them all (this will increase the amount of overtrimming, though it should still be negligible).
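
A sketch of what such a command line might look like for one read pair is given below. The `ref=` list is taken from the post above; the remaining parameters (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follow commonly cited BBDuk adapter-trimming examples and are assumptions, not settings confirmed in this thread. Output file names are made up, and the reference files may need a full path to the BBMap resources directory. The function only prints the command so it can be inspected before running.

```shell
# Print (do not run) a BBDuk adapter-trimming command for one read pair.
# $1 = R1 input, $2 = R2 input; output names are placeholders.
bbduk_trim_cmd() {
    echo "bbduk.sh in1=$1 in2=$2 out1=trimmed_R1.fastq out2=trimmed_R2.fastq" \
         "ref=truseq.fa.gz,truseq_rna.fa.gz,nextera.fa.gz" \
         "ktrim=r k=23 mink=11 hdist=1 tpe tbo"
}

bbduk_trim_cmd IN_R1_001.fastq IN_R2_001.fastq
```

Printing first keeps the sketch safe to run on a machine without BBMap installed; the printed line can be pasted into the shell once the paths are checked.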
