  • Data Clean & Adapter Contamination

    Hello All,

    I am currently starting to analyse my RAD data set (Illumina HiSeq, 150bp PE sequencing run). I used Trimmomatic for data cleaning and checked for adapter contamination (as well as reverse-complement adapter contamination) using grep. Trimmomatic got rid of all of the adapters and cleaned my data set nicely.
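For anyone wanting to reproduce the grep check, here is a minimal Python sketch of the same idea: count reads that contain an exact copy of an adapter string, and separately reads that contain its reverse complement. The adapter string `AGATCGGAAG` and the example reads are placeholders made up for illustration; substitute the actual P1/P2 adapter sequences of your library. Like a plain grep, this only catches exact matches.

```python
# Count reads containing an adapter or its reverse complement (exact match),
# similar to running `grep -c` once for each orientation.
# ADAPTER below is a placeholder; use the real adapter sequences of the library.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_adapter_hits(reads, adapter):
    """Return (forward_hits, rc_hits): reads with an exact match in each orientation."""
    rc = revcomp(adapter)
    fwd_hits = sum(adapter in r for r in reads)
    rc_hits = sum(rc in r for r in reads)
    return fwd_hits, rc_hits


if __name__ == "__main__":
    ADAPTER = "AGATCGGAAG"  # placeholder adapter fragment
    reads = [
        "TTGCAGATCGGAAGCC",  # contains the adapter
        "GGCTTCCGATCTAAAA",  # contains its reverse complement
        "ACGTACGTACGTACGT",  # clean
    ]
    print(count_adapter_hits(reads, ADAPTER))  # (1, 1)
```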

    However, most of my adapter contamination seems to occur within the sequencing read (and is actually reverse-complement adapter sequence). I believe Trimmomatic handles within-read contamination by retaining the 5' end of the read up to the point where the contamination starts. What worries me is that the retained part of the read will then not necessarily match its paired read anymore (if this makes any sense at all).
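A very simplified sketch of the 5'-retaining behaviour described above. This is not Trimmomatic's ILLUMINACLIP algorithm (which scores approximate alignments and has a separate palindrome mode for paired reads); it just cuts each read at the first exact match of an adapter seed. `clip_3prime`, `clip_pair` and the `seed` length are hypothetical names and parameters chosen for illustration. Each mate is clipped independently, so the pair stays together even when the two mates end up with different lengths.

```python
# Simplified 3' adapter clipping: keep the 5' part of a read up to the first
# exact match of the adapter's first `seed` bases. Not Trimmomatic's algorithm.


def clip_3prime(read: str, adapter: str, seed: int = 8) -> str:
    """Return the part of `read` before the first adapter seed match."""
    pos = read.find(adapter[:seed])
    return read if pos == -1 else read[:pos]


def clip_pair(r1: str, r2: str, adapter_r1: str, adapter_r2: str, seed: int = 8):
    """Clip each mate independently; the mates stay paired but may differ in length."""
    return clip_3prime(r1, adapter_r1, seed), clip_3prime(r2, adapter_r2, seed)


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # placeholder adapter sequence
    print(clip_pair("ACGTACGTAGATCGGAAGAGC", "TTTTGGGGCCCC", adapter, adapter))
    # ('ACGTACGT', 'TTTTGGGGCCCC')
```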

    The way I imagine the adapter ends up within the read is that a small fragment with an adapter gets ligated to another small fragment with adapters, so I have a read that looks like this:

    P1-read(-P1-read)------P2-read

    The brackets indicate what Trimmomatic would trim. My concern is that I now have read pairs where the forward read does not belong with the reverse read:

    P1-read---P2-read
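To make the proposed chimera concrete, here is a toy Python model (all sequences are hypothetical and the orientation of the internal adapter is an assumption): the insert is fragment A, then an internal copy of P1, then fragment B. Read 1 is taken from the start of the insert and read 2 from the reverse complement of its other end. After clipping read 1 at the internal adapter, read 1 comes only from fragment A while read 2 starts in fragment B, which is the mismatched pair described above.

```python
# Toy model of the proposed chimera (hypothetical sequences): the insert is
# fragment A + an internal copy of P1 + fragment B. Read 1 reads the insert
# from its start; read 2 reads the reverse complement from the other end.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_chimera(frag_a: str, frag_b: str, p1: str, read_len: int):
    """Return (read1, read2) for an insert frag_a + p1 + frag_b."""
    insert = frag_a + p1 + frag_b
    read1 = insert[:read_len]
    read2 = revcomp(insert)[:read_len]
    return read1, read2


def clip(read: str, adapter: str) -> str:
    """Keep the 5' part of the read before the first exact adapter match."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


if __name__ == "__main__":
    r1, r2 = simulate_chimera("AAAAAAAA", "CCCCCCCC", "GTGTGT", read_len=20)
    print(clip(r1, "GTGTGT"))  # AAAAAAAA (fragment A only)
    print(r2)                  # GGGGGGGGACACACTTTTTT (starts in fragment B)
```

Note that in this model read 2 sees the internal adapter as its reverse complement (`ACACAC` for `GTGTGT`).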

    I am trying to assemble these de novo into contigs and am worried that these 'trimmed contaminated sequences' could lead to false assemblies. However, altogether they make up ~3% of the total reads. I assume some of them will have been discarded anyway (poor quality, failed demultiplexing, etc.), and most of them are different sequences, so as long as my assemblies are built with sufficiently strict settings this shouldn't be much of a problem?

    I would value any opinions on whether I should find a solution to this problem or whether it is small enough for me to just go ahead with the analysis anyway.

    Any ideas why I end up with so many reads containing reverse-complement adapter sequence? For example, grep finds 2 million reads with the reverse-complement (rc) adapter but only 1,000 with the forward (non-rc) adapter. Is this normal for RAD sequencing?

    Thank you so much for any help in advance,

    Sarah

  • #2
    If the problematic reads are only 3% of the total, then you can remove or ignore them.

    Are you using software specific to RAD-seq, like Stacks or RADTools?
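If you take the 'remove them' route, a minimal sketch is to drop every pair in which either mate is shorter than the original read length after trimming, and report what fraction was removed. This assumes adapter clipping is the only reason reads get shorter (any quality trimming would also be caught here), and `filter_pairs` is a hypothetical helper working on in-memory sequence pairs rather than FASTQ files.

```python
# Drop read pairs where either mate was shortened by trimming, and report
# the fraction removed. Works on in-memory (read1, read2) sequence pairs.


def filter_pairs(pairs, read_len=150):
    """Return (kept_pairs, removed_fraction); keep pairs with both mates full length."""
    kept = [(r1, r2) for r1, r2 in pairs
            if len(r1) == read_len and len(r2) == read_len]
    removed_fraction = 1 - len(kept) / len(pairs) if pairs else 0.0
    return kept, removed_fraction


if __name__ == "__main__":
    pairs = [("AAAA", "CCCC"), ("AA", "CCCC"), ("GGGG", "TTTT"), ("ACGT", "ACGT")]
    kept, frac = filter_pairs(pairs, read_len=4)
    print(len(kept), frac)  # 3 0.25
```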



    • #3
      Thank you very much for your quick reply! It is much appreciated.
      Last edited by Corydoras; 06-04-2014, 07:49 AM.



      • #4
        Just in case anybody ever has a similar problem or is confused and stumbles across my post:

        Looking more closely at my files and at where the reverse-complement adapter contamination occurred, it became obvious that the rc sequences were simply adapter read-through, and everything that followed was nonsense, which Trimmomatic then removed perfectly. This means that in roughly 3% of cases my fragments were too short for the 150bp HiSeq reads and the RAD size selection did not work perfectly, but considering it is only 3% and this was my first set of libraries, I am fairly happy with that.
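The read-through arithmetic is simple: if the insert between the adapters is shorter than the read length, the sequencer runs off the end of the insert into the adapter, and the number of affected bases is the read length minus the insert length. A small sketch, assuming "insert length" means the DNA between the two adapters; the function names and example lengths are hypothetical.

```python
# Adapter read-through arithmetic: a read longer than its insert continues
# into the adapter. Insert lengths below are hypothetical.


def read_through_bases(insert_len: int, read_len: int = 150) -> int:
    """Bases of the read that lie beyond the end of the insert."""
    return max(0, read_len - insert_len)


def fraction_read_through(insert_lengths, read_len: int = 150) -> float:
    """Fraction of fragments whose insert is shorter than the read length."""
    short = sum(1 for n in insert_lengths if n < read_len)
    return short / len(insert_lengths)


if __name__ == "__main__":
    print(read_through_bases(120))  # 30
    print(fraction_read_through([100, 200, 300, 400]))  # 0.25
```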

        Above I said I was concerned that the forward read would not match the reverse read. I now believe this applies to a couple of hundred fragments at most, namely those that really do consist of tiny adapter-ligated fragments joined to other tiny adapter-ligated fragments. The majority of the contamination, however, appears in reverse-complement form.

        This all rests on the assumption that when read-through occurs, the forward reads will contain the reverse complement of the P2 adapter and the reverse reads will contain the reverse complement of the P1 adapter. Please feel free to point out if there is something wrong with my logic!
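A toy way to check the orientation argument, under a simplified model that is an assumption rather than a description of any specific RAD protocol: the top strand of a library molecule is P1 + insert + revcomp(P2), with both adapters written 5' to 3' as their oligos. Read 1 starts right after P1 and read 2 starts right after P2 on the opposite strand. All sequences are hypothetical, and barcodes and restriction overhangs are ignored.

```python
# Toy check of adapter read-through orientation (hypothetical sequences).
# Top strand: P1 + insert + revcomp(P2); bottom strand is its reverse complement.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(insert: str, p1: str, p2: str, read_len: int):
    """Return (read1, read2) for one library molecule under the model above."""
    top = p1 + insert + revcomp(p2)
    bottom = revcomp(top)  # = p2 + revcomp(insert) + revcomp(p1)
    read1 = top[len(p1):len(p1) + read_len]      # starts after P1
    read2 = bottom[len(p2):len(p2) + read_len]   # starts after P2
    return read1, read2


if __name__ == "__main__":
    p1, p2 = "GACTGA", "TCCAGT"  # placeholder adapters
    r1, r2 = simulate_pair("AAAACCCC", p1, p2, read_len=12)
    print(r1, revcomp(p2))  # AAAACCCCACTG ACTGGA
    print(r2, revcomp(p1))  # GGGGTTTTTCAG TCAGTC
```

With an 8 bp insert and 12 bp reads, read 1 runs into the start of revcomp(P2) and read 2 into the start of revcomp(P1), which is consistent with the logic above. Whether grep reports such hits as "rc" or "forward" then depends only on which orientation the adapter was written in when searching.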
