  • vallejov
    Member
    • Jul 2011
    • 10

    Adapter contamination

    Hello!

    I ran FastQC on my reads and didn't detect any overrepresented sequences. As a result I did not remove any adapter sequences. I did do a 12 bp head crop and then quality trimming to process my raw reads, as well as a low-complexity filter. In my naivete I didn't realize that there could be adapters present even if they don't show up in my FastQC report! My resulting assembly seems fine; I've been characterizing and annotating it and haven't found anything glaringly wrong.

    My questions are:
    1. Is my assembly ok even though I did not remove adapter sequences?
    2. What could happen in the assembly if the adapter sequences were not removed?

    I would greatly appreciate anyone's experience with this!!
    Veronica
  • usad
    Member
    • Sep 2009
    • 53

    #2
    Hi Veronica

    adapters might "become" part of your genome (you see this in a few draft genomes) or cause bridges.

    You could try searching your genome against all the possible adapter/primer sequences. If you find enough hits, you definitely know you have a somewhat major issue.
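A minimal sketch of such a search, for anyone who wants a quick check before setting up BLAST. It only reports exact shared k-mers between contigs and adapters on either strand. The adapter shown is the common start of the Illumina TruSeq adapters; whether it applies depends on your library kit, and the contigs, the function names and the seed length `k=12` are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def adapter_seed_hits(contigs, adapters, k=12):
    """Yield (contig_id, adapter_name, strand, contig_pos) for every exact
    k-mer shared between a contig and an adapter, on either strand."""
    for name, adapter in adapters.items():
        for strand, query in (("+", adapter.upper()),
                              ("-", reverse_complement(adapter))):
            # All k-mers of the adapter in this orientation.
            seeds = {query[i:i + k] for i in range(len(query) - k + 1)}
            for contig_id, seq in contigs.items():
                seq = seq.upper()
                for pos in range(len(seq) - k + 1):
                    if seq[pos:pos + k] in seeds:
                        yield contig_id, name, strand, pos


# Toy data (hypothetical): c1 carries the adapter, c2 does not.
adapters = {"truseq_start": "AGATCGGAAGAGC"}
contigs = {
    "c1": "ACGT" * 5 + "AGATCGGAAGAGC" + "TTTT",
    "c2": "ACGT" * 5,
}

hits = list(adapter_seed_hits(contigs, adapters, k=12))
for hit in hits:
    print(*hit, sep="\t")
```

Several adjacent seed hits on the same contig point to a longer stretch of adapter. A single short seed on its own could well be chance, which is why the seed length matters.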

    More often they might just negatively affect your N50.
    But anyway, a 12 bp head crop seems high. Of course this might have been caused by your quality values, or been meant as a way to remove adapters. In the latter case, adapters can also be on the other end of the read, as you can get read-through.
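To illustrate the read-through case: when the insert is shorter than the read, the adapter shows up at the 3' end, sometimes only as a truncated prefix. The sketch below looks for a full adapter anywhere in the read and otherwise for a partial adapter at the very end. The function name, the `min_overlap=5` threshold and the reads are assumptions for illustration, and only exact matches are handled (real trimmers also tolerate sequencing errors).

```python
def three_prime_adapter_start(read, adapter, min_overlap=5):
    """Return the index in `read` where a 3' adapter begins, or None.

    A complete adapter may sit anywhere in the read; a truncated adapter
    is only accepted when the read ends inside it."""
    read, adapter = read.upper(), adapter.upper()
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Try the longest partial adapter first, down to min_overlap bases.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return None


adapter = "AGATCGGAAGAGC"  # assumed adapter start, depends on the kit
full_read = "ACGTTGCAAGGCT" + adapter + "ACAC"
partial_read = "ACGTTGCAAGGCTTACG" + adapter[:7]
clean_read = "ACGTTGCAAGGCTTACGGATTACA"

for read in (full_read, partial_read, clean_read):
    start = three_prime_adapter_start(read, adapter)
    trimmed = read if start is None else read[:start]
    print(start, trimmed)
```

A head crop cannot remove either of these cases, since the adapter bases sit at the other end of the read.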

    How much work did you put in and what size is the genome?

    Cheers,
    b


    • lorendarith

      #3
      Originally posted by usad
      adapters might "become" part of your genome (you see this in a few draft genomes) or cause bridges.

      You could try searching your genome against all the possible adapter/primer sequences. If you find enough hits, you definitely know you have a somewhat major issue.

      But how many of the primer/adapter matches to your sequence are true contamination, and how many are actual biological sequence? How can you distinguish the two? I mean, surely it is possible that actual biological sequences share sequence with adapters/primers.

      If I do a BLAST search of adapter sequences, I get all sorts of matches in databases. Is this contamination or what?


      • usad
        Member
        • Sep 2009
        • 53

        #4
        Hi,

        This is of course difficult to say, and the same goes for your BLAST searches. But if you re-assemble a genome from scratch with stringent adapter removal and the adapters disappear, that is a good indication. (Yeah, I do this sometimes for some courses I teach.)

        That said, of course the match length and its e-value give you an indication.
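A rough way to see why match length matters so much: under a deliberately simple model (random sequence, equal base frequencies, exact matches only), the expected number of chance occurrences of one fixed sequence of length k in N bases is about N / 4^k, doubled if both strands are searched. This is not how BLAST computes its e-value; it is a back-of-the-envelope sketch, and the 50 Mb target size is a hypothetical figure.

```python
def expected_chance_matches(match_len, target_bases, both_strands=True):
    """Expected number of exact chance matches of one fixed sequence of
    length `match_len` in `target_bases` of random, equal-frequency DNA."""
    strands = 2 if both_strands else 1
    return strands * target_bases * 0.25 ** match_len


target_bases = 50_000_000  # hypothetical total assembly size in bases
for match_len in (10, 15, 20, 30):
    expected = expected_chance_matches(match_len, target_bases)
    print(f"{match_len} nt exact match: ~{expected:.3g} expected by chance")
```

With these assumptions a 10 nt exact match is expected about 95 times by chance alone, while a 20 nt match is expected roughly 0.0001 times. So short hits cannot be judged from sequence identity alone, whereas long ones are very unlikely to be coincidence.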

        Cheers
        Björn


        • vallejov
          Member
          • Jul 2011
          • 10

          #5
          Hi Björn,

          I've put quite a bit of time into this assembly, so I would rather "fix" it if possible than scrap it and start again. It is a de novo transcriptome assembly, not a genome.

          I tried searching for the adapter sequences using blast but ran into the problem of interpretation. Is a 10 bp 100% identity match real adapter/primer contamination? I guess I don't know how to use match length and e-value to determine this.

          I decided to remove the transcripts that have obvious matches (nearly the whole length of the primer or adapter present in the sequence) but those partial sequence hits I just left in.

          Thanks for your replies!
          Veronica


          • martin2
            Member
            • Nov 2010
            • 42

            #6
            Originally posted by vallejov
            I've put quite a bit of time into this assembly so I would rather "fix" it if possible rather than scrap it and start again. It is a de novo transcriptome assembly not a genome.

            I tried searching for the adapter sequences using blast but ran into the problem of interpretation. Is a 10 bp 100% identity match real adapter/primer contamination? I guess I don't know how to use match length and e-value to determine this.
            Given that e.g. Roche MID tags are 10 or 11 nt long, it is perfectly valid to interpret such matches as true and to trim them away together with their upstream/downstream regions. But you have to understand what happened in the lab. From a bioinformatician's point of view, a 10 nt match seems too weak and crappy, even more so if it contains 2-3 sequencing errors. You really have to dive into the details of the sequencing technology and of the lab protocol, otherwise it is just bad guesswork. In certain locations within a read I do treat such matches as e.g. MID tags. The same goes for some adapter remnants which have been cleaved by a restriction endonuclease.
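The idea of trusting a short match only in certain locations can be sketched like this: accept a tag only if it sits within the first few bases of a read, allowing a small number of mismatches. The tag sequence, the function names and the thresholds are placeholders for illustration; the right values depend on the sequencing technology and the lab protocol.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def tag_at_read_start(read, tag, max_offset=2, max_mismatches=1):
    """Return (offset, mismatches) for the best placement of `tag` within
    the first `max_offset` bases of `read`, or None if nothing qualifies."""
    read, tag = read.upper(), tag.upper()
    best = None
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(tag)]
        if len(window) < len(tag):
            break  # read too short for the tag at this offset
        mismatches = hamming(window, tag)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


tag = "ACGAGTGCGT"  # illustrative 10 nt tag
exact_read = tag + "TTGACCATGCA"
one_error_read = "ACGAGAGCGT" + "TTGACCATGCA"    # one substitution in the tag
internal_read = "TTGACCATGCA" + tag + "GG"       # same 10 nt, wrong location

for read in (exact_read, one_error_read, internal_read):
    print(tag_at_read_start(read, tag))
```

The third read contains the same 10 nt, but not where the protocol would put a tag, so it is not reported.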
            Hi Veronica,

            Originally posted by vallejov
            I decided to remove the transcripts that have obvious matches (nearly the whole length of the primer or adapter present in the sequence) but those partial sequence hits I just left in.
            This is still not correct. How about partial PCR products? How about truncated adapters somebody left in after improper trimming? How about partial adapters left in after vendor-specific trimming (e.g. trimming based on some quality criteria)? Of course you don't want to leave them in the dataset. This is a typical place where most adapter-trimming tools fail. You also have to anticipate e.g. two sample molecules stitched together via some adapter/linker/artifact; such junctions are mostly distorted in some way, for example truncated or containing some short linker sequence.

            The only general advice would be: Trim more rather than less and start with trimming of the original data, not of some partially trimmed data. It is much more difficult to uncover partial targets which somebody left in.
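For the stitched-molecule case, a minimal sketch of the simplest possible handling: cut a sequence at every exact internal adapter occurrence (either strand) and keep the flanks that are long enough. This only catches intact adapters. The truncated or distorted junctions described above would need approximate matching, which is not attempted here. The function names, the adapter and the `min_piece=20` cutoff are assumptions for illustration.

```python
import re


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def split_at_adapter(seq, adapter, min_piece=20):
    """Split `seq` at every exact adapter occurrence on either strand and
    return the flanking pieces of at least `min_piece` bases."""
    forward = adapter.upper()
    reverse = reverse_complement(forward)
    pattern = re.compile("|".join(re.escape(s) for s in (forward, reverse)))
    pieces = pattern.split(seq.upper())
    return [piece for piece in pieces if len(piece) >= min_piece]


adapter = "AGATCGGAAGAGC"  # assumed adapter start, depends on the kit
left = "ACGTTGCAAGGCTTACGGATTACA"
right = "GGCATTCGATCGATTAGCCATGGA"

chimera = left + adapter + right
print(split_at_adapter(chimera, adapter))

# Adapter in reverse-complement orientation, followed by a stub too short to keep.
tail_case = left + reverse_complement(adapter) + "ACGT"
print(split_at_adapter(tail_case, adapter))
```

Doing this on the raw reads and re-assembling is in line with the advice above; applying it to finished contigs only removes what happens to have survived intact.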

            Martin
