  • Vector contamination?

    After preliminary de novo clustering/assembly of transcriptomic data from a non-model organism, I've found what appears to be a pretty good indication of vector contamination (bit score: 6318; E-value: 0.0; 3423 out of 3424 identities; accession AY817672).

    This is how the library was prepared: I extracted high-quality total RNA from the organism and shipped it to the sequencing facility, which generated the library and ran one lane of a flow cell (2x76 bp), producing ~5.1 Gb of total data (~34,000,000 paired-end reads).
    Now, the overall frequency appears to be low: only ~600,000 bases (roughly 0.01% of the data). It actually winds up working almost as an assembly "quality control metric" that lets us assess consistency between different assemblers. But as far as I'm concerned, this vector shouldn't be in our library, and it seems like something our sequencing service provider should be able to account for.

    Has anybody else found this sort of thing in their Solexa/Illumina libraries? As far as I'm aware, cloning vectors are not a part of the Illumina protocol so it's unlikely to simply be an artifact from library prep. Am I wrong?

    Thanks for the insight.
    Last edited by gconcepcion; 02-05-2011, 11:14 AM.
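
The figures quoted in the opening post can be checked with a few lines of arithmetic. This is only a sketch built on the approximate numbers given there; it assumes the ~34,000,000 figure counts read pairs rather than individual reads, and the variable names are illustrative.

```python
# Figures as quoted in the opening post; all are approximate.
READ_PAIRS = 34_000_000    # assumption: this counts read pairs, not single reads
READ_LENGTH = 76           # bases per read (2x76 bp run)
TOTAL_BASES = 5.1e9        # ~5.1 Gb reported for the lane
VECTOR_BASES = 600_000     # ~600,000 bases attributed to the vector
IDENTITIES = 3423          # matching positions in the BLAST hit
ALIGNMENT_LENGTH = 3424    # aligned positions in the BLAST hit


def fraction(part, whole):
    """Return part/whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Yield implied by the read count, as a sanity check on the 5.1 Gb figure.
yield_from_reads = READ_PAIRS * 2 * READ_LENGTH
# Share of the lane's bases attributed to the vector.
vector_fraction = fraction(VECTOR_BASES, TOTAL_BASES)
# Percent identity of the BLAST hit.
percent_identity = 100 * fraction(IDENTITIES, ALIGNMENT_LENGTH)

print(f"Yield implied by read count: {yield_from_reads / 1e9:.2f} Gb")
print(f"Vector fraction: {100 * vector_fraction:.4f}% "
      f"({vector_fraction * 1e6:.0f} ppm)")
print(f"BLAST identity: {percent_identity:.2f}%")
```

With these inputs the vector share comes out near 0.012% of the lane, so the result is sensitive to how the ~600,000-base figure was counted (assembled contig bases versus bases in mapped reads).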

  • #2
    I do bacteria, and I've found stuff like that too.

    The simplest answer is that it was in your sample. That's what the sequencing facility will tell you. You'd have to make cDNA and Sanger sequence it to be sure, but in general, you should believe your data. Your data tells you you've got vector, and you should believe that until you have empirical data (like a failed PCR) that conflicts.

    • #3
      Originally posted by swbarnes2
      I do bacteria, and I've found stuff like that too.

      The simplest answer is that it was in your sample. That's what the sequencing facility will tell you. You'd have to make cDNA and Sanger sequence it to be sure, but in general, you should believe your data. Your data tells you you've got vector, and you should believe that until you have empirical data (like a failed PCR) that conflicts.
      Thanks for the response. I didn't mention in the first post that the total RNA I sent to the facility was used to prepare two EST libraries, one for 454 pyrosequencing and one for Illumina (Solexa) sequencing. The 454 data was assembled and no evidence of any vector was found whatsoever. This 'evidence' (or lack thereof) leads me to believe that the vector was not in the original sample. That, coupled with the fact that we work with eukaryotic protists and have never had that vector in our lab, makes me doubt that the sample is the source.

      But what do I know!? I could be wrong!

      • #4
        We've seen contamination coming from some odd places. In one case we had heavy contamination with bacterial DNA in what should have been a eukaryotic sample. It turned out the contamination was in a preparation of streptavidin beads used for a ChIP.

        Since you now have the sequence for your vector, you could always run a PCR on your original material, which should tell you whether it was present before you sent your sample off for sequencing.

        • #5
          Originally posted by simonandrews
          We've seen contamination coming from some odd places. In one case we had heavy contamination with bacterial DNA in what should have been a eukaryotic sample. It turned out the contamination was in a preparation of streptavidin beads used for a ChIP.
          Interesting. It hadn't occurred to me that there might be contamination from supposedly "clean" reagents/disposables used during extraction. At any rate, primers have been ordered and I'll be checking for contamination in the actual sample.

          Cheers!

          • #6
            Where in the vector does your sequence match? I ask because bases 1-10270 of the GenBank record you cite are the sequence of an SIV provirus. Is your non-model species a mammal? It may just contain some viral RNA.

            Also, how deep was your 454 run? At the frequency you mention above, a typical 454 run (400 million bases) would give you about 7000 bases of sequence, only enough to go about 2x on a contig of the size you found in your Illumina data. So unless you took your suspect Illumina contig and BLASTed it against your full 454 data set (pre-assembly), you might just be missing sequence that is there.

            That said, I would have to say your suspicions are reasonable. Here is the problem though: how good do you expect the contamination control of any facility to be? If the only contamination present in your sequence is that of the contig you describe, that would put you at less than 20 parts per million. Given that all second-generation sequencers have PCR as part of their workflow, what does it take to prevent residual amplicon levels from getting that high? Will using plug-seal pipette tips and keeping pre- and post-PCR areas separate be sufficient? Or do we need clean-room-level measures?

            --
            Phillip
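
The coverage argument in the post above can be sketched as a short calculation: expected contaminant bases are the run size times the contaminant fraction, and expected depth is that divided by the contig length. The function name is hypothetical, and the contig length of 3424 bp is an assumption taken from the alignment length in the opening post. Two fractions are shown because the estimate depends strongly on the one chosen: about 17.5 ppm reproduces the ~7000-base figure, while dividing the quoted ~600,000 bases by ~5.1 Gb gives a fraction closer to 118 ppm.

```python
def expected_coverage(run_bases, contaminant_fraction, contig_length):
    """Return (expected contaminant bases, expected depth over the contig)."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    expected_bases = run_bases * contaminant_fraction
    return expected_bases, expected_bases / contig_length


RUN_454_BASES = 400e6   # "typical 454 run" size quoted in the post
CONTIG_LENGTH = 3424    # assumption: alignment length from the opening post

scenarios = {
    "fraction implied by the ~7000-base estimate": 7000 / RUN_454_BASES,
    "600,000 bases out of 5.1 Gb": 600_000 / 5.1e9,
}

for label, frac in scenarios.items():
    bases, depth = expected_coverage(RUN_454_BASES, frac, CONTIG_LENGTH)
    print(f"{label}: {frac * 1e6:.1f} ppm -> "
          f"{bases:,.0f} bases, about {depth:.1f}x over the contig")
```

Either way, the depth is low enough that an assembler could plausibly fail to build the contig from 454 data alone, which is why searching the unassembled 454 reads is the more sensitive test.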
