  • Many alignments from unknown genome

    We've just had a set of 3 samples sequenced using an Illumina GA with 76bp single-end reads, with one sample per lane. Each lane contains around 15 million reads, with chastity enforced.

    I aligned the samples with bowtie using the "--solexa1.3-quals" option. Very few of the reads align against the source organism for the samples (mouse): 13%, 3% and 3% respectively. I also tried MAQ, which was no better. Our conclusion is that there is something wrong with the samples; now we just need to identify the problem. In particular, we need to find the mysterious source of the large number of remaining high-quality reads that are not from mouse.

    One possibility is that the samples are contaminated with another genome. Is it possible to check a read against a variety of genomes to see where it came from? A quick Google search reveals that GenomeMapper has this feature: is this a good approach?

    Can anybody suggest any other techniques for tackling this problem?

    Thanks,

    Peter
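One rough way to check reads against several candidate genomes at once is to compare k-mer content. The sketch below is only an illustration under stated assumptions: the function names (`build_index`, `classify_read`, `screen_reads`), the value of `k`, and the `min_fraction` threshold are all hypothetical, and holding every k-mer in memory is only practical for small references (bacteria, vectors, spike-ins), not for a full mouse genome. For large genomes a real aligner run against each reference would be the more sensible route.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_set(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes, k=12):
    """Map each genome name to the k-mers found on either strand."""
    return {
        name: kmer_set(seq, k) | kmer_set(reverse_complement(seq), k)
        for name, seq in genomes.items()
    }


def classify_read(read, index, k=12, min_fraction=0.5):
    """Return the genome sharing the most k-mers with the read, or None.

    A read is only assigned if at least min_fraction of its k-mers
    occur in the best-matching genome.
    """
    read_kmers = kmer_set(read, k)
    if not read_kmers:
        return None
    best_name, best_fraction = None, 0.0
    for name, genome_kmers in index.items():
        fraction = len(read_kmers & genome_kmers) / len(read_kmers)
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    return best_name if best_fraction >= min_fraction else None


def screen_reads(reads, index, k=12, min_fraction=0.5):
    """Count how many reads are assigned to each genome."""
    return Counter(
        classify_read(read, index, k, min_fraction) or "unassigned"
        for read in reads
    )
```

Running `screen_reads` on a few thousand unaligned reads should be enough to show whether one candidate genome dominates; the counts are a screening hint, not a substitute for alignment.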

  • #2
    I had a similar problem recently,
    and assembled the non-aligning reads with ABySS,
    and blasted the resulting longer fragments - found an Acinetobacter!
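Once the non-aligning reads have been assembled, the longest contigs are usually the ones worth submitting to BLAST first. As a minimal sketch, assuming the assembler writes its contigs to a plain FASTA file, the helper below picks the longest ones. The names `read_fasta` and `longest_contigs` and the `min_length` cutoff are illustrative choices, not part of any assembler.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_contigs(lines, n=10, min_length=200):
    """Return up to n (name, sequence) pairs, longest first."""
    contigs = [
        (name, seq) for name, seq in read_fasta(lines)
        if len(seq) >= min_length
    ]
    contigs.sort(key=lambda record: len(record[1]), reverse=True)
    return contigs[:n]
```

Because `longest_contigs` accepts any iterable of lines, an open file handle can be passed directly, for example `longest_contigs(open(path))` where `path` points at the assembler's contig file.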


    • #3
      Originally posted by ffinkernagel
      I had a similar problem recently,
      and assembled the non-aligning reads with ABySS,
      and blasted the resulting longer fragments - found an Acinetobacter!
      OK - that's really interesting. We had exactly the same thing. We assembled with Velvet, blasted the longest contig and got an ~85% match to Acinetobacter.

      I'm guessing it's pretty unlikely that two labs would have the same source of contamination, so it's probably something else.

      We checked our samples for complexity and there weren't many repeated reads, so it's not something simple like a vector or a primer. Ours was a ChIP-Seq experiment. We checked against E. coli and every other species used in that lab and it wasn't any of those.

      Could there be something in the protocols which could contain this DNA? We were wondering if it could be a blocking reagent or something like that, but the lab concerned assured us that they didn't use anything like that.

      It must be something common to these different experiments and I'd love to get to the bottom of this. We lost about 3 lanes this way so far!
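The complexity check mentioned above can be approximated by counting exact duplicate reads. This is a rough sketch under the assumption that the read sequences are already available as strings; `duplication_summary` is a hypothetical helper name, and exact matching will undercount duplicates that differ only by sequencing errors.

```python
from collections import Counter


def duplication_summary(reads, top=5):
    """Summarise how repetitive a set of reads is.

    duplicate_fraction is the share of reads that are extra copies of a
    sequence already seen: 1 - distinct / total.
    """
    counts = Counter(read.upper() for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "distinct": 0,
                "duplicate_fraction": 0.0, "most_common": []}
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        "duplicate_fraction": 1 - distinct / total,
        "most_common": counts.most_common(top),
    }
```

A high duplicate fraction dominated by a handful of sequences would point towards a primer, adaptor or vector; a low one, as reported here, suggests a more complex source such as a whole contaminating genome.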


      • #4
        Same here, less than 50% aligning with different aligners.
        I heard rumors about the use of D. melanogaster or S. salar DNA in some epigenomics experiments... That wasn't my case.
        I assembled the unmapped reads with Velvet and found that only 2% of them gave interesting contigs. That turned out to be an expression vector.
        Still, 7 million reads don't align. I'm scared by the possibility of a strongly mutated clone... I'm trying BFAST to align them, as it should be more tolerant in these situations.


        • #5
          @Simon: we've been PCR-ing around to find where the contamination occurred, but so far I don't have a definite answer back.
          Since it was also a ChIP-Seq experiment, maybe we should check for common reagents (especially protein A and protein G - the literature led me to believe they might be produced with an Acinetobacter...)

          @dawe: yes, salmon or herring sperm DNA is often used to prevent nonspecific interactions.
          Assembly itself is also a bit of an art. I'd play around with the k-mer size etc. and see if you find something more interesting (also consider pooling the unmatched reads from the three experiments).


          • #6
            Originally posted by ffinkernagel
            Since it was also a ChIP-Seq experiment, maybe we should check for common reagents (especially protein A and protein G - the literature led me to believe they might be produced with an Acinetobacter...)
            That's a really good call. It looks like OmpA used on some affinity beads does indeed come from an Acinetobacter strain. I suppose the question then is whether we have significant amounts of DNA contamination in the beads, or if we're getting nothing coming off the column and we're just PCRing up the tiny amounts of contamination which are always present.

            I'll check on the source of the beads we used to see if this holds up, but it certainly makes sense.


            • #7
              In fact, it looks like we may have had a problem in sample preparation. Our DNA fragments were too short so in many cases we were sequencing into the adaptor. Clipping the reads back to 54bp improved the number of alignments considerably.
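Reading into the adaptor can be handled either by clipping at the point where the adaptor starts or by hard-trimming every read to a fixed length, as was done here with 54bp. The following is a minimal sketch of both, assuming exact matching only (no mismatches tolerated). The function names and the `min_overlap` default are illustrative, and the adaptor sequence must be supplied by the caller from the library prep documentation.

```python
def adapter_start(seq, adapter, min_overlap=6):
    """Return the index where the adaptor begins in seq.

    Looks first for the full adaptor anywhere in the read, then for a
    prefix of the adaptor (at least min_overlap bases) at the 3' end.
    Returns len(seq) if no adaptor is found.
    """
    min_overlap = max(1, min_overlap)
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return len(seq)


def clip_record(seq, qual, adapter, min_overlap=6):
    """Clip sequence and quality string at the adaptor start."""
    cut = adapter_start(seq, adapter, min_overlap)
    return seq[:cut], qual[:cut]


def hard_trim(seq, qual, length=54):
    """Truncate sequence and quality string to a fixed length."""
    return seq[:length], qual[:length]
```

Hard trimming is cruder but needs no knowledge of the adaptor; clipping keeps more of each read when fragment lengths vary. In both cases the quality string has to be cut at the same position as the sequence so that the FASTQ record stays valid.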
