  • id0
    Senior Member
    • Sep 2012
    • 130

    Checking reads for contamination

    Is there a tool to check for the source of contamination in sequencing reads? I am looking for something like BLAST, but that would summarize across many reads.

    For example, I have a FASTQ that is supposed to be human. Only 50% of the reads align to human. Where are the other reads coming from?
  • skbrimer
    Member
    • Mar 2014
    • 55

    #2
    Maybe this: https://github.com/blaxterlab/blobology (it looks like a good tool for a quick look).


    • id0
      Senior Member
      • Sep 2012
      • 130

      #3
      Originally posted by skbrimer:
      maybe this https://github.com/blaxterlab/blobology it looks like a good tool for quick looks.
      It looks like it performs an ABySS assembly. That seems computationally intensive. More importantly, I am not sure how well it would do with dilute samples.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        I suggest that you use BBSplit from BBMap with a human reference, then collect the unmapped reads in a separate file for examination.
        Last edited by GenoMax; 06-22-2015, 03:58 PM.


        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          #5
          Both BBMap and BBDuk (and BBSplit) can output a file indicating the percent and number of reads matching a given sequence, and can do so quickly for large numbers of reads. We run all of our reads through BBDuk for screening against small synthetic contaminants (primers, spike-ins, vectors, etc), and it does a nice job of quantifying their absolute abundance, but it would run out of memory processing a reference as big as nt (I don't normally give BBDuk a reference bigger than 1Gbp or so). If you follow GenoMax's advice, just grab a handful (~1000) of the reads that don't map to human and blast them against nt; hopefully something will turn up.
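The "grab a handful (~1000) of the reads" step could be sketched in Python roughly as follows. This is an illustration only and not part of BBTools: the function names `read_fastq`, `sample_reads`, and `to_fasta` are made up for the example, and it assumes an uncompressed FASTQ with exactly four lines per record. It uses reservoir sampling, so the file is read once and the subset is not biased toward the start of the file.

```python
import random
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        fields = header[1:].split()
        yield (fields[0] if fields else "", seq)


def sample_reads(handle: TextIO, n: int = 1000, seed: int = 0) -> List[Record]:
    """Return up to n reads chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(read_fastq(handle)):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records: List[Record]) -> str:
    """Format records as FASTA text, ready to submit to BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Usage would be along the lines of `print(to_fasta(sample_reads(open("unmapped.fq"))))`, where `unmapped.fq` is a placeholder name for the file of reads that did not map to human.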


          • NextGenSeq
            Senior Member
            • Apr 2009
            • 482

            #6
            If you barcoded your libraries, check the reads that were not de-multiplexed. Any reads carrying a barcode that you did not use to make your libraries are contamination.
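A rough Python sketch of that barcode check, under some assumptions: the read headers follow the Illumina Casava 1.8+ layout, where the index sequence is the last colon-separated field after the space (for example `1:N:0:ACGTAC`), and you supply the set of barcodes you actually used. The names `index_of` and `tally_unexpected` are invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def index_of(header: str) -> str:
    """Return the index (barcode) field of a Casava 1.8+ style header.

    Returns '' when the header has no space-separated comment part.
    """
    parts = header.strip().split(" ")
    if len(parts) < 2:
        return ""
    return parts[-1].split(":")[-1]


def tally_unexpected(headers: Iterable[str], expected: Set[str]) -> Dict[str, int]:
    """Count reads per barcode, keeping only barcodes not in `expected`."""
    counts = Counter(index_of(h) for h in headers)
    return {bc: n for bc, n in counts.items() if bc and bc not in expected}
```

Barcodes that show up only a few times and differ from an expected barcode by a single base are more likely sequencing errors than foreign libraries, so it is the abundant unexpected barcodes that are worth chasing.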


            • SNPsaurus
              Registered Vendor
              • May 2013
              • 525

              #7
              In my sequencing class the students take a cheek swab, do a Nextera prep and get 10M reads. The first exercise is to see what is living in their mouth. So they align to the human reference with Novoalign, then pull out the non-aligners, convert to fasta and submit a blastn job to see which bacteria are in there. Students report a huge increase in flossing frequency after seeing the typical results! The one-liner to find the non-aligners and make a fasta file is:

              cat yourname_vs_hg19.align | grep NM | head -500 | cut -f 3 | awk '{print ">" $1 "\n" $1}'
              This is for Novoalign, which reports 'NM' for non-aligners and has the sequence in column 3. It should be easy to modify for other aligners.
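For aligners that write SAM, an equivalent could look something like the Python sketch below. It relies only on the SAM layout (read name in column 1, FLAG in column 2, sequence in column 10, unmapped reads have FLAG bit 0x4 set); the function name `unmapped_to_fasta` and the default limit of 500 are just choices made for this example.

```python
from typing import Iterable


def unmapped_to_fasta(sam_lines: Iterable[str], limit: int = 500) -> str:
    """Return FASTA text for the first `limit` unmapped reads in a SAM stream."""
    out = []
    for line in sam_lines:
        if len(out) >= limit:
            break
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = segment unmapped
            out.append(f">{fields[0]}\n{fields[9]}\n")
    return "".join(out)
```

Unlike the one-liner, this uses the read name as the FASTA header, which makes it easier to trace a BLAST hit back to the original read.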

              As part of our genotyping of populations we always check 1000 reads from each sample. It often explains some discordant results (lots of reads but low depth at the loci because most of the sample is something else!).
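To summarize the BLAST results across those reads rather than reading them one by one, something like this Python sketch might help. It assumes BLAST tabular output with the default 12 columns (`-outfmt 6`: query ID in column 1, subject ID in column 2, bit score in column 12), keeps the best-scoring hit per read, and counts reads per subject. The function name `top_hit_counts` is made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def top_hit_counts(blast_lines: Iterable[str]) -> Counter:
    """Count reads per subject, using each read's highest bit-score hit."""
    best: Dict[str, Tuple[float, str]] = {}  # query -> (bit score, subject)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return Counter(subject for _, subject in best.values())
```

Calling `top_hit_counts(open("hits.tsv")).most_common(10)` (with `hits.tsv` as a placeholder file name) lists the ten subjects hit by the most reads. The subject IDs are accessions, so grouping by organism needs an extra lookup that is not shown here.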
              Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
