
  • filtering out human seqs from metagenomic reads

    What's considered the best tool for removing human sequences from large sets of metagenomic next-gen reads? We tried BMTagger with default settings on a set of ~200 million 100 nt Illumina reads, and it left in a lot of reads that hit human sequences with high confidence in a subsequent blastn search against the NCBI nt database.

  • #2
    What about bowtie2 against the human genome? They even have prebuilt indexes available. Running blastn on over 100M reads against nt sounds like a rather wasteful use of computing resources.
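A minimal sketch of the bowtie2 suggestion, assuming single-end reads and a prebuilt human index. The index prefix and file names are hypothetical placeholders, and the helper `bowtie2_human_filter_cmd` is made up for illustration. It only builds the command line; `-x`, `-U`, `--un`, `--al`, `-p`, and `-S` are standard bowtie2 options.

```python
import shlex

def bowtie2_human_filter_cmd(index_prefix, reads_fastq, nonhuman_out,
                             human_out, threads=8):
    """Build a bowtie2 command that splits single-end reads by whether
    they align to a human index. Nothing is executed here."""
    return [
        "bowtie2",
        "-p", str(threads),     # number of alignment threads
        "-x", index_prefix,     # prefix of the human index (hypothetical path)
        "-U", reads_fastq,      # unpaired input reads
        "--un", nonhuman_out,   # reads that failed to align
        "--al", human_out,      # reads that aligned at least once
        "-S", "/dev/null",      # discard the SAM output (Unix-only path)
    ]

cmd = bowtie2_human_filter_cmd(
    "human_index/GRCh38",       # assumed index location, not from the thread
    "reads.fastq",
    "nonhuman.fastq",
    "human.fastq",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Reads written by `--un` are only "not aligned at the chosen sensitivity", not guaranteed nonhuman, so the alignment settings still matter. For paired-end data, bowtie2 provides the corresponding `--un-conc` and `--al-conc` options.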
    savetherhino.org



    • #3
      Yes, it's far from good. But that's how many were left in our metagenomic set after filtering out short reads, duplicate reads, and (via BMTagger) human reads. So we'd like to try a better human read remover, to help ensure that the final read set for downstream analysis (e.g. blastn) is all nonhuman. And smaller.



      • #4
        Originally posted by ssully
        Yes, it's far from good. But that's how many were left in our metagenomic set after filtering out short reads, duplicate reads, and (via BMTagger) human reads. So we'd like to try a better human read remover, to help ensure that the final read set for downstream analysis (e.g. blastn) is all nonhuman. And smaller.
        If I were you, I'd do trimming, then bowtie2 against the human genome, then assembly, and only then BLAST. Be aware that for things like species distribution, assembly tends to introduce a rather large bias (in my experience it inflates the apparent presence of the most common taxa).

        P.S. If you have human reads, you probably have other contaminants too, such as bacteria from human skin. Keep that in mind, especially if your contamination rate is high.
        Last edited by rhinoceros; 08-29-2013, 12:51 PM.
        savetherhino.org



        • #5
          We don't want to do assembly, because our main goal is to interrogate the diversity of taxa in our samples. We've done quality score filtering, length filtering, adapter trimming, and duplicate removal. More vigorous quality trimming may be detrimental to uncovering diversity, according to this study.

          We are studying a surface microbiome that humans interact with, so we don't mind skin bacteria; we want to catalog those, as well as any eukaryotic seqs. We don't even 'mind' the human sequences; it's just that their numbers make the seq files very large, so we want to split them out and treat the human and nonhuman sets separately.
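As a rough sketch of the split described here, assuming some classifier has already produced a set of read IDs flagged as human: the function below streams a 4-line-per-record FASTQ and writes each record to a human or a nonhuman output. The name `split_fastq` and the demo reads are hypothetical, and multi-line FASTQ records are not handled.

```python
import io
from itertools import islice

def split_fastq(reads, human_ids, human_out, nonhuman_out):
    """Stream 4-line FASTQ records from `reads` and write each one to
    `human_out` or `nonhuman_out`. Returns (n_human, n_nonhuman)."""
    n_human = n_nonhuman = 0
    while True:
        record = list(islice(reads, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Read ID: header without the leading '@', up to the first whitespace.
        read_id = record[0][1:].split()[0]
        if read_id in human_ids:
            human_out.writelines(record)
            n_human += 1
        else:
            nonhuman_out.writelines(record)
            n_nonhuman += 1
    return n_human, n_nonhuman

# Tiny in-memory demo; real use would pass open file handles instead.
demo = io.StringIO("@r1 1:N\nACGT\n+\nIIII\n@r2 1:N\nTTTT\n+\nIIII\n")
human, nonhuman = io.StringIO(), io.StringIO()
n_human, n_nonhuman = split_fastq(demo, {"r1"}, human, nonhuman)
```

Because records are read one at a time, memory use depends on the size of the ID set rather than on the size of the read file, which matters at the ~200 million read scale mentioned earlier in the thread.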
          Last edited by ssully; 08-29-2013, 01:49 PM.



          • #6
            Perhaps one of these would be useful:





            • #7
              DeconSeq? I haven't used it for anything larger than microbial genomes, but it works fairly well.



              • #8
                Great help!

                These links are quite good options!

                Thanks!
