  • Filtering OTU table scripts

    Hi All-

    I have some V9 amplicon MiSeq and HiSeq metagenomic data that I have been processing in Qiime. I am trying to filter my OTU table so that it only contains OTUs that are at least 0.01% abundant in a given sample of the data set.

    Searching online, I found filter_otus_from_otu_table.py, but it removes any OTU whose abundance across the whole data set is below the given threshold, so the entire OTU is dropped from the table. My issue is that an OTU might be below 0.01% of the whole data set but above that in a particular sample. By removing OTUs based on their data-set-wide abundance, I am losing information I want and keeping information I don't. The results can also change depending on which samples I include in the data set.
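To make the distinction above concrete, here is a small Python sketch on a hypothetical three-sample toy table (the OTU IDs and counts are invented for illustration). It computes one OTU's share of all reads in the table and its share of each sample's reads, using 0.01% (0.0001) as the threshold.

```python
THRESHOLD = 0.0001  # 0.01% expressed as a proportion

# Hypothetical toy table: OTU ID -> read counts in samples S1, S2, S3.
counts = {
    "OTU_1": [50000, 50000, 49990],
    "OTU_2": [0, 0, 10],
}


def dataset_fraction(counts, otu):
    """Share of all reads in the whole table that belong to `otu`."""
    total = sum(sum(row) for row in counts.values())
    return sum(counts[otu]) / total


def per_sample_fractions(counts, otu):
    """Share of each sample's reads that belong to `otu`."""
    n_samples = len(counts[otu])
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return [counts[otu][j] / totals[j] for j in range(n_samples)]


overall = dataset_fraction(counts, "OTU_2")
per_sample = per_sample_fractions(counts, "OTU_2")
print(f"OTU_2 across the whole table: {overall:.6f}")  # 10 / 150000
print(f"OTU_2 per sample: {[round(f, 6) for f in per_sample]}")  # S3: 10 / 50000
```

A whole-table filter at 0.0001 would drop OTU_2 (about 0.0067% of all reads), even though it makes up 0.02% of sample S3. Adding or removing samples changes the denominator of the whole-table fraction, which is why a data-set-wide filter depends on which samples are included.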

    I have also found filter_otus_by_sample.py, but that is for removing certain samples from the OTU table. I do not want to remove any sample; I just want to remove the counts of OTUs that fall below a given threshold within a sample.

    Does anyone know of a script that will go through an OTU table sample by sample and remove OTU counts whose abundance is below a certain threshold? (That is, change counts below the threshold to 0, then remove OTUs that are zero across all samples.)

    Thanks in advance
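The operation described in the question is short enough to sketch directly. Below is a minimal Python version, assuming the table has already been loaded into a dictionary mapping each OTU ID to a list of counts (one per sample, in the same sample order for every OTU). That in-memory layout and the function name `filter_per_sample` are hypothetical. Each count is compared with its sample's original total, counts below the threshold are set to 0, and OTUs that end up zero in every sample are dropped.

```python
def filter_per_sample(counts, threshold=0.0001):
    """Zero counts below `threshold` of their sample's total read count,
    then drop OTUs that are zero in every sample.

    counts: dict of OTU ID -> list of counts, one entry per sample
            (hypothetical layout; every list in the same sample order).
    threshold: proportion of the sample total (0.0001 = 0.01%).
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Per-sample totals, taken from the unfiltered table.
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    filtered = {}
    for otu, row in counts.items():
        new_row = [
            c if totals[j] > 0 and c >= threshold * totals[j] else 0
            for j, c in enumerate(row)
        ]
        if any(new_row):  # keep the OTU only if it survives in some sample
            filtered[otu] = new_row
    return filtered


# Hypothetical toy table with samples S1, S2, S3.
table = {
    "OTU_1": [50000, 50000, 49990],
    "OTU_2": [2, 0, 10],
    "OTU_3": [1, 0, 0],
}
print(filter_per_sample(table))
# {'OTU_1': [50000, 50000, 49990], 'OTU_2': [0, 0, 10]}
```

Using the unfiltered totals means the threshold always refers to each sample's full sequencing depth. The surviving values stay as raw counts, so after filtering a sample's column can sum to slightly less than its original total.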

  • #2
    You should have a look at the Qiime forum (https://groups.google.com/forum/#!forum/qiime-forum). They are usually very good at answering queries, and people in the community are always quick to lend a hand.

    • #3
      Yeah, I have posted this question there too. There is no script within Qiime, and they do not know of one that can do what I want. I figured I would post here and see if anyone knows of something. It doesn't need to be a Qiime script, just something that can read a tab-delimited OTU table and look at abundance on a sample-by-sample basis.
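Since any tool that reads a tab-delimited table would do, here is a minimal Python sketch for loading and saving one with the standard `csv` module. It assumes the classic QIIME text layout (for example, a .biom file converted to TSV): optional comment lines starting with `#`, a header row whose first field is `#OTU ID`, one row per OTU, and an optional last column named `taxonomy`. The function names are hypothetical; check the header of your own file before relying on this layout.

```python
import csv
import io


def read_otu_table(handle):
    """Parse a tab-delimited OTU table.

    Returns (sample_ids, counts, taxonomy): counts maps OTU ID -> list of
    integer counts; taxonomy maps OTU ID -> taxonomy string (empty if the
    table has no 'taxonomy' column).
    """
    sample_ids, counts, taxonomy = None, {}, {}
    has_tax = False
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            if fields[0] == "#OTU ID":  # header row with the sample IDs
                has_tax = fields[-1].lower() == "taxonomy"
                sample_ids = fields[1:-1] if has_tax else fields[1:]
            continue  # any other '#' line is treated as a comment
        otu = fields[0]
        # Counts may be written as floats (e.g. "10.0"); store them as ints.
        counts[otu] = [int(float(v)) for v in fields[1:len(sample_ids) + 1]]
        if has_tax:
            taxonomy[otu] = fields[-1]
    return sample_ids, counts, taxonomy


def write_otu_table(handle, sample_ids, counts, taxonomy):
    """Write the table back out in the same tab-delimited layout."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID", *sample_ids] + (["taxonomy"] if taxonomy else []))
    for otu, row in counts.items():
        writer.writerow([otu, *row] + ([taxonomy[otu]] if taxonomy else []))


# Hypothetical two-sample table held in memory for the example.
text = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\ttaxonomy\n"
    "OTU_1\t10.0\t0.0\tk__Bacteria\n"
)
samples, counts, taxonomy = read_otu_table(io.StringIO(text))
out = io.StringIO()
write_otu_table(out, samples, counts, taxonomy)
print(out.getvalue(), end="")
```

When reading from disk, open the file with `open(path, newline="")`, as the `csv` documentation recommends. The per-sample filtering step can then be applied to `counts` before the table is written back out.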

      • #4
        Not sure if you still need help with this, but I bet you would be able to do it in Phyloseq, an R package. It gives you much more control over manipulating your dataset.

        • #5
          curious

          You might want to post something in the Qiime help forum; they often use it as a suggestion box for new features. I am curious how you resolve this issue. I typically do the same type of filtering and have always done it in Excel. It's a clunky way to deal with this kind of information, but it will work in a pinch.

          • #6
            I would recommend using Phyloseq for additional control over your data. It takes a bit of time to get used to, but it has a lot more power than QIIME. I think QIIME is great for a lot of analyses and for making figures, but Phyloseq is also helpful; I use both.

            About halfway down this page, http://joey711.github.io/phyloseq/preprocess, you will see how to filter your OTU table based on abundance. You would want each OTU to be present in at least 1 sample at a relative abundance of at least 0.0001 (0.01%).
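The rule described above (keep an OTU if it reaches the threshold in at least one sample) is a keep-or-drop decision on whole OTUs: an OTU that passes keeps all of its counts, including low counts in other samples. In R, phyloseq's `transform_sample_counts` and `filter_taxa` functions can express this. The Python sketch below only illustrates the rule itself, assuming a hypothetical dictionary layout (OTU ID mapped to a list of per-sample counts), a hypothetical function name, and the question's 0.01% (0.0001) as the default threshold. It is not phyloseq's implementation.

```python
def keep_if_abundant_somewhere(counts, threshold=0.0001, min_samples=1):
    """Keep OTUs whose relative abundance is >= `threshold` in at least
    `min_samples` samples; kept rows are returned unchanged.

    counts: dict of OTU ID -> list of counts, one entry per sample.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = {}
    for otu, row in counts.items():
        hits = sum(
            1 for j, c in enumerate(row) if totals[j] > 0 and c / totals[j] >= threshold
        )
        if hits >= min_samples:
            kept[otu] = list(row)  # low counts in other samples are NOT zeroed
    return kept


# Hypothetical toy table with samples S1, S2, S3.
table = {
    "OTU_1": [50000, 50000, 49990],
    "OTU_2": [2, 0, 10],
    "OTU_3": [1, 0, 0],
}
print(keep_if_abundant_somewhere(table))
# {'OTU_1': [50000, 50000, 49990], 'OTU_2': [2, 0, 10]}
```

Here OTU_2 keeps its 2 reads in S1 even though they are below 0.01% of that sample. The per-sample approach asked about in the original question would set that count to 0.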

            • #7
              I've played with Phyloseq and thought it made some really cool ggplot2-type figures (which I like). I also like that you can make figures that don't look identical to what everyone else is doing with pyrotags and iTags.
              Maybe Phyloseq has been updated since I last used it (~1 yr ago), but I thought its workflow was not very well fleshed out. What I like about Qiime is its big user community and the wealth of experience and literature behind pretty much anything you choose to do with your data. I seem to remember that you had to tweak the .biom file to get it into Phyloseq, and I would prefer a pipeline that can move seamlessly back into Qiime. That said, I think the kind of filtering step noted above would be a great thing to do in Phyloseq, especially if you were going to do your abundance visualization within that suite.

              • #8
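                Seems like an easy awk or R question. I would probably do it in R, because I would be prepping to make some graphics.

                In R it would be something like this, assuming the data is in a data frame with OTUs as rows and samples as columns:

                # df.all.samples: numeric counts only, OTUs as rows (IDs as row names), samples as columns
                threshold <- 0.0001  # 0.01% expressed as a proportion
                sampleSums <- colSums(df.all.samples)
                for (i in seq_len(ncol(df.all.samples))) {
                  df.all.samples[, i] <- ifelse(df.all.samples[, i] >= threshold * sampleSums[i], df.all.samples[, i], 0)
                }
                # drop OTUs that are now zero in every sample
                df.all.samples <- df.all.samples[rowSums(df.all.samples) > 0, , drop = FALSE]

                This assumes that all you really want is to call taxa below your threshold not detected (ND = 0).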
