  • Filtering OTU table scripts

    Hi All-

    I have some V9 amplicon MiSeq and HiSeq metagenomic data that I have been processing in QIIME, and I am trying to filter my OTU table so that it only contains OTUs that are at least 0.01% abundant in a given sample in the data set.

    Searching online I found filter_otus_from_otu_table.py, but this removes any OTU that falls below the given threshold across the whole data set, so the OTU is dropped from the table entirely. My issue is that an OTU might be below 0.01% in the data set as a whole but make up a higher percentage of a particular sample. By removing an OTU completely based on its abundance across the data set, I am losing wanted information and keeping unwanted information in my OTU table. The results might also change depending on which samples I include in the data set.
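
    To put numbers on this, here is a minimal Python sketch with made-up counts (the sample names, read totals and OTU counts are all hypothetical). It compares one OTU's abundance across the pooled data set with its abundance inside each sample, using 0.0001 as the proportion equivalent of 0.01%:

```python
# Hypothetical counts for one rare OTU, plus total reads per sample.
otu_counts = {"sampleA": 50, "sampleB": 0, "sampleC": 0}
sample_totals = {"sampleA": 100_000, "sampleB": 400_000, "sampleC": 500_000}

threshold = 0.0001  # 0.01% expressed as a proportion

# Abundance over the pooled data set (what a whole-table filter looks at).
dataset_fraction = sum(otu_counts.values()) / sum(sample_totals.values())

# Abundance within each sample (what a per-sample filter would look at).
per_sample_fraction = {
    name: otu_counts[name] / sample_totals[name] for name in otu_counts
}

print("data set passes:", dataset_fraction >= threshold)
for name, fraction in per_sample_fraction.items():
    print(name, "passes:", fraction >= threshold)
```

    With these numbers the OTU fails the data-set-wide cutoff (50 reads out of 1,000,000) but passes within sampleA (50 reads out of 100,000), which is the case I want to keep.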

    I have also found filter_otus_by_sample.py, but this is for removing certain samples from the OTU table. I do not want to remove a sample; I just want to remove the counts of OTUs that fall below a given threshold within a sample.

    Does anyone know of a script that will look through an OTU table sample by sample and remove OTU counts for abundances below a certain threshold (that is, change counts to 0 if they are below the threshold, and remove OTUs that are zero across all samples)?

    Thanks in advance

  • #2
    You should have a look at the QIIME forum (https://groups.google.com/forum/#!forum/qiime-forum). They are usually very good at answering queries, and people in the community are always quick to lend a hand.



    • #3
      Yeah, I have posted this question there as well. There is no script within QIIME, and they do not know of one that can do what I want. I figured I would post here and see if anyone knows of something. It doesn't need to be a script within QIIME, just something that can take a tab-delimited OTU table and look at abundance on a sample-by-sample basis.



      • #4
        Not sure if you still need help with this question, but I bet that you would be able to do this in phyloseq, which is an R package. It gives you much more control over manipulating your dataset.



        • #5
          curious

          You might want to post something in the QIIME help forum. Oftentimes they use it as a suggestion box for new features. I am curious how you resolve this issue. I typically do the same type of filtering and have always done it in Excel. It's a clunky way to deal with this kind of information, but it will work in a pinch.



          • #6
            I would recommend using phyloseq for additional control over your data. It takes a bit of time to get used to, but it has a lot more power than QIIME. I think QIIME is great for a lot of the analyses, making figures, etc., but phyloseq is also helpful. I use both. If you look halfway down the page at this link, http://joey711.github.io/phyloseq/preprocess, you will see how to filter your OTU table based on abundance. You would want each OTU to be present in at least 1 sample at an abundance of at least 0.001.



            • #7
              I've played with phyloseq and thought it made some really cool ggplot2-type figures (which I like). I also like that you can make figures that don't look identical to what everyone else is doing with pyrotags and iTags.
              Maybe phyloseq has been updated since I last used it (~1 yr ago), but I thought its workflow was not very well fleshed out. The thing I like about QIIME is the big user community and the wealth of experience and literature behind pretty much anything you choose to do with your data. I seem to remember that you had to tweak the .biom file to get it into phyloseq, and I would prefer a pipeline that could move seamlessly back into QIIME. That said, I think the kind of filtering step noted above would be a great thing to do with phyloseq, especially if you were going to do your abundance visualization within that suite.



              • #8
                Seems like an easy awk or R question. I would probably do it in R because I would be prepping to make some graphics.

                R code would be something like this, assuming the data is in a data frame with OTUs as rows and one column per sample:

                # Per-sample (column) read totals, computed before any counts are zeroed
                sampleSums <- colSums(df.all.samples)
                for (i in seq_len(ncol(df.all.samples))) {
                  # Zero counts that are not above 0.001 (0.1%) of the sample total;
                  # use 0.0001 for the 0.01% cutoff in the original question
                  df.all.samples[, i] <- ifelse(df.all.samples[, i] > 0.001 * sampleSums[i], df.all.samples[, i], 0)
                }

                This assumes all you really want to do is set taxa below your threshold to ND = 0.
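
                For anyone who would rather work on the tab-delimited table outside R, the same idea can be sketched in plain Python. This is only a sketch under stated assumptions, not a QIIME script: the function names filter_otu_table and filter_otu_tsv are made up for illustration, and the input is assumed to have one header row, OTU IDs in the first column, and only numeric counts in the remaining columns (no leading comment line and no trailing taxonomy column). It uses ">=" so that an OTU at exactly the threshold is kept, and it also drops OTUs that end up zero in every sample.

```python
import csv
import io


def filter_otu_table(rows, threshold=0.0001):
    """Zero counts below `threshold` of each sample's total, then drop empty OTUs.

    `rows` is a list of (otu_id, [count per sample]) pairs.
    """
    n_samples = len(rows[0][1]) if rows else 0
    # Total reads per sample (column sums), taken before any zeroing.
    totals = [sum(counts[j] for _, counts in rows) for j in range(n_samples)]
    filtered = []
    for otu_id, counts in rows:
        kept = [
            c if totals[j] > 0 and c / totals[j] >= threshold else 0
            for j, c in enumerate(counts)
        ]
        # Keep only OTUs that still have reads in at least one sample.
        if any(kept):
            filtered.append((otu_id, kept))
    return filtered


def _fmt(value):
    """Write whole numbers without a trailing '.0'."""
    value = float(value)
    return str(int(value)) if value.is_integer() else str(value)


def filter_otu_tsv(text, threshold=0.0001):
    """Apply the filter to tab-delimited text that has a single header row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [(r[0], [float(x) for x in r[1:]]) for r in reader if r]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for otu_id, counts in filter_otu_table(rows, threshold):
        writer.writerow([otu_id] + [_fmt(c) for c in counts])
    return out.getvalue()


# Made-up table: two samples with 10,000 reads each.
example = "OTU_ID\tS1\tS2\nOTU_1\t9999\t5000\nOTU_2\t1\t5000\nOTU_3\t0\t0\n"
print(filter_otu_tsv(example, threshold=0.001))
```

                In the made-up example, OTU_2 has its S1 count set to 0 (1 read out of 10,000 is below 0.001) but keeps its S2 count, and OTU_3 is removed because it is zero in every sample.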
