  • novel species discovery metagenomics

    Hello everybody,

    I have two environmental bacterial samples sequenced on Illumina for metagenomics. I have already determined the taxonomic composition of the samples. Now I want to find any novel species. Can somebody please suggest some bioinformatics tools for that?

    I appreciate your help.

    Christopher

  • #2
    Can you be more specific about what type of data you have (16S?) and what you have already done?



    • #3
      Dear Mark,

      It's genomic data, and I've already done taxonomic classification of the species present in the sample using MEGAN. I mapped the reads to the NCBI database of all bacteria using bowtie, imported the resulting SAM file into MEGAN, and got a nice tree view. Now I want to find novel species in the sequenced samples, starting from the reads that did not align at all.
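A minimal sketch of how the unaligned reads mentioned above could be pulled out of the SAM file for further analysis. It relies only on the SAM format itself (FLAG bit 0x4 marks an unmapped read, 0x40 and 0x80 mark the first and second mate); the function name and the example records are hypothetical. bowtie2 can also write unaligned reads to a separate file itself (see its `--un` family of options), which may be simpler in practice.

```python
import io

UNMAPPED = 0x4      # SAM FLAG bit: this read is unmapped
FIRST_MATE = 0x40   # SAM FLAG bit: first read of a pair
SECOND_MATE = 0x80  # SAM FLAG bit: second read of a pair


def unmapped_reads_to_fasta(sam_lines):
    """Yield one FASTA record (as a string) per unmapped read in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED and seq != "*":
            # Keep mates distinguishable in the output
            if flag & FIRST_MATE:
                name += "/1"
            elif flag & SECOND_MATE:
                name += "/2"
            yield f">{name}\n{seq}\n"


# Hypothetical three-read SAM file: one aligned read, two unaligned reads
example_sam = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]) + "\n"

records = list(unmapped_reads_to_fasta(io.StringIO(example_sam)))
print("".join(records), end="")
```

For a real run the same generator can be fed an open file handle instead of the in-memory example, so the 36 million reads are never held in memory at once.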



      • #4
        Well, given your approach (bowtie against a nucleotide database), your hits are likely to be very close matches. To see the next tier of taxonomic relatedness, you might try aligning your reads with BLAST (or a similar tool), doing translated searches against a comprehensive protein database. When you do this and examine the taxonomic assignments made by MEGAN, the hits are often significant yet still far from exact (much more so than with bowtie), which suggests the presence of potentially novel species.
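To make "translated search" concrete: the nucleotide read is translated in all six reading frames (three per strand) and each translation is compared against protein sequences. The sketch below shows only the translation step, using the standard genetic code; it is an illustration, not how BLAST is implemented, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Map frame labels (+1..+3, -1..-3) to the translated protein strings."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Because several codons encode the same amino acid, two reads can differ at the nucleotide level and still give similar translations, which is why hits found this way can be more distant than bowtie's.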



        • #5
          I'm sorry, Mark, if I'm wrong, since I'm new to metagenomics, but as far as I understand, if this were metatranscriptome data I should use tblastx and search against the nr database, right? My feeling is that since this is genomic data, matching by similarity against the nt database would serve the purpose.

          I also tried standalone BLAST, but I have a tremendous number of reads: 18 million paired-end Illumina reads, 36 million in total. BLAST ran for four days and was still running, so I had to stop it and switched to bowtie2. I am confident this is not a memory problem, since I am running on a cluster with more than 210 GB of RAM.

          I'm truly thankful for your replies.

          Best,
          Christopher



          • #6
            Hi Chris

            Actually, you would use blastx against a protein database. tblastx translates both the query and the subject and searches in protein space. That might also work, but it is even more computationally demanding than blastx.
            I think you probably do want to search in protein space: it is more sensitive, because amino acid sequence evolves more slowly than nucleotide sequence.

            Yes, running a tool like BLAST on that much NGS data is burdensome unless you have prolonged access to a large cluster. One alternative that would still let you search in protein space is RAPSearch2. It achieves 50-100X speedups over blastx with only a limited loss in sensitivity. Parallelizing its execution may give you the speed you need to get the job done.

            Mark
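One simple way to parallelize a search like the one suggested above is to split the reads into chunks and run one search job per chunk on the cluster. The sketch below covers only the splitting step; the function names are hypothetical, and no particular search tool's command line is assumed. It keeps everything in memory for clarity, so a version for 36 million reads would write each chunk to its own file as it goes.

```python
import io


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists so chunk sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return chunks


# Hypothetical five-read input, split into two chunks
example_fasta = ">r1\nACGT\n>r2\nGGCC\nTTAA\n>r3\nCCCC\n>r4\nAAAA\n>r5\nTTTT\n"
records = list(read_fasta(io.StringIO(example_fasta)))
chunks = split_round_robin(records, 2)
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: {[header for header, _ in chunk]}")
```

Each chunk can then be searched independently and the per-chunk results concatenated before import into MEGAN. If mate information matters downstream, both reads of a pair should be kept in the same chunk.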
