  • novel species discovery metagenomics

    Hello everybody,

    I have two environmental bacterial datasets sequenced on Illumina for metagenomics. I have already determined the taxonomic composition of the samples. Now I want to find out whether they contain any novel species. Can somebody please suggest some bioinformatics tools for that?

    Can you please help?

    I appreciate your help.

    Christopher

  • #2
    Can you be more specific about what type of data you have (16S?) and what you have already done?



    • #3
      Dear Mark,

      It's genomic data, and I've already done the taxonomic classification of the species present in the sample using MEGAN: I mapped the reads to the NCBI database of all bacterial sequences with Bowtie, imported the resulting SAM file into MEGAN, and got a nice tree view. But now I want to find novel species in my sequenced data, using the reads that did not align at all.
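Pulling out the reads that did not align is a filter on the SAM FLAG column: bit 0x4 marks an unmapped read. The sketch below assumes a plain-text SAM file (not BAM) and writes FASTA, since translated-search tools typically take FASTA as input; the function names are illustrative, not part of any library. `samtools view -f 4` selects the same records without custom code.

```python
from typing import Iterable, Iterator, Tuple

# SAM FLAG bits (see the SAM format specification)
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_FIRST_IN_PAIR = 0x40
FLAG_SECOND_IN_PAIR = 0x80

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for every unmapped record in a SAM stream."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # too few mandatory columns to be an alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if not flag & FLAG_UNMAPPED or seq == "*":
            continue  # mapped, or no sequence stored
        if flag & FLAG_REVERSE:
            # restore the original read orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
        # keep the two mates of a pair distinguishable in the output
        if flag & FLAG_FIRST_IN_PAIR:
            name += "/1"
        elif flag & FLAG_SECOND_IN_PAIR:
            name += "/2"
        yield name, seq


def write_fasta(records: Iterable[Tuple[str, str]], handle) -> int:
    """Write (name, sequence) records as FASTA and return how many were written."""
    n = 0
    for name, seq in records:
        handle.write(f">{name}\n{seq}\n")
        n += 1
    return n
```

Usage, with hypothetical file names: `with open("reads.sam") as sam, open("unmapped.fa", "w") as out: write_fasta(unmapped_reads(sam), out)`. The generator streams the file, so memory use stays flat even for tens of millions of reads.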



      • #4
        Well, given your approach (Bowtie against a nucleotide database), it seems likely that your hits will be very close matches. To see the next tier of taxonomic relatedness, you might try aligning your reads with BLAST (or another such tool) to do translated searches against a comprehensive protein database. Note that when you do this and examine the taxonomic assignments made by MEGAN, the hits identified are often significant yet still far from exact (much more so than when using Bowtie), implying the presence of potentially novel species.
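A translated search such as blastx compares each read's six conceptual translations (three reading frames on each strand) against the protein database. The sketch below shows only that translation step, using the standard genetic code (NCBI translation table 1); it illustrates the idea and is not how BLAST is implemented internally.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate DNA codon by codon; codons containing N or other ambiguity codes become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def six_frames(seq: str) -> list:
    """Return translations of frames +1, +2, +3, -1, -2, -3; trailing partial codons are dropped."""
    rc = reverse_complement(seq)
    return [translate(seq[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]
```

For a short read, `six_frames("ATGGCC")` starts with `"MA"` (frame +1); a search tool would score all six peptides against the database and keep the best hits.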



        • #5
          I'm sorry, Mark, if I'm wrong, since I'm new to metagenomics, but as far as I understand, if it's metatranscriptome data then I should use tblastx and search against the nr database, right? My feeling is that since this is genomic data, matching similarity against the nt database would serve the purpose.

          I also tried standalone BLAST, but I have a tremendous number of reads: 18 million paired-end Illumina read pairs, 36 million reads in total. BLAST ran for four days and was still running, so I had to stop it and opted for Bowtie 2 instead. I am confident that this is not a memory problem, since I am running it on a cluster with more than 210 GB of RAM.

          I'm truly thankful for your replies.

          Best,
          Christopher



          • #6
            Hi Chris

            Actually, you would use blastx against a protein database. tblastx is where both the query and the subject are translated and searched in protein space. This might also work, but it is even more computationally demanding than blastx.
            I think you probably do want to search in protein space, as it is more sensitive: amino acid sequences evolve more slowly than nucleotide sequences.
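The sensitivity argument can be made concrete with synonymous substitutions, which change the DNA but not the protein. In the toy example below (two made-up 9-nt sequences, not real data), the sequences share only 3 of 9 nucleotides yet both encode Leu-Ser-Gly, so a protein-level comparison still sees a perfect match.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate DNA codon by codon (frame +1 only); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match in two equal-length, already-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


seq_a = "CTGAGCGGT"  # CTG-AGC-GGT -> L S G
seq_b = "TTATCAGGA"  # TTA-TCA-GGA -> L S G (synonymous codons)

print(f"nucleotide identity: {percent_identity(seq_a, seq_b):.1f}%")  # 33.3%
print(f"protein identity: {percent_identity(translate(seq_a), translate(seq_b)):.1f}%")  # 100.0%
```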

            Yes, running a tool like BLAST on that much NGS data is burdensome unless you have prolonged access to a large cluster. One alternative that would still allow you to search in protein space is RAPSearch2. It achieves 50-100X speedups over blastx with only a limited loss in sensitivity. Parallelizing its execution may provide the speed you need to get the job done.
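Because each read is searched independently, one simple way to parallelize RAPSearch2 (or blastx) is to split the query FASTA into N parts, run one job per part, and concatenate the outputs. A minimal sketch, assuming the reads are in a FASTA file; the `prefix.partNNN.fa` naming scheme is hypothetical, and the search command itself is left out because it depends on your RAPSearch2 version and cluster scheduler.

```python
from contextlib import ExitStack
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def assign_chunks(records: Iterable[Tuple[str, str]], n_chunks: int) -> Iterator[Tuple[int, str, str]]:
    """Assign records round-robin so every chunk gets a similar number of reads."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    for i, (header, seq) in enumerate(records):
        yield i % n_chunks, header, seq


def split_fasta_file(path: str, n_chunks: int, prefix: str) -> List[int]:
    """Stream `path` into n_chunks files named prefix.partNNN.fa; return reads written per file."""
    counts = [0] * n_chunks
    with ExitStack() as stack:
        src = stack.enter_context(open(path))
        outs = [stack.enter_context(open(f"{prefix}.part{k:03d}.fa", "w")) for k in range(n_chunks)]
        for k, header, seq in assign_chunks(parse_fasta(src), n_chunks):
            outs[k].write(f">{header}\n{seq}\n")
            counts[k] += 1
    return counts
```

Round-robin assignment keeps the parts balanced without first counting the reads, and streaming means the 36 million reads never need to sit in memory at once.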

            Mark
