  • Metatranscriptomics and residual human sequences

    Hi all,

    I'll be working on a large, publicly available metatranscriptomics (meta-RNAseq) dataset (human host), and I'm not too familiar with the experimental protocol that was followed to generate it (e.g. how rRNA depletion was carried out).

    I was wondering if someone with metatranscriptomics analysis experience could comment on the degree of residual human sequence (mRNA) they found in the raw reads after sequencing. I realize that, depending on the experimental procedures/kits used, the raw reads can contain >80% rRNA/tRNA from both the host and the associated microbiota, but when the host is human I'm not sure how much human mRNA would or could remain. Can it effectively be ~0? I'd appreciate some comments.

    I scanned the literature, and people almost always use a custom (mapping) database whose reference sequences come from the organisms of interest (excluding the host), which in a way eliminates the need for host-specific pre-filtering of raw/QC'd reads. So it is not clear to me what would happen if the reference database contained host-specific genes.

    Cheers

  • #2
    I've worked with many different metagenomic and metatranscriptomic data sets with varying degrees of host contamination: anywhere from 10% to 99.9% of the reads were host sequence. If you are looking for sequences from organisms with sequenced genomes, you can align/sort the sequence data to the reference genomes. If you are looking for novel sequences, then it is essential to remove the host sequence data and then de novo assemble the remaining sequences. DNASTAR's SeqMan NGen provides a metagenomic workflow, with both a fully templated and a de novo approach, for removing host sequences from a metagenomic/metatranscriptomic sample.


    • #3
      Hi, thank you for your response. So, in your experience, host-specific filtering is needed in metatranscriptomics. Could you perhaps provide any pointers to a free and open source package that is capable of identifying (eukaryote) host contamination in a __metatranscriptomics__ dataset? My concern is that a large portion of eukaryotic mRNAs are produced by splicing, and therefore the (contamination) filtering package should take this property of the contaminant mRNA into account when mapping to the host genome. Any input is appreciated.

      PS: I will not be following a workflow that includes assembly.
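To make the splicing concern concrete: when a splice-aware aligner maps an mRNA read across an exon junction, the SAM record's CIGAR string contains an `N` operation (a skipped reference region, i.e. the intron). The sketch below is only an illustration, assuming the reads were already aligned to the human genome by some splice-aware aligner and written as SAM text. The function name `summarize_host_alignments` and the example records are hypothetical; the FLAG bits and the meaning of `N` come from the SAM specification.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_host_alignments(sam_lines):
    """Count primary records, host-mapped records, and spliced host alignments.

    Hypothetical helper: `sam_lines` is any iterable of SAM text lines
    produced by aligning the reads against the host genome.
    """
    total = mapped = spliced = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        cigar = fields[5]
        # Ignore secondary/supplementary records so each read end counts once.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        # An 'N' operation marks a skipped reference region (an intron).
        if "N" in cigar:
            spliced += 1
    return {"total": total, "mapped": mapped, "spliced": spliced}


# Made-up records, for illustration only.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t30M500N20M\t*\t0\t0\t*\t*",  # spliced host read
    "r2\t0\tchr1\t900\t60\t50M\t*\t0\t0\t*\t*",         # unspliced host read
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                 # did not map to host
    "r1\t256\tchr2\t100\t0\t50M\t*\t0\t0\t*\t*",        # secondary, ignored
]
summary = summarize_host_alignments(example)
host_fraction = summary["mapped"] / summary["total"]
```

A read like `r1` here would probably be missed, or only partially aligned, by an aligner that cannot open intron-sized gaps against the genome, which is why a splice-aware aligner matters for host filtering of RNA data.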


      • #4
        I doubt there is an open source package that would do this, because you need three very different tools: 1) an aligner for the transcriptome sequence data that can efficiently handle reads that map to adjacent exons; 2) a metagenomic sorter that can align reads to a database of similar reference sequences; 3) a de novo assembler to identify novel regions. I think TopHat would handle the first, and there are several de novo assemblers available, like Velvet, but I do not know what open source metagenomic aligners are out there. In any event, learning multiple open source tools will take considerable time and effort.
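For a workflow without assembly, the first step above (splice-aware alignment to the host) would typically be followed by discarding the reads that mapped. A minimal sketch of that filter is shown here, assuming the IDs of host-mapped reads have already been collected from the aligner output into a set, and assuming plain 4-line FASTQ records (no wrapped sequence lines). The name `filter_fastq` and the sample reads are hypothetical, not part of any tool mentioned in this thread.

```python
def filter_fastq(fastq_lines, host_ids):
    """Yield 4-line FASTQ records whose read ID is not in `host_ids`.

    Assumes each record is exactly four lines (header, sequence, '+', quality).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like '@<id> <optional description>'.
            read_id = record[0][1:].split()[0]
            # Drop an old-style mate suffix so IDs match SAM QNAMEs.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            if read_id not in host_ids:
                yield record
            record = []


# Made-up reads, for illustration only.
fastq = [
    "@r1/1", "ACGT", "+", "IIII",
    "@r2 1:N:0:1", "GGCC", "+", "IIII",
    "@r3", "TTAA", "+", "IIII",
]
kept = list(filter_fastq(fastq, host_ids={"r1"}))
kept_headers = [rec[0] for rec in kept]
```

The remaining records can then be mapped against the microbial reference database, so that host transcripts do not compete for alignments there.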
