**mchizar** · 02-06-2014, 01:53 PM

I've worked with many different metagenomic and metatranscriptomic data sets with varying degrees of host contamination: anywhere from 10% to 99.9% of the reads were host sequence. If you are looking for sequences from organisms with sequenced genomes, you can align/sort the sequence data to the reference genomes. If you are looking for novel sequences, then it is essential to remove the host sequence data and then de novo assemble the remaining sequences. DNASTAR's SeqMan NGen provides a metagenomic workflow with both a fully templated and a de novo approach for removing host sequences from a metagenomic/metatranscriptomic sample.

**harlequin** · 02-07-2014, 01:16 AM

Hi, thank you for your response. So in your experience, host-specific filtering is needed in metatranscriptomics... Could you perhaps provide any pointers to a free and open source package that is capable of identifying (eukaryote) host contamination in a __metatranscriptomics__ dataset? My concern is that a large portion of eukaryotic mRNA is produced by splicing, and therefore the (contamination) filtering package should be able to take this property of the contaminant mRNA into account when mapping to the host genome. Any input is appreciated.

PS: I will not be following a workflow that includes assembly.

**mchizar** · 02-07-2014, 09:01 AM

I doubt there is an open source package that would do this, because you need three very different tools: 1) an aligner for the transcriptome seq data that can efficiently handle reads that span adjacent exons; 2) a metagenomic sorter that can align reads to a database of similar reference sequences; 3) a de novo assembler to identify novel regions. I think TopHat would handle the first step, and there are several de novo assemblers available, like Velvet, but I do not know what open source metagenomic aligners are out there. In any event, learning multiple open source tools will take considerable time and effort.
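
For the host-removal step on its own (no assembly), a minimal sketch might look like the following. It assumes the reads have already been aligned to the host genome with a splice-aware aligner such as TopHat, that the alignments are available as SAM text, and that read names in the SAM and FASTQ files match exactly. The function names `host_read_names` and `filter_fastq` are hypothetical, chosen only for illustration. The only format detail relied on is that bit `0x4` of the SAM FLAG field marks an unmapped read.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def host_read_names(sam_lines):
    """Collect names of reads that have at least one mapped alignment."""
    mapped = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & UNMAPPED_FLAG:
            mapped.add(name)
    return mapped


def filter_fastq(fastq_lines, host_names):
    """Yield 4-line FASTQ records whose read name is not in host_names."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header is '@name optional description'; keep only the name.
            name = record[0][1:].split()[0]
            if name not in host_names:
                yield record
            record = []


# Toy data: r1 aligns across an intron (the 'N' in the CIGAR), r2 is unmapped.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "TTGA", "+", "IIII"]

host = host_read_names(sam)
kept = list(filter_fastq(fastq, host))
```

With paired-end data, both mates share a read name in SAM, so a pair is dropped if either mate maps to the host; whether that is the desired behaviour depends on how conservative the filtering should be.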

Metatranscriptomics and residual human sequences