**Kennels** · 10-09-2013, 02:47 PM

You may have assembled chimeric transcripts by pooling all the reads from different source organisms. It's similar to a metagenomic assembly, so it may be worth reading some papers on how to handle that sort of data.

I would set up a 'contaminant' database containing all your non-target species, use a short-read mapper (Bowtie, BWA) to filter out reads that align to it (i.e., keep only the reads that did not align to the contaminant database), and rerun Trinity on those unaligned reads only.
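A minimal sketch of the "keep only the reads that didn't align" step, assuming single-end reads, a SAM file from mapping those reads against the contaminant database, and an uncompressed FASTQ of the original reads. It uses SAM FLAG bit 0x4 ("segment unmapped") to decide which reads aligned. The function names are hypothetical, and it assumes the read names in the SAM match the first word of each FASTQ header. In practice Bowtie2's `--un` option can write unaligned reads to a file directly, which may be simpler.

```python
from typing import Iterable, Iterator, Set, Tuple

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def aligned_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Names of reads with at least one alignment (FLAG bit 0x4 not set)."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & FLAG_UNMAPPED:
            names.add(qname)
    return names


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield 4-line FASTQ records: (header, sequence, '+' line, qualities)."""
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        yield header, next(it), next(it), next(it)


def filter_unaligned(fastq_lines: Iterable[str], aligned: Set[str]) -> Iterator[str]:
    """Keep only FASTQ records whose read name is NOT in `aligned`."""
    for header, seq, plus, qual in read_fastq(fastq_lines):
        name = header[1:].split()[0]  # drop '@' and any description
        if name not in aligned:
            yield f"{header}\n{seq}\n{plus}\n{qual}\n"


# Hypothetical toy input: read1 hit the contaminant database, read2 did not.
sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tcontam_chr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]
fastq = ["@read1 extra\n", "ACGT\n", "+\n", "IIII\n",
         "@read2\n", "TTTT\n", "+\n", "IIII\n"]
kept = list(filter_unaligned(fastq, aligned_read_names(sam)))
print("".join(kept), end="")  # prints only the read2 record
```

The surviving records are what you would feed back into Trinity. For paired-end data you would also need to decide whether to drop a pair when only one mate aligns.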

Otherwise, as you mention, you could make a BLAST database of your non-target sequences, align your current assembly against it, and keep only the contigs that did not align. I'm not aware of any pipeline that automates this. You'll need to collect all the component IDs that did align to the non-target database and then remove them from your original FASTA file. This is best done on the command line, using bash or Perl.
The problem with this method is that some of your target sequences might also align to the non-target database, so you'll need to decide on some thresholds (for example, a minimum percent identity or a maximum e-value).
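The same ID-subtraction could be sketched in Python instead of bash/Perl. This assumes the assembly was searched against the non-target database with BLAST tabular output (`-outfmt 6`, default 12 columns: column 1 is the query ID, column 3 the percent identity, column 11 the e-value). The `min_pident` and `max_evalue` cutoffs are placeholder thresholds you would have to tune, and the function names and toy IDs are hypothetical.

```python
from typing import Iterable, Iterator, Set


def contaminant_ids(blast_tab: Iterable[str], min_pident: float = 90.0,
                    max_evalue: float = 1e-10) -> Set[str]:
    """Query IDs with at least one hit passing BOTH thresholds.

    Assumes BLAST tabular output (-outfmt 6) with the default 12 columns:
    column 1 = query ID, column 3 = percent identity, column 11 = e-value.
    """
    hits = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):  # skip blanks/comments
            continue
        cols = line.rstrip("\n").split("\t")
        qid, pident, evalue = cols[0], float(cols[2]), float(cols[10])
        if pident >= min_pident and evalue <= max_evalue:
            hits.add(qid)
    return hits


def drop_sequences(fasta_lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `drop`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]  # ID = first word after '>'
            keep = seq_id not in drop
        if keep:
            yield line


# Hypothetical toy data: comp1 has a strong bacterial hit, comp2 only a weak one.
blast = [
    "comp1_c0_seq1\tbact_42\t97.5\t300\t7\t0\t1\t300\t10\t309\t1e-150\t520\n",
    "comp2_c0_seq1\tyeast_7\t72.0\t80\t22\t1\t5\t84\t1\t80\t0.003\t40\n",
]
fasta = [">comp1_c0_seq1 len=300\n", "ACGT\n",
         ">comp2_c0_seq1 len=120\n", "GGCC\n"]
bad = contaminant_ids(blast)
print(bad)  # {'comp1_c0_seq1'}
print("".join(drop_sequences(fasta, bad)), end="")  # only the comp2 record
```

Requiring both a high identity and a low e-value is one way to reduce the chance of discarding genuine target transcripts that share short conserved regions with non-target sequences; where exactly to set those numbers is the judgment call mentioned above.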

**jbono** · 10-09-2013, 03:14 PM

Thanks for the quick response!
I think the difficulty is that I have no idea what the non-target organisms are, so I don't think I could easily set up a database (it could be anything living in rotting cactus). I assume this is a typical issue with de novo assemblies, but I haven't been able to find much information on how people deal with it, though I'm still looking. My main goal is to look at differential expression, but I was hoping to create a transcriptome that is mostly free of contaminants before mapping reads back to it.

**Kennels** · 10-09-2013, 03:25 PM

How close is your target species to D. melanogaster? Could you instead align your reads to it with relaxed parameters and use the reads that aligned for a de novo assembly?

It could also be that the contamination is minimal. You could pick a few likely non-target organisms, see what percentage of reads map to each, and decide whether that level is acceptable. Also, if your contaminant sequences are quite different from your Drosophila sequences (bacteria vs. plant vs. fly), the assembler can still do a good job of separating and assembling them. In other words, you might not need to worry about it much.
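A rough sketch of the "what % of reads mapped to each" check, assuming the same reads were mapped separately to each candidate organism's reference, giving one SAM file per organism. It counts only primary records (skipping secondary 0x100 and supplementary 0x800 alignments) so each read contributes once; paired-end mates are counted individually. `samtools flagstat` reports similar mapped percentages. The organism names, the toy data, and the 5% "acceptable" cutoff are hypothetical.

```python
from typing import Dict, Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def percent_mapped(sam_lines: Iterable[str]) -> float:
    """Percentage of primary SAM records that are mapped (0.0 if none)."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank/header lines
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


def report(per_organism: Dict[str, Iterable[str]],
           acceptable: float = 5.0) -> Dict[str, float]:
    """Print % mapped per candidate organism; mark those above `acceptable`."""
    result = {}
    for organism, sam_lines in per_organism.items():
        pct = percent_mapped(sam_lines)
        result[organism] = pct
        status = "check" if pct > acceptable else "ok"
        print(f"{organism}\t{pct:.2f}%\t{status}")
    return result


# Hypothetical toy data: 1 of 4 reads maps to the "yeast" reference.
yeast_sam = [
    "r1\t0\tchrI\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
report({"yeast": yeast_sam})  # prints "yeast", "25.00%", "check" (tab-separated)
```

A low percentage across all plausible candidates would support the idea that the contamination is minor enough to leave to the assembler.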

Filtering out transcripts from non target organism