Hi,
I am assembling a transcriptome for a Drosophila species that has no reference genome (it diverged from its closest relative with a sequenced genome about 15 mya). I used Trinity for the assembly, which produced over 65K components (which I assume are roughly equivalent to genes). I suspect that many of the sequences come from non-target species (e.g. bacteria, yeasts, cactus), since the larvae were collected directly from their food source. Is there an easy way to identify and remove the bulk of the transcripts that come from non-target species (e.g. using BLAST or something else)? All Trinity transcripts are currently in FASTA format. I'm not particularly savvy with bioinformatics, so is there an easy pipeline I could use? Thanks!
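A minimal sketch of the BLAST-based approach, under several assumptions. It assumes you have already run BLAST+ on the Trinity FASTA against a broad database with tabular output limited to three columns, for example `blastn -query Trinity.fasta -db nt -evalue 1e-5 -outfmt "6 qseqid bitscore sscinames" -out hits.tsv`. The `sscinames` column only gets filled in if the NCBI taxonomy files (taxdb) are available to BLAST. The script keeps a transcript if its highest-bitscore hit has a scientific name starting with "Drosophila". It also keeps transcripts with no hit at all, because genuine fly genes that evolve quickly may find nothing. The names `filter_transcripts`, `target_prefix` and `keep_no_hit` are hypothetical, chosen for illustration. If your closest relatives are listed under other genus names, change `target_prefix`.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# Example BLAST+ run this sketch assumes (adjust database and e-value to taste):
#   blastn -query Trinity.fasta -db nt -evalue 1e-5 \
#          -outfmt "6 qseqid bitscore sscinames" -out hits.tsv
# sscinames is only populated when the NCBI taxdb files are on the BLASTDB path.


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def best_hit_names(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the scientific name of its highest-bitscore hit.

    Expects tab-separated columns in the order: qseqid, bitscore, sscinames.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, bitscore, sciname = line.rstrip("\n").split("\t")[:3]
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, sciname)
    return {query: name for query, (_, name) in best.items()}


def filter_transcripts(
    fasta_lines: Iterable[str],
    blast_lines: Iterable[str],
    target_prefix: str = "Drosophila",  # hypothetical default; adjust to your taxa
    keep_no_hit: bool = True,           # keep transcripts BLAST found nothing for
) -> Tuple[List[Tuple[str, str]], int]:
    """Return (kept records, number dropped) based on each transcript's best hit."""
    names = best_hit_names(blast_lines)
    kept: List[Tuple[str, str]] = []
    dropped = 0
    for header, seq in read_fasta(fasta_lines):
        # BLAST's qseqid is the first whitespace-delimited token of the header,
        # e.g. "TRINITY_DN1000_c0_g1_i1" from "TRINITY_DN1000_c0_g1_i1 len=300 ..."
        transcript_id = header.split()[0]
        name = names.get(transcript_id)
        keep = keep_no_hit if name is None else name.startswith(target_prefix)
        if keep:
            kept.append((header, seq))
        else:
            dropped += 1
    return kept, dropped


def main(argv: List[str]) -> None:
    fasta_path, blast_path, out_path = argv[1:4]
    with open(fasta_path) as fasta, open(blast_path) as blast:
        kept, dropped = filter_transcripts(fasta, blast)
    with open(out_path, "w") as out:
        for header, seq in kept:
            out.write(f">{header}\n{seq}\n")
    print(f"kept {len(kept)} transcripts, dropped {dropped}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

Usage, with hypothetical file names: `python filter_contaminants.py Trinity.fasta hits.tsv Trinity.filtered.fasta`. The "best hit" rule is deliberately crude. A transcript whose top hit is a yeast or bacterial sequence is dropped even when a fly hit scores nearly as well. It is worth spot-checking the dropped transcripts before trusting the result.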