
  • Filtering out transcripts from non target organism

    Hi,

    I am assembling a transcriptome for a Drosophila species without a reference genome (my species diverged about 15 mya from the most closely related species with a sequenced genome). I used Trinity for the assembly, which constructed over 65K components (which I assume are roughly equivalent to genes). I'm guessing that a lot of the sequences are from non-target species (e.g. bacteria, yeasts, cactus), as larvae were taken directly from their food source. Is there an easy way to identify and get rid of the bulk of the transcripts that come from non-target species (e.g. using BLAST or something else)? All Trinity transcripts are currently in FASTA format. I'm not particularly savvy with bioinformatics, so I'm not sure whether there is an easy pipeline I could use. Thanks!

  • #2
    You might have assembled chimeric transcripts by using all the reads from different sources. This is somewhat like a metagenomic assembly, so you might want to read some papers on handling this sort of data.

    I would set up a 'contaminant' database containing all your non-target species, use a short-read mapper (bowtie, bwa) to align your reads against it, keep only the reads that did not align, and rerun Trinity on those.
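
As a rough illustration of the read-filtering step, here is a minimal Python sketch that pulls the unaligned reads out of the mapper's SAM output and writes them back out as FASTQ. It assumes single-end reads and plain-text SAM; the function name `unmapped_reads_to_fastq` is made up for this example. It relies only on the SAM convention that flag bit 0x4 marks an unmapped read. Some mappers can also write unaligned reads directly (bowtie2 has a `--un` option), which may be simpler in practice.

```python
import io


def unmapped_reads_to_fastq(sam_lines, out):
    """Write reads with the SAM 'unmapped' flag (0x4) to `out` as FASTQ.

    Assumes single-end reads; paired-end data would need both mates
    handled together. Returns the number of reads written.
    """
    kept = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x4:  # read did not align to the contaminant database
            out.write(f"@{name}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory example: one unmapped read, one mapped read.
    example = [
        "@HD\tVN:1.6\n",
        "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
        "read2\t0\tcontam1\t10\t42\t4M\t*\t0\t0\tTTTT\tIIII\n",
    ]
    buffer = io.StringIO()
    print(unmapped_reads_to_fastq(example, buffer), "read(s) kept")
    print(buffer.getvalue(), end="")
```

With real data you would pass an open SAM file handle as `sam_lines` and an open output file as `out`, then give the resulting FASTQ to Trinity.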

    Otherwise, as you mention, you could make a BLAST database of your non-target sequences, align your current assembly to it, and keep only the transcripts that did not align. I'm not aware of any pipeline that automates this. You'll need to collect the component IDs that did align to the non-target database and remove them from your original FASTA file. This is best done on the command line, using bash or perl.
    The problem with this method is that some of your target sequences might also align to the non-target database, so you'll need to decide on some thresholds.
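
The subtraction step can be sketched in a few lines of Python instead of bash or perl. This assumes BLAST was run with tabular output (`-outfmt 6`, whose default columns start with query ID, subject ID, percent identity, alignment length). The function names and the two thresholds (`min_identity`, `min_length`) are placeholders for illustration, not recommended values. Note that it drops individual transcripts by the first word of the FASTA header, so removing whole components would need an extra step.

```python
def contaminant_ids(blast_lines, min_identity=90.0, min_length=200):
    """Collect query IDs with at least one hit passing both thresholds.

    Expects BLAST tabular (-outfmt 6) lines with the default columns:
    qseqid, sseqid, pident, length, ...
    The threshold defaults are arbitrary placeholders.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, pident, length = cols[0], float(cols[2]), int(cols[3])
        if pident >= min_identity and length >= min_length:
            hits.add(qseqid)
    return hits


def filter_fasta(fasta_lines, drop_ids):
    """Yield FASTA lines, skipping records whose ID is in `drop_ids`.

    The record ID is taken as the first whitespace-delimited word
    after '>' in the header line.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = seq_id not in drop_ids
        if keep:
            yield line


if __name__ == "__main__":
    blast = [
        "tx1\tbacterium\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-100\t500\n",
        "tx2\tbacterium\t75.0\t50\t12\t0\t1\t50\t1\t50\t1e-3\t40\n",
    ]
    fasta = [">tx1 len=300\n", "ACGT\n", ">tx2 len=50\n", "GGCC\n"]
    to_drop = contaminant_ids(blast)
    print("".join(filter_fasta(fasta, to_drop)), end="")
```

Tightening or loosening the two thresholds is where the trade-off mentioned above shows up: loose values remove more contaminants but risk discarding conserved fly genes that also resemble sequences in the non-target database.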


    • #3
      Thanks for the quick response!
      I think the difficulty is that I have no idea what the non-target organisms are, so I don't think I could easily set up a database (it could be anything in rotting cactus). I assume this is a typical issue with de novo assemblies, but I haven't been able to find much information on how people deal with it, though I am continuing to look. My main goal is to look at differential expression, but I was hoping to create a transcriptome that is mostly free of contaminants before mapping reads back to it.


      • #4
        How close is your target species to D. mel? Could you alternatively align your reads to the D. mel genome with relaxed parameters, and use the reads that aligned for a de novo assembly?

        It could also be that the contamination is minimal. You could pick a few possible non-target organisms, see what % of reads map to each, and decide whether this is an acceptable level. Plus, if your contaminant sequences are quite different to your Drosophila (bacteria vs plant vs fly), the assembler can still do a good job of distinguishing and assembling the sequences. I.e. you might not even need to worry about it too much.
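
To put a number on the "% of reads mapped" check, a small Python sketch like the one below could be run on the SAM output from each candidate organism. It assumes plain-text SAM and counts each primary record once (secondary and supplementary alignments, flag bits 0x100 and 0x800, are skipped so multi-mapping reads are not double counted). The function name `mapped_fraction` is made up for this example.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary read records that aligned.

    Skips header lines and secondary/supplementary alignments
    (flag bits 0x100 and 0x800). A record counts as mapped when the
    unmapped bit (0x4) is not set. Returns 0.0 for empty input.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@SQ\tSN:candidate\tLN:100\n",
        "r1\t0\tcandidate\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(f"{100 * mapped_fraction(example):.1f}% of reads mapped")
```

Running this once per candidate organism gives a rough per-organism contamination estimate. What counts as an acceptable level is a judgment call that depends on the downstream analysis.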
