  • filtering out reads from abundant transcripts before using velvet

    Hi everybody,
    My situation is as follows: we ran a whole Solexa flow cell of RNA-seq on a non-model organism (so there is no reference genome) and got 225 million paired-end reads (about 112 million reads in each direction). Additionally, we did 454 sequencing (also mRNA) of the same organism and got ~8000 contigs from that (many of these are very highly abundant in the tissue I am looking at). I should also say that a substantial number of reads match ribosomal RNA (you cannot get rid of these in the experimental protocol; it is a problem with this organism).
    From this I want to assemble the transcriptome using velvet (or abyss).
    As discussed here, this many reads are difficult to handle in velvet (hashing the reads alone takes forever, even with 16 CPUs). The other problem is the very non-random coverage of the transcriptome (also discussed here), which in my case is pretty extreme (the tissue we look at is highly specialized). So I was thinking of the following strategy: align all the short reads to the ribosomal RNA and the abundant transcripts from the 454 sequences with bowtie, and then filter out these reads before I start using velvet. This way I would reduce complexity for velvet and at the same time get more even coverage (yes, I would not assemble or correct the abundant transcripts I already have, but that is not my focus right now). So far so good, but my problem is that I don't know exactly how to do that. Bowtie gives me a SAM file, which I would now have to compare with the Solexa reads to take out the ones that gave a match in bowtie. Since I want to use paired ends to resolve errors in velvet, I need to keep the right order in the file for velvet (I used the shuffleSequences script that comes with the velvet package), and if a read aligns in bowtie I have to filter out both that read and its mate. Does anybody know of a script of some kind, or can anyone give me advice on this? I found a post in this forum describing the filtering of ribosomal reads with the grep -v command, but I am not sure this will work with this amount of data.
    I would appreciate any help you could give me with this.
    Thanks for reading this long post.
    Marco
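
    The first half of the strategy described above (finding out which reads bowtie aligned) could be sketched roughly as follows in Python. This is only a minimal sketch under assumptions: it relies on the SAM convention that bit 0x4 of the FLAG column marks an unmapped read, and it assumes that mates share a name apart from a trailing /1 or /2. The function names and the demo records are hypothetical.

```python
import io

def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so both mates share one key (assumed naming scheme)."""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def aligned_read_names(sam_lines):
    """Collect the base names of all reads that have a reported alignment."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:               # blank or malformed line
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # bit 0x4 set: read is unmapped
            continue
        names.add(strip_mate_suffix(fields[0]))
    return names

# Tiny made-up SAM content, for illustration only.
demo_sam = io.StringIO(
    "@HD\tVN:1.0\tSO:unsorted\n"
    "readA/1\t0\trRNA_18S\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n"
    "readB/1\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n"
    "readC/2\t16\tcontig_42\t7\t255\t4M\t*\t0\t0\tGGCC\tIIII\n"
)
unwanted = aligned_read_names(demo_sam)
print(sorted(unwanted))
```

    Only the names of aligned reads are kept in memory, not the full read set, but with this many reads the set may still be large, so it is worth trying on a subset first.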

  • #2
    Hello,

    I've never done any of this, but I would probably start by checking this beta software: http://www.ebi.ac.uk/~zerbino/oases/

    Greetings,
    Leonardo
    L. Collado Torres, Ph.D. student in Biostatistics.


    • #3
      still too many reads

      Hi Leonardo,
      Thanks for the link. This is definitely a good idea. The only problem is that the program needs a precomputed file from velvet, and I cannot run velvet with that many reads. I need to take out the reads that map to abundant transcripts. An alternative I heard about is to first cluster the reads with something like UClust, but that seems less straightforward to me.
      I think that to take the reads out I need a simple Perl script that compares the bowtie output with the Solexa reads and keeps only those reads that are not in the bowtie output.

      All the best,
      Marco
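
      The kind of script described above might look something like this, written here in Python rather than Perl. It is a hedged sketch, not a tested pipeline: it assumes the reads are in an interleaved FASTA file (read, mate, read, mate, ...), that mates differ only by a trailing /1 or /2, and that the set of unwanted names has already been collected from the bowtie output. All function names and the sample data are hypothetical.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA file handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def base_name(header):
    """First word of the header without a trailing /1 or /2 (assumed naming)."""
    parts = header.split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name

def filter_pairs(records, unwanted):
    """Yield pairs in their original order, dropping a pair if either mate is unwanted."""
    it = iter(records)
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("odd number of records: file is not interleaved pairs")
        if base_name(first[0]) in unwanted or base_name(second[0]) in unwanted:
            continue
        yield first, second

# Made-up interleaved reads, for illustration only.
interleaved = io.StringIO(
    ">r1/1\nAAAA\n>r1/2\nCCCC\n>r2/1\nGGGG\n>r2/2\nTTTT\n"
)
kept = list(filter_pairs(read_fasta(interleaved), unwanted={"r1"}))
for first, second in kept:
    for header, seq in (first, second):
        print(">" + header)
        print(seq)
```

      Because the file is streamed pair by pair, the output keeps the interleaved order that velvet expects, and only the set of unwanted names has to fit in memory.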
