  • How to assemble transcripts from 454 reads?

    Hello! I downloaded 454 single-end reads (~230 nt) of a plant from NCBI, but I don't know how to assemble them into transcripts by mapping them to the genome, similar to the method that assembles Illumina paired-end reads using TopHat/Cufflinks. How can I do this? Which tools can I use?
    Thanks!

  • #2
    Depends on the format that you downloaded the reads in. If they are SFF, then get Newbler from Roche (it is free to researchers) and use gsMapper. If you have them in FASTA and are already familiar with TopHat/Cufflinks, then just use that.

    • #3
      Hi, Jeremy! Thank you very much for your reply!
      The reads I downloaded come from the NCBI SRA (Short Read Archive), so they should be in FASTQ format, but their lengths vary: some are longer and some are shorter than 230 nt.

      At first, I tried to assemble the transcripts using TopHat/Cufflinks, but it failed, which may be due to the type of reads! TopHat expects standard FASTQ: "TopHat was designed to work with reads produced by the Illumina Genome Analyzer, although users have been successful in using TopHat with reads from other technologies"!


      Next, I converted the 454 FASTQ reads to FASTA with an in-house script, aligned them directly to the reference genome with BLAT, and joined the related aligned hits into transcripts using PASA. Unfortunately, most of the "transcripts" are too short and too fragmented!
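The in-house conversion script is not shown in the thread. As a rough illustration only, a FASTQ-to-FASTA step could look like the sketch below. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names `read_fastq` and `to_fasta` are made up for this example.

```python
def read_fastq(lines):
    """Yield (read_id, sequence) from 4-line FASTQ records.

    Assumes each record is exactly: @header, sequence, '+' line, qualities.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: %r" % header)
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: %r" % header)
        yield header[1:], seq.rstrip("\n")


def to_fasta(read_id, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + read_id + "\n" + "\n".join(chunks) + "\n"


if __name__ == "__main__":
    example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    for read_id, seq in read_fastq(example):
        print(to_fasta(read_id, seq), end="")
```

Note that this conversion throws away the quality values, so any quality-based trimming has to happen before it.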

      So, today, I am trying this: first, de novo assemble the 454 reads into contigs using TGICL (similar to Newbler), and then map/BLAT the contigs to the genome to get exon-intron transcripts. Wish me luck!

      gsMapper maps reads to a transcriptome or genome reference. Sorry, I am not familiar with it, but I guess it is similar to BLAT or TopHat: just an alignment tool with no assembly function, right?

      So, I guess there should be a tool that assembles 454 single-end reads (FASTQ/FASTA, etc.) into transcripts based on the genome sequence, which I think would be much more accurate than de novo assembly.

      I want to do the same work in different ways and then pick the best method!
      Looking forward to your reply! Thanks~
      Last edited by ZHONG Xiao; 11-05-2012, 06:37 AM.



      • #4
        Hi Xiao,
        I have processed a lot of 454 datasets (mostly fetched from the NCBI Short Read Archive). My general recommendation is: clean up the reads before throwing them into any assembler. The assemblers won't do anything magic on your behalf. Crap in, crap out.

        Second, since you mention transcriptome sequencing (of plants, of course), I fear the adapters used for sample preparation were from Evrogen/Clontech, which offer molecular methods for cDNA first-strand synthesis, directional or formerly non-directional cloning, and eventually normalization. These datasets have completely different types of issues compared to those made according to Roche protocols. If this procedure was used in the lab, then I am quite certain you will end up with chimeric assemblies. Look up the sequences of the MINT/SMART adapters elsewhere and trim the raw reads.
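To make the trimming step concrete, here is a minimal sketch of clipping an adapter from both ends of a read. It is not the pipeline used by the poster. The adapter in the usage note is a made-up placeholder, not a real MINT/SMART sequence, and the function names are hypothetical. Matching uses simple mismatch counting, so it will miss adapters containing insertions or deletions, which are common in 454 data; a real pipeline would need gapped alignment.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime(read, adapter, max_mismatches=2):
    """Remove the adapter if the read starts with it (within max_mismatches)."""
    adapter = adapter.upper()
    n = len(adapter)
    if len(read) >= n and hamming(read[:n].upper(), adapter) <= max_mismatches:
        return read[n:]
    return read


def trim_3prime(read, adapter, max_mismatches=2):
    """Cut the read at the first full-length adapter match, dropping the rest."""
    adapter = adapter.upper()
    n = len(adapter)
    upper = read.upper()
    for i in range(len(read) - n + 1):
        if hamming(upper[i:i + n], adapter) <= max_mismatches:
            return read[:i]
    return read
```

A typical call would strip the adapter from the 5' end and its reverse complement from the 3' end, for example `trim_3prime(trim_5prime(read, ADAPTER), revcomp(ADAPTER))`, where `ADAPTER` stands for whatever sequence was actually used in the library preparation.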

        Also, extract the full raw reads from the .sra files and process them through the trimming pipeline. Don't presume the sequence in the "high-qual" region is free of adapters.

        Finally, some people deposited FASTA/Q files into the NCBI SRA that had already been trimmed in some way. If you extract the sequences from the .sra files, you will end up with sequences in all uppercase letters, giving you the impression they are cleaned up. No. You don't even have to look at the quality values in the FASTQ to learn where a low-qual region is. We are talking here about adapters, and sadly, due to lack of appropriate software and knowledge, they often do remain in the "high-qual" region. So do not get fooled into thinking that an all-uppercase sequence is already cleaned up, and (re)do the work yourself. Even worse, realizing what is left uncorrected in a dataset badly processed by somebody else is not an easy task. I hit some cases like that, and the unavailability of the original, "unprocessed" data is quite unpleasant.
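One quick way to test whether a supposedly clean dataset still carries adapters is to count the reads that contain a short seed taken from the adapter, on either strand. The sketch below is only an illustration of that check. The function name `count_adapter_hits` and the `seed_len` parameter are assumptions for this example, the search is exact-match only, and the adapter sequence has to be supplied by the user.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_adapter_hits(reads, adapter, seed_len=10):
    """Count reads containing the first `seed_len` bases of the adapter.

    Matching is case-insensitive and exact, so it gives a lower bound:
    adapters with sequencing errors inside the seed are not counted.
    """
    seed = adapter[:seed_len].upper()
    seed_rc = revcomp(seed)
    counts = {"total": 0, "forward": 0, "reverse": 0}
    for read in reads:
        upper = read.upper()
        counts["total"] += 1
        if seed in upper:
            counts["forward"] += 1
        if seed_rc in upper:
            counts["reverse"] += 1
    return counts
```

If a noticeable fraction of the all-uppercase reads still produces hits, the deposited data were not properly cleaned and the trimming has to be redone.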

        (I have to admit you will likely fail to do it right -- in about 400 datasets from 454 pyrosequencers I saw so many *different* issues that it will take you a long while to recognize and overcome all of them.)

        BTW: When you say ~230 nt long reads... That is a quality-trimmed read length, right? Were these prepared with the Titanium protocol? Don't expect long assembled transcripts from these; the properly trimmed reads might be in the range of 120-180 nt, way too short to reconstruct the CDS of even average-length proteins.
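To see where a dataset actually falls relative to that 120-180 nt range, it helps to summarize read lengths before and after trimming. A minimal sketch, assuming the reads are already loaded as plain strings (the function name `length_summary` is made up for this example):

```python
from statistics import median


def length_summary(seqs):
    """Return count, minimum, median and maximum read length."""
    lengths = sorted(len(s) for s in seqs)
    if not lengths:
        raise ValueError("no sequences given")
    return {
        "n": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }
```

Running it on the raw and on the trimmed reads shows how much sequence the adapter and quality clipping removed.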
        Last edited by martin2; 03-04-2014, 11:22 AM. Reason: Typo editing.
