  • Using Transcriptome data to improve genome assembly

    Hey guys,

    I need to assemble a genome de novo. Unfortunately I only have Illumina SE 50 bp data (I know, worst case...). Coverage is approx. 20-25x, so not as bad as it could be. However, I also have some transcriptome data from the same organism, which is Illumina PE 95 bp reads. Now I am thinking of using this transcriptome data to improve the de novo genome assembly.

    As I picture it, a first-shot workflow would look like this:

    Assemble the SE 50 bp DNA reads into contigs with SOAPdenovo.
    Assemble the PE 95 bp cDNA reads into contigs with SOAPdenovo-Trans (to stay within the same tool family, say).

    Try to improve the DNA-seq contigs with the RNA-seq contigs, via CAP3 for instance.

    But my question is: does it make sense to try this at all? Sure, more data isn't a bad thing, but the RNA-seq was never meant to serve as supporting data for the assembly.

    What's your opinion?

    Thanks in advance for your post.

    Best,
    Phil

  • #2
    Sounds like a reasonable idea - I would try it.

    Why not use a scaffolder like SSPACE to attempt to bridge DNA contigs with the transcriptome-derived PE reads?

    Good luck.

    • #3
      Didn't have that in mind... I will definitely give it a try!
      Thank you!

      • #4
        Your transcript contigs will contain many cases of spliced exons, including alternative splicing. How will you handle such cases if you throw both transcripts and genomic contigs into CAP3 together?
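To make the splicing concern concrete, here is a toy Python sketch (not part of the original post). The sequences, the `find_exon_blocks` helper and its `min_block` parameter are all made up for illustration; it uses exact matching on the forward strand only, which a real spliced aligner would not be limited to.

```python
def find_exon_blocks(transcript, genome, min_block=8):
    """Greedily split a transcript into exact-match blocks on the genome.

    Returns (transcript_start, genome_start, length) tuples.
    Toy model: exact matches, forward strand, blocks must appear in order.
    """
    blocks = []
    t_pos = 0   # next unplaced transcript position
    g_from = 0  # genome position after the previous block
    while t_pos < len(transcript):
        best_len, best_g = 0, -1
        length = min_block
        # Extend the block one base at a time while it still occurs in the genome.
        while t_pos + length <= len(transcript):
            g = genome.find(transcript[t_pos:t_pos + length], g_from)
            if g == -1:
                break
            best_len, best_g = length, g
            length += 1
        if best_len == 0:
            break  # remaining transcript cannot be placed
        blocks.append((t_pos, best_g, best_len))
        t_pos += best_len
        g_from = best_g + best_len
    return blocks


# Hypothetical toy gene: two exons separated by one intron.
exon1 = "ATGGCCAAGTTCGA"
intron = "GTAAGTCCCTTTAG"
exon2 = "CGATCCTGAACTGA"
genome = "TTGACATATA" + exon1 + intron + exon2 + "AATAAAGGGG"
transcript = exon1 + exon2  # the intron is spliced out in the mRNA

# The spliced transcript is not a contiguous substring of the genome...
print(transcript in genome)
# ...but it decomposes into two blocks separated by the intron.
blocks = find_exon_blocks(transcript, genome)
print(blocks)
```

An overlap-based assembler that expects contiguous matches would see the junction between the two blocks as a break, which is why mixing transcript contigs and genomic contigs directly needs some care.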

        • #5
          CAP3 isn't the right choice, I guess. SSPACE seems more promising, since it tries to extend the given DNA contigs using the RNA reads (or RNA contigs).
          However, using CAP3 with quality trimming, strand-specific settings and adjusted similarity scores should still be able to improve the contigs at least somewhat.
          Using the minimum overlap score, one should be able to identify splice sites and filter out those regions. Adding the similarity score of the contig itself should then be enough to verify whether the 'rna-contig' fits the 'dna-contig' or not. The only problem I see is how to handle alternative splice sites. BUT since it is a 'very simple' organism, this should not take too much effort to account for!
          Fortunately it is not human...
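A rough sketch of the "minimum overlap plus similarity" idea, in Python. This is only an illustration under simplifying assumptions (ungapped comparison, forward strand, DNA contig end against RNA contig start); `best_end_overlap`, `min_overlap` and `min_identity` are hypothetical names, not CAP3 options or CAP3's actual algorithm, and the sequences are invented.

```python
def best_end_overlap(dna_contig, rna_contig, min_overlap=12, min_identity=0.9):
    """Longest ungapped overlap between the end of dna_contig and the start
    of rna_contig that passes both thresholds.

    Returns (overlap_length, identity), or None if nothing qualifies.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(dna_contig), len(rna_contig))
    # Try the longest candidate overlap first and shrink from there.
    for length in range(max_len, min_overlap - 1, -1):
        suffix = dna_contig[-length:]
        prefix = rna_contig[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


# Hypothetical DNA contig ending at the last base of an exon.
dna_contig = "TTGACATATAATGGCCAAGTTCGA"

# RNA contig that starts at the same exon: 14 bases overlap exactly.
rna_fits = "ATGGCCAAGTTCGACGATCCTGAACTGA"
# RNA contig that shares only 6 bases before crossing a splice junction.
rna_across_junction = "GTTCGACGATCCTGAACTGA"

hit = best_end_overlap(dna_contig, rna_fits)
miss = best_end_overlap(dna_contig, rna_across_junction)
print(hit)
print(miss)
```

With these toy thresholds, the contig that only touches a few bases before the junction is rejected, while the one sharing a full exon end is kept. Real data would of course need gapped alignment and both strands.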

          I will compare the two approaches if anyone is interested...
          Last edited by sphil; 01-09-2012, 05:51 AM.

          • #6
            RNA-Seq has been used successfully to scaffold a nematode genome (see http://www.ncbi.nlm.nih.gov/pubmed/20980554). You might want to consider adapting their pipeline for your data.
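The general idea of RNA-based scaffolding can be sketched in a few lines of Python. To be clear, this is not the pipeline from the linked paper, just a toy under strong assumptions: exact matching, forward strand only, and made-up names (`link_contigs`, `anchor`, `contig_A`, `contig_B`) and sequences. A transcript whose 5' end lands in one genomic contig and whose 3' end lands in another suggests the two contigs are neighbours in that order.

```python
from collections import Counter


def link_contigs(contigs, transcripts, anchor=10):
    """Count transcript-supported links between genomic contigs.

    contigs: dict of name -> sequence; transcripts: list of sequences.
    A link (a, b) is counted when the first `anchor` bases of a transcript
    occur in contig a only and the last `anchor` bases in contig b only.
    """
    links = Counter()
    for tx in transcripts:
        if len(tx) < 2 * anchor:
            continue  # too short to give two independent anchors
        head, tail = tx[:anchor], tx[-anchor:]
        head_hits = [name for name, seq in contigs.items() if head in seq]
        tail_hits = [name for name, seq in contigs.items() if tail in seq]
        # Keep only unambiguous anchors that land in two different contigs.
        if len(head_hits) == 1 and len(tail_hits) == 1 and head_hits[0] != tail_hits[0]:
            links[(head_hits[0], tail_hits[0])] += 1
    return links


# Hypothetical assembly broken inside an intron: exon 1 in A, exon 2 in B.
contigs = {
    "contig_A": "TTGACATATAATGGCCAAGTTCGAGTAAGT",
    "contig_B": "CCTTTAGCGATCCTGAACTGAAATAAAGG",
}
transcripts = ["ATGGCCAAGTTCGACGATCCTGAACTGA"]  # exon 1 + exon 2, spliced

links = link_contigs(contigs, transcripts)
print(links)
```

In practice one would want several independent transcripts (or read pairs) supporting a link before joining two contigs.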

            • #7
              Thanks a lot for the hint!
