
  • De novo Hybrid Assembly using 454/Illumina

    Hi,

    I have managed to get a reasonable assembly of a bacterial genome using newbler, based on 454 NG sequence data.

    I also have paired-end reads available for the same bacterial genome, and I am wondering whether there is a way to use them to improve the contigs and scaffolding.

    I read about AMOS-hybrid, http://www.biomedcentral.com/1471-2164/11/242, but it requires a mates file, which I do not have; instead I have one sequence file for each end of the pairs.

    I also thought of trying mira, but it also requires a mates file. Should I ask the vendor for a mates file, or is there other software that can use the contigs from my newbler run together with the raw paired-end files and their qualities?

    I hope other people have already been through this hybrid assembly issue and can help.

    Regards,

    Intikhab

  • #2
    I would refer you to the BAMBUS manual, where you will find a description of the mates format:

    http://sourceforge.net/apps/mediawik...he_.mates_file

    You may also find an example in the test folder of AMOS-Hybrid.
    library fiveK 4000 6000 (r).*
    pair (.*)\.1 (.*)\.2
    The first row describes the library information and the second row contains the mate-pair relationships. You can create your own mates file based on information about your own library and read structure.
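As a rough illustration of the two-row layout described above, the following Python sketch writes a mates file from a library description. The function name `make_mates`, its parameters, and the single-space separator are assumptions made for illustration and are not part of BAMBUS or AMOS; the BAMBUS manual should be checked for the exact delimiter it expects.

```python
import re


def make_mates(lib_name, min_size, max_size, lib_regex,
               fwd_suffix, rev_suffix, sep=" "):
    """Return mates-file text: one library row and one pair row.

    Hypothetical helper; `sep` is an assumption (the example in this
    thread is shown with single spaces).
    """
    library_row = sep.join(
        ["library", lib_name, str(min_size), str(max_size), lib_regex])
    # re.escape protects characters such as "." so the suffix is
    # matched literally rather than as a regex wildcard.
    pair_row = sep.join(
        ["pair",
         "(.*)" + re.escape(fwd_suffix),
         "(.*)" + re.escape(rev_suffix)])
    return library_row + "\n" + pair_row + "\n"


# Rebuilds the two example rows quoted above from their components.
print(make_mates("fiveK", 4000, 6000, "(r).*", ".1", ".2"), end="")
```

The insert-size bounds and the read-name suffixes are the parts that have to come from your own library preparation and read naming.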



    • #3
      Originally posted by intikhab
      Hi,

      I have managed to get a reasonable assembly of a bacterial genome using newbler, based on 454 NG sequence data.

      I also have paired-end reads available for the same bacterial genome, and I am wondering whether there is a way to use them to improve the contigs and scaffolding.
      If you wanted to use Mira, there are instructions on how to directly use Mira with Bambus:



      • #4
        About the mates file: my paired-end reads are named as follows:

        >NG-5247_HLP_3_1_1058_5238/1
        ATGATGCATCTGAGACGATNGGTGTAAACGATCGT

        >NG-5247_HLP_3_1_1058_5238/2
        ACAAGATATATACGTATTATTAGGGTCAATAATGG

        Should the mates file then look like this?

        library NG-5247 150 200 (^NG).*
        pair (.*)\/1 (.*)\/2


        One related question: when we have two separate files for paired-end reads, how can they be provided to mira or AMOS-Hybrid, given that these programs require a single file? Would just combining the two, e.g. with cat a b > c, be enough?
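Plain concatenation puts all forward reads before all reverse reads. If a tool instead expects each read to be followed directly by its mate, the two files can be interleaved. The Python sketch below is one way to do that for FASTA input; the function names `read_fasta` and `interleave` are hypothetical, and whether mira or AMOS-Hybrid needs interleaved rather than concatenated input is an assumption that should be checked against each program's manual.

```python
from itertools import zip_longest


def read_fasta(lines):
    """Yield (name, sequence) tuples from an iterable of FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def interleave(fwd_lines, rev_lines):
    """Return FASTA text with each forward read followed by its mate.

    Assumes both inputs list the pairs in the same order and that mate
    names differ only in the part after the last "/".
    """
    out = []
    pairs = zip_longest(read_fasta(fwd_lines), read_fasta(rev_lines))
    for fwd, rev in pairs:
        if fwd is None or rev is None:
            raise ValueError("files contain different numbers of reads")
        if fwd[0].rsplit("/", 1)[0] != rev[0].rsplit("/", 1)[0]:
            raise ValueError(f"mate names differ: {fwd[0]} vs {rev[0]}")
        for name, seq in (fwd, rev):
            out.append(f">{name}\n{seq}\n")
    return "".join(out)


# The read pair shown earlier in this post, one read per input file.
forward = [">NG-5247_HLP_3_1_1058_5238/1",
           "ATGATGCATCTGAGACGATNGGTGTAAACGATCGT"]
reverse = [">NG-5247_HLP_3_1_1058_5238/2",
           "ACAAGATATATACGTATTATTAGGGTCAATAATGG"]
print(interleave(forward, reverse), end="")
```

With real data, open file handles can be passed in place of the two lists, since the functions only iterate over lines. The name check also catches the case where the two files are not in the same order.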

        Intikhab

        Originally posted by andreas.sjodin
        I would refer you to the BAMBUS manual, where you will find a description of the mates format:

        http://sourceforge.net/apps/mediawik...he_.mates_file

        You may also find an example in the test folder of AMOS-Hybrid.


        The first row describes the library information and the second row contains the mate-pair relationships. You can create your own mates file based on information about your own library and read structure.



        • #5
          I suggest checking MIRA again; I think you are mistaken in thinking that it requires mates files. I have had good results with MIRA doing hybrid assemblies of single-end 454 and Illumina reads. See the thread "Combining 454FLX and SOLiD runs for de novo genome assembly", where I outlined the process I used.
          SBB



          • #6
            Here I am using AMOS-Hybrid, which requires a mates file. For mira I know we don't need a mates file; what we need there is a CAF file from an initial assembly and a sequence file from the second technology.

            Can anybody correct the mates file I posted above? When I use it, the error log shows that a large number of reads were excluded.

            Specifically, I want to know the correct regular expressions for the library and for the forward and reverse reads; the forward and reverse read names are shown above.
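One way to narrow this down is to test the patterns against the read names outside the assembler. The Python sketch below applies a library pattern and the two pair patterns to a list of names and reports which reads pair up, which lack a mate, and which match nothing. The function `check_mates` is hypothetical, and it assumes each pattern must match the whole read name and that Python's `re` behaves like the tool's regular expressions for these simple patterns; how AMOS-Hybrid actually applies them may differ.

```python
import re


def check_mates(read_names, library_regex, fwd_regex, rev_regex):
    """Classify read names as paired, orphaned, or unmatched.

    Assumption: every pattern is applied to the full read name.
    """
    lib = re.compile(library_regex)
    fwd = re.compile(fwd_regex)
    rev = re.compile(rev_regex)
    forward, reverse, unmatched = {}, {}, []
    for name in read_names:
        f = fwd.fullmatch(name)
        r = rev.fullmatch(name)
        if not lib.fullmatch(name) or not (f or r):
            unmatched.append(name)
        elif f:
            forward[f.group(1)] = name   # key: name without the /1 suffix
        else:
            reverse[r.group(1)] = name   # key: name without the /2 suffix
    pairs = [(forward[k], reverse[k]) for k in forward if k in reverse]
    orphans = [n for k, n in forward.items() if k not in reverse]
    orphans += [n for k, n in reverse.items() if k not in forward]
    return pairs, orphans, unmatched


# The first two names are from this thread; the last two are made up
# to show an orphaned read and a name the library pattern rejects.
names = [
    "NG-5247_HLP_3_1_1058_5238/1",
    "NG-5247_HLP_3_1_1058_5238/2",
    "NG-5247_HLP_3_1_1060_77/1",
    "contig00001",
]
pairs, orphans, unmatched = check_mates(
    names, r"(^NG).*", r"(.*)\/1", r"(.*)\/2")
print("paired:", pairs)
print("orphans:", orphans)
print("unmatched:", unmatched)
```

Under these assumptions the patterns from the earlier post do pair the two example reads, so it may be worth checking whether the excluded reads in the error log have names that differ from the ones in the FASTA files.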

            Regards,

            Intikhab

