  • qqsmallfrog
    Junior Member
    • Dec 2010
    • 1

    Scaffolding suggestion?

    Hello:

    I'm assembling a genomic region of about 10 Mb, using data from several platforms. Here are the types of data I have:
    1. Some Sanger sequences of BAC ends and target genes
    2. Single-end 454 reads
    3. Single-end 50 bp Solexa reads
    4. Paired-end 74 bp Solexa reads
    My current plan is to assemble each data type separately, giving 4 pools of contigs, then combine the 4 pools and scaffold the result with the PE Solexa data. I see two strategies:
    A. Assemble the contigs first (with CAP3 or similar), then use the Solexa PE reads to scaffold the resulting long contigs.
    B. Assemble the contigs together with all the Solexa PE reads in software like MIRA, so that the scaffolding is done automatically within MIRA.
    Does anyone have an idea which one is better? For strategy A to work, I assume I would need to map the Solexa PE reads to the contigs (with software like BWA) and use the mapping information for scaffolding. Does anyone know of scaffolding software that can work from this?

    Thanks,
    Cheng-Ruei Lee
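
A minimal sketch of the bookkeeping behind strategy A, assuming the PE reads have already been mapped to the contigs (for example with BWA) and reduced to one `(pair_id, contig_name)` record per mapped mate. The input format, the function name `count_contig_links`, and the `min_links` threshold are hypothetical and chosen only for illustration. Read pairs whose two mates land on different contigs are the evidence a scaffolder uses to place those contigs next to each other.

```python
from collections import Counter, defaultdict

def count_contig_links(mate_hits, min_links=2):
    """Count read pairs whose two mates map to different contigs.

    mate_hits: iterable of (pair_id, contig_name), one record per mapped mate.
    Returns {(contig_a, contig_b): n_pairs} for contig pairs supported by
    at least min_links read pairs.
    """
    contigs_by_pair = defaultdict(list)
    for pair_id, contig in mate_hits:
        contigs_by_pair[pair_id].append(contig)

    links = Counter()
    for contigs in contigs_by_pair.values():
        # Only pairs with both mates mapped, to different contigs, are informative.
        if len(contigs) == 2 and contigs[0] != contigs[1]:
            links[tuple(sorted(contigs))] += 1

    return {pair: n for pair, n in links.items() if n >= min_links}

# Toy input (hypothetical read pairs and contig names)
hits = [
    ("p1", "contigA"), ("p1", "contigB"),
    ("p2", "contigB"), ("p2", "contigA"),
    ("p3", "contigA"), ("p3", "contigA"),  # both mates in one contig: no link
    ("p4", "contigB"), ("p4", "contigC"),  # a single pair: below the threshold
]
links = count_contig_links(hits)
print(links)
```

This only counts links. A real scaffolder would also use mate orientation, mapping positions, and the library insert size to order and orient the contigs and to estimate gap sizes.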
  • flxlex
    Moderator
    • Nov 2008
    • 412

    #2
    If you have a close enough reference genome, you could run different assemblies and 'merge' them using MAIA: http://bioinformatics.oxfordjournals...6/18/i433.full. I haven't used it myself, but it looks very promising! Not what you asked for, but just another idea...


    • huma Asif
      Member
      • Oct 2010
      • 53

      #3
      165 scaffold

      Dear All,
      I have sequenced a bacterial genome using Solexa
      and am currently working on the assembly.
      I assembled it with SOAPdenovo and got 164 scaffolds.
      I am now unsure what to do with the scaffolds. Should I annotate the data I have, or try to improve the scaffolds using another assembler?
      Please help.


      • jjohnson
        Member
        • Aug 2009
        • 20

        #4
        You could also try running the Celera assembler, which has a built-in scaffolder and supports all of the data types you mention. http://j.mp/h7uX9i

        It can have a pretty steep learning curve, but I have found it produces spectacular results. There is excellent help and how-to documentation, and the team supporting it at the Venter Institute and University of Maryland CBCB is always willing to help out.
        Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio


        • jjohnson
          Member
          • Aug 2009
          • 20

          #5
          Originally posted by huma Asif
          Dear All,
          I have sequenced a bacterial genome using Solexa
          and am currently working on the assembly.
          I assembled it with SOAPdenovo and got 164 scaffolds.
          I am now unsure what to do with the scaffolds. Should I annotate the data I have, or try to improve the scaffolds using another assembler?
          Please help.
          Asif,

          This decision depends entirely on what the project requires. You could try another assembler, like Celera, and see whether it fills in gaps or produces a better assembly. If you want to annotate the genome, then scaffolding is not the only important metric. You should check your average or N50 contig size; if it is small, producing a good de novo annotation will be hard.
          Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
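
For reference, a short sketch of how the N50 mentioned above can be computed from a list of contig lengths. The function name `n50` and the example lengths are made up for illustration; the definition used is the usual one, the length L such that contigs of length L or longer contain at least half of the assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the assembly
        # size defines the N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths given")

# Toy example: total is 280, so half is 140; 100 + 80 = 180 reaches it.
example_lengths = [100, 80, 50, 30, 20]
print(n50(example_lengths))
```

Note that the same calculation applied to scaffold lengths gives a scaffold N50, which can be much larger than the contig N50 when scaffolds contain many gaps.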


          • huma Asif
            Member
            • Oct 2010
            • 53

            #6
            N50=70234

            Thank you for your reply.
            The N50 of my assembly is 70234. My project only requires me to assemble the data I got from Illumina and to identify plant-pathogenic genes.
            If you think this N50 is not bad, please suggest an online bacterial genome annotation tool. I have tried Glimmer and got some ORFs in the output. I want to check what they are and whether they are complete.
            I have expertise in chloroplast genomics resequencing projects and have only recently started working on bacterial genomics and de novo assembly, so I am confused about how to generate a complete sequence from scaffolds. The bacterial genome that I assembled with SOAPdenovo shows 9942 gaps. As far as I understand, I need to fill these gaps to get a complete genome. At present I am not interested in completing the genome, so my thought is to make a rough map of this bacterium with gaps and see how many genes are covered and what they code for.
            Please suggest annotation tools to start with.
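
A small sketch for counting gaps in scaffold sequences, assuming gaps are written as runs of `N` in the scaffold FASTA (a common convention, but worth confirming for the assembler output in question). The function name `count_gaps` and the toy sequences are hypothetical.

```python
import re

def count_gaps(sequences):
    """Return (number_of_gaps, total_gap_bases) over scaffold sequences,
    treating each uninterrupted run of N/n as one gap."""
    n_gaps = 0
    gap_bases = 0
    for seq in sequences:
        for run in re.finditer(r"[Nn]+", seq):
            n_gaps += 1
            gap_bases += run.end() - run.start()
    return n_gaps, gap_bases

# Toy scaffolds: one gap of 4, then gaps of 2 and 3, then a gap-free sequence.
scaffolds = ["ACGTNNNNACGT", "ACGTNNACnnnGT", "ACGT"]
print(count_gaps(scaffolds))
```

Comparing the number of gaps with the total gap bases gives a rough idea of whether the assembly has many short gaps or a few long ones.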


            • Mona
              Member
              • Feb 2010
              • 27

              #7
              Hello Huma,
              Have you tried to BLAST the ORFs that you got, to check what they could be?
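
Alongside a BLAST search, whether an ORF is complete can be checked directly from its nucleotide sequence. The sketch below assumes the ORF is given on its coding strand and that the bacterial genetic code (translation table 11) applies, with ATG, GTG, and TTG accepted as start codons; the function name `is_complete_orf` is hypothetical and this is not how Glimmer itself reports completeness.

```python
# Assumption: bacterial genetic code (translation table 11).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_complete_orf(seq):
    """Return True if seq starts with a start codon, ends with a stop codon,
    has a length divisible by 3, and contains no internal stop codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] in START_CODONS
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )

print(is_complete_orf("ATGAAATAA"))  # start codon, one codon, stop codon
print(is_complete_orf("ATGAAAGGG"))  # no stop codon at the end
```

An ORF that fails this check at a scaffold edge or next to a gap is often just truncated by the assembly rather than a false prediction.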


              • huma Asif
                Member
                • Oct 2010
                • 53

                #8
                Yes

                Yes, I have checked these ORFs.
                I have now overcome many problems with the help of this excellent forum.
                I assembled my genome again using the suggestions I got here.
                I have annotated it and checked the evolutionary genes, and found that the species I am working with is Pseudomonas putida. As far as I know it is not a plant pathogen, but it has some virulence genes. These days I am looking for papers on Pseudomonas putida and its role in biofilm formation.
                I would be obliged for any information about this organism.
                Regards


                • waterboy
                  Member
                  • Oct 2010
                  • 14

                  #9
                  How to merge multiple scaffold files?

                  Hello All,
                  I have two scaffold sequence files obtained from assemblies of SOLiD MP and 454 PE reads. I would like to build super-scaffolds from these two files using the 454 paired-end information (20 kb). Please suggest any pipeline or software for this purpose.


                  • sivasubramani
                    Member
                    • Apr 2011
                    • 13

                    #10
                    Hello waterboy,

                    Which assembly tool did you use to assemble the SOLiD MP data?

                    Thanks,


                    • iaia
                      Junior Member
                      • Apr 2013
                      • 2

                      #11
                      Dear all,
                      I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                      Thank you in advance


                      • krobison
                        Senior Member
                        • Nov 2007
                        • 734

                        #12
                        Originally posted by iaia
                        Dear all,
                        I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                        There's no magical solution; it will depend on the data you have & the genome you are studying. The obvious question is how many contigs do you expect to have when you are complete and why? What is the nature of the contigs and how did you generate them?


                        • Mona
                          Member
                          • Feb 2010
                          • 27

                          #13
                          Hi iaia,

                          Do you just want to merge them simply, or do you want proper scaffolding based on the sequence, i.e. which contig should come first and which later?
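
To make the "simple merge" option concrete, here is a minimal sketch that joins contigs in a given order into one FASTA record, separated by runs of `N`. The function name `join_contigs`, the spacer length, and the record name are hypothetical choices. It assumes the order and orientation of the contigs are already known from some other evidence; it does not work them out, which is what proper scaffolding would do.

```python
def join_contigs(contigs, gap_length=100, name="merged", width=60):
    """Join contig sequences, in the order given, into one FASTA record.

    Contigs are separated by gap_length N characters to mark that the true
    sequence and distance between them are unknown.
    """
    spacer = "N" * gap_length
    sequence = spacer.join(contigs)
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

# Toy contigs with a short spacer so the result is easy to read.
record = join_contigs(["ACGT", "GGCC"], gap_length=3)
print(record)
```

A record built this way is only as meaningful as the contig order passed in, so it should not be treated as an assembled sequence.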


                          • iaia
                            Junior Member
                            • Apr 2013
                            • 2

                            #14
                            Thank you for your reply,
                            Yes, I wanted to scaffold them based on the sequence...
