
  • Scaffolding suggestion?

    Hello:

    I'm assembling a genomic region of about 10 Mb using data from several platforms. Here are the types of data I have:
    1. Some Sanger sequences of BAC ends and target genes
    2. Single end 454 reads
    3. Single end 50 bp Solexa reads
    4. Paired end 74 bp Solexa reads
    My current plan is to assemble each data type separately, giving 4 pools of contigs, then merge the 4 pools and scaffold the result with the PE Solexa data. I see two strategies:
    A. Assemble the contigs first (with CAP3 or similar), then use the Solexa PE reads to scaffold the final long contigs.
    B. Assemble the contigs together with all the Solexa PE reads in software like MIRA, so that scaffolding is done automatically within MIRA.
    Does anyone have an idea which one is better? For strategy A to work, I assume I would need to map the Solexa PE reads to the contigs (with software like BWA) and use the mapping information for scaffolding. Does anyone know of scaffolding software that can do this?

    Thanks,
    Cheng-Ruei Lee
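
The mapping step in strategy A can be illustrated with a minimal sketch. It assumes the PE reads were mapped to the contigs and the output is in SAM format (which BWA writes). The function name `count_contig_links` and the `min_mapq` threshold are hypothetical, chosen only for illustration. It counts read pairs whose two mates land on different contigs, which is the raw evidence a scaffolder would use to join contigs. It does not estimate gap sizes or orientations.

```python
from collections import Counter

def count_contig_links(sam_lines, min_mapq=20):
    """Count read pairs whose two mates map to different contigs.

    Returns a Counter keyed by (contig_a, contig_b), names sorted.
    Only the first-in-pair record is inspected, so the mate's own
    mapping quality is NOT checked here (an assumption of this sketch).
    """
    links = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        # Keep paired, first-in-pair records so each pair is counted once.
        if not flag & 0x1 or not flag & 0x40:
            continue
        # Drop unmapped reads, unmapped mates, secondary and supplementary hits.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # "=" means the mate is on the same contig; "*" means unknown.
        if mapq < min_mapq or rnext in ("=", "*") or rnext == rname:
            continue
        links[tuple(sorted((rname, rnext)))] += 1
    return links
```

Contig pairs supported by many independent read pairs are candidates for joining; pairs supported by only one or two are more likely to reflect repeats or mapping errors.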

  • #2
    If you have a close enough reference genome, you could run different assemblies and 'merge' them using MAIA: http://bioinformatics.oxfordjournals...6/18/i433.full. I haven't used it myself, but it looks very promising! Not what you asked for, but just another idea...



    • #3
      165 scaffold

      Dear All,
      I have sequenced a bacterial genome using Solexa and am currently working on the assembly.
      I assembled it using SOAPdenovo and got 164 scaffolds.
      I am now unsure what to do with the scaffolds: should I annotate the data I have, or try to improve the scaffolds using another assembler?
      Please help.



      • #4
        You could also try running the Celera assembler, which has a built-in scaffolder and supports all of the data types you mention. http://j.mp/h7uX9i

        It can have a pretty steep learning curve, but I have found it produces spectacular results. There is excellent help and how-to documentation, and the team supporting it at the Venter Institute and University of Maryland CBCB is always willing to help out.
        Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio



        • #5
          Originally posted by huma Asif
          Dear All,
          I have sequenced a bacterial genome using Solexa and am currently working on the assembly.
          I assembled it using SOAPdenovo and got 164 scaffolds.
          I am now unsure what to do with the scaffolds: should I annotate the data I have, or try to improve the scaffolds using another assembler?
          Please help.
          Asif,

          This decision depends entirely on what the project requires. You could try another assembler, like Celera, and see whether it fills in gaps or produces a better assembly. If you want to annotate the genome, then scaffolding is not the only important metric. You should check your average or N50 contig size; if it is small, producing a good de novo annotation will be hard.
          Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
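
For reference, the N50 mentioned above can be computed from a list of contig (or scaffold) lengths. This is a minimal sketch of the usual definition: the length L such that contigs of length L or longer contain at least half of the total assembled bases. The function name `n50` is only illustrative, and individual assemblers may report the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly defines the N50.
        if running >= half:
            return length
    return 0

# Example: total is 32, and the two 8 bp contigs already cover 16 bases.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```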



          • #6
            N50=70234

            Thank you for your reply.
            The N50 of my assembly is 70234. My project only requires assembling the data I got from Illumina and identifying plant pathogenic genes.
            If you think this N50 is not bad, please suggest an online bacterial genome annotation tool. I have tried Glimmer and got some ORFs in the output. I want to check what they are and whether they are complete.
            I have expertise in chloroplast genome resequencing projects and have only recently started working on bacterial genomics and de novo assembly, so I am confused about how to generate a complete sequence from scaffolds. The bacterial genome that I assembled with SOAPdenovo shows 9942 gaps. As far as I understand, I need to fill these gaps to get a complete genome. At present I am not interested in completing the genome, so my plan is to make a rough map of this bacterium with gaps and see how many genes are covered and what they code for.
            Please suggest annotation tools to get started with this.
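
A gap count like the one quoted above can be checked independently of the assembler. The sketch below assumes that gaps in the scaffold FASTA are written as runs of `N` (the common convention for scaffolders) and counts each run as one gap. The helper names `read_fasta` and `gap_summary` are hypothetical.

```python
import re

def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def gap_summary(lines):
    """Return (number of N runs, total N bases) over all scaffolds."""
    gaps = 0
    gap_bases = 0
    for _, seq in read_fasta(lines):
        # Sequence lines are joined first, so a gap that wraps across
        # FASTA lines is still counted once.
        for match in re.finditer(r"[Nn]+", seq):
            gaps += 1
            gap_bases += match.end() - match.start()
    return gaps, gap_bases
```

With a real file this could be called as `gap_summary(open("scaffolds.fasta"))`, where the file name is a placeholder. Many short gaps inside otherwise long scaffolds usually matter less for gene finding than a small N50 contig size.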



            • #7
              Hello Huma,
              Have you tried BLASTing the ORFs you got to check what they could be?



              • #8
                Yes

                Yes, I have checked these ORFs.
                I have now solved many problems with the help of this excellent forum.
                I assembled my genome again using the suggestions I got here.
                I annotated it, checked the evolutionary genes, and found that the species I am working on is Pseudomonas putida. As far as I know it is not a plant pathogen, but it has some virulence genes. These days I am looking for papers on Pseudomonas putida and its role in biofilm formation.
                I would be obliged for any information about this organism.
                Regards



                • #9
                  how to merge multiple scaffold files?

                  Hello All,
                  I have two scaffold sequence files obtained from assemblies of SOLiD MP and 454 PE reads. I would like to build super-scaffolds from these two files using the 454 paired-end information (20 kb). Please suggest any pipelines or software for this purpose.



                  • #10
                    Hello waterboy,

                    Which assembly tool did you use to assemble the SOLiD MP data?

                    Thanks,



                    • #11
                      Dear all,
                      I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                      Thank you in advance



                      • #12
                        Originally posted by iaia
                        Dear all,
                        I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                        There's no magical solution; it will depend on the data you have & the genome you are studying. The obvious question is how many contigs do you expect to have when you are complete and why? What is the nature of the contigs and how did you generate them?



                        • #13
                          Hi iaia,

                          Do you just want to simply merge them, or do you want proper scaffolding based on the sequence, i.e. which contig should come first and which later?



                          • #14
                            Thank you for your reply.
                            Yes, I wanted to scaffold them based on the sequence...
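
As a rough illustration of what joining contigs "based on the sequence" involves, here is a minimal sketch that merges two contigs when the end of one exactly matches the start of the other. The function names and the `min_len` cutoff are hypothetical. It ignores sequencing errors and reverse-complement orientation, both of which real assembly and scaffolding tools have to handle, so it is not a substitute for them.

```python
def longest_overlap(left, right, min_len=20):
    """Length of the longest exact suffix of `left` that is a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_pair(left, right, min_len=20):
    """Merge two contigs on an exact end overlap, or return None."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    # Keep the overlapping bases only once.
    return left + right[size:]
```

With only 4 contigs, trying each ordered pair this way is feasible, but if no pair overlaps, the order can only come from outside information such as paired reads or a reference genome.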
