  • Thanks Boetsie!

    Cheers,
    Ricardo

    Originally posted by boetsie
    Hi Ricardo,

    Look at this post, where colindaven suggests how to fix the problem:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Simply chmod a+x all directories of SSPACE.

    Regards,
    Boetsie
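For anyone who prefers to script the fix, the same `chmod a+x` on every SSPACE directory can be sketched in Python. This is a minimal sketch, not part of SSPACE: `add_exec_to_dirs` is a hypothetical helper, and you pass it whatever directory you unpacked SSPACE into.

```python
import os
import stat


def add_exec_to_dirs(root: str) -> int:
    """Hypothetical helper: apply the equivalent of `chmod a+x` to `root`
    and to every directory below it. Returns the number of directories changed."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    count = 0
    # Top-down walk: each directory gets +x before os.walk descends into it.
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode | exec_bits)
        count += 1
    return count
```

Usage: `add_exec_to_dirs("/path/to/SSPACE")`, where the path is a placeholder for your own install location. Only directories are touched, matching the advice above.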



    • Hi Boetsie

      I'm trying to scaffold a set of contigs from a bacterial genome assembly. Before scaffolding, there were no non-ACGTN bases in my assembly, but after scaffolding with SSPACE, there were. Can you please let me know what is causing this, and if there is an option to turn it off?

      I have already set the -x parameter to 0 to turn off extension.

      Thanks.



      • Hi mht,

        SSPACE only adds 'n' or 'N' characters to the assembly, so it would be strange if other characters were present after scaffolding. Could you please show me an example of the non-ACGTN characters that were included?

        Regards,
        Boetsie
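A quick way to answer this kind of question is to tally the characters outside A/C/G/T/N in the scaffold FASTA. The sketch below uses a hypothetical helper, `non_acgtn_counts`, and is deliberately case-sensitive, so lower-case bases (such as those discussed later in this thread) are reported as non-ACGTN.

```python
from collections import Counter
from typing import Iterable


def non_acgtn_counts(fasta_lines: Iterable[str]) -> Counter:
    """Count sequence characters other than upper-case A, C, G, T, N.
    Header lines (starting with '>') are skipped; matching is case-sensitive."""
    allowed = set("ACGTN")
    counts: Counter = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        for ch in line.strip():
            if ch not in allowed:
                counts[ch] += 1
    return counts
```

Usage: `with open("scaffolds.fasta") as fh: print(non_acgtn_counts(fh))`, where the file name is a placeholder. An empty result means the scaffolds contain only upper-case ACGTN.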



        • Oops, Boetsie, my bad. They were lower-case acgtn characters. I used Velvet as my assembler, so the lower-case acgt were from there. What is the difference between 'n' and 'N' characters in SSPACE?



          • It will generate an ‘n’ if a negative gap was found, meaning that the contigs potentially overlap but SSPACE could not find a full overlap.

            It will generate lower-case ‘acgt’ if an actual overlap is found, e.g.:

            Ctg1: AGTAGATAGATGATCGCGCTGA
            Ctg2:.............ATCGCGCTGAAGTAGATAGATGAGATCGAC


            This will become:
            AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC

            Regards,
            Boetsie
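The joining rules described above can be illustrated with a toy function. This is a sketch under assumptions, not SSPACE's code: the exact-match, longest-first overlap search, the `min_overlap` default, filling a non-negative gap with that many 'N's, and inserting a single 'n' when no overlap is found are all assumptions; writing a found overlap once in lower case follows the example above.

```python
def join_contigs(ctg1: str, ctg2: str, gap: int, min_overlap: int = 10) -> str:
    """Join two oriented contigs given an estimated gap size (illustrative only)."""
    if gap >= 0:
        # Assumption: a non-negative gap estimate becomes that many 'N's.
        return ctg1 + "N" * gap + ctg2
    # Negative gap: search for an exact suffix/prefix overlap, longest first.
    min_overlap = max(1, min_overlap)
    for k in range(min(len(ctg1), len(ctg2)), min_overlap - 1, -1):
        if ctg1[-k:] == ctg2[:k]:
            # Overlap found: write the shared bases once, in lower case.
            return ctg1[:-k] + ctg1[-k:].lower() + ctg2[k:]
    # Assumption: no overlap found, so mark the join with a single 'n'.
    return ctg1 + "n" + ctg2
```

With the two contigs from the example and `gap=-10`, the function reproduces the merged sequence shown above.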



            • For the TAB-delimited format:

              <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

              E.g.
              contig1 100 150 contig1 350 300
              contig1 4000 4050 contig2 110 60

              If startpos is greater than endpos, the read mapped to the - strand.


              I map my BAC-end reads to the contigs with BLAT. How do I include the strand information in my TAB files?
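The strand convention quoted above (start greater than end means the - strand) can be decoded with a short sketch. The `Link` type and `parse_tab_line` are hypothetical names, and whitespace-separated fields are assumed.

```python
from typing import NamedTuple


class Link(NamedTuple):
    contig1: str
    strand1: str  # '+' or '-'
    contig2: str
    strand2: str


def parse_tab_line(line: str) -> Link:
    """Parse '<contig1> <start1> <end1> <contig2> <start2> <end2>'.
    Following the convention quoted above, start > end means '-' strand."""
    c1, s1, e1, c2, s2, e2 = line.split()
    strand1 = "-" if int(s1) > int(e1) else "+"
    strand2 = "-" if int(s2) > int(e2) else "+"
    return Link(c1, strand1, c2, strand2)
```

For example, `parse_tab_line("contig1 4000 4050 contig2 110 60")` reads the second contig as a - strand hit.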



              • Hi,

                Using SSPACE, will it always be better to do contig extension prior to scaffolding? And should I do the extension with both paired-end and single-end reads, or just paired-end?

                Thanks.



                • I'm not really sure what you mean. You can just add your alignment region to the tab file. E.g. if the BAC aligns to contig 1 at positions 1000-3000 and to contig 2 at positions 4000-2000 (i.e. reverse), you can just add this:

                  contig1 1000 3000 contig2 4000 2000

                  SSPACE can only handle links between two contigs, so if a BAC aligns to multiple contigs, you have to split it into contig-contig links instead of one contig-contig-contig link.

                  Regards,
                  Boetsie


                  Originally posted by biocomfun
                  For the TAB-delimited format:

                  <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

                  E.g.
                  contig1 100 150 contig1 350 300
                  contig1 4000 4050 contig2 110 60

                  If startpos is greater than endpos, the read mapped to the - strand.


                  I map my BAC-end reads to the contigs with BLAT. How do I include the strand information in my TAB files?
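Boetsie's answer (write a reverse-strand alignment with its coordinates reversed, and break a multi-contig BAC into contig-contig links) can be sketched as follows. Assumptions: each hit is given as (contig, start, end, strand) with start <= end, as one might extract from a BLAT result, and the hits are ordered along the BAC; pairing consecutive hits is one possible way to split, not necessarily the only one SSPACE accepts. `Hit`, `tab_coords` and `tab_lines` are hypothetical names.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    contig: str
    start: int   # assumed start <= end
    end: int
    strand: str  # '+' or '-'


def tab_coords(hit: Hit) -> str:
    """Encode the strand by coordinate order: '-' hits are written end-to-start."""
    if hit.strand == "-":
        return f"{hit.contig}\t{hit.end}\t{hit.start}"
    return f"{hit.contig}\t{hit.start}\t{hit.end}"


def tab_lines(hits: List[Hit]) -> List[str]:
    """Split hits (assumed ordered along the BAC) into consecutive contig-contig links."""
    return [f"{tab_coords(a)}\t{tab_coords(b)}" for a, b in zip(hits, hits[1:])]
```

A BAC hitting contig1 (+) and contig2 (-) yields the single line from Boetsie's example; a third hit adds a second, separate contig2-contig3 link rather than a three-contig line.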



                  • Hi,

                    I can't really judge that, since it depends on what you think is 'better'. Anyway, if you have a nice draft assembly, I would not use the contig extension option; the main reason is that it is a time- and memory-consuming process. Our current strategy is to use SSPACE to generate the scaffolds, followed by our tool GapFiller to close the gaps (N's) produced by SSPACE. GapFiller uses local information from the paired-read data for the extension, instead of all the unaligned reads. This extension is much faster and more reliable.

                    Regards,
                    Boetsie

                    Originally posted by mht
                    Hi,

                    Using SSPACE, will it always be better to do contig extension prior to scaffolding? And should I do the extension with both paired-end and single-end reads, or just paired-end?

                    Thanks.
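Since GapFiller works on the N runs that SSPACE leaves in the scaffolds, it can help to list those gaps before and after gap closing. A minimal sketch with a hypothetical helper, `find_gaps`, counting only upper-case N runs (the lower-case 'n' markers for unresolved overlaps are left out by assumption):

```python
import re
from typing import List, Tuple


def find_gaps(scaffold: str) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) for each run of 'N', 0-based, end exclusive."""
    return [(m.start(), m.end(), m.end() - m.start()) for m in re.finditer(r"N+", scaffold)]
```

Summing the lengths over all scaffolds gives the total gap size, which should shrink after GapFiller has run.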



                    • hi,
                      I have a question. I have some single-end 454 data. How would SSPACE behave if I artificially turned it into paired-end data, with the sequence of the other mate being all "NNNNNNNNNN"?



                      • No, this won't work, since both reads of a pair should map to the contigs. You had better make paired-end data by splitting the reads. For example, if your reads are 200 bp long, you can make a paired-end read from the first 100 bp and the last 100 bp, and specify an insert size of 200 bp. I've never done this, but I think it could work.

                        Regards,
                        Boetsie
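Boetsie's read-splitting idea can be sketched in a few lines. Assumptions, all mine rather than from the thread: the second mate is reverse-complemented so the pseudo-pair looks like a standard inward-facing (FR) pair, reads shorter than twice `mate_len` are skipped, and `split_into_pair` and `revcomp` are hypothetical names. The insert size returned is the original read length, matching the 200 bp example.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving case and N's."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_into_pair(read: str, mate_len: int) -> Optional[Tuple[str, str, int]]:
    """Make a pseudo read pair from one long read: the first mate_len bases and
    the reverse complement of the last mate_len bases (assumed FR orientation).
    Returns (mate1, mate2, insert_size) or None if the read is too short."""
    if mate_len <= 0 or len(read) < 2 * mate_len:
        return None
    mate1 = read[:mate_len]
    mate2 = revcomp(read[-mate_len:])
    return mate1, mate2, len(read)
```

Because 454 read lengths vary, the per-read insert size returned here varies too, so a single library insert size needs a suitable error tolerance.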



                        • Thank you very much, I'll try your method of splitting the reads!


                          • SSPACE combining cDNA and PE/MP

                            Hi all,

                            I'm using SSPACE with a wealth of data, from small PE libraries up to 20 kb and 40 kb mate-pair libraries. In addition, I have three lanes of 2x100 nt RNA-seq, and I'm curious whether it could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene-space sequence, the better.

                            I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

                            Thanks,
                            Alex
                            Last edited by aharkess; 01-15-2013, 10:50 AM.
                            ==========
                            Alex Harkess
                            Leebens-Mack Lab
                            Plant Biology Department
                            University of Georgia, Athens GA



                            • Hello, have you used SSPACE to scaffold your genome with RNA-seq data? How did you determine your insert size? Thanks.
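One common way to get these numbers is to map read pairs to the contigs, collect the observed insert sizes, and summarize them. The sketch below is an assumption, not an answer given in the thread: it reports the mean and a relative error chosen so that mean × (1 ± error) spans ±`n_sd` standard deviations, assuming the library error is expressed as a fraction of the insert size. Collecting the insert sizes themselves is not shown, and `insert_size_params` is a hypothetical name.

```python
import statistics
from typing import Sequence, Tuple


def insert_size_params(sizes: Sequence[int], n_sd: float = 2.0) -> Tuple[float, float]:
    """Return (mean insert size, relative error) where
    relative error = n_sd * population standard deviation / mean."""
    if not sizes:
        raise ValueError("need at least one observed insert size")
    mean = statistics.mean(sizes)
    sd = statistics.pstdev(sizes)
    return float(mean), n_sd * sd / mean
```

As the original question notes, spliced RNA-seq pairs will widen this distribution, so the error derived this way is likely to be large for cDNA libraries.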


                              • Hi!

                                I am getting very good results with SSPACE, Boetsie, and I plan to follow it up with GapFiller.
                                I have a bunch of questions, but the most important one right now is about the foundlinks files.

                                I am sure I am missing the actual naming convention of the foundlinks file (i.e., does r1 f1 mean contig1 in the formattedcontigs file, and so on?). Could anyone shed some light on this?

                                If the question is not clear, read below (otherwise, skip it):

                                I have done several SSPACE runs on Velvet-generated contigs, arranged into different FASTA inputs:
                                - 1: contigs 1,3,4,6,7
                                - 2: contigs 2,3,4,6
                                - 3: contigs 1,2,4,6

                                I ran SSPACE with two read libraries, in two runs: the first with both libraries, the second with only the larger-insert library. Both runs give correct scaffolds, and I then inspect the links. However, run1.big_insert_lib.foundlinks contains the same links as run2.big_insert_lib.foundlinks, but I am not able to associate them with the same contigs when using the formattedcontigs file for name translation (the question above).
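One way to sidestep the naming question is to match contigs between the original FASTA and the formattedcontigs FASTA by their sequences rather than their headers. The sketch assumes SSPACE leaves the contig sequences unchanged apart from letter case (an assumption, plausible only with contig extension turned off); `read_fasta` and `map_by_sequence` are hypothetical helpers, and header text after the first whitespace is ignored.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: id (header up to first whitespace) -> sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def map_by_sequence(original: Dict[str, str], formatted: Dict[str, str]) -> Dict[str, List[str]]:
    """formatted id -> original ids with the same sequence (case-insensitive).
    A list is returned because identical contigs can share one sequence."""
    by_seq: Dict[str, List[str]] = {}
    for name, seq in original.items():
        by_seq.setdefault(seq.upper(), []).append(name)
    return {fid: by_seq.get(seq.upper(), []) for fid, seq in formatted.items()}
```

Running this once per SSPACE run gives a per-run translation table, so the same link can be traced back to the same Velvet contigs across runs with different inputs.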
