
  • #16
    Originally posted by boetsie View Post
    ... since we do an hierarchical clustering ... for each library we align the reads to the new scaffolds, therefore, no predefined paired sequence alignments could be provided ...
    What you need to do is track the positions of these features from the input contigs onto the output scaffolds to internally generate a new tab-delimited input file with the right coordinates... I tried doing this with BioPerl, but unfortunately got tied in knots with the cryptic class hierarchy.

    In theory it shouldn't be hard to say 'position x on contig y in the input is now position j on scaffold k in the output', and simply run it again for the new library. However, I guess there is quite a bit of complexity to such a code.


    Anyway, just a suggestion for improvement of an already useful tool!

    Cheers,
    Dan.
    Homepage: Dan Bolser
    MetaBase the database of biological databases.



    • #17
      Originally posted by dan View Post
      What you need to do is track the positions of these features from the input contigs onto the output scaffolds to internally generate a new tab-delimited input file with the right coordinates... I tried doing this with BioPerl, but unfortunately got tied in knots with the cryptic class hierarchy.

      In theory it shouldn't be hard to say 'position x on contig y in the input is now position j on scaffold k in the output', and simply run it again for the new library. However, I guess there is quite a bit of complexity to such a code.


      Anyway, just a suggestion for improvement of an already useful tool!

      Cheers,
      Dan.
      Hi Dan,

      First of all, thank you for the suggestions and the positive feedback!

      I see what you mean, and I think allowing other input formats would indeed be a useful feature. As a start, it would be nice to accept .sam format input.

      As for remembering the positions: I'm already doing much the same by tracking which contigs are on which scaffolds after each library. I think the same trick could be applied to the mapping.
      I'll see what I can do.

      Thanks,
      Boetsie
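
The bookkeeping Dan describes ('position x on contig y in the input is now position j on scaffold k in the output') could be sketched roughly as follows. This is not SSPACE's code: the `Placement` record, its fields, and the example coordinates are hypothetical, and the sketch assumes 0-based positions and that each contig is placed once, either forward or reverse-complemented.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one input contig ended up (hypothetical record, not an SSPACE structure)."""
    scaffold: str   # name of the output scaffold
    offset: int     # 0-based start of the contig on that scaffold
    length: int     # contig length in bases
    reverse: bool   # True if the contig was reverse-complemented


def contig_to_scaffold(placements, contig, pos):
    """Translate a 0-based position on an input contig into scaffold coordinates."""
    p = placements[contig]
    if not 0 <= pos < p.length:
        raise ValueError(f"position {pos} is outside contig {contig}")
    if p.reverse:
        # On a reverse-complemented contig the coordinates run backwards.
        return p.scaffold, p.offset + (p.length - 1 - pos)
    return p.scaffold, p.offset + pos


# Made-up example: two contigs joined into one scaffold with a 200 bp gap.
placements = {
    "contig1": Placement("scaffold1", 0, 1000, False),
    "contig2": Placement("scaffold1", 1200, 500, True),
}
```

Updating the `placements` table after each library would allow feature coordinates from the original contigs to be carried forward without realigning them.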



      • #18
        Hi boetsie again,

        I would like to ask whether only uniquely mapped reads are used for the scaffolding.

        If not, I am planning to mask repeat sequences before scaffolding.

        Thanks,
        Corthay



        • #19
          Originally posted by corthay View Post
          Hi boetsie again,

          I would like to ask whether only uniquely mapped reads are used for the scaffolding.

          If not, I am planning to mask repeat sequences before scaffolding.

          Thanks,
          Corthay
          Hi again

          I do indeed use only reads that map uniquely, i.e. to a single position across all the contigs. I use Bowtie's -m 1 option (see http://bowtie-bio.sourceforge.net/ma...html#reporting). Otherwise, it is impossible to know which link should be made if a read maps to multiple contigs.

          Is this what you mean?

          Kind regards,
          Boetsie
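
To illustrate what "uniquely mapped" means here, the sketch below keeps only reads with exactly one reported alignment. In SSPACE this filtering is delegated to Bowtie's -m 1 option during alignment; the function and the `(read_id, contig, position)` tuple layout are assumptions made purely for illustration.

```python
from collections import defaultdict


def unique_hits(alignments):
    """Return {read_id: (contig, position)} for reads that align exactly once.

    `alignments` is an iterable of (read_id, contig, position) tuples,
    a hypothetical layout chosen for this example.
    """
    hits = defaultdict(list)
    for read_id, contig, position in alignments:
        hits[read_id].append((contig, position))
    # Reads with several candidate positions are dropped entirely, because
    # there is no way to tell which contig link they would support.
    return {read_id: found[0] for read_id, found in hits.items() if len(found) == 1}


example = [
    ("read1", "contigA", 5),
    ("read2", "contigA", 7),
    ("read2", "contigC", 9),  # read2 is ambiguous and gets discarded
]
kept = unique_hits(example)
```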



          • #20
            Hi boetsie,

            Thanks for your quick reply. I understood how uniqueness is guaranteed.
            Then, I have two more questions please.

            Firstly, I am wondering why the total number of scaffold bases excluding N's increased even though I set the "-x" option to 0.

            Secondly, how do you calculate the distance between the reads of a given contig pair?
            Do you estimate the gap size using the reads, or is the gap size just ignored?

            Sorry for asking so many questions.

            Thanks
            Corthay.




            • #21
              Hi Corthay,

              No problem, good that it is clear now.

              1)
              Hmmm, that should never be the case. Are you looking at the summary file to conclude that the total number of scaffold bases has increased? That value (sum (bp)) is the total number of bases WITH N's. The number of bases without N's should be either the same as or less than the original total number of bases, since SSPACE tries to merge contigs when they share an overlap (the -n option).

              If you want, I can send you a script which calculates the number of N's in the scaffolds.

              2)
              To estimate the gap size, I use the reads.

              Kind regards, and no problem about the questions,
              Boetsie
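
A check along the lines of the script Boetsie offers can be sketched in a few lines. This is not his script, only a minimal stand-in that counts total bases, N's, and non-N bases in FASTA-formatted text; the function name is made up for this example.

```python
def count_bases(fasta_lines):
    """Return (total, n_count, non_n) over all sequence lines of a FASTA file."""
    total = 0
    n_count = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        total += len(line)
        n_count += line.upper().count("N")
    return total, n_count, total - n_count


# Small in-memory example; a real run would pass an open file handle instead.
sample = [">scaffold1", "ACGTNNNN", "acgtn", ">scaffold2", "AAAA"]
total, n_count, non_n = count_bases(sample)
```

Running it on both the input contigs and the output scaffolds, for example with `count_bases(open("scaffolds.fasta"))` where the file name is a placeholder, shows whether the number of non-N bases really went up.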



              • #22
                Congratulations

                I am using SSPACE and I find this tool very useful and user friendly (unlike Bambus!).

                Thanks!



                • #23
                  Originally posted by gstitan View Post
                  I am using SSPACE and I find this tool very useful and user friendly (unlike Bambus!).

                  Thanks!
                  Thank you for this compliment



                  • #24
                    Hi, boetsie.
                    SSPACE is a very good tool for scaffolding. Thank you for your good work.

                    By the way, how is SSPACE pronounced? "espeis"?



                    • #25
                      Hi, I'm excited to get SSPACE up and running. Unfortunately I'm getting a permission denial when making the directories (line 141). SSPACE is installed on a server in a directory where I don't have write permissions, which I suspect is the problem. Is there a way to direct where the results folders end up? Or is my issue much simpler (and dumber)?



                      • #26
                        Hi themwg,

                        Good that it is working! Unfortunately, you can't specify where the folders end up. The folder structure is generated in your current working directory. Maybe you can turn the problem around: go to the directory where you would like the files/folders to end up and run the program from there. Then specify the full path to the contigs, and also the full paths in the library file for your paired sequences.

                        If this doesn't work, I can make a customised script for you. You can mail me any time.

                        Boetsie



                        • #27
                          the next problem

                          Thanks Boetsie for the quick reply.
                          Sure enough, I get further along if I just point to SSPACE.pl from my own directory. However, I hit a second problem during the "Reading, filtering and converting input sequences" step: it can't write to the single file. Here is the output:

                          =>Fri Feb 11 11:55:38 2011: Reading, filtering and converting input sequences of library '/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I' initiated
                          Can't write to single file -- fatal

                          =>Fri Feb 11 11:55:38 2011: Storing contigs to format for scaffolding

                          LIBRARY /home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I
                          ------------------------------------------------------------

                          =>Fri Feb 11 11:55:44 2011: Building Bowtie index for contigs (tmp.standard_output/subset_contigs.fasta)

                          Bowtie-build error; -1 at /opt/SSPACE-1.1_linux-x86_64/bin/mapWithBowtie.pl line 37.
                          WARNING: No scaffolding, because no reads found on contigs

                          I imagine the Bowtie build error is related to the first one. Any thoughts on why it can't write to the single file (is it merging the two sequence files)? Those files are in FASTQ format from Illumina. They are also both quite large (>10GB). My machine has a meager 44GB of RAM, if any of that is at all relevant here.

                          Thanks!



                          • #28
                            Hi again,

                            I think I know what the problem is. You have a library called "/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I". This is a very strange name for a library. Name it something like "leo95130_I" or "lib1" (without the quotes though). With your current library name, the script will try to create a file containing this library name in the folder 'reads'. The result will be something like:

                            reads/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I.filtered.reads

                            This will surely cause problems (as you noticed). The other error you get is probably caused by the same problem, namely your library name.

                            A line in your library file should look something like:

                            library1 /path-to-file/filename_1.fastq /path-to-file/filename_2.fastq 500 0.25 0

                            If you are unable to generate the library file, you can mail me your current one and I can help you.

                            Kind regards,
                            Boetsie
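
The library-name pitfall above can be caught before running anything. The sketch below parses a library line shaped like the example in this post and rejects a name containing a path separator. It is only an illustration: the `Library` record and its field names are invented here, the fourth and fifth columns are named after the "insert size" and "deviation" values mentioned elsewhere in the thread, and the final column is kept as an opaque string because the thread does not say what it means.

```python
from typing import NamedTuple


class Library(NamedTuple):
    """One line of the library file (field names are assumptions for this sketch)."""
    name: str
    file1: str
    file2: str
    insert_size: int
    insert_stdev: float
    last_field: str  # meaning not stated in the thread, so left uninterpreted


def parse_library_line(line):
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 whitespace-separated columns, got {len(fields)}")
    name = fields[0]
    # The name becomes part of an output file name, so it must not look like a path.
    if "/" in name or "\\" in name:
        raise ValueError(f"library name {name!r} must not contain a path separator")
    return Library(name, fields[1], fields[2], int(fields[3]), float(fields[4]), fields[5])


lib = parse_library_line(
    "library1 /path-to-file/filename_1.fastq /path-to-file/filename_2.fastq 500 0.25 0"
)
```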



                            • #29
                              Hello, I am running into some problems while using SSPACE. I believe it has to do with tmp.alboxf_scaffolds_no_extension/subset_contigs.fasta not being built properly, so my question is how is subset_contigs.fasta built?

                              Thanks!



                              • #30
                                Hi goldenflaw,

                                What kind of problems are you running into?

                                The file you mention is generated by taking a short subset of the contigs. How this is done is explained below (and in the README of SSPACE).

                                Before mapping, the contigs are shortened to reduce the search space for Bowtie: only the edges of the contigs are considered for mapping. Where the edges are cut is determined by the maximal allowed distance, computed from the values the user enters in the library file (insert size and insert standard deviation). The maximal distance is insert_size + (insert_size * insert_stdev). For example, with an insert size of 500 and a deviation of 0.5, the maximal distance is 750. The first 750 bases and the last 750 bases are extracted from the contig sequence, in this case:

                                ------------------------------------------

                                ------------|-----------------|

                                -------------------------------------------
                                750bp------------------------750bp
                                Please do not look at the white stripes in the example. I couldn't get the spacings between the two dashed lines right

                                Kind regards,
                                Boetsie
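
The edge-cutting rule described above can be written out as a short sketch. The formula is the one given in the post; the function names are made up, and the handling of contigs shorter than twice the maximal distance (kept whole here) is an assumption, since the thread does not say what SSPACE does in that case.

```python
def max_distance(insert_size, insert_stdev):
    """Maximal allowed distance: insert_size + (insert_size * insert_stdev)."""
    return int(round(insert_size + insert_size * insert_stdev))


def contig_edges(seq, insert_size, insert_stdev):
    """Return the parts of a contig that would be kept for mapping."""
    d = max_distance(insert_size, insert_stdev)
    if len(seq) <= 2 * d:
        # Assumption: a short contig is kept whole rather than split in two.
        return [seq]
    return [seq[:d], seq[-d:]]


# Example from the post: insert size 500, deviation 0.5, so 750 bases per edge.
contig = "A" * 1000 + "C" * 500 + "G" * 1000
left, right = contig_edges(contig, 500, 0.5)
```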
