  • #16
    Hi James,
    Thanks for your quick reply. I have a question regarding the template display: could you post the colour code for the reads in the display?

    regards

    Brian


    • #17
      re: libg2c0 on Ubuntu:

      Originally posted by torrey:
      Update:
      After installing the libg2c0 package from the package manager and then http://packages.debian.org/lenny/amd...dc++5/download, everything seems to be up and running on Ubuntu.

      On Ubuntu 9.04 (workstation version on i386), simply installing libg2c0 works. Thanks for the tip.

      dk


      • #18
        @jkbonfield

        Dear James,

        Would it be possible to also put some test database(s) with sequences from various sequencing platforms? Something small just to play with?

        Best,

        dk
        Last edited by darked89; 06-19-2009, 01:46 AM.


        • #19
          Originally posted by coldturkey:
          Hi James,
          Thanks for your quick reply. I have a question regarding the template display: could you post the colour code for the reads in the display?

          Certainly.

          The grey-scale levels indicate the mapping quality for the template, from dark grey (bad) to white (good). Using the menu towards the top left, you can control whether the colour is based on the average of the two end sequences, the minimum of the two, or the maximum of the two. Additionally, the Filter button allows you to filter out poor-quality (or, perversely, good-quality) mapping reads.

          However this is only the base colour. It will be changed to one of the following if it fits into another category:

          Blue: single ended read. Or more specifically, only *this* end is in the Gap5 database. (That doesn't necessarily mean it was a single ended library.)

          Red: inconsistent orientation, i.e. instead of ---> <--- it finds ---> --->, <--- <--- or <--- --->. This needs some more work, as certain library construction types (e.g. 454) will yield ---> ---> as standard. My plan here is to auto-analyse each library and attempt to work out its normal orientation; anything abnormal will then be labelled red.

          Orange: paired, but the other end isn't visible. It isn't easy to tell whether the other end is in another contig (most likely) or whether it is a long way away in this contig, in other words significantly off the edge of the screen. The ">>Acc" button (accurate mode) attempts to work out this distinction correctly, but it's slow as it has to read more data in order to figure out the correct location.

          The reason for this is that Gap5, being an editor, doesn't actually track *where* the other end is, only the record number. If it was found in the same range query then we know they form a pair, if not we require more searching to figure this out. The reason I don't cache such data is that it would mean contig joining or contig breaking suddenly requires millions of edits (to check whether the intra-contig or inter-contig spanning status needs changing).

          Having said that, I just noticed the >>Acc button crashed Gap5 for me on one of my test databases. Oops!
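The template colour rules described above can be summarised in a short Python sketch. This is not Gap5's actual code: the `Template` fields, the function names, the direct use of mapping quality as a 0-255 grey level, and the order in which the blue/orange/red checks are applied are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    """Hypothetical record; the field names are illustrative, not Gap5's."""
    mapq_this: int                # mapping quality of this end
    mapq_other: Optional[int]     # None if the other end is not in the database
    other_end_visible: bool       # other end was found in the same range query
    consistent_orientation: bool  # True for the expected ---> <--- layout


def base_grey(t: Template, mode: str = "average") -> int:
    """Grey level from 0 (dark, bad) to 255 (white, good); scaling is assumed."""
    ends = [t.mapq_this] if t.mapq_other is None else [t.mapq_this, t.mapq_other]
    if mode == "minimum":
        quality = min(ends)
    elif mode == "maximum":
        quality = max(ends)
    else:
        quality = sum(ends) / len(ends)
    return max(0, min(255, round(quality)))


def template_colour(t: Template, mode: str = "average") -> str:
    """Return the special colour if one applies, else the grey base colour."""
    if t.mapq_other is None:
        return "blue"    # only this end is in the database
    if not t.other_end_visible:
        return "orange"  # paired, but the other end is not in view
    if not t.consistent_orientation:
        return "red"     # orientation differs from the expected layout
    return f"grey{base_grey(t, mode)}"
```

For example, `template_colour(Template(60, 20, True, True), "minimum")` gives a grey level based on the weaker of the two ends, while the same template with `other_end_visible=False` comes back as orange.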

          For reads (2nd menu from left; set to "reads") the colours simply indicate forward vs reverse reads. It's not possible in all file formats though to distinguish which is which so sometimes it's simply a case of the "left" one being one colour while the "right" one is another. The main use of this is simply to see the extent that the physical sequence data extends along the insert.

          Finally, I'd recommend using the Y-spread function when in Template Size mode (left-most menubutton). This mode is the default template display view, where the Y coordinate is governed by the size of the insert, so structural variations stand out. However, you can't easily tell how many lines are stacked up on the same pixel row: it may be an isolated template or it may be a whole stack of templates with very similar start/end positions. The Y-spread slider adds a bit of pseudo-randomness to the Y coordinate, making it easier to distinguish between one template and lots of templates stacked up at the same coordinates.
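As a rough illustration of the Y-spread idea, here is a minimal Python sketch. The function name `spread_y`, the uniform jitter and the `y_spread` parameter are assumptions; the post does not say how Gap5 actually computes the offset.

```python
import random


def spread_y(insert_size: int, y_spread: float, rng: random.Random) -> float:
    """Y coordinate for a template: its insert size plus a small random offset.

    With y_spread == 0 every template of the same size lands on the same row;
    a larger value scatters them so a stack becomes visible as a band.
    """
    return insert_size + rng.uniform(-y_spread, y_spread)


# Five templates with the same 3000 bp insert size, scattered by up to 25 units.
rng = random.Random(0)  # seeded only so the sketch is repeatable
stack = [spread_y(3000, 25.0, rng) for _ in range(5)]
```

A single isolated template still draws as one line, whereas a pile of near-identical templates turns into a visibly thicker band around the same Y value.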

          James


          • #20
            Originally posted by darked89:
            Would it be possible to also put some test database(s) with sequences from various sequencing platforms? Something small just to play with?

            I threw together something *very* small: a Sanger human BAC clone (capillary sequencing) and a thin slice from the mitochondrial genome out of one of the 1000 Genomes submissions.

            Code:
            -rw-r--r--  1 badger hsg 5402975 Jun 19 16:59 DJ105K17.baf
            -rw-r--r--  1 badger hsg 1566608 Jun 19 16:59 DJ105K17_g5
            -rw-r--r--  1 badger hsg    8512 Jun 19 16:59 DJ105K17_g5.aux
            -rw-r--r--  1 badger hsg  348101 Jun 19 16:59 mt.bam
            -rw-r--r--  1 badger hsg  227824 Jun 19 16:59 mt_g5
            -rw-r--r--  1 badger hsg    1120 Jun 19 16:59 mt_g5.aux
            The files are in ftp://ftp.sanger.ac.uk/pub/badger/tmp/

            They're produced using:

            Code:
            tg_index -o DJ105K17_g5 -B DJ105K17.baf
            tg_index -o mt_g5 -b mt.bam
            Obviously I could put bigger data sets there, but it's hard to find something the "right size". (I was looking for S. suis, a nice 2Mb organism, but failed dismally to find our original test set for this, and I don't want the hassle of figuring out what's publicly released and what isn't.)

            If I get a more appropriate data set I can place that there too. It'd be good to get some standard "moderate sized" test sets for developers to experiment with and to compare/contrast tools on, preferably with SNPs, structural variations, etc.
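For anyone scripting the indexing step, the two `tg_index` commands above could be generated with something like the following Python sketch. It uses only the flags that appear in those commands (`-o`, `-B` for BAF input, `-b` for BAM input); the helper name `tg_index_command` and the `<name>_g5` output naming are simply copied from the example and are not an official convention.

```python
from pathlib import Path
from typing import List

# Input-format flags taken from the two commands shown in the post;
# no other tg_index options are assumed here.
FORMAT_FLAGS = {".baf": "-B", ".bam": "-b"}


def tg_index_command(input_file: str) -> List[str]:
    """Build the argument list for indexing one BAF or BAM file."""
    path = Path(input_file)
    try:
        flag = FORMAT_FLAGS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported input type: {path.suffix!r}") from None
    output_db = f"{path.stem}_g5"  # same naming as DJ105K17_g5 and mt_g5
    return ["tg_index", "-o", output_db, flag, str(path)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, assuming `tg_index` is on your `PATH`.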

            James


            • #21
              Primarily to address the issues of unwanted library dependencies I've rebuilt Gap5, now version 1.2.2. I also fixed a few other bugs, but I'm unsure if this fixes the odd ACE file crash people have reported.

              I also took the opportunity to remove some of the very common bundled libraries, which should remove some of the png warnings. So the third party requirements now should be: zlib, png, X11, libcurl, C/math library.

              If you're not having problems with the current code it's not important to download this new version.

              As before, binaries are at sourceforge:

              A fully developed set of DNA sequence assembly (Gap4 and Gap5), editing and analysis tools (Spin) for Unix, Linux, MacOSX and MS Windows.


              If there's a pressing need for other platforms let me know. I can't guarantee anything as it takes time, especially on the machines I don't own myself, but it's good to get feedback on user requirements.

              James

              Gap5 1.2.2, June 22nd 2009
              ===========================

              Minor bug fix release.

              Minor changes
              -------------


              * Added a Hide Annotation setting in the contig editor. This is
              also bound to control-Q (as per Gap4). This allows us a quick
              way to see the quality values underneath a tag.

              * Annotation contents are now visible in the editor information
              line, but the "Tag editor" itself still hasn't been ported
              over from Gap4.

              * Changed SAM reading of LB tags to use TG tags. In practice
              libraries rarely seem to appear directly in the sequence
              lines, but instead occur in read-groups. In theory we should
              then check the main read-group header for LB links from there,
              but for now treating a library as a group (eg a run or a lane)
              is perhaps more useful than a genuine library grouping.

              * Lots of internal code tweaking to support newer versions of
              Tcl/Tk (tested with 8.6b1). For now we still ship with 8.4,
              but this is the first stage of making the code more portable
              and easier to build from source.

              * Extra slider (there's too many I know - it's overdue a
              redesign) in the template display. This governs the "stacking"
              y-mode, controlling where the data gets binned into groups. Eg
              every 1k so that templates 0-1000 are stacked together,
              1000-2000bp are below, 2000-3000 below those, etc.
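The "stacking" y-mode binning in the last item above can be sketched in a few lines of Python. This is only an illustration of grouping templates by insert size; the function name `stack_bins` is made up, and treating each bin as half-open (so a 1000bp template falls in the second group) is an assumption, since the notes don't say which side the boundaries belong to.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def stack_bins(insert_sizes: Iterable[int], bin_width: int = 1000) -> Dict[int, List[int]]:
    """Group template insert sizes into bins of bin_width bases.

    Bin 0 holds sizes 0..bin_width-1, bin 1 the next bin_width sizes, etc.
    Templates in the same bin would be drawn stacked together.
    """
    bins: Dict[int, List[int]] = defaultdict(list)
    for size in insert_sizes:
        bins[size // bin_width].append(size)
    return dict(bins)
```

Moving the slider would correspond to changing `bin_width`: a smaller width gives more, thinner stacks.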

              Bug fixes
              ---------

              * Removed a memory corruption when displaying long sequence
              tags (> 1Kb).

              * Removed issues with diagonal lines appearing at the start
              of the library insert-size distribution plot.

              * Removed unneeded dependency on C++ (libstdc++.so.5) and
              FORTRAN (libg2c0.so) libraries.

              * The contig selector window now internally uses more 64-bit
              integers. This fixes issues where the total contig length grew
              beyond 2Gb.

              * Initialised more elements in some of the data structures.
              Previously some parts of range_t struct were
              uninitialised. Hopefully this resolves some of the random ACE
              related errors (unknown).
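To see why the 2Gb limit in the contig selector fix matters, here is a small Python sketch of summing contig lengths in a fixed-width integer. It uses `ctypes` only to imitate C integer widths; the contig lengths are invented, and which Gap5 variables were actually widened is not stated in the notes.

```python
import ctypes


def total_length(contig_lengths, int_type):
    """Sum lengths while storing the running total in a fixed-width C integer."""
    total = int_type(0)
    for length in contig_lengths:
        # ctypes integers do no overflow checking, so the value wraps
        # around exactly as a C variable of that width would.
        total = int_type(total.value + length)
    return total.value


lengths = [1_500_000_000, 1_000_000_000]            # 2.5 Gb in total
wrapped = total_length(lengths, ctypes.c_int32)     # -1794967296
correct = total_length(lengths, ctypes.c_int64)     # 2500000000
```

Once the running total passes 2^31 - 1 the 32-bit version goes negative, which is the kind of symptom a switch to 64-bit integers removes.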


              • #22
                Hi James,

                I just stumbled over this thread today. We were eagerly awaiting the Gap5 release and will probably test it next week.
                I just have one question: how far are the editing capabilities integrated at this point, and when will they be fully usable? This is the most important feature for us.

                Cheers,
                Andreas


                • #23
                  To be honest editing has taken a backseat recently as most people have been after a decent viewer.

                  However, it's always been my intention to replace Gap4's editing capabilities, and so Gap5 already has basic editing, contig joining and contig breaking (both at O(log N) complexity instead of the O(N) used in Gap4). Unfortunately, it appears that in the recent updates I managed to totally bust joining/breaking, so I'll start work on fixing those next week when I get back to the office.

                  James


                  • #24
                    Ok, great. Thanks for the fast reply.
                    I'll get back to you if we find bugs.

                    Andreas


                    • #25
                      Hi James,

                      Sorry for replying so late.
                      I'm trying to convert a gap4 database to the gap5 format as described in the README. However, caf2baf seems to depend on a Caftools.pm module in the pipe:
                      gap2caf -project Database/Lauf1+2 -version A | caf2baf > test/Lauf1+2.baf

                      Sadly, I'm not able to find that module, neither in the gap5 distribution nor in caftools-2.0.2. Any ideas where to look?

                      Thanks in advance,
                      Andreas


                      • #26
                        Odd that Caftools.pm wasn't placed in the caftools package, but I found a copy in the Sanger miniphrap2gap distribution. See:

                        ftp://ftp.sanger.ac.uk/pub/PRODUCTIO...iphrap2gap.tgz

                        For what it's worth though, Gap4 is typically better at editing and displaying gap4 databases than Gap5 as it's more complete (albeit slow).

                        James

                        PS. In new heights of complexity we recently managed to get a 1.9 billion read Gap5 database! It did require a bit of tinkering with the source to build it though, so as always there's a lot more to do. The main issue is with tg_index, which even with tweaks took about 16 hours to build the DB.


                        • #27
                          I am having trouble installing gap5 on a 64 bit Gentoo server... configure is unable to use the installed tklib version, present at /usr/lib/libtk.so . I tried
                          Code:
                          -with-tklib=/usr/lib
                          and
                          -with-tklib=/usr
                          as well as not using the --with-tklib option at all.

                          All three attempts give the following error message:

                          Code:
                          checking for Tcl configuration... found /usr/lib/tclConfig.sh
                          checking for existence of /usr/lib/tclConfig.sh... loading
                          checking for Tk configuration... found /usr/lib/tkConfig.sh
                          checking for existence of /usr/lib/tkConfig.sh... loading
                          checking for Tcl public headers... /usr/include
                          checking for Tcl private include files... Using srcdir found in tclConfig.sh: /usr/lib64/tcl8.4/include
                          checking for Tk public headers... /usr/include
                          checking for Tk private include files... Using srcdir found in tkConfig.sh: /usr/lib64/tk8.4/include
                          checking tklib directory... no
                          configure: error: Abort: no tklib package found, use --with-tklib=DIR


                          • #28
                            I just installed the gap5-1.2.2-linux-x86_64.tar.gz and it works with no warnings and errors. Nice.

                            Martin


                            • #29
                              Originally posted by greigite:
                              I am having trouble installing gap5 on a 64 bit Gentoo server... configure is unable to use the installed tklib version, present at /usr/lib/libtk.so .

                              Hmm, is there not a /usr/lib/tk8.4.so (or /usr/lib/tk8.5.so)? The version number is important, and appears to be how all systems package up their tcl/tk distributions (well, until Gentoo perhaps!). I'm just using the supplied tcl.m4 for my autoconf detection too, so I'm guessing other packages relying on tcl/tk will fail to compile cleanly on this system as well.

                              What Gentoo version is this? I may be able to find a vmplayer image and give it a test myself.

                              However, have you tried the prebuilt binaries too? They're not as new, but the change to 2.0 was mainly a major coding / file layout reshuffle.

                              James


                              • #30
                                staden-1-7-1b fails - libssl.so.0.9.7 not found

                                I get this error trying to run pregap4 from staden-1-7-1b:

                                Code:
                                staden-linux-x86_64-1-7-1b/linux-x86_64-bin$ ./pregap4
                                stash: error while loading shared libraries: libssl.so.0.9.7: cannot open shared object file: No such file or directory
                                I saw the same error running gap5 1.2.0 - but an upgrade to gap5 1.2.2 fixed that.
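A quick way to list every shared library a binary cannot resolve (rather than hitting them one at a time at start-up) is to scan the output of `ldd` for "not found" entries. Below is a minimal Python sketch of that parsing step; the function name `missing_libraries` is made up, and the sample text is hand-written in the usual `ldd` layout, not output captured from the system above.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved entries look like "name => /path (address)";
        # unresolved ones look like "name => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hand-written sample for illustration only.
sample = (
    "\tlibssl.so.0.9.7 => not found\n"
    "\tlibc.so.6 => /lib/libc.so.6 (0x00007f0000000000)\n"
)
unresolved = missing_libraries(sample)
```

In practice the text would come from running `ldd` on the binary named in the error message (here `stash`), for example via `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`.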

                                M
