This topic is closed.

  • #16
    I do not really get what your intention is.

    In the realm of bioinformatics we are constantly exposed to rather inefficient code in every language. It is not about the 20 times faster that C/C++ code runs in general, but about whether your code runs at all, whether it can be easily adapted and modified to your own needs, and whether it properly exploits the potential of your processing units. And yes, whether your code works cross-platform.

    I have used probably dozens of crap-out-of-the-box C executables. So we are not even close to a situation where we can discuss the superiority of C languages in bioinformatics in general. Specific applications might for sure benefit, but in general we need better programming practices first.

    Comment


    • #17
      Originally posted by mudshark View Post
      I do not really get what your intention is.

      In the realm of bioinformatics we are constantly exposed to rather inefficient code in every language. It is not about the 20 times faster that C/C++ code runs in general, but about whether your code runs at all, whether it can be easily adapted and modified to your own needs, and whether it properly exploits the potential of your processing units. And yes, whether your code works cross-platform.

      I have used probably dozens of crap-out-of-the-box C executables. So we are not even close to a situation where we can discuss the superiority of C languages in bioinformatics in general. Specific applications might for sure benefit, but in general we need better programming practices first.
      I feel the problem with my Java programs is that they get to a point where there are one or more major roadblocks, so I would like the message to get out to program managers that these things will be problems.

      A) They need to be ported, but there is no facility for porting them.
      B) It is too slow for any sophisticated algorithms.
      C) I need to use external C++ libraries, because there are no equivalents in Java (see B), and the interfaces for external code (JNI or system calls), are not portable.

      Throw in the problem of Java being not nice, running in high priority mode and hogging memory.

      Comment


      • #18
        I had a speed increase of about 100x converting some Perl code to Java (mostly because of really fast string manipulation), so it had some use there. But usually the speed increase is less, and converting doesn't really make sense if the code already finishes in a reasonable time.

        Unfortunately, the time taken to write Java code is much longer, and it's easier to introduce unexpected bugs into Java code than into R or Perl (I use warnings/strict mode). So I usually write my code in Perl or R, only switching to Java when I need the speed.

        Comment


        • #19
          Originally posted by gringer View Post
          I had a speed increase of about 100x converting some Perl code to Java (mostly because of really fast string manipulation), so it had some use there. But usually the speed increase is less, and converting doesn't really make sense if the code already finishes in a reasonable time.

          Unfortunately, the time taken to write Java code is much longer, and it's easier to introduce unexpected bugs into Java code than into R or Perl (I use warnings/strict mode). So I usually write my code in Perl or R, only switching to Java when I need the speed.
          That is interesting. I have had to rewrite Perl code for intensive string operations too, such as counting GC content, though I chose C for ease of deployment.
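For readers unfamiliar with the kind of string operation mentioned here, the following is a minimal Java sketch of GC-content counting. It is not the poster's code (which was in C); the class name `GcContent`, the method `gcFraction`, and the choice to ignore non-ACGT symbols are all assumptions made for illustration.

```java
final class GcContent {
    /**
     * Returns the fraction of G/C bases among the A/C/G/T bases in the
     * sequence (case-insensitive). Returns 0.0 if no A/C/G/T base is present.
     */
    static double gcFraction(String sequence) {
        long gc = 0;    // number of G or C bases seen
        long acgt = 0;  // number of unambiguous bases seen
        for (int i = 0; i < sequence.length(); i++) {
            switch (sequence.charAt(i)) {
                case 'G': case 'g': case 'C': case 'c':
                    gc++;
                    acgt++;
                    break;
                case 'A': case 'a': case 'T': case 't':
                    acgt++;
                    break;
                default:
                    // Assumption of this sketch: N, gaps and other symbols are skipped.
                    break;
            }
        }
        return acgt == 0 ? 0.0 : (double) gc / acgt;
    }

    public static void main(String[] args) {
        System.out.println(gcFraction("ATGCGC"));
    }
}
```

A single pass over the characters with no intermediate strings is the sort of loop where a compiled language tends to outpace a regex-heavy script, which may be part of what the posters observed.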

          Comment


          • #20
            Originally posted by rskr View Post
            Can you explain to me what you mean by cross-platform? I was under the impression that C was not considered cross-platform, even though there are probably more C compilers for more platforms than anything, and language supported porting constructs.
            The core C and C++ languages are cross-platform, but the libraries often aren't. Most of the Java applications I come across in this field are GUI programs for desktop use (GATK and Picard are obvious exceptions). They need to run on all the machines a typical lab might use, from the PI's Macbook, to the postdoc's Windows PC and the informatics guy's Linux box. Speed for something like IGV doesn't seem to be an issue. Sure, they might have come up with something prettier if they'd written it in C++ and lovingly hand-ported the code to make full use of the native API's bells and whistles on each platform, but would that really have been the best use of their time? Instead we have a single JAR that looks fine and runs everywhere that matters. It isn't bug-free, but I suspect that isn't the fault of the language!

            The C or C++ source we typically get for command-line NGS tools isn't always terribly cross-platform, especially if your target is Windows (which the developers usually aren't interested in supporting). Calls to POSIX functions are often used liberally. You can probably build these packages under Cygwin, but if you want a 64-bit application you're out of luck without a significant re-write, which a typical end-user isn't in a position to do.
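As a rough illustration of the portability point, here is a small Java sketch that creates, writes and reads a file using only `java.nio.file`, with no platform-specific calls or hard-coded path separators. The class name `PortableIo`, its methods and the file names are hypothetical and chosen only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class PortableIo {
    /** Writes the lines to dir/name and reads them back. */
    static List<String> roundTrip(Path dir, String name, List<String> lines) {
        try {
            // resolve() joins path elements with the separator of the host OS
            Path file = dir.resolve(name);
            Files.write(file, lines, StandardCharsets.UTF_8);
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the round trip in a fresh temporary directory. */
    static List<String> demo() {
        try {
            Path tmp = Files.createTempDirectory("portable-io-demo");
            return roundTrip(tmp, "reads.txt", Arrays.asList("ACGT", "GGCC"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The same source should run unchanged on Windows, macOS and Linux, whereas an equivalent C program written against POSIX directory and file calls would typically need porting work for a native Windows build.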

            Comment


            • #21
              Originally posted by RDW View Post
              The core C and C++ languages are cross-platform, but the libraries often aren't. Most of the Java applications I come across in this field are GUI programs for desktop use (GATK and Picard are obvious exceptions). They need to run on all the machines a typical lab might use, from the PI's Macbook, to the postdoc's Windows PC and the informatics guy's Linux box. Speed for something like IGV doesn't seem to be an issue. Sure, they might have come up with something prettier if they'd written it in C++ and lovingly hand-ported the code to make full use of the native API's bells and whistles on each platform, but would that really have been the best use of their time? Instead we have a single JAR that looks fine and runs everywhere that matters. It isn't bug-free, but I suspect that isn't the fault of the language!

              The C or C++ source we typically get for command-line NGS tools isn't always terribly cross-platform, especially if your target is Windows (which the developers usually aren't interested in supporting). Calls to POSIX functions are often used liberally. You can probably build these packages under Cygwin, but if you want a 64-bit application you're out of luck without a significant re-write, which a typical end-user isn't in a position to do.
              I must admit the only bioinformatics platform I use on Windows is PuTTY or Cygwin with X11 forwarding, so cross-platform nonsense is not an issue. If I need to draw a chart for a paper, gnuplot will look much better than anything the Swing libraries will produce. There is probably a reason the real software that bioinformaticians use daily is written in C or C++ and runs on Linux, and I don't see what Java brings to that except a configuration nightmare: reconfiguring the servers so that all the paths are where Java expects them (or even worse, installing Hadoop) so that the programmers can use their trivial GUI language for high-throughput programming.

              Comment


              • #22
                A fundamental fault in your argument is to claim Java is slow:

                Originally posted by rskr
                C and C++ are faster, by up to 20 times in neutral language comparison benchmarks.
                It does not help to say "up to 20 times" given that most of the time Java is only twice as slow, as most other benchmarks agree. There are a couple of things that Java is particularly bad at, but those are rare.

                I do not know how "cross-platform" is defined, but what I, as well as most others, see is that it is much easier to launch a Java program on a different OS than to compile a C/C++ program from source.

                EDIT: What I most dislike about Java is that it takes too much memory.
                Last edited by lh3; 09-29-2011, 04:21 PM.

                Comment


                • #23
                  Originally posted by lh3 View Post
                  A fundamental fault in your argument is to claim Java is slow:



                  It does not help to say "up to 20 times" given that most of the time Java is only twice as slow, as most other benchmarks agree. There are a couple of things that Java is particularly bad at, but those are rare.

                  I do not know how "cross-platform" is defined, but what I, as well as most others, see is that it is much easier to launch a Java program on a different OS than to compile a C/C++ program from source.

                  EDIT: What I most dislike about Java is that it takes too much memory.
                  Well here is the link. There it is slower on Ubuntu for sure. Personally, for me a program taking two to three times as long to complete a task is like taking a month instead of a week, and it is the difference between getting something done in memory on one computer and having to use several terabytes across a compute cluster.

                  Comment


                  • #24
                    Originally posted by rskr View Post
                    I feel the problem with my Java programs is that they get to a point where there are one or more major roadblocks, so I would like the message to get out to program managers that these things will be problems.
                    Then either learn something or find someone who knows. Chances are you can't afford them though, for java or any other language.

                    Originally posted by rskr View Post
                    A) They need to be ported, but there is no facility for porting them.
                    Almost a non-issue in java, a potentially serious problem in most alternatives.

                    B) It is too slow for any sophisticated algorithms.
                    If it's more than a factor of 2, maybe 3, vs C or C++, you're doing it wrong. Which beats most languages, certainly perl/python (as the most typical bioinformatics languages).

                    C) I need to use external C++ libraries, because there are no equivalents in Java (see B), and the interfaces for external code (JNI or system calls), are not portable.
                    JNI bridges to external libraries are source-portable, if done properly. Using system calls which exist on only one platform is rarely necessary, and by definition makes your code non-portable in any language.

                    Throw in the problem of Java being not nice
                    Meaningless.

                    running in high priority mode
                    Meaningless.

                    and hogging memory.
                    The closest you've come to a valid point. Congratulations.

                    Comment


                    • #25
                      Originally posted by rskr View Post
                      I must admit the only bioinformatics platform I use on Windows is PuTTY or Cygwin with X11 forwarding, so cross-platform nonsense is not an issue.
                      So why are you complaining about it? Or claiming to know anything about it?

                      Originally posted by rskr View Post
                      If I need to draw a chart for a paper, gnuplot will look much better than anything the Swing libraries will produce.
                      Then perhaps you should try an actual charting library for java, not a GUI toolkit.

                      Originally posted by rskr View Post
                      There is probably a reason the real software that bioinformaticians use daily is written in C or C++ and runs on Linux, and I don't see what Java brings to that
                      Clearly you don't, but where does the fault for that lie? C / C++ are fine options, but clearly lie further towards the performance end of the performance / convenience scale. Likewise, there's nothing wrong with perl/python for quickly creating scripts for relatively low complexity tasks.

                      Originally posted by rskr View Post
                      except a configuration nightmare, reconfiguring the servers so that all the paths are where Java expects them (or even worse, installing Hadoop)
                      What are you on about? Java doesn't 'expect' much of anything anywhere - it relies on 4 core system libraries and one that it ships with.

                      Originally posted by rskr View Post
                      so that the programmers can use their trivial GUI language for high-throughput programming.
                      Indeed. You don't even realise that Java is primarily a server-side language, because of its advantages for building large-scale, maintainable software where resource usage is not the absolute top priority. Then again, I would expect Java or similar languages to become more prevalent in bioinformatics as complexity and longevity of codebases become increasingly important.

                      And BTW, the only thing trivial here is your knowledge of the topic.

                      Comment


                      • #26
                        Originally posted by rskr View Post
                        Personally, for me a program taking two to three times as long to complete a task is like taking a month instead of a week, and it is the difference between getting something done in memory on one computer and having to use several terabytes across a compute cluster.
                        The time saved by using java for anything with >10K lines usually enables much more time to be devoted to optimization of that critical 1% of the codebase which takes 99% of the runtime.

                        And if you really need the fastest implementation, port that nugget to C and preferably look at using SIMD or CUDA (if applicable) while leaving the bulk of the code in something much faster to develop.
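Before porting anything, the hot spot has to be identified. The sketch below shows one crude way to time a candidate method in Java; the class `HotSpotTimer`, the helper `timeNanos` and the `hammingDistance` workload are hypothetical names used only for illustration, and a single `System.nanoTime()` measurement is no substitute for a real profiler or a benchmark harness such as JMH, since JIT warm-up can distort the first runs.

```java
final class HotSpotTimer {
    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Hypothetical stand-in for an expensive inner loop: mismatches between two equal-length strings. */
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Build two long synthetic sequences that differ at every fourth position.
        StringBuilder x = new StringBuilder();
        StringBuilder y = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            x.append("ACGT".charAt(i % 4));
            y.append("ACGA".charAt(i % 4));
        }
        String a = x.toString();
        String b = y.toString();
        long nanos = timeNanos(() -> hammingDistance(a, b));
        System.out.println("hammingDistance took " + nanos + " ns");
    }
}
```

Only once a method like this is shown to dominate the runtime would it be a candidate for the C, SIMD or CUDA port suggested above, with the rest of the program left as it is.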

                        Comment


                        • #27
                          Purpose of Java in Bioinformatics (original post topic title)

                          At least in the clinical and corporate arena, many end users and PIs have no or minimal UNIX experience, and their corporate IT policies limit them to a windows workstation or laptop on their desk. They do not care about program code issues or anything else, they just want to visualize and manipulate their summary data that the core service produced and sent to them. Ultimately, they want to interpret their results and produce summary figures and tables for publications or reports and they do not want to learn any more than absolutely minimally necessary about computers or programming.

                          Bioinformatics encompasses so much more than just NGS and UNIX cluster-based apps. When I think of the list of Java desktop apps used in my past and current places of work by one or more people who likely do not even know we have UNIX servers and a cluster, it gets respectably long:

                          Seqmonk, IGV, GenomeView, MeV4, GSEA, GenePattern, various caBIG tools, even old JavaTreeView (which I still actually like for heatmaps). Heck, even JMP Genomics leverages Java. The list is long, and the programming choice seems usually to have been made to provide trivial-to-install, standalone end-user applications for Windows, Mac and Linux, for users who may well have very limited IT knowledge or experience (and little desire or time to expand on that part of their skill set).

                          It just seems to me that the original post was taking a very narrow view of bioinformatics and the array of users who work with genetic and genomic data. The "Purpose" of bioinformatics does not mean merely meeting the desires of programmers, since ultimately we all (most of us anyway) serve PIs and investigators who in no way consider themselves bioinformaticians, but do need to be able to work with terminal or summary data, be it NGS, microarray, or whatever.

                          So, from my experience over the years, to answer "Does java contribute anything to Bioinformatics,..."? Yes, it does and has, in terms of many highly useful, and heavily used, tools. You may have chosen to code those tools differently, but the reality is they were done in Java and met and continue to meet the needs of many investigators. If that is not a valuable contribution, what is it?
                          Michael Black, Ph.D.
                          ScitoVation LLC. RTP, N.C.

                          Comment


                          • #28
                            Originally posted by mbblack View Post
                            They do not care about program code issues or anything else, they just want to visualize and manipulate their summary data that the core service produced and sent to them.
                            Java is good for manipulating data, it is perfect for PI types who like to white wash things and cheap/poor/dumb coders. I am afraid you aren't making the Java community look very good.

                            Comment


                            • #29
                              Originally posted by rskr View Post
                              Java is good for manipulating data, it is perfect for PI types who like to white wash things and cheap/poor/dumb coders. I am afraid you aren't making the Java community look very good.
                              Well, that I will say is a purely trollish post, twisting my words to imply meaning that simply was not there.

                              And I wonder how the folks in app development at the Broad Institute, the Dana-Farber Cancer Institute (and previously TIGR), and others feel about being called "cheap, poor and dumb". Or how the PIs I know and have known, with hundreds of peer-reviewed publications to their names and countless awards and recognition by their peers in their respective scientific fields, feel about being described as folks who simply "white wash" their results.

                              I'm done with this thread.
                              Last edited by mbblack; 09-30-2011, 08:50 AM.
                              Michael Black, Ph.D.
                              ScitoVation LLC. RTP, N.C.

                              Comment


                              • #30
                                Originally posted by mbblack View Post
                                Well, that I will say is a purely trollish post, twisting my words to imply meaning that simply was not there.

                                And I wonder how the folks in app development at the Broad Institute, the Dana-Farber Cancer Institute (and previously TIGR), and others feel about being called "cheap, poor and dumb". Or how the PIs I know and have known, with hundreds of peer-reviewed publications to their names and countless awards and recognition by their peers in their respective scientific fields, feel about being described as folks who simply "white wash" their results.

                                I'm done with this thread.

                                You can't be both noble, well-meaning investigators and cheap sell-outs at the same time, can you? It seems that the fact that Java programmers with little skill can create simple GUIs and programs so that people with even fewer computational skills can do computation keeps coming up in this thread.

                                Comment
