
  • quinlana
    replied
    Originally posted by Simon Anders
    Sorry, but for the benefit of these "young programmers" I have to disagree strongly.

    Basically, you should distinguish two cases. (a) A young student aspires to become a professional in (scientific or other) software development. (b) A scientist who already works in research (or is studying biology, not CS) wants to broaden his skills in order to perform some bioinformatics analyses himself.

    Jiaco's advice is well suited for case (a). There are many professional developers with a CS degree but without an understanding of what is going on under the hood of a computer. These usually come from universities which kicked C out of the curriculum and I share Jiaco's frustration about this.

    However, most readers here will fall into case (b). They don't want to be able to replace a fully qualified computer scientist. Rather, they already have a qualification, and that is biology or biotech engineering.

    Hence, I fully agree with the emphasis that was put in this thread on scripting languages, especially Python.

    They allow you to get a job done fast, and they are much easier to learn.

    What has not yet been mentioned here is the fundamental trade-off between compiled languages and scripting languages, namely runtime speed versus development speed: A developer experienced in both languages might need half a day to code something in Python and two days to get the same job done in C. However, the Python program may take, say, five minutes to run, while the C program needs only half a minute. Only if you plan to run the program very often will you get back your investment in development time from having to wait less for the program to run.

    That is not to say that there are not a lot of problems in bioinformatics that require strong C/C++ and computer science theory skills but these skills are not something acquired within a few weeks or months. (I would be unemployed if all biologists were experts in computer science, too.)

    Simon
    Simon is dead-on here. Ferraris aren't wise choices for trips to the grocery just as my uncle's Prius won't win Le Mans. I use C++ (and sometimes C) for the races and Python for getting groceries, daily work and medium-size applications/prototypes. The two play very nicely and the general programming concepts are transferable. In one day you can get over the whitespace issue in Python and in two more you can get past the extra bit of work that one must do for regexes and multi-level hashes/dictionaries relative to Perl. Then learn iterators, comprehensions and what is and isn't mutable and after that, it's smooth sailing.
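
    For readers coming from Perl, the handful of Python idioms listed above might look roughly like the sketch below. The read-ID pattern, the variable names and the sample strings are made up for illustration; only the standard library is used.

```python
import re
from collections import defaultdict

# Regex: compile a pattern object, then call methods on it
# (Perl would match inline with =~).
read_id = re.compile(r"^(\w+):(\d+)$")  # hypothetical "chrom:position" format
match = read_id.match("chr1:12345")
chrom, pos = match.group(1), int(match.group(2))

# Multi-level dictionary, roughly Perl's $counts{$chrom}{$strand}++.
counts = defaultdict(lambda: defaultdict(int))
counts[chrom]["+"] += 1

# List comprehension: build a new list in one expression.
reads = ["ACGT", "AC", "ACGTAC"]
lengths = [len(r) for r in reads]

# Generator expression: a lazy iterator, consumed here by sum().
total_bases = sum(len(r) for r in reads)

# Mutability: lists can be changed in place, and every name bound to
# the same list sees the change.
bases = ["A", "C"]
alias = bases
alias.append("G")  # bases is now ["A", "C", "G"] as well

# Tuples (and strings) cannot be changed in place.
frozen = ("A", "C")
try:
    frozen[0] = "T"
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True
```

    The aliasing of `bases` and `alias` is the part that tends to surprise newcomers: assignment binds a second name to the same list, it does not copy it.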

    In closing, http://xkcd.com/353/


  • Simon Anders
    replied
    Originally posted by jiaco
    First off, I do not have anything against all the shiny new languages and IDEs and such, but if a young programmer stumbles across this thread and takes to heart what people are saying, I feel obligated to input my opinion.
    Sorry, but for the benefit of these "young programmers" I have to disagree strongly.

    Basically, you should distinguish two cases. (a) A young student aspires to become a professional in (scientific or other) software development. (b) A scientist who already works in research (or is studying biology, not CS) wants to broaden his skills in order to perform some bioinformatics analyses himself.

    Jiaco's advice is well suited for case (a). There are many professional developers with a CS degree but without an understanding of what is going on under the hood of a computer. These usually come from universities which kicked C out of the curriculum and I share Jiaco's frustration about this.

    However, most readers here will fall into case (b). They don't want to be able to replace a fully qualified computer scientist. Rather, they already have a qualification, and that is biology or biotech engineering.

    Hence, I fully agree with the emphasis that was put in this thread on scripting languages, especially Python.

    They allow you to get a job done fast, and they are much easier to learn.

    What has not yet been mentioned here is the fundamental trade-off between compiled languages and scripting languages, namely runtime speed versus development speed: A developer experienced in both languages might need half a day to code something in Python and two days to get the same job done in C. However, the Python program may take, say, five minutes to run, while the C program needs only half a minute. Only if you plan to run the program very often will you get back your investment in development time from having to wait less for the program to run.
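
    The break-even point for these example figures can be worked out in a few lines of Python. This is only a sketch: the function name and the eight-hour working day are assumptions made for illustration, and the development and run times are the rough figures from the paragraph above.

```python
HOURS_PER_DAY = 8  # assumption: a "day" of development is 8 working hours


def break_even_runs(python_dev_hours, c_dev_hours,
                    python_run_minutes, c_run_minutes):
    """Number of runs after which the extra C development time is repaid."""
    extra_dev_minutes = (c_dev_hours - python_dev_hours) * 60
    saved_per_run = python_run_minutes - c_run_minutes
    if saved_per_run <= 0:
        raise ValueError("the C program must run faster for a break-even to exist")
    return extra_dev_minutes / saved_per_run


runs = break_even_runs(
    python_dev_hours=0.5 * HOURS_PER_DAY,  # half a day in Python
    c_dev_hours=2 * HOURS_PER_DAY,         # two days in C
    python_run_minutes=5.0,
    c_run_minutes=0.5,
)
print(runs)  # 160.0
```

    Under these assumptions the C version only pays off after about 160 runs; counting only the time spent waiting for results, anything run less often is cheaper in Python.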

    That is not to say that there are not a lot of problems in bioinformatics that require strong C/C++ and computer science theory skills but these skills are not something acquired within a few weeks or months. (I would be unemployed if all biologists were experts in computer science, too.)

    Simon
    Last edited by Simon Anders; 05-18-2010, 02:44 PM. Reason: slight rewording


  • jiaco
    replied
    First off, I do not have anything against all the shiny new languages and IDEs and such, but if a young programmer stumbles across this thread and takes to heart what people are saying, I feel obligated to input my opinion.

    Learn C and bash and the most basic stuff first. LEARN vi as your IDE and your word processor and your only way of knowing how to enter text. Understand how to log into a machine with the most basic of linux available and to actually do something functional to bring it back to life. There will be times when there is no python, no jvm, no eclipse. If you cannot function in such an environment then you are shooting yourself in the foot.

    The compiler is your friend. You can spend loads of time writing code. If you learn stellar white space habits, your code is readable and you can be fairly confident of what you have written. Then having the compiler pass over it before debugging is a great way to catch stupid and serious errors without wasting more time debugging.

    While I now use exclusively Qt (C++), I still force myself to get dirty in C just to keep it fresh. Plus bash and the history command should be your best friends. They basically record your actions, give you an easy way to script up a pipeline and serve as a form of documentation. Once you have your script, be sure to comment it with database versions, and any other relevant info that may change in the future.

    Best of luck

    :wq


  • martian_bob
    replied
    I got my Ph.D. in computer science as opposed to anything biological, so here's my two cents from that point of view...

    - If you're writing for computational speed, go with C or C++
    - If you're writing a GUI, go with Java
    - If you just need to get it done and never look at it again, go with Perl
    - If you're setting up a pipeline, especially one dealing with converting data formats, go with Python

    People tend to advertise jobs looking for C++ or Java when they want folks to write code that'll eventually be released for other people to use as a stand-alone tool.

    I do all of my analysis using Python to massage and analyze data for tools like Bowtie and genome browsers.


  • zbjorn
    replied
    I use a handful of languages. Here goes.

    Mathematica is my absolute favorite because of its versatility, high performance, incorporation of high level functions, incredible documentation and useful interactive front end. I'm also the only biologist I know who uses it, though I know the Lawrence National Labs at least prototypes code in it.

    I rarely use perl. It is easy to be sloppy in it and I don't like that about it, but it is nice for some basic scripts (e.g. rearranging data). Python is about the same in terms of functionality, and I like its syntax better.

    Matlab is so clunky and poorly documented that I don't think it's worth the trouble.

    R, dislike syntax structure. Who came up with the reverse symbol assignments? Plenty of other languages do what R can do, no need for a dedicated statistics language.

    Java is good for applications requiring GUIs. It's OK for backend work too... despite common belief it's not inherently slow.

    C is the king if you need to do real software authoring.

    Finally, I use any of the .NET languages (Windows!) for hardware interfaces (if I'm writing control software for lab robotics).

    So, if I had to choose two, it would be Mathematica and C I suppose.


  • chaz81
    replied
    Originally posted by quinlana
    I learned the basics from python.org and used O'Reilly's "Python Cookbook" to get a feel for the subtleties and advanced usage. In my case, however, I mainly just needed to know the syntax and basic data structures as I already knew how to program.

    There's a new Python book for Bioinformatics from O'Reilly. No idea what the quality is like.
    http://oreilly.com/catalog/9780596154516/
    That O'Reilly book is currently sitting on my desk about halfway finished. I would not recommend it as a book to start learning Python with. I bought O'Reilly's Learning Python a few years ago (the cover says "now includes Python 2.3!") and this book has been invaluable; I can only imagine that an edition covering more recent versions would be better. The Learning Python book is well-written and has good examples to work through.


  • Simon Anders
    replied
    Originally posted by ymc
    Does Python have a free bioinformatics library like BioPerl? I find that BioPython is quite lacking for now. I am wondering if there are better alternatives.
    If you are specifically after high-throughput sequencing: I'm currently working on a framework for that, called HTSeq, and I've come reasonably far by now.

    You can already do a lot with it: see this thread, in which I've advertised it, and, of course, the HTSeq web page.

    Simon


  • sameet
    replied
    Originally posted by flxlex
    I've come to rely more and more on awk. Being a self-trained perl programmer I find it fascinating to see how much I can do with (nearly) oneliners in awk instead of writing multiple-line perl scripts. Many of my input/output files are tab separated, which is ideal for awk.

    Next on my list is learning python...
    Actually, if you are using Linux (or a Linux-like environment) for your work, then the well-known Linux tools like sed, awk and bash, and their combination with Python, make life very easy.


  • JohnK
    replied
    Originally posted by damiankao
    I got to go with Python too. Perl has great available libraries, but Python is just so much more agile than Perl. Just having the dot notation format and slice syntax makes things so much easier.
    I haven't used Python, but I may give it a whirl someday. Perl also supports slicing, and the '->' OOP/method invocation in Perl is supposed to mimic the C++ '->' pointer notation; my memory might have left me on this stuff though, so no finger pointing. Python might be better at these things than Perl though. It's whatever floats your boat really. I believe some call Perl a "C-like syntax" language.

    As for the command line (CML), awk is awesome, but can lack some syntactic sugar, in my opinion. A lot of the stuff I do on the command line in Perl is just a manipulation along the lines of:

    < <in_file> perl -e 'while(<>){ #splitting, pushing onto an array, printing out, and then piping to some text filters, etc...# }' | a filter > <out file> &

    I wouldn't use perl to do things that filters and sed/awk can do for you, but knowing unix/linux filter and shell commands is a priceless skill that saves incredible amounts of time. I think I can say that without someone yelling at me.


  • flxlex
    replied
    I've come to rely more and more on awk. Being a self-trained perl programmer I find it fascinating to see how much I can do with (nearly) oneliners in awk instead of writing multiple-line perl scripts. Many of my input/output files are tab separated, which is ideal for awk.

    Next on my list is learning python...
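
    For comparison, a typical awk one-liner on a tab-separated file, say `awk -F'\t' '$3 > 10 {print $1}' input.tsv` (a hypothetical example, not taken from a real pipeline), might be sketched in Python as follows. The function name, the column layout and the sample rows are assumptions for illustration.

```python
import csv
import io


def filter_rows(handle, column, threshold):
    """Yield the first field of each tab-separated row whose value in
    `column` (0-based) is greater than `threshold`."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if float(row[column]) > threshold:
            yield row[0]


# In-memory stand-in for a small tab-separated input file.
example = io.StringIO(
    "geneA\tchr1\t25\n"
    "geneB\tchr2\t3\n"
    "geneC\tchr1\t11\n"
)
hits = list(filter_rows(example, column=2, threshold=10))
print(hits)  # ['geneA', 'geneC']
```

    This is clearly longer than the awk version; the Python form tends to earn its keep once the filter grows beyond a single condition or needs to be reused.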


  • damiankao
    replied
    I got to go with Python too. Perl has great available libraries, but Python is just so much more agile than Perl. Just having the dot notation format and slice syntax makes things so much easier.
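
    As a small illustration of the slice syntax and dot notation mentioned here, applied to a made-up DNA string (the sequence and variable names are assumptions for this sketch):

```python
seq = "ATGGCCATTGTA"  # made-up example sequence

# Slice syntax: seq[start:stop:step], with negative indices counting from the end.
first_codon = seq[:3]    # 'ATG'
last_three = seq[-3:]    # 'GTA'
codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

# Dot notation: methods are called directly on the object.
g_count = seq.count("G")
lowercase = seq.lower()

# Both together: complement via str.translate, then reverse with a slice.
complement_table = str.maketrans("ACGT", "TGCA")
reverse_complement = seq.translate(complement_table)[::-1]
print(reverse_complement)  # TACAATGGCCAT
```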


  • sameet
    replied
    Originally posted by JohnK
    Aw man! I love perl, but scripting languages are all the same to me. Just remember to hit '#' constantly.
    One more vote for Python. I am a biologist by training, and got into this business accidentally, and found that I liked it. I taught myself Perl, but once I came to know about Python and started using it there has been no looking back. All the points that @quinlana makes are true. I believe it is the best language to start programming in.


  • Calico
    replied
    By the way, is the R environment not used in the HTS community anymore?


  • nilshomer
    replied
    Originally posted by JohnK
    I wrote a longer response, but for some reason it never posted or maybe it posted to the wrong forum. Java and C++ are object oriented, which means everything you do will be within the confines of classes, objects, encapsulation and abstraction. For simple programs, you might want to leave simple flat/text file manipulation to a scripting language; i.e., leave the scripting tasks to scripting languages like Perl and shell, and the OOP-oriented tasks to the OOP-oriented programming languages. I think for bioinformatics purposes, a scripting language like Perl, Python, etc. will suffice. Perl seems to be by far the most popular and there are large libraries like CPAN and BioPerl to support your coding. Perl is also very fast, but not as fast as shell programming. Learning shell programming in a Unix/Linux environment is a priceless skill and will save you much time. All the statisticians I've met have used and are using 'R'. I myself haven't learned R, but it's definitely a future endeavor. Hope that helps.
    Good catch, the system thought you were spamming, but that's not the case.


  • JohnK
    replied
    Originally posted by Calico
    Hello everybody,

    My rather limited bioinformatics skills come from having done some microarray data analysis in R (following a template code) and some minor coursework. So, I consider myself quite a newbie to the subject. I will quite soon be shaking hands with some sequencing data (from a Helicos machine) and need to prepare myself for this.

    Being of a younger generation, I would say I can handle computers pretty well. So far I have, as recommended in this nice thread, started to take a look at the Unix and Perl for Biologist tutorial and installed Ubuntu in Virtual PC on my Windows computer.

    What I'd like to ask you, SEQanswers community, is whether you can suggest anything helpful. Am I starting out the right way? I will get some bioinformatics help along the way, though I am unsure to what extent. Also, I see this as a part of my future career, so I am not just doing this for one particular project.

    Edit: I have just realized that the Helicos software package uses Python.
    I wrote a longer response, but for some reason it never posted or maybe it posted to the wrong forum. Java and C++ are object oriented, which means everything you do will be within the confines of classes, objects, encapsulation and abstraction. For simple programs, you might want to leave simple flat/text file manipulation to a scripting language; i.e., leave the scripting tasks to scripting languages like Perl and shell, and the OOP-oriented tasks to the OOP-oriented programming languages. I think for bioinformatics purposes, a scripting language like Perl, Python, etc. will suffice. Perl seems to be by far the most popular and there are large libraries like CPAN and BioPerl to support your coding. Perl is also very fast, but not as fast as shell programming. Learning shell programming in a Unix/Linux environment is a priceless skill and will save you much time. All the statisticians I've met have used and are using 'R'. I myself haven't learned R, but it's definitely a future endeavor. Hope that helps.
