Developing programming experience for bioinformatics

samanta replied

06-19-2012, 12:58 PM
Originally posted by chadn737

Please don't read into it something I did not say. I am not criticizing the work of computer scientists. I specifically criticize the attitude of rskr that is condescending and dismissive of any biologist or computer scientist that doesn't create a new program from scratch in his language of choice.

......
I'm not dismissing anyone here or their relative contributions. I'm just being frank about the fact that there are more practical concerns.

Sorry, I misunderstood your original comment. You make all valid points. You need to look after your self-interest (which is to get the best biological insights from your data) irrespective of which computer programs were used to get there. So, if you are competent enough to code/install and run computer programs, I see no reason for wasting time with a computer scientist trying to be in your shoes. The computer scientists, on the other hand, look after their own self-interest, which is finding the best algorithm.
chadn737 replied

06-11-2012, 02:03 PM
Originally posted by samanta

Geez !! What a warped view of the world.

Computer science has two components - (i) algorithm development and (ii) coding the algorithm into some programming language. A new algorithm is a mathematical discovery that sometimes takes decades to develop, but once it is in place, it can revolutionize all aspects of science and non-science, including your beloved sequence analysis. Here is the development history of one chain of algorithms -

http://www.homolog.us/blogs/2011/10/20/burrow-wheeler-transform-suffix-arrays-and-fm-index/

You may notice that when Myers and Manber were working on the concept of suffix arrays, they had no clue about how the future of sequencing technology would develop, yet two important lines of programs for short read analysis (Bowtie/BWA and string graph assemblers) rely on mathematical constructs developed by them.

Just like you have a reward structure regarding quick publication of your sequence-related paper, computer scientists have a different reward structure related to development of new algorithms. Historically it has been found that their reward structure contributes more to biology than another incremental biology paper. So biologists themselves (those more knowledgeable than you) encourage discoveries of new algorithms.

Please don't read into it something I did not say. I am not criticizing the work of computer scientists. I specifically criticize the attitude of rskr that is condescending and dismissive of any biologist or computer scientist that doesn't create a new program from scratch in his language of choice.

While the continued development of new algorithms and programs is of great use to biologists, that development is the primary concern of the specialists, not the biologist. Even for the computer scientists, it makes no sense to develop new programs from scratch for everything. It also makes no sense for the biologist to spend the vast amount of time necessary to learn C for simple and mundane applications if they already know Perl and can implement it in Perl in a shorter amount of time, even if it takes an hour longer to run. I can spend those extra few hours of runtime doing wet lab experiments. And since I don't know C, I can actually get more done using Perl than in all the time it would take me to learn a new language and go through the hassle of implementing it.

And frankly, why am I going to send my data to someone else to analyze if they are simply going to use the exact same tools that already exist and which I already know how to implement? Why should I wait 6 months or a year for my results while they create a completely new program when I can get the results in a week reusing tried and true programs?

If the computer scientist wants to create a new algorithm, then they are doing their job and that is sufficient for a paper in itself. Besides it is better for the computer scientist because then he gets the credit rather than having to be a co-author on a paper where the program takes second place to the data.

They have their own careers to look after, I have mine. I understand that bioinformatics takes time to develop and I applaud those who develop it. But I am not seeking a career in the development of bioinformatic tools, nor are most biologists. We just use them and then it's on to the next step. If there is no preexisting tool, then I'll take the time to work with the computer scientist and wait for them to develop one and then use it to get to the question at hand. But otherwise, I see no reason for the biologist not to take advantage of pre-existing tools, and it is absurd to be dismissive of them just because they use a pre-existing tool or a language they are more comfortable with.

I'm not dismissing anyone here or their relative contributions. I'm just being frank about the fact that there are more practical concerns.

Last edited by chadn737; 06-11-2012, 02:58 PM.
samanta replied

06-11-2012, 12:53 PM
Originally posted by chadn737

I don't think the majority of people really care. I know I don't. I'm first and foremost a biologist. Sequencing is just a tool. Bioinformatics is just a tool. The real scientific question is the biology, not which is the best programming language. 10 years from now C++ and most of the bioinformatics will be outdated and lie unused, sequencing will be completely different, but the biology will remain. I think most of the hard core computer scientists here get that and certainly the biologists do. For most of us it is a waste of time writing new programs or rewriting old ones in a different language. It is far far smarter spending an extra hour of my time reusing a slightly slower program written by someone else in perl or python or java and getting my answer that week than spending a year trying to develop something completely new and then getting scooped by the guy who focused on the biology.

I've collaborated with enough computer scientists to know that it typically goes one of two ways:

1) They reuse tools already out there, which would be no different than what I could do on my own.

or

2) They want to develop something completely new and then I don't get my answer for 6 months, when I could have had it within the week and begun doing the follow up experiments.

So I have come to the conclusion that if I am going to collaborate to have that nice new program written in C++, I'd rather do my own work and get that published and let the Computer Scientist develop a program around already published data. Because if I get scooped waiting around that long, I'm the one who's screwed.

Geez !! What a warped view of the world.

Computer science has two components - (i) algorithm development and (ii) coding the algorithm into some programming language. A new algorithm is a mathematical discovery that sometimes takes decades to develop, but once it is in place, it can revolutionize all aspects of science and non-science, including your beloved sequence analysis. Here is the development history of one chain of algorithms -

http://www.homolog.us/blogs/2011/10/20/burrow-wheeler-transform-suffix-arrays-and-fm-index/

You may notice that when Myers and Manber were working on the concept of suffix arrays, they had no clue about how the future of sequencing technology would develop, yet two important lines of programs for short read analysis (Bowtie/BWA and string graph assemblers) rely on mathematical constructs developed by them.

Just like you have a reward structure regarding quick publication of your sequence-related paper, computer scientists have a different reward structure related to development of new algorithms. Historically it has been found that their reward structure contributes more to biology than another incremental biology paper. So biologists themselves (those more knowledgeable than you) encourage discoveries of new algorithms.
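
For readers who have not met the constructs mentioned above, here is a minimal sketch of a suffix array and the Burrows-Wheeler transform built from it. It is a naive illustration, not how Bowtie or BWA implement their indexes: the construction simply sorts all suffixes, and the function names (`suffix_array`, `bwt`, `find_occurrences`) and the `$` sentinel are assumptions made for this example.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for toy
    inputs, far too slow for a genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of `text`, derived from its suffix array."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    text += sentinel
    # For each sorted suffix, emit the character just before it.
    # For the suffix starting at 0, text[-1] is the sentinel.
    return "".join(text[i - 1] for i in suffix_array(text))


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of `pattern` in `text`, via binary search on `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    reference = "banana$"
    sa = suffix_array(reference)
    print(sa)
    print(bwt("banana"))
    print(find_occurrences(reference, sa, "ana"))
```

The point of the sketch is only that once the suffixes are sorted, every occurrence of a read-like pattern sits in one contiguous block of the array, which is why a binary search is enough to find them all.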
samanta replied

06-11-2012, 12:19 PM
Originally posted by Joann

Hi Samanta,
Two very good links, thanks for the posts.

You are welcome !!
chadn737 replied

06-11-2012, 11:09 AM
Originally posted by rskr

A) It is funny that people will spend tens of years programming languages that take five minutes to learn yet spend hours a day waiting for the programs to run.

B) Don't write thousands of lines of code in bash or perl; they aren't designed for it. They are weakly typed and don't take advantage of compiler checking, not to mention the languages don't facilitate porting to many platforms.

I don't think the majority of people really care. I know I don't. I'm first and foremost a biologist. Sequencing is just a tool. Bioinformatics is just a tool. The real scientific question is the biology, not which is the best programming language. 10 years from now C and most of the bioinformatics will be outdated and lie unused, sequencing will be completely different, but the biology will remain. I think most of the hard core computer scientists here get that and certainly the biologists do. For most of us it is a waste of time writing new programs or rewriting old ones in a different language. It is far far smarter spending an extra hour of my time reusing a slightly slower program written by someone else in perl or python or java and getting my answer that week than spending a year trying to develop something completely new and then getting scooped by the guy who focused on the biology.

I've collaborated with enough computer scientists to know that it typically goes one of two ways:

1) They reuse tools already out there, which would be no different than what I could do on my own.

or

2) They want to develop something completely new and then I don't get my answer for 6 months, when I could have had it within the week and begun doing the follow up experiments.

So I have come to the conclusion that if I am going to collaborate to have that nice new program written in C, I'd rather do my own work and get that published and let the Computer Scientist develop a program around already published data. Because if I get scooped waiting around that long, I'm the one who's screwed.

Last edited by chadn737; 06-11-2012, 02:08 PM.
Joann replied

06-11-2012, 08:35 AM
Hi Samanta,
Two very good links, thanks for the posts.
samanta replied

06-07-2012, 05:55 PM
Originally posted by greenhilly

I have an extensive molecular biology background but am relatively new to bioinformatics. Would like to extend my computational/programming skills to maximize utility in analyzing sequencing and other high-throughput data, as well as to improve my own marketability.

Many job postings refer to some combination of Perl/Python/C++/Java experience. Any suggestions regarding where to focus effort, particularly in a forward-looking manner?

Thanks for any suggestions.

Please note that bioinformatics can be done at various levels. Here is my modest attempt to answer your question.

http://www.homolog.us/blogs/2011/07/22/a-beginners-guide-to-bioinformatics-part-i/

http://www.homolog.us/blogs/2011/07/22/a-beginners-guide-to-bioinformatics-part-ii/

Searching a website for the folding of a set of miRNA sequences is bioinformatics. Writing server-side code for the program that does that folding is also bioinformatics. Analyzing hundreds of expression numbers in Excel or R is bioinformatics as well. Those three tasks take three different skills.

Last edited by samanta; 06-11-2012, 12:54 PM.
rskr replied

06-03-2012, 12:35 PM
Originally posted by krobison

It's also worth contemplating the huge fraction of security holes in the world that are due to buffer overflow, an easy error to make in C/C++ and a challenging one to make in languages which supply memory management.

In terms of security holes there is a reason no one uses PERL for web development, even though that is what it was originally designed for. Oops my input field has an @ or a $ in it.

Originally posted by krobison

It's also useful to think of all the poor user interfaces in the world, such as entry boxes for social security numbers or credit cards which do not accept human-friendly punctuation or spacing, that are there because it was hard to do in C or a similar language, and so trivial to do in Perl that almost nobody could be too lazy to do them.

A) It is funny that people will spend tens of years programming languages that take five minutes to learn yet spend hours a day waiting for the programs to run.

B) Don't write thousands of lines of code in bash or perl; they aren't designed for it. They are weakly typed and don't take advantage of compiler checking, not to mention the languages don't facilitate porting to many platforms.
westerman replied

05-30-2012, 10:21 AM
Originally posted by Artem

Where does Perl/Python fit into the mix?

They are a more powerful glue than Bash while being an easier language than C.

A person can write multi-hundred line Bash routines, but at some point the scripts become hard to maintain and expand, at which point you should use Perl/Python unless you wish to go into the complexities of C/C++.

BTW: My longest bash script is 430 lines and is used to set up ABySS runs in various combinations of paired-end and single-end runs. My Perl scripts can run many times that length.

As I have said before, I consider 'R' to be a different path than bash/perl/python/C. Those languages are similar enough to have a common way of thinking. 'R' is all about statistical computing.

Last edited by westerman; 05-30-2012, 10:23 AM. Reason: Added a comment about 'R'.
Artem replied

05-30-2012, 10:09 AM
I'm actually in the same position as Greenhilly. I started programming seriously about a month ago, with background knowledge in Python and bash. My work has been in bash and R though. I use R for calculations and bash for data formatting and pipelining. Eventually I do hope to learn some C for writing functions, but I see that as a while away.

Where does Perl/Python fit into the mix?
krobison replied

05-14-2012, 02:27 PM
Originally posted by rskr

In a forward looking manner I wouldn't bother with Perl/Python/Java; they are mostly just fads, and any location you might want to work is just as likely to use the one you don't know, for no other reason than the CEO liked the monty python jokes or coffee. These scripting languages are easy enough to pick up if you know how to program in C, and most cool molecular dynamics simulators are in C for obvious performance reasons. Unix command line utilities are very handy for getting things done, and PERL and Python both draw heavily on the conventions, so if you encounter a script done in either of these you should be able to figure out what it does (knowing linux, that is).

The distinction between "scripting languages" and "real languages" is a silly one, propagated by snobs. Suggesting that Perl is a fad when it has contributed solidly to science for over 20 years is more than a bit silly. Java is core to using a number of modern high performance frameworks such as Hadoop. The Broad's GATK is entirely in Java.

I am a biologist first & program mostly in Perl, because it fits my brain well. So did C#, which I suspect you would also denigrate -- and I wrote some very sophisticated dynamic programming algorithms (if I do say so myself) in C#.

For most biologists, the extra bookkeeping required by C/C# isn't worth the execution speed advantage. Many other languages offer higher levels of abstraction that are a better fit to their line of thinking.

Ultimately, if you have the time it is worth exploring multiple languages, as many people find that there are a subset that fit their brain well. A rare few individuals are excellent at most. For me, Perl & C# have been the best fits, with Scala probably just missing out.

It's also worth contemplating the huge fraction of security holes in the world that are due to buffer overflow, an easy error to make in C/C++ and a challenging one to make in languages which supply memory management. It's also useful to think of all the poor user interfaces in the world, such as entry boxes for social security numbers or credit cards which do not accept human-friendly punctuation or spacing, that are there because it was hard to do in C or a similar language, and so trivial to do in Perl that almost nobody could be too lazy to do them.
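
As a small illustration of the entry-box point above, here is a minimal sketch in Python (rather than Perl) of accepting human-friendly punctuation and spacing in a numeric field. The function name `normalize_digits` and the choice of which separators to tolerate (whitespace, hyphens, periods) are assumptions for this example, not a standard.

```python
import re


def normalize_digits(raw: str, expected_length: int) -> str:
    """Strip friendly separators from a numeric entry and validate the rest.

    Spaces, hyphens and periods are tolerated; anything else that is not
    an ASCII digit is rejected.
    """
    digits = re.sub(r"[\s\-.]", "", raw)
    if re.fullmatch(r"[0-9]+", digits) is None or len(digits) != expected_length:
        raise ValueError(f"expected {expected_length} digits, got {raw!r}")
    return digits


if __name__ == "__main__":
    print(normalize_digits("123-45-6789", 9))
    print(normalize_digits("4111 1111 1111 1111", 16))
```

The validation still rejects letters or a wrong digit count, so accepting the friendlier input costs two lines and no safety.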

Biologists & hard core computer scientists need to forge links, but I've always found it was the polylingual & inclusive computer whizzes who were a joy to work with; language snobs are likely to have other motes in their eyes which will interfere with collaborations.
SeekAnswers replied

05-14-2012, 12:43 PM
I normally think you'd need a good command of the Unix shell, your choice of scripting language (Perl/Python/Ruby) and one object-oriented programming language. A decent understanding of SQL queries might be pretty helpful as well, depending on the kind of setup you work in.

However, for a biologist, writing bioinformatics software in C will be a very steep learning curve, mainly because of memory management, which not many computer science majors have a good command of. Java is a much friendlier programming language and is widely used in bioinformatics software.
HESmith replied

05-14-2012, 11:59 AM
"...the difference between an 'int' and a 'char'."

An ent (sic int) is a tree-like giant of Middle Earth; a char is a tasty cold-water fish.

(Sorry, I'm a little punchy from lack of sleep)
westerman replied

05-14-2012, 11:29 AM
... I said lack of merit, not without merit...

A rather pedantic difference, if you ask me. But maybe we shouldn't expect anything else from a person who knows the difference between an 'int' and a 'char'. :-)

Given the number of people on this forum for whom American English is not their first language I think we should allow a bit of leeway for the subtle differences in stating their opinions.

BTW: I agree with "dpryan". Learn the shell. Learn Perl/Python (or maybe Ruby). Learn R. Learn C. As for the differences between the 4 -- R is the most different in syntax. The other 3 are similar enough to be easy to pick up once you know one of them (although all are hard to master.)