
  • apfejes
    replied
    Originally posted by Michael.James.Clark
    It's your opinion that the two subjects--code comments and user documentation--can't be discussed simultaneously. I disagree--I think both are important components that contribute to the quality of open-source software and, frankly, to how strong of a software engineer/bioinformaticist/team you are. So kudos to you for "over"-commenting and documenting your own code. Were I on your committee, that would count for something in my book.
    Thanks Michael. I really appreciate that. (=

    I would like to clarify one thing, however. It's not that the two types of documentation can't be discussed simultaneously, but that they shouldn't be discussed simultaneously. One targets developers, one targets users - and both need to be there. Since the thread started explicitly as a comment on the need to document better for future developers, I thought it might be more productive to keep them separate.

    In terms of attracting users, the issues are (IMHO) more straightforward: a good manual, functionality and interface. However, all of that documentation needs to be complete for the development community to embrace a piece of software, in addition to the many other factors involved in getting new developers to buy in (e.g. choice of language, modularity of code, design, development model, ease of contribution, etc.). I'd rather focus on the latter set, as that's where the real meat of this conversation seems to be for me.


    Originally posted by Michael.James.Clark
    And I think our topic here is really that a greater incentive and emphasis should be made on code commenting, unit testing, user documentation, and interaction than is currently being made in the CS/computational biology/bioinformatics academic community.

    As Nils said, software engineering 101--the software needs to be accessible to both users and developers, and I do think it should be judged as such.
    First, I agree with your point that a greater commitment to good coding practices is required in development. No doubt academic code has a long way to go.

    Second, I'd still separate the user/developer communities. While bioinformaticians do an ok job of appealing to users (far from good, but the basic elements are there), we do a terrible job of creating community projects, which is what I feel is really holding the field back.

    The real gains in bioinformatics are to be made by making better use of developers' time. If we had all 40 people who had written their own ChIP-Seq code working together instead of generating 40 different peak finders, I think epigenetics would really be accelerated as a field. That would have required a serious, coordinated central project or two in which the learning curve for new developers was as small as possible, aka: good code documentation, etc. Think of a single peak-finder core with the ability for different developers/labs to strap on new modules for it, much the same way R has modules... but now I'm starting to digress.
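
    As a rough illustration of the "single core plus strap-on modules" idea, here is a minimal Python sketch. Everything in it is hypothetical (the `register` decorator, `call_peaks`, the toy `threshold` module, and the representation of a peak as a half-open `(start, end, score)` tuple); it is not taken from any existing peak finder, and it only shows how a shared core could dispatch to modules contributed by different labs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a peak is a half-open interval plus a score.
Peak = Tuple[int, int, float]
PeakCaller = Callable[[List[int]], List[Peak]]

# The shared "core" only knows about this registry of modules.
_CALLERS: Dict[str, PeakCaller] = {}


def register(name: str) -> Callable[[PeakCaller], PeakCaller]:
    """Decorator that adds a peak-calling module to the shared core."""
    def wrap(func: PeakCaller) -> PeakCaller:
        _CALLERS[name] = func
        return func
    return wrap


def call_peaks(name: str, coverage: List[int]) -> List[Peak]:
    """Run the registered module `name` on a per-base coverage vector."""
    if name not in _CALLERS:
        raise KeyError(f"no peak caller named {name!r}; known: {sorted(_CALLERS)}")
    return _CALLERS[name](coverage)


@register("threshold")
def threshold_caller(coverage: List[int], min_depth: int = 5) -> List[Peak]:
    """Toy module: report runs of positions with depth >= min_depth."""
    peaks: List[Peak] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos  # a run of covered positions begins here
        elif depth < min_depth and start is not None:
            peaks.append((start, pos, float(max(coverage[start:pos]))))
            start = None
    if start is not None:  # the run extends to the end of the vector
        peaks.append((start, len(coverage), float(max(coverage[start:]))))
    return peaks
```

    With this layout, `call_peaks("threshold", [0, 1, 6, 7, 2, 0, 9])` returns `[(2, 4, 7.0), (6, 7, 9.0)]`, and a second lab could add its own caller with another `@register(...)` function without touching the core.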

    Anyhow, I agree with your other points on both sourceforge and rating systems. Though, I still think Nils' point makes sense: If we had a single collective bioinformatics repository, it would be much easier to build resources like the ones you've described around such a facility. Rating, project evaluations, and more open feedback mechanisms could all be integrated. I suspect by simply enriching for a bioinformatics population you'd find many of the mechanisms better used.

    As long as we lack a unified project location (which is encouraged by the academic environment where each lab hosts their own projects), it would be a much greater challenge to integrate all of this. However, I can imagine a bioinformatics portal fulfilling that function.

    Perhaps ECO would be interested in adding a new developer specific component to seqanswers?

    (And yes, I just said portal - the 90's are going to hunt me down for using such an outdated meme.)


  • Michael.James.Clark
    replied
    Originally posted by apfejes
    Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.
    I haven't "conflated" anything, though. It's your opinion that the two subjects--code comments and user documentation--can't be discussed simultaneously. I disagree--I think both are important components that contribute to the quality of open-source software and, frankly, to how strong of a software engineer/bioinformaticist/team you are. So kudos to you for "over"-commenting and documenting your own code. Were I on your committee, that would count for something in my book.

    And I think our topic here is really that a greater incentive and emphasis should be made on code commenting, unit testing, user documentation, and interaction than is currently being made in the CS/computational biology/bioinformatics academic community.

    As Nils said, software engineering 101--the software needs to be accessible to both users and developers, and I do think it should be judged as such.

    Originally posted by nilshomer
    I will throw a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which allows for a central repository of cutting edge (not ten releases ago) bio-apps that have software that meet (or are ranked on) certain criteria on documentation, code standards, and portability. I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software is submission and release of software that meets these standards. Accountability and training are the key.
    Certainly a central repository for such software would be fantastic. I think it's a great idea, but who would determine the criteria and judge the software and how?

    I'm thinking somewhere in between the like button and the help mailing list.

    Maybe hand-in-hand with that is a more general and open feedback mechanism. I recognize sites like SourceForge/GoogleCode/GitHub/etc. give you the opportunity to give a rating and feedback, but it's under-utilized and rather primitive from what I've seen. It might be nice to have a breakdown of different features--usability, documentation, error reporting, et cetera--that could be rated (perhaps by identified users/developers rather than random faceless anons). A review system might be nice.


  • apfejes
    replied
    Michael: I think you're preaching to the choir - but really, there's a huge difference between documentation for developers and documentation for users. Conflating the two - or trying to discuss them both at once - isn't going to result in a productive set of action items.

    Personally, I've spent far more time maintaining, cleaning and documenting my code than my committee or my advisor really would like. It slows down development and only shows benefits in the long term. As long as code is being developed by people who expect to work on a project for less than a year or so, you're going to have a hard time convincing them of the benefits of good coding practice. And, I think it's relatively obvious, post-docs and grad students rarely have that kind of long term vision unless the project is actively managed by their PI or an institute staff member. (I'd like to think my own code is an exception to the rule, just because I really believe strongly in good coding practices.)

    At any rate, I really like Nils' suggestion of an open source repository for bioinformatics software. I currently use SourceForge (as a few others do), but there's no sense of developer community specific to bioinformatics in that environment, nor is there a "bioinformatics app-store" specific to that community. A dedicated repository could also help with organizing projects and directing people to contribute to existing software, as well as providing forums much more tailored to bioinformatics needs. Even better, if it could be used to do automatic nightly builds of the software, it would force developers to use unit tests to keep from breaking the head of their trees - nightly builds are a good indication of the stability of software.

    Edit: Just to be clear, nightly builds + nightly unit tests would be a great indication of the stability, as long as the visitors to the site get some stats on the number of unit tests passed, etc. I realize that nightly builds on their own would only fail when compilation errors are present, which, on its own, would be a good start.
    Last edited by apfejes; 04-22-2011, 04:20 PM. Reason: clarity
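
    The "stats on the number of unit tests passed" idea could look something like the sketch below, written with Python's standard `unittest` module. The function under test (`gc_fraction`), the test class, and the `nightly_report` helper are all made up for illustration; a real nightly job would discover a project's whole test suite rather than a single class.

```python
import unittest


def gc_fraction(seq: str) -> float:
    """Hypothetical function under test: fraction of G/C bases in seq."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


class GcFractionTests(unittest.TestCase):
    def test_half_gc(self):
        self.assertAlmostEqual(gc_fraction("ACGT"), 0.5)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_fraction("gggg"), 1.0)

    def test_empty_rejected(self):
        with self.assertRaises(ValueError):
            gc_fraction("")


def nightly_report(test_case_class) -> dict:
    """Run one TestCase class and return counts a project page could display."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    # Anything that failed, errored, or was skipped does not count as passed.
    not_passed = len(result.failures) + len(result.errors) + len(result.skipped)
    return {
        "run": result.testsRun,
        "passed": result.testsRun - not_passed,
        "failed": len(result.failures),
        "errors": len(result.errors),
    }
```

    Calling `nightly_report(GcFractionTests)` yields a small dictionary of counts that a repository page could publish next to each build.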


  • nilshomer
    replied
    To summarize, unit tests and code documentation are great for making code robust and for communicating better across developers. User documentation and usability allow the software to be used more easily by a wider group of people, making your software invaluable and more likely to receive software support funding. Most of this is software engineering 101.

    Nonetheless, these are all indirect incentives, and are not built pro-actively into the funding or even training mechanisms; we have not talked about training individuals for software engineering (is this the PI's role and/or coursework?).

    What I really want to see come from this discussion is how can we train and support computer scientists, bioinformaticians, and biologists so that they write well documented (user and code) usable software?

    I will throw a suggestion off the top of my head to get the ball rolling. How about an open source software repository, like a bio-app store, which allows for a central repository of cutting edge (not ten releases ago) bio-apps that have software that meet (or are ranked on) certain criteria on documentation, code standards, and portability. I am sure a grant reviewer would then find it easy to look up past software and say, person Y always produces well-documented and well-used software. Similarly, a requirement of applying for a grant that includes software is submission and release of software that meets these standards. Accountability and training are the key.


  • Michael.James.Clark
    replied
    Well, I am probably expanding on the original topic a bit.

    I think the over-arching topic here is things developers probably don't need for themselves that are very useful to users. In that case, commenting code and strong documentation are both part of that.

    The other thing we all benefit from is strong error reporting, which again I think goes in the same bin of things users would love that developers don't necessarily need (or directly benefit from) for themselves.
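
    For a concrete picture of what "strong error reporting" can mean, here is a small Python sketch. It assumes a simplified three-column, tab-separated BED-style input; the exception class `InputFormatError` and the function `parse_bed_line` are hypothetical names, not part of any existing tool. The point is only that each message says where the problem is, what was expected, and what was actually found.

```python
class InputFormatError(ValueError):
    """Raised when an input line cannot be parsed (hypothetical name)."""


def parse_bed_line(line: str, line_number: int) -> tuple:
    """Parse a minimal 3-column BED-style line into (chrom, start, end)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise InputFormatError(
            f"line {line_number}: expected at least 3 tab-separated fields "
            f"(chrom, start, end) but found {len(fields)}: {line!r}"
        )
    chrom, start_text, end_text = fields[:3]
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        # Replace the bare int() error with one that names the offending values.
        raise InputFormatError(
            f"line {line_number}: start and end must be integers, "
            f"got {start_text!r} and {end_text!r}"
        ) from None
    if start > end:
        raise InputFormatError(
            f"line {line_number}: start ({start}) is greater than end ({end})"
        )
    return chrom, start, end
```

    A user who feeds in a space-separated file then sees a message naming the line and the expected layout, rather than a bare traceback from deep inside the program.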

    The point I'm making is that including things that may not count to your CS Ph.D committee or that may not have meaning on a grant app can still have an impact on you personally--making your code and programs easier to use for others makes it more likely they'll use it and therefore publish with it/communicate with you/et cetera, which benefits you personally.


  • apfejes
    replied
    Hi Michael,

    I agree about your points when applied to "user documentation", but this particular thread was about "code documentation" and I'm not sure they're interchangeable.


  • Michael.James.Clark
    replied
    There's the point that strong documentation and commenting throughout code doesn't tend to yield grant money or result in stronger publications, but I think it does in a tangential way. For example, making it so other people can utilize your program and code more effectively means you're more likely to have it used in their publications downstream. In that sense, it's a good idea to document well.

    The other major benefit is that it typically leads to receiving many fewer questions on usage that suck up time down the line (assuming you actually support your program, which I have to say most academic bioinformatics developers do a bang-up job of in my experience).

    Without pointing out specific examples, I can think of a number of programs that will not end up in my publications in the future because they were frankly too hard to use and not well documented enough. One in particular seemed impenetrable despite being incredibly useful in theory--I just had to give up after two days of trying with no email reply to my questions and no strong documentation because it wasn't worth my time.

    On the flip side, some positive examples of programs I've found benefiting from strong documentation include Annovar, BEDtools, BFAST, Dindel, GATK, VCFtools (and others). These are all from academic sources (with varying levels of funding and teamwork on them) and despite being fairly complicated programs generally (okay, BEDtools/VCFtools are very straightforward, but still great) I was able to get them up and running very quickly despite a relatively weak background in programming.

    Decent documentation probably should be a requirement when publishing bioinformatics software academically. We've all seen those programs where there's a paper, a program, and any questions about the program get referred to the paper, which isn't typically helpful. Hopefully that's been changing because of the above-mentioned advantages to strong documentation.


  • cwhelan
    replied
    My thoughts on this are biased by my training in Agile development environments, but here's my two cents on one way to incentivize programmers to write documentation (if you'll stretch the definition a little):

    My favorite type of documentation is an automated unit test suite, and the best way to get developers to write documentation and keep it up to date is to show them that writing automated tests for their code has benefits for them in the actual software development process. Commenting and documenting always used to end up a low priority for me because I was never sure if I'd end up keeping my code, changing it, or even throwing it away later, as my requirements or understanding of the problem changed. Plus as noted above there's not much external incentive to write them. So I added comments as an afterthought and often let them get out of date.

    When I learned test-driven development, I found that my code got much cleaner and easier to maintain if I wrote unit tests at the same time as production code. Plus it gave me nice documentation; if I forget what a method does the best way to understand it is to read through a well-written unit test.

    Of course, some algorithms are more testable than others (it's really hard to write a unit test that documents a dynamic programming algorithm in my experience), and it's hard to write tests for some of the more script-y things we have to do in bioinformatics. Some things just aren't worth the extra effort to test too, although those are exactly the things that tend to come back to bite me later when I skip testing. Overall, though, I really find writing and working with tested code productive and satisfying.
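
    To make the "unit test as documentation" point concrete, here is a minimal sketch using Python's standard `unittest` module. The `reverse_complement` function and the test names are invented for this example; the idea is that each test name plus its single assertion reads as one sentence of the function's specification.

```python
import unittest

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


class ReverseComplementBehaviour(unittest.TestCase):
    """Each test documents one property a reader can rely on."""

    def test_bases_are_complemented_and_order_is_reversed(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")

    def test_palindromic_site_maps_to_itself(self):
        # GAATTC (the EcoRI site) is its own reverse complement.
        self.assertEqual(reverse_complement("GAATTC"), "GAATTC")

    def test_ambiguous_n_is_preserved(self):
        self.assertEqual(reverse_complement("ANT"), "ANT")

    def test_applying_twice_restores_the_input(self):
        seq = "ACGTTGCAN"
        self.assertEqual(reverse_complement(reverse_complement(seq)), seq)
```

    Saved as a module, the tests can be run with `python -m unittest`. Unlike a comment, a test like this fails loudly when the behaviour it describes changes, so it cannot silently go out of date.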


  • apfejes
    replied
    Open source code with multiple developers does tend to have better documentation, by necessity. On the flip side, single person projects have neither the time nor the need to document everything. Thus, larger teams encourage better code both because they have more manpower to devote to documentation and because they have the need for better documentation.

    It's no surprise that Picard is well documented: It's open source, it's well funded, and used/worked on by a lot of people. [Edit: I don't actually know that it's well funded - I've just always thought it was because I know it has several developers working on it concurrently.]

    The recipe for good documentation is to fund software projects so that more people work on them, which means documentation goes from being optional to being required, and the resources exist to do it well. As long as the funds aren't there, you'll get single developer software, which doesn't require the same documentation and the developer probably doesn't have the time to devote to it anyhow.


  • nilshomer
    replied
    A great example of well documented and well written code can be found from the Broad (GATK, and especially Picard). How do we incentivize other groups or graduate students to produce quality and commented code beyond simple altruism? "My advisor wanted it yesterday" and "there is a one-in-a-million case where the competing tool is better" both work against this goal.

    An extreme requirement would be that if any software is being produced as part of a grant, the code documentation system (javadoc/doxygen/etc) as well as the coding standards are proposed. We could also educate biologists (non-programmers) on the importance of good software engineering practices (beyond timeliness).
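
    The post names javadoc and doxygen; as a stand-in for those, the sketch below uses a structured Python docstring, which tools such as `pydoc` and `doctest` can read. The function `n50` and the docstring layout (Args/Returns/Raises sections) are just one possible convention chosen for illustration, not a standard proposed in this thread.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    The N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases.

    Args:
        lengths: iterable of positive integer contig lengths.

    Returns:
        The N50 as an integer.

    Raises:
        ValueError: if no lengths are given.

    >>> n50([2, 3, 4, 5, 6])
    5
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # First (i.e. longest) contig at which half the bases are covered.
            return length
```

    Running `python -m doctest` on the file checks the example embedded in the docstring, so the documented behaviour and the code are verified together.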


  • apfejes
    replied
    I would also add, "my code is well documented" does not fly with PhD committees either.


  • dp05yk
    replied
    This rings true for me - It was difficult to start creating pBWA due to lack of commenting (in fact I am still having trouble understanding how some of the biology stuff works due to this!), however I totally get where Nils is coming from.


  • biznatch
    replied
    I extensively comment all my scripts because I'm pretty new at this and if I didn't I'd forget what most of it did and have to spend a long time figuring it out again every time I went back to make modifications.


  • dawe
    replied
    Originally posted by nilshomer
    "My code is well-documented" does not get grants unfortunately.
    LOL.
    sad but true.
    BTW, DeNovoG is right about asking for better comments; unfortunately, few bioinfo developers have sufficient experience in large projects where good commenting is somehow mandatory. Also, I suspect most of the bosses do not look at the code but at results instead (especially when the boss is not a developer...).
    My experience: every time I start I try to comment everything, in a few days I start skipping long comments because I'm lazy. After a couple of releases I try to do "offline commenting" and I inevitably wonder "why did I write this? What's that?". At least I try to give functions and variables elucidating names (not simply x, y and i)

    d


  • nilshomer
    replied
    "My code is well-documented" does not get grants unfortunately.
