

bfast jobs for analyzing AB's SOLiD data





    Hello bfast experts,

    I have split the output of my AB SOLiD reads into separate "reads.j.fastq" files for faster parallel processing. Each FASTQ file is about 100 MB.
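    For readers wanting to reproduce the splitting step, here is a minimal Python sketch. It is not part of bfast. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines), and the names `split_fastq` and `reads_per_chunk` are hypothetical. It also assumes that paired ends, if present, appear as consecutive records sharing a read name, so it never starts a new chunk between two such records.

```python
"""Split a FASTQ file into numbered chunks for independent jobs (sketch)."""
from pathlib import Path
from typing import List


def split_fastq(src: str, out_dir: str, reads_per_chunk: int,
                prefix: str = "reads") -> List[Path]:
    """Write src into out_dir/<prefix>.<j>.fastq, about reads_per_chunk records each.

    Assumption: plain 4-line FASTQ records. Consecutive records with the same
    read name (assumed to be paired ends) are kept in the same chunk.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    count = 0
    prev_name = None
    try:
        with open(src) as fin:
            while True:
                record = [fin.readline() for _ in range(4)]
                if not record[0]:
                    break  # end of file
                if not record[0].startswith("@") or not record[2].startswith("+"):
                    raise ValueError(f"malformed FASTQ record near {record[0]!r}")
                name = (record[0][1:].split() or [""])[0]
                # Open a new chunk once the current one is full, but only when
                # the read name changes, so paired records are not separated.
                if handle is None or (count >= reads_per_chunk and name != prev_name):
                    if handle is not None:
                        handle.close()
                    path = out / f"{prefix}.{len(paths) + 1}.fastq"
                    handle = open(path, "w")
                    paths.append(path)
                    count = 0
                handle.writelines(record)
                count += 1
                prev_name = name
    finally:
        if handle is not None:
            handle.close()
    return paths
```

    Chunk size is counted in records rather than bytes, so files of similar size contain similar numbers of reads; the same read count per job does not, however, guarantee similar run times.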

    I would really like your help understanding the differences in run time between these independent bfast jobs. This relates to Part B of my previous post.

    Some jobs have finished and produced their final output (*.sam files) in under 5 hours (one of them in as little as 1.5 hours).

    Other jobs seem to be progressing much more slowly: walltime is nearing 24 hours and they are stuck in the "bfast postprocess" step. The "bfast match" and "bfast localalign" steps have completed, and the output *.sam file is indeed growing, but slowly. I am concerned about this 5- to 20-fold variation in the time it takes for jobs to finish. All jobs run on single cores (I have no choice there; it is a matter of principle) at a central facility hosting hundreds of identical cores, so the hardware on the compute nodes is uniform.
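    One rough way to check how far a slow postprocess job has got is to compare the reads already present in its growing SAM file with the reads in its input chunk. This Python sketch assumes that FASTQ read names reappear unchanged as SAM QNAMEs and that each SAM file corresponds to one "reads.j.fastq" chunk; the function names are hypothetical and not part of bfast. It estimates only the fraction of reads already written, not the remaining time.

```python
"""Rough progress estimate for a running postprocess job (sketch)."""
from typing import Set


def fastq_names(path: str) -> Set[str]:
    """Collect read names from a 4-line-per-record FASTQ file."""
    names: Set[str] = set()
    with open(path) as fh:
        for i, line in enumerate(fh):
            # Only header lines (every 4th line) carry the read name.
            if i % 4 == 0 and line.startswith("@"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def sam_names(path: str) -> Set[str]:
    """Collect QNAMEs from complete alignment lines of a SAM file."""
    names: Set[str] = set()
    with open(path) as fh:
        for line in fh:
            # Skip header lines and a possibly half-written final line.
            if line.startswith("@") or not line.endswith("\n"):
                continue
            fields = line.split("\t")
            if len(fields) >= 11:  # SAM has 11 mandatory columns
                names.add(fields[0])
    return names


def fraction_done(fastq_path: str, sam_path: str) -> float:
    """Fraction of input read names that already appear in the SAM output.

    Assumption: FASTQ names and SAM QNAMEs match exactly.
    """
    total = fastq_names(fastq_path)
    if not total:
        return 0.0
    return len(sam_names(sam_path) & total) / len(total)
```

    Running this periodically on a slow job and on a finished one gives a crude comparison; if the fraction keeps rising steadily, the job is working through its reads rather than hung.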

    Is this variation in run time cause for concern, indicating poor read library preparation, or is it the norm? Perhaps some jobs simply need many more iterations than others to converge, or it could be stochastic. Is there a flag for bfast postprocess that can speed up computation while still using the color-space information? I prefer not to compromise on the accuracy of the read alignments.

    I hope you can help.
    Thanks very much,
    a bfast analyzer.
    Last edited by genome_anawk1; 05-20-2011, 11:03 AM.

  • #2
    It may be that pairing rescue is taking a long time. Try disabling that feature with the "-U" flag. It most likely will not affect the results much.