bfast jobs for analyzing AB's SOLiD data

genome_anawk1

Junior Member

Join Date: May 2011

Posts: 7
#1


05-20-2011, 11:00 AM

Hello bfast experts,

I have split the AB SOLiD reads into separate "reads.j.fastq" files (about 100 MB each) so that they can be processed in parallel.
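
For readers who want to reproduce this kind of split, here is a minimal sketch in Python. It is not the poster's actual script. It assumes plain 4-line FASTQ records and that the ends of a paired read appear as consecutive records with the same name line, so a new chunk is only started at a read-name boundary. The function names and the `reads_per_chunk` parameter are made up for illustration.

```python
import itertools
from pathlib import Path


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped FASTQ)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_fastq(src, out_dir, reads_per_chunk):
    """Write reads.<j>.fastq chunk files and return the list of paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    count = 0
    last_name = None
    with open(src) as handle:
        for record in read_records(handle):
            name = record[0].rstrip("\n")
            # Open a new chunk once the current one is full, but only when the
            # read name changes, so records sharing a name stay together.
            if out is None or (count >= reads_per_chunk and name != last_name):
                if out is not None:
                    out.close()
                path = out_dir / f"reads.{len(paths) + 1}.fastq"
                paths.append(path)
                out = open(path, "w")
                count = 0
            out.writelines(record)
            count += 1
            last_name = name
    if out is not None:
        out.close()
    return paths
```

Splitting by record count rather than by file size keeps every chunk a valid FASTQ file, though chunks with equal read counts can still take very different amounts of time to align.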

I would appreciate your help in understanding why the run times of these independent bfast jobs differ so much. This analysis refers to PART-B of my previous post.

Some jobs finished and wrote their final outputs (*.sam files) in under 5 hours, one of them in as little as 1.5 hours.

Other jobs are progressing much more slowly: walltime is nearing 24 hours and they are still stuck in the "bfast postprocess" step. The "bfast match" and "bfast localalign" steps have completed, and the output *.sam file is growing, but slowly. I am concerned about the 5-20 fold spread in the time it takes jobs to finish. All jobs run on single cores (I have no choice there; it is a matter of principle) at a central facility hosting hundreds of uniform cores, so the hardware is the same across compute nodes.

Is this spread in run time a cause for concern, for example a sign of poor read library preparation, or is it the norm? Perhaps some jobs simply need many more iterations to finish than others; it could be stochastic. Is there a flag for bfast postprocess that speeds up the computation while still using the color space information? I would prefer not to compromise on the accuracy of the read alignments.

Hope you can please help,
Thanks very much,
a bfast analyzer.

Last edited by genome_anawk1; 05-20-2011, 11:03 AM.
nilshomer

Nils Homer

Join Date: Nov 2008

Posts: 1283
#2

05-20-2011, 12:45 PM

It may be that the pairing rescue is taking a long time. Try disabling that feature with the "-U" flag. It most likely will not affect the results too much.
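
As a rough illustration of the suggestion above, the sketch below builds a "bfast postprocess" command line with "-U" added. Only "-U" comes from this thread. The other options ("-f" for the reference FASTA, "-i" for the aligned file from localalign, "-A 1" for color space) are written from memory of the bfast usage message and should be checked against `bfast postprocess -h` for your installed version. The file names and function names are hypothetical.

```python
import subprocess


def postprocess_command(reference, aligned, unpaired=True, color_space=True):
    """Return the argument list for a bfast postprocess run.

    Option letters other than -U are assumptions; verify them with
    `bfast postprocess -h` before use.
    """
    cmd = ["bfast", "postprocess", "-f", reference, "-i", aligned]
    if color_space:
        cmd += ["-A", "1"]  # assumed: 1 selects color space (SOLiD) reads
    if unpaired:
        cmd.append("-U")  # flag suggested in the reply: skip pairing rescue
    return cmd


def run_postprocess(reference, aligned, sam_path, **options):
    """Run the command and write its standard output to sam_path."""
    cmd = postprocess_command(reference, aligned, **options)
    with open(sam_path, "w") as sam:
        subprocess.run(cmd, stdout=sam, check=True)
```

For example, `run_postprocess("ref.fa", "reads.1.baf", "reads.1.sam")` would rerun one slow chunk without pairing rescue, which makes it easy to compare its walltime and output against the original run before changing every job.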