Dear BFAST experts and users,
I have four questions about the bfast workflow and hope the community can help. The questions are interspersed in the workflow below, which I describe in two parts.
PART-1
1- Convert the experimental data to bfast input (the reads come from mate-pair library preps).
solid2fastq -n 500000 -o reads *.csfasta *.qual
(this creates N files named "reads.j.fastq", j = 1, ..., N)
Q1- Is this command right for a mate-pair library prep?
2- Convert the reference sequence to nucleotide space and color space
bfast fasta2brg -f ref_genome.fa
bfast fasta2brg -f ref_genome.fa -A 1
3- Create 10 masks using the information in the manual,
then generate 10 bif files (M=10), one per mask
bfast index -f ref_genome.fa -m <mask> -w 14 -i <index number> -A 1
The bfast authors suggest that 10 indexes is optimal for analyzing the human genome.
Q2- Can anyone please suggest a number of indexes for the mouse genome? An approximate value would be good enough, say 15 or 50?
I prefer not to compromise on sensitivity, so I plan to use all indexes when mapping the short reads in PART-2, summarized below.
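For reference, step 3 above could be scripted roughly as follows. This is only a dry-run sketch that prints the index commands instead of running them. The file masks.txt (one mask per line) and the function name print_index_cmds are my own placeholders, not something bfast provides; the options are copied from the command in step 3.

```shell
# Dry-run sketch: print one "bfast index" command per mask.
# Assumptions (hypothetical, not part of bfast): the masks are stored one
# per line in a plain-text file, and index numbers simply count up from 1.
print_index_cmds() {
  ref=$1        # reference FASTA, e.g. ref_genome.fa
  mask_file=$2  # hypothetical file with one mask per line
  i=1
  # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
  while IFS= read -r mask || [ -n "$mask" ]; do
    # Skip blank lines so they do not consume an index number.
    [ -z "$mask" ] && continue
    echo "bfast index -f $ref -m $mask -w 14 -i $i -A 1"
    i=$((i + 1))
  done < "$mask_file"
}
```

Calling `print_index_cmds ref_genome.fa masks.txt` lists the commands for review before anything is actually run.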
PART-2
1- bfast match
bfast match -f ref_genome.fa -A 1 -r reads.<N>.fastq > bfast.matches.file.ref_genome.<N>.bmf
2- bfast localalign
bfast localalign -f ref_genome.fa -m bfast.matches.file.ref_genome.<N>.bmf -A 1 > bfast.aligned.file.ref_genome.<N>.baf
3- bfast postprocess
bfast postprocess -f ref_genome.fa -i bfast.aligned.file.ref_genome.<N>.baf -A 1 > bfast.reported.file.ref_genome.<N>.sam
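To make the per-file bookkeeping concrete, the three steps above could be generated for every reads file with something like the sketch below. It is a dry run that only prints the commands; the function name print_part2_cmds is my own placeholder, and the file naming assumes the reads.<N>.fastq pattern from PART-1. The bfast options are exactly the ones shown above.

```shell
# Dry-run sketch: print the three PART-2 commands for each reads file.
# Assumption: input files are named reads.<N>.fastq, as produced in PART-1.
print_part2_cmds() {
  ref=$1; shift   # reference FASTA, followed by one or more reads files
  for fq in "$@"; do
    n=${fq##*/}     # drop any leading directory
    n=${n#reads.}   # drop the "reads." prefix
    n=${n%.fastq}   # drop the ".fastq" suffix, leaving <N>
    bmf=bfast.matches.file.ref_genome.$n.bmf
    baf=bfast.aligned.file.ref_genome.$n.baf
    sam=bfast.reported.file.ref_genome.$n.sam
    echo "bfast match -f $ref -A 1 -r $fq > $bmf"
    echo "bfast localalign -f $ref -m $bmf -A 1 > $baf"
    echo "bfast postprocess -f $ref -i $baf -A 1 > $sam"
  done
}
```

For example, `print_part2_cmds ref_genome.fa reads.*.fastq` prints three lines per reads file, which can be checked by eye or handed to a job scheduler.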
Q3- If I split the PART-2 jobs by index, do I run bmfmerge after step 1 and before step 2?
Q4- With bmfmerge, what cutoff / flag would give a reasonable analysis for the mouse genome? An example command would help, please. It has been suggested that a value of "-M 500" may be useful when aligning the human genome. I would welcome any suggestions for the mouse genome.
I hope you can help.
Thanks very much in advance,
cheers,
another new bfast analyzer.