Hello,
I'm trying to get per-position statistics from a pileup/contig of SOLiD reads mapped to a reference sequence (e.g. what percentage of reads have nucleotide G at position 236 of the reference).
I'm starting with a SOLiD .csfasta.ma file (the output of SOLiD's mapping program). Is there a script that can take this file and return a large table/array in base space, in which the columns are the individual positions of the padded reference sequence, the rows are the individual reads, and each entry is the read's nucleotide at that position (or blank/padding)?
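To make the requested layout concrete, here is a minimal sketch of the table and the per-position percentage. It assumes the reads have already been translated to base space and that each comes with a 1-based leftmost reference position. It also assumes ungapped alignments, so it ignores insertion/padding columns in the reference. The names `build_pileup_table`, `base_fractions` and the `.` padding character are hypothetical, not part of any SOLiD tool.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: (read_id, 1-based leftmost reference position, base-space sequence).
# Assumes ungapped alignments and reads already decoded from color space.
AlignedRead = Tuple[str, int, str]

PAD = "."  # placeholder for reference positions a read does not cover


def build_pileup_table(reads: List[AlignedRead], ref_length: int) -> Dict[str, List[str]]:
    """Return one row per read; column j corresponds to reference position j + 1."""
    table = {}
    for read_id, start, seq in reads:
        row = [PAD] * ref_length
        for offset, base in enumerate(seq):
            pos = start - 1 + offset  # 1-based reference position -> 0-based column
            if 0 <= pos < ref_length:  # clip bases hanging off either end of the reference
                row[pos] = base
        table[read_id] = row
    return table


def base_fractions(table: Dict[str, List[str]], position: int) -> Dict[str, float]:
    """Fraction of reads covering a 1-based position that carry each nucleotide."""
    column = [row[position - 1] for row in table.values()]
    covered = [b for b in column if b != PAD]  # denominator: reads that span this position
    total = len(covered)
    return {base: n / total for base, n in Counter(covered).items()} if total else {}


if __name__ == "__main__":
    reads = [("r1", 1, "ACGT"), ("r2", 3, "GTTA"), ("r3", 3, "ATTA")]
    table = build_pileup_table(reads, ref_length=8)
    for read_id, row in table.items():
        print(read_id, "".join(row))
    print(base_fractions(table, 3))
```

Here the percentage is taken over reads that actually cover the position; dividing by the total number of rows instead would give the other possible reading of "percentage of reads". Each row can be written out as a line of a text file for import into R or Matlab.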
I know there are viewers that can produce graphs etc. from some of this information, but I need the full table itself, which I could then process in R, Matlab, etc.
If there is no such script, are there at least scripts that can convert the color-space .csfasta.ma alignment file into a base-space format such as .ace or SAM, which I could then parse myself to build the table I need?
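For the conversion step, the core of going from color space to base space is the SOLiD two-base encoding. With A=0, C=1, G=2, T=3, each color is the XOR of the codes of two adjacent bases, so a read can be decoded one base at a time from a known leading base. This is a rough sketch assuming csfasta-style reads such as `T0201`, where the first letter is the known leading base and is not itself part of the read. `decode_colorspace` is a hypothetical helper, and parsing the .csfasta.ma header (positions, strand) is not covered.

```python
# Two-bit base codes used by SOLiD two-base (color-space) encoding:
# the color for a dinucleotide is the XOR of the codes of its two bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_colorspace(read: str) -> str:
    """Translate a csfasta-style read such as 'T0201' to base space.

    Assumes the first character is the known leading base and every
    following character is a color digit 0-3.
    """
    if not read or read[0] not in BASE_TO_CODE:
        raise ValueError("read must start with a leading base (A, C, G or T)")
    code = BASE_TO_CODE[read[0]]
    bases = []
    for color in read[1:]:
        if color not in "0123":
            # e.g. '.' marks a missing color call; naive decoding cannot continue past it
            raise ValueError(f"cannot decode color {color!r}")
        code ^= int(color)  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)  # the leading base is not part of the read, so it is omitted


if __name__ == "__main__":
    print(decode_colorspace("T0201"))
    print(decode_colorspace("T0301"))  # one changed color alters every downstream base
```

The second call shows why this naive translation is risky: a single color error ("TCCA" becomes "TAAC") changes every base after it. That is presumably why mapping tools compare reads to the reference in color space rather than decoding each read on its own.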
Thanks, g_solid