Hello,
I'm new to color-space ABI SOLiD data (and new to SeqAnswers) and would like to ask about using BFAST to map my reads to a genome. I started with mate-paired genomic ABI SOLiD output (.csfasta and .qual) with a separation of ~2000 bp, and converted it to FASTQ format using Galaxy's solid2fastq.py script. After specifying -t (to trim off the _F3/_R3), the result gives me these FASTQ lines for F3:
@187_27_140/2
T33212113201210222220322220022222222222222322220222
+
%%&'%((()(-%&&%&)('%&%**('.*))*(***&%*4*)&)*)'&**.
@187_27_171/2
T21233202221321222220222220322221222222222122221222
+
%')'%%%&&)%%((%%'%)&+))*(&()*)*)***)%).*.'****%*.*
@187_27_181/2
T32223210102203121222032222022220222222222222222222
and these fastq lines for R3:
@187_27_140/1
G2.2...32.0233.2..22..03.21.20...2211.2..1.2..1.12.
+
>!)!!!;%!%:=&!&!!('!!9-!)&!*%!!!((*5!(!!9!)!!*!('!
@187_27_171/1
G0.2...22.3002.3..21..30.32.32...2333.2..3.2..1.12.
+
=!%!!!=+!A>@%!@!!'@!!?&!@2!+>!!!&B@@!)!!@!+!!2!%&!
@187_27_181/1
G1.2...02.3102.1..30..02.22.03...2201.0..2.2..2.22.
My question now is: how do I input this data on the command line for bfast match so that it reads both mate-paired datasets (F3 and R3) while keeping my predetermined 2 kb distance between the mates?
The manual describes the FASTQ format as follows:
"The first line begins with the @ symbol. The rest of the first line will be the read name. The second line contains the sequence for the read. The reads must be 5 prime to 3 prime from left-to-right. The third line will begin with the + symbol. The rest of the line can be empty or contain an arbitrary comment string. The fourth line will contain the sequence qualities."
If I'm reading G01238... in my _R3 file, is that read written 3 prime to 5 prime? And do I need to flip the reads that are in the R3 file?
"For ABI SOLiD or color space reads, the adaptor should be included in the
sequence and the colors should be encoded as [0−4] with 4 signifying a unknown color. There should be one Phred-like quality score for each base in the sequence (or number of colors for ABI SOLiD data)."
Where can I find an adaptor in my reads?
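To make the manual's encoding rule concrete, here is a minimal Python sketch. It is not part of BFAST or Galaxy, and the function names are made up for illustration. It assumes that the leading letter of each read (T for F3, G for R3 in the examples above) is the adaptor base the manual refers to, and that a "." in the converted reads marks an unknown color that should be written as 4.

```python
def normalize_colors(seq):
    """Replace '.' with '4' in the color part, keeping the leading base as-is.

    Assumption: seq[0] is the adaptor base (e.g. 'T' or 'G') and the rest
    are colors.
    """
    adaptor_base, colors = seq[0], seq[1:]
    return adaptor_base + colors.replace(".", "4")


def looks_valid(seq, qual):
    """Check the two rules quoted from the manual for one color-space read:
    colors are in 0-4, and there is one quality value per color
    (so one fewer than the sequence length, which includes the adaptor base).
    """
    colors = seq[1:]
    return len(qual) == len(colors) and all(c in "01234" for c in colors)


# Example using the start of one of the R3 reads shown above.
print(normalize_colors("G2.2...32.0"))
```

With this check, the F3 reads shown above pass as they are (50 colors, 50 quality values), while the R3 reads pass only after the dots are replaced.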
"For paired end or multi end data, each end should be specified separately
but have the same read name. They should be listed in order of 5' -> 3' from left-to-right and on the same strand. Multi end, paired end, or single end data can be incorporated into the same Reads FASTQ file as long as the data follows the above rules."
It asks for the R3/F3 ends to be specified separately but to have the same name. Is this done only through the command-line arguments to bfast match, or does it require a change to the actual reads in the FASTQ file?
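One possible reading of that passage is that both ends go into a single FASTQ file as separate records sharing one read name. The Python sketch below illustrates that reading only. It is not BFAST's own solid2fastq, the function names are hypothetical, and the order in which the two ends are written is an assumption that should be checked against the BFAST documentation. It assumes plain four-line FASTQ records and read names ending in /1 or /2, as in the examples above.

```python
import io


def read_fastq(handle):
    """Yield (name, seq, qual) from a FASTQ handle with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' line, ignored here
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def strip_end_suffix(name):
    """Remove a trailing /1 or /2 so both ends share the same read name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(first, second):
    """Yield FASTQ records with each pair written back to back under one name.

    Reads present in only one input are skipped. The second input is held
    in memory, which is fine for a sketch but not for a full SOLiD run.
    """
    second_by_name = {
        strip_end_suffix(name): (seq, qual)
        for name, seq, qual in read_fastq(second)
    }
    for name, seq, qual in read_fastq(first):
        key = strip_end_suffix(name)
        if key in second_by_name:
            seq2, qual2 = second_by_name[key]
            yield f"@{key}\n{seq}\n+\n{qual}\n"
            yield f"@{key}\n{seq2}\n+\n{qual2}\n"


# Tiny demonstration with made-up, shortened reads.
f3 = io.StringIO("@187_27_140/2\nT3321\n+\n%%&'\n")
r3 = io.StringIO("@187_27_140/1\nG2424\n+\n>!)!\n")
print("".join(interleave(f3, r3)), end="")
```

For real files, the two `io.StringIO` objects would be replaced by the opened F3 and R3 FASTQ files, and the output written to the single reads file.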
Perhaps the solution is to use the BFAST solid2fastq converter, but its usage with mate-paired reads is unclear to me.
Thanks for your help and input,
Phil