Hello,
I am currently implementing color-space assembly in Ray, a de novo assembler that runs entirely on the Message Passing Interface (MPI).
I have read several documents on color space.
Document 1
SOLiD™ Data Format and File Definitions Guide
Document 2
SOLiD™ de novo accessory tools 2.0
Document 3
Applied Biosystems SOLiD™ 3 Plus System, De Novo Assembly Protocol
Document 4
SRA Handbook
I have two questions:
Question 1
From SRR001354.fastq (SRA001031, .sra file converted to FASTQ):
...
@SRR001354.1 S0013_20071128_2_DH10BFC_461_28_1048_F3 length=35
T2333132333313233232313333333233323F
+SRR001354.1 S0013_20071128_2_DH10BFC_461_28_1048_F3 length=35
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
...
@SRR001354.12 S0013_20071128_2_DH10BFC_461_59_1483_F3 length=35
T2133131333313111331113331131231133B
+SRR001354.12 S0013_20071128_2_DH10BFC_461_59_1483_F3 length=35
!%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%&
...
What is the meaning of the trailing F and B in the sequences above?
Document 1 says nothing about them.
As I understand it (I might be wrong), a color-space sequence begins with a starting nucleotide that bootstraps the decoding. The first color (right after the starting nucleotide) depends on that starting nucleotide.
The other colors are independent of it.
Am I right?
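To make my understanding concrete, here is a minimal Python sketch of the encoding as I read it. It assumes the usual two-bit mapping (A=0, C=1, G=2, T=3) with each color being the XOR of two adjacent bases. The names `encode` and `decode` are mine, not from any SOLiD tool, and the read is made up (no trailing F or B).

```python
# Assumption: A=0, C=1, G=2, T=3, and a color is the XOR of the codes
# of two adjacent bases. Function names are hypothetical.
BASES = "ACGT"

def encode(primer_base, seq):
    """Return the starting base followed by one color per base of seq."""
    prev = BASES.index(primer_base)
    colors = []
    for base in seq:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))  # color between previous and current base
        prev = cur
    return primer_base + "".join(colors)

def decode(read):
    """Translate starting base + colors back to base space (primer base excluded)."""
    cur = BASES.index(read[0])
    out = []
    for color in read[1:]:
        cur ^= int(color)  # each color moves us from one base to the next
        out.append(BASES[cur])
    return "".join(out)

if __name__ == "__main__":
    read = encode("T", "CGCG")
    print(read)          # T2333
    print(decode(read))  # CGCG
```

With this mapping, changing the starting base from T to A for the same sequence changes only the first color (T2333 becomes A1333), which is what I mean by the first color depending on the starting nucleotide.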
Question 2
For de novo assembly, one must skip the starting nucleotide and the first color, then convert the remaining colors to double encoding.
Also, the reverse complement of a vertex is simply its reverse, and the same holds for any sequence of SOLiD colors. Right?
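Here is a small Python check of the two claims above, written as a sketch under my own assumptions: colors are the XOR of adjacent two-bit base codes (A=0, C=1, G=2, T=3), and double encoding simply relabels colors 0, 1, 2, 3 as A, C, G, T. The helper names are hypothetical.

```python
# Assumptions: XOR color code with A=0, C=1, G=2, T=3; double encoding
# maps colors 0,1,2,3 to the letters A,C,G,T. Names are hypothetical.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
DOUBLE = str.maketrans("0123", "ACGT")

def colors_of(seq):
    """Colors between consecutive bases of a base-space sequence."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))

def reverse_complement(seq):
    """Ordinary base-space reverse complement."""
    return seq.translate(COMPLEMENT)[::-1]

def double_encode(colors):
    """Relabel colors as pseudo-bases so base-space code can handle them."""
    return colors.translate(DOUBLE)

if __name__ == "__main__":
    seq = "ACGGTCA"
    forward = colors_of(seq)
    backward = colors_of(reverse_complement(seq))
    print(forward, backward)          # 130121 121031
    print(backward == forward[::-1])  # True
    print(double_encode(forward))     # CTACGC
```

Under these assumptions the colors of the reverse complement are just the colors read backwards, because complementing both bases of a pair leaves their XOR unchanged.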
So, how is a color-space contig converted to base space?
As I see it, there are 4 possible base-space versions of any color-space sequence, one for each possible starting letter. Am I right?
Since an assembly has more than one color-space contig, I see a great deal of combinatorics there.
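To illustrate the ambiguity I am worried about, this Python sketch enumerates the four candidate translations of one color-space contig. It again assumes the XOR color code with A=0, C=1, G=2, T=3; `translations` is a hypothetical helper and the contig is made up.

```python
# Assumption: XOR color code with A=0, C=1, G=2, T=3.
# `translations` is a hypothetical helper, not part of any SOLiD tool.
BASES = "ACGT"

def translations(colors):
    """Map each possible first base to the base-space sequence it implies."""
    result = {}
    for first in range(4):
        cur = first
        seq = [BASES[cur]]
        for color in colors:
            cur ^= int(color)  # follow the color to the next base
            seq.append(BASES[cur])
        result[BASES[first]] = "".join(seq)
    return result

if __name__ == "__main__":
    for first, seq in translations("130121").items():
        print(first, seq)
    # A ACGGTCA
    # C CATTGAC
    # G GTAACTG
    # T TGCCAGT
```

If nothing anchors the first base of each contig, n contigs would give 4**n combinations, which is the combinatorics I mention above.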
Thank you in advance for your anticipated collective wisdom.
Sébastien
PhD student