Convert SOLiD fastq to Illumina fastq

I want to do this in order to be able to use Bowtie software, but it doesn't support SOLiD data. Is there software to convert it?
Originally posted by samt:
I want to do this in order to be able to use Bowtie software, but it doesn't support SOLiD data. Is there software to convert it?
-
Originally posted by samt:
Thanks, will BFAST work on a machine with <3 GB of memory? (Mapping to the whole mouse genome.)
-
Originally posted by nilshomer:
The mouse genome is ~3 Gb (gigabases), right? Either way, you will have to split the reference to fit it on your machine. This is easy and supported by BFAST, although 3 GB is not very much RAM at all. How many reads, of what length, and what computational resources do you have?
It seems you are still actively developing BFAST and there isn't much documentation, so it's hard to tell whether it has all the options I will need for the data. I want to map, assemble, and align back to the genome.
-
Originally posted by samt:
~100 million reads, 34 bp (SOLiD). I was hoping to use my machine, but I have a powerful enough cluster as well.
It seems you are still actively developing BFAST and there isn't much documentation, so it's hard to tell whether it has all the options I will need for the data. I want to map, assemble, and align back to the genome.
-
Originally posted by samt:
I'm not worried about calling SNPs, just about obtaining a consensus sequence mapped to the genome. Would converting from color space (CS) to nucleotide space (NT) still be a bad choice?
-
If your SOLiD FASTQ looks like the example below, you can try the attached script to convert it to Solexa/Illumina FASTQ:
Code:
@BARB_20071114_2_YorubanMP-BC3_3_16_150_F3
T0220100010131232212020122
+BARB_20071114_2_YorubanMP-BC3_3_16_150_F3
15 21 27 26 24 5 23 18 26 21 11 25 25 19 8 4 25 8 24 7 4 15 18 19 15
Last edited by BENM; 08-30-2009, 02:15 AM.
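The attached script is not reproduced in this thread. As a rough illustration only (not BENM's script), the Perl sketch below shows what converting a record like the one above into base space could involve. It assumes the standard SOLiD di-base code, where A=0, C=1, G=2, T=3 and each color is the XOR of the codes of two adjacent bases, and it writes qualities as Phred+64 characters (Illumina 1.3+ style), which is an assumption about the target format. All subroutine names are hypothetical. Keep in mind that naive decoding like this turns a single color-call error into wrong bases for the rest of the read, which is one reason color-space-aware aligners are usually preferred.

```perl
use strict;
use warnings;

# Assumed SOLiD di-base encoding: base codes A=0, C=1, G=2, T=3, and the
# color between two adjacent bases is the XOR of their codes.
my %code = (A => 0, C => 1, G => 2, T => 3);
my @base = qw(A C G T);

# Hypothetical helper: decode a color-space read such as "T0220..." into bases.
# The leading letter is the primer base and is not part of the output.
sub decode_colors {
    my ($cs) = @_;
    my ($prev, @colors) = split //, $cs;
    my $seq = '';
    for my $c (@colors) {
        if ($prev eq 'N' or $c !~ /^[0-3]$/) {
            # A missing call such as '.' breaks the chain: all later bases are unknown.
            $prev = 'N';
        } else {
            # Numeric XOR of the previous base code and the color gives the next base.
            $prev = $base[ $code{$prev} ^ ($c + 0) ];
        }
        $seq .= $prev;
    }
    return $seq;
}

# Hypothetical helper: turn space-separated Phred values ("15 21 27 ...")
# into Phred+64 characters (assumed Illumina 1.3+ style); negatives clamp to 0.
sub phred_to_illumina {
    my ($qual_line) = @_;
    return join '', map { chr(($_ < 0 ? 0 : $_) + 64) } split ' ', $qual_line;
}

# Hypothetical helper: build one 4-line base-space FASTQ record.
# Simplification: the quality of color i is reused for decoded base i.
sub solid_record_to_illumina {
    my ($name, $cs, $qual_line) = @_;
    my $seq  = decode_colors($cs);
    my $qual = phred_to_illumina($qual_line);
    return "\@$name\n$seq\n+\n$qual\n";
}
```

For example, `decode_colors("T0123")` returns `TGAT`. To convert a whole file, one would read it four lines at a time and print `solid_record_to_illumina(...)` for each record, passing the read name without its leading `@`.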
-
Hi BENM,
Thanks for providing the Perl script. I am using the SOLiD files from the 1000 Genomes Project, and the data look like this:
@VAB_Solid0044_20080423_1_Pilot2_YRI_1_8_3KB_MP_11137_718_114
G2203012023131303312303100
+
!611%%(-+%*.&*.,&2,,'%()31
With your script, the quality line gets lost. I wonder whether, in this case, the original quality line can be kept unchanged apart from removing the first character. I am new to SOLiD data, so I want to double-check with you. It may be useful for others if you could modify your script to accommodate this format.
Thanks
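In this layout the sequence line holds a primer base plus 25 colors, while the quality line has 26 characters, so its first character ('!') presumably corresponds to the primer. A minimal Perl sketch of the change pliang describes, assuming the qualities are Phred+33: drop the first quality character, and only if the downstream tool expects Phred+64 (Illumina 1.3+ style) shift every character up by 31. The subroutine name `trim_primer_quality` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the primer's quality (first character) and, if
# $to_phred64 is true, re-encode the assumed Phred+33 characters as Phred+64.
sub trim_primer_quality {
    my ($qual, $to_phred64) = @_;
    my $trimmed = substr($qual, 1);          # 26 characters -> 25, one per color
    return $trimmed unless $to_phred64;
    # Phred+33 -> Phred+64 is a constant shift of 31 in ASCII code.
    return join '', map { chr(ord($_) + 31) } split //, $trimmed;
}
```

For example, `trim_primer_quality('!611%', 0)` returns `611%`, and `trim_primer_quality('!5I', 1)` returns `Th`.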
-
Because samt's question was "Convert SOLiD fastq to Illumina fastq", note that Illumina FASTQ differs from standard (Sanger) FASTQ in its quality format.
The Solexa/Illumina read format is almost identical to standard FASTQ in syntax, but the qualities are scaled differently. Given a character $sq, the following Perl code gives the Phred quality $Q:
$Q = 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);
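As a worked illustration, the Solexa-to-Phred formula can be wrapped in a small Perl subroutine, with the parentheses balanced so that the exponent is (ord($sq) - 64) / 10. The name `solexa_char_to_phred` is hypothetical. At Solexa quality 0 ('@') the Phred value is about 3.01, and for higher qualities it converges on the Solexa value.

```perl
use strict;
use warnings;

# Hypothetical helper: Phred quality for one Solexa-encoded character,
# Q_phred = 10 * log10(1 + 10 ** (Q_solexa / 10)), with Q_solexa = ord - 64.
sub solexa_char_to_phred {
    my ($sq) = @_;
    return 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10);
}

# Print the Phred value for each character of a short Solexa quality string.
for my $sq (split //, '@Jh') {
    printf "%s -> %.2f\n", $sq, solexa_char_to_phred($sq);
}
```

This should print 3.01, 10.41 and 40.00 for '@' (Solexa 0), 'J' (Solexa 10) and 'h' (Solexa 40).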
The ASCII characters in Solexa FASTQ encode Solexa qualities (ASCII code minus 64) as follows:
Code:
CHAR DEC QUAL   CHAR DEC QUAL   CHAR DEC QUAL   CHAR DEC QUAL
;     59   -5   <     60   -4   =     61   -3   >     62   -2
?     63   -1   @     64    0   A     65    1   B     66    2
C     67    3   D     68    4   E     69    5   F     70    6
G     71    7   H     72    8   I     73    9   J     74   10
K     75   11   L     76   12   M     77   13   N     78   14
O     79   15   P     80   16   Q     81   17   R     82   18
S     83   19   T     84   20   U     85   21   V     86   22
W     87   23   X     88   24   Y     89   25   Z     90   26
[     91   27   \     92   28   ]     93   29   ^     94   30
_     95   31   `     96   32   a     97   33   b     98   34
c     99   35   d    100   36   e    101   37   f    102   38
g    103   39   h    104   40
Converted to Sanger (Phred+33), the full Solexa range maps as follows (CHAR is the resulting Sanger character, DEC the ASCII code of the Solexa character, QUALITY the Solexa quality):
Code:
CHAR       DEC       QUALITY
!          0-54      -64 to -10
"          55-60     -9 to -4
#          61-62     -3 to -2
$          63-64     -1 to 0
%          65-66     1 to 2
&          67-68     3 to 4
'          69        5
(          70        6
)          71        7
*          72        8
+          73-74     9 to 10
, ... a    75-128    11 to 64 (consecutive ASCII, one step per value; e.g. 0 = 79 = 15, I = 104 = 40)
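# Solexa->Sanger quality conversion table, indexed by ord() of the Solexa character
# (Solexa quality -64..64 is stored at index 0..128)
my @conv_table;
for (-64..64) {
    # Phred = 10*log10(1 + 10^(Q/10)); +.499 rounds, +33 gives the Sanger character
    $conv_table[$_+64] = chr(int(33 + 10*log(1+10**($_/10.0))/log(10)+.499));
}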
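With the table built, converting a whole Solexa quality line is a per-character lookup. The sketch below repeats the table loop so that it runs on its own; the subroutine name `solexa_to_sanger` is hypothetical, and characters outside the table's range (ASCII 0 to 128) are not handled.

```perl
use strict;
use warnings;

# Same Solexa->Sanger table as above, indexed by ord() of the Solexa character.
my @conv_table;
for (-64..64) {
    $conv_table[$_+64] = chr(int(33 + 10*log(1+10**($_/10.0))/log(10)+.499));
}

# Hypothetical helper: convert a full Solexa quality string to Sanger (Phred+33).
sub solexa_to_sanger {
    my ($qual) = @_;
    return join '', map { $conv_table[ord $_] } split //, $qual;
}

# Single quotes keep Perl from interpolating "@Jh" as an array.
print solexa_to_sanger(';@Jh'), "\n";
```

This prints `"$+I`, matching the table rows for ASCII 59, 64, 74 and 104.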
I am trying to write a universal script for converting among the Solexa/Illumina, SOLiD/ABI, 454/Roche, 3730/Sanger, ... formats for different purposes, but I need to know your requirements first; after that, I will share it with you all.
I hope this answers your question.
BTW, I have attached SOLiD2std.pl for your question; it is just a small change to SOLiD2Solexa.pl.
Last edited by BENM; 03-26-2012, 07:40 PM.
-
Hi BENM,
Thank you for your response and the new information. As it happens, I need to convert the SOLiD color-space sequences in FASTQ to Solexa format, for both the sequence and the quality lines. I believe the quality scores are already in an ASCII scheme (see the entry I copied in my first post), which is why I thought the quality line could be kept unchanged for my use. Am I right about this? In any case, I think a tool for converting among the data formats of different platforms would be useful for us. Thanks again!