Hi All,
I have been asked to split an SFF file into a number of adapter-specific SFF files.
If I read the SFF file into R as an SFFContainer, the reads slot looks like this:
----------------------------------------------------------------
A QualityScaledDNAStringSet instance containing:
A DNAStringSet instance of length 377894
width seq names
[1] 211 TCAGAAGAGGATTCGATCTCG...GCCAAGCACACAGGGATAGG G2FU2:4:10
[2] 80 TCAGAAGAGGATTCGATTATA...TTCTCTCTCACAAGTTACAC G2FU2:4:47
[3] 46 TCAGAAGAGGATTCGTCTGCT...GTTGTCTTCTCTAAAATGCT G2FU2:4:49
[4] 180 TCAGTAAGGAGAACGATAGGC...GCCAAGGCAGACAGGGATAG G2FU2:5:15
[5] 133 TCAGCTAAGGTAACGATCTGA...TGTGTACATATCATGAGAGT G2FU2:5:16
[6] 65 TCAGCTAAGGTAACGATATTT...GTCATTCAAATGTCAAGTGA G2FU2:5:48
[7] 72 TCAGCTAAGGTAACGATGATC...TTAAGAAGTAAAATATAATA G2FU2:7:47
[8] 36 TCAGTAAGGAGAACGATTAGGTAACTTAATAAAAAT G2FU2:8:47
[9] 50 TCAGCAAGGTAACGTTGATAT...ACTGAGATACTTATCTTATT G2FU2:8:49
... ... ...
[377886] 296 TCAGTAAGGAGAACGATCTTT...GCACAGACGGGAAGGTAGAG G2FU2:1146:1271
[377887] 292 TCAGTAAGGAGAACGATGACT...CAGCAGCACAGAGGCGAGAG G2FU2:1146:1272
[377888] 191 TCAGTAAGGAGAACGATACTC...CAAGGCACACAGGGGATAGG G2FU2:1147:1252
[377889] 287 TCAGCAGAAGGAACGATGATC...AGAGCGAGCAAGCAGACAGG G2FU2:1147:1254
[377890] 292 TCAGTAAGGAGAACGATATCG...CTACTCGAGGAGACAGGTAG G2FU2:1147:1258
[377891] 281 TCAGCAGAAGGAACGATCGTC...GCGAAGGCAGCACAGGAGTA G2FU2:1147:1262
[377892] 274 TCAGCTAAGGTAACGATCAAA...CCGATGCCCATAGAGTGCAG G2FU2:1147:1269
[377893] 283 TCAGCTAAGGTAACGATGACT...CAAGGCACACAGGGAGTAGG G2FU2:1147:1271
[377894] 301 TCAGCTAAGGTAACGATATTC...AGACACGGAGGTAGAGTGTA G2FU2:1147:1274
A PhredQuality instance of length 377894
width seq names
[1] 211 AAAAA:>;>382(16549@00...+4.4&+++*11,,0%**33. G2FU2:4:10
[2] 80 @7==B=@@@>?>37<7714:8...-(*-***-**--(*-(*--/ G2FU2:4:47
[3] 46 A3225.000/13-21/00---...**&-**-&--***--&**-1 G2FU2:4:49
[4] 180 BBCCCC>B>BCC>BBBBBC>C....-&++/+0...1235,33// G2FU2:5:15
[5] 133 >3300,0+1(--(01110000...*********-*1**-**-*- G2FU2:5:16
[6] 65 >59::585:28<2;9456:<....-*-**(*--%***-.(*-*2 G2FU2:5:48
[7] 72 ;222313/3-00(01/0*--*...*%-(*-(***--%--*-(-- G2FU2:7:47
[8] 36 @7==>A:>9>>>7<757.21,0/-//%-(+/224)2 G2FU2:8:47
[9] 50 ;000-*&-&--(,-*&**--0...0***-----*-&-*-*&**. G2FU2:8:49
... ... ...
[377886] 296 B@@>::/929552<@::188)...+1..,,,**-%++/+***** G2FU2:1146:1271
[377887] 292 EEEEDD?C?CDD>CCCCCCCC...***,/1****,,.&****** G2FU2:1146:1272
[377888] 191 DDDCBC>C>[email protected];2:28?A::;BEE.?;84, G2FU2:1147:1252
[377889] 287 @668@CCC9C?C?DDDECCCE...0//,0****++,,*****1- G2FU2:1147:1254
[377890] 292 DDDEDD@D>[email protected],,,012,1,,,,4&+++ G2FU2:1147:1258
[377891] 281 @?AAA@@@:A:><>@@>>>;:...3++.&++/41++++3.1/++ G2FU2:1147:1262
[377892] 274 CCCCCCC?C:>>7A;=@<<;,...8&++7758,+**+++****1 G2FU2:1147:1269
[377893] 283 AAAAA>A9A57<14;66.24=...-,5;46,,,+,8;/+++34, G2FU2:1147:1271
[377894] 301 @@@>=9<6<*+02657631+2...*+*++11,+*%*.******1 G2FU2:1147:1274
------------------------------------------------------------------
The adapters look like this (96 adapters in total):
AdaptName AdaptSeq
1 IonXpress_001 CTAAGGTAAC
2 IonXpress_002 TAAGGAGAAC
3 IonXpress_003 AAGAGGATTC
4 IonXpress_004 TACCAAGATC
5 IonXpress_005 CAGAAGGAAC
6 IonXpress_006 CTGCAAGTTC
7 IonXpress_007 TTCGTGATTC
8 IonXpress_008 TTCCGATAAC
9 IonXpress_009 TGAGCGGAAC
10 IonXpress_010 CTGACCGAAC
............................
Could anyone please tell me what "adapter-specific SFF files" means?
To classify each read by adapter, should I align every adapter against each sequence, something like the following?
TCAGTACTGAGCTACAGTACACGATGCGTCCAGGAACCATCGGATGGCAATCG - sequence
TCGTATGCCG (scan all positions until the end) - (m=2, i=1, d=1)
TCGTATGCC - (m=2, i=1, d=0)
TCGTATGC - (m=2, i=1, d=0)
TCGTATG - (m=1, i=1, d=0)
TCGTAT - (m=1, i=1, d=0)
TCGTA - (m=1, i=1, d=0)
TCGT - (m=1, i=0, d=0)
TCG - (m=1, i=0, d=0) -> match!
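To make the scanning idea concrete, here is a minimal, language-agnostic sketch in Python (not R). It is only an illustration under my own assumptions: the function name `find_adapter` and the `max_mismatches` parameter are hypothetical, it counts substitutions only (no insertions or deletions, unlike the m/i/d counts above), and it does not try progressively shorter adapter prefixes at the end of the read.

```python
def find_adapter(read, adapter, max_mismatches=1):
    """Slide `adapter` along `read` and return the first 0-based offset where
    it matches with at most `max_mismatches` substitutions, or -1 if none.

    Hypothetical helper: only substitutions are counted, indels are ignored.
    """
    n = len(adapter)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        # Hamming distance between the current window and the adapter
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_mismatches:
            return start
    return -1


# Visible prefix of read G2FU2:5:16 from the listing above
read = "TCAGCTAAGGTAACGATCTGA"
offset = find_adapter(read, "CTAAGGTAAC")  # IonXpress_001
print(offset)
```

Doing this for all 96 adapters at every position of 377894 reads would be slow and could produce chance matches inside the insert, which is part of why I am unsure this is the right approach.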
Or should I find a specific adapter for each read using the functions on page 15?
Should I trim each sequence from the clipAdapterLeft position to the clipAdapterRight position before any alignment or other work?
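An alternative I can imagine, sketched below in Python purely as an illustration, relies on something visible in the listing: every read shown starts with TCAG, immediately followed by what looks like one of the 10-base adapter sequences. Assuming that TCAG is a fixed key and that the adapter always sits right after it (both are my assumptions, not something I have confirmed), each read could be assigned by comparing only those 10 bases. The names `classify` and `split_reads` are hypothetical, only four of the 96 adapters are listed, and the sketch ignores the clipAdapterLeft/clipAdapterRight values and does not write SFF files.

```python
from collections import defaultdict

KEY = "TCAG"  # assumed key sequence, inferred from the reads shown above

# Subset of the adapter table above (the real table has 96 entries)
BARCODES = {
    "IonXpress_001": "CTAAGGTAAC",
    "IonXpress_002": "TAAGGAGAAC",
    "IonXpress_003": "AAGAGGATTC",
    "IonXpress_005": "CAGAAGGAAC",
}


def classify(read, barcodes=BARCODES, key=KEY, max_mismatches=1):
    """Return the name of the adapter that best matches the bases right after
    the key, or None if no adapter is within `max_mismatches` substitutions
    or if the best match is ambiguous."""
    if not read.startswith(key):
        return None
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, seq in barcodes.items():
        candidate = read[len(key):len(key) + len(seq)]
        if len(candidate) < len(seq):
            continue  # read too short to contain this adapter
        dist = sum(a != b for a, b in zip(candidate, seq))
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist and best_name is not None:
            tie = True
    return None if tie else best_name


def split_reads(reads, barcodes=BARCODES, key=KEY):
    """Group (name, sequence) pairs by adapter and strip key plus adapter."""
    groups = defaultdict(list)
    for name, seq in reads:
        adapter = classify(seq, barcodes, key)
        if adapter is None:
            groups["unassigned"].append((name, seq))
        else:
            cut = len(key) + len(barcodes[adapter])
            groups[adapter].append((name, seq[cut:]))
    return dict(groups)


reads = [
    ("G2FU2:8:47", "TCAGTAAGGAGAACGATTAGGTAACTTAATAAAAAT"),  # full read 8
    ("G2FU2:4:10", "TCAGAAGAGGATTCGATCTCG"),  # visible prefix of read 1 only
    ("G2FU2:8:49", "TCAGCAAGGTAACGTTGATAT"),  # visible prefix of read 9 only
]
groups = split_reads(reads)
for adapter, members in groups.items():
    print(adapter, [name for name, _ in members])
```

In this sketch read G2FU2:8:49 ends up unassigned: its prefix looks like IonXpress_001 with one base missing, and a substitution-only comparison cannot handle that, so a real solution would probably need to tolerate indels.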
Thank you very much in advance.
Best,
Heidi