I've got a question concerning the trimming of the primer base and the first color.
An example of the reads (.csfasta):
>186_2041_1641_F3
T122233110.3012011122133012030.1110.31220022220.120
>186_2041_1706_F3
T11132121312201321220103230123.2113.31201112230.031
>186_2041_1709_F3
T2103022220322301123212223030330323320201102233.123
According to the file description from ABI (http://www3.appliedbiosystems.com/cm...cms_058717.pdf), the file contains "UNprocessed color space data": every read starts with T, so the first character is the primer base (although some reads still contain '.' calls). When I align the reads with Bowtie, the primer base 'T' and the first color are both trimmed away, resulting in a final length of 49 bases (one less than the 50 colors in the read). But is it really necessary to trim the first color as well? As far as I understand the file description, it is not: with the ABI pipeline, the processed color data would contain the first real base plus 49 colors and should therefore result in a final length of 50. Any ideas?