Hello, I'm just starting out in transcriptome analysis. My question is about 454 files. I have two folders with 2 SFF files each. The SFF files in the first folder contain about 300 000 reads, whereas those in the second folder contain about 160 000 reads.
As I understand it, it is not possible to have two reads with the same name, but the files in the second folder contain reads with the same names as those in the first, with different lengths. For example:
>EVADNQG01CUUPQ
tcagTCAGAAACCGCTTCGATAAGAGAGACCCACTGGGCCAAAGTTACATCACATACTATTAACTTGCGTTGAACCACAGGTTCGCATCAAGTATATGTTCACATc
>EVADNQG01CUUPQ
tcagTCAGAAACCGCTTCGATAAGAGAGACC ACTGG CAAAGT ACATCACATACTAT AACT GCGT GAA CACAG TCGCAt nagtatatgtcacatc
I guess that some reads were eliminated according to some criteria and that the other two files, with fewer reads, were created from what remained, but what is the reason for having gaps rather than just N in the intermediate regions?
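In case it helps to reproduce what I am seeing, here is a minimal sketch of how the two sets of reads could be compared by name. It assumes the SFF files have already been exported to FASTA text; the function names `read_fasta` and `compare_reads` are hypothetical, and the sequences below are shortened stand-ins rather than my real reads.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {read_name: sequence}."""
    reads = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # The read name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            reads[name] = ""
        elif name is not None:
            # Keep internal spaces so the gaps can be counted later.
            reads[name] += line
    return reads


def compare_reads(first, second):
    """List reads present in both sets whose sequences differ.

    Each entry is (name, length in first set,
    length in second set without spaces, number of spaces in second set).
    """
    report = []
    for name in sorted(set(first) & set(second)):
        a, b = first[name], second[name]
        if a != b:
            report.append((name, len(a), len(b.replace(" ", "")), b.count(" ")))
    return report


# Shortened, made-up sequences for illustration only.
first = read_fasta([">EVADNQG01CUUPQ", "tcagTCAGAAACC"])
second = read_fasta([">EVADNQG01CUUPQ", "tcagTCAG AACC"])
print(compare_reads(first, second))
```

With the real files, each argument to `read_fasta` would be an open file handle instead of a list of strings.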
Thanks