Hi everyone,
The SAM file produced by the cgatools evidence2sam command contains records in which every read ID appears twice, each time mapping to a different location, for example:
GS27657-FS3-L02-8:1660459 179 chr14 19089795 16 12M1I1P4I6M6N5M1I4M = 19090178 383 CCTAATTCTTATTTTTATTTTTTTATTTATTTT 9::::656877887;<<<:::<;6-47737783 RG:Z:NA19238-L2-200-37-ASM-chr14 GC:Z:3S2G28S GS:Z:AAAA GQ:Z:::4-
GS27657-FS3-L02-8:1660459 115 chr14 19090178 16 10M5N23M = 19089795 -383 TTCATGAGAGGGTCCACTATTTTTCCCTTGTTA .08877587857;*;1<<9778877871;;::7 RG:Z:NA19238-L2-200-37-ASM-chr14 GC:Z:28S2G3S GS:Z:TGTG GQ:Z:77;;
Here, the read 'GS27657-FS3-L02-8:1660459' maps to the same chromosome (chr14) at two nearby locations (19089795 and 19090178).
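One way to check whether the two records are the two mates of one pair, rather than two alignments of the same read, is to decode the FLAG column (179 and 115 in the records above). The following is only a minimal sketch, assuming the standard SAM flag bit definitions; `decode_flag` is a hypothetical helper name and is not part of cgatools.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# FLAG values taken from the two records shown above.
for flag in (179, 115):
    print(flag, decode_flag(flag))
```

With these definitions, 179 has the second-in-pair bit set and 115 has the first-in-pair bit set, which would suggest that the two lines are the two mates of one pair sharing a single read ID.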
My questions about the Complete Genomics (CG) data are as follows:
1) Why is every read ID in the SAM file repeated twice, each time mapping to a different location? What does this signify, and is there a way to resolve it?
2) The evidence file contains reads of uniform length (70 bp). After conversion with evidence2sam from cgatools, why is the read length reduced to only 33 bp?
3) Also, when the read sequences in the evidence file and the SAM file are compared, the SAM read sequence does not match any part of the 70 bp sequence. Is this an error?
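The 33 bp length in question 2 can be cross-checked against the CIGAR strings of the two records. The sketch below sums the query-consuming CIGAR operations as defined in the SAM specification. The second function is an assumption on my part, not something confirmed by the source: it treats the GC tag as a run-length description in which `nS` means n ordinary bases and `nG` means n bases that were sequenced twice (an overlap between sub-reads). Both function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query), per the SAM specification.
QUERY_OPS = set("MIS=X")


def cigar_query_length(cigar):
    """Number of read bases described by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_OPS)


def gc_original_length(gc):
    """ASSUMPTION: in the GC tag, nS = n ordinary bases and nG = n overlapping
    bases that were sequenced twice, so each G base counts double."""
    return sum(int(n) * (2 if op == "G" else 1)
               for n, op in re.findall(r"(\d+)([SG])", gc))


# CIGAR strings and GC tags taken from the two records shown above.
for cigar, gc in (("12M1I1P4I6M6N5M1I4M", "3S2G28S"), ("10M5N23M", "28S2G3S")):
    print(cigar, cigar_query_length(cigar), gc, gc_original_length(gc))
```

Both CIGAR strings consume 33 read bases, matching the 33-character SEQ fields. Under the stated GC assumption each record would correspond to 35 sequenced bases, so the two mates together would add up to 70, but that interpretation should be checked against the Complete Genomics documentation.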
If anybody could help, it would be great.
Thanks in advance.