Hi,
I tried to sort an alignment file by read name, but the output contains truncated read names. This happens no matter which program I use: SAMtools sort (0.1.8), Picard SortSam (1.77) or Novosort (2.08). Does anyone have any idea what's wrong with the programs or the data?
Here are the first few records of the original SAM file, followed by the first few records after sorting:
Code:
HWI-ST621:415:D197AACXX:8:1101:1 113 chr2 236798427 70 100M1S chr8 3088040 0 ACCTCTGTTTCTAAGCAGTGGAATAGAATTGCTTATGGAATAGCCAGGTCATAGGATGTNATAANTTCCCTGGAAATCAGAGGGGAAAAGAAGCAAAACAN C@?>?AC@:C@>CECDEE@ACFEBFFDEEHECDACADHFHFEHIJGJIGIHJJIHDB80#HF?1#GDJIHCIGGHHAIIIJJHEHJJIHHHHHFFFDD=1# PG:Z:novoalign RG:Z:LS148 AS:i:18 UQ:i:18 NM:i:2 MD:Z:59G4T35
HWI-ST621:415:D197AACXX:8:1101:1 177 chr8 3088040 70 101M chr2 236798427 0 AAATACATACATACACACAGACTGATTTTCTCTTCAGCAATATTTTAATGAAACCCCATACTGCAAATTACATAAACTAGTTAAAGTACACCAACCTCAAG DEEDDDFDCEECEEDDBFFFDHHHFGHECJJIHFJJJIJJJIJHHGIHGDDGGJJJIIHGHIJJJIIJIGJJIIIFIIJJJJJJIIHFFAHHHDFFDFCCB PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1223:2124 83 chr8 143208201 70 100M1S = 143207998 -303 CGCTGAGAGCAAGGTGCCAGCAGGGTGGGCCCTTCTGGAGGCTCCGGCCGGGATCTGTTCCAGGCCACCCCCGCCTTCCGGCCATCCTCAGCTTGGCTCCN >@CA>A:A>>>3(CA<AACDDDB<<?3?@9?CDCDCBCC?7<BBDBB@<93?DCCAA8<B?A<<DB7DCIGGBHGAHIIHFJJIEJIIHHHHHFFFDD=1# PG:Z:novoalign RG:Z:LS148 AS:i:47 UQ:i:47 NM:i:1 MD:Z:6C93 PQ:i:59 SM:i:70 AM:i:70
HWI-ST621:415:D197AACXX:8:1101:1223:2124 163 chr8 143207998 70 92M = 143208201 303 TTGTGGAGTCAGGTGTCCCTGGGGTCACGGTGACTGGCCAGGCGNGGGGAGCCAGGAGGCACACGGTCCTGGGCTCTNGCAGGGCTGGAGTG @BBDFFADD?FHH@@EGGGGIIII@BCGHG8?DGHGB@FHHGAG#-<CC;@E?ACEE?B7?BCA?B;?BDDCB9??A#++28?B?B@B1<>A PG:Z:novoalign RG:Z:LS148 AS:i:12 UQ:i:12 NM:i:2 MD:Z:44C32G14 PQ:i:59 SM:i:70 AM:i:70
HWI-ST621:415:D197AACXX:8:1101:14 65 chr6 74783346 70 1S100M chr1 1867309 0 NGATTAAGCAGCCAAGCTGTATCCTGAGGGAAACATGGGCAATGGAAAGCATCAGATTTCCTGGGTCAAAGCTATCCTGAGCTCAGGCACTGGGCTAACTG #4=DFFFFGHHHHJJJJJJIJJJJJJJJJJGHIJIHIIJIGIIJJBFHIIIJJJJDIJJIHHIJJIGGHHHHHFFFFFFEDEEEEDDD@DDDDDDCDCDDD PG:Z:novoalign RG:Z:LS148 AS:i:6 UQ:i:6 NM:i:0 MD:Z:100
HWI-ST621:415:D197AACXX:8:1101:14 129 chr1 1867309 70 101M chr6 74783346 0 ACACACACACACACACACGAACTGCAGGGGGCTCTGGAGCCATGGAGTTAGAAAAGCTCTCTGAGAGGCCAGGTGTAGTGGCTCATGCCTGTAATCCCAGC CCCFDFFFHHHHGJJJIJJIJJJJJFHIJIJFHIJJJDHEHHHHG@D?BDACCEDCBDDDDDDCDDDDBDBDB@CCCCCCBDDCCC@ACAC@>AB>CCACD PG:Z:novoalign RG:Z:LS148 AS:i:30 UQ:i:30 NM:i:1 MD:Z:68T32
HWI-ST621:415:D197AACXX:8:1101:14 97 chr2 62756955 70 1S100M chr6 74783591 0 NGTGCTGTTTGGTTTGTGTGTATTATATGGGTTTGGATTACAATAATTCCTCCCTTTTGTATAATGTTTTGCAGTTTTTAAAGCACTTCATGCTCTAAATC #1=DDFFDHHGGFHIIHHEHGFGIDHHIIIIFGIIICGGEHHHIIIII>GGGIIIIIIIICFGHHGGHIIIIDAAEHHHEBDDFCEEECCDCCCCCC>ACC PG:Z:novoalign RG:Z:LS148 AS:i:6 UQ:i:6 NM:i:0 MD:Z:100
HWI-ST621:415:D197AACXX:8:1101:14 145 chr6 74783591 70 101M chr2 62756955 0 ATTTTTGTAAGTCACCAATGGTTGGATGTTGGCAGTTTCATAAGGTTCATTCTAATAGTTCCTGGGACACAAATGACTCGAAGTAGGTCAAGACAGGTTCA <DDDDDDDDDDDDEEECCFDFFGHEHGJJIIJJJJIGHIHCIIIJIGCGIIGDIHEIIHGIJJJJIHIIJJIIHGBHHJIJJJJJJJJHHHFHFFFFD?C@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1 81 chr1 155944063 70 101M chr11 19838477 0 CAGCTGTACCTGGCAGCAGCCCCTTCCCCAAGATGGTGACACCTCTGTCCACACCCTCTGTAATAGTGACCGGAGAGCCTGTGGAGCATTCCACCAGGATT DDDEDAA:BCAA:DD@BDDDDB?@=BDEDEEDFFFD@;??=HHIIIIGJIHF<JIHFGBIHIJIIIIIHJJJJJJJIJJJJIJJJJJIHHHHHFFFFFCC@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1 161 chr11 19838477 70 101M chr1 155944063 0 AGCCCCTTATGCAGAAAAAGGGACTCCACCTGGAGCCCTCTCTGGATCTACTTCTCCCAGATAAATCAGTCGGCTGTGTAATCTTTCAGGAAACCTGACCC ??<DDFFFFHHDDDHIGDDAFE9FFGHGCHEGG9FGGHGGGGCFHBF*0BBCBGGE@GHGCHA@ECE@H;ADBFDCDDCCDD@CCC;33:32:595<9>3< PG:Z:novoalign RG:Z:LS148 AS:i:1 UQ:i:1 NM:i:0 MD:Z:101
After sorting:
Code:
HWI-ST 81 chr7 83652142 70 82M chr8 142160880 0 CTTTGTATTTACAGATACCACGGCCATTTTGCAATGTCCTCAGCACATAGTGGAAGCTGAACAAACAATCACATTTTCTAAT @D<EA?7)==77@=7)('-'FF;FABB*0>EDB9DFDGDEBDEECC<FHHHBE@9HHEAB<;>FFDBBFA<DFA;A,B48;? PG:Z:novoalign RG:Z:LS148 AS:i:22 UQ:i:22 NM:i:1 MD:Z:76A5
HWI-ST 65 chr9 120922414 70 101M chr6 160312253 0 TCACTGAGTCTGATTGAAGCAACTGGCATTGGTGATCATACTTCAATATTTCTCTCATATTTGAAGTTAGAATTAGTTGATGTGAGATATTATATTAGCCT @CCFFFFFHFHFAHHIDGHIJGIIJGHCGIJICFHIIIIIJIJJIIJGIIEIJHHGGIICGHIBGHFGHHGGHIDC@DHGIHGIGHHHHECBDFFFFFEDE PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 81 chr2 46872242 70 101M chr17 79461315 0 CATGGATTAAAATATTAAGTAATTTGATCTAGATGATTGTTTACAGTTTAACGCAAATACACTTAGTCTGTTCTGATTATTTACTCAAGGATTATATTACT >C>:EDDFCDDFFDFFHHHHHHJIHGG=GIGJJIIIIJGIIIHIJHDGGHHJFIIJIIGC:JHHAIIFJJJIHGH@IJJJHHCGB>HGGHGHHFFFFF@C@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 65 chr8 103315908 70 93M chr17 40205036 0 AGATATCTGAGAAACTGACCTAAATAAGCAATCTGAAAAGATTAAGGTTCCTTCAATTATTATACTACTTGTTCTCCAAATAACACACTAACT <@@ADD>DDBA<FG?A43?@FFF:3AEB>DFECE91:C<CFCFCFFC::4?D>FCDDD<FC8DFEFDG88@.==C=4@D;7@:7?CCBDD@>@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:93
HWI-ST 89 chr16 61016706 70 101M = 61016706 0 TGTTGAGTCAATGTAAGACCTTGGTAAGAATTCTTCAATTTAGACATGGCTAATTTTTAATGTCAACCACAGCTATTGAGGTACTTATATTAATTAACCTT C?CECACCFFFFDDDE?=CCGGIIIGGEGIIIGGIIEGIIHHDBFGIGFIIIHGIIIIIGGIHG@CHHHGHHHGHDEIFIIGIHBIIIHHDDHEDEDF@?@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 97 chr12 16510044 70 101M chr9 75346048 0 TAATAAAAATTCAGTTTTAACTATAGATGCCTTCTTCTCCTCTTGTGTTTGATTTATTGCTCCAAATGGGCCAACCTGGATGTCTATATTTCTTCCACTAA CCCFFFFFHHHHHJIIGIIJJIJIIJJJJJJIEIIJJJJJHJIJGFGFHJJJIIJJJJJJJJIGJJJJIIJJJIJHJHHFHHBBEDFFCFEFEEEEDDDDD PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 73 chr5 22843028 70 97M = 22843028 0 TAACTGTGTTTACTTTTCTCAGTTTCTACCAGAGAAAAGGCAGGTGCATTTTTTTGGTATGTTTGTGTAAAGTGAATTTGGCTTTACTTTTTCAAAT =?<DD>=;FHDFFHGE@EFH?EA<B4AA@EBGCC1?91*:8CFG0?@?<D@@B;AFB=7=3?CHEEBE77B@6>;(6;.;;@;?>A>5(5:@CC5@> PG:Z:novoalign RG:Z:LS148 AS:i:3 UQ:i:3 NM:i:0 MD:Z:97
HWI-ST 73 chr6 152150636 70 101M = 152150636 0 CATTTGTCATCATTACACGGTCATGGGAGTGCTAAGAAGACTTAAATGCAGGGCTACCACCCCTTCCCAATTCATCTTTTATCCATTTTATTTCTCTAAGG @CCDDDDEHHFHHFBHGGHHAFEFFHIGG:?CFGIGIGGHHEGIEHIGHGDE@;B=FA@F@FGGGEEHECCFFEFFCECDECCCDDDEDDCC@BCC>CCCC PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 113 chr7 63064316 30 101M chr17 26080536 0 CCTGCTCATCTCAGGCCTGCCGGCTCCTCCACCTGCCTTTTCGAGTACCCTGGGAACCCCCCGAGGACAGGTGTCATCGGTTGCTTCATCTCACCATCCCT A94+(:ACCC??@BB@@7DDBDB<2????@8;BDB@A@BCDBCCCA<-DCC>3?8DB=7@@IHCIIJIGIJIIIJGHHGGGGHGEIDIFFFFAFFFDF@@@ PG:Z:novoalign RG:Z:LS148 AS:i:31 UQ:i:31 NM:i:1 MD:Z:42C58
HWI-ST 89 chr4 96140737 70 101M = 96140737 0 AACAACGAGCCTCACTAGGTGACGATTAGCTATGGTTTCCCTGGTCTATACTGGATTTGGGTTCATTGGTAAATCATTCTATTCATAGCAATACAAGATAT <<A?8DDDDDDCCAEEEFFFFHHHHHFIJJJJIIIIIGIGHIFIGHIIGDGGIJIJJIIHIHIEHIIJJJJJJIIJJJIJJIIJJIJJHGHHHFFFFFB@@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
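To compare the two files beyond the first few records, one option is to tally how many colon-separated parts each read name has. This is only a rough sketch and not part of any of the tools above: the function name `qname_field_counts` is made up for illustration, and it assumes plain SAM text where the read name is the first whitespace-delimited field. A full name such as HWI-ST621:415:D197AACXX:8:1101:1223:2124 has seven parts, while HWI-ST has one.

```python
from collections import Counter


def qname_field_counts(sam_lines):
    """Tally how many colon-separated parts each read name (QNAME) has.

    Hypothetical helper for illustration: takes any iterable of SAM text
    lines and returns a Counter mapping part count -> number of records.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        # QNAME is the first field; it cannot contain whitespace.
        qname = line.split(None, 1)[0]
        counts[qname.count(":") + 1] += 1
    return counts


# Abbreviated records (read name and flag only), taken from the names above.
demo = [
    "@HD\tVN:1.0",
    "HWI-ST621:415:D197AACXX:8:1101:1223:2124\t83",
    "HWI-ST621:415:D197AACXX:8:1101:14\t65",
    "HWI-ST\t81",
]
print(dict(qname_field_counts(demo)))
```

Running it on an open file handle for the unsorted file and again for the sorted one (for BAM, on the text output of `samtools view`) should show whether the short names are already present before sorting or only appear afterwards.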
Thanks a lot!
Allen