Hi,
I tried to sort an alignment file by read name, but the sorted output contains truncated read names. This happens no matter which program I use: SAMtools sort (0.1.8), Picard SortSam (1.77) or Novosort (2.08).
Does anyone have any idea what is wrong with the programs or the data?
Here are the first few records of the original SAM file, followed by the first few records after sorting:
Code:
HWI-ST621:415:D197AACXX:8:1101:1 113 chr2 236798427 70 100M1S chr8 3088040 0 ACCTCTGTTTCTAAGCAGTGGAATAGAATTGCTTATGGAATAGCCAGGTCATAGGATGTNATAANTTCCCTGGAAATCAGAGGGGAAAAGAAGCAAAACAN C@?>?AC@:C@>[email protected]#HF?1#GDJIHCIGGHHAIIIJJHEHJJIHHHHHFFFDD=1# PG:Z:novoalign RG:Z:LS148 AS:i:18 UQ:i:18 NM:i:2 MD:Z:59G4T35
HWI-ST621:415:D197AACXX:8:1101:1 177 chr8 3088040 70 101M chr2 236798427 0 AAATACATACATACACACAGACTGATTTTCTCTTCAGCAATATTTTAATGAAACCCCATACTGCAAATTACATAAACTAGTTAAAGTACACCAACCTCAAG DEEDDDFDCEECEEDDBFFFDHHHFGHECJJIHFJJJIJJJIJHHGIHGDDGGJJJIIHGHIJJJIIJIGJJIIIFIIJJJJJJIIHFFAHHHDFFDFCCB PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1223:2124 83 chr8 143208201 70 100M1S = 143207998 -303 CGCTGAGAGCAAGGTGCCAGCAGGGTGGGCCCTTCTGGAGGCTCCGGCCGGGATCTGTTCCAGGCCACCCCCGCCTTCCGGCCATCCTCAGCTTGGCTCCN >@CA>A:A>>>3(CA<AACDDDB<<[email protected]?CDCDCBCC?7<BBDBB@<93?DCCAA8<B?A<<DB7DCIGGBHGAHIIHFJJIEJIIHHHHHFFFDD=1# PG:Z:novoalign RG:Z:LS148 AS:i:47 UQ:i:47 NM:i:1 MD:Z:6C93 PQ:i:59 SM:i:70 AM:i:70
HWI-ST621:415:D197AACXX:8:1101:1223:2124 163 chr8 143207998 70 92M = 143208201 303 TTGTGGAGTCAGGTGTCCCTGGGGTCACGGTGACTGGCCAGGCGNGGGGAGCCAGGAGGCACACGGTCCTGGGCTCTNGCAGGGCTGGAGTG @BBDFFADD?FHH@@[email protected][email protected]#-<CC;@E?ACEE?B7?BCA?B;?BDDCB9??A#[email protected]<>A PG:Z:novoalign RG:Z:LS148 AS:i:12 UQ:i:12 NM:i:2 MD:Z:44C32G14 PQ:i:59 SM:i:70 AM:i:70
HWI-ST621:415:D197AACXX:8:1101:14 65 chr6 74783346 70 1S100M chr1 1867309 0 NGATTAAGCAGCCAAGCTGTATCCTGAGGGAAACATGGGCAATGGAAAGCATCAGATTTCCTGGGTCAAAGCTATCCTGAGCTCAGGCACTGGGCTAACTG #4=DFFFFGHHHHJJJJJJIJJJJJJJJJJGHIJIHIIJIGIIJJBFHIIIJJJJDIJJIHHIJJIGGHHHHHFFF[email protected] PG:Z:novoalign RG:Z:LS148 AS:i:6 UQ:i:6 NM:i:0 MD:Z:100
HWI-ST621:415:D197AACXX:8:1101:14 129 chr1 1867309 70 101M chr6 74783346 0 ACACACACACACACACACGAACTGCAGGGGGCTCTGGAGCCATGGAGTTAGAAAAGCTCTCTGAGAGGCCAGGTGTAGTGGCTCATGCCTGTAATCCCAGC [email protected][email protected]@ACAC@>AB>CCACD PG:Z:novoalign RG:Z:LS148 AS:i:30 UQ:i:30 NM:i:1 MD:Z:68T32
HWI-ST621:415:D197AACXX:8:1101:14 97 chr2 62756955 70 1S100M chr6 74783591 0 NGTGCTGTTTGGTTTGTGTGTATTATATGGGTTTGGATTACAATAATTCCTCCCTTTTGTATAATGTTTTGCAGTTTTTAAAGCACTTCATGCTCTAAATC #1=DDFFDHHGGFHIIHHEHGFGIDHHIIIIFGIIICGGEHHHIIIII>GGGIIIIIIIICFGHHGGHIIIIDAAEHHHEBDDFCEEECCDCCCCCC>ACC PG:Z:novoalign RG:Z:LS148 AS:i:6 UQ:i:6 NM:i:0 MD:Z:100
HWI-ST621:415:D197AACXX:8:1101:14 145 chr6 74783591 70 101M chr2 62756955 0 ATTTTTGTAAGTCACCAATGGTTGGATGTTGGCAGTTTCATAAGGTTCATTCTAATAGTTCCTGGGACACAAATGACTCGAAGTAGGTCAAGACAGGTTCA <DDDDDDDDDDDDEEECCFDFFGHEHGJJIIJJJJIGHIHCIIIJIGCGIIGDIHEIIHGIJJJJIHIIJJIIHGBHHJIJJJJJJJJHHHFHFFFFD?C@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1 81 chr1 155944063 70 101M chr11 19838477 0 CAGCTGTACCTGGCAGCAGCCCCTTCCCCAAGATGGTGACACCTCTGTCCACACCCTCTGTAATAGTGACCGGAGAGCCTGTGGAGCATTCCACCAGGATT DDDEDAA:BCAA:[email protected]?@=BDEDEEDFFFD@;??=HHIIIIGJIHF<JIHFGBIHIJIIIIIHJJJJJJJIJJJJIJJJJJIHHHHHFFFFFCC@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST621:415:D197AACXX:8:1101:1 161 chr11 19838477 70 101M chr1 155944063 0 AGCCCCTTATGCAGAAAAAGGGACTCCACCTGGAGCCCTCTCTGGATCTACTTCTCCCAGATAAATCAGTCGGCTGTGTAATCTTTCAGGAAACCTGACCC ??<DDFFFFHHDDDHIGDDAFE9FFGHGCHEGG9FGGHGGGGCFHBF*[email protected]@[email protected];[email protected];33:32:595<9>3< PG:Z:novoalign RG:Z:LS148 AS:i:1 UQ:i:1 NM:i:0 MD:Z:101
After sorting:
Code:
HWI-ST 81 chr7 83652142 70 82M chr8 142160880 0 CTTTGTATTTACAGATACCACGGCCATTTTGCAATGTCCTCAGCACATAGTGGAAGCTGAACAAACAATCACATTTTCTAAT @D<EA?7)==77@=7)('-'FF;FABB*0>EDB9DFDGDEBDEECC<[email protected]<;>FFDBBFA<DFA;A,B48;? PG:Z:novoalign RG:Z:LS148 AS:i:22 UQ:i:22 NM:i:1 MD:Z:76A5
HWI-ST 65 chr9 120922414 70 101M chr6 160312253 0 TCACTGAGTCTGATTGAAGCAACTGGCATTGGTGATCATACTTCAATATTTCTCTCATATTTGAAGTTAGAATTAGTTGATGTGAGATATTATATTAGCCT @CCFFFFFHFHFAHHIDGHIJGIIJGHCGIJICFHIIIIIJIJJIIJGIIEIJHHGGIICGHIBGHFGHHGGHIDC@DHGIHGIGHHHHECBDFFFFFEDE PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 81 chr2 46872242 70 101M chr17 79461315 0 CATGGATTAAAATATTAAGTAATTTGATCTAGATGATTGTTTACAGTTTAACGCAAATACACTTAGTCTGTTCTGATTATTTACTCAAGGATTATATTACT >C>:EDDFCDDFFDFFHHHHHHJIHGG=GIGJJIIIIJGIIIHIJHDGGHHJFIIJIIGC:[email protected]>[email protected]@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 65 chr8 103315908 70 93M chr17 40205036 0 AGATATCTGAGAAACTGACCTAAATAAGCAATCTGAAAAGATTAAGGTTCCTTCAATTATTATACTACTTGTTCTCCAAATAACACACTAACT <@@ADD>DDBA<[email protected]:3AEB>DFECE91:C<CFCFCFFC::4?D>FCDDD<FC8DFEFDG88@[email protected];7@:7?CCBDD@>@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:93
HWI-ST 89 chr16 61016706 70 101M = 61016706 0 TGTTGAGTCAATGTAAGACCTTGGTAAGAATTCTTCAATTTAGACATGGCTAATTTTTAATGTCAACCACAGCTATTGAGGTACTTATATTAATTAACCTT C?CECACCFFFFDDDE?=CCGGIIIGGEGIIIGGIIEGIIHHDBFGIGFIIIHGIIIIIGGIHG@CHHHGHHHGHDEIFIIGIHBIIIHHDDHEDEDF@?@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 97 chr12 16510044 70 101M chr9 75346048 0 TAATAAAAATTCAGTTTTAACTATAGATGCCTTCTTCTCCTCTTGTGTTTGATTTATTGCTCCAAATGGGCCAACCTGGATGTCTATATTTCTTCCACTAA CCCFFFFFHHHHHJIIGIIJJIJIIJJJJJJIEIIJJJJJHJIJGFGFHJJJIIJJJJJJJJIGJJJJIIJJJIJHJHHFHHBBEDFFCFEFEEEEDDDDD PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 73 chr5 22843028 70 97M = 22843028 0 TAACTGTGTTTACTTTTCTCAGTTTCTACCAGAGAAAAGGCAGGTGCATTTTTTTGGTATGTTTGTGTAAAGTGAATTTGGCTTTACTTTTTCAAAT =?<DD>=;[email protected]?EA<[email protected]?91*:8CFG0?@?<D@@B;[email protected]>;(6;.;;@;?>A>5(5:@CC5@> PG:Z:novoalign RG:Z:LS148 AS:i:3 UQ:i:3 NM:i:0 MD:Z:97
HWI-ST 73 chr6 152150636 70 101M = 152150636 0 CATTTGTCATCATTACACGGTCATGGGAGTGCTAAGAAGACTTAAATGCAGGGCTACCACCCCTTCCCAATTCATCTTTTATCCATTTTATTTCTCTAAGG @CCDDDDEHHFHHFBHGGHHAFEFFHIGG:?CFGIGIGGHHEGIEHIGHGDE@;[email protected]@[email protected]>CCCC PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
HWI-ST 113 chr7 63064316 30 101M chr17 26080536 0 CCTGCTCATCTCAGGCCTGCCGGCTCCTCCACCTGCCTTTTCGAGTACCCTGGGAACCCCCCGAGGACAGGTGTCATCGGTTGCTTCATCTCACCATCCCT A94+(:[email protected]@@7DDBDB<[email protected];[email protected]@BCDBCCCA<-DCC>3?8DB=7@@IHCIIJIGIJIIIJGHHGGGGHGEIDIFFFFAFFFDF@@@ PG:Z:novoalign RG:Z:LS148 AS:i:31 UQ:i:31 NM:i:1 MD:Z:42C58
HWI-ST 89 chr4 96140737 70 101M = 96140737 0 AACAACGAGCCTCACTAGGTGACGATTAGCTATGGTTTCCCTGGTCTATACTGGATTTGGGTTCATTGGTAAATCATTCTATTCATAGCAATACAAGATAT <<A?8DDDDDDCCAEEEFFFFHHHHHFIJJJJIIIIIGIGHIFIGHIIGDGGIJIJJIIHIHIEHIIJJJJJJIIJJJIJJIIJJIJJHGHHHFFFFFB@@ PG:Z:novoalign RG:Z:LS148 AS:i:0 UQ:i:0 NM:i:0 MD:Z:101
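One way to narrow this down is to tally how many colon-separated fields each read name (QNAME) has in the input and in the sorted output, since a full Illumina-style name such as HWI-ST621:415:D197AACXX:8:1101:1223:2124 has seven fields and a name cut down to HWI-ST has one. The following is only a minimal Python sketch, assuming plain-text SAM rather than BAM; the function name `qname_field_counts` and the file names in the comment are hypothetical.

```python
from collections import Counter


def qname_field_counts(sam_lines):
    """Tally how many colon-separated fields each read name (QNAME) has.

    `sam_lines` is any iterable of SAM text lines. Header lines (starting
    with '@') and blank lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        # QNAME is the first whitespace-delimited column of a SAM record.
        qname = line.split(None, 1)[0]
        counts[len(qname.split(":"))] += 1
    return counts


# Hypothetical usage: compare the tallies for the file before and after sorting.
# with open("original.sam") as fh:
#     print(qname_field_counts(fh))
# with open("sorted.sam") as fh:
#     print(qname_field_counts(fh))
```

If the original file already contains names with fewer than seven fields, the truncation happened before sorting (for example during alignment or conversion) rather than in the sort programs.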
Thanks a lot!
Allen