The Cufflinks manual states that SAM files should be sorted according to the following:
Code:
sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
However, when I sort this way, the chromosomes are sorted something like this:
Code:
chr1 chr11 [...] chr19 chr2 chr20 chr3 [...] chr9
I assume the intention of the sort is to end up with chr1 ... chr20, chrX, in numerical order? If so, how can I achieve this? I tried using various flags for field 3, and I also tried making "chr" a delimiter, but it seems delimiters must be one character long.
I've been running Cufflinks with my SAM files ordered like this, but I've no idea if it will make a difference or not.
Note: I started with Bioscope BAM files (PE, strand-specific), which were converted to SAM with SAMtools. The 'XS:A:' field was added based on strand info from field 2. A sample of my SAM files is below:
Code:
1384_723_1125 0 chr1 121 0 25M * 0 0 CATTTTCCTCTAGAGTCAGAAACGN IH8IIIIIIIIIEII77IIIIHEI! NH:i:0 RG:Z:20100828211420290 CS:Z:G3130002022232221112200133 CQ:Z:BB'2BBB?2BB?42BB776BA:/75 SM:i:0 CM:i:2 XS:A:+
200_1536_1533 73 chr1 7467 1 18H28M4H * 0 0 GTTTTTCCTAATTTGATATTTAAAAAAA //-.2.**787;033""".*)--4F>., NH:i:0 RG:Z:20100828211420290 CS:Z:T12132211201311202001000020230300122130030000002000 CQ:Z:6??<=?><;A;AA?/-%,)')%*)&%&2'1+&.&%))&%%)%07('&2-* SM:i:2 CM:i:2 XS:A:+
2234_1292_1060 129 chr1 8334 33 25M chr8 47073575 0 GAGATCCCCAAGAATCCTTACCTTT +EIII))))519IIIA5:8/%:D4& NH:i:1 RG:Z:20100828211420290 CS:Z:G0222320001022032020320200 CQ:Z:'%ABB<)=>)-%5=B=%1*/<%6/& SM:i:1 CM:i:3 XS:A:+
21_385_199 89 chr1 10073 0 3H27M20H * 0 0 AGCCCCGAAAAAAAAAATAAATATCAG 72/@I91=B?@4/03E@<II%%,/(0I NH:i:0 RG:Z:20100828211420290 CS:Z:T01223203301132231023222233100330000000002300032030 CQ:Z:&,87'*/%%%.*-((/%&+*5:0((%%684)8.&+%01/4*(28)',,)6 SM:i:3 CM:i:2 XS:A:-
Can anyone clarify this issue for me? Thanks.
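If the numeric order (chr1, chr2, ..., chr10, ..., chrX) is what's wanted, one option is the version-sort modifier `V` on the chromosome key. This avoids having to treat "chr" as a delimiter. The sketch below is minimal and makes a few assumptions. It assumes a `sort` that supports `-V`, such as GNU coreutils. The function name `sort_sam_natural` is hypothetical. Any header lines (starting with `@`) would need to be set aside first, because they would otherwise be sorted in with the alignments. Whether Cufflinks actually requires this order, rather than the manual's lexicographic one, is not settled by the sketch.

```shell
# sort_sam_natural: hypothetical helper, not part of Cufflinks or SAMtools.
# ASSUMES a sort(1) supporting the -V (version sort) key modifier, e.g. GNU coreutils.
# Fields are split on tabs, as in SAM.
# Key 3 (RNAME) is compared as a version string, so chr2 sorts before chr10.
# Key 4 (POS) is compared numerically.
sort_sam_natural() {
  LC_ALL=C sort -t "$(printf '\t')" -k 3,3V -k 4,4n "$1"
}
```

Usage, with the file names from the post: `sort_sam_natural hits.sam > hits.sam.sorted`.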
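The post doesn't give the exact rule used to derive 'XS:A:' from field 2 (FLAG). The four sample records are consistent with one simple rule: XS:A:- when FLAG bit 0x10 (read reverse-complemented) is set, and XS:A:+ otherwise. Flag 89 has that bit set; flags 0, 73 and 129 do not. Below is a minimal awk sketch of that *assumed* rule. The helper name `add_xs_from_flag` is hypothetical.

```shell
# add_xs_from_flag: hypothetical helper, not part of SAMtools or Cufflinks.
# ASSUMED rule (inferred from the sample records only):
#   XS:A:- if FLAG (column 2) has bit 0x10 set, XS:A:+ otherwise.
add_xs_from_flag() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }              # pass SAM header lines through unchanged
       {
         rev = int($2 / 16) % 2          # bit 0x10 of FLAG; POSIX awk has no bitwise ops
         print $0, "XS:A:" (rev ? "-" : "+")
       }' "$1"
}
```

This only appends a tag. It does not check whether an XS:A tag is already present, and for strand-specific paired-end data the correct rule may also depend on which mate the read is.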