I'm trying to use the new Cufflinks 0.9.2 with a BAM file sorted by samtools. However, I get an error message complaining about the sort order, even though the order looks correct to me.
My alignments *do* come in the same order as the @SQ records in the header, as the samtools listings after the Cufflinks output show.
My guess is that Cufflinks still expects the old lexicographic order (1, 10, 11, ..., 2, ...) that you would get by sorting a SAM file with sort -k3,3 -k4,4n. But converting the BAM back to SAM and then sorting the big SAM file is a hassle, so it would be nice if the BAM file worked with Cufflinks as advertised.
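To make the guessed order concrete, here is a small sketch that prints chromosome names in the byte-wise lexicographic order that `sort -k3,3` produces. The function name `lexicographic_order` is hypothetical; `LC_ALL=C` is set so the result does not depend on the locale. In this order 10 sorts before 9, so a file in header order, where chromosome 10 follows chromosome 9, would fail a purely lexicographic check.

```shell
# lexicographic_order (hypothetical helper): print its arguments on one line,
# sorted byte-wise as `sort` does under LC_ALL=C.
lexicographic_order() {
  printf '%s\n' "$@" | LC_ALL=C sort | paste -s -d ' ' -
}

# Chromosome names from the @SQ header of tophat.7a.sorted.bam:
lexicographic_order $(seq 1 22) X Y MT
# prints: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 MT X Y
```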
This is the command and the output:
bash$ ./cufflinks -G ../../Data/Homo_sapiens.GRCh37.59.gtf -o cuff-out-tophat-7a ../../incoming/tophat.7a.sorted.bam
[23:17:47] Inspecting reads and determining fragment length distribution.
> Processing Locus 9:141150044-141150148 [******* ] 29%
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at 10:93917, last one was at 9:141152127
Cufflinks requires that if your file has SQ records in
the SAM header that they appear in the same order as the chromosomes names
in the alignments.
If there are no SQ records in the header, or if the header is missing,
the alignments must be sorted lexicographically by chromsome
name and by position.
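The rule stated in the error message (alignments must follow the order of the @SQ records) can be checked on the file itself. The following is a minimal sketch, not Cufflinks' own code: the function name `check_sq_order` is hypothetical, it reads SAM text including the header on stdin, and it assumes tab-separated fields as samtools writes them. It checks both the chromosome order against the header and the position order within each chromosome, and skips reads whose reference name is `*`.

```shell
# check_sq_order (hypothetical helper): report whether the alignments in SAM
# text on stdin follow the @SQ order of the header, then by position.
check_sq_order() {
  awk -F'\t' '
    /^@SQ/ {                              # rank each SN: name in header order
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
      next
    }
    /^@/ || $3 == "*" { next }            # other header lines, unplaced reads
    {
      if (!($3 in rank)) { msg = "reference not in header: " $3; exit 1 }
      r = rank[$3]; pos = $4 + 0
      if (r < last_r || (r == last_r && pos < last_pos)) {
        msg = "out of order: " $3 ":" pos " after " last_name ":" last_pos
        exit 1
      }
      last_r = r; last_pos = pos; last_name = $3
    }
    END {
      if (msg != "") { print msg; exit 1 }
      print "consistent with @SQ order"
    }
  '
}
```

Usage would be `samtools view -h tophat.7a.sorted.bam | check_sq_order`. If it reports the file as consistent while Cufflinks still complains, that is consistent with Cufflinks not honouring the header order, though it does not prove it.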
bash$ samtools view -H tophat.7a.sorted.bam
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
bash$ samtools view tophat.7a.sorted.bam |cut -f3 |uniq
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
MT
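If the guess above is right, a workaround until Cufflinks accepts header order is the SAM round trip, done in a single pipe so that no intermediate SAM file is needed. The sketch below is an assumption, not something confirmed to satisfy Cufflinks 0.9.2: the helper name `lex_sort_sam` is hypothetical, and it assumes `SN:` is the first tag on each @SQ line, as samtools writes it. It reorders the @SQ lines as well as the alignments, so that the file satisfies both rules quoted in the error message.

```shell
# lex_sort_sam (hypothetical helper): read SAM text with its header on stdin
# and write it back with @SQ lines and alignments in byte-wise lexicographic
# chromosome order, i.e. the order `sort -k3,3 -k4,4n` gives under LC_ALL=C.
lex_sort_sam() {
  local tab tmp
  tab=$(printf '\t')
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  # header lines other than @SQ (e.g. @HD, @PG) keep their original order
  grep '^@' "$tmp" | grep -v '^@SQ'
  # @SQ lines sorted by their SN: field (assumed to be field 2)
  grep '^@SQ' "$tmp" | LC_ALL=C sort -t "$tab" -k2,2
  # placed alignments: by reference name, then numerically by position
  grep -v '^@' "$tmp" | awk -F'\t' '$3 != "*"' | LC_ALL=C sort -t "$tab" -k3,3 -k4,4n
  # reads without a reference name ("*") go last
  grep -v '^@' "$tmp" | awk -F'\t' '$3 == "*"'
  rm -f "$tmp"
}
```

Usage with samtools 0.1.x syntax (`-S` marks SAM input): `samtools view -h tophat.7a.sorted.bam | lex_sort_sam | samtools view -bS - > tophat.7a.lexsorted.bam`. This still sorts the whole text stream, so it saves disk space rather than time.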