Hi all,
I am trying to follow the simple recipe for differential analysis in RNA-seq data without gene and transcript discovery.
I ran TopHat (v1.0.12) using the default parameters and then cuffdiff (v1.3.0), with the standard mm9 files and the NCBIM37 annotation (with a "chr" prefix added to the contig names; this work is done in mouse).
I am getting an error from cuffdiff saying that the SAM files are not sorted. However, looking at the files, I see that they are sorted lexicographically (as expected, since they are produced by TopHat): chr1, chr10, chr11, ..., chr2, chr3, ..., chrM, chrX, chrY.
I tried adding a header to the SAM files, but it did not help.
Here are the technical details:
TopHat command:
tophat -r 150 -G $REF_FILE -o $OUT mm9 $IN_1 $IN_2
Cuffdiff command:
cuffdiff -o $OUT Mus_musculus.NCBIM37.61.chr.gtf $SAMPLE_1/accepted_hits.sam $SAMPLE_2/accepted_hits.sam
Error from cuffdiff:
========
You are using Cufflinks v1.3.0, which is the most recent release.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File 3//accepted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File 5//accepted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[16:44:37] Loading reference annotation.
[16:44:44] Inspecting maps and determining fragment length distributions.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at chrX:3030137, last one was at chrM:16220
Cufflinks requires that if your file has SQ records in
the SAM header that they appear in the same order as the chromosomes names
in the alignments.
If there are no SQ records in the header, or if the header is missing,
the alignments must be sorted lexicographically by chromsome
name and by position.
===============
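To double-check the ordering rule quoted in the message above, a small script along these lines could be run over each SAM file. This is only a sketch: the function name find_sort_violation is made up for illustration, it assumes a tab-delimited SAM file, and it follows my reading of the message (use the @SQ order when the header has SQ records, otherwise compare chromosome names as plain strings), not the actual Cufflinks code.

```python
def find_sort_violation(lines):
    """Return ((prev_ref, prev_pos), (ref, pos)) for the first out-of-order
    pair of alignments, or None if the order looks consistent.

    `lines` is any iterable of SAM text lines (for example an open file).
    """
    sq_rank = {}      # reference name -> index of its @SQ line, if any
    prev_key = None   # sort key of the previous alignment
    prev_hit = None   # (ref, pos) of the previous alignment
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            fields = line.split("\t")
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        sq_rank.setdefault(field[3:], len(sq_rank))
            continue
        cols = line.split("\t")
        ref, pos = cols[2], int(cols[3])  # RNAME and POS columns
        if ref == "*":
            continue  # unmapped read, no position to compare
        if sq_rank:
            # Header present: rank by @SQ order (unknown names go last).
            key = (sq_rank.get(ref, len(sq_rank)), pos)
        else:
            # No @SQ records: plain string order, so "chr10" < "chr2".
            key = (ref, pos)
        if prev_key is not None and key < prev_key:
            return prev_hit, (ref, pos)
        prev_key, prev_hit = key, (ref, pos)
    return None
```

Passing an open file handle for accepted_hits.sam to find_sort_violation would return None if the file is ordered as described, or the first offending pair of hits otherwise.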
I then edited the SAM files and added this header:
@HD VN:1.0 SO:coordinate
@SQ SN:chr1 LN:197195432
@SQ SN:chr10 LN:129993255
@SQ SN:chr11 LN:121843856
@SQ SN:chr12 LN:121257530
@SQ SN:chr13 LN:120284312
@SQ SN:chr14 LN:125194864
@SQ SN:chr15 LN:103494974
@SQ SN:chr16 LN:98319150
@SQ SN:chr17 LN:95272651
@SQ SN:chr18 LN:90772031
@SQ SN:chr19 LN:61342430
@SQ SN:chr2 LN:181748087
@SQ SN:chr3 LN:159599783
@SQ SN:chr4 LN:155630120
@SQ SN:chr5 LN:152537259
@SQ SN:chr6 LN:149517037
@SQ SN:chr7 LN:152524553
@SQ SN:chr8 LN:131738871
@SQ SN:chr9 LN:124076172
@SQ SN:chrM LN:16299
@SQ SN:chrX LN:166650296
@SQ SN:chrY LN:15902555
@PG TopHat VN:1.0.130
Output with headers:
=============
You are using Cufflinks v1.3.0, which is the most recent release.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File 3//accepted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File 5//accepted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[20:30:45] Loading reference annotation.
[20:30:54] Inspecting maps and determining fragment length distributions.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at chr10:3001890, last one was at chr1:197184857
Cufflinks requires that if your file has SQ records in
the SAM header that they appear in the same order as the chromosomes names
in the alignments.
If there are no SQ records in the header, or if the header is missing,
the alignments must be sorted lexicographically by chromsome
name and by position.
==============
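One thing I could still try is re-sorting the alignments myself so that they follow the rule in the message (lexicographic by chromosome name, then by position). Below is a rough sketch of what I have in mind. The function name sort_sam_lexicographically is hypothetical, it holds the whole file in memory, it assumes tab-delimited lines, and it leaves header lines untouched on the assumption that any @SQ records are already in lexicographic order (as in the header I added). I do not know whether this is what cuffdiff actually wants.

```python
def sort_sam_lexicographically(lines):
    """Return the SAM lines with header lines first (unchanged) and the
    alignments sorted by chromosome name (string order), then position."""
    header, alignments = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            header.append(line)
        else:
            alignments.append(line)

    def sort_key(line):
        cols = line.split("\t")
        ref, pos = cols[2], int(cols[3])  # RNAME and POS columns
        # Unmapped reads ("*") are placed after all mapped reads.
        return (ref == "*", ref, pos)

    # sorted() is stable, so ties keep their original relative order.
    return header + sorted(alignments, key=sort_key)
```

The returned list could then be written to a new file with writelines() and passed to cuffdiff in place of the original accepted_hits.sam.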
Would greatly appreciate your help!