I recently switched to running TopHat from the Terminal to map my FASTQ files, using the following command:
tophat --solexa1.3-quals --no-coverage-search -g 1 -G hg19.mRNA.refseq.gtf --library-type fr-firststrand -p 8 -o ./Fetal_NPC hg19 Fetal_NPC_fastqsanger.fq
I was able to convert the resulting accepted_hits.bam file to a bedGraph, upload it to UCSC, and view it without any problems.
However, when I try to convert it to a BED file using bamToBed, I get this:
head test.bed
chr1 10059 10109 ZTAN?XMCXOC 0 +
chr1 10076 10126 ZTAN? 0 +
chr1 10161 10211 ZTAN?XMCXOC 0 -
chr1 12199 12667 ZTAL
| 2 -
chr1 13586 13636 ZTAL| 2 +
chr1 13880 13930 ZTAL
| 1 +
chr1 14247 14297 ZTAN12587 2 +
chr1 14363 14413 ZTAN55 50M * 0 0 CCCCCTTCTCATTCTT
whereas I am used to seeing output like this:
head test.bed
chr1 3044397 3044447 1219.006250.80 50 -
chr1 3044430 3044480 1671.605573.80 50 +
chr1 3044451 3044501 638.709901.20 50 +
chr1 3044464 3044514 1651.302808.10 50 -
chr1 3044480 3044530 249.208139.00 50 -
chr1 3044535 3044585 2005.306540.20 50 +
chr1 3044537 3044587 1473.509936.10 50 -
chr1 3044555 3044605 621.208151.90 50 -
chr1 3044639 3044689 1473.905834.10 50 +
chr1 3060410 3060460 1338.207917.00 50 +
I tried to sort the BAM file, and it gave the following error:
sort accepted_hits.bam
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `S"\365(\006\304\024\262r\024\341Ń\026;\353\316ѯ\373\244S_D\vF\262]\323m\233\021\221z\310\331 \235smH' and `\340rnFL\370\302\214\210\177\272/\342\240)\245\021\367\341\'\335\304\375\b\306MP,\344\2066\335~\231I\361\316\202\353\fV\262\272דI,4[-G8\204qF{U\343\t\204\237\334\327>\315\035\035\020\004y^\224E\024\315\363\305BH\b$3w\214\030\267\017\341y1\246\225\353B\340u\035_WC\214\352\214\356\321/\355<G\367;\324mXIL\301Z%\226b'.
It has worked with other files that were aligned using the same command. These BAM files are between 1.3 and 1.9 GB.
The same FASTQ files aligned using TopHat through Galaxy can be converted to BED just fine.
Any suggestions?
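
One thing that may explain the `sort` error on its own: Unix `sort` compares lines of text, while a BAM file is BGZF-compressed binary data, so "Illegal byte sequence" would be expected for any BAM, not only a damaged one. Below is a minimal sketch for checking that a file is compressed binary rather than text. The helper names `magic_bytes` and `looks_compressed` are hypothetical, and the check only looks at the two leading gzip magic bytes, so it does not say whether the BAM records themselves are intact.

```shell
#!/bin/sh
# Print the first two bytes of a file as lowercase hex.
# gzip and BGZF (the container format used by BAM) both begin with 1f 8b.
magic_bytes() {
    head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

# Succeed if the file starts with the gzip magic bytes, i.e. it is
# compressed binary data and not line-oriented text.
# (Hypothetical helper: this does not validate the BAM contents.)
looks_compressed() {
    [ "$(magic_bytes "$1")" = "1f8b" ]
}
```

For example, `looks_compressed accepted_hits.bam && echo "binary file"` should print the message for a BAM. Sorting BAM records is normally done with `samtools sort` instead of `sort`; its exact arguments depend on the installed samtools version, so I have not written out a command here.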