I am trying to use mapBed but I am running into a problem, which the example below illustrates. I am using this test file:
>cat test.bed
chr1 100000 109000
chr2 100000 109000
chr3 100000 109000
chr4 100000 109000
chr5 100000 109000
chr6 100000 109000
chr7 100000 109000
chr8 100000 109000
chr9 100000 109000
chr10 100000 109000
chr11 100000 109000
chr12 100000 109000
chr13 100000 109000
chr14 100000 109000
chr15 100000 109000
chr16 100000 109000
chr17 100000 109000
chr18 100000 109000
chr19 100000 109000
chr20 100000 109000
chr21 100000 109000
chr22 100000 109000
I am querying that against a bedgraph file that I downloaded from the UCSC Genome Browser and then sorted. I can't show the whole file, but I can show you how it is sorted.
>cat wgEncodeSydhHistoneK562bH3k4me3bUcdSig.bedgraph.noM | awk '{print $1}' | uniq
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
When I run the command...
>mapBed -c 3 -o sum -a test.bed -b wgEncodeSydhHistoneK562H3k9acbUcdSig.bedgraph.noM
I get...
chr1 100000 109000 1584710
chr2 100000 109000 6965660
chr3 100000 109000 4067179
chr4 100000 109000 3997623
chr5 100000 109000 22672757
chr6 100000 109000 100786
chr7 100000 109000 13187590
chr8 100000 109000 .
chr9 100000 109000 5536045
chr10 100000 109000 .
chr11 100000 109000 .
chr12 100000 109000 .
chr13 100000 109000 .
chr14 100000 109000 .
chr15 100000 109000 .
chr16 100000 109000 .
chr17 100000 109000 .
chr18 100000 109000 .
chr19 100000 109000 .
chr20 100000 109000 .
chr21 100000 109000 .
chr22 100000 109000 .
It is fine that I don't get anything for chr8; I checked that one and there is no coverage in that region. But for the later chromosomes (chr10 onward) I should be seeing a signal. Can anyone explain why this is happening or what I can do to fix it? The mapBed help says...
"Notes:
(1) Both input files must be sorted by chrom, then start."
Are my files sorted wrong? Thanks so much.
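For reference, here is a minimal sketch of how the chromosome order could be checked, assuming that "sorted by chrom" means the byte-wise order a plain `sort -k1,1` produces (that is my assumption, not something the note states). The helper name `check_chrom_order` is made up for illustration.

```shell
# Hypothetical helper: reads a BED/bedgraph file on stdin and reports whether
# the chromosome column is in the byte-wise order that sort(1) would produce.
check_chrom_order() {
    # sort -c only checks the order; it exits non-zero at the first out-of-order line
    if awk '{print $1}' | uniq | LC_ALL=C sort -c 2>/dev/null; then
        echo "sorted"
    else
        echo "unsorted"
    fi
}
```

It would be run as `check_chrom_order < test.bed`. Under this assumption, an order like chr1, chr2, ..., chr9, chr10 is reported as unsorted, because chr10 compares lower than chr2 as a string.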