Hi all,
I am running into trouble with the microarray data I am trying to analyze. Let me try to explain this clearly. I have two sets of genes for which I am trying to find differences in histone modifications. For most of the histone modifications I was able to get fairly complete .bam files, and making the metaplots was easy-peasy (attach3).
For some of the older published data, such as microarray data, the process is less straightforward. I converted the microarray data to .bed files and then to .bam files in order to analyze them with the R metagene package. For most .bam files this package works great. However, because the microarray data provide only an intensity per probe and no read counts, the metagene package does not give nice output.
Here is an example of the .bedGraph file used to make the .bam file (added a mock ID):
chromosome; start; end; normalized intensity
Chr1 25 50 0.005
Chr1 60 85 0.001
Chr1 113 138 0.001
Chr1 154 179 0.359
Chr1 185 210 0.001
Chr1 219 244 0.004
Chr1 254 279 4.599
Chr1 287 312 3.908
And this is what the .bam files look like:
id-1 0 Chr1 26 255 25M * 0 0 * *
id-2 0 Chr1 61 255 25M * 0 0 * *
id-3 0 Chr1 114 255 25M * 0 0 * *
id-4 0 Chr1 155 255 25M * 0 0 * *
I added an attachment showing the output from the metagene package. When I view the .bedGraph files in IGV, for example, the curves for the histone marks look really nice compared to the .bam files generated from the .bedGraphs. The .bam records carry only the probe position and not the intensity, which is most likely why the metagene package is not able to plot my data well.
I tried plotting the .bedGraph files with another package called metagene-maker, but this program gives me "IndexError: list index out of range". I think this error occurs because most of the probes fall outside the gene regions in my designated .bed files. It would take quite some effort to fix this manually, and that is probably not the way to go. I was also thinking about giving the .bam file a mock read count and using the intensity as the mapping quality, but this is most likely not a great idea from a bioinformatics point of view and could cause problems upon publication.
I am just wondering what the way forward would be from a bioinformatician's point of view, as the other, complete .bam files give beautiful output.
So, to sum up: I have .bed, .bam and .bedGraph files from microarray data, containing the location and intensity of predesigned probes mapped to the genome. I want to know the best way to make metaplots of these data over self-defined regions (in .bed format).
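To make the goal concrete, here is a minimal sketch of the kind of calculation I am after, written in plain Python so it does not depend on any package. It is only an illustration under my own assumptions, not a recommended method: the function names `parse_bedgraph` and `metaplot` are made up, the region `("Chr1", 0, 320)` is a hypothetical example, each probe is assigned to the bin containing its midpoint, and strand is ignored. The probe lines are the ones shown above.

```python
from collections import defaultdict


def parse_bedgraph(lines):
    """Parse bedGraph lines into {chrom: [(start, end, intensity), ...]}."""
    probes = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and track/browser header lines.
        if len(fields) < 4 or fields[0].startswith(("#", "track", "browser")):
            continue
        probes[fields[0]].append((int(fields[1]), int(fields[2]), float(fields[3])))
    return probes


def metaplot(probes, regions, n_bins=10):
    """Mean probe intensity per relative bin, averaged over all regions.

    regions: list of (chrom, start, end) tuples, as in a 3-column .bed file.
    A probe contributes to the bin that contains its midpoint.
    Bins that no probe falls into are returned as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for chrom, r_start, r_end in regions:
        length = r_end - r_start
        if length <= 0:
            continue
        for p_start, p_end, value in probes.get(chrom, []):
            mid = (p_start + p_end) / 2
            if r_start <= mid < r_end:
                # Relative position in the region, scaled to a bin index.
                b = min(int((mid - r_start) * n_bins / length), n_bins - 1)
                sums[b] += value
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


bedgraph_lines = [
    "Chr1 25 50 0.005",
    "Chr1 60 85 0.001",
    "Chr1 113 138 0.001",
    "Chr1 154 179 0.359",
    "Chr1 185 210 0.001",
    "Chr1 219 244 0.004",
    "Chr1 254 279 4.599",
    "Chr1 287 312 3.908",
]
probes = parse_bedgraph(bedgraph_lines)
regions = [("Chr1", 0, 320)]  # hypothetical region, for illustration only
profile = metaplot(probes, regions, n_bins=4)
print(profile)
```

Unlike the .bam conversion, this keeps the intensity value of every probe, and probes outside the regions are simply ignored rather than causing an error.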
Help would be greatly appreciated!
R