Hello,
I am working on an experiment in which I have narrowed the data of interest down to a fairly small number of genes, about 200. I want to use many of the existing analysis tools, Picard for instance, but they often require SAM/BAM files as input. Since I need to compare the Picard output with other data on these specific genes, I want to make a pseudo-SAM file that contains only the concatenated records from the original SAM file for the relevant locations. For example, say one gene of interest spans positions 100 to 1000 and the next spans positions 5000 to 5500. Is there a relatively simple way to grab 100-1000, 5000-5500, etc. and make a SAM/BAM file from only those regions of interest, so that I can use it (together with a corresponding pseudo-reference genome) for my analysis? Thanks very much for any advice!
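To make the request concrete, here is a rough sketch of the kind of filtering I have in mind, written in plain Python against the text SAM format. It is only an illustration under assumptions: the reference name `chr1`, the `REGIONS` list, and the function names are made up for the example, reads keep their original coordinates (nothing is shifted onto a pseudo-reference), and the header is passed through unchanged.

```python
import re

# Hypothetical regions of interest: (reference name, 1-based start, 1-based inclusive end).
REGIONS = [("chr1", 100, 1000), ("chr1", 5000, 5500)]

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_end(pos, cigar):
    """Return the 1-based inclusive end of an alignment that starts at pos."""
    if cigar == "*":
        return pos
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos + max(ref_len, 1) - 1


def filter_sam_lines(lines, regions):
    """Yield header lines plus every alignment overlapping any region."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are kept as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if pos == 0:
            continue  # unmapped read
        end = alignment_end(pos, cigar)
        if any(rname == chrom and pos <= stop and end >= start
               for chrom, start, stop in regions):
            yield line
```

Usage would be something like `out.writelines(filter_sam_lines(open("in.sam"), REGIONS))`, where `in.sam` is a placeholder file name. A read is kept if any part of it overlaps a region, so reads hanging over a gene boundary are included.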