I have some RNA-Seq data in FASTQ format that was sequenced on an Illumina Genome Analyzer. My end goal is to get an expression value for each human gene.
As a test, I downloaded the FASTA files for chromosome 22 from Ensembl and used bowtie to map the reads, with the following parameters:
./bowtie -a -v 2 --best --suppress 1,5,6,8 --quiet -p 8 chr22.fa $sequenceFile $sequenceFile.map
First, I would like to understand how to interpret the .map file that bowtie outputs. The documentation explains what each column means, but I'm getting more lines in my .map file than there are lines in my FASTQ file, which is unintuitive to me. I had assumed that each line of output would correspond to one read that had aligned to a specific region of the genome. Is that not the case? If not, what does each line in the output represent as a whole?
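For reference, here is a minimal sketch of how the two counts can be compared on a per-read basis rather than per line. It assumes an uncompressed FASTQ file with exactly four lines per record and a .map file with one alignment per non-empty line; the function names `count_fastq_reads` and `count_alignment_lines` are hypothetical helpers for illustration, not part of bowtie.

```python
import io


def count_fastq_reads(handle):
    """Count reads in a FASTQ stream, assuming four lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of four")
    return n_lines // 4


def count_alignment_lines(handle):
    """Count non-empty lines in a .map stream (assumed one alignment each)."""
    return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files (made-up data).
    fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
    mapped = io.StringIO("+\t22\t100\t0\n-\t22\t500\t0\n+\t22\t900\t0\n")

    n_reads = count_fastq_reads(fastq)
    n_alignments = count_alignment_lines(mapped)
    # More alignment lines than reads means at least one read
    # was reported at more than one location.
    print(n_reads, n_alignments, n_alignments / n_reads)
```

With real data, the in-memory streams would be replaced by `open(...)` calls on the FASTQ and .map files. Because column 1 (the read name) is suppressed in my command, the .map file alone cannot show which lines belong to the same read, so only the overall ratio is available this way.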
Any help would be greatly appreciated.