Hi!
I started working with RNA-seq data on an in-house Galaxy installation, but they did not have tophat installed, which I wanted to give a try. So I installed tophat2 on my own computer, together with a hg19 Bowtie2 index that I found on the Illumina ftp server (iGenomes).
One significant difference that I noticed was that the Illumina hg19 genome does not contain the chrUn or hap sequences, which are present in the Galaxy installation. However, the associated genes.gtf does contain annotations for these "chromosomes".
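To make the mismatch concrete, here is a minimal sketch of how the sequence names in the annotation could be compared against those in the genome. It assumes a plain-text FASTA and a tab-separated GTF; the function names and the file names in the usage note are hypothetical, not part of tophat2 or iGenomes.

```python
import sys


def fasta_seq_names(lines):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seq_names(lines):
    """Collect the first column (seqname) of every non-comment, non-empty GTF line."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_lines, gtf_lines):
    """Return the sorted sequence names that are annotated in the GTF but absent from the FASTA."""
    return sorted(gtf_seq_names(gtf_lines) - fasta_seq_names(fasta_lines))


# Hypothetical command-line use: python check_names.py genome.fa genes.gtf
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gtf:
        for name in missing_from_genome(fasta, gtf):
            print(name)
```

Run on the genome FASTA and genes.gtf (for example `python check_names.py genome.fa genes.gtf`, both names being placeholders), this would list the chrUn and hap entries that are annotated but have no sequence in the index.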
Here comes my question: does it matter to tophat2/bowtie2 whether I have the chrUn etc. sequences present or not?
I would expect my fastq data to align less completely if parts of the genome are missing, and indeed I find ~75% of reads aligning to the genome, while a related STAR run on the Galaxy-installed genome aligns almost 90% of the reads. I could imagine that the difference is junk that I am not interested in anyway, but I am not sure.
What I would like to do afterwards is an expression analysis with cufflinks/cuffdiff.
Happy to hear your feedback.
abisko00