Hi,
I am fairly new to RNA-Seq analysis and hoped someone could help me diagnose the reason why I have such a low proportion of reads mapped to features with my current method.
The experiment:
The methods:
As you can see I have only a very small proportion of reads mapped to a feature. I don't really know what is normal but this seems very small. I am certain that it is the correct genes.gtf annotation file as it was provided gzipped with the genome itself. What kind of % of reads mapped on features should I be expecting? What can I do to diagnose this problem? Can anyone spot anything wrong that I am doing? Thanks.
I also have a second question, I am able to do limited down-stream analysis using DESeq using the currently read counts. I get very few (only two) genes with differential expression. These two genes are directly adjacent to each other on the genome but not overlapping, these are Uckl1 and Znf512b (http://genome-euro.ucsc.edu/cgi-bin/...gsid=194787325). There are 130 bp between them including the UTRs. I can't imagine how this may be happening if I am using the --union option in HTSeq-count as any reads that overlap both should be labeled as ambiguous features.
Thanks for any help.
I am fairly new to RNA-Seq analysis and hoped someone could help me diagnose the reason why I have such a low proportion of reads mapped to features with my current method.
The experiment:
Code:
- Were looking at the retina in developing mice - I have two conditions (wild-type vs mutant), each with 4 biological replicates (8 in total). - RNA prepared using Illumina TruSeq stranded kit. Ribo-Zero Deplete techneque. - Using a pair-end 2 x 100bp HiSeq reads. Approximately 40 million pairs per replicate.
Code:
- Trimmed low quality regions/reads using Trimmomatic. - Using TopHat2 to map reads to UCSC mm10 reference genome available on the tophat website. - --no-novel-juncs and larger --mate-inner-dist arguments - Example numbers from one replicate: - 92.8% overall mapping rate - 30,244,035 aligned pairs; of these: - 9.4% have multiple alignments - 1.9% are discordant alignments - 87.6% concordant pair alignment rate - These numbers seem pretty good to me. - Sorted the acccepted_hits.bam file by read name using 'samtools sort -n' - Converted bam to sam using 'samtools view' - Extracting feature read counts using HTSeq-count and the gtf annotation file also from UCSC mm10. - Using default arguments (method=union and stranded=yes) - Example numbers: - successfull feature = 505,622 - no feature = 29,039,281 - ambiguous feature = 2525 - non unique feature = 10,519,556 - Success rate = 1.26%
I also have a second question, I am able to do limited down-stream analysis using DESeq using the currently read counts. I get very few (only two) genes with differential expression. These two genes are directly adjacent to each other on the genome but not overlapping, these are Uckl1 and Znf512b (http://genome-euro.ucsc.edu/cgi-bin/...gsid=194787325). There are 130 bp between them including the UTRs. I can't imagine how this may be happening if I am using the --union option in HTSeq-count as any reads that overlap both should be labeled as ambiguous features.
Thanks for any help.
Comment