Hi guys,
I am a beginner in the analysis of RNA-Seq experiment data.
I am currently trying to align 36-bp reads from an RNA-Seq experiment run on an Illumina GAIIx.
As a first analysis, I am aligning the reads to the genome with Bowtie to see how many reads map outside the coding regions of genes (in theory, I would expect almost none).
For many samples I can map more than 70% of the reads, but 60-70% of these map outside annotated gene regions.
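To make the "outside annotated genes" figure concrete, here is a minimal Python sketch of how such a fraction could be computed. It assumes the Bowtie hits have already been parsed into hypothetical `(chrom, start, end)` tuples in 0-based, half-open coordinates, and that the gene annotation (e.g. from a GTF file) has been reduced to intervals of the same form. The function names are illustrative and are not part of Bowtie. Keep in mind that Bowtie does not do spliced alignment, so 36-bp reads spanning exon-exon junctions will generally fail to align to the genome and never enter this count.

```python
from bisect import bisect_right
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) gene intervals per chromosome.

    Returns {chrom: (sorted_starts, sorted_ends)} for disjoint intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:          # overlapping or touching: extend
                out[-1][1] = max(out[-1][1], end)
            else:                            # gap: start a new interval
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged

def overlaps_gene(merged, chrom, start, end):
    """True if the read [start, end) overlaps any merged gene interval."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # Last gene interval starting before the read ends; because the merged
    # intervals are disjoint and sorted, it is the only candidate to check.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start

def fraction_outside_genes(reads, genes):
    """Fraction of aligned reads that overlap no annotated gene interval."""
    merged = merge_intervals(genes)
    if not reads:
        return 0.0
    outside = sum(1 for chrom, s, e in reads if not overlaps_gene(merged, chrom, s, e))
    return outside / len(reads)

if __name__ == "__main__":
    genes = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
    reads = [("chr1", 90, 126), ("chr1", 300, 336), ("chr2", 10, 46), ("chr3", 0, 36)]
    print(fraction_outside_genes(reads, genes))  # 0.5
```

Whether the intervals are whole gene bodies or only exons changes the result considerably: reads from introns (e.g. unspliced pre-mRNA) count as "inside" only in the former case.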
Among the reads that map within genes, many (60-70%) are potential "PCR duplicates".
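For single-end reads, "potential PCR duplicates" are usually taken to be reads that share the same chromosome, strand, and 5' position, since there is no mate to tell independent fragments apart. The sketch below assumes hypothetical `(chrom, start, end, strand)` tuples in 0-based, half-open coordinates and counts every copy beyond the first at a given 5' position as a duplicate; it is an illustration of that criterion, not the exact logic of any particular duplicate-marking tool.

```python
from collections import Counter

def five_prime_key(chrom, start, end, strand):
    """Key identifying a read's 5' end; on the minus strand the 5' end is end - 1."""
    return (chrom, strand, start if strand == "+" else end - 1)

def duplicate_fraction(reads):
    """Fraction of reads sharing a 5' key with another read (all copies after the first)."""
    if not reads:
        return 0.0
    counts = Counter(five_prime_key(*r) for r in reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)

if __name__ == "__main__":
    reads = [
        ("chr1", 100, 136, "+"),
        ("chr1", 100, 136, "+"),   # same 5' end as the read above
        ("chr1", 100, 136, "-"),   # 5' end at 135 on the minus strand
        ("chr1", 99, 136, "-"),    # also 5' end at 135 on the minus strand
        ("chr1", 200, 236, "+"),
    ]
    print(duplicate_fraction(reads))  # 0.4
```

With only one read end and 36-bp reads, highly expressed transcripts produce many identical 5' positions by chance, so this figure is an upper bound on true PCR duplication rather than a direct measure of it.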
Evidently something went seriously wrong during library construction, and I do not think these libraries are usable for expression quantification.
So, these are my questions:
1) What is an acceptable percentage of reads (from a good library) that map to the genome outside regions annotated as genes?
2) In your experience, what is an acceptable percentage of "PCR duplicates" for a 36-bp single-end (SE) library? (I saw on the forum that this is a much-debated topic...)
3) If a read can be mapped to "n" places in the genome, above which value of "n" is it advisable to discard the read? (In practice, I am referring to Bowtie's -m/-k options.)
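One way to pick a cut-off for "n" is to look at how many reads each threshold would keep. In Bowtie 1, `-k <int>` reports up to that many valid alignments per read, `-m <int>` suppresses all alignments for a read that has more than that many reportable alignments, and `-a` reports all valid alignments. The sketch below assumes you have already built a hypothetical per-read alignment count (for example by running with `-a` and counting output lines per read name) and mimics the effect of an `-m` filter at several thresholds.

```python
from collections import Counter

def retained_by_max_hits(hit_counts, thresholds):
    """For each threshold m, count reads with 1 <= hits <= m.

    hit_counts: {read_id: number_of_alignments} (hypothetical input format).
    This mirrors the idea of Bowtie's -m filter, which drops reads with
    more than m reportable alignments.
    """
    dist = Counter(hit_counts.values())   # alignments-per-read histogram
    result = {}
    for m in thresholds:
        result[m] = sum(n for hits, n in dist.items() if 1 <= hits <= m)
    return result

if __name__ == "__main__":
    hits = {"r1": 1, "r2": 1, "r3": 2, "r4": 5}
    print(retained_by_max_hits(hits, [1, 2, 10]))  # {1: 2, 2: 3, 10: 4}
```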
Thank you for your support!
Francesco.