Unconfigured Ad

**mgogol** · 11-17-2010, 07:52 AM

The reads that would have aligned there could have been thrown out because they mapped too many places in the genome, is that possible?

You could test this by cutting up the gene and trying to align it to the genome. Tophat by default will throw out reads that map > 40 places.

If you think it's expressed, you could test it with qPCR too.

**malachig** · 11-17-2010, 12:53 PM

I agree with mgogol. One possible explanation is mapping, although with 50mers you would think some part of the gene would be mappable. How big is this gene? How many exons?

Try a BLAT of some of the exons of this gene against the genome (do they hit everywhere?). You could also try mapping reads directly against a database of transcripts to see whether there are any matching reads (this would again suggest that the reason they are not coming through in your Tophat analysis is that they are ambiguous in the genome...). It would also allow you to identify a subset of reads to try different mapping approaches on and maybe eliminate the possibility of some technical issue(s) with your tophat run...

**bioinfosm** · 11-17-2010, 02:22 PM

Another possibility could be small exons, and then junctions come into play. I am not sure is tophat/cufflinks take a de novo approach that overcomes this?

**mnkyboy** · 11-17-2010, 02:32 PM

We have seen trimming down to 25 with bowtie and tophat can help with this.

**RockChalkJayhawk** · 11-19-2010, 09:47 AM

One easy way to check if it is a mappability issue it to go to UCSC for hg19,
click the Mapability link under "Mapping and Sequence Tracks"
select the 50bp option,
look at your gene.

Otherwise, maybe you just found some biology in your experiment.

**malachig** · 11-19-2010, 11:41 AM

Tophat/cufflinks does try to predict novel exons and junctions. I'm not sure how well it handles small exons... Mapping small reads to the genome and then trying to find small exons is a challenging problem. A read that overlaps a small exon may require an alignment into three or more short blocks with potentially large gaps between (e.g. <exon>-intron-<exon>-intron-<exon>). Doing this with full length cDNAs (never mind short reads) can be a difficult. I suspect that most methods in use right now have fairly low sensitivity for short exons. One strategy to at least capture the short exons from known transcripts is to integrate alignment to transcripts and the genome (as advocated in Griffith et al.). This way, reads that hit short exons can be aligned to a transcript sequence without gaps which is much easier. This of course does not work for novel transcripts and relies on the accuracy and completion of transcript annotations for the species being analyzed.

**bioinfosm** · 11-19-2010, 12:12 PM

Exactly, and I think doing alignments to genome and transcripts raises an important question of which one to get priority. There would certainly be reads well aligned to both, then I suppose transcript alignment would get preference.

Similar issue comes when using a separate reads junction database or contamination database.. given equal alignments of a read to multiple datasets, which one should get preference

**malachig** · 11-19-2010, 12:17 PM

Regarding the comment that the observation may simply be due to biology. Good point! Once you convince yourself that the observation is not due to an artifact of the analysis (mapability, small exons, etc.) you should definitely consider this.

One of the benefits of RNA-seq over microarrays (IMHO) is the excellent signal-to-noise ratio. I have observed many cases where a gene appears to be turned off in one condition and on in another condition (tissue type, treatment, etc.). Even in very deep RNA-seq libraries, i have been amazed to see just how few reads are reported for the 'off' state. And since in this case we have RNA-seq libraries (analyzed by the same method) for alternate conditions that do get covered by reads we can reassure ourselves that the lack of coverage is not due to some artifact of the analysis.

For example, consider this data set (ALEXA-seq by Griffith et al) consisting of four cell types (normal luminal breast, normal myepithelial breast, hESCs, and vHMECs). These libraries have ~150 million paired-end 75-mers each. The quality of these libraries was very high. After performing differential expression analysis we can find many examples where a particular gene has 0 reads (or just a few) in one of these conditions, and thousands in one or more of the others. And we can find 'off' genes for any of the four cell types. For example, CCL2 has 0 reads in the vHMECs library and 124,664 in the myoepithelial breast library. Similarly, COL17A1 has 776,907 reads in the vHMECs library but only 142 in the luminal epithelial breast tissue. You can explore further examples in this list of DE genes.

**bioinfosm** · 11-19-2010, 12:24 PM

Yes. A paired or comparison analysis really reduces a lot of sequencing and mapping biases, so differential expression comparisons are more or less accurate.

I really wish to give this trans-Abyss tool a try, seems like a one-stop solution to rna-seq analysis, but has a huge list of pre-reqs and config files to figure out

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 20 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 38 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 44 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Expression in RNA-seq

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News