  • ETHANol
    replied
    Yes, "mappability" was the word I was looking for and not "mapping efficiency". Oops.

    Anyway, it appears to be a more complex issue than I have the time or skills to take on. But it appears some more computationally oriented people are working on it. Until then, cufflinks or just dividing by transcript length should be good enough for my purposes. Thanks, everyone, for the insight!


  • sdriscoll
    replied
    Cufflinks is fine, but keep in mind it's also giving you a more "processed" result than a simple read counter like htseq-count. If you want to use it, I recommend using the -b option (providing your genome's FASTA source), because without it I've seen cufflinks give some very odd expression levels to genes that are not justified by the actual reads aligned to those genes. The -b option seems to fix the over-estimates. There are still under-estimates, but at least those seem to be justified in some way. For example, if a gene has coverage at only 80% of its exons and I count the reads aligning to that gene and compute its RPKM manually, I get a higher value than what cufflinks produces, while 90+ percent of the rest of the genes have roughly equal expression between my own calculation and theirs. So cufflinks seems to be counting the fact that the gene doesn't have balanced and complete coverage against its FPKM value.
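For anyone wanting to reproduce the manual comparison described above, here is a minimal sketch of the standard RPKM formula (reads per kilobase of transcript per million mapped reads). The function name and the example numbers are made up for illustration; the counts themselves would come from a read counter such as htseq-count.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 1e9 * C / (N * L), where C is the number of reads assigned to
    the gene, N the total number of mapped reads in the library, and L
    the (exonic) gene length in base pairs.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2 kb gene in a 20 M read library.
example_value = rpkm(read_count=500, gene_length_bp=2000,
                     total_mapped_reads=20_000_000)
print(example_value)  # 12.5
```

This is only the plain length-and-depth normalization; it does none of the bias modelling that cufflinks applies, which is one reason the two values can differ for unevenly covered genes.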


  • lh3
    replied
    Probably I misunderstood "mapping efficiency" (I took it as sort of sensitivity). Anyway, I was talking about a global effect. For the vast majority of genes, changing mappers/settings would not lead to a big effect. Nonetheless, if you look at a particular gene having multiple paralogs, the mapping algorithm and the way to compute FPKM may matter a lot. I know a few groups still prefer their in-house pipelines so that they can fully understand and fix potential artifacts.


  • kopi-o
    replied
    I have had the exact same experience with Nanog in RNA-seq!

    I do think "mapping efficiency" (which is often referred to as "mappability") matters in RNA-seq; I have read a manuscript (not published yet) which argued pretty convincingly that it should be corrected for (and showed a nice way to do it). Methods like NEUMA and some others attempt to do this. The manuscript I mentioned showed that Cufflinks does have a certain systematic bias due to mappability effects.


  • alexdobin
    replied
    Hi Ethan,

    I do not think it is possible to calculate mapping efficiency for RNA-seq data, since reads are spliced and can span hundreds of kilobases. In principle, we could do that just for the transcriptome, but then, of course, we would be blind to anything except annotations.

    Alignments do have a big effect on the transcript assembly. We actually looked at precisely the Nanog locus in ENCODE H1ES data. The attached figures show the Cufflinks assembly with Tophat or STAR alignment. In this case, Tophat misses one of the junctions because it maps the reads contiguously, with mismatches, to a pseudogene, so Cufflinks cannot assemble the full-length transcript. However, there are still reads mapping to this locus, so it will return a non-zero FPKM. STAR recovers this junction and allows Cufflinks to reconstruct the whole transcript. Note that these are pretty old results, from Fall 2010, and Tophat may have improved since then.

    In any case, it is probably prudent to try a few different aligners for problematic genes.

  • ETHANol
    replied
    I don't work with Cufflinks either, but it seems like a reasonable tool to compute FPKM.

    This is the example that concerned me. Nanog has several pseudogenes. If you throw away reads that map to more than one location, I was told that nothing maps to Nanog. Thus, even though Nanog transcription is activated during the transformation from differentiated cells to iPS cells, you do not see it. If this is true, which I was told it is (I've never looked myself), then in this case a gene that is highly expressed appears to be undetectable.

    Still, the bottom line is I don't really know, but I would like to hear others' opinions.
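To make the Nanog scenario concrete, here is a toy sketch of what "throw away reads that map to more than one location" does to the counts. Everything in it is hypothetical: the read IDs, the locus labels, and the simplification that an alignment is just a (read, locus) pair rather than a record parsed from a real BAM file.

```python
from collections import Counter, defaultdict


def count_unique_only(alignments):
    """Count reads per locus, keeping only reads that hit a single locus.

    `alignments` is an iterable of (read_id, locus) pairs; a read that
    appears with more than one distinct locus is treated as a
    multi-mapper and discarded entirely.
    """
    loci_per_read = defaultdict(set)
    for read_id, locus in alignments:
        loci_per_read[read_id].add(locus)

    counts = Counter()
    for loci in loci_per_read.values():
        if len(loci) == 1:
            counts[next(iter(loci))] += 1
    return counts


# Toy data: every read from the gene also aligns to a pseudogene copy.
toy_alignments = [
    ("r1", "Nanog"), ("r1", "Nanog_pseudogene"),
    ("r2", "Nanog"), ("r2", "Nanog_pseudogene"),
    ("r3", "Actb"),
]
toy_counts = count_unique_only(toy_alignments)
print(toy_counts["Nanog"], toy_counts["Actb"])  # 0 1
```

In this toy case the gene gets a count of zero no matter how many reads it actually produced, which is the situation described above. Whether real data behave this way depends on read length, the aligner, and how divergent the pseudogenes are.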


  • lh3
    replied
    I rarely work with RNA-seq data and I do not use cufflinks, but I am not sure how much mapping efficiency matters. The difference between mapping algorithms/settings is mostly caused by difference in sensitivity. After normalization, FPKM should largely stay the same except a few regions with high diversity.

    I do not think a mask is useful in general, either, unless you are comparing data of very different read lengths or using a mapper without a proper mapping quality. This is at least true for variant calling.


  • ETHANol
    started a topic FPKM and mapping efficiency

    FPKM and mapping efficiency

    Cufflinks addresses some biases in the calculation. I don't know enough about it to say much, but it looks like perhaps the most advanced user-friendly method of FPKM calculation at this time.

    My concern is that it doesn't address mapping efficiency. Thus, the parameters and software you use for read mapping could have a large effect on the calculated FPKM values. Has anyone addressed this?

    It seems like you could figure out read mapping efficiency with single-end reads by generating a file of every possible read in the genome, mapping that back to the genome, and then dividing the number of reads that map to each gene by the gene length. Maybe this is a little too simplistic.

    Does anyone have any thoughts on this?
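The "every possible read" idea can be sketched on a toy sequence as follows. This is only an illustration under strong assumptions: reads are error-free, single-end, forward strand only, unspliced, and "mapping" is replaced by exact k-mer matching rather than a real aligner. The function names and the toy genome are made up.

```python
from collections import Counter


def kmer_counts(genome, k):
    """Count how often each length-k substring occurs in the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def mappable_fraction(genome, start, end, k, counts=None):
    """Fraction of length-k reads lying fully inside [start, end) whose
    sequence occurs exactly once in the whole genome."""
    if counts is None:
        counts = kmer_counts(genome, k)
    read_starts = range(start, min(end, len(genome)) - k + 1)
    if len(read_starts) == 0:
        return 0.0
    unique = sum(1 for i in read_starts if counts[genome[i:i + k]] == 1)
    return unique / len(read_starts)


# Toy genome in which "ACGT" occurs twice (positions 0 and 6).
toy_genome = "ACGTGGACGT"
toy_counts_k4 = kmer_counts(toy_genome, 4)
print(mappable_fraction(toy_genome, 0, 5, 4, toy_counts_k4))   # 0.5
print(mappable_fraction(toy_genome, 2, 8, 4, toy_counts_k4))   # 1.0
print(mappable_fraction(toy_genome, 6, 10, 4, toy_counts_k4))  # 0.0
```

Multiplying a gene's length by this fraction would give a kind of "uniquely mappable length" to divide by instead of the full length. For a real genome, the counting would have to be done with an aligner or a dedicated mappability tool rather than an in-memory Counter, and spliced reads complicate the picture considerably.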
