Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • dpryan
    replied
    Yes, but there's more than one way to get and even define FPKMs. You could just get the counts with something like featureCounts and then convert them to FPKM (either before or after normalization). That's the fastest method. You could also get expected counts with something like eXpress and convert those to FPKMs. That would take longer and given you different (through pretty similar) results. I can conceive of still other routes.

    Leave a comment:


  • wanfahmi
    replied
    Can we get the FPKM value only without through a long process of Cuffdiff? I do have 100 samples and each sample has 3 biological replicates. I am only interested in FPKM value which will later use for network analysis. Thanks

    Leave a comment:


  • dpryan
    replied
    Usually one uses the union of all exons and gets the length of that. At least unless you want to determine an expected gene length given alignment data (some tools can do that).

    Leave a comment:


  • jhb1980
    replied
    I'm curious about calculating RPKM / FPKM from counts on the gene level. What would you use as "gene length" for a gene with multiple isoforms...? The longest exon chain (i.e. longest transcript isoform)? Or a median transcript length based on all the annotated isoforms of a given gene...?

    Leave a comment:


  • dpryan
    replied
    Originally posted by raymb View Post
    Can someone please explain where the "10^9" in the calculation comes from? Why isn't it 10^6?

    Thanks in advance!
    L is usually measured in base pairs but the RPKM/FPKM metric requires things in KB, so the 10^9 comes from that conversion. If you input a length measure in KB then just use 10^6.

    Leave a comment:


  • raymb
    replied
    Can someone please explain where the "10^9" in the calculation comes from? Why isn't it 10^6?

    Thanks in advance!

    Leave a comment:


  • bswhite
    replied
    Originally posted by ThePresident View Post
    I calculated RPKM values manually, still didn't compare it with Cufflinks RPKMs, but visually I have a lognormal distribution. Sounds logical for me...

    TP
    If/when you do compare to cufflinks RPKM, I would be interested to hear
    the results.

    I also see a (skewed) lognormal computing RPKMs via htseq, but I
    see a bimodal lognormal from cufflinks FPKMs. [Yes, I mean RPKM
    in the first and FPKM in the latter--an oversimplification on my part
    using these paired-end data.]

    Leave a comment:


  • dpryan
    replied
    @ThePresident: That makes more sense then. Yeah, bowtie1 did have its advantages.

    Leave a comment:


  • ThePresident
    replied
    @Ryan: I totally agree with you and that was my fear before I run alignments with bowtie2. However, i reasoned that since I'm dealing with bacterial transcriptome, the complexity of the reference genome is way smaller then that of human RNA-seq. So, I didn't expect a huge number of multimappers as defined by the --score-min function (i.e. reads that have more then one valid alignment).

    I turns out that I was probably right: I lose about 3% of my total reads by filtering for multimappers with XS: tag which is not dramatic. However, one library was more of a mess... It had a deepest sequencing depth (61 M reads) but I lose 14% on "multimappers".

    In my opinion, bowtie1 had somewhat more flexible options. Reporting could be controlled with "-a --best --strata" or "--best -k 1".

    TP

    Leave a comment:


  • dpryan
    replied
    If you just want to keep the first reported alignment of a multimapper then that'll work. To just ignore them, just use MAPQ==255 for STAR.

    Leave a comment:


  • bruce01
    replied
    Can I ask, what do people think of using flags 99/163 or 83/147, eg a 'primary' alignment with paired mate, as another way to do this. Of course this removes all multimappers. I have a script to use the alignment score from STAR to determine which multimapper to keep, but again, wondering is this a safe method? Any potential pitfalls for using those flags, or for using alignment score out of STAR?

    Leave a comment:


  • dpryan
    replied
    @ThePresident: Depending on your definition of multimapper, that grep command may remove a lot more than multimappers. What you're doing is removing any read that has at least one other "valid alignment". But keep in mind that bowtie2 defines a valid alignment as any yielding an alignment score >= that defined by the --score-min function. So, if you have a 100bp read that has 1 alignment with no mismatches (an alignment score of 0 with default settings) and a second with 10 mismatches (minimum alignment score of -60, but possibly higher), then you're calling that a multimapper. The question becomes if you're comfortable having the rate of multimappers change if you tweak --score-min. The more conventional definition of a multimapper is one where the best and second best alignments have equal scores. In bowtie2, those will be given MAPQs of either 0 or 1 and the NH:i auxiliary tag will be set accordingly.

    So yeah, you're getting rid of multimappers. But you're probably getting rid of a LOT of accurate alignments in the process.

    BTW, --sensitive is the default, you needn't specify it.

    Leave a comment:


  • ThePresident
    replied
    Originally posted by bswhite View Post
    Did you mean samtools view -F 256? i.e., exclude any read that
    has the 'not primary alignment' flag set?



    I'm also trying to simplify by excluding both mates if either is multi mapped
    (i.e., has flag 256 set or has NH field set with something other than NH:i:1).
    This should be extremely conservative in giving uniquely mapped reads.

    I plan to run these through cufflinks, htseq-count, and featureCounts
    to compare results.
    So, just to give you a brief update: finally, I aligned with bowtie2 using --sensitive parameter. Then, I used
    Code:
    grep -v XS:i: YourFile.sam > YourFileSingleMappers.sam
    That was the easiest way of dealing with it (bowtie2 prints XS: tag for multimappers). Then I run HTseq-count with -intersection-nonempty parameter. Everything went smooth. I calculated RPKM values manually, still didn't compare it with Cufflinks RPKMs, but visually I have a lognormal distribution. Sounds logical for me...

    TP

    Leave a comment:


  • dpryan
    replied
    Originally posted by Monika_bioinf View Post
    It should be probably mentioned that author of RPKM concept (which has been later replaced by FPKM) deprecated it in his last talk and encouraged not using it.

    http://www.youtube.com/watch?v=5NiFibnbE8o
    Thanks for pointing out that video. His slide around minute 23 is particularly interesting.

    One unfortunate thing about the second part of his talk is that many of us hesitate using estimated values when look at gene-level changes because the earlier methods (cufflinks chiefly among them) simply produced unreliable results. Granted, this seems to be going away as the programs mature, but "once bitten twice shy" as the saying goes.

    Leave a comment:


  • Monika_bioinf
    replied
    It should be probably mentioned that author of RPKM concept (which has been later replaced by FPKM) deprecated it in his last talk and encouraged not using it.

    Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Non-Coding RNA Research and Technologies
    by seqadmin




    Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

    Nobel Prize for MicroRNA Discovery
    This week,...
    Yesterday, 08:07 AM
  • seqadmin
    Recent Developments in Metagenomics
    by seqadmin





    Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
    09-23-2024, 06:35 AM
  • seqadmin
    Understanding Genetic Influence on Infectious Disease
    by seqadmin




    During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

    Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
    09-09-2024, 10:59 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 10-02-2024, 04:51 AM
0 responses
95 views
0 likes
Last Post seqadmin  
Started by seqadmin, 10-01-2024, 07:10 AM
0 responses
106 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-30-2024, 08:33 AM
1 response
104 views
0 likes
Last Post EmiTom
by EmiTom
 
Started by seqadmin, 09-26-2024, 12:57 PM
0 responses
20 views
0 likes
Last Post seqadmin  
Working...
X