I need a little help understanding abundance estimation in Cufflinks.
Please refer to the Cufflinks supplementary methods.
For convenience, let me restate some of the key definitions:
ρ(t) = abundance of transcript t
α(t) = probability of choosing transcript t [identified by abundance and length]
β(g) = Σ α(t) over all transcripts t in locus g = probability of choosing a transcript from locus g
γ(t) = probability that the chosen transcript has a given abundance and length
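To make the relation between α and β concrete, here is a minimal Python sketch of the definition above: β(g) is the sum of α(t) over the transcripts in locus g. The transcript IDs, locus IDs and α values are made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def locus_probabilities(alpha, locus_of):
    """Sum transcript selection probabilities alpha(t) into beta(g).

    alpha maps transcript id -> alpha(t); locus_of maps transcript id -> locus id.
    Both inputs are hypothetical, for illustration only.
    """
    beta = defaultdict(float)
    for t, a in alpha.items():
        beta[locus_of[t]] += a  # beta(g) accumulates alpha(t) for every t in g
    return dict(beta)


alpha = {"t1": 0.2, "t2": 0.3, "t3": 0.5}
locus_of = {"t1": "g1", "t2": "g1", "t3": "g2"}
beta = locus_probabilities(alpha, locus_of)
# beta is approximately {"g1": 0.5, "g2": 0.5}
```

Since the α(t) sum to 1 over all transcripts, the β(g) also sum to 1 over all loci.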
Question 1: Does that mean that a transcript is fully identified by its
length and abundance?
Question 2: In the parameter estimation section, I didn't quite understand
how the MLE of β becomes X(g)/M.
Shouldn't it be the solution of ∑_g ∂(X(g) log β(g))/∂β(g) = 0?
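One possible reading, assuming X(g) is the number of fragments assigned to locus g and M = ∑_g X(g) is the total number of fragments: the derivative condition on its own has no finite solution, because the β(g) are probabilities constrained to sum to 1. Maximizing the multinomial log-likelihood with a Lagrange multiplier λ for that constraint gives X(g)/M:

```latex
\begin{aligned}
\mathcal{L}(\beta, \lambda) &= \sum_{g} X(g) \log \beta(g) - \lambda \Big( \sum_{g} \beta(g) - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial \beta(g)} &= \frac{X(g)}{\beta(g)} - \lambda = 0
  \quad\Longrightarrow\quad \beta(g) = \frac{X(g)}{\lambda} \\
\sum_{g} \beta(g) = 1 &\quad\Longrightarrow\quad \lambda = \sum_{g} X(g) = M
  \quad\Longrightarrow\quad \hat{\beta}(g) = \frac{X(g)}{M}
\end{aligned}
```

Without the constraint term, X(g)/β(g) = 0 cannot hold for any X(g) > 0, which is why the unconstrained condition alone does not yield the estimate.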
Question 3: I don't understand the importance sampling method well, but is
there an intuitive way to understand how γ is estimated from the input,
i.e. the reads?
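As one possible intuition, here is a generic self-normalized importance sampling sketch in Python. It is not Cufflinks' implementation. It assumes a single hypothetical locus with two transcripts, a flat prior on the fraction γ of fragments coming from transcript 1, and a uniform proposal, so each sampled γ is weighted by how well it explains the reads. The per-read probabilities (1/length for a compatible transcript, 0 otherwise), the transcript lengths and the read counts are made-up stand-ins; Cufflinks' own likelihood and proposal distribution differ.

```python
import math
import random


def log_likelihood(gamma, read_probs):
    """Log-likelihood of gamma, the fraction of fragments from transcript 1.

    read_probs holds one (p1, p2) pair per read: the probability of that read
    if it came from transcript 1 or from transcript 2 (hypothetical values).
    """
    total = 0.0
    for p1, p2 in read_probs:
        mix = gamma * p1 + (1.0 - gamma) * p2
        if mix <= 0.0:
            return float("-inf")  # this gamma cannot explain the read at all
        total += math.log(mix)
    return total


def importance_sample_mean(read_probs, n_samples=20000, seed=0):
    """Self-normalized importance sampling estimate of E[gamma | reads].

    Proposal: uniform on [0, 1). With a flat prior, each sample's weight is
    proportional to its likelihood.
    """
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n_samples)]
    log_w = [log_likelihood(g, read_probs) for g in samples]
    m = max(log_w)  # subtract the largest log-weight to avoid underflow
    weights = [math.exp(lw - m) for lw in log_w]
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, samples)) / total


# Hypothetical locus: transcript 1 is 1500 bp, transcript 2 is 1000 bp.
# 12 reads fit only transcript 1, 8 fit only transcript 2, 10 fit both.
reads = ([(1 / 1500, 0.0)] * 12 + [(0.0, 1 / 1000)] * 8
         + [(1 / 1500, 1 / 1000)] * 10)
estimate = importance_sample_mean(reads)
```

Values of γ that explain the reads poorly get tiny weights, so the weighted average concentrates on plausible values. A proposal closer to the posterior (for example one centred on the maximum-likelihood estimate) would need fewer samples for the same accuracy.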
The FPKM calculation has l(t) in the denominator. Cufflinks should accept any
SAM/BAM file, regardless of whether it has passed through TopHat. If I pass
Cufflinks reads aligned to the transcriptome (RefSeq) and don't provide
any annotations, then:
Question 4: How is a locus designated?
Question 5: How is l(t) estimated for the FPKM calculation? Shouldn't the
length of a transcript be smaller than that of its locus?
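For reference, here is the standard FPKM arithmetic as a Python sketch: fragments assigned to a transcript, divided by its length in kilobases and by the total number of mapped fragments in millions. It uses a plain transcript length l(t); Cufflinks itself divides by a fragment-length-adjusted effective length, which this sketch does not model. The function name and the numbers are made up for illustration.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """FPKM = fragments / (length in kb) / (mapped fragments in millions)."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# 500 fragments on a 2,000 bp transcript out of 10 million mapped fragments:
# 500 / 2 kb / 10 million = 25.0
print(fpkm(500, 2000, 10_000_000))  # prints 25.0
```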
Finally, how can I use Cufflinks without involving genome alignments at all?