HTSeq to DESeq/edgeR to Heatmap

dpryan replied

05-15-2014, 09:56 AM
I wouldn't normally recommend the kOverA method for RNAseq, since it's usually more meaningful to use summed counts or average counts (I think DESeq2 uses the average counts).

Using genefilter with edgeR would work the same as in the aforementioned PDF. You perform your tests as normal and then put whatever filtering metric (summed counts, average counts, etc.) and the raw p-values into filtered_p() or filtered_R() and continue in a similar manner.
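
As a rough illustration of the idea described above, here is a minimal base-R sketch of independent filtering on simulated data. Everything in it is an assumption made for the example: the toy mean counts, the toy p-values, and the helper name filter_then_adjust are hypothetical and are not part of genefilter. It only mimics the general approach (drop the fraction theta of genes with the lowest filter statistic, then adjust the remaining raw p-values); in a real analysis the p-values would come from edgeR or DESeq and one would probably call genefilter's filtered_p() or filtered_R() instead.

```r
# Minimal sketch of independent filtering in base R (toy data, hypothetical names).
set.seed(1)
n_genes <- 1000

# Simulated mean count per gene: this plays the role of the filter statistic.
mean_counts <- rexp(n_genes, rate = 1 / 50)

# Simulated raw p-values; in practice these come from the edgeR or DESeq test.
raw_p <- runif(n_genes)
# Pretend the most highly expressed genes tend to have small p-values.
is_de <- mean_counts > quantile(mean_counts, 0.9)
raw_p[is_de] <- rbeta(sum(is_de), 1, 200)

# Drop the fraction `theta` of genes with the lowest filter statistic,
# then adjust only the remaining p-values. Dropped genes get NA.
filter_then_adjust <- function(filter_stat, p, theta, method = "BH") {
  cutoff <- quantile(filter_stat, probs = theta)
  keep <- filter_stat >= cutoff
  padj <- rep(NA_real_, length(p))
  padj[keep] <- p.adjust(p[keep], method = method)
  padj
}

# Number of rejections at alpha = 0.1 for a range of filtering fractions.
thetas <- seq(0, 0.8, by = 0.1)
rejections <- sapply(thetas, function(th) {
  sum(filter_then_adjust(mean_counts, raw_p, th) < 0.1, na.rm = TRUE)
})
names(rejections) <- thetas
print(rejections)
```

Looking at how the number of rejections changes with theta is the same kind of check that rejection_plot() is meant to give a feel for.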
super0925 replied

05-15-2014, 09:34 AM
Originally posted by dpryan View Post

You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)

I have seen your PDF. It is nice. However, it uses DESeq's fitNbinomGLMs function to get the p-values (although we could use DESeq2 instead of DESeq, and DESeq2 filters automatically, as you said). But how about edgeR?

And how about my commands in post #115? Are they too naive?
dpryan replied

05-15-2014, 08:00 AM
You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
super0925 replied

05-15-2014, 07:56 AM
Originally posted by dpryan View Post

At least the more recent versions, yes.

But there are so many functions in genefilter. How should I set the parameters?
Their usage in the manual:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10)
ans <- genefilter(exprA, flist)

My commands:
library(genefilter)
set.seed(-1)
# Q: This means an expression measure above 10 in at least 5 samples, but how do I decide on these parameters? And could I use ttest instead?
f1 <- kOverA(5, 10)
flist <- filterfun(f1)
# Two conditions with 3 samples in each group
exprA <- counts
head(counts)
#                    C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3
# ENSBTAG00000000003     0     0     0     0     0     0
# ENSBTAG00000000005     1     0     0     1     0     0
# ENSBTAG00000000008     2     2     1     0     2     0
ans <- genefilter(exprA, flist)
# ans is a logical vector, so it can index the rows directly
counts_filter <- counts[ans, ]

Is this OK?

Last edited by super0925; 05-15-2014, 08:00 AM.
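
To make the kOverA(5, 10) rule concrete, here is a small base-R sketch that applies the same condition by hand to the three rows shown in the head(counts) output, next to filters based on summed or average counts. The thresholds 5 and 1 are arbitrary values picked for illustration, not recommendations, and the variable names are made up for this example.

```r
# Toy count matrix copied from the head(counts) output in the post.
counts <- matrix(
  c(0, 0, 0, 0, 0, 0,
    1, 0, 0, 1, 0, 0,
    2, 2, 1, 0, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSBTAG00000000003", "ENSBTAG00000000005", "ENSBTAG00000000008"),
    c("C1.R1", "C1.R2", "C1.R3", "C2.R1", "C2.R2", "C2.R3")
  )
)

# Same rule as kOverA(5, 10): more than 10 counts in at least 5 samples.
keep_k_over_a <- rowSums(counts > 10) >= 5

# Filters based on summed or average counts; both thresholds are arbitrary.
keep_sum  <- rowSums(counts) >= 5
keep_mean <- rowMeans(counts) >= 1

# A logical vector indexes rows directly, so which(ans == 1) is not needed.
counts_filtered <- counts[keep_sum, , drop = FALSE]
print(counts_filtered)
```

With only six samples, requiring five of them to exceed 10 is a strict rule, which is one reason a summed or average count can be the easier statistic to reason about.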
dpryan replied

05-15-2014, 07:22 AM
At least the more recent versions, yes.
super0925 replied

05-15-2014, 07:21 AM
Originally posted by dpryan View Post

I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.

Hi D,
Q1: I have seen your post on Biostars. You suggest removing low counts with genefilter when using DESeq/edgeR/limma. But DESeq2 filters automatically. Am I right?
Q2: There are so many functions in genefilter. Should I use the genefilter function itself? And how should I set the parameters?

Their usage:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10)  # change this to the count matrix
ans <- genefilter(exprA, flist)

Is that OK?

Last edited by super0925; 05-15-2014, 07:30 AM.
dpryan replied

05-15-2014, 02:55 AM
I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
super0925 replied

05-15-2014, 02:34 AM
Originally posted by dpryan View Post

DESeq (better would be DESeq2) would work fine as well.

But Cuffdiff can only do pairwise comparisons, am I right?
So I need to run C1 vs C2, C2 vs C3 and C3 vs C1 in Cuffdiff one by one, and compare the results with edgeR or DESeq2 (which you recommend over DESeq).
dpryan replied

05-14-2014, 07:02 AM
Originally posted by super0925 View Post

Hi, I have a small question.
If I have 3 conditions, I thought I could use edgeR (with the GLM model I read about in some other posts) to analyse them. Am I right?

DESeq (better would be DESeq2) would work fine as well.
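
For readers wondering what a three-condition GLM setup looks like, here is a minimal base-R sketch of a design matrix and the pairwise contrasts for three conditions with three replicates each. The sample layout and the contrast names are assumptions made for this example. With edgeR, a design like this would typically be passed to glmFit() and each contrast vector to glmLRT(); that part is not run here.

```r
# Three conditions with three replicates each (assumed layout).
condition <- factor(rep(c("C1", "C2", "C3"), each = 3))

# One column per condition (no intercept), so contrasts are easy to read.
design <- model.matrix(~ 0 + condition)
colnames(design) <- levels(condition)
print(design)

# Each row is one pairwise comparison expressed on the design columns.
contrasts <- rbind(
  C2_vs_C1 = c(C1 = -1, C2 = 1, C3 = 0),
  C3_vs_C2 = c(C1 = 0, C2 = -1, C3 = 1),
  C3_vs_C1 = c(C1 = -1, C2 = 0, C3 = 1)
)
print(contrasts)

# With edgeR, one would typically fit the model once and test each row, e.g.
#   fit <- glmFit(y, design)
#   lrt <- glmLRT(fit, contrast = contrasts["C2_vs_C1", ])
# (shown as comments only, since edgeR and the DGEList `y` are not set up here).
```

The point of the single fit is that all nine samples contribute to the dispersion estimates, while each comparison is still available as its own contrast.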
super0925 replied

05-14-2014, 06:56 AM
Originally posted by dpryan View Post

I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.

Hi, I have a small question.
If I have 3 conditions, I thought I could use edgeR (with the GLM model I read about in some other posts) to analyse them. Am I right?
How about DESeq?
So far I know I could use three independent tests (C1 vs C2, C2 vs C3, C1 vs C3) to get the DE genes. That works, but it requires three separate runs.

Last edited by super0925; 05-14-2014, 06:59 AM.
dpryan replied

05-13-2014, 06:54 AM
Originally posted by super0925 View Post

How about my methods in post#105?

I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

Could you please tell me how to convert the Tuxedo output to Ensembl IDs by using the annotation file's mappings? Using AWK?

Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.
super0925 replied

05-13-2014, 05:22 AM
Originally posted by dpryan View Post

I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).

How about my methods in post#105?
"
Use Ensembl IDs, which are consistent in genes.gtf, with (1) "TopHat-htseq-edgeR" or (2) "TopHat-Cuffdiff" but without Cufflinks and Cuffmerge (because Cufflinks + Cuffmerge would generate a merged.gtf whose gene names are a mixture of gene names and UniProt IDs, as I showed you). After that, I translate them to gene names with an R package or an online tool (BioMart).
"

For "TopHat-htseq-edgeR/DESeq", the read count table is given by Ensembl ID; I have tried this.
I have also run "Tuxedo without Cufflinks and Cuffmerge", and the result also gives the Ensembl gene IDs.
However, that skips the Cufflinks + Cuffmerge steps.
Could you please tell me how to convert the Tuxedo output to Ensembl IDs by using the annotation file's mappings? Using AWK?

Last edited by super0925; 05-13-2014, 05:59 AM.
dpryan replied

05-13-2014, 03:42 AM
I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
super0925 replied

05-13-2014, 01:19 AM
Originally posted by dpryan View Post

No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.

Hi Devon
I still have two questions:
Q1:
The full name "prostaglandin D2 receptor" is not from my GTF file... What is strange is that, among the gene names in the GTF file, some are UniProt IDs (e.g. ATPO_BOVIN) while others are gene names (e.g. ITSN1). That confused me.
I have attached a screenshot (my GTF file is from the Ensembl database: the bovine genome annotation, Btau_4.0).
So I don't know how to make them consistent. I think the reasonable method is to use Ensembl IDs, which are consistent (from (1) "TopHat-htseq-edgeR" or (2) "TopHat-Cuffdiff" without Cufflinks and Cuffmerge), and afterwards translate them to gene names with an R package or an online tool (BioMart).
Am I right? Or do you have a better solution?
Q2:
The full names (e.g. "2'-5'-oligoadenylate synthetase", "prostaglandin D2 receptor") were sent to me by my collaborator, who is more interested in the long full names than in the gene names. He googles the gene name or Ensembl ID to get the full name...
So I am looking for an automatic conversion method rather than googling each one manually.
Or should I switch to another annotation file (GTF), e.g. from the UCSC database rather than Ensembl?
Thank you!
Attached Files

Untitled.png (99.6 KB, 2 views)
Last edited by super0925; 05-13-2014, 03:23 AM.
dpryan replied

05-13-2014, 12:41 AM
No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
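
The "modicum of programming" could look something like the following base-R sketch, which pulls gene_id and gene_name out of the attribute column of a GTF file and keeps the unique pairs. The GTF lines, IDs and gene names below are invented placeholders, and get_attr is a hypothetical helper written for this example; a real script would read the same annotation file that was given to cufflinks and htseq-count.

```r
# Toy GTF records (tab-separated, 9 columns). All IDs and names are placeholders.
gtf_lines <- c(
  '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_ID_1"; transcript_id "TX_1"; gene_name "GENE_A";',
  '1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "GENE_ID_1"; transcript_id "TX_1"; gene_name "GENE_A";',
  '2\tsrc\texon\t500\t900\t.\t-\t.\tgene_id "GENE_ID_2"; transcript_id "TX_2"; gene_name "GENE_B";',
  '3\tsrc\texon\t150\t250\t.\t+\t.\tgene_id "GENE_ID_3"; transcript_id "TX_3";'
)

# quote = "" keeps the double quotes inside the attribute column intact.
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  comment.char = "#", stringsAsFactors = FALSE)
attrs <- gtf[[9]]

# Extract the value of one attribute key; NA when the key is absent.
get_attr <- function(attr_strings, key) {
  pattern <- paste0('.*', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attr_strings),
         sub(pattern, "\\1", attr_strings),
         NA_character_)
}

# One row per distinct gene_id / gene_name pair.
mapping <- unique(data.frame(
  gene_id   = get_attr(attrs, "gene_id"),
  gene_name = get_attr(attrs, "gene_name"),
  stringsAsFactors = FALSE
))
print(mapping)
```

Genes without a gene_name attribute come out as NA, which makes it easy to see which IDs would still need a lookup elsewhere (for example through BioMart).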