I wouldn't normally recommend the kOverA method for RNAseq, since it's usually more meaningful to use summed counts or average counts (I think DESeq2 uses the average counts).
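As a rough illustration of the difference, here is a base-R sketch on a made-up 3-vs-3 count matrix. The gene names, counts, and cutoffs are invented for illustration and are not recommendations. The first filter mimics what kOverA(5, 10) tests (more than 10 in at least 5 samples); the other two use summed and average counts.

```r
# Toy count matrix: 4 hypothetical genes, two conditions with 3 samples each.
counts <- matrix(
  c(  0,   0,   0,   0,   0,   0,   # geneA: not expressed
      1,   0,   0,   1,   0,   0,   # geneB: very low counts
     50,  60,  55,   0,   1,   0,   # geneC: expressed in condition 1 only
     30,  25,  40,  35,  20,  45),  # geneD: expressed everywhere
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]),
                  c("C1.R1", "C1.R2", "C1.R3", "C2.R1", "C2.R2", "C2.R3"))
)

# Same test as kOverA(5, 10): more than 10 in at least 5 samples.
keep_koa <- rowSums(counts > 10) >= 5

# Alternatives: summed or average counts per gene (cutoffs are arbitrary here).
keep_sum  <- rowSums(counts)  >= 10
keep_mean <- rowMeans(counts) >= 5

print(data.frame(keep_koa, keep_sum, keep_mean))
```

With three replicates per group, the kOverA(5, 10)-style test drops geneC, which is expressed in only one condition and is exactly the kind of gene a DE analysis is looking for, while the summed and average filters keep it.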
Using genefilter with edgeR would work the same as in the aforementioned PDF. You perform your tests as normal and then put whatever filtering metric (summed counts, average counts, etc.) and the raw p-values into filtered_p() or filtered_R() and continue in a similar manner.
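To make that workflow concrete, here is a minimal base-R sketch of the idea behind those functions: rank genes by the filter metric, drop the lowest fraction theta, and BH-adjust only the raw p-values that remain. The simulated data and the helper name rejections_after_filter are made up for illustration. In a real analysis the metric and the p-values would come from your count table and your edgeR test, and genefilter's own functions would take care of this step.

```r
set.seed(1)

# Simulated stand-ins: in practice these come from your counts and your test.
n_genes     <- 2000
filter_stat <- rexp(n_genes, rate = 1 / 50)         # e.g. average count per gene
is_de       <- filter_stat > 60 & runif(n_genes) < 0.3
raw_p       <- ifelse(is_de, rbeta(n_genes, 0.05, 1), runif(n_genes))

# Hypothetical helper (not part of genefilter): drop the fraction `theta` of
# genes with the lowest filter statistic, BH-adjust the remaining p-values,
# and count the rejections at level `alpha`.
rejections_after_filter <- function(filter_stat, raw_p, theta, alpha = 0.1) {
  cutoff <- quantile(filter_stat, probs = theta)
  keep   <- filter_stat >= cutoff
  sum(p.adjust(raw_p[keep], method = "BH") < alpha)
}

# Number of rejections for a range of filtering stringencies.
thetas <- seq(0, 0.8, by = 0.1)
n_rej  <- sapply(thetas, function(th)
  rejections_after_filter(filter_stat, raw_p, th))

print(data.frame(theta = thetas, rejections = n_rej))
```

Looking at how the number of rejections changes with theta is roughly what the diagnostics in the PDF are for when choosing a cutoff.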
-
Originally posted by dpryan:
You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
And how about my command in post #115? Is it too naive?
-
You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
-
Originally posted by dpryan:
At least the more recent versions, yes.
Their usage in the manual:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10)
ans <- genefilter(exprA, flist)
My commands:
library(genefilter)
set.seed(-1)
f1 <- kOverA(5, 10)  # Q: This means an expression measure above 10 in at least 5 samples, but how do I decide these parameters? And could I use ttest instead?
flist <- filterfun(f1)
exprA <- counts  # two conditions with 3 samples in each group
head(counts)
                   C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3
ENSBTAG00000000003     0     0     0     0     0     0
ENSBTAG00000000005     1     0     0     1     0     0
ENSBTAG00000000008     2     2     1     0     2     0
ans <- genefilter(exprA, flist)
counts_filter <- counts[which(ans == 1), ]  # keep the genes that pass the filter
Is this OK?
Last edited by super0925; 05-15-2014, 08:00 AM.
-
Originally posted by dpryan:
I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.
I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
Q1: I have seen your post on Biostar. You suggest removing low counts with genefilter when using DESeq/edgeR/limma, but DESeq2 filters automatically. Am I right?
Q2: There are so many functions in genefilter. Do I use the genefilter() function? And how should I set the parameters?
Their usage:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10)  # (change this to my count matrix)
ans <- genefilter(exprA, flist)
Is that OK?
Last edited by super0925; 05-15-2014, 07:30 AM.
-
I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.
I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
-
Originally posted by dpryan:
DESeq (better would be DESeq2) would work fine as well.
So I need to run C1 vs C2, C2 vs C3, and C3 vs C1 in cuffdiff one by one, and compare the results with edgeR or DESeq2 (which you recommend as better; what are its advantages over DESeq?).
-
Originally posted by dpryan:
I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.
Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.
Hi, I have a small question.
If I have 3 conditions, I thought I could use edgeR (is that the GLM model I read about in some other posts?) to analyze them. Am I right?
How about DESeq?
So far I know I could use 3 independent tests (C1 vs C2, C2 vs C3, C1 vs C3) to get the DE genes. That works, but it requires three separate runs.
Last edited by super0925; 05-14-2014, 06:59 AM.
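For what it's worth, the GLM approach boils down to fitting one model with a column per condition and then testing contrasts between those columns, so the three comparisons come out of a single fit. Below is a base-R sketch of just that setup. The sample layout (three replicates per condition) is assumed, and the actual fitting and testing would be done by the GLM functions of edgeR or DESeq2, which are not shown here.

```r
# Hypothetical sample layout: three conditions with three replicates each.
group <- factor(rep(c("C1", "C2", "C3"), each = 3))

# One column per condition (no intercept), so each coefficient describes
# one group on the model scale.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Each pairwise comparison is a contrast of those coefficients.
contrasts <- cbind(
  C2vsC1 = c(C1 = -1, C2 =  1, C3 = 0),
  C3vsC2 = c(C1 =  0, C2 = -1, C3 = 1),
  C3vsC1 = c(C1 = -1, C2 =  0, C3 = 1)
)

print(design)
print(contrasts)
```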
-
Originally posted by super0925:
How about my methods in post #105?
Could you please tell me how to convert the Tuxedo output to Ensembl IDs using the annotation file's mappings? Using AWK?
-
Originally posted by dpryan:
I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
"
Use the Ensembl ID, which is consistent across genes.gtf, via (1) "Tophat-htseq-edgeR" or (2) "Tophat-Cuffdiff" without Cufflinks and Cuffmerge (because Cufflinks+Cuffmerge would generate a merged.gtf whose gene names are a mixture of gene names and UniProt IDs, as I showed you). After that, I translate them to gene names with an R package or an online tool (BioMart).
"
With "Tophat-htseq-edgeR/DESeq", which I have tried, the read count table is keyed by Ensembl ID.
I have also run "Tuxedo without Cufflinks and Cuffmerge", and the result also gives the Ensembl gene ID.
However, I then miss out on the Cufflinks+Cuffmerge steps.
Could you please tell me how to convert the Tuxedo output to Ensembl IDs using the annotation file's mappings? Using AWK?
Last edited by super0925; 05-13-2014, 05:59 AM.
-
I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
-
Originally posted by dpryan:
No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
I still have two questions:
Q1:
The full name "prostaglandin D2 receptor" is not from my GTF file. What is strange is that, among the gene names in the GTF file, some appear to be UniProt IDs (e.g. ATPO_BOVIN) while others are gene names (e.g. ITSN1). This confuses me.
I have attached a screenshot (my GTF file is from the Ensembl database: the bovine genome annotation, Btau_4.0).
Hence I don't know how to make them consistent. I think the reasonable method is to use the Ensembl ID, which is consistent (via (1) "Tophat-htseq-edgeR" or (2) "Tophat-Cuffdiff" without Cufflinks and Cuffmerge), and then translate the IDs to gene names with an R package or an online tool (BioMart).
Am I right? Or do you have a better solution?
Q2:
The full names (e.g. "2'-5'-oligoadenylate synthetase", "prostaglandin D2 receptor") are what my collaborator sent me; he is more interested in the long full name than in the gene name. He googles the gene name or Ensembl ID to get the full name.
Hence I am looking for an automatic conversion method rather than googling each one manually.
Or should I switch to another annotation file (GTF), e.g. from the UCSC database rather than Ensembl?
Thank you!
Last edited by super0925; 05-13-2014, 03:23 AM.
-
No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
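A minimal base-R sketch of that kind of extraction is below. It assumes the attribute column carries gene_id and gene_name keys, as Ensembl GTFs usually do; the GTF lines, IDs, and the helper get_attr are invented for illustration, so check the keys against your own file.

```r
# A few made-up GTF lines standing in for the real annotation file
# (the IDs and names below are invented for illustration).
gtf_lines <- c(
  '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "ENSBTAG00000000001"; transcript_id "ENSBTAT00000000001"; gene_name "GENE1";',
  '1\tensembl\texon\t300\t400\t.\t+\t.\tgene_id "ENSBTAG00000000001"; transcript_id "ENSBTAT00000000001"; gene_name "GENE1";',
  '2\tensembl\texon\t500\t600\t.\t-\t.\tgene_id "ENSBTAG00000000002"; transcript_id "ENSBTAT00000000002"; gene_name "GENE2_BOVIN";'
)

# For a real file, read the lines from the GTF instead (dropping "#" header
# lines). The attributes live in the 9th tab-separated column.
attrs <- sapply(strsplit(gtf_lines, "\t", fixed = TRUE), `[`, 9)

# Hypothetical helper: pull the quoted value of one attribute key,
# returning NA where the key is absent.
get_attr <- function(attrs, key) {
  pattern <- paste0('.*', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attrs), sub(pattern, "\\1", attrs), NA_character_)
}

# One row per distinct gene_id / gene_name pair.
id_map <- unique(data.frame(
  gene_id   = get_attr(attrs, "gene_id"),
  gene_name = get_attr(attrs, "gene_name"),
  stringsAsFactors = FALSE
))
print(id_map)
```

The resulting table can then be merged onto a results table by whichever ID column that table uses.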