  • dpryan
    replied
    I wouldn't normally recommend the kOverA method for RNAseq, since it's usually more meaningful to use summed counts or average counts (I think DESeq2 uses the average counts).

    Using genefilter with edgeR would work the same as in the aforementioned PDF. You perform your tests as normal and then put whatever filtering metric (summed counts, average counts, etc.) and the raw p-values into filtered_p() or filtered_R() and continue in a similar manner.
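    A minimal sketch of the filter-then-adjust idea behind filtered_p(), written in plain Python so it runs without Bioconductor. It is not genefilter's code. The function names, the rank-based cutoff (genefilter uses a quantile of the filter statistic), and the toy counts and p-values are all illustrative assumptions. Genes whose mean count falls in the lowest `theta` fraction are dropped, and Benjamini-Hochberg is applied only to the p-values that remain.

```python
from typing import List, Optional


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filtered_bh(filter_stat: List[float], pvals: List[float],
                theta: float) -> List[Optional[float]]:
    """Drop the lowest `theta` fraction of genes by filter_stat, BH-adjust the rest.

    Filtered-out genes get None (genefilter reports NA for them).
    """
    n = len(pvals)
    by_stat = sorted(range(n), key=lambda i: filter_stat[i])
    keep = sorted(by_stat[int(theta * n):])
    adjusted_kept = bh_adjust([pvals[i] for i in keep])
    result: List[Optional[float]] = [None] * n
    for i, adj in zip(keep, adjusted_kept):
        result[i] = adj
    return result


# Hypothetical counts (two conditions x 3 replicates) and made-up raw p-values.
counts = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [50, 60, 55, 5, 4, 6],
    [100, 90, 110, 95, 105, 100],
]
pvals = [1.0, 0.8, 0.001, 0.9]
# Mean over all samples, ignoring condition labels, as the filter statistic.
mean_counts = [sum(row) / len(row) for row in counts]

unfiltered = bh_adjust(pvals)
filtered = filtered_bh(mean_counts, pvals, theta=0.5)
print(unfiltered[2], filtered[2])
```

    With these toy numbers the third gene's adjusted p-value is 0.004 when all four genes are adjusted together and 0.002 after the two low-count genes are filtered out, because fewer tests share the correction. The filter statistic should not use the condition labels (summed or average counts across all samples do not); filtering on something that does would bias the p-values.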

  • super0925
    replied
    Originally posted by dpryan:
      You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
    I have seen your PDF. It is nice. However, they use DESeq's fitNbinomGLMs function to compute the p-values (we could use DESeq2 instead of DESeq, and DESeq2 filters automatically, as you said). But how about edgeR?

    And how about my commands in my post #115? Are they too naive?

  • dpryan
    replied
    You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
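    As I understand the genefilter documentation, filtered_R() reports, for each filtering fraction theta, how many hypotheses are rejected at a chosen FDR level. A minimal plain-Python sketch of that bookkeeping, assuming a mean-count filter statistic, Benjamini-Hochberg adjustment, a rank-based cutoff and made-up data; it is not genefilter's implementation, and the function names are hypothetical.

```python
from typing import Dict, List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p first keeps the result monotone
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_by_theta(filter_stat: List[float], pvals: List[float],
                        thetas: List[float], alpha: float = 0.1) -> Dict[float, int]:
    """For each theta, drop that fraction of lowest-filter genes and count BH rejections."""
    n = len(pvals)
    by_stat = sorted(range(n), key=lambda i: filter_stat[i])
    rejections: Dict[float, int] = {}
    for theta in thetas:
        kept = by_stat[int(theta * n):]
        adjusted = bh_adjust([pvals[i] for i in kept])
        rejections[theta] = sum(1 for a in adjusted if a < alpha)
    return rejections


# Hypothetical mean counts and raw p-values for six genes.
mean_counts = [0.0, 0.3, 30.0, 100.0, 50.0, 2.0]
pvals = [0.9, 0.95, 0.01, 0.03, 0.04, 0.5]
result = rejections_by_theta(mean_counts, pvals, thetas=[0.0, 0.5], alpha=0.05)
for theta, n_rej in result.items():
    print(f"theta={theta:.2f}: {n_rej} rejections")
```

    With these made-up numbers nothing passes at theta = 0, while dropping the lower half by mean count leaves three genes that all pass at an FDR of 0.05. Real data rarely behave this cleanly; the point is only the bookkeeping.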

  • super0925
    replied
    Originally posted by dpryan:
      At least the more recent versions, yes.
    But there are so many functions in genefilter; how should I set the parameters?
    Their usage in the manual:
    set.seed(-1)
    f1 <- kOverA(5, 10)
    flist <- filterfun(f1, allNA)
    exprA <- matrix(rnorm(1000, 10), ncol = 10)
    ans <- genefilter(exprA, flist)

    My commands:
    library(genefilter)
    # counts: two conditions with 3 samples in each group
    # > head(counts)
    #                    C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3
    # ENSBTAG00000000003     0     0     0     0     0     0
    # ENSBTAG00000000005     1     0     0     1     0     0
    # ENSBTAG00000000008     2     2     1     0     2     0
    f1 <- kOverA(5, 10)               # TRUE if at least 5 samples have a value above 10
    flist <- filterfun(f1)
    ans <- genefilter(counts, flist)  # logical vector, one entry per gene
    counts_filter <- counts[ans, ]

    Q: kOverA(5, 10) keeps genes with an expression measure above 10 in at least 5 samples, but how do I choose these parameters? And could I use a t-test instead?

    Is this OK?
    Last edited by super0925; 05-15-2014, 08:00 AM.
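    To see what kOverA(k, A) keeps, here is a plain-Python re-creation of its rule (at least k values strictly greater than A) applied to the three rows shown above, next to a mean-count rule of the kind dpryan recommends. The helper names and the lenient thresholds (k = 3, A = 1; mean above 1) are illustrative assumptions, not recommendations; one common heuristic is to set k to the size of the smallest group, which is 3 here, and in practice thresholds are often applied to normalized counts rather than raw ones.

```python
from typing import Callable, Dict, List

Row = List[int]

# The three rows from head(counts) in the post (C1 x 3 replicates, then C2 x 3).
counts: Dict[str, Row] = {
    "ENSBTAG00000000003": [0, 0, 0, 0, 0, 0],
    "ENSBTAG00000000005": [1, 0, 0, 1, 0, 0],
    "ENSBTAG00000000008": [2, 2, 1, 0, 2, 0],
}


def k_over_a(k: int, a: float) -> Callable[[Row], bool]:
    """Re-creates kOverA's rule: at least k values strictly greater than a."""
    return lambda row: sum(1 for v in row if v > a) >= k


def mean_over(cutoff: float) -> Callable[[Row], bool]:
    """Alternative rule: mean count across all samples above cutoff."""
    return lambda row: sum(row) / len(row) > cutoff


def keep(rows: Dict[str, Row], rule: Callable[[Row], bool]) -> List[str]:
    """IDs of the rows that pass the rule, in input order."""
    return [gene for gene, row in rows.items() if rule(row)]


print(keep(counts, k_over_a(5, 10)))  # the settings from the post
print(keep(counts, k_over_a(3, 1)))   # k = smallest group size, A = 1
print(keep(counts, mean_over(1)))     # mean count above 1
```

    With kOverA(5, 10) none of these low-count rows survive; both lenient rules keep only ENSBTAG00000000008.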

  • dpryan
    replied
    At least the more recent versions, yes.

  • super0925
    replied
    Originally posted by dpryan:
      I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

      I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
    Hi D,
    Q1: I have seen your post on Biostars. You suggest removing low counts with genefilter when using DESeq/edgeR/limma. But DESeq2 filters automatically. Am I right?
    Q2: There are so many functions in genefilter. Should I use the genefilter() function, and how should I set the parameters?

    Their usage:
    library(genefilter)
    set.seed(-1)
    f1 <- kOverA(5, 10)
    flist <- filterfun(f1, allNA)
    exprA <- matrix(rnorm(1000, 10), ncol = 10)  # replace with the count matrix
    ans <- genefilter(exprA, flist)

    Is that OK?
    Last edited by super0925; 05-15-2014, 07:30 AM.

  • dpryan
    replied
    I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

    I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.

  • super0925
    replied
    Originally posted by dpryan:
      DESeq (better would be DESeq2) would work fine as well.
    But Cuffdiff can only do pairwise comparisons, am I right?
    So I need to run C1 vs C2, C2 vs C3 and C3 vs C1 in Cuffdiff one by one, and compare the results with edgeR or DESeq2 (which you recommend as better than DESeq).
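    With three conditions there are 3 x 2 / 2 = 3 pairwise comparisons, and the number grows as n(n - 1)/2 with more conditions. In a GLM (edgeR or DESeq2) these can all be tested as contrasts on one fitted model rather than as separate runs. A small sketch that enumerates the pairs and writes each as a contrast vector, assuming a design with one coefficient per condition and no intercept (as `model.matrix(~0 + group)` would give in R); the helper name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_contrasts(conditions: List[str]) -> Dict[Tuple[str, str], List[int]]:
    """One contrast vector per pair (a, b), testing a - b.

    Assumes the model has exactly one coefficient per condition, in this order
    (no intercept), so each vector has +1 for a, -1 for b and 0 elsewhere.
    """
    contrasts: Dict[Tuple[str, str], List[int]] = {}
    for a, b in combinations(conditions, 2):
        vec = [0] * len(conditions)
        vec[conditions.index(a)] = 1
        vec[conditions.index(b)] = -1
        contrasts[(a, b)] = vec
    return contrasts


for pair, vec in pairwise_contrasts(["C1", "C2", "C3"]).items():
    print(f"{pair[0]} vs {pair[1]}: {vec}")
```

    In edgeR such vectors are typically built with limma's makeContrasts() (e.g. C1 - C2) and passed to glmLRT(); in DESeq2 the contrast argument of results() plays the same role. A C3 vs C1 comparison is just the C1 vs C3 vector with the sign flipped.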

  • dpryan
    replied
    Originally posted by super0925:
      Hi, I have a small question.
      If I have 3 conditions, I thought I could use edgeR (with the GLM approach I read about in other posts) to analyze them. Am I right?
    DESeq (better would be DESeq2) would work fine as well.

  • super0925
    replied
    Originally posted by dpryan:
      I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

      Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.

    Hi, I have a small question.
    If I have 3 conditions, I thought I could use edgeR (with the GLM approach I read about in other posts) to analyze them. Am I right?
    How about DESeq?
    So far I know I could use three independent tests (C1 vs C2, C2 vs C3, C1 vs C3) to get the DE genes. That works, but it requires three separate runs.
    Last edited by super0925; 05-14-2014, 06:59 AM.

  • dpryan
    replied
    Originally posted by super0925:
      How about my method in post #105?
    I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

    Originally posted by super0925:
      Could you please tell me how to convert Tuxedo output to Ensembl IDs by using the annotation file's mappings? Using AWK?
    Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.
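    One pitfall when mapping back from names: in this thread the GTF's gene names mix gene symbols and UniProt entry names, and a single name can sit on more than one gene_id. A minimal plain-Python sketch (not awk or GenomicRanges) that builds a gene_name to gene_id table from the annotation file and flags names that are missing or ambiguous. It assumes a results table with a gene-name column and the usual `key "value";` GTF attribute layout; the function names, gene IDs and names are hypothetical placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def name_to_gene_ids(gtf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each gene_name to every gene_id it is attached to in the GTF."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_name" in attrs and "gene_id" in attrs:
            mapping[attrs["gene_name"]].add(attrs["gene_id"])
    return mapping


def translate_names(names: Iterable[str], name_map: Dict[str, Set[str]]
                    ) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split names into uniquely resolved ones and problem ones.

    Problem entries list the candidate IDs: [] means not found, 2+ means ambiguous.
    """
    resolved: Dict[str, str] = {}
    unresolved: Dict[str, List[str]] = {}
    for name in names:
        ids = name_map.get(name, set())
        if len(ids) == 1:
            resolved[name] = next(iter(ids))
        else:
            unresolved[name] = sorted(ids)
    return resolved, unresolved


def gtf_line(gene_id: str, gene_name: str) -> str:
    """Build a hypothetical exon line; only column 9 matters here."""
    attrs = f'gene_id "{gene_id}"; gene_name "{gene_name}";'
    return "\t".join(["chr1", "example", "exon", "1", "100", ".", "+", ".", attrs])


toy_gtf = [
    gtf_line("GENE_ID_1", "NAME_A"),
    gtf_line("GENE_ID_2", "NAME_B"),
    gtf_line("GENE_ID_3", "NAME_B"),  # the same name on two different genes
]
resolved, unresolved = translate_names(["NAME_A", "NAME_B", "NAME_C"],
                                       name_to_gene_ids(toy_gtf))
print(resolved)    # {'NAME_A': 'GENE_ID_1'}
print(unresolved)  # {'NAME_B': ['GENE_ID_2', 'GENE_ID_3'], 'NAME_C': []}
```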

  • super0925
    replied
    Originally posted by dpryan:
      I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
    How about my method in post #105?
    "Use Ensembl IDs, which are uniform in genes.gtf, with either (1) "Tophat-htseq-edgeR" or (2) "Tophat-Cuffdiff" but without Cufflinks and Cuffmerge (because Cufflinks+Cuffmerge would generate a merged.gtf whose gene names are a mixture of gene names and UniProt IDs, as I showed you), and then translate them to gene names with an R package or an online tool (BioMart)."

    With "Tophat-htseq-edgeR/DESeq", the read count table is keyed by Ensembl ID; I have tried this.
    I have also run "Tuxedo without Cufflinks and Cuffmerge", and its results also give Ensembl gene IDs.
    However, that leaves out the Cufflinks+Cuffmerge steps.
    Could you please tell me how to convert Tuxedo output to Ensembl IDs by using the annotation file's mappings? Using AWK?
    Last edited by super0925; 05-13-2014, 05:59 AM.

  • dpryan
    replied
    I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
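    The "unique() on a subset of the mcols()" step has a direct plain-Python analogue, sketched here without GenomicRanges: a GTF repeats gene_id and transcript_id on every exon line, so collapsing those lines to unique pairs gives a transcript-to-gene table for converting transcript-level IDs to Ensembl gene IDs. The function names and toy GTF lines are hypothetical, and the regex assumes the usual `key "value";` attribute layout.

```python
import re
from typing import Dict, Iterable, List

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Collapse per-exon GTF lines into unique transcript_id -> gene_id pairs."""
    mapping: Dict[str, str] = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx is None or gene is None:
            continue  # e.g. "gene" lines carry no transcript_id
        # Exons repeat the same pair; keep one copy and check it is consistent.
        if mapping.setdefault(tx, gene) != gene:
            raise ValueError(f"{tx} is annotated to more than one gene")
    return mapping


def to_unique_gene_ids(ids: Iterable[str], tx2gene: Dict[str, str]) -> List[str]:
    """Convert transcript IDs to gene IDs; IDs not in the map pass through unchanged."""
    converted = (tx2gene.get(i, i) for i in ids)
    return list(dict.fromkeys(converted))  # de-duplicate, keeping first-seen order


def exon(gene_id: str, tx_id: str) -> str:
    """Hypothetical exon line; only column 9 is used."""
    attrs = f'gene_id "{gene_id}"; transcript_id "{tx_id}";'
    return "\t".join(["chr1", "example", "exon", "1", "100", ".", "+", ".", attrs])


toy_gtf = [exon("GENE_ID_1", "TX_ID_1"), exon("GENE_ID_1", "TX_ID_1"),
           exon("GENE_ID_1", "TX_ID_2"), exon("GENE_ID_2", "TX_ID_3")]
tx2gene = transcript_to_gene(toy_gtf)
print(to_unique_gene_ids(["TX_ID_1", "TX_ID_2", "TX_ID_3", "GENE_ID_2"], tx2gene))
# ['GENE_ID_1', 'GENE_ID_2']
```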

  • super0925
    replied
    Originally posted by dpryan:
      No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
    Hi Devon,
    I still have two questions:
    Q1:
    The full name "prostaglandin D2 receptor" is not from my GTF file. Strangely, among the gene names in the GTF file, some are UniProt entry names (e.g. ATPO_BOVIN) but others are gene symbols (e.g. ITSN1). This confused me.
    I have sent you a screenshot (my GTF file is from the Ensembl database: the bovine genome annotation, Btau_4.0).
    So I don't know how to make them uniform. I think a reasonable method is to use Ensembl IDs, which are uniform (via (1) "Tophat-htseq-edgeR" or (2) "Tophat-Cuffdiff" without Cufflinks and Cuffmerge), and then translate them to gene names with an R package or an online tool (BioMart).
    Am I right? Or do you have a better solution?
    Q2:
    The full names (e.g. "2'-5'-oligoadenylate synthetase", "prostaglandin D2 receptor") were sent to me by my collaborator, who is more interested in the long full name than in the gene symbol. He googles the gene name or Ensembl ID to get the full name.
    So I am looking for an automatic conversion method rather than searching manually.
    Or should I switch to another annotation file (GTF), e.g. from the UCSC database rather than Ensembl?
    Thank you!
    Last edited by super0925; 05-13-2014, 03:23 AM.

  • dpryan
    replied
    No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
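    A minimal sketch of reading the annotation file directly, in Python with only the standard library (no Biopython needed). It assumes the standard GTF layout: nine tab-separated columns with attributes as `key "value";` pairs in column 9, including gene_id and, for most genes, gene_name. The function names, toy lines and count values are hypothetical.

```python
import re
from typing import Dict, Iterable

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_id_to_name(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Build a gene_id -> gene_name table from the GTF attribute column."""
    mapping: Dict[str, str] = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            # Some genes carry no gene_name; fall back to the ID itself.
            mapping.setdefault(gene_id, attrs.get("gene_name", gene_id))
    return mapping


def feature(gene_id: str, extra: str = "") -> str:
    """Hypothetical exon line; only column 9 is used."""
    attrs = f'gene_id "{gene_id}";' + extra
    return "\t".join(["chr1", "example", "exon", "1", "100", ".", "+", ".", attrs])


toy_gtf = [
    "#!hypothetical header line",
    feature("GENE_ID_1", ' gene_name "NAME_A";'),
    feature("GENE_ID_1", ' gene_name "NAME_A";'),
    feature("GENE_ID_2"),
]
id2name = gene_id_to_name(toy_gtf)
print(id2name)  # {'GENE_ID_1': 'NAME_A', 'GENE_ID_2': 'GENE_ID_2'}

# Annotate a (hypothetical) count table keyed by gene ID, keeping the ID as key.
counts = {"GENE_ID_1": [5, 3, 8], "GENE_ID_2": [0, 1, 0]}
annotated = [(g, id2name.get(g, g), row) for g, row in counts.items()]
```

    Keeping the ID as the row key and adding the name alongside avoids collisions when one name is shared by several gene IDs.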
