Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Xi Wang
    Senior Member
    • Oct 2009
    • 317

    Originally posted by Sol View Post
    but, o z-score, is based on what data? and what is difference between q-value and p-value. I don't understand.
    Thanks
    Z-score is also based on your input data. We assume that most of genes are not differentially expressed. Please refer to our DEGseq paper's supplementary material: http://bioinformatics.oxfordjournals...28-File001.pdf
    Search "Z-score" for details.

    q-value is a kind of corrected p-value for multiple testing. Please refer to Section 2.3 of our DEGseq paper:
    "2.3 Multiple testing correction
    For the above methods, the P-values calculated for each gene are adjusted to Q-values for multiple testing corrections by two alternative strategies (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Users can set either a P-value or a false discovery rate (FDR) threshold to identify differentially expressed genes.
    "

    If it is still unclear, please let me know. Thanks.
    Xi Wang

    Comment

    • Sol
      Member
      • Oct 2010
      • 13

      Thanks

      and the RPKM. how to normalize the data?
      I divide the number of reads by the size of the gene and divide by all the reads?
      How it is calculated
      thanks

      Comment

      • Xi Wang
        Senior Member
        • Oct 2009
        • 317

        Originally posted by Sol View Post
        Thanks

        and the RPKM. how to normalize the data?
        I divide the number of reads by the size of the gene and divide by all the reads?
        How it is calculated
        thanks
        Actually, we recommand the users feed the raw read counts (that is the number of reads falling in a gene's exonic region) to DEGseq. DEGseq will normalize the data according the sequencing depth for each sample.
        Xi Wang

        Comment

        • Sol
          Member
          • Oct 2010
          • 13

          but, what is the mean RPKM?
          thanks

          Comment

          • Xi Wang
            Senior Member
            • Oct 2009
            • 317

            Originally posted by Sol View Post
            but, what is the mean RPKM?
            thanks
            The normalized read count for a region (say a gene or an exon), against the region length (measured by kilo-base) and the sequencing depth (measured by million reads). So RPKM is short for Reads Per Kilo-base per Million reads.
            Xi Wang

            Comment

            • Sol
              Member
              • Oct 2010
              • 13

              What is the function of the MA plot the fold change??
              The graph shows the relationship in the MA-plot, no??
              thanks

              Comment

              • Sol
                Member
                • Oct 2010
                • 13

                the results of the DEGseq, already can be used directly for analysis or should i make some other standardization
                thanks

                Comment

                • Xi Wang
                  Senior Member
                  • Oct 2009
                  • 317

                  Originally posted by Sol View Post
                  What is the function of the MA plot the fold change??
                  The graph shows the relationship in the MA-plot, no??
                  thanks
                  I am not very clear what you want to know by asking these questions.
                  For detailed questions, I prefer you sent me emails: [email protected]. I will give you more rapid replies regarding DEGseq. Thanks.
                  Xi Wang

                  Comment

                  • Xi Wang
                    Senior Member
                    • Oct 2009
                    • 317

                    Originally posted by Sol View Post
                    the results of the DEGseq, already can be used directly for analysis or should i make some other standardization
                    thanks
                    The answer is yes. You can apply the results to function analysis, say GO enrichment analysis. Alternatively, you can refer to GOseq, which takes into account the gene length bias.
                    Xi Wang

                    Comment

                    • Marisa_Miller
                      Member
                      • Aug 2010
                      • 34

                      Hello,
                      I am new to R, and would like to use your DEGseq package to identify differentially expressed genes between my libraries. In my case I have 2 libraries to compare. I have calculated the RPKM's using a program written by a member of my lab, an example of how the file looks is shown here:

                      Chr Gene Start End Gene_len Reads RPKM Log2(RPKM)
                      1 AT1G01010.1 3631 5899 1688 45 1.58899 0.668107
                      1 AT1G01020.1 5928 8737 1623 104.73 3.84621 1.94344
                      1 AT1G01020.2 6790 8737 1085 72.2697 3.97015 1.98919
                      1 AT1G01030.1 11649 13714 1905 78 2.44051 1.28718
                      1 AT1G01040.1 23146 31227 6251 1159 11.0513 3.46615
                      1 AT1G01046.1 28500 28706 207 4 1.15178 0.203866
                      1 AT1G01050.1 31170 33153 976 2186 133.5 7.06069

                      I am unsure which part of your package to use (I think DEGexp?) to analyze the data. Also, if you could help me with what method to use (i.e. LRT, MATR, etc...).

                      I have read through the examples on the usage of the package, but am still unsure.

                      Thank you in advance for your help

                      Comment

                      • Xi Wang
                        Senior Member
                        • Oct 2009
                        • 317

                        Hello Marisa,

                        Thanks for using DEGseq. You may use the function DEGexp to detect differentially expressed genes, and the read counts (the 6th column of your file) are recommanded to feed to DEGexp. Details can be found in our Bioinformatics paper. There are slight difference between the method LRT, FET and MARS, of which the MARS method was proposed by us based on the M-A plot using a normal distribution approximation.

                        hope this helps.
                        Xi Wang

                        Comment

                        • Marisa_Miller
                          Member
                          • Aug 2010
                          • 34

                          Hi and thanks for the reply! I have decided to use the MARS method after reading through your paper. The one thing I an confused about is the actual usage of R. When reading through your manual I am still confused about the actual commands used to run DEGexp. Below is the usage you have listed in the manual for DEGexp2 (which I think I need to use since I have two different input files).

                          geneExpFile <- system.file("extdata", "GeneExpExample5000.txt", package="DEGseq")
                          outputDir <- file.path(tempdir(), "DEGexpExample")
                          exp <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(7,9,12,15,18))
                          exp[30:35,]
                          exp <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(8,10,11,13,16))
                          exp[30:35,]
                          DEGexp2(geneExpFile1=geneExpFile, geneCol1=1, expCol1=c(7,9,12,15,18), groupLabel1="kidney", geneExpFile2=geneExpFile, geneCol2=1, expCol2=c(8,10,11,13,16), groupLabel2="liver", method="MARS", outputDir=outputDir)
                          cat("outputDir:", outputDir, "\n")

                          Questions:
                          1) Is each command entered on a separate line? It is unclear where the line breaks are.
                          2)I am unsure which parts of the example usage listed above I need to change for my specific case. Obviously I need to specify the correct file paths and names etc...
                          3)Since I am using two separate files where do I specify this in the commands above? I can't tell from the example commands where to do this.
                          4) Where can I enter a q-value threshold?
                          5)I tried to highlight in bold parts of the example I do not understand the meaning of or could not find an explanation of in the manual.

                          The example of my input file is in the previous post.
                          Basically, my problem is with the usage of R. If you could help me by indicating how I can apply the example usage to my files that would be great.

                          Thank you,
                          Marisa

                          Comment

                          • Xi Wang
                            Senior Member
                            • Oct 2009
                            • 317

                            Hi Marisa,

                            Thanks for your questions.

                            1) please refer to this site: http://www.bioconductor.org/packages...t/doc/DEGseq.R
                            "exp[30:35,]" is just used for display the values of the matrix "exp" in lines 30-35

                            2) yes, you need to specify your files, and the column for gene names (geneCol=?), the columns for gene expression values (valCol=??), etc..

                            3) Please pay attention to the parts in bold
                            Code:
                            DEGexp2([B]geneExpFile1="your_gene_exp_file_1"[/B], geneCol1=1, expCol1=c(7,9,12,15,18), groupLabel1="kidney", [B]geneExpFile2="your_gene_exp_file_2"[/B],  geneCol2=1, expCol2=c(8,10,11,13,16), groupLabel2="liver", method="MARS", outputDir=outputDir)
                            4) do it like this:
                            Code:
                            DEGexp2(geneExpFile1="your_gene_exp_file_1", geneCol1=1, expCol1=c(7,9,12,15,18),geneExpFile2="your_gene_exp_file_2",  geneCol2=1, expCol2=c(8,10,11,13,16), [B]thresholdKind=3, qValue=1e-3[/B], method="MARS", outputDir=outputDir)
                            5) ignore the bold words in the first two lines, they are just for this example.
                            valCol stands for which column contains the (expression) value you want to analyze. For you case, you may set "valCol=6".
                            Last edited by Xi Wang; 11-08-2010, 11:23 PM.
                            Xi Wang

                            Comment

                            • Marisa_Miller
                              Member
                              • Aug 2010
                              • 34

                              Thank you so much! That solved my problems!

                              Comment

                              • anamaretti
                                Junior Member
                                • Sep 2010
                                • 2

                                Hello

                                I'm using samWrapper to do some statistical analysis in my samples.
                                I have 2 groups, each one with 5 biological replicates.
                                However, I'm having some weird results.
                                Even if all the samples don't show any read to some genes, some times these genes are included in the list of genes with difference in gene expression (Signature =TRUE).

                                The parameters that I used are:
                                Value are in RPKM
                                nperm= 1000
                                min.fold-change=2
                                max.qValue=1e-04
                                paired=FALSE

                                Should I include some restriction term to avoid that??

                                By the way, the seed standard value is 100???
                                Is there some benefit if I modify it?

                                Thanks for the help and for this program, it is great!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Pathogen Surveillance with Advanced Genomic Tools
                                  by seqadmin




                                  The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                                  03-24-2025, 11:48 AM
                                • seqadmin
                                  New Genomics Tools and Methods Shared at AGBT 2025
                                  by seqadmin


                                  This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                  The Headliner
                                  The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                  03-03-2025, 01:39 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 03-20-2025, 05:03 AM
                                0 responses
                                42 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-19-2025, 07:27 AM
                                0 responses
                                53 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-18-2025, 12:50 PM
                                0 responses
                                38 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-03-2025, 01:15 PM
                                0 responses
                                194 views
                                0 reactions
                                Last Post seqadmin  
                                Working...