Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    Originally posted by m!x View Post
    Hi,

    I am trying to use samWrapper to analyze my RNA-seq data on Mac OS X.
    Is there a simple way to specify the path to the files?

    I noticed that you can use the following on Windows:
    >geneExpFile <- "D:/data/sample1.txt"

    Thanks!
    It's not very difficult. For example:
    Code:
    >geneExpFile <- "/PATH/TO/FILE"
    but you should know where you file is. maybe "pwd" command can help you.
    Xi Wang

    Comment


    • #47
      Hi,
      I wanted to use DEGseq to identify differentially expressed genes between wildtype and mutant experiments. I have 4 datasets for each condition (WT,mutant) : 2 biological replicates and for each of them 2 technical replicates.

      First I tried the method "CTR" to check the variation between the replicates, but I couldn't find a code example for this method in package material. On the last output plot produced by the method "CTR" one can see the difference between the standard deviation of M according to the RSM and the theoretical four-fold local standard deviation of M by the comparison of technical replicates. But what does it mean when there is a distance between these to lines (read and blue)? Can I use the method "MATR" which is based on technical replicates anyway?

      Because I have these 4 datasets per condition, I wanted to uses them in the correct way, not simple adding the raw counts of each gene. Which method would you propose in this case and how should the correct code of the function DEGexp(..?..)
      look like?

      Many thanks for your help!
      Steffen

      Comment


      • #48
        Hi Steffen,

        Thanks for using DEGseq.

        First I tried the method "CTR" to check the variation between the replicates, but I couldn't find a code example for this method in package material.
        Actually, the "CTR" expample code is similar to the example for DEGexp function. The only modification is to specify: method="CTR". So, a code example is:
        Code:
        DEGexp(geneExpFile1=geneExpFile, geneCol1=1, expCol1=2, groupLabel1="R1", geneExpFile2=geneExpFile, geneCol2=1, expCol2=3, groupLabel2="R2",
        method="CTR", outputDir=outputDir)
        Note: geneExpFile contains the expression values for the two replicates, where gene names are listed in column 1, expresssion values for replicate 1listed in column 2, and expresssion values for replicate 2 listed in column 3.


        On the last output plot produced by the method "CTR" one can see the difference between the standard deviation of M according to the RSM and the theoretical four-fold local standard deviation of M by the comparison of technical replicates. But what does it mean when there is a distance between these to lines (read and blue)? Can I use the method "MATR" which is based on technical replicates anyway?
        This phenomenon means the two replicates do not match well. Yes, you can use MATR method anyway. Besides, you may also use other methods to get the corresponding results. An extra validation step should be done (if feasible) and then you can jude which method is better.

        Because I have these 4 datasets per condition, I wanted to uses them in the correct way, not simple adding the raw counts of each gene. Which method would you propose in this case and how should the correct code of the function DEGexp(..?..)
        look like?
        There is another function "samWrapper" between two samples with biological replicates. You can try this on your 4 datasets. But, theoretically, the technical replicates cannot be treated as biological replicates.
        Xi Wang

        Comment


        • #49
          Originally posted by svl View Post
          I seem to be unable to install the package...anyone had succes?
          ----
          source("http://bioconductor.org/biocLite.R")
          biocLite("DEGseq")
          ----

          Also their site is unavailable: http://bioinfo.au.tsinghua.edu.cn/software/degseq
          Here is the page at Bioc: http://www.bioconductor.org/packages...ml/DEGseq.html
          Thanks for your information! It`s usefull!
          Harbin Institute of Technology

          Comment


          • #50
            I successfully ran the SamWrapper, but would appreciate further explanation about the output columns.

            What do the "numerator" and "denominator" columns represent?
            How do I tell whether the genes are upregulated or downregulated?
            eg. if I set the min. foldchange = 2, will this include the genes that are downregulated by 2 times?

            I also got different result when I used "filtered" data as an input.
            First, I used "getGeneExp" to get the raw read counts for each gene.
            From the output, I removed all genes that have less than 2 reads. Then, I filtered for the genes that are common between the biological replicates for each sample.
            I used this filtered set as the input for samWrapper.
            Is this a valid approach?
            Or, should I have used the unfiltered set?

            Thank you!

            Comment


            • #51
              Originally posted by m!x View Post
              I successfully ran the SamWrapper, but would appreciate further explanation about the output columns.

              What do the "numerator" and "denominator" columns represent?
              How do I tell whether the genes are upregulated or downregulated?
              eg. if I set the min. foldchange = 2, will this include the genes that are downregulated by 2 times?

              I also got different result when I used "filtered" data as an input.
              First, I used "getGeneExp" to get the raw read counts for each gene.
              From the output, I removed all genes that have less than 2 reads. Then, I filtered for the genes that are common between the biological replicates for each sample.
              I used this filtered set as the input for samWrapper.
              Is this a valid approach?
              Or, should I have used the unfiltered set?

              Thank you!
              Hi,

              For samWrapper function, the output file contains some columns related to the T statistic, such as score(d) for the T-statistic value, numerator(r) for the numerator of the T-statistic, and denominator(s + s0) for the denominator of the T-statistic. For more details, please find in the Section 12.2 of sam manual.


              The Signature column indicates each gene is differentially expressed or not. If it is a "TURE", the gene can be either upregulated or downregulated. The default foldchange = 2 including both cases. I.e., including foldchange > 2 and foldchange < 0.5. You validate it easily by comparing the two columns "Fold Change" and "Signature".

              What do you mean by "the genes that are common"? I think it si no need to do this filtering. Also, you can compare the results, and pick up the difference (genes appear in either results), and analyze what cause the difference. If you could show me an example, I may give you some clue. Thanks.
              Xi Wang

              Comment


              • #52
                Hi Xi,

                Thanks for the explanation about the foldchange.

                The weird thing about my data is that all the downregulated genes has at least q-value of 0.5, so they are counted as "FALSE" in the signature column. I set the max qValue = 0.1.

                When I just tried running the data with DEGexp and MARS method (setting p value = 0.001), I get the genes that are downregulated to have very low p-value and q-values.
                I believe the calculation of the q-values in samWrapper and DEGexp is different. But, how can it be so different for the downregulated genes in my dataset?

                I understand that it's best to use samWrapper for biological replicates (as in my case).
                If I may know, what's the fundamental difference between using DEGexp and samWrapper for biological replicates?
                If the fold-change in samWrapper is not calculated by summing the raw read counts, how then the comparison is made?

                Below are the lines that I used for the samWrapper analysis (3 biological replicates and unpaired data).

                > geneExpFile <- "/Users/lilyanamargaretha/Desktop/ICMgene_filtered.exp"
                > set.seed(100)
                > geneExpFile1 <- geneExpFile
                > geneExpFile2 <- geneExpFile
                > output <- "/Users/lilyanamargaretha/Desktop/samWrapperOutICM.txt"
                > expCol1 = c(2, 5, 8)
                > expCol2 = c(11, 14, 17)
                > measure1 = rep(1, length(expCol1))
                > measure2 = rep(2, length(expCol2))

                > samWrapper(geneExpFile1 = geneExpFile1, geneCol1 = 1, expCol1 = expCol1, measure1 = measure1, geneExpFile2 = geneExpFile2, geneCol2 = 1, expCol2 = expCol2, measure2 = measure2, nperms = 100, min.foldchange = 2, max.qValue = 0.1, logged2=FALSE, output = output, paired = FALSE)



                Thanks for the answering the questions.

                Comment


                • #53
                  Originally posted by m!x View Post
                  Hi Xi,

                  Thanks for the explanation about the foldchange.

                  The weird thing about my data is that all the downregulated genes has at least q-value of 0.5, so they are counted as "FALSE" in the signature column. I set the max qValue = 0.1.

                  When I just tried running the data with DEGexp and MARS method (setting p value = 0.001), I get the genes that are downregulated to have very low p-value and q-values.
                  I believe the calculation of the q-values in samWrapper and DEGexp is different. But, how can it be so different for the downregulated genes in my dataset?

                  I understand that it's best to use samWrapper for biological replicates (as in my case).
                  If I may know, what's the fundamental difference between using DEGexp and samWrapper for biological replicates?
                  If the fold-change in samWrapper is not calculated by summing the raw read counts, how then the comparison is made?

                  Below are the lines that I used for the samWrapper analysis (3 biological replicates and unpaired data).

                  > geneExpFile <- "/Users/lilyanamargaretha/Desktop/ICMgene_filtered.exp"
                  > set.seed(100)
                  > geneExpFile1 <- geneExpFile
                  > geneExpFile2 <- geneExpFile
                  > output <- "/Users/lilyanamargaretha/Desktop/samWrapperOutICM.txt"
                  > expCol1 = c(2, 5, 8)
                  > expCol2 = c(11, 14, 17)
                  > measure1 = rep(1, length(expCol1))
                  > measure2 = rep(2, length(expCol2))

                  > samWrapper(geneExpFile1 = geneExpFile1, geneCol1 = 1, expCol1 = expCol1, measure1 = measure1, geneExpFile2 = geneExpFile2, geneCol2 = 1, expCol2 = expCol2, measure2 = measure2, nperms = 100, min.foldchange = 2, max.qValue = 0.1, logged2=FALSE, output = output, paired = FALSE)



                  Thanks for the answering the questions.
                  Hi,

                  Sorry for the late reply.

                  1. By downregulated genes, did you mean the genes have "Fold Change“ value less than 0.5?

                  2. In samWrapper, the fold-change is calculated based on the average expression values from biological replicates. And with other statistical metric to determine a gene is differentially expressed or not. If you are interested in the methods, the SAM paper helps.
                  Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes while accounting for the enormous number of genes. We describe a method, Significance Analysis of Micr …


                  Many thanks for pointing it out.
                  Last edited by Xi Wang; 03-18-2010, 01:03 AM. Reason: No bug found
                  Xi Wang

                  Comment


                  • #54
                    hi m!x,

                    We investigated the results given by samWrapper on our test data and as well samWrapper code carefully, and don't find a bug. I said we found some similar results days ago, but the results were not that extreme as yours. We just set the q-value cutoff too stringent. In your case, as you only have 3 biological replicates, the q-value obtained by permutation could not have a high resolution.

                    However, I still felt weird about your results. I noticed that you used the raw read counts as the input for samWrapper. This maybe the reson. Read counts are not normalized against the sequencing depth. If the sequencing depth is quite different from the two samples, say the total read counts of sample B is as twice as that of sample A, the results could be lots of genes have a higher expression level in sample B than sample A. We included a normalization step in function DEGexp, so the results given by DEGexp are normal. I think the difference between the two results doesn't have much to do with the calculation of q-values.

                    I suggest you use RPKM to rerun samWrapper. It will give more reasonable and promising results.

                    Hope this helps.
                    Xi Wang

                    Comment


                    • #55
                      Hi Xi,

                      Thank you for the reply. Yes, you are right about the sequencing depth.
                      They can vary up to 2 times in my biological replicates.

                      Just to make sure I will do the right thing...here's what I should do with my samples:
                      1. Find the RPKM values generated in the getGeneExp function
                      2. Use the DEGexp with MARS method and set the p-value cutoff.

                      As you've said before about using the SAMWrapper, I assumed then the permutation method included in the SAMWrapper method only works well for the data that have more biological replicates?

                      Again, the same question as I asked before: for this DEGExp method, how is the fold-change calculated here?

                      Thanks a lot once again!

                      Comment


                      • #56
                        Originally posted by m!x View Post
                        Hi Xi,

                        Thank you for the reply. Yes, you are right about the sequencing depth.
                        They can vary up to 2 times in my biological replicates.

                        Just to make sure I will do the right thing...here's what I should do with my samples:
                        1. Find the RPKM values generated in the getGeneExp function
                        2. Use the DEGexp with MARS method and set the p-value cutoff.

                        As you've said before about using the SAMWrapper, I assumed then the permutation method included in the SAMWrapper method only works well for the data that have more biological replicates?

                        Again, the same question as I asked before: for this DEGExp method, how is the fold-change calculated here?

                        Thanks a lot once again!
                        Hi,

                        First, I need to clarify here that (1) for DEGseq or DEGexp function, it is recommended to take the raw read counts as gene expression values; but (2) for samWrapper, it is however only RPKM can be used, as raw read counts don't normalize against the sequencing depth.


                        Actually, for DEGseq and GEGexp functions, there are two types of fold-change values in the recent version of DEGseq. The output are both log2 values. The one labeled by "log2(Fold_change)" is calculated as follows:
                        Code:
                        log2 ( EXP_G_A / EXP_G_B )
                        The other one labeled by "log2(Fold_change) normalized" is defined as:
                        Code:
                        log2 [ (EXP_G_A/T.EXP_A) / (EXP_G_B/T.EXP_B) ]
                        where EXP_G_A stands the expression value for gene G in sample A, T.EXP_A stands the total expression values (sum up the expression values for all genes) in sample A; similar for sample B.
                        The two types of fold-change values are different from each other in whether or not normalizing against the total expression values. "log2(Fold_change)" is suitable for RPKM expression values, while "log2(Fold_change) normalized" for raw read counts.

                        It is true that the q-value estimate is not that precise if the number of biological replicates is small. But this method still works. If you have more repliactes, the estimated q-vaules are more reliable.
                        Last edited by Xi Wang; 03-21-2010, 01:20 AM. Reason: Change the description for fold-change for DEGseq and DEGexp functions.
                        Xi Wang

                        Comment


                        • #57
                          unused arguments error (DEGseq)

                          I tried DEGexp in R with example data, but got error "Error in DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, :
                          unused argument(s) (geneExpMatrix1 = geneExpMatrix1, geneExpMatrix2 = geneExpMatrix2)
                          "

                          The CMD

                          > geneExpFile <- "/home/Bio1/MMi/RNASeq/Example1000.txt"
                          > geneExpMatrix2 <- readGeneExp (file=geneExpFile, geneCol=1,valCol =c(7,9,10,12))
                          > geneExpMatrix1 <- readGeneExp (file=geneExpFile, geneCol=1,valCol =c(6,8,11,14))
                          > DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=c(2,3,4,5), groupLabel1="WT_uni", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=c(2,3,4,5), groupLabel2="TR_uni",method= "MARS")
                          Error in DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, :
                          unused argument(s) (geneExpMatrix1 = geneExpMatrix1, geneExpMatrix2 = geneExpMatrix2)
                          ################

                          Did anyone face the same problem?

                          > sessionInfo()
                          R version 2.10.1 (2009-12-14)
                          x86_64-unknown-linux-gnu

                          locale:
                          [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
                          [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
                          [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
                          [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
                          [9] LC_ADDRESS=C LC_TELEPHONE=C
                          [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

                          attached base packages:
                          [1] tcltk stats graphics grDevices utils datasets methods
                          [8] base

                          other attached packages:
                          [1] DEGseq_1.0.0 samr_1.28 impute_1.20.0 qvalue_1.20.0


                          cheers

                          Comment


                          • #58
                            Originally posted by middlemale View Post
                            I tried DEGexp in R with example data, but got error "Error in DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, :
                            unused argument(s) (geneExpMatrix1 = geneExpMatrix1, geneExpMatrix2 = geneExpMatrix2)
                            "

                            The CMD

                            > geneExpFile <- "/home/Bio1/MMi/RNASeq/Example1000.txt"
                            > geneExpMatrix2 <- readGeneExp (file=geneExpFile, geneCol=1,valCol =c(7,9,10,12))
                            > geneExpMatrix1 <- readGeneExp (file=geneExpFile, geneCol=1,valCol =c(6,8,11,14))
                            > DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=c(2,3,4,5), groupLabel1="WT_uni", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=c(2,3,4,5), groupLabel2="TR_uni",method= "MARS")
                            Error in DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, :
                            unused argument(s) (geneExpMatrix1 = geneExpMatrix1, geneExpMatrix2 = geneExpMatrix2)
                            ################

                            Did anyone face the same problem?

                            > sessionInfo()
                            R version 2.10.1 (2009-12-14)
                            x86_64-unknown-linux-gnu

                            locale:
                            [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
                            [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
                            [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
                            [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
                            [9] LC_ADDRESS=C LC_TELEPHONE=C
                            [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

                            attached base packages:
                            [1] tcltk stats graphics grDevices utils datasets methods
                            [8] base

                            other attached packages:
                            [1] DEGseq_1.0.0 samr_1.28 impute_1.20.0 qvalue_1.20.0


                            cheers
                            Hi,
                            Thanks for using DEGseq for RNA-seq differential expression analysis.
                            I would like to notice you that DEGseq has a newer release with version number 1.0.5. You can download it through


                            The arguments are changed a lot for the using convenience. Please refer to the newest manual for details.

                            Cheers.
                            Xi Wang

                            Comment


                            • #59
                              thanks xi, I updated R (2.10.1) and DEGseq now , but the errors are the same
                              "Error in DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, :
                              unused argument(s) (geneExpMatrix1 = geneExpMatrix1, geneExpMatrix2 = geneExpMatrix2)".

                              another point is example file contains column "ExternalID" would introduce error
                              "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
                              line 5 did not have 20 elements " . after deleting this column, that is fine.

                              any further suggestion?

                              Comment


                              • #60
                                Hi middlemale,

                                Did you installed the release version of DEGseq follow this link http://www.bioconductor.org/packages...ml/DEGseq.html ?

                                If so, please also use the examples in the manual of the release version, which can be downloaded at:


                                Hope this helps. Further questions, let me know.
                                Xi Wang

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin


                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  [Article Coming Soon!]...
                                  Today, 08:07 AM
                                • seqadmin
                                  Recent Developments in Metagenomics
                                  by seqadmin





                                  Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                                  09-23-2024, 06:35 AM
                                • seqadmin
                                  Understanding Genetic Influence on Infectious Disease
                                  by seqadmin




                                  During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

                                  Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
                                  09-09-2024, 10:59 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 10-02-2024, 04:51 AM
                                0 responses
                                13 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-01-2024, 07:10 AM
                                0 responses
                                23 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 09-30-2024, 08:33 AM
                                1 response
                                29 views
                                0 likes
                                Last Post EmiTom
                                by EmiTom
                                 
                                Started by seqadmin, 09-26-2024, 12:57 PM
                                0 responses
                                19 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X