DEGseq


  • Originally posted by newbietonextgen
    No. I have tried both approaches: giving the full path to the file, and setting the working directory and then naming the file. I am using 64-bit R and I am not sure whether that is the problem.

    This is how the console looks:
    >library(DEGseq)
    Loading required package: qvalue
    Loading Tcl/Tk interface
    > sample A <- "path to the file (bed.txt)"
    |

    So there was no screen message after I hit return...

    First, I found that you are not using the most recent version of DEGseq.
    Please download the newest version from:
    http://bioconductor.org/packages/rel...ml/DEGseq.html

    Second, in R, variable names can't contain spaces. Also, the string should be the actual path to your file, not a description of it.
    E.g.,
    Code:
    sample_A <- "/home/username/data.bed"
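
    The naming rule can be checked with a minimal base-R sketch. The helper name is_valid_name is made up for this illustration; make.names() and parse() are standard base functions.

    ```r
    # Hypothetical helper: TRUE if x is already a syntactically valid R name.
    is_valid_name <- function(x) make.names(x) == x

    is_valid_name("sample A")   # FALSE: contains a space
    is_valid_name("sample_A")   # TRUE
    make.names("sample A")      # "sample.A": what R would turn it into

    # A line with a space in the variable name cannot even be parsed,
    # so nothing is ever assigned.
    parsed <- tryCatch(
      parse(text = 'sample A <- "data.bed"'),
      error = function(e) NULL
    )
    is.null(parsed)             # TRUE
    ```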
    Last edited by Xi Wang; 12-06-2010, 08:43 AM.
    Xi Wang



    • Hi Xi,

      I finally figured out what the problem was with running DEGseq. The R installation on the Mac does not come with the Tcl/Tk libraries. Once I downloaded them, it ran fine, at least as far as loading all the needed libraries.

      > library(DEGseq)
      Loading required package: qvalue
      Loading Tcl/Tk interface ... done
      Loading required package: ShortRead
      Loading required package: IRanges

      Attaching package: 'IRanges'

      The following object(s) are masked from 'package:base':

      cbind, eval, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
      rbind, rep.int, table

      Loading required package: GenomicRanges
      Loading required package: Biostrings
      Loading required package: lattice
      Loading required package: Rsamtools
      Loading required package: samr
      Loading required package: impute

      Now I have run into another problem. Please read the output below. First, the mapResultBatch lines don't show any path, unlike in the example, though I am not sure whether that happens on all operating systems. Further down, it says that it cannot read the input file, and I am not sure why. All I did was take a sorted BAM file and convert it to BED format using BEDtools. Does it need any other input? Any help is appreciated.


      Thanks

      Please wait...

      mapResultBatch1:

      mapResultBatch2:

      file format: bed
      refFlat:
      Ignore the strand information when count the reads mapped to genes!
      Count the number of reads mapped to each gene ...
      This will take several minutes, please wait patiently!
      Please wait...

      does not exist!
      SampleFiles:
      Count the number of reads mapped to each gene.
      This will take several minutes.
      Please wait ...
      cannot open input file
      There is something wrong!
      Please check !
      There is something wrong!Please check...
      Error in file(file, "rt") : cannot open the connection
      In addition: Warning message:
      In file(file, "rt") :
      cannot open file '/var/folders/Bl/BlOaI4RVFYyvhEI-W+aTz++++TI/-Tmp-//RtmpuyIAOK/DEGseqExample/group1.exp': No such file or directory
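
      One thing that may be worth ruling out before anything else (a sketch, not a diagnosis): the mapResultBatch1, mapResultBatch2 and refFlat lines in the log above are all empty, which could mean the path variables were empty or pointed at nothing when the function was called. The helper check_inputs below is hypothetical and uses only base R; the file names are placeholders created in a temporary directory.

      ```r
      # Hypothetical helper: classify each input path as "ok", "empty" or "missing".
      check_inputs <- function(paths) {
        status <- ifelse(!nzchar(paths), "empty",
                         ifelse(file.exists(paths), "ok", "missing"))
        names(status) <- names(paths)
        status
      }

      # Placeholder inputs: one real temporary BED file, one empty string,
      # and one path that does not exist.
      bed <- tempfile(fileext = ".bed")
      writeLines("chr1\t100\t150\tread1\t0\t+", bed)

      inputs <- c(
        mapResultBatch1 = bed,
        mapResultBatch2 = "",
        refFlat         = file.path(tempdir(), "no_such_refFlat.txt")
      )

      print(check_inputs(inputs))
      ```

      Any entry that is not "ok" would be worth fixing before the read-counting step is started.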

      • In reply to newbietonextgen:
        Hi,

        Please show me the R script you used to run DEGseq. If you don't want to post the details here, you can email me at [email protected].

        Thanks.
        Last edited by Xi Wang; 12-11-2010, 10:12 AM.
        Xi Wang



        • DEGseq and expression of novel small RNAs

          Hi all!

          I'm new to the NGS business, and right now I have a lot of questions about DE analysis.

          I have RNA-sequenced a bacterial transcriptome in 2 growth conditions, and I have 3 biological replicates for each condition:

          Condition A : Replicate 1A, Replicate 2A, Replicate 3A
          Condition B : Replicate 1B, Replicate 2B, Replicate 3B

          I have the BAM and pileup files for each replicate.

          Now, my aim is to compare the expression of non-annotated non-coding RNAs in conditions A and B (so I will use a custom annotation file).

          I have read about DEGseq and I would like to use it for my DE analysis, but I have a number of questions about it:

          1. What method would suit my analysis best? I have thought of using MARS...

          2. How do I normalize my replicates? Should I use loess or median? What's the difference between them?

          3. Which is better: pooling the 3 replicates of each condition, or analyzing DE without pooling them?

          4. Since my transcripts are not annotated, I will have to use expression values based on raw read counts, right? Can I use the rawCount argument with the DEGseq function, or is it only valid with the DEGexp function? If I use the MARS method, is it automatically set to analyze raw counts?

          Thanks in advance for your help!

          Maria

          • In reply to mgolo:
            Hi Maria

            1&2. The choice of DEG detection method, and of the normalization applied beforehand, should depend on how your data are distributed. You may try all of them and choose the best one.

            3. For biological replicates, it's better not to pool them together.

            4. Raw read counts have nothing to do with gene annotation. In our documentation, the counterpart of 'raw read counts' is RPKM values. For the unannotated non-coding RNAs, you'd better work out the gene structure first and then look for DEGs.
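
            To illustrate the difference between the two kinds of value, here is a small sketch of the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The function name rpkm and the example numbers are made up for illustration; this is not DEGseq's own code.

            ```r
            # RPKM = count * 1e9 / (gene length in bp * total mapped reads)
            rpkm <- function(counts, gene_length_bp, total_mapped_reads) {
              counts * 1e9 / (gene_length_bp * total_mapped_reads)
            }

            # Two hypothetical genes with the same raw count but different lengths.
            counts  <- c(geneA = 500, geneB = 500)
            lengths <- c(geneA = 1000, geneB = 2000)

            rpkm(counts, lengths, total_mapped_reads = 1e6)
            # geneA: 500, geneB: 250
            ```

            The raw counts are identical, but the RPKM values differ because RPKM needs a length for each transcript, which is why the gene structure has to be known before RPKM can be used.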


            Btw, we are working on a new version of DEGseq, which will be more suitable for biological replicates.
            Xi Wang

            • In reply to Xi Wang:
              Thanks for your reply Xi

              I'll try all the methods once I have my annotation file. But what are the criteria for deciding which one is best?

              Looking forward to your new version of DEGseq!

              • In reply to mgolo:
                I think one of the most important criteria is how consistent the detected DEGs are with previous knowledge, although new findings may of course be genuine discoveries. From a statistical point of view, the best method is one whose assumptions your data do not violate.
                Xi Wang



                • Hi Xi,
                  My data are time-course data with 6 time points but without replicates. I wonder if I can try your DEGseq.

                  If not, could you suggest some alternative approaches?

                  Thank you in advance!

                  Townway

                  • In reply to townway:
                    Sorry Townway, DEGseq is currently not suitable for time-series data. Please try Cufflinks (http://cufflinks.cbcb.umd.edu/) instead. Thanks.
                    Xi Wang



                    • DEGseq problem

                      Hi Xi,
                      I have a problem using DEGseq:
                      DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = 2, groupLabel1 = "roottip",
                             geneExpMatrix2 = geneExpMatrix2, geneCol2 = 1, expCol2 = 2, groupLabel2 = "hypocotyl",
                             outputDir = "./roothypocoty", method = "MARS")

                      Please wait...
                      gene id column in geneExpMatrix1 for sample1: 1
                      expression value column(s) in geneExpMatrix1: 2
                      total number of reads uniquely mapped to genome obtained from sample1: 62747041
                      gene id column in geneExpMatrix2 for sample2: 1
                      expression value column(s) in geneExpMatrix2: 2
                      total number of reads uniquely mapped to genome obtained from sample2: 69469907

                      method to identify differentially expressed genes: MARS
                      pValue threshold: 0.001
                      output directory: ./roothypocoty

                      Please wait ...
                      Identifying differentially expressed genes ...
                      Please wait patiently ...
                      output ...

                      Done ...
                      The results can be observed in directory: ./roothypocoty



                      Problem:

                      It creates the output directory (outputDir) but does not produce the MA-plot.
                      Additionally, my two samples do not have replicates.

                      Hope you can help!
                      Thanks!
                      lei

                      • In reply to wangleibio:
                        Thanks for using DEGseq.

                        To figure out your problem, please try the following:
                        (1) Run the example provided in the help document. Simply type "?DEGexp" in the R console, and copy/paste the Examples at the end of the document. Then check whether the example works properly.
                        (2) Run "sessionInfo()" in the R console, and paste the result here or, better, email it to me at "[email protected]".
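
                        As a sketch, the information asked for in step (2) can be collected like this. sessionInfo(), system.file() and packageVersion() are standard R functions; the check for an installed DEGseq is only a convenience added here.

                        ```r
                        # Capture and print R version, platform and attached packages.
                        info <- sessionInfo()
                        print(info)

                        # Report the installed DEGseq version, if the package is present.
                        if (nzchar(system.file(package = "DEGseq"))) {
                          print(packageVersion("DEGseq"))
                        } else {
                          message("DEGseq does not appear to be installed in this library.")
                        }

                        # For step (1), example("DEGexp", package = "DEGseq") runs the
                        # help-page examples in one go instead of copy/pasting them.
                        ```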

                        Thanks.
                        Xi Wang



                        • DEGseq Question

                          Hello,

                          I have a question regarding DEGseq. I don't understand the syntax of layout:
                          layout(matrix(c(1, 2, 3, 4, 5, 6), 3, 2, byrow = TRUE))

                          I can see my graphs, but they don't show anything interpretable. For my data, only three rows were considered and their log fold changes calculated. For the remaining data, no histogram was built.

                          The first chunk of code reads the whole data set, so I think the only thing wrong is how I set up the layout and the matrix.

                          Thanks for your help!
                          Aso

                          • In reply to AsoBioInfo:
                            Dear Aso, thanks for your questions.

                            The "layout" call only affects how the DEGseq output plots are drawn. Specifically, that command sets up a figure with 6 panels arranged in 3 rows and 2 columns.
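
                            A small base-R sketch of what that call does, independent of DEGseq. It draws six placeholder plots into a temporary PDF; the file name and the plotted data are made up for illustration.

                            ```r
                            # Draw into a temporary PDF so the sketch works without a screen device.
                            out <- tempfile(fileext = ".pdf")
                            pdf(out, width = 6, height = 9)

                            # 3 rows x 2 columns; byrow = TRUE numbers the panels row by row:
                            #   1 2
                            #   3 4
                            #   5 6
                            panels <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
                            n_panels <- layout(panels)   # layout() returns the number of panels
                            par(mar = c(2, 2, 2, 2))

                            # Each new plot goes into the next panel in numeric order.
                            for (i in seq_len(n_panels)) {
                              plot(1:10, main = paste("panel", i))
                            }

                            dev.off()
                            ```

                            The layout only decides where each plot lands on the page; it has no effect on which rows of the data are analyzed.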

                            For your problem, could you copy and paste the first few lines of your data and the commands you ran here? Then I will be able to diagnose the issue. Thanks.
                            Xi Wang

                            • In reply to AsoBioInfo:
                              Are you analyzing RNA-seq data? If so, the overwhelming opinion of the community is that the Poisson model of DEGseq is invalid and you should use edgeR or DESeq instead.
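
                              The usual reasoning behind that opinion is overdispersion: under a Poisson model the variance across replicates equals the mean, whereas counts from biological replicates tend to vary more than that. Here is a rough sketch on simulated counts (not real data); the gene number, mean and dispersion values are arbitrary assumptions, and var_to_mean is a hypothetical helper.

                              ```r
                              set.seed(1)
                              n_genes <- 2000
                              n_reps  <- 3
                              mu      <- 100   # assumed true mean count per gene

                              # Replicates that really follow a Poisson model.
                              pois_counts <- matrix(rpois(n_genes * n_reps, lambda = mu), ncol = n_reps)

                              # Overdispersed replicates: negative binomial, variance = mu + mu^2 / size.
                              nb_counts <- matrix(rnbinom(n_genes * n_reps, mu = mu, size = 5), ncol = n_reps)

                              # Hypothetical helper: average per-gene variance divided by average mean.
                              var_to_mean <- function(counts) {
                                mean(apply(counts, 1, var)) / mean(rowMeans(counts))
                              }

                              ratio_pois <- var_to_mean(pois_counts)   # close to 1
                              ratio_nb   <- var_to_mean(nb_counts)     # well above 1
                              print(c(poisson = ratio_pois, negbin = ratio_nb))
                              ```

                              A ratio well above 1 on real replicate counts would be a sign that a Poisson-based test is likely to overstate significance.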
                              --------------
                              Ethan

                              • In reply to Xi Wang:
                                Thanks Xi for your reply!

                                The output score data looks like this:
                                "GeneNames" "value1" "value2" "log2(Fold_change)"
                                00000000000000 6 10 -0.736 -0.643
                                11111111111111 68 69 -0.02 0.072
                                22222222222222 1 1 0 0.095
                                33333333333333 NA NA NA NA NA NA NA NA FALSE
                                44444444444444 NA NA NA NA NA NA NA NA FALSE

                                Note: There are other scores also.

                                The fold change is calculated for only three rows, although the matrix does contain all the values, since the whole matrix shows up in the output. The commands I used are:

                                library(DEGseq)
                                geneExpFile <- "D:/data/MyData.txt"
                                geneExpMatrix1 <- readGeneExp(file = geneExpFile, geneCol = 1, valCol = c(7, 9, 11))
                                geneExpMatrix2 <- readGeneExp(file = geneExpFile, geneCol = 1, valCol = c(8, 10, 12))
                                write.table(geneExpMatrix1[1:13, ], row.names = FALSE)
                                write.table(geneExpMatrix2[1:13, ], row.names = FALSE)

                                layout(matrix(c(1, 2, 3, 4, 5, 6), 3, 2, byrow = TRUE))
                                par(mar = c(2, 2, 2, 2))
                                DEGexp(geneExpMatrix1 = geneExpMatrix1, geneCol1 = 1, expCol1 = c(2, 3, 4, 5, 6), groupLabel1 = "Label1",
                                       geneExpMatrix2 = geneExpMatrix2, geneCol2 = 1, expCol2 = c(2, 3, 4, 5, 6), groupLabel2 = "Label2",
                                       method = "MARS")
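
                                One thing that might be worth checking in the commands above (an assumption, not a confirmed diagnosis): each matrix is read with three value columns (valCol has three entries), but expCol1 and expCol2 ask for columns 2 to 6. If the matrix returned by readGeneExp holds the gene column plus one column per valCol entry, columns 5 and 6 would not exist. The sketch below uses a mock matrix in plain base R in place of the real readGeneExp output.

                                ```r
                                # Mock stand-in for a matrix with a gene id column plus three value columns.
                                gene_exp <- cbind(
                                  GeneNames = c("g1", "g2", "g3"),
                                  value1    = c(6, 68, 1),
                                  value2    = c(10, 69, 1),
                                  value3    = c(3, 70, 2)
                                )

                                # Column indices requested in the DEGexp call.
                                exp_cols <- c(2, 3, 4, 5, 6)

                                in_range     <- exp_cols[exp_cols <= ncol(gene_exp)]
                                out_of_range <- exp_cols[exp_cols > ncol(gene_exp)]

                                in_range       # 2 3 4
                                out_of_range   # 5 6
                                ```

                                If the same check on the real geneExpMatrix1 reports out-of-range columns, setting expCol1 and expCol2 to c(2, 3, 4) would be the first thing to try.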

                                Hope this helps!

                                Thanks!
