Announcement

Collapse

Welcome to the New Seqanswers!

Welcome to the new Seqanswers! We'd love your feedback, please post any you have to this topic: New Seqanswers Feedback.
See more
See less

DEGseq

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #91
    Hi there,

    I've a problem with DEGseq command.... I'm using solid data and I have prepared all files....
    I have this error

    > DEGseq(MAP_SANI, MAP_NT, fileFormat = "bed",refFlat = REFFLAT, outputDir = outputDir, method = "LRT")
    Please wait...

    mapResultBatch1:
    /local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/sani_BED.txt
    mapResultBatch2:
    /local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/NT_BED.txt
    file format: bed
    refFlat: /home/alex/PROGETTI/TOMANIN_RESULT/PROVA_DAG/Total_human_annotation.ucsc
    Ignore the strand information when count the reads mapped to genes!
    Count the number of reads mapped to each gene ...
    This will take several minutes, please wait patiently!
    Please wait...

    SampleFiles:
    /local/database/HUMAN_GENOME_GRCh37/PROJECTS/TOMANIN_RNA/RESULTS/BED_FORMAT/sani_BED.txt
    Count the number of reads mapped to each gene.
    This will take several minutes.
    Please wait ...

    *** caught segfault ***
    address 0xfffffffffffffff8, cause 'memory not mapped'

    Traceback:
    1: .C(".getGeneExp", as.character(refFlat), as.character(mapResultBatch), as.integer(length(mapResultBatch)), as.character(output), as.character(fileFormat), as.integer(readLength), as.integer(needStrand), as.numeric(min.overlapPercent))
    2: getGeneExp(mapResultBatch1, fileFormat, readLength, strandInfo, refFlat, GeneExp1)
    3: DEGseq(MAP_SANI, MAP_NT, fileFormat = "bed", refFlat = REFFLAT, outputDir = outputDir, method = "LRT")

    Possible actions:
    1: abort (with core dump, if enabled)
    2: normal R exit
    3: exit R without saving workspace
    4: exit R saving workspace
    Selection: 1
    aborting ...
    Segmentation fault

    Can someone help me ?!?

    Alessandro

    Comment


    • #92
      Dear DEGseq users and DEGseq potential users,

      Thank you for taking into account of using DEGseq. As the update of R platform, DEGseq also updates to its version 1.2.2, which changed quite a few from its version 1.1.x. Here, I’d like to suggest the users (1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data. This procedure also gives a rapid way to diagnose commonly encountered mistakes: if step (2) works well and (3) not, please check the formats of the input files by comparing your files with the test files.

      For further bugs or questions regarding the programming, please send emails to my colleague Likun Wang via [email protected] in a more direct and rapid manner. For further scientific questions or more general questions, you may leave messages here seeking for helps from the community.

      Thanks again for your attention on DEGseq.
      Xi Wang

      Comment


      • #93
        Hi,

        Is the output for number of reads in each from getGeneExp function specific to the exons only? I used the refFlat format for the gene annotation.

        Thanks!

        Comment


        • #94
          Originally posted by m!x View Post
          Hi,

          Is the output for number of reads in each from getGeneExp function specific to the exons only? I used the refFlat format for the gene annotation.

          Thanks!
          Yes, only reads in exons are counted. If you want to count reads in introns, make your own refFlat files.
          Xi Wang

          Comment


          • #95
            Thanks for the quick reply. I tried to put the results of gene and number of reads in the UCSC custom track, and found out that there are a lot of reads that are mapping to 3' UTR when I viewed the track in genome browser. This becomes the problem when I started to do the subsequent statistical analysis.
            There are a lot of genes that are said to be significantly induced/repressed, but actually most of the reads are in the 3' UTR, and almost none of the reads are in the exons.

            Have you found this kind of phenomenon before? What do you suggest to correct this?

            Thanks!

            Comment


            • #96
              Originally posted by m!x View Post
              Thanks for the quick reply. I tried to put the results of gene and number of reads in the UCSC custom track, and found out that there are a lot of reads that are mapping to 3' UTR when I viewed the track in genome browser. This becomes the problem when I started to do the subsequent statistical analysis.
              There are a lot of genes that are said to be significantly induced/repressed, but actually most of the reads are in the 3' UTR, and almost none of the reads are in the exons.

              Have you found this kind of phenomenon before? What do you suggest to correct this?

              Thanks!
              Yes, we have observed the similar phenomenon in our data, that in the 3'UTR the read coverage was significantly higher than other regions. We think this non-uniform read distribution is largely due to sample degradation during the library preparation.

              As we don't have a biology backgroud, we are wondering whether this explanation is right or not. Only you know what caused this phenomenon, you can correct this distribution bias correctly.

              We are also seeking for biology guys to explain the reasons. Thanks!
              Xi Wang

              Comment


              • #97
                Dear WANG Xi,

                I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

                I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

                I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
                "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
                line 10 did not have 4 elements"

                I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

                Thank you!

                Comment


                • #98
                  Originally posted by sma View Post
                  Dear WANG Xi,

                  I am running into troubles using DEGseq. Admittedly, I am a novice to analyzing this kind of data and using R---in fact I only loaded R just yesterday in order to run your DEGseq package.

                  I am trying to compare three separate sets of 454 data (from 3 stages of development). When we get the data back from our company in Shanghai, it is complete with the annotations and the reads are already counted. There are no replicates (just 3 samples). So, I think I can simply use DEGexp with the MARS method.

                  I have followed your recent suggestions to "(1) download the most recent version with its “reference manual”, (2) following the examples in the manual, apply DEGseq’s functions to the test data, (3) replace the data set, and apply DEGseq to your own data." I can successfully run the test data without problem. I can also replace the data with my own data and I can successfully show the filepath to my data (so I know I have entered it correctly). However, when I try to set up the geneExpMatrix, I run into problems. I always get an error that says :
                  "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
                  line 10 did not have 4 elements"

                  I've tried everything I can think of and I cannot seem to remedy this problem. Any suggestion?

                  Thank you!
                  Dear Sma,

                  Thanks for using DEGseq. I am wondering your problem may be caused by the format of the input file to geneExpMatrix. Could you paste the first 15 lines in the input file here, or email to me via [email protected]? Thanks.
                  Xi Wang

                  Comment


                  • #99
                    Starting using Degseq

                    Hey all, I'm just start using Degseq and I'm having some troubles. I have many library without replicats and I want to see the differential expression in the time course of a disease. I have solexa reads MP ~35 bp og RNA-seq, the genome and a gene prediction of this specie. I align the reads with the predicted genes using bowtie then I made and script to make a file like this:

                    "Gene" "LB1" "LB2"
                    "g1" 45 103
                    ... ... ...

                    where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:


                    library(DEGseq)

                    geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
                    geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

                    layout(matrix(c(1,2), 3, 2, byrow=TRUE))

                    par(mar=c(2, 2, 2, 2))

                    DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

                    I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

                    Thanks for any help!

                    Comment


                    • Originally posted by osvaldoreis View Post
                      Hey all, I'm just start using Degseq and I'm having some troubles. I have many library without replicats and I want to see the differential expression in the time course of a disease. I have solexa reads MP ~35 bp og RNA-seq, the genome and a gene prediction of this specie. I align the reads with the predicted genes using bowtie then I made and script to make a file like this:

                      "Gene" "LB1" "LB2"
                      "g1" 45 103
                      ... ... ...

                      where the numbers are how many reads align with that gene for each library. Then I ran DEGexp like this:


                      library(DEGseq)

                      geneExpMatrix1 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(2))
                      geneExpMatrix2 <- readGeneExp(file="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/reads_por_gene.txt", geneCol=1, valCol=c(3))

                      layout(matrix(c(1,2), 3, 2, byrow=TRUE))

                      par(mar=c(2, 2, 2, 2))

                      DEGexp(geneExpMatrix1=geneExpMatrix1, geneCol1=1, expCol1=2, groupLabel1="GPB1", geneExpMatrix2=geneExpMatrix2, geneCol2=1, expCol2=2, groupLabel2="DPB1", method="MARS", outputDir="/root/rnaseq/Analise_diff_Expr/GPB1_X_DPB1/", pValue=1e-2, thresholdKind=1)

                      I got results without errors but the results have a lot of differentially expressed genes. I have ~18k genes and ~12k are differentially expressed. Looking at the file I can see that many of them aren't significant. I wanted to know if I'm doing something wrong?

                      Thanks for any help!
                      Hi,

                      Thanks for using DEGseq in differential expression analysis.

                      As your samples are without biological replicates (right?), the statistical model in DEGseq only depicts the measurement uncertainty in RNA-seq technology, so there could be some genes, which are picked up as differentially expressed genes, do appear to be differentially expressed in samples (for several reasons such as individual differences) but don't have biological significance. It is often said that statistical significance doesn't equal to biological significance.
                      Another thing is you may try more stringent p-value (or q-value) cutoff, say specifying pValue=1e-3. Or you'd better use thresholdKind=3 or thresholdKind=4. The q-values are adjusted from p-values for multiple testing correction.

                      Hope this helps.
                      Last edited by Xi Wang; 10-01-2010, 01:05 AM.
                      Xi Wang

                      Comment


                      • Hi Xi Wang,
                        Thanks for your software. I am wondering what's the difference between DEseq and DEGseq?

                        Comment


                        • Originally posted by luoruicd View Post
                          Hi Xi Wang,
                          Thanks for your software. I am wondering what's the difference between DEseq and DEGseq?
                          Please read the respective articles for the two tools.
                          Xi Wang

                          Comment


                          • Hi
                            I ran the program DEGexp of the DEGseq. The output file generated a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value <0.001). How to interpret the gene upregulation and downregulation? what each column means? the input file was RPKM and genes. I have not replicate, but I'm comparing two conditions.
                            Thanks

                            Comment


                            • Originally posted by Sol View Post
                              Hi
                              I ran the program DEGexp of the DEGseq. The output file generated a table with values of log2 fold change, z-score, p-value, q-value and signature (p-value <0.001). How to interpret the gene upregulation and downregulation? what each column means? the input file was RPKM and genes. I have not replicate, but I'm comparing two conditions.
                              Thanks
                              Hi Sol, Thanks for using DEGseq.

                              In the output file, there are 2 columns for fold-change: "log2(Fold_change)" and "log2(Fold_change) normalized". log2(Fold_change) = log(value1/value2), and the normalized value is got from the normalized value1 and value2. From the value of fold-change, you can judge this gene is up-regulated or down-regulated. For example, for a gene if its log2(Fold_change) > 0, which means value1 > value2, and if its signature = TRUE, this gene is significantly down-regulated in condition 2. Also, you can look into z-scores.

                              Hope this helps.
                              Xi Wang

                              Comment


                              • but, o z-score, is based on what data? and what is difference between q-value and p-value. I don't understand.
                                Thanks

                                Comment

                                Working...
                                X