
  • Xi Wang
    replied
    Originally posted by svl View Post
    I wasn't too clear indeed :P, I meant the amount of reads the analysis is based on. I just quickly wrote some things that came to mind.
    Thanks. It's quite clear this time. We also feel those statistics are quite important in practice.

    Originally posted by svl View Post
    Right. I have RPKM values (cufflinks output), so do you suggest I'd be better off using the method=MATR with rawCount=F instead of method=MARS...? It's not all technical replicates I put up against each other...
    Sorry, I didn't make myself clear either :-( The rawCount option applies only to method=MATR. For the other methods, there is no need to check whether the gene expression levels are quantified as raw read counts.
    Further, since we recommend using raw read counts as the gene expression level, you can multiply the RPKM by the gene length to get back the raw read count. If you would rather not do this, DEGexp handles RPKM values well.
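
    The back-conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual RPKM definition (reads per kilobase of gene per million mapped reads), under which the total number of mapped reads enters as well as the gene length; multiplying by length alone gives a value proportional to the count. The function name and all numbers are hypothetical, and values estimated by a tool such as cufflinks will only convert back approximately.

    ```r
    # Back-convert RPKM to an approximate read count.
    # Assumed definition: RPKM = count * 1e9 / (gene_length_bp * total_mapped_reads)
    rpkm_to_count <- function(rpkm, gene_length_bp, total_mapped_reads) {
      rpkm * gene_length_bp * total_mapped_reads / 1e9
    }

    # Hypothetical example values (not taken from the thread)
    rpkm        <- c(geneA = 10, geneB = 2.5)
    gene_length <- c(geneA = 2000, geneB = 4000)  # gene length in bp
    total_reads <- 5e6                            # mapped reads in the sample

    approx_counts <- round(rpkm_to_count(rpkm, gene_length, total_reads))
    print(approx_counts)
    ```

    The resulting table could then be passed to DEGexp as the expression matrix in place of the RPKM values.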

  • svl
    replied
    Originally posted by Xi Wang View Post
    Thanks a lot for your suggestions. We will add this information in the next version. I am not sure what "nr" means in the sentence "* nr of reads included for each". Could you please give me more details? Thanks.
    Xi
    I wasn't too clear indeed :P, I meant the amount of reads the analysis is based on. I just quickly wrote some things that came to mind.

    Originally posted by Xi Wang View Post
    If rawCount = FALSE, we assume that the gene expression levels have already been normalized (against the sequence depth), such as RPKM.
    Xi
    Right. I have RPKM values (cufflinks output), so do you suggest I'd be better off using the method=MATR with rawCount=F instead of method=MARS...? It's not all technical replicates I put up against each other...

  • Xi Wang
    replied
    Originally posted by adamreid View Post
    I guess I'm asking whether there is/ought to be a correction for gene length because more reads are expected to map to longer genes.
    It is true that longer genes yield more reads when the transcript copy number is the same. However, since the aim is to identify differentially expressed genes, we can use raw read counts. The reason is that each gene is compared only with itself across samples, and its length does not change between samples (ignoring alternative splicing). For the methods based on the random sampling model (such as LRT, FET, MARS), we suggest using raw counts, which better fit the random sampling model.
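
    A small numeric check of this argument, as a sketch only: for a single gene, the length term cancels when the two samples are compared, so the ratio of RPKM values equals the ratio of depth-scaled raw counts. The counts, depths and gene length below are hypothetical, and the RPKM formula is the commonly used definition, not something taken from the DEGseq code.

    ```r
    # Hypothetical values for one gene measured in two samples
    count1 <- 120; count2 <- 300   # raw read counts
    depth1 <- 4e6; depth2 <- 5e6   # total mapped reads per sample
    len_bp <- 1500                 # same gene, so the same length in both samples

    # Commonly used RPKM definition (an assumption here)
    rpkm <- function(count, len, depth) count * 1e9 / (len * depth)

    # Ratio between samples computed from RPKM ...
    ratio_rpkm  <- rpkm(count1, len_bp, depth1) / rpkm(count2, len_bp, depth2)
    # ... and from raw counts scaled by sequencing depth only
    ratio_count <- (count1 / depth1) / (count2 / depth2)

    print(c(ratio_rpkm = ratio_rpkm, ratio_count = ratio_count))
    ```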

    Originally posted by adamreid View Post
    Something else I was wondering about was the use of multiple examples for the two conditions. If I use multiple columns for expCol1 and expCol2 the number of reads appears to be summed. Is it therefore a bad idea to use say 3 columns for expCol1 and 2 for expCol2?
    It works.
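
    To illustrate why an unequal number of columns need not be a problem, here is a sketch in base R. It assumes, as Adam observed, that the columns of each condition are summed per gene; whether DEGexp does exactly this internally is an assumption, and the table and column names are hypothetical. The point is only that pooling more columns raises the pooled depth of that condition, which a depth-aware comparison can account for.

    ```r
    # Hypothetical count table: 3 columns for condition 1, 2 for condition 2
    counts <- data.frame(
      gene = c("g1", "g2", "g3"),
      a1 = c(10, 200, 30), a2 = c(12, 180, 28), a3 = c(8, 220, 32),
      b1 = c(20, 300, 45), b2 = c(25, 310, 45)
    )

    # Pool (sum) the columns of each condition per gene
    pooled1 <- unname(rowSums(counts[, c("a1", "a2", "a3")]))
    pooled2 <- unname(rowSums(counts[, c("b1", "b2")]))

    # Pooled totals stand in for the sequencing depth of each condition
    total1 <- sum(pooled1)
    total2 <- sum(pooled2)

    # Per-gene proportions are comparable despite the unequal column numbers
    print(data.frame(gene = counts$gene,
                     prop1 = pooled1 / total1,
                     prop2 = pooled2 / total2))
    ```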

    Thanks for your questions.
    Xi
    Last edited by Xi Wang; 11-18-2009, 09:19 AM.

  • Xi Wang
    replied
    Originally posted by svl View Post
    Thanks again for your package, seems to work fine. Will you be including more info in the output.html? For instance;
    * correlation measures
    * nr of reads included for each
    * amount of differentially expressed
    Of course we can extract that ourselves from the output_score.txt files, but it still would be nice to have some more info directly
    --------------
    And why are the header values for the two compared samples called "value1" and "value2"? If you give the sample-names via the flag "groupLabel1/2" in the function DEGexp() it would be nice if they show up in output_score.txt files too.
    Thanks a lot for your suggestions. We will add this information in the next version. I am not sure what "nr" means in the sentence "* nr of reads included for each". Could you please give me more details? Thanks.

    Originally posted by svl View Post
    And what does the option "rawCount" exactly do?
    The rawCount option is used only when method=MATR is chosen. If rawCount = TRUE, we adjust the mean of M to the same value for the case-and-control samples and the technical replicates. The difference in the mean of M is caused by the different sequencing depths of the two samples compared. If rawCount = FALSE, we assume that the gene expression levels have already been normalized against the sequencing depth (as with RPKM), so there is no need to adjust the mean of M.
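
    The effect of sequencing depth on the mean of M can be sketched with a toy example. It assumes M is the per-gene log2 ratio of the counts in the two samples; the adjustment DEGseq actually applies may differ in detail, and all values are hypothetical. With no true expression change and the second sample sequenced twice as deep, every M is shifted by the log2 ratio of the depths, and removing that shift centres M at zero.

    ```r
    # Hypothetical raw counts: sample 2 is sequenced twice as deep as sample 1,
    # with no real expression change for any gene
    c1 <- c(10, 40, 160, 640)
    c2 <- 2 * c1

    # Per-gene log2 ratio between the two samples (assumed meaning of M)
    M <- log2(c1) - log2(c2)

    # Shift expected purely from the difference in total read counts
    expected_shift <- log2(sum(c1) / sum(c2))

    # Depth-adjusted M values
    M_adj <- M - expected_shift

    print(c(mean_M = mean(M), mean_M_adj = mean(M_adj)))
    ```

    With already depth-normalized input such as RPKM, this shift should be absent, which matches the rawCount = FALSE case described above.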

    I hope this information helps.

    Best wishes,
    Xi
    Last edited by Xi Wang; 11-18-2009, 09:06 AM.

  • adamreid
    replied
    Hi Xi,

    I guess I'm asking whether there is/ought to be a correction for gene length because more reads are expected to map to longer genes.

    Something else I was wondering about was the use of multiple examples for the two conditions. If I use multiple columns for expCol1 and expCol2 the number of reads appears to be summed. Is it therefore a bad idea to use say 3 columns for expCol1 and 2 for expCol2?

    Adam

  • svl
    replied
    Hi Xi Wang,

    --------------
    Thanks again for your package, seems to work fine. Will you be including more info in the output.html? For instance;
    * correlation measures
    * nr of reads included for each
    * amount of differentially expressed
    Of course we can extract that ourselves from the output_score.txt files, but it still would be nice to have some more info directly

    --------------
    And why are the header values for the two compared samples called "value1" and "value2"? If you give the sample-names via the flag "groupLabel1/2" in the function DEGexp() it would be nice if they show up in output_score.txt files too.

    --------------
    And what does the option "rawCount" exactly do?

    thanks,
    -SvL

  • Xi Wang
    replied
    Originally posted by adamreid View Post
    Hi there,

    When using DEGexp with read counts for each gene, how are the read counts normalised? I've put dummy values in for gene lengths. Does it use these, expecting no exons?

    Adam
    We don't recommend normalizing the read counts, although you can use RPKM as the gene expression level. When the sequencing depth differs between samples, DEGexp takes into account the total number of reads that map to the gene exon regions. We do this so that the computation stays within the assumptions of the random sampling model.
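
    As a rough illustration of testing one gene under a random sampling model, the sketch below applies Fisher's exact test (the FET method named elsewhere in this thread) to a 2x2 table of gene reads versus all other mapped reads in each sample. It uses base R's fisher.test, not DEGseq's own code; the helper name and all counts are hypothetical. Because the totals enter the table, differing sequencing depths are handled without normalizing the counts first.

    ```r
    # Hypothetical helper: test one gene using raw counts and sample totals
    test_gene <- function(count1, count2, total1, total2) {
      # Rows: reads from this gene / reads from everything else
      # Columns: sample 1 / sample 2
      tab <- matrix(c(count1, total1 - count1,
                      count2, total2 - count2), nrow = 2)
      fisher.test(tab)$p.value
    }

    # Sample 2 is sequenced twice as deep as sample 1 (hypothetical totals)
    # Count doubles with depth: no evidence of differential expression
    p_same <- test_gene(100, 200, 1e6, 2e6)
    # Count quadruples while depth only doubles: strong evidence
    p_diff <- test_gene(100, 400, 1e6, 2e6)

    print(c(p_same = p_same, p_diff = p_diff))
    ```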

    I am not sure I fully understand your second question. When using DEGexp, the gene expression levels are taken from the data you provide, so you can define "gene expression" yourself. However, we recommend using raw read counts as the gene expression levels.

    Thanks for your question.
    Xi

  • adamreid
    replied
    Hi there,

    When using DEGexp with read counts for each gene, how are the read counts normalised? I've put dummy values in for gene lengths. Does it use these, expecting no exons?

    Adam

  • Xi Wang
    replied
    Originally posted by shibujohn View Post
    Yes, we have SOLiD data that is uniquely mapped to RefSeq (two groups, BN & SHR), and I don't have replicates. We performed whole-transcriptome sequencing on SOLiD, so is it possible to analyze the SOLiD uniquely mapped reads (.ma)?
    Thanks,
    Shibu
    It looks possible. You should also try to validate the results given by DEGseq. Good luck!

    Xi

  • shibujohn
    replied
    SOLiD WT-data

    Originally posted by Xi Wang View Post
    Thanks for your question.

    We only assume that reads are uniformly distributed along transcripts. You can upload your uniquely mapped reads (or profiles) to a browser (such as the UCSC Genome Browser) to check whether your data satisfy this assumption. Less stringently, you can check whether the variation between technical replicates (if any) can be explained by the random sampling model. A feature of our DEGseq package can help with this check: see Section 3 of the online Supplementary Material for details. I hope this information helps.

    Xi
    Yes, we have SOLiD data that is uniquely mapped to RefSeq (two groups, BN & SHR), and I don't have replicates. We performed whole-transcriptome sequencing on SOLiD, so is it possible to analyze the SOLiD uniquely mapped reads (.ma)?
    Thanks,
    Shibu

  • Xi Wang
    replied
    Originally posted by shibujohn View Post
    Hi,
    Is it possible to analyze SOLiD rna-seq data in DEGseq?

    Shibu
    Thanks for your question.

    We only assume that reads are uniformly distributed along transcripts. You can upload your uniquely mapped reads (or profiles) to a browser (such as the UCSC Genome Browser) to check whether your data satisfy this assumption. Less stringently, you can check whether the variation between technical replicates (if any) can be explained by the random sampling model. A feature of our DEGseq package can help with this check: see Section 3 of the online Supplementary Material for details. I hope this information helps.

    Xi

  • shibujohn
    replied
    Is it possible to analyze SOLiD rna-seq in DEGseq

    Originally posted by Xi Wang View Post
    Hi RockChalkJayhawk,

    Thanks for using DEGseq.

    In the current version of DEGseq, we do not consider splice junctions, nor reads with multiple matches to the reference genome. In fact, we take reads mapped to the reference genome (not the transcriptome) as input, and count the reads in the annotated gene regions as the gene expression level. This works if the gene expression patterns (isoform expression percentages) are similar between the case and control samples; otherwise it may cause a little bias. We are now working on how to use the information provided by splice junction reads and multiply aligned reads to refine the identification of differentially expressed genes.

    Xi
    Hi,
    Is it possible to analyze SOLiD rna-seq data in DEGseq?

    Shibu

  • svl
    replied
    Originally posted by Xi Wang View Post
    Sorry for the inconvenience.
    Thanks for the update/info!

  • Xi Wang
    replied
    Originally posted by svl View Post
    Thanks. I figured it out. Had to update R and Bioconductor before trying to install the package....should keep my software better updated I guess
    Sorry for the inconvenience.

    If you want to install DEGseq through Bioconductor with the following script, you need to update R to 2.10.0 and Bioconductor to 2.5:

    source("http://bioconductor.org/biocLite.R")
    biocLite("DEGseq")

    Or alternatively, you can install DEGseq through our site by:

    source("http://bioinfo.au.tsinghua.edu.cn/software/degseq/DEGseqInstall.R")

    Yesterday, our building encountered a power cut, so the server was down. The server is running well now.

    Thanks and Best wishes,
    Xi

  • Xi Wang
    replied
    Hi RockChalkJayhawk,

    Thanks for using DEGseq.

    In the current version of DEGseq, we do not consider splice junctions, nor reads with multiple matches to the reference genome. In fact, we take reads mapped to the reference genome (not the transcriptome) as input, and count the reads in the annotated gene regions as the gene expression level. This works if the gene expression patterns (isoform expression percentages) are similar between the case and control samples; otherwise it may cause a little bias. We are now working on how to use the information provided by splice junction reads and multiply aligned reads to refine the identification of differentially expressed genes.
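
    The counting step described above can be sketched in base R. This is a simplified illustration and not DEGseq's implementation: it assumes a single chromosome, ignores strand, represents each uniquely mapped read by one position, and uses hypothetical gene coordinates and a hypothetical helper name. A read that spans a splice junction would not map contiguously to the genome in this setting, which is why such reads are missed.

    ```r
    # Hypothetical gene annotation on one chromosome (inclusive coordinates)
    genes <- data.frame(
      gene  = c("g1", "g2"),
      start = c(100, 500),
      end   = c(300, 900)
    )

    # Hypothetical positions of uniquely mapped reads
    read_pos <- c(90, 120, 150, 299, 300, 450, 500, 700, 901)

    # Count the reads whose position falls inside each annotated gene region
    count_reads <- function(genes, read_pos) {
      n <- vapply(seq_len(nrow(genes)), function(i) {
        sum(read_pos >= genes$start[i] & read_pos <= genes$end[i])
      }, integer(1))
      setNames(n, genes$gene)
    }

    gene_counts <- count_reads(genes, read_pos)
    print(gene_counts)
    ```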

    Xi

    Originally posted by RockChalkJayhawk View Post
    Hey all,

    A new package has come out for RNA Seq Analysis:

    DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.
    Wang L, Feng Z, Wang X, Wang X, Zhang X.
    Bioinformatics. 2009 Oct 24. [Epub ahead of print]
    PMID: 19855105 [PubMed - as supplied by publisher]

    However, from my understanding (which may not be correct) they only use tags that map to the genome, not splice junctions. Does anyone else see this and if so, what would be the work-around?
