Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • aslihan
    replied
    I just opened the excel file and import counts_table.txt file and saved as csv file.


    > countsTable <- read.table("/countstable/count_table_last.csv", sep=',', header=FALSE, row.names=1)
    > head (countsTable) V2 V3 V4 V5
    1/2-SBSRNA4 13 12 12 48
    A1BG 12 2 7 12
    A1BG-AS1 8 3 12 12
    A1CF 2 3 0 1
    A2LD1 29 33 47 47
    A2M 21 36 19 6

    Leave a comment:


  • aslihan
    replied
    HELP Count Table for DESeq

    cat Join_counts_trimmed.txt | head

    1/2-SBSRNA4 13 12 12 48
    A1BG 12 2 7 12
    A1BG-AS1 8 3 12 12
    A1CF 2 3 0 1
    A2LD1 29 33 47 47
    A2M 21 36 19 6
    A2ML1 8 11 6 16
    A2MP1 0 0 1 0
    A4GALT 74 47 11 94
    A4GNT 0 0 0 0

    I would like to load into R for DESeq analysis. WHY IT doesnt show 5 columns?

    > countsTable <- read.table("/countstable/Join_counts_trimmed.txt", sep='\t', header=TRUE, row.names=1)
    > countsTable
    data frame with 0 columns and 23227 rows


    Then I tried differently

    > exampleFile = system.file( "/countstable/Join_counts.txt", package="DESeq" )
    > exampleFile
    [1] ""
    > countsTable <- read.delim( exampleFile, header=FALSE, stringsAsFactors=TRUE )

    Error in read.table(file = file, header = header, sep = sep, quote = quote, :
    no lines available in input
    In addition: Warning message:
    In file(file, "rt") :
    file("") only supports open = "w+" and open = "w+b": using the former


    I dont know how to load this joined.txt file?

    Thank you so much for any help

    Leave a comment:


  • Simon Anders
    replied
    Worries like yours were what prompted us to re-address this whole issue and make the changes to DESeq mentioned above. Maybe you should give the devel version a try. It should be stable now (and you can manually install it on top of a current release Bioc installation.)

    Leave a comment:


  • biofreak
    replied
    Hi Simon,
    I am using 1.4 version if DESeq. I tried filtering differentially expressed genes based on within group variance values from 'resVarA' and 'resVarB'. As you have mentioned, it is indeed difficult to determine a good cutoff level. I tried everything from very strict (smaller) cutoffs to the cutoff of even 20!
    But every time you think that you could have kept some genes (or otherwise).
    For example with cutoff of 20 (in either groupA or groupB), following are some of the
    discarded genes: (6 samples)

    Z-1 Z-2 Z-3 Z-4 Z-5 Z-6
    703 531 7 0 0 1
    158 328 2 4 0 1
    213 284 1 2 3 5
    186 240 4 6 6 2
    382 194 143 2634 951 4336
    229 119 105 2206 543 1837
    1646 120 67 129 167 159
    137 60 46 1147 321 990
    144 64 35 1195 273 955
    249 39 24 19 22 8
    210 83 40 1418 303 952
    46 63 28 417 178 704
    49 42 23 381 156 598
    498 831 42 247 115 88
    261 329 19 82 57 29
    72 29 26 471 137 506
    279 341 24 70 55 70
    101 51 34 364 182 850
    345 389 37 84 78 95

    My data is already sparse and I am worried that so many genes with good read counts are being discarded. (for good reasons ofcourse!)
    Based on your experience, do you have any number which people find reasonable to use as a cut off? or is it very specific to the data?
    thanks a lot in advance.
    Last edited by biofreak; 08-03-2011, 09:59 AM.

    Leave a comment:


  • biofreak
    replied
    thanks a lot Simon. This information is tremendously helpful.

    Leave a comment:


  • Simon Anders
    replied
    Originally posted by biofreak View Post
    Hi all,
    I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
    CTL CTL CTL HF HF HF
    gene1 144 0 0 0 0 0
    gene2 45 57 0 0 0 0

    Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
    Should I remove such cases and keep only genes like gene2?
    The problem has little to do with the values being zero, you also get genes with values like '200, 1, 3; 2, 3, 2', i.e., one large value and all other values small.

    We recently changed how DESeq deals with this kind of data, so I have to give you two answers.

    for the released version of DESeq (v1.4.1): The data frame returned by 'nbinomTest' contains two columns called 'resVarA' and 'resVarB'. The values in their should be around 1 and for genes with unusually large estimated within-group variance ("variance outliers"), the value will be much higher. We recommend discounting such genes in the downstream analysis. See vignette for details

    for the development version (v1.5.23): We removed the 'resVar' columns as it turned out to be difficult to recommend a good cut-off value. Instead, the test is now informed about high estimated variance and becomes more conservative in this cases, i.e., genes as in your example should no longer reach a significant p value. Again, see the new (extensively revised) vignette for details.

    Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
    thanks,
    No need to do that, as DESeq skips such genes anyway.

    Simon

    Leave a comment:


  • biofreak
    started a topic infinite fold change RNAseq

    infinite fold change RNAseq

    Hi all,
    I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
    CTL CTL CTL HF HF HF
    gene1 144 0 0 0 0 0
    gene2 45 57 0 0 0 0

    Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
    Should I remove such cases and keep only genes like gene2?
    Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
    thanks,

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 11:49 AM
0 responses
15 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-24-2024, 08:47 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
61 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
60 views
0 likes
Last Post seqadmin  
Working...
X