Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • areyes
    replied
    Hi czelin,

    This is unlikely, since reads that overlap to many exons should be counted once per each exon. You could check whether the reads that are ignored by the script are properly paired or if they are mapping to multiple regions on the genome?

    Alejhandro

    Leave a comment:


  • czelin
    replied
    Dear all,

    I have recently started with exon-wise analysis and would appreciate your help.

    I have paired 100bp reads. I have prepared the annotation file with DEXSeq python scripts. What I have realized is that when I have a shorter exon (e.g. 150bp) the number of tags is 0. In IGV I can see lots (>500) of reads spanning this exon. Somehow few of the reads were counted for the control samples and none for comparison and this exon was reported as significantly DE.
    I suspect that is because the two read pairs are longer than this exon and they overlapped adjacent exon[s]. This caused the reads be considered as "_ambiguous_readpair_position". If my understanding is correct, is there any way to solve the short-exon issue? If my understanding is wrong, could you please correct me?

    Leave a comment:


  • sindrle
    replied
    I also have a question about warnings and dispersion:

    ecs <- estimateSizeFactors( ecs )
    the matrix is either rank-deficient or indefinite

    ecs <- fitDispersionFunction( ecs )
    Too much damping - convergence tolerance not achievable

    Click image for larger version

Name:	Screen Shot 2014-02-09 at 12.58.40.jpg
Views:	1
Size:	34.6 KB
ID:	304433

    And does this look ok?

    Thanks! First time DEXSeq..

    Leave a comment:


  • metheuse
    replied
    I just noticed the chromosome name of the gff file doesn't contain "chr":
    Code:
    1    Homo_sapiens.GRCh37.71.gtf      exonic_part     11869   11871   .       +       .       transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"
    This is probably the reason. I've added chr to each line and see if it works.

    Leave a comment:


  • metheuse
    replied
    By the way, these are the commands I used:
    Code:
    samtools index 21722_mapped_hg19/accepted_hits.bam
    samtools view 21722_mapped_hg19/accepted_hits.bam >21722_accepted_hits.sam
    sort -k1,1 -k2,2n 21722_accepted_hits.sam >21722_accepted_hits_sorted.sam
    python ~/scripts_64/dexseq_count.py -p yes -s no ~/scripts_64/Homo_sapiens.GRCh37.71.DEXSeq.chr.gff 21722_accepted_hits_sorted.sam KARPAS299_CEP.txt
    Last edited by metheuse; 04-17-2013, 08:54 PM.

    Leave a comment:


  • metheuse
    replied
    Originally posted by Simon Anders View Post
    Hav you checked your alignments with a genome browser? Load the SAM file and the GFF file produced by dexseq_prepare in, e.g., IGV, and look at one of the loci with zero counts. If there really are no reads, you experiment has failed (or you are using a wrong annotation file).
    I converted the bam file to bed and intersected it with "chr1 29385323 29385364 ENSG00000159023:023" (which has zero count in the dexseq_count.py output)
    This resulted in 71 intersecting reads. Here are the first 10 of them:
    Code:
    chr1    29379725        29391495        HWI-ST1235:101:C1WW9ACXX:6:1312:1770:71045/1    50      +
    chr1    29379725        29391495        HWI-ST1235:101:C1WW9ACXX:6:2116:17025:56846/2   50      -
    chr1    29379731        29391501        HWI-ST1235:101:C1WW9ACXX:6:2307:18153:8715/1    50      +
    chr1    29379734        29391504        HWI-ST1235:101:C1WW9ACXX:6:2314:2163:52123/2    50      +
    chr1    29379736        29391506        HWI-ST1235:101:C1WW9ACXX:6:2314:2163:52123/1    50      -
    chr1    29379741        29391511        HWI-ST1235:101:C1WW9ACXX:6:2313:2799:76858/1    50      +
    chr1    29379742        29391512        HWI-ST1235:101:C1WW9ACXX:6:2210:9177:71165/1    50      +
    chr1    29379742        29391512        HWI-ST1235:101:C1WW9ACXX:6:1101:15865:15952/2   50      -
    chr1    29379742        29391512        HWI-ST1235:101:C1WW9ACXX:6:1307:5858:86759/2    50      -
    chr1    29379742        29391512        HWI-ST1235:101:C1WW9ACXX:6:1308:20389:32047/1   50      -
    This should mean both my reads and the annotation file has no problem.
    Last edited by metheuse; 04-17-2013, 05:12 PM.

    Leave a comment:


  • Simon Anders
    replied
    Originally posted by metheuse View Post
    I got the same problem. It's all zeros in the second column of the output of dexseq_count.py
    I saw you said you realized later you didn't follow the manual exactly. Could you tell me what's the problem? Thanks.
    Hav you checked your alignments with a genome browser? Load the SAM file and the GFF file produced by dexseq_prepare in, e.g., IGV, and look at one of the loci with zero counts. If there really are no reads, you experiment has failed (or you are using a wrong annotation file).

    Leave a comment:


  • Simon Anders
    replied
    Originally posted by oliviera View Post
    Only 92 with adjp < 0.01.
    With DEseq I get ~2000 genes at this cut off. I think I made something wrong
    Why would you use such as stringent cut-off?

    Do you really need to make sure that your list of differentially used exons do not contain more than 1% false positives? In most use cases, 10% are considered acceptable.

    And: Of course, you get much more genes than exons. Detecting differential expression is needs to see much less information in the data than detecting differential exon usage.

    Leave a comment:


  • metheuse
    replied
    I got the same problem. It's all zeros in the second column of the output of dexseq_count.py
    I saw you said you realized later you didn't follow the manual exactly. Could you tell me what's the problem? Thanks.

    Leave a comment:


  • oliviera
    replied
    Only 92 with adjp < 0.01.
    With DEseq I get ~2000 genes at this cut off. I think I made something wrong

    Leave a comment:


  • areyes
    replied
    I think it should be OK, how many hits do you get?

    Leave a comment:


  • oliviera
    replied
    Hi Alejandro,

    Here is the graph to check dispersion estimate. What do you think?

    meanvalues<- rowMeans(counts(ecs))
    plot(meanvalues, fData(ecs)$dispBeforeSharing, log="xy", main="mean vs CR dispersion")
    x<- 0.01:max(meanvalues)
    y<- ecs@dispFitCoefs[1] + ecs@dispFitCoefs[2] / x
    lines(x, y, col="red")
    Attached Files

    Leave a comment:


  • areyes
    replied
    Hi oliviera,

    This warning sometimes happens when your data is sparsed. Have you done the "fit diagnostics" as indicated in the vignette? As it is just a warning, you can go on with your analysis, maybe you will loose some power if the fit is "above" most of the points...

    Alejandro

    Leave a comment:


  • oliviera
    replied
    Dear all,
    I try to use DExseq but got the following error when I call the function

    ecs<- fitDispersionFunction(ecs)

    Warning message:
    In glmgam.fit(mm, disps[good], coef.start = coefs) :
    Too much damping - convergence tolerance not achievable


    Here is the version in R I use
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    locale:
    [1] C/UTF-8/C/C/C/C
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
    other attached packages:
    [1] DEXSeq_1.4.0 Biobase_2.18.0 BiocGenerics_0.4.0
    loaded via a namespace (and not attached):
    [1] RCurl_1.95-1.1 XML_3.95-0.1 biomaRt_2.14.0 hwriter_1.3 plyr_1.7.1
    [6] statmod_1.4.16 stringr_0.6.1

    Any suggestion on what is going on?

    Cheers
    Olivier

    Leave a comment:


  • FuzzyCoder
    replied
    Thank you!

    I will try it with additional replicates. I only used one per condition because I wanted to work my way through the GSNAP -> DEXSeq workflow successfully before I ran all the replicates through GSNAP (~12 hours per replicate). I will let you know how it goes tomorrow.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 10:28 AM
0 responses
9 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 07:35 AM
0 responses
12 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-22-2024, 02:06 PM
0 responses
8 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-14-2024, 07:03 AM
0 responses
28 views
0 likes
Last Post seqadmin  
Working...
X