
  • #16
    Or remove the "chr" prefix from your alignment file, whichever is easier. But the names must match!
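A minimal base-R sketch of the check suggested here: compare the chromosome names used in the alignment with those in the annotation and, if only a "chr" prefix differs, strip it. The name vectors are made-up examples and `strip_chr` is a hypothetical helper; in practice the names would come from the SAM header (@SQ SN: lines) and the first column of the flattened GFF file.

```r
# Hypothetical helper: drop a leading "chr" from chromosome names,
# e.g. "chr1" -> "1"; names without the prefix are left unchanged.
strip_chr <- function(x) sub("^chr", "", x)

# Made-up example names: UCSC style in the alignment, Ensembl style in the GFF.
bam_chroms <- c("chr1", "chr2", "chrX", "chrM")
gff_chroms <- c("1", "2", "X", "MT")

# Alignment chromosomes that still have no counterpart in the annotation.
# Stripping the prefix is not always enough: UCSC "chrM" becomes "M",
# while Ensembl names the mitochondrial chromosome "MT".
setdiff(strip_chr(bam_chroms), gff_chroms)
```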



    • #17
      Originally posted by areyes
      Check the function read.HTSeqCounts from DEXSeq. This will read these files, parse the relevant information for the package and return an ExonCountSet object in your R session.
      I am new to DEXSeq. I have a set of separate .txt files for each sample in my R folder. How do I compile these into the samples data frame in R in order to create an exon count set?



      • #18
        function read.table

        Check also this:
        http://egret.psychol.cam.ac.uk/stati...eringdata.html



        • #19
          Originally posted by areyes
          function read.table

          Check also this:

          http://egret.psychol.cam.ac.uk/stati...eringdata.html
          I have used deserve and the read.csv function successfully. However, now I have several tables of data. Do I merge them in R?



          • #20
            ?cbind

            Alejandro
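Following the read.table and ?cbind suggestions, here is a minimal base-R sketch of combining one count file per sample into a single matrix. It assumes each file has two tab-separated columns (a feature ID and a count) and no header; the file names and contents are made up and written to a temporary directory so the example runs on its own. For building the ExonCountSet itself, read.HTSeqCounts does this parsing for you.

```r
# Write two small, made-up count files (feature ID <TAB> count, no header).
countDir <- file.path(tempdir(), "count_merge_demo")
dir.create(countDir, showWarnings = FALSE)
writeLines(c("ENSG01:001\t10", "ENSG01:002\t0", "ENSG02:001\t7"),
           file.path(countDir, "ctrl_1.txt"))
writeLines(c("ENSG01:001\t12", "ENSG01:002\t3", "ENSG02:001\t5"),
           file.path(countDir, "kd_1.txt"))

files <- file.path(countDir, c("ctrl_1.txt", "kd_1.txt"))

# Read each file into a two-column data frame.
tables <- lapply(files, read.table, sep = "\t", header = FALSE,
                 col.names = c("feature", "count"), stringsAsFactors = FALSE)

# cbind() lines columns up by position only, so first make sure every file
# lists the same features in the same order.
sameFeatures <- vapply(tables,
                       function(t) identical(t$feature, tables[[1]]$feature),
                       logical(1))
stopifnot(all(sameFeatures))

# One column per sample, one row per feature.
counts <- do.call(cbind, lapply(tables, `[[`, "count"))
dimnames(counts) <- list(tables[[1]]$feature, sub("\\.txt$", "", basename(files)))
counts
```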



            • #21
              Hi there,

              We want to do some alternative splicing analysis with DEXSeq, and I'm at the data preparation step. We have quite big files (80 GB per replicate after converting the 15 GB .bam file to SAM). How many GB of RAM are needed to sort them (--mem-per-cpu=)? Is it 80 GB, 160 GB (hopefully not...), or is the sorting done sequentially with small RAM allocations?

              Code:
              sort -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
              The hardware configuration is: http://csc.uni-frankfurt.de/index.php?id=60&L=2

              Thanks!
              Ben

              Answer (CSC support)

              The issue was related to the SLURM script. It turned out that increasing the disk space allocated to the node via
              Code:
              #SBATCH --tmp=200000
              followed by
              Code:
              srun sort --temporary-directory=/local/$SLURM_JOB_ID -k1,1 -k2,2n ctrl_1.sam > ctrl_1_sorted.sam
              solved the problem.
              Last edited by kben; 10-19-2012, 03:21 AM. Reason: answer added
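The key specification `-k1,1 -k2,2n` sorts by column 1 of the SAM file (QNAME, the read name) and then numerically by column 2 (FLAG). On the memory question, GNU sort performs an external merge sort and writes temporary files when the input does not fit in its buffer, which fits the fix above being more temporary disk space rather than more RAM. The toy R sketch below only reproduces the ordering on a few made-up records, to show why the `n` matters: compared as text, "163" sorts before "83".

```r
# Made-up SAM-like records: column 1 = QNAME (read name), column 2 = FLAG.
sam <- data.frame(qname = c("readB", "readA", "readB", "readA"),
                  flag  = c(147L, 163L, 99L, 83L),
                  stringsAsFactors = FALSE)

# Like `sort -k1,1 -k2,2n`: by read name, then numerically by FLAG,
# so the two mates of each read end up next to each other.
by_name <- sam[order(sam$qname, sam$flag), ]

# Same, but comparing FLAG as text (i.e. without the `n` modifier).
as_text <- sam[order(sam$qname, as.character(sam$flag)), ]

by_name$flag   # 83 163 99 147
as_text$flag   # 163 83 147 99
```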



              • #22
                I have a question concerning DEXSeq runs on a cluster and saving the output calculated on the node. Is it sufficient to have
                Code:
                save(ecs, file="ecs_ctrlvskd_.RData")
                at the end of the invoked R script, in order to work with the .RData file on a slower machine subsequently?

                Thanks!

                Ben

                The R script so far (I'm new to R, so it's not fine-tuned):

                Code:
                library(multicore)
                library(DEXSeq)
                setwd("/scratch/therapy/bkoch/dexseq/input")
                annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
                samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=c(1:3,1:3),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
                fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
                ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)
                ecs<- estimateSizeFactors(ecs)
                ecs<- estimateDispersions(ecs)
                ecs<- fitDispersionFunction(ecs)
                ecs<- estimatelog2FoldChanges(ecs)
                test<- testForDEU(ecs)
                res1<- DEUresultTable(test) 
                save(ecs, file="ecs_ctrlvskd_.RData")
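One thing worth checking in a script like this: list.files() returns file names in alphabetical order, and if the count files are matched to the rows of the design data frame by position, the two orders have to agree. Also, `pattern` is a regular expression, so in "sorted.txt" the dot matches any character. A minimal base-R sketch, assuming a hypothetical "<sample>_sorted.txt" naming scheme and empty placeholder files in a temporary directory:

```r
# Create empty placeholder files using a hypothetical "<sample>_sorted.txt" scheme.
countDir <- file.path(tempdir(), "dexseq_files_demo")
dir.create(countDir, showWarnings = FALSE)
sampleNames <- c("ctrl_1", "ctrl_2", "ctrl_3", "kd_1", "kd_2", "kd_3")
invisible(file.create(file.path(countDir, paste0(sampleNames, "_sorted.txt"))))

# `pattern` is a regular expression: escape the dot and anchor the end.
fullFilenames <- list.files(countDir, full.names = TRUE, pattern = "_sorted\\.txt$")

# Recover the sample name from each file name.
fileSamples <- sub("_sorted\\.txt$", "", basename(fullFilenames))

# Design rows in the order the samples should be read.
design <- data.frame(condition = rep(c("ctrl", "kd"), each = 3),
                     row.names = sampleNames, stringsAsFactors = TRUE)

# Put the files in the same order as the design rows, then check the match.
fullFilenames <- fullFilenames[match(rownames(design), fileSamples)]
stopifnot(!anyNA(fullFilenames),
          identical(sub("_sorted\\.txt$", "", basename(fullFilenames)),
                    rownames(design)))
```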



                • #23
                  Or do the lines
                  Code:
                  save(test, file="test_ctrlvskd_.RData")
                  save(res1, file="res1_ctrlvskd_.RData")
                  have to be added as well?
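For reference, base R's save() accepts several objects in one call, and load() restores them under their original names in a later session (e.g. on the slower machine). A small sketch with stand-in objects; `ecs` and `res1` here are placeholder values, not real DEXSeq objects, and the file names are made up.

```r
# Placeholder objects standing in for the DEXSeq results.
ecs  <- list(note = "stand-in for the ExonCountSet")
res1 <- data.frame(exon = c("E001", "E002"), padjust = c(0.01, 0.5))

out <- file.path(tempdir(), "ctrlvskd_demo.RData")
save(ecs, res1, file = out)   # several objects in one file

rm(ecs, res1)                 # simulate a fresh session
loaded <- load(out)           # restores the objects and returns their names
loaded                        # "ecs" "res1"

# saveRDS()/readRDS() store a single object and let you pick its name on reload.
saveRDS(res1, file.path(tempdir(), "res1_demo.rds"))
res1_again <- readRDS(file.path(tempdir(), "res1_demo.rds"))
```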



                  • #24
                    Hi kben,

                    Saving the "ecs" object like that should work; "res1" is easy to get from the ecs object on a slower machine.

                    Are you having trouble getting the run to complete? If your machine has multiple cores, you could use the nCores argument of the functions estimateDispersions and testForDEU to parallelize DEXSeq across several cores. Discarding genes with many exons (e.g. more than 150) will also help; unfortunately, a single gene with many exons can take a long time.

                    Alejandro
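The suggestion to discard genes with very many exons amounts to a count-and-filter on a per-exon table. A minimal base-R sketch; the `exons` data frame and its `geneID` column are hypothetical stand-ins for the per-exon annotation, and the threshold of 150 is the one suggested above.

```r
# Hypothetical per-exon table: one row per exonic bin, with its gene ID.
exons <- data.frame(
  geneID = c(rep("geneA", 3), rep("geneB", 200), rep("geneC", 12)),
  stringsAsFactors = FALSE
)

maxExons <- 150
exonsPerGene <- table(exons$geneID)                      # exons per gene
bigGenes <- names(exonsPerGene)[exonsPerGene > maxExons]

# Keep only the exons of genes at or below the threshold.
kept <- exons[!exons$geneID %in% bigGenes, , drop = FALSE]
bigGenes     # "geneB"
nrow(kept)   # 15
```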



                    • #25
                      Hi Alejandro,

                      Awesome, thanks! I will use these lines and load library(parallel) instead of multicore (I checked your nice vignette)!

                      Code:
                      ecs <- estimateDispersions(ecs, nCores=24)
                      ecs <- testForDEU(ecs, nCores=24)
                      Ben
                      Last edited by kben; 10-19-2012, 04:38 AM. Reason: correction



                      • #26
                        I'm apparently running into trouble with DEXSeq and library(parallel) on the cluster. Could you please give me a hint on how to fix it?

                        Thanks a lot!
                        Ben

                        Error log:
                        ...Error in function (classes, fdef, mtable) :
                        unable to find an inherited method for function "fData", for signature "character"
                        Calls: estimateDispersions ... .local -> divideWork -> rownames -> fData -> <Anonymous>
                        In addition: Warning message:
                        In parallel::mclapply(allecs, FUN = funtoapply, mc.cores = mc.cores) :
                        all scheduled cores encountered errors in user code
                        Execution halted
                        srun: error: node1-012: task 0: Exited with exit code 1
                        The R script:

                        Code:
                        library(DEXSeq)
                        setwd("/scratch/therapy/bkoch/dexseq/input")
                        annotationfile = file.path("/scratch/therapy/bkoch/dexseq/GRCh37_Ensembl_DEXSeq.gff")
                        samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
                        fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
                        ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)
                        ecs<- estimateSizeFactors(ecs)
                        library(parallel)
                        ecs<- estimateDispersions(ecs, nCores=24)
                        ecs<- fitDispersionFunction(ecs)
                        ecs<- estimatelog2FoldChanges(ecs)
                        test<- testForDEU(ecs, nCores=24)
                        res1<- DEUresultTable(test) 
                        save(ecs, file="/scratch/therapy/bkoch/dexseq/output/ecs_ctrlvskd_.RData")
                        sessionInfo() output (when I start R after logging in to the cluster):

                        R version 2.15.1 (2012-06-22)
                        Platform: x86_64-redhat-linux-gnu (64-bit)

                        locale:
                        [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
                        [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
                        [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
                        [7] LC_PAPER=C LC_NAME=C
                        [9] LC_ADDRESS=C LC_TELEPHONE=C
                        [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

                        attached base packages:
                        [1] parallel stats graphics grDevices utils datasets methods
                        [8] base

                        other attached packages:
                        [1] DEXSeq_1.4.0 Biobase_2.18.0 BiocGenerics_0.4.0

                        SLURM job script

                        Code:
                        #!/bin/bash
                        #SBATCH --ntasks=1
                        #SBATCH --cpus-per-task=24
                        #SBATCH --error=ecs_err.log
                        #SBATCH --output=ecs_out.log
                        #SBATCH --job-name=ecs
                        #SBATCH --mem-per-cpu=2000
                        #SBATCH --partition=parallel
                        #SBATCH --time=7-00:00:00
                        #
                        export OMP_NUM_THREADS=24
                        srun R --save --file=/scratch/therapy/bkoch/dexseq/dexseq_cluster.R --output=/scratch/therapy/dexseq/output
                        Last edited by kben; 10-19-2012, 10:45 AM. Reason: sessionInfo added
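Instead of hard-coding nCores=24 in the R script, one option is to read the core count SLURM allocated from the SLURM_CPUS_PER_TASK environment variable, which SLURM sets when --cpus-per-task is given. A minimal sketch; `coresFromSlurm` is a hypothetical helper, and falling back to 1 core outside SLURM is an arbitrary choice.

```r
# Hypothetical helper: number of cores granted by SLURM (--cpus-per-task),
# falling back to `default` when the variable is unset or not a number.
coresFromSlurm <- function(default = 1L) {
  n <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")))
  if (is.na(n) || n < 1L) default else n
}

nCores <- coresFromSlurm()
# e.g. ecs <- estimateDispersions(ecs, nCores = nCores)
```

This keeps the R script and the `#SBATCH --cpus-per-task` line from drifting apart when one of them is changed.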



                        • #27
                          The cores error (if I didn't mix something up and it's a real error) could be related to this one:


                          Update:

                          nCores=24 => run failed after 9 min
                          nCores=12 => failed after 10 min
                          nCores=4 => failed after 7 min

                          Without library(parallel) and the nCores argument: running for > 100 min now.
                          Last edited by kben; 10-19-2012, 12:29 PM. Reason: addendum
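The generic message "all scheduled cores encountered errors in user code" hides the actual error raised in the worker processes. A minimal base-R sketch (not DEXSeq's internals) of one way to surface it: wrap the per-item function in tryCatch() so each result is either a value or the error message. The toy function `fitOne` is hypothetical and fails on purpose for one input; since mclapply() relies on forking, the sketch uses a single core on Windows.

```r
library(parallel)

# Hypothetical per-gene function that fails for one input.
fitOne <- function(g) {
  if (g == "geneB") stop("underdetermined model for ", g)
  nchar(g)
}

# Return either the value or the error message, instead of letting the
# worker fail with only a generic warning.
safeFit <- function(g) tryCatch(fitOne(g), error = function(e) conditionMessage(e))

cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(c("geneA", "geneB", "geneC"), safeFit, mc.cores = cores)

failed <- vapply(res, is.character, logical(1))
res[failed]   # the real error message(s)
```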



                          • #28
                            Probably my R script isn't correct. The run with only one core also failed, after 120 min.

                            Error log:
                            Code:
                            Error in FUN(c("ENSG00000000003", "ENSG00000000419", "ENSG00000000457",  :
                              Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.
                            Calls: estimateDispersions -> estimateDispersions -> .local -> lapply -> FUN
                            In addition: Warning messages:
                            1: In .local(object, ...) :
                              Exons with less than 10 counts will not be tested. For more details please see the manual page of 'estimateDispersions', parameter 'minCount'
                            2: In .local(object, ...) :
                              Genes with more than 70 testable exons will be omitted from the analysis. For more details please see the manual page of 'estimateDispersions', parameter 'maxExon'.
                            Execution halted
                            srun: error: node2-041: task 0: Exited with exit code 1
                            Regarding the error log's statement about replicates (we have 3 biological replicates of the control and 3 of the knockdown, each with > 100 million clusters): they are specified as follows. Is this OK?

                            Code:
                            samples = data.frame(condition = c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl1","ctrl2","ctrl3","kd1","kd2","kd3"),stringsAsFactors=TRUE,check.names = FALSE)
                            fullFilenames<- list.files("/scratch/therapy/bkoch/dexseq/input/",full.names=TRUE,pattern="sorted.txt")
                            ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = samples,flattenedfile = annotationfile)
                            Thanks!
                            Ben
                            Last edited by kben; 10-19-2012, 12:59 PM.



                            • #29
                              After changing the condition specification, the nCores=24 R script finished within 3 hours. So the problems resulted from the condition naming.

                              Successful data.frame creation:
                              Code:
                              > samples = data.frame(condition = c("ctrl","ctrl","ctrl","kd","kd","kd"),replicate=factor(c(1:3,1:3)),row.names=c("ctrl_1","ctrl_2","ctrl_3","kd_1","kd_2","kd_3"),stringsAsFactors=TRUE,check.names = FALSE)
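Why the naming mattered: with condition = c("ctrl1", ..., "kd3") every sample is its own condition, so no condition has replicates and nothing is left over to estimate the dispersion from, consistent with the "Underdetermined model" error above. A base-R sketch that counts the residual degrees of freedom (samples minus design-matrix rank) for a simple ~ condition model; this is a simplified illustration, not DEXSeq's actual exon-level model.

```r
# Design as in the failing runs: every sample is its own condition.
wrong <- data.frame(condition = c("ctrl1", "ctrl2", "ctrl3", "kd1", "kd2", "kd3"),
                    stringsAsFactors = TRUE)
# Design as in the working run: two conditions, three replicates each.
right <- data.frame(condition = c("ctrl", "ctrl", "ctrl", "kd", "kd", "kd"),
                    stringsAsFactors = TRUE)

# Residual degrees of freedom of a ~ condition model:
# number of samples minus the rank of the design matrix.
residualDf <- function(design) {
  X <- model.matrix(~ condition, data = design)
  nrow(X) - qr(X)$rank
}

residualDf(wrong)   # 0: nothing left to estimate variability from
residualDf(right)   # 4
```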



                              • #30
                                Parallel

                                Originally posted by Thomas Ryan
                                I am new to DEXSeq. I have a set of separate .txt files for each sample in my R folder. How do I compile these into the samples data frame in R in order to create an exon count set?
                                I have encountered a problem with estimateDispersions using library(parallel) with 8 cores: after 14000 genes I get the message "In mccollect(children(jobs), FALSE): restarting interrupted promise evaluation".
                                Any suggestions on how to get around this? Running estimateDispersions on a single core would take quite a while.
                                Tom
