  • JonB
    Member
    • Jan 2010
    • 85

    DESeq: question about using HTSeq counts

    In the count files from HTSeq there are a few lines at the end:
    __no_feature
    __ambiguous
    __too_low_aQual
    __not_aligned
    __alignment_not_unique

    Should these lines be removed before loading into DESeq? I seem to get slightly different normalized counts when I create a count data set using newCountDataSet on a count table or newCountDataSetFromHTSeqCount directly on the HTSeq counts. Could this be due to these last lines?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    The DESeq2 functions that load those files remove those lines for you.

    • JonB
      Member
      • Jan 2010
      • 85

      #3
      When I do

      tail(counts(cds, normalized=TRUE))

      I see that these lines are there, but maybe they are not taken into account when doing analyses in DESeq?
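If the `__` rows are present in the count table, they would take part in a median-of-ratios size factor estimate like any other row. The base-R sketch below illustrates that calculation on made-up counts. It assumes the estimate is the per-sample median of ratios to the per-row geometric mean; the helper name `size_factors` and all values are hypothetical, and this is not DESeq's own code.

```r
# Toy count matrix; gene names and values are invented for illustration.
counts_mat <- matrix(
  c(  10,   20,
     100,  200,
      50,  100,
    1000, 1000,
     300,  300,
     400,  400),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC",
      "__no_feature", "__ambiguous", "__alignment_not_unique"),
    c("sample1", "sample2")
  )
)

# Hypothetical helper: median-of-ratios size factors.
# Rows whose log geometric mean is not finite (a zero in any sample) are skipped.
size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(col) {
    exp(median((log(col) - log_geo_means)[is.finite(log_geo_means)]))
  })
}

# Size factors with the special rows included ...
sf_all <- size_factors(counts_mat)

# ... and with only the real features.
is_special <- grepl("^__", rownames(counts_mat))
sf_genes <- size_factors(counts_mat[!is_special, , drop = FALSE])

print(sf_all)
print(sf_genes)
```

In this toy case the two sets of size factors differ, which is the kind of shift that could show up as slightly different normalized counts.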

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        What function did you use to load the files and what's the output of sessionInfo()?

        • JonB
          Member
          • Jan 2010
          • 85

          #5
          > library("DESeq")
          > sampleTable = read.csv(file="Gene_count_files/sampletable.txt", header=TRUE, sep="\t")
          > cds = newCountDataSetFromHTSeqCount(sampleTable, directory="Gene_count_files/")
          > cds = estimateSizeFactors(cds)

          > sessionInfo()
          R version 3.0.2 (2013-09-25)
          Platform: x86_64-apple-darwin10.8.0 (64-bit)

          locale:
          [1] C

          attached base packages:
          [1] parallel stats graphics grDevices utils datasets methods base

          other attached packages:
          [1] DESeq_1.14.0 lattice_0.20-29 locfit_1.5-9.1 Biobase_2.22.0
          [5] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0

          loaded via a namespace (and not attached):
          [1] AnnotationDbi_1.24.0 DBI_0.2-7 RColorBrewer_1.0-5 RSQLite_0.11.4
          [5] XML_3.95-0.2 annotate_1.40.1 genefilter_1.44.0 geneplotter_1.40.0
          [9] grid_3.0.2 splines_3.0.2 stats4_3.0.2 survival_2.37-7
          [13] tools_3.0.2 xtable_1.7-3

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Try using the DESeqDataSetFromHTSeqCount() function. At least in the most recent version it strips those lines.

            • JonB
              Member
              • Jan 2010
              • 85

              #7
              Ok, thanks!

              Is it also safe to remove these lines from the raw count files or will this mess up the normalization later?

              • gringer
                David Eccles (gringer)
                • May 2011
                • 845

                #8
                Unless you have a specific reason not to, you should probably be using DESeq2 rather than DESeq -- it has better statistical models, is more flexible, and makes the process a bit easier.

                That said, I would expect that removing the lines will be fine, given that other ways of getting counts into a DESeq structure don't require unmapped read counts to be specified.

                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  Go ahead and remove them; they should be removed prior to normalization anyway. And as David said, switch to DESeq2, which has a number of improvements.
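For a count table that is already loaded in R, dropping the rows before normalization can be sketched as below. This is a minimal example on a made-up matrix; it assumes the special counters are the only row names starting with two underscores, as in the HTSeq output quoted at the top of the thread.

```r
# Toy count matrix standing in for a table read from HTSeq output;
# gene names and values are invented for illustration.
counts_mat <- matrix(
  c(  10L,   12L,
     200L,  180L,
       0L,    3L,
    5000L, 7000L,
     300L,  250L),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC", "__no_feature", "__ambiguous"),
    c("sample1", "sample2")
  )
)

# HTSeq's special counters all start with two underscores.
is_special <- grepl("^__", rownames(counts_mat))

# Keep only real features; drop = FALSE preserves the matrix shape
# even if a single row or column remains.
gene_counts <- counts_mat[!is_special, , drop = FALSE]

print(gene_counts)
```

The filtered `gene_counts` matrix is what would then be passed on to `newCountDataSet` together with the sample conditions.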

                  • JonB
                    Member
                    • Jan 2010
                    • 85

                    #10
                    Thanks guys,
                    I actually didn't know there was a DESeq2. I will check it out ASAP.

                    Comment

                    • super0925
                      Senior Member
                      • Feb 2014
                      • 206

                      #11
                      I manually remove these lines. A short script should be enough for that.
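One possible shape for such a script, written in R to match the rest of the thread: it copies a count file while skipping every line that starts with `__`. The helper name `strip_htseq_summary` is hypothetical, and the demonstration uses a temporary file with made-up counts rather than real HTSeq output.

```r
# Hypothetical helper: copy an HTSeq count file without its "__" summary lines.
strip_htseq_summary <- function(infile, outfile) {
  lines <- readLines(infile)
  writeLines(lines[!grepl("^__", lines)], outfile)
  invisible(outfile)
}

# Demonstration on a temporary file; counts are invented for illustration.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(
  c("geneA\t10",
    "geneB\t200",
    "__no_feature\t5000",
    "__ambiguous\t300",
    "__too_low_aQual\t0",
    "__not_aligned\t0",
    "__alignment_not_unique\t40"),
  infile
)

strip_htseq_summary(infile, outfile)
cleaned <- readLines(outfile)
print(cleaned)
```

Writing to a separate output file leaves the original counts untouched, so the summary numbers stay available for quality checks.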
