  • dena.dinesh
    Member
    • Feb 2013
    • 58

    Count data from samtools idxstats

    Hi

    I have sorted and indexed .bam files for my samples, and I used samtools idxstats to obtain the number of mapped and unmapped reads for each.

    Can I take the reference_name column and the number-of-mapped-reads column from the tab-delimited text files that samtools idxstats generated for each sample, combine them into a count file, and use it for subsequent analysis?

    Or is there any other way to create a count data file from samtools idxstats output and use it in DESeq2?
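
A minimal Python sketch of the merge step asked about here, assuming each sample has its own idxstats text file with the standard four tab-separated columns (reference name, sequence length, mapped reads, unmapped reads). The function names and sample labels are hypothetical and not part of samtools. Note that idxstats applies no mapping-quality filter, so these counts are not restricted to uniquely mapped reads.

```python
import csv


def parse_idxstats(lines):
    """Return {reference_name: mapped_reads} from `samtools idxstats` output lines."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and the final "*" row (reads with no reference).
        if len(fields) < 4 or fields[0] == "*":
            continue
        counts[fields[0]] = int(fields[2])  # third column = mapped reads
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {reference: count}} into (names, samples, rows).

    References missing from a sample are filled with 0.
    """
    names = sorted(set().union(*per_sample.values()))
    samples = list(per_sample)
    rows = [[per_sample[s].get(n, 0) for s in samples] for n in names]
    return names, samples, rows


def write_count_matrix(per_sample, out_path):
    """Write the combined matrix as a tab-delimited file with a header row."""
    names, samples, rows = build_count_matrix(per_sample)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["reference_name"] + samples)
        for name, row in zip(names, rows):
            writer.writerow([name] + row)
```

To use it, open each idxstats file, pass the file handle to `parse_idxstats`, collect the results in a dictionary keyed by sample name, and hand that dictionary to `write_count_matrix`.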
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Did you map to the transcriptome? In any case, DESeq2 needs counts of uniquely assignable mappings, so you'll want to prefilter things to remove multimappers.

    BTW, if you really did map to the transcriptome then you'll probably want to use something like RSEM or eXpress to get estimated counts (these can then be used with limma after processing with voom()).
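
A rough Python sketch of one way to prefilter multimappers from bowtie2 SAM output, offered as a heuristic rather than a definitive method. It assumes that a read is "unique" when it is mapped, is not a secondary or supplementary record, has a MAPQ at or above a chosen threshold, and carries no `XS:i` tag (bowtie2 writes `XS:i` when it found another alignment for the read). The function names and the default threshold of 10 are assumptions made for illustration. When mapping to a transcriptome, a filter like this is likely to discard many reads, because isoforms of the same gene share sequence.

```python
def is_unique_alignment(sam_line, min_mapq=10):
    """Heuristic test for a uniquely mapped read in a bowtie2 SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    if flag & 0x4:  # read is unmapped
        return False
    if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
        return False
    if mapq < min_mapq:  # low confidence in the reported position
        return False
    # Optional tags start at the 12th column; XS:i marks a competing alignment.
    return not any(tag.startswith("XS:i:") for tag in fields[11:])


def filter_sam(lines, min_mapq=10):
    """Yield header lines plus records that pass is_unique_alignment.

    Mates are judged independently, so a pair can lose one mate.
    """
    for line in lines:
        if line.startswith("@") or is_unique_alignment(line, min_mapq):
            yield line
```

This works on text SAM lines only; for BAM input the records would first need to be streamed as SAM text.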

    • dena.dinesh
      Member
      • Feb 2013
      • 58

      #3
      Hi,

      Yes, I mapped to the transcriptome. I used the Trinity Perl scripts for mapping and count estimation, and I also found differentially expressed transcripts using both edgeR and DESeq2.

      Now I want to do the analysis with the usual approach: I converted the SRA files to FASTQ files and mapped them to the transcriptome using bowtie2. Then I converted the SAM files to BAM files and sorted them. Now I would like to generate a count table for all the samples.

      As you suggested, can I use RSEM or eXpress to generate the counts? Also, how do I find the multimappers, and how can I remove them?

      Why do we have to use voom() for estimated counts? I am new to RNA-Seq, please guide me!

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Yes, you can use RSEM or eXpress to generate the counts. These both deal with multimappers in a proper way so you don't need to worry about that issue.

        Estimated counts aren't integers, and their variance doesn't follow that expected for a negative binomial distribution (it would be unsurprising if the variance of the rounded counts also failed to behave like that of unique count data for many genes). voom() can handle such data since it makes no negative-binomial assumptions.
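
To illustrate why fractional counts are not a problem here, the following Python sketch reproduces only the first step of voom(), the log2 counts-per-million transformation with its 0.5 and 1 offsets. It is not a replacement for limma: voom() goes on to fit a mean-variance trend and derive precision weights, which this sketch does not attempt. The function name is hypothetical, and the library size is assumed to be the plain column sum with no normalization factors applied.

```python
import math


def voom_log_cpm(counts, lib_size=None):
    """log2-CPM for one sample: log2((count + 0.5) / (lib_size + 1) * 1e6).

    `counts` may contain non-integer estimated counts.
    """
    if lib_size is None:
        lib_size = sum(counts)  # assumption: no normalization factors
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]
```

Nothing in the transformation requires integer input, so estimated counts such as 12.37 can be passed in unchanged.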

        BTW, in the likely event that the Trinity pipeline produced fractional counts and you had to simply round those, please redo the analysis with limma and voom(). One should never round counts for edgeR/DESeq2.
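
A small Python sketch for checking whether a results file actually holds fractional counts before deciding on the analysis route. It assumes an RSEM-style tab-delimited file with a header row that includes `transcript_id` and `expected_count` columns; the function names are hypothetical. The values are kept as floats and are deliberately not rounded.

```python
import csv


def parse_expected_counts(lines, id_column="transcript_id"):
    """Return {id: expected_count} from the lines of an RSEM-style results file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column]: float(row["expected_count"]) for row in reader}


def has_fractional_counts(counts):
    """True if any count is not a whole number."""
    return any(value != int(value) for value in counts.values())
```

An open file handle can be passed directly as `lines`; for a genes-level file, `id_column` would be set to `gene_id`.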

        • dena.dinesh
          Member
          • Feb 2013
          • 58

          #5
          So I can run rsem-calculate-expression on the BAM files and later feed the .isoforms.results files into rsem-generate-data-matrix to get the count matrix (raw fragment counts) and the TMM matrix (normalised FPKM expression values). Then I use the voom() transformation from the limma package to convert them into log-counts and later introduce them into DESeq2. Did I get that right?

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Close. You will (or at least should) never use DESeq2 (or edgeR or DESeq) with this data. You will (or "should", if you prefer) use limma instead.

            • dena.dinesh
              Member
              • Feb 2013
              • 58

              #7
              I got your point. I will never use DESeq2 or edgeR when I use RSEM; instead I will use the classical limma approach.

              Just for information, the Trinity package uses "align_and_estimate_abundance.pl", which preps the reference and then aligns the FASTQ files to the transcriptome. The BAM files generated can be fed directly into RSEM or eXpress to generate the genes and isoforms .results files. Then "estimate_abundance.pl" is used to generate the raw counts matrix and the TMM-normalized FPKM values. The raw counts were then passed to "run_DE_analysis.pl", choosing either "DESeq" or "edgeR" as the option. Still, that produced a list of differentially expressed transcripts.

              I tried once converting the estimated counts for the samples into integers in R and then feeding them into DESeq2, but I was not sure whether I was doing it the right way or not.

              Thanks mate, I will try the limma method on the RSEM count data.
