Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • yangliao
    Member
    • May 2013
    • 13

    #76
    Dear Sindrle,

    Sorry for the crash of R -- did it crash within a function of Rsubread or somewhere else? The ".featureCount" files are indeed very big -- they contain the assignment result of each single read (or fragment) in the input SAM or BAM files, and is uncompressed. That is why they are usually larger than the input BAM files.

    The format of the GTF file seemed perfectly correct. I tried to put the first couple of lines you provided into a GTF file, and ran featureCounts on a paired-end BAM file. The result seemed correct -- featureCounts loaded 3 features and 2 meta-features from the GTF file, and assigned 286 reads to the 2 meta-features. The R object returned from featureCounts has the correct numbers on each meta-feature.

    The only problem is that your command line has an additional parameter "GTF.featureType=featureGENE". What was the value of variable featureGENE? The value should be "exon" for the GTF file you're using.

    Cheers,

    Yang


    Code:
            ==========     _____ _    _ ____  _____  ______          _____
            =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
              =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
                ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
                  ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
            ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
           Rsubread 1.12.6
    
    //========================== featureCounts setting ===========================\\
    ||                                                                            ||
    ||             Input files : 1 BAM file                                       ||
    ||                           o A.bam                                          ||
    ||                                                                            ||
    ||             Output file : ./.Rsubread_featureCounts_pid24234               ||
    ||             Annotations : in.gtf (GTF)                                     ||
    ||                                                                            ||
    ||                 Threads : 1                                                ||
    ||                   Level : meta-feature level                               ||
    ||              Paired-end : no                                               ||
    ||         Strand specific : no                                               ||
    ||      Multimapping reads : not counted                                      ||
    || Multi-overlapping reads : not counted                                      ||
    ||                                                                            ||
    \\===================== [url]http://subread.sourceforge.net/[/url] ======================//
    
    //================================= Running ==================================\\
    ||                                                                            ||
    || Load annotation file in.gtf ...                                            ||
    ||    Number of features is 3                                                 ||
    ||    Number of meta-features is 2                                            ||
    ||    Number of chromosomes is 1                                              ||
    ||                                                                            ||
    || Process BAM file A.bam...                                                  ||
    ||    Assign reads to features...                                             ||
    ||    Total number of reads is : 1124980                                      ||
    ||    Number of successfully assigned reads is : 286 (0.0%)                   ||
    ||    Running time : 0.05 minutes                                             ||
    ||                                                                            ||
    ||                         Read assignment finished.                          ||
    ||                                                                            ||
    \\===================== [url]http://subread.sourceforge.net/[/url] ======================//




    Originally posted by sindrle View Post
    Im a bit disappointed, after finishing read summary after 5.5 hours, R just crashed.. R version 3.0.2 Rsubread version 1.12.6.

    Maybe I can read the "reported output", the .featureCount files thats 2x the size of the BAMs (!)

    Heres the GTF:

    chr1 unknown CDS 69091 70005 . + 0 gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown exon 69091 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown start_codon 69091 69093 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown stop_codon 70006 70008 . + . gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1 unknown exon 134773 139696 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
    chr1 unknown exon 139790 139847 . - . gene_id "LOC729737"; gene_name "LOC729737"; transcript_id "NR_039983"; tss_id "TSS18541";
    Last edited by yangliao; 02-01-2014, 05:21 AM.

    Comment

    • sindrle
      Senior Member
      • Aug 2013
      • 266

      #77
      Originally posted by yangliao View Post
      The only problem is that your command line has an additional parameter "GTF.featureType=featureGENE". What was the value of variable featureGENE? The value should be "exon" for the GTF file you're using.
      I see!

      R crashed making the "$counts" table.

      featureGENE = "gene_id"

      My goal was to count reads "at gene level", thus adding all exon reads for one gene together.

      Is it, be the way, possible to count more than one feature in each run?

      Comment

      • yangliao
        Member
        • May 2013
        • 13

        #78
        It is good that we have found the problem.

        Every meta-feature in featureCounts can have one or multiple features. Each row in the GTF file is a feature if the third column equals the feature type you wanted (e.g., "exon", or any value defined in variable "GTF.featureType"). FeatureCounts then groups the features that have the same meta-feature-id into meta-features.

        For example, if a row in the GTF file contains gene_id "LOC729737", then the meta-feature-id for this feature is LOC729737.

        If the attribute name is not "gene_id" in the GTF file, you need to specify the correct attribute name. For example, if a row in the GTF file contains feature-name "LOC729737", you need to give a parameter telling featureCounts the new attribute name: GTF.attrType="feature-name", so that featureCounts can find the correct meta-feature-id by searching the attribute name (feature-name) in every row in the GTF file.

        Thereby, if you want to assign reads on meta-feature level, please set GTF.attrType="gene_id", GTF.featureType="exon" and useMetaFeatures=TRUE.

        If you want to assign reads on feature level, please set all variables the same values as above, but only useMetaFeatures=FALSE. FeatureCounts will do what it should do.

        FeatureCounts counts reads for every gene in one single run. The returned R object contains a count table where the rows are meta-features (or features, depending on what you need) , and the columns are the input SAM/BAM file names. Every number in the table is the number of reads (or fragments if you specified isPairedEnd=TRUE) in that input file assigned to that gene.

        Cheers,

        Yang

        Originally posted by sindrle View Post
        I see!

        R crashed making the "$counts" table.

        featureGENE = "gene_id"

        My goal was to count reads "at gene level", thus adding all exon reads for one gene together.

        Is it, be the way, possible to count more than one feature in each run?

        Comment

        • sindrle
          Senior Member
          • Aug 2013
          • 266

          #79
          Thank you for the clarification! Then I know what I did wrong, and maybe thats why it crashed..

          Comment

          • sindrle
            Senior Member
            • Aug 2013
            • 266

            #80
            Quick question, Im trying to run featureCounts with DEXSeq.

            > head(adiExons$counts)
            D104.bam D121.bam D153.bam D155.bam D161.bam D162.bam D167.bam D173.bam
            WASH7P 0 0 0 0 0 0 0 0
            WASH7P 9 4 2 6 4 5 5 7
            WASH7P 8 1 2 5 0 1 0 2

            How do I know which exon these are? Or more importantly, how does DEXSeq get the information it needs?

            Comment

            • yangliao
              Member
              • May 2013
              • 13

              #81
              Hi Sindrle,

              If you look at the annotation component in the returned R object, you will find another table containing the details of the annotations.

              Each row in adiExons$counts corresponds to the same row in adiExons$annotation, so you can tell which count number is for which exon. The information in the annotation table (e.g., chromosome name, start, stop, gene name, strand symbol and exon length) can be used in down stream analysis.

              You may also notice that the order of the rows in adiExons$annotation is exactly the same as the order of the exons in the input GTF file.

              Cheers,

              Yang

              Originally posted by sindrle View Post
              Quick question, Im trying to run featureCounts with DEXSeq.

              > head(adiExons$counts)
              D104.bam D121.bam D153.bam D155.bam D161.bam D162.bam D167.bam D173.bam
              WASH7P 0 0 0 0 0 0 0 0
              WASH7P 9 4 2 6 4 5 5 7
              WASH7P 8 1 2 5 0 1 0 2

              How do I know which exon these are? Or more importantly, how does DEXSeq get the information it needs?

              Comment

              • sindrle
                Senior Member
                • Aug 2013
                • 266

                #82
                Ah! Now I get it, very nice.

                One more question, where is the information about "gene length" stored?

                Comment

                • yangliao
                  Member
                  • May 2013
                  • 13

                  #83
                  The values in the "Gene Length" column are the length of each feature (on feature mode) or the total length of all the features belonging to each gene (on meta-feature mode).

                  It is straightforward on the feature mode: gene length = stop - start + 1. However, on the meta-feature mode, the total length is the uniquely covered length of the gene; namely, if two exons in this gene have an overlapping part, the overlapping part is measured only once in the Gene Length column.

                  For example, on the meta-feature mode, if a gene has three exons: [100, 200], [100, 300] and [200, 250], the Gene Length value for this gene is only 201 because the three exons overlap with each other, and they can be merged into one interval on the chromosome: [100, 300], which has 201 bases.

                  Originally posted by sindrle View Post
                  Ah! Now I get it, very nice.

                  One more question, where is the information about "gene length" stored?

                  Comment

                  • sindrle
                    Senior Member
                    • Aug 2013
                    • 266

                    #84
                    Thank you, you have been very helpful.

                    Im currently struggling with how to implement featureCounts to DEXseq, its much easier to use HTSeq, since its described in the manual.

                    I really want to be able to use featureCounts, since with HTSeq I had to "hack" the python script to work with my annotation file, whereas I don't need that with featureCounts.

                    Also, I want to be able, in the end, to present expression values of transcripts as counts/library size/gene length, which also is easier with featureCounts, since it outputs gene length!

                    Comment

                    • amitm
                      Member
                      • Feb 2011
                      • 52

                      #85
                      Using pre-sorted SAM in featureCounts?

                      hello featureCounts users.
                      I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                      Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                      Is there a way?

                      thanks

                      Comment

                      • yangliao
                        Member
                        • May 2013
                        • 13

                        #86
                        Dear Amitm,

                        Thank you for using featureCounts. If you wish to count the reads on the paired-end mode, and if the the input SAM or BAM files are not sorted by the read names, featureCounts has to sort the reads according to their names, so that it can count both reads in a pair at the same time. If, however, the reads are already sorted by names (like SAM or BAM files from subread, bowtie, bwa and etc), subread can count the reads without re-sorting them.

                        You can bypass the sorting step by turning off paired-end counting, or using subtools (provided in the /bin/utilities directory in subread-1.4.3 and latter versions) to sort the SAM file by read names before running featureCounts.

                        Cheers,

                        Yang

                        Originally posted by amitm View Post
                        hello featureCounts users.
                        I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                        Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                        Is there a way?

                        thanks
                        Last edited by yangliao; 02-07-2014, 12:01 PM. Reason: typo

                        Comment

                        • amitm
                          Member
                          • Feb 2011
                          • 52

                          #87
                          hi yangliao,
                          Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                          Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                          I guess I would bear with the extra time. Thank you for a really good tool!

                          Comment

                          • yangliao
                            Member
                            • May 2013
                            • 13

                            #88
                            Both reads in a pair are counted on the Single-end mode even if they are assigned to different meta-features.

                            Thank you for using our tools

                            Originally posted by amitm View Post
                            hi yangliao,
                            Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                            Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                            I guess I would bear with the extra time. Thank you for a really good tool!

                            Comment

                            • shi
                              Wei Shi
                              • Feb 2010
                              • 236

                              #89
                              Originally posted by amitm View Post
                              hello featureCounts users.
                              I have a bunch of "name-sorted" SAM files which I used for HTSeq. I'm trying featureCounts now but I cant find an option to turn off sorting.

                              Though much faster than HTSeq, the program spends major portion of its run time in sorting. If I could turn off sorting I could get results like superfast..

                              Is there a way?

                              thanks
                              featureCounts only performs read reordering when it is necessary, ie. when reads are not properly paired and/or there are missing mates. Note that sorting a SAM file by name using a tool like samtools does not guarantee that reads are properly paired after sorting. For example, reads that were reported multiple times in the SAM file may still not be properly paired after being sorted by read names.

                              If all the reads in a SAM/BAM file are properly paired, featureCounts will not perform read reordering at all although it will spend a very small amount of time to check if reads are properly paired or not. It is very important to check this, otherwise the read summarization results might be incorrect and the users may not be aware of it.

                              Comment

                              • shi
                                Wei Shi
                                • Feb 2010
                                • 236

                                #90
                                Originally posted by amitm View Post
                                hi yangliao,
                                Thanks for the reply. Ok, so I understand that in non-PE mode, sorting won't be performed. But in that case neither -B and -C flags would have any meaning.
                                Both the flags are really good for stringency/ QC. Moreover in non-PE mode, reads where R1 is in one meta-feature and R2 in another, would both get counted. That is to say the multi-overlap filter would become less effective.

                                I guess I would bear with the extra time. Thank you for a really good tool!
                                I agree it is better to count read pairs (PE mode) rather than individual reads for paired end read data.

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  06-02-2026, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Today, 08:59 AM
                                0 responses
                                7 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 12:03 PM
                                0 responses
                                21 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-02-2026, 11:40 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                29 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...