  • shi
    Wei Shi
    • Feb 2010
    • 236

    #46
    Hi Bruce,

    I would suggest counting reads instead of fragments to see whether you still get a large number of reads overlapping multiple genes. This will help determine whether the large number of multi-overlapping fragments you observed was due to summarization.

    If you still get a large number, then that will mean it is either a mapping problem, or a problem with your data generation, or a lot of genes in your annotation overlapping with each other.

    You can simply remove those paired-end parameters, such as -p -P -D and -C, from your command to summarize reads instead of fragments.

    Best wishes,

    Wei


    • bruce01
      Senior Member
      • Mar 2011
      • 160

      #47
      Hi Wei,

      I had looked at these before, but I am keen to keep counting fragments, as I believe that method is more accurate. I have run the read-level count for the initial trimmo SAM and for the one with all non-pairs removed:

      Code:
      ::::::::::::::
      featco.pair.read.diags
      ::::::::::::::
      35364121 ACCEPTED_GENE
      4944952 MULTI_MAPPING
      6848219 NOTFOUND_GENE
       338020 OVERLAPPED_GENES
      ::::::::::::::
      featco.read.diags
      ::::::::::::::
      35711285 ACCEPTED_GENE
      4944952 MULTI_MAPPING
      6902210 NOTFOUND_GENE
       340271 OVERLAPPED_GENES
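
To compare the two tallies above on a common scale, the counts can be converted to percentages of all records. This is only a minimal sketch: `category_shares` is a hypothetical helper name, and it assumes each tally line has the `COUNT CATEGORY` layout shown in the listing.

```python
def category_shares(tally_text):
    """Parse 'COUNT CATEGORY' lines and return {category: percent of total}."""
    counts = {}
    for line in tally_text.splitlines():
        parts = line.split()
        # Skip separators, file names and blank lines.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        counts[parts[1]] = int(parts[0])
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts copied from featco.pair.read.diags in the listing above.
pair_tally = """
35364121 ACCEPTED_GENE
4944952 MULTI_MAPPING
6848219 NOTFOUND_GENE
338020 OVERLAPPED_GENES
"""

for name, pct in category_shares(pair_tally).items():
    print(f"{name}\t{pct:.2f}%")
```

The same function can be run on the fragment-level tally to see how much the OVERLAPPED_GENES share changes between read and fragment counting.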


      • shi
        Wei Shi
        • Feb 2010
        • 236

        #48
        Hi Bruce,

        I agree it is better to count fragments for paired-end data. Looking at read counts was only meant to help diagnose the problem, and it turned out to be quite helpful.

        The percentage of multi-overlapping reads is much smaller than that of multi-overlapping fragments, suggesting that something went wrong when read pairs were being summarized. We have never seen this before when summarizing mapping results from Subread and a few other aligners.

        One possibility is that the order of the two reads from the same pair was altered in your mapping results, i.e. the second read appears before the first read. If this is the case, the read pair might be wrongly assigned. You could check a few multi-overlapping fragments to see whether this is the case.
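
A quick way to check for this is to scan the SAM file and list read names whose mates are not adjacent in first-then-second order. The sketch below is an assumption-laden illustration, not how featureCounts itself checks pairs: `misplaced_pairs` is a hypothetical helper, it reads plain SAM text, and it assumes one alignment per mate (no multi-hits). The first-in-pair and second-in-pair bits (0x40, 0x80) come from the SAM FLAG field.

```python
def misplaced_pairs(sam_lines):
    """Return read names whose mates are not adjacent as (first, second)."""
    records = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        records.append((fields[0], int(fields[1])))  # (QNAME, FLAG)

    bad = []
    i = 0
    while i < len(records):
        name, flag = records[i]
        next_name, next_flag = records[i + 1] if i + 1 < len(records) else (None, 0)
        # A well-formed pair: same name, first mate (0x40) then second mate (0x80).
        if name == next_name and flag & 0x40 and next_flag & 0x80:
            i += 2
        else:
            bad.append(name)
            i += 1
    return list(dict.fromkeys(bad))  # unique names, original order


def sam_line(name, flag):
    """Build a minimal, made-up SAM record for the example."""
    return "\t".join([name, str(flag), "chr1", "100", "60", "50M", "=", "300", "250", "*", "*"])


example = [
    "@SQ\tSN:chr1\tLN:1000",
    sam_line("r1", 99), sam_line("r1", 147),   # first, then second
    sam_line("r2", 147), sam_line("r2", 99),   # second before first
    sam_line("r3", 83), sam_line("r3", 163),   # first, then second
]
print(misplaced_pairs(example))
```

Any names reported here are candidates to compare against the multi-overlapping fragments.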

        Alternatively, you may try other aligners as well. Subread is guaranteed to work with featureCounts.

        Hope this helps.

        Best wishes,

        Wei


        • choseqid
          Junior Member
          • Apr 2013
          • 5

          #49
          Dear Wei,

          I have been trying to run featureCounts, both from R and from the command line, but I keep getting a segmentation fault. I checked the different things suggested in this discussion thread and also tried allocating more memory to the job and to the R session, but it didn't help. I use Subread 1.3.6, and here is my command line:

          featureCounts -p -P -d 50 -D 600 -a mm10/annotation/mm10.allmrna.gtf -t exon -g gene_id -b -f -i tophat_out/accepted_hits.sort.bam -o subread_counts.txt

          Here is the error message:

          /var/spool/gridengine/node-hp0211/job_scripts/1095023: line 10: 57569 Segmentation fault (core dumped)


          Have I overlooked anything?

          Thanks in advance,
          Cho
          Last edited by choseqid; 10-02-2013, 06:36 AM.


          • shi
            Wei Shi
            • Feb 2010
            • 236

            #50
            Dear Cho,

            Could you provide the complete output of your featureCounts run? It is hard to figure out what went wrong from the information provided so far.

            Also could you provide the first 100 lines of your annotation file and also the first 100 reads in your BAM file?

            Cheers,
            Wei


            • choseqid
              Junior Member
              • Apr 2013
              • 5

              #51
              Dear Wei,

              Thanks for the quick reply. Attached are the files you asked for. I am including the R output, as I do not have any output from the command line other than the error message I already quoted.

              Cheers,
              Cho
              Attached Files
              Last edited by choseqid; 10-03-2013, 03:01 AM.


              • ddb
                Member
                • Feb 2012
                • 13

                #52
                "featureCounts requires that for paired-end read data both ends must be included in the SAM/BAM file and the two reads from the same pair must be next to each other."

                If this is not stated in the User Guide (I did not see it there) then it should be added as it is essential for correct functioning of the program.


                • choseqid
                  Junior Member
                  • Apr 2013
                  • 5

                  #53
                  Thanks, ddb, for the reminder. It looks like the problem lies in the way I aligned the reads with TopHat: I allowed multiple hits (which apparently hampers sorting by name) and didn't disable separate alignment reporting for unpairable reads (i.e. I didn't use --no-mixed). Would fixing these two parameters help?
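
To see how much of the alignment output is affected by these two settings, one could count records that are flagged as having an unmapped mate and records reported as multi-hits. This is a rough sketch under stated assumptions: `mapping_summary` is a hypothetical helper, it reads plain SAM text, and it relies on the aligner writing the optional `NH:i:` tag (number of reported alignments) and setting the mate-unmapped FLAG bit (0x8).

```python
def mapping_summary(sam_lines):
    """Count SAM records, multi-hit records (NH > 1) and records with an unmapped mate."""
    summary = {"records": 0, "multi_hit": 0, "mate_unmapped": 0}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        summary["records"] += 1
        # Paired read (0x1) whose mate is unmapped (0x8).
        if flag & 0x1 and flag & 0x8:
            summary["mate_unmapped"] += 1
        # Optional tags start at column 12.
        for tag in fields[11:]:
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                summary["multi_hit"] += 1
                break
    return summary


def tagged_line(name, flag, nh):
    """Build a minimal, made-up SAM record with an NH tag for the example."""
    core = [name, str(flag), "chr1", "100", "50", "50M", "=", "300", "250", "*", "*"]
    return "\t".join(core + [f"NH:i:{nh}"])


example_hits = [
    tagged_line("r1", 99, 1),
    tagged_line("r1", 147, 1),
    tagged_line("r2", 73, 3),  # mate unmapped, three reported alignments
]
print(mapping_summary(example_hits))
```

Non-zero values for either counter would indicate that some records do not fit the "both ends present and adjacent" requirement quoted above.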


                  • shi
                    Wei Shi
                    • Feb 2010
                    • 236

                    #54
                    I'm still not sure whether the paired-end reads caused the problem. You can try changing those parameters to see if it works. But you may also try counting your reads as single-end reads by NOT using the '-p' option. This will tell us whether the problem arose from handling the paired-end reads. Your command should look like this:

                    featureCounts -a mm10/annotation/mm10.allmrna.gtf -t exon -g gene_id -b -f -i tophat_out/accepted_hits.sort.bam -o subread_counts.txt

                    Wei


                    • choseqid
                      Junior Member
                      • Apr 2013
                      • 5

                      #55
                      Dear Wei,

                      I tried that command line, but it still ends in a segmentation fault. I also tried aligning my reads using Subread (which succeeded), but when I ran featureCounts on the resulting SAM file I also got a segmentation fault. The output is the same as I attached to a previous post.

                      Any more ideas?


                      • adaigle
                        Junior Member
                        • Sep 2013
                        • 6

                        #56
                        Hi, quick question. I was wondering if there was a way to get featureCounts to work on a Windows 7 OS. Going through R and Bioconductor would be perfect, but it looks like Rsubread does not have a Windows version? Is there any other way?


                        • shi
                          Wei Shi
                          • Feb 2010
                          • 236

                          #57
                          Originally posted by choseqid:
                          Dear Wei,

                          I tried that command line, but it still ends in a segmentation fault. I also tried aligning my reads using Subread (which succeeded), but when I ran featureCounts on the resulting SAM file I also got a segmentation fault. The output is the same as I attached to a previous post.

                          Any more ideas?

                          Hi,

                          Thank you for trying these options. We have found that featureCounts always works nicely with Subread, so the segmentation fault is likely due to some unexpected data in the annotation. We have also received some similar bug reports recently. The 1.3.x version of featureCounts allows up to 60 features overlapping with each other in the annotation. If the number of such features exceeds this limit, the program crashes. This is rare, but it can happen, and we suspect it is the cause of the segmentation fault seen with your data.
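
To get a feel for whether an annotation could hit such a limit, one can compute the largest number of features covering the same position on each chromosome. The post does not spell out exactly how the 1.3.x limit counts overlapping features, so this is only one plausible reading. `max_overlap_depth` is a hypothetical helper, and the sketch assumes standard tab-separated GTF lines with 1-based inclusive start and end in columns 4 and 5.

```python
from collections import defaultdict


def max_overlap_depth(gtf_lines, feature_type="exon"):
    """Return {chromosome: max number of features covering one position}."""
    events = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        start, end = int(fields[3]), int(fields[4])
        events[fields[0]].append((start, 1))     # feature opens at start
        events[fields[0]].append((end + 1, -1))  # and closes just after end

    deepest = {}
    for chrom, chrom_events in events.items():
        depth = best = 0
        # Sorting puts closing events (-1) before opening events (+1) at the
        # same position, so features that merely abut are not counted as overlapping.
        for _, delta in sorted(chrom_events):
            depth += delta
            best = max(best, depth)
        deepest[chrom] = best
    return deepest


def gtf_line(chrom, start, end, gene):
    """Build a minimal, made-up GTF exon record for the example."""
    return "\t".join([chrom, "example", "exon", str(start), str(end), ".", "+", ".",
                      f'gene_id "{gene}";'])


example_gtf = [
    gtf_line("chr1", 100, 200, "g1"),
    gtf_line("chr1", 150, 250, "g2"),
    gtf_line("chr1", 180, 300, "g3"),
    gtf_line("chr2", 10, 20, "g4"),
    gtf_line("chr2", 21, 30, "g5"),
]
print(max_overlap_depth(example_gtf))
```

A value near or above 60 for any chromosome would be consistent with the crash described above.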

                          We have removed this limit in the latest version 1.4.0 and hopefully this will solve the problem.

                          Also, if reads in your BAM file were sorted by chromosomal locations, you should include '-S' option in your command. Not doing so will not crash the program, but will result in incorrect read counts.
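
If you are unsure how a file was sorted, the SAM header may say so: the `@HD` line can carry an `SO:` field with the values `coordinate`, `queryname`, `unsorted` or `unknown`. The sketch below reads that declared value from header lines (for a BAM file, the header would first have to be dumped as text). `declared_sort_order` is a hypothetical helper, and the header only records what the producing tool declared, which may be missing or wrong.

```python
def declared_sort_order(sam_header_lines):
    """Return the SO value of the @HD header line, or 'unknown' if absent."""
    for line in sam_header_lines:
        if not line.startswith("@"):
            break  # header is over once alignment records start
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


example_header = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:195471971",
]
order = declared_sort_order(example_header)
print(order)
if order == "coordinate":
    print("Location-sorted input: consider the '-S' option mentioned above.")
```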

                          Let me know if the problem persists.

                          Wei


                          • shi
                            Wei Shi
                            • Feb 2010
                            • 236

                            #58
                            Originally posted by adaigle:
                            Hi, quick question. I was wondering if there was a way to get featureCounts to work on a Windows 7 OS. Going through R and Bioconductor would be perfect, but it looks like Rsubread does not have a Windows version? Is there any other way?

                            You are correct that Rsubread does not have a Windows version. It is quite hard to develop a Windows version of this package because most of the code is written in C. I think we might eventually come up with a Windows version, but it will take a fair bit of time. If you have access to a Unix machine, you can fairly easily use featureCounts via the Bioconductor package Rsubread.

                            Wei


                            • bw.
                              Member
                              • Mar 2012
                              • 21

                              #59
                              Hi,
                              I would like to use featureCounts, but miss the stats provided by htseq-count (copied below) as these let me make sure I got the 'strand' setting right and other things.
                              Any chance you could add similar output to featureCounts (either as a separate 'stats.txt' file or as part of the main table)?

                              no_feature 20123817
                              ambiguous 9026940
                              too_low_aQual 0
                              not_aligned 0
                              alignment_not_unique 3034042

                              Thanks
                              -Ben


                              • bruce01
                                Senior Member
                                • Mar 2011
                                • 160

                                #60
                                Ben, I had the same issue, so I put together a command to get this info. It requires you to produce the 'reads' output using the -R flag.

                                cut -f 2 featco.counts.reads | sort | uniq -c > featco.counts.diags

                                Output looks like:

                                154266 ACCEPTED_2VOTE_GENE
                                23169444 ACCEPTED_GENE
                                40066 MULTI_MAPPING
                                4470627 NOTFOUND_GENE
                                100013 OVERLAPPED_GENES
                                2850 PAIR_DISTANCE

                                Hope that helps, Bruce.
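
For anyone without the Unix tools to hand, the same tally can be sketched in Python. This assumes, as the `cut -f 2` command above does, that the per-read output is tab-separated with the assignment status in the second column. `tally_assignments` is a hypothetical helper and the example lines are made up. Unlike `cut`, it skips lines that have no second column.

```python
from collections import Counter


def tally_assignments(lines):
    """Count how often each value appears in the second tab-separated column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Made-up per-read records: read name, assignment status, optional gene.
example_reads = [
    "read1\tACCEPTED_GENE\tgeneA",
    "read2\tNOTFOUND_GENE",
    "read3\tACCEPTED_GENE\tgeneB",
    "read4\tMULTI_MAPPING",
]

# Print in the same 'count name' layout as 'sort | uniq -c'.
for status, n in sorted(tally_assignments(example_reads).items()):
    print(f"{n:8d} {status}")
```

With a real file, pass an open file handle instead of the list, for example `tally_assignments(open("featco.counts.reads"))`.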
