Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • shi
    Wei Shi
    • Feb 2010
    • 236

    Hi @GenoMax,

    featureCounts does support SAM specification version 1.4. Have you looked at the .summary file generated by featureCounts to see why reads were not assigned?

    Best,
    Wei

    Comment

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      Originally posted by shi View Post
      Hi @GenoMax,

      featureCounts does support SAM specification version 1.4. Have you looked at the .summary file generated by featureCounts to see why reads were not assigned?

      Best,
      Wei
      Hi Wei,

      Here is a comparison of the two flags for a sample. I have used the -M flag for featureCounts.

      Code:
      With SAM v.1.3 flags
      
      Assigned	     104867565
      Unassigned_Ambiguity	577037
      Unassigned_MultiMapping	0
      Unassigned_NoFeatures	52436519
      Unassigned_Unmapped	1772756
      
      With SAM v.1.4 flags
      
      Assigned	      66620
      Unassigned_Ambiguity	280
      Unassigned_MultiMapping	0
      Unassigned_NoFeatures	157814221
      Unassigned_Unmapped	1772756
      Last edited by GenoMax; 10-22-2015, 04:56 PM.

      Comment

      • shi
        Wei Shi
        • Feb 2010
        • 236

        Thanks @GenoMax,

        Could you also provide mapping results for a few reads for each version?

        Comment

        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          For what it's worth, the only difference between SAM 1.3 and 1.4 format output BBMap is that in 1.3 mode, it use the "M" symbol in the cigar string; in 1.4 mode, it uses "=" and "X" for match and mismatch in the cigar string. 1.4 will still occasionally contain "M" symbols when there is a "N"-called bases, as that does not strictly speaking match or mismatch. Given that a small fraction of the reads do get assigned from v1.4 sam files, I wonder if featureCounts might be using only the reads that contain at least one M operation in the cigar string?

          Comment

          • shi
            Wei Shi
            • Feb 2010
            • 236

            featureCounts currently does not support letters '=' and 'X' in the CIGAR string and we think that is the reason why it failed to assign most of the reads in your v1.4 data. We will add support to these letters in featureCounts.

            We are going to make a major release of our software package next week but unfortunately we are not able to include this support in the release. We will work on fixing this issue after the release. Will let you know once this is done.

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              Hi Shi,

              Brian is on the right track.

              Here is a sample of reads

              Code:
              With v.1.3 tags
              
              XXXXXXX:XXX:XXXXXX:2:1213:18683:18437 1:N:0:        0       chr10   376     3       50M     *       0       0       GTCTGTGTTCTGTGTCAGAGGAGCAGTAGAGCTGAGGCTATCAAGCTATC    BB@BBHEHHCG@<FC@CE@FGH@EGHGFEEHIHFEFE?GEHHHGEHHFHE       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:1203:10888:83435 1:N:0:        0       chr10   413     3       50M     *       0       0       CTATCAAGCTATCAGGACATCCCAGATGCCCAAGACTACACTAGTCACCA    DDDBD?HFEE?GHGHEEEFIFIIHGHHHI?HIIIEHIIIIEHHHEHFHHI       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:1114:19668:64532 1:N:0:        0       chr10   938     2       5M3I42M *       0       0       CCACTGACACCATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGT    DDDDDIIIIIGIIIIIIIIIHIIIIIIIIHIIIHIIIIIIIIIIIHIIIF       XT:A:R  NM:i:4  AM:i:2
              XXXXXXX:XXX:XXXXXX:2:1103:21004:71100 1:N:0:        0       chr10   946     3       50M     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    0<D0@@FHHIIIIIE?F@GHE?H@GHHIIIEGH?HFE@GHHHH?ECHHII       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:2106:11030:47701 1:N:0:        0       chr10   946     3       50M     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    DDDDDHIFIIIHIIIIIIIIHHEHEHIIGHHIIIIHIIHHIIIHHHHHHE       XT:A:R  NM:i:0  AM:i:3
              
              With v.1.4 tags
              
              XXXXXXX:XXX:XXXXXX:2:1213:18683:18437 1:N:0:        0       chr10   376     3       50=     *       0       0       GTCTGTGTTCTGTGTCAGAGGAGCAGTAGAGCTGAGGCTATCAAGCTATC    BB@BBHEHHCG@<FC@CE@FGH@EGHGFEEHIHFEFE?GEHHHGEHHFHE       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:1203:10888:83435 1:N:0:        0       chr10   413     3       50=     *       0       0       CTATCAAGCTATCAGGACATCCCAGATGCCCAAGACTACACTAGTCACCA    DDDBD?HFEE?GHGHEEEFIFIIHGHHHI?HIIIEHIIIIEHHHEHFHHI       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:1114:19668:64532 1:N:0:        0       chr10   938     2       2=1X2=3I42=     *       0       0       CCACTGACACCATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTC
              AGT     DDDDDIIIIIGIIIIIIIIIHIIIIIIIIHIIIHIIIIIIIIIIIHIIIF      XT:A:R  NM:i:4  AM:i:2
              XXXXXXX:XXX:XXXXXX:2:1103:21004:71100 1:N:0:        0       chr10   946     3       50=     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    0<D0@@FHHIIIIIE?F@GHE?H@GHHIIIEGH?HFE@GHHHH?ECHHII       XT:A:R  NM:i:0  AM:i:3
              XXXXXXX:XXX:XXXXXX:2:2106:11030:47701 1:N:0:        0       chr10   946     3       50=     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    DDDDDHIFIIIHIIIIIIIIHHEHEHIIGHHIIIIHIIHHIIIHHHHHHE       XT:A:R  NM:i:0  AM:i:3
              Edit: Just saw the last post from Wei. Will look forward to a future release of subread.

              Edit2: There was an error in the copy/paste of the data in original post. I have replaced the original with correct copy.
              Last edited by GenoMax; 10-23-2015, 11:03 AM.

              Comment

              • shi
                Wei Shi
                • Feb 2010
                • 236

                Hi GenoMax,

                Thanks for providing the sample reads. However, we noticed that there are missing columns in your mapping results (either PNEXT or TLEN or both). This might cause problems for read counting by featureCounts and lead to incorrect counts. For example, featureCounts might not be able to find NH tags which are used to identify multi-mapping reads.

                Comment

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  That's really strange. It is, indeed, not a valid sam file. I just did some mapping and generated this:

                  Code:
                  @HD	VN:1.4	SO:unsorted
                  @SQ	SN:ecoli_K12	LN:4639675
                  @PG	ID:BBMap	PN:BBMap	VN:35.51	CL:java -ea -Xmx1g align2.BBMap in=80x_perfect.fa.gz reads=1000 int=f out=xx2.sam ow
                  400	16	ecoli_K12	778201	45	150=	*	0	0	TCTTCCGGCAACTGATGGACAGGTCAAATTCCCTGCCTGGTCGCCGTATCTGTGATAATAATTAATTGAATAGTAAAGGAATCATTGAAATGCAACTGAACAAAGTGCTGAAAGGGCTGATGATTGCTCTGCCTGTTATGGCAATTGCGG	*	NM:i:0	AM:i:45
                  401	0	ecoli_K12	940032	45	150=	*	0	0	GTAATACTACTTTCGAGTGAAAATCTACCTATCTCTTTGATTTTCAAATTATTCGATGTATACAAGCCTATATAGCGAACTGCTATAGAAATAATTACACAATACGGTTTGTTACTGGAATCAATCGTGAGCAAGCTTGAGTGAGCCATT	*	NM:i:0	AM:i:45
                  402	0	ecoli_K12	543578	45	150=	*	0	0	CATTTCATCGCAAATCGCCCGCATGTCGTTTTCTAACTGTTGGGTGAAATCGCGCAGCACGGCAGCGTCGGTATGACGACAATCAATGGTGAACGTGGTTTTACCCGGCACCACATTTACCGTATTCGGGCGCGGCTCTACTTTGCCAAA	*	NM:i:0	AM:i:45
                  All of the columns are present. I have no idea how you were able to generate output missing some columns... could it be some kind of copy-paste error? Or did you reformat the data in some way?

                  Comment

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    Originally posted by Brian Bushnell View Post
                    That's really strange. It is, indeed, not a valid sam file.
                    All of the columns are present. I have no idea how you were able to generate output missing some columns... could it be some kind of copy-paste error? Or did you reformat the data in some way?
                    When I copied and pasted the data last night something must have gone wrong. I have recopied the data in post above. Sorry about that.

                    Comment

                    • touchsk
                      Junior Member
                      • Aug 2015
                      • 7

                      Question on Output

                      Hi,
                      Thanks for this wonderful tool. We use it routinely.
                      I had a question regarding the output (.featureCounts file). What is the difference between a read having more than one gene assigned with commas in one line (Read1) and a read assigned to more than one gene listed over multiple lines (Read2)?

                      Read1 Assigned ENSNNNG00000079745,ENSNNNG00000059028 Total=2

                      Read2 Unassigned_NoFeatures * *
                      Read2 Unassigned_NoFeatures * *
                      Read2 Assigned ENSNNNG00000098385 *
                      Read2 Assigned ENSNNNG00000101430 *
                      Read2 Assigned ENSNNNG00000100881 *

                      The output is a result of dropping the --primary option and including the -M and -O options.

                      Thanks in advance for your help.
                      Karthik

                      Comment

                      • shi
                        Wei Shi
                        • Feb 2010
                        • 236

                        Read 1 was reported only once in your mapping result and this read was found to overlap with two genes (ENSNNNG00000079745 and ENSNNNG00000059028) by featureCounts.

                        Read 2 was reported 5 times in your mapping result (5 rows in your bam/sam file). The first two alignments reported for this read do not overlap with any genes and the other three alignments were found to overlap with genes.

                        Read 1 was counted two times because of multi-overlapping and Read 2 was counted three times due to multi-mapping.

                        Comment

                        • touchsk
                          Junior Member
                          • Aug 2015
                          • 7

                          Thanks!

                          Thanks for the quick response and clarification provided.

                          Comment

                          • htetre
                            Member
                            • Jul 2013
                            • 28

                            Hello,

                            I am trying to use featureCount with a bam file that aligned using Hisat2 with an indexed genome and spliced sites file created from a gtf file. Now I am using the same gtf file in featureCount but am getting 0% Successfully assigned reads.

                            Here is the code I am using
                            featureCounts -a gene.gtf -F GTF -b -o countsSb10.txt -t gene -R -g gene_id Sb10.aln-sort.bam

                            I am not sure what I am doing wrong. These are reads that aligned to the reference genome, my unmapped reads are removed.

                            Thank you
                            htetre

                            Comment

                            • shi
                              Wei Shi
                              • Feb 2010
                              • 236

                              Are you using the latest version? '-b' is an invalid argument. Could you also provide the assignment summary that can be found in the 'countsSb10.txt.summary' file?

                              Comment

                              • TML
                                Junior Member
                                • May 2014
                                • 1

                                output tagged bamfile

                                Hi Shi,

                                Would it be possible to include an option for outputting a tagged bam file (similar to the htseq --samout option) in the next release. Basically a bamfile with a XF:Z: tag that has the name of the assignment there. We're using this currently to allow umi-tools pcr deduplication of RNAseq reads on a per gene basis, and use the FC -R files to add the tag to the bamfiles ourselves but it would a lot faster if featurecounts could just output the tagged bamfiles.

                                Thanks, Marinus

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  Yesterday, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Yesterday, 12:03 PM
                                0 responses
                                19 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, Yesterday, 11:40 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                29 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-26-2026, 10:12 AM
                                0 responses
                                31 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...