Unconfigured Ad

**shi** · 10-22-2015, 03:47 PM

Hi @GenoMax,

featureCounts does support SAM specification version 1.4. Have you looked at the .summary file generated by featureCounts to see why reads were not assigned?

Best,
Wei

**GenoMax** · 10-22-2015, 04:52 PM

Originally posted by shi View Post

Hi @GenoMax,

featureCounts does support SAM specification version 1.4. Have you looked at the .summary file generated by featureCounts to see why reads were not assigned?

Best,
Wei

Hi Wei,

Here is a comparison of the two flags for a sample. I have used the -M flag for featureCounts.

Code:

With SAM v.1.3 flags

Assigned	     104867565
Unassigned_Ambiguity	577037
Unassigned_MultiMapping	0
Unassigned_NoFeatures	52436519
Unassigned_Unmapped	1772756

With SAM v.1.4 flags

Assigned	      66620
Unassigned_Ambiguity	280
Unassigned_MultiMapping	0
Unassigned_NoFeatures	157814221
Unassigned_Unmapped	1772756

**shi** · 10-22-2015, 05:04 PM

Thanks @GenoMax,

Could you also provide mapping results for a few reads for each version?

**Brian Bushnell** · 10-22-2015, 05:26 PM

For what it's worth, the only difference between SAM 1.3 and 1.4 format output BBMap is that in 1.3 mode, it use the "M" symbol in the cigar string; in 1.4 mode, it uses "=" and "X" for match and mismatch in the cigar string. 1.4 will still occasionally contain "M" symbols when there is a "N"-called bases, as that does not strictly speaking match or mismatch. Given that a small fraction of the reads do get assigned from v1.4 sam files, I wonder if featureCounts might be using only the reads that contain at least one M operation in the cigar string?

**shi** · 10-22-2015, 05:42 PM

featureCounts currently does not support letters '=' and 'X' in the CIGAR string and we think that is the reason why it failed to assign most of the reads in your v1.4 data. We will add support to these letters in featureCounts.

We are going to make a major release of our software package next week but unfortunately we are not able to include this support in the release. We will work on fixing this issue after the release. Will let you know once this is done.

**GenoMax** · 10-22-2015, 05:42 PM

Hi Shi,

Brian is on the right track.

Here is a sample of reads

Code:

With v.1.3 tags

XXXXXXX:XXX:XXXXXX:2:1213:18683:18437 1:N:0:        0       chr10   376     3       50M     *       0       0       GTCTGTGTTCTGTGTCAGAGGAGCAGTAGAGCTGAGGCTATCAAGCTATC    BB@BBHEHHCG@<FC@CE@FGH@EGHGFEEHIHFEFE?GEHHHGEHHFHE       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:1203:10888:83435 1:N:0:        0       chr10   413     3       50M     *       0       0       CTATCAAGCTATCAGGACATCCCAGATGCCCAAGACTACACTAGTCACCA    DDDBD?HFEE?GHGHEEEFIFIIHGHHHI?HIIIEHIIIIEHHHEHFHHI       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:1114:19668:64532 1:N:0:        0       chr10   938     2       5M3I42M *       0       0       CCACTGACACCATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGT    DDDDDIIIIIGIIIIIIIIIHIIIIIIIIHIIIHIIIIIIIIIIIHIIIF       XT:A:R  NM:i:4  AM:i:2
XXXXXXX:XXX:XXXXXX:2:1103:21004:71100 1:N:0:        0       chr10   946     3       50M     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    0<D0@@FHHIIIIIE?F@GHE?H@GHHIIIEGH?HFE@GHHHH?ECHHII       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:2106:11030:47701 1:N:0:        0       chr10   946     3       50M     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    DDDDDHIFIIIHIIIIIIIIHHEHEHIIGHHIIIIHIIHHIIIHHHHHHE       XT:A:R  NM:i:0  AM:i:3

With v.1.4 tags

XXXXXXX:XXX:XXXXXX:2:1213:18683:18437 1:N:0:        0       chr10   376     3       50=     *       0       0       GTCTGTGTTCTGTGTCAGAGGAGCAGTAGAGCTGAGGCTATCAAGCTATC    BB@BBHEHHCG@<FC@CE@FGH@EGHGFEEHIHFEFE?GEHHHGEHHFHE       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:1203:10888:83435 1:N:0:        0       chr10   413     3       50=     *       0       0       CTATCAAGCTATCAGGACATCCCAGATGCCCAAGACTACACTAGTCACCA    DDDBD?HFEE?GHGHEEEFIFIIHGHHHI?HIIIEHIIIIEHHHEHFHHI       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:1114:19668:64532 1:N:0:        0       chr10   938     2       2=1X2=3I42=     *       0       0       CCACTGACACCATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTC
AGT     DDDDDIIIIIGIIIIIIIIIHIIIIIIIIHIIIHIIIIIIIIIIIHIIIF      XT:A:R  NM:i:4  AM:i:2
XXXXXXX:XXX:XXXXXX:2:1103:21004:71100 1:N:0:        0       chr10   946     3       50=     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    0<D0@@FHHIIIIIE?F@GHE?H@GHHIIIEGH?HFE@GHHHH?ECHHII       XT:A:R  NM:i:0  AM:i:3
XXXXXXX:XXX:XXXXXX:2:2106:11030:47701 1:N:0:        0       chr10   946     3       50=     *       0       0       ATTCTTACTGTCATCCTCTGAAGCCAGGTCCTGCTCAGTGTTGGCATAGG    DDDDDHIFIIIHIIIIIIIIHHEHEHIIGHHIIIIHIIHHIIIHHHHHHE       XT:A:R  NM:i:0  AM:i:3

Edit: Just saw the last post from Wei. Will look forward to a future release of subread.

Edit2: There was an error in the copy/paste of the data in original post. I have replaced the original with correct copy.

**shi** · 10-22-2015, 08:19 PM

Hi GenoMax,

Thanks for providing the sample reads. However, we noticed that there are missing columns in your mapping results (either PNEXT or TLEN or both). This might cause problems for read counting by featureCounts and lead to incorrect counts. For example, featureCounts might not be able to find NH tags which are used to identify multi-mapping reads.

**Brian Bushnell** · 10-23-2015, 10:42 AM

That's really strange. It is, indeed, not a valid sam file. I just did some mapping and generated this:

Code:

@HD	VN:1.4	SO:unsorted
@SQ	SN:ecoli_K12	LN:4639675
@PG	ID:BBMap	PN:BBMap	VN:35.51	CL:java -ea -Xmx1g align2.BBMap in=80x_perfect.fa.gz reads=1000 int=f out=xx2.sam ow
400	16	ecoli_K12	778201	45	150=	*	0	0	TCTTCCGGCAACTGATGGACAGGTCAAATTCCCTGCCTGGTCGCCGTATCTGTGATAATAATTAATTGAATAGTAAAGGAATCATTGAAATGCAACTGAACAAAGTGCTGAAAGGGCTGATGATTGCTCTGCCTGTTATGGCAATTGCGG	*	NM:i:0	AM:i:45
401	0	ecoli_K12	940032	45	150=	*	0	0	GTAATACTACTTTCGAGTGAAAATCTACCTATCTCTTTGATTTTCAAATTATTCGATGTATACAAGCCTATATAGCGAACTGCTATAGAAATAATTACACAATACGGTTTGTTACTGGAATCAATCGTGAGCAAGCTTGAGTGAGCCATT	*	NM:i:0	AM:i:45
402	0	ecoli_K12	543578	45	150=	*	0	0	CATTTCATCGCAAATCGCCCGCATGTCGTTTTCTAACTGTTGGGTGAAATCGCGCAGCACGGCAGCGTCGGTATGACGACAATCAATGGTGAACGTGGTTTTACCCGGCACCACATTTACCGTATTCGGGCGCGGCTCTACTTTGCCAAA	*	NM:i:0	AM:i:45

All of the columns are present. I have no idea how you were able to generate output missing some columns... could it be some kind of copy-paste error? Or did you reformat the data in some way?

**GenoMax** · 10-23-2015, 10:57 AM

Originally posted by Brian Bushnell View Post

That's really strange. It is, indeed, not a valid sam file.
All of the columns are present. I have no idea how you were able to generate output missing some columns... could it be some kind of copy-paste error? Or did you reformat the data in some way?

When I copied and pasted the data last night something must have gone wrong. I have recopied the data in post above. Sorry about that.

**touchsk** · 06-02-2016, 04:02 PM

Question on Output

Hi,
Thanks for this wonderful tool. We use it routinely.
I had a question regarding the output (.featureCounts file). What is the difference between a read having more than one gene assigned with commas in one line (Read1) and a read assigned to more than one gene listed over multiple lines (Read2)?

Read1 Assigned ENSNNNG00000079745,ENSNNNG00000059028 Total=2

Read2 Unassigned_NoFeatures * *
Read2 Unassigned_NoFeatures * *
Read2 Assigned ENSNNNG00000098385 *
Read2 Assigned ENSNNNG00000101430 *
Read2 Assigned ENSNNNG00000100881 *

The output is a result of dropping the --primary option and including the -M and -O options.

Thanks in advance for your help.
Karthik

**shi** · 06-05-2016, 09:29 PM

Read 1 was reported only once in your mapping result and this read was found to overlap with two genes (ENSNNNG00000079745 and ENSNNNG00000059028) by featureCounts.

Read 2 was reported 5 times in your mapping result (5 rows in your bam/sam file). The first two alignments reported for this read do not overlap with any genes and the other three alignments were found to overlap with genes.

Read 1 was counted two times because of multi-overlapping and Read 2 was counted three times due to multi-mapping.

**touchsk** · 06-06-2016, 11:20 AM

Thanks!

Thanks for the quick response and clarification provided.

**htetre** · 10-30-2016, 09:24 AM

Hello,

I am trying to use featureCount with a bam file that aligned using Hisat2 with an indexed genome and spliced sites file created from a gtf file. Now I am using the same gtf file in featureCount but am getting 0% Successfully assigned reads.

Here is the code I am using
featureCounts -a gene.gtf -F GTF -b -o countsSb10.txt -t gene -R -g gene_id Sb10.aln-sort.bam

I am not sure what I am doing wrong. These are reads that aligned to the reference genome, my unmapped reads are removed.

Thank you
htetre

**shi** · 10-31-2016, 02:37 PM

Are you using the latest version? '-b' is an invalid argument. Could you also provide the assignment summary that can be found in the 'countsSb10.txt.summary' file?

**TML** · 05-02-2017, 09:05 AM

output tagged bamfile

Hi Shi,

Would it be possible to include an option for outputting a tagged bam file (similar to the htseq --samout option) in the next release. Basically a bamfile with a XF:Z: tag that has the name of the assignment there. We're using this currently to allow umi-tools pcr deduplication of RNAseq reads on a per gene basis, and use the FC -R files to add the tag to the bamfiles ourselves but it would a lot faster if featurecounts could just output the tagged bamfiles.

Thanks, Marinus

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 20 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 38 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 45 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News