Seqanswers Leaderboard Ad

**[email protected]** · 09-18-2013, 07:26 AM

Appologies for the mistake in the title. *"the" = "they"

**dpryan** · 09-18-2013, 08:03 AM

At a cursory glance, it seems that your annotations don't match. You seem to have aligned against an Ensembl genome (or something like that, since chromosome names are "2", "3", "4", etc.), but then the annotation GTF file is using a more UCSC-like format (where chromosomes are named "chr2", "chr3", etc.). That's going to cause reads to not be counted appropriately.

**[email protected]** · 09-18-2013, 09:15 AM

Ah yeah I fixed that before I ran it. I copied the original though. The file I obtained from ensembl does not have "chr" before the number.

**dpryan** · 09-18-2013, 09:45 AM

Ah, good, glad you already fixed that. I downloaded the updated GTF and looked at it (there were a number of problems with it, but presumably these are particular to the version you uploaded, since it can't be parsed). You likely aren't setting "-s no" when that's what you need. Those reads overlap a feature, just on the wrong strand (unless you have a stranded library, in which case perhaps "-s reverse" is correct, depending on how it was made).

**[email protected]** · 09-19-2013, 07:05 AM

Thank you very much, that solves most of my problem. Now there is just a very small number of lines that map differently than I would expect.

Here is one example:
HWI-EAS225:1:10:1402:1333#0/1 16 VII 1026639 255 36M * 0 0 ATCATGCGTGTCTTTAGACATAACAGTATGGTTTAT ###############:7:<999:8.).;;<9<9:<B XA:i:0 MD:Z:36 NM:i:0

Does not map to:
VII protein_coding exon 1026642 1026968 . + . gene_id "YGR269W"; transcript_id "YGR269W"; exon_number "1"; gene_name "YGR269W"; transcript_name "YGR269W";
VII protein_coding CDS 1026642 1026965 . + 0 gene_id "YGR269W"; transcript_id "YGR269W"; exon_number "1"; gene_name "YGR269W"; transcript_name "YGR269W"; protein_id "YGR269W";
VII protein_coding start_codon 1026642 1026644 . + 0 gene_id "YGR269W"; transcript_id "YGR269W"; exon_number "1"; gene_name "YGR269W"; transcript_name "YGR269W";
VII protein_coding stop_codon 1026966 1026968 . + 0 gene_id "YGR269W"; transcript_id "YGR269W"; exon_number "1"; gene_name "YGR269W"; transcript_name "YGR269W";

If I put these lines in a seperate gtf file, the gene gets a count of 1. If I use the entire gtf file obtained from ensembl it shows 0 counts with for the specific sam line reporting "no feature" (using the -o filename.txt argument).

**[email protected]** · 09-19-2013, 07:15 AM

The same happens with:

HWI-EAS225:1:10:1427:554#0/1 0 II 606243 255 36M * 0 0 AAGACAAAAATAACTAGCAACAATGGGTAAATCACA B>@=@A@>?@@?>:999><<99999999999<:;;< XA:i:0 MD:Z:33G0T1 NM:i:2
HWI-EAS225:1:10:1615:484#0/1 0 II 606243 255 36M * 0 0 AAGACAAAAATAACTAGCAACAATGGGTAAATAACA 4>CCB>+>6?8==B6<*B>9<1AA:1<062:##### XA:i:0 MD:Z:32C0G0T1 NM:i:3

II protein_coding exon 605961 606272 . + . gene_id "YBR190W"; transcript_id "YBR190W"; exon_number "1"; gene_name "YBR190W"; transcript_name "YBR190W";
II protein_coding CDS 605961 606269 . + 0 gene_id "YBR190W"; transcript_id "YBR190W"; exon_number "1"; gene_name "YBR190W"; transcript_name "YBR190W"; protein_id "YBR190W";
II protein_coding start_codon 605961 605963 . + 0 gene_id "YBR190W"; transcript_id "YBR190W"; exon_number "1"; gene_name "YBR190W"; transcript_name "YBR190W";
II protein_coding stop_codon 606270 606272 . + 0 gene_id "YBR190W"; transcript_id "YBR190W"; exon_number "1"; gene_name "YBR190W"; transcript_name "YBR190W";

**dpryan** · 09-19-2013, 08:29 AM

I don't have any good explanation for why that's happening when there's an assigned feature in the miniature version. You might try successively larger versions of the annotation to see where things go whonky.

**Simon Anders** · 09-19-2013, 11:20 PM

Try again with "--mode union". My guess it that the "-o" output will then show you another feature with which the read but not the gene overlaps.

**[email protected]** · 09-20-2013, 01:18 AM

I followed your suggestion, and you are right. This read:
HWI-EAS225:1:10:1427:554#0/1 0 II 606243 255 36M * 0 0 AAGACAAAAATAACTAGCAACAATGGGTAAATCACA B>@=@A@>?@@?>:999><<99999999999<:;;< XA:i:0 MD:Z:33G0T1 NM:i:2 XF:Z:ambiguous[YBR191W+YBR190W]

now is ambiguously mapped to
II protein_coding exon 605961 606272 . + . gene_id "YBR190W"; transcript_id "YBR190W"; exon_number "1"; gene_name "YBR190W"; transcript_name "YBR190W";
II protein_coding exon 606265 606275 . + . gene_id "YBR191W"; transcript_id "YBR191W"; exon_number "1"; gene_name "RPL21A"; transcript_name "RPL21A";

But the overlap of the read with gene "YBR191W" is only 10 and with gene "YBR190W" it is 29 from what I can see. I do not understand why with the option -mode intersection-nonempty it is considered as a "no feature" read instead of being counted toward YBR190W. Are reads considered to have "no feature" if they overlap more then one gene with the mode intersection-nonempty even if the size of the overlap is different?

**Simon Anders** · 09-20-2013, 03:23 AM

No, the size of the overlap is not taken into account. Please see the description on the web page.

In the end, I now see only very few use cases where one may want to use something else than the default mode "union" and hence recommend to stick to it unless you have some really special problem.

**[email protected]** · 09-20-2013, 05:12 AM

Ah I understand it now. Thanks, that explains it.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 31 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 33 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

HTseq not counting reads toward genes the overlap?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News