
  • sofia17
    started a topic htseq-count for sam and gff3

    htseq-count for sam and gff3

    Hi everyone,

    I need some help with running htseq-count.

    I am running htseq-count on a SAM file produced by aligning single-end (unpaired) Illumina reads to the genome. I get the following warning for all of my reads:

    "Warning: Skipping read 'S5_057841196', because chromosome 'SL2.40ch00', to which it has been aligned, did not appear in the GFF file."

    and, of course, all of my reads end up in the "alignment_not_unique" category, so there are no gene counts at the end.

    I read about a similar problem posted by hibachings2013 in 2010, but unlike in that case, my SAM and GFF3 files BOTH have the correct chromosome names.

    What am I doing wrong?

    Below are excerpts from my SAM file (first) and my GFF3 file (second):

    @HD VN:1.0 SO:coordinate
    @SQ SN:SL2.40ch00 LN:21805821
    @SQ SN:SL2.40ch01 LN:90304244
    @SQ SN:SL2.40ch02 LN:49918294
    @SQ SN:SL2.40ch03 LN:64840714
    @SQ SN:SL2.40ch04 LN:64064312
    @SQ SN:SL2.40ch05 LN:65021438
    @SQ SN:SL2.40ch06 LN:46041636
    @SQ SN:SL2.40ch07 LN:65268621
    @SQ SN:SL2.40ch08 LN:63032657
    @SQ SN:SL2.40ch09 LN:67662091
    @SQ SN:SL2.40ch10 LN:64834305
    @SQ SN:SL2.40ch11 LN:53386025
    @SQ SN:SL2.40ch12 LN:65486253
    @PG ID:TopHat VN:1.3.1 CL:/usr/local/bin/tophat -I 5000 --segment-mismatches 1 -o ./Output/tophatSA1_S6/ -G ./Annotation/ITAG2.3_gene_models.gff3 ./Reference/S_lycopersicm_genome ./Input/SA1_S6
    S6_021828409 272 SL2.40ch00 96314 0 49M * 0 0 GACTCTTGAATACAATCTTACAATTTTTCCTCACAAATTGCTACACCCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:7 CC:Z:SL2.40ch02 CP:i:13220492 HI:i:0
    S6_028644958 256 SL2.40ch00 225372 0 49M * 0 0 CCGACACACTAAATAAAAGAACAATATCACATTGCATATCAAACTAATA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:1 NH:i:11 CC:Z:= CP:i:3434611 HI:i:0
    S6_008579902 272 SL2.40ch00 251575 0 49M * 0 0 CTTATGATTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:10 CC:Z:SL2.40ch01 CP:i:5753251 HI:i:0
    S6_033284153 272 SL2.40ch00 251575 0 49M * 0 0 CTTATGATTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:10 CC:Z:SL2.40ch01 CP:i:5753251 HI:i:0
    S6_084652203 272 SL2.40ch00 251582 0 49M * 0 0 TTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATCCATTTGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:8 CC:Z:SL2.40ch01 CP:i:5753258 HI:i:0
    S6_015020076 272 SL2.40ch00 251583 0 49M * 0 0 TTTAATATGAACACATTTATCACTTTCATCATTCTTCGATCCATTTGCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:9 CC:Z:SL2.40ch01 CP:i:5753259 HI:i:0

    ##gff-version 3
    ##feature-ontology http://song.cvs.sourceforge.net/*che...?revision=1.93
    ##sequence-region SL2.40ch00 1 21805821
    SL2.40ch00 ITAG_eugene gene 16437 18189 . + . Alias=Solyc00g005000;ID=gene:Solyc00g005000.2;Name=Solyc00g005000.2;from_BOGAS=1;length=1753
    SL2.40ch00 ITAG_eugene mRNA 16437 18189 . + . ID=mRNA:Solyc00g005000.2.1;Name=Solyc00g005000.2.1;Note=Aspartic proteinase nepenthesin I (AHRD V1 **-- A9ZMF9_NEPAL)%3B contains Interpro domain(s) IPR001461 Peptidase A1 ;Ontology_term=GO:0006508;Parent=gene:Solyc00g005000.2;from_BOGAS=1;interpro2go_term=GO:0006508;length=1753;nb_exon=2
    SL2.40ch00 ITAG_eugene exon 16437 17275 . + . ID=exon:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene five_prime_UTR 16437 16479 . + . ID=five_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene CDS 16480 17275 . + 0 ID=CDS:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene intron 17276 17335 . + . ID=intron:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene exon 17336 18189 . + 0 ID=exon:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene CDS 17336 17940 . + 2 ID=CDS:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene three_prime_UTR 17941 18189 . + . ID=three_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
    ###
    SL2.40ch00 ITAG_eugene gene 68062 68764 . + . Alias=Solyc00g005020;ID=gene:Solyc00g005020.1;Name=Solyc00g005020.1;from_BOGAS=1;length=703
    SL2.40ch00 ITAG_eugene mRNA 68062 68764 . + . ID=mRNA:Solyc00g005020.1.1;Name=Solyc00g005020.1.1;Note=Unknown Protein (AHRD V1);Parent=gene:Solyc00g005020.1;from_BOGAS=1;length=703;nb_exon=3;eugene_evidence_code=10F0H0E0IEG
    SL2.40ch00 ITAG_eugene exon 68062 68211 . + 0 ID=exon:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene CDS 68062 68211 . + 0 ID=CDS:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene intron 68212 68343 . + . ID=intron:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene exon 68344 68568 . + 0 ID=exon:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene CDS 68344 68568 . + 0 ID=CDS:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene intron 68569 68653 . + . ID=intron:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene exon 68654 68764 . + 0 ID=exon:Solyc00g005020.1.1.3;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
    SL2.40ch00 ITAG_eugene CDS 68654 68764 . + 0 ID=CDS:Solyc00g005020.1.1.3;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
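
One way to test the claim that both files carry the same chromosome names is to compare them programmatically rather than by eye, since trailing spaces, carriage returns, or a space-delimited GFF3 are invisible in a pasted excerpt. The following is a minimal sketch, not part of htseq-count: the function names `sam_header_names`, `gff_seqids`, and `compare_names` are hypothetical, and it assumes the SAM header has `@SQ` lines with `SN:` tags and that the GFF3 is meant to be tab-delimited.

```python
def sam_header_names(lines):
    """Collect reference names from the SN: tags of @SQ header lines."""
    names = set()
    for line in lines:
        if not line.startswith("@"):
            break  # header is over once the first alignment line appears
        if line.startswith("@SQ"):
            # Strip only the newline so a stray '\r' stays visible.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gff_seqids(lines):
    """Collect the first tab-separated column of every feature line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        # If the file is space-delimited, the whole line ends up here,
        # which makes the problem obvious in the output.
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_names(sam_lines, gff_lines):
    """Return (names only in SAM, names only in GFF), each sorted."""
    sam = sam_header_names(sam_lines)
    gff = gff_seqids(gff_lines)
    return sorted(sam - gff), sorted(gff - sam)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python compare_names.py reads.sam genes.gff3
    with open(sys.argv[1]) as sam_fh, open(sys.argv[2]) as gff_fh:
        only_sam, only_gff = compare_names(sam_fh, gff_fh)
    # repr() exposes hidden whitespace such as 'SL2.40ch00 ' or 'SL2.40ch00\r'.
    print("only in SAM:", [repr(n) for n in only_sam])
    print("only in GFF:", [repr(n) for n in only_gff])
```

If both lists come back empty, the names match exactly and the cause of the warning lies elsewhere; this check only rules the naming question in or out.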
    Last edited by sofia17; 09-02-2011, 12:12 PM.
