Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Dedeusan
    Junior Member
    • Nov 2011
    • 7

    #16
    Almost...

    Hi Simon.
    First of all, thank you very much!!!
    Second:
    Apparently, it worked. It was an ; extra in a pair of lines so it continues reading everything just until it arrives to the #fasta part:

    htseq-count -s no -i ID s_1_sequence_clipped_tophat.bam TbruceiTreu927_TriTrypDB-3.3.3.gff
    Error occured in line 46748 of file TbruceiTreu927_TriTrypDB-3.3.3.gff.
    Error: need more than 1 value to unpack
    [Exception type: ValueError, raised in __init__.py:214]

    And when I look at this line this is what I can see:


    tryp_XI-1036e06.p1k.snoRNA.0004;Dbxref=ApiDB:tryp_XI-1036e06.p1k.snoRNA.0004,taxon:185431
    apidb|tryp_XI-1036e06.p1k ApiDB exon 22123 22200 . + . ID=apidb|exon_tryp_XI-1036e06.p1k.snoRNA.0004-1;Name=exon;description=exon;size=78;Parent=apidb|rna_tryp_XI-1036e06.p1k.snoRNA.0004-1
    (This is line 46748)##FASTA
    >apidb|TB927.5.300b
    ATGGCTCACGGCTCGATTCCAGTTATTGATGTCGGCCCTCTGTTCTGTGATGGAGAAAAG
    GGGATGATGGATGTTGCGAAACAGATTGATCATGCCTGTAGGACGTGGGGTGTTTTTCTT
    GTTGTGGGTCATCCCATTCCCCGTGAGCGAACGGAAAAGTTGATGGAAATGGCCAAGGCT
    TTTTTTTCGCTTCCATTGGAAGAGAAACTTAAGGTTGATATTCGAAAGAGCAAACATCAT
    CGCGGTTACGGATGCCTCGATGCGGAGAATGTTGACCCAACGAAACCATTTGATTGTAAA
    GAAACATTTAATATGGGCTGTCATCTCCCTGAGGATCACCCCGATGTTGCAGCTGGAAAG
    CCATTGCGTGGACCGAACAATCACCCCACGCAAGTGAAAGGTTGGGTAGAGTTGATGAAC
    AGACATTATCGCGAAATGCAGGAATTTGCCCTCGTTATTCTTCGTGCCCTCGCACTCGCT
    ATTGGTTTAAAGAAAGACTTTTTCGATACCAAATTTGATGAACCTTTGAGTGTGTTCCGT
    ATGCTACATTATCCTCCACAAAAGCAAGGGACCCGTTATCCCATCGTGTGTGGTGAGCAT
    ACGGATTATGGTATTATTACATTACTCTACCAAGATTCGGTGGGAGGACTGCAGGTGCGC
    AATCTGTCAGATGAGTGGGTGGATGTGGAACCCCTTGAAGGAAGTTTTGTTGTGAATATT
    GGGGACATGATGAATATGTGGAGTAATGGCCGTTACCGCTCAACACCGCATCGCGTTCGC
    TTAACCACAACTGATCGCTACTCCATGCCATTTTTCTGTCAGCCTAATCCTTATACTGTT
    ATTAAATGCCTTGATCATTGCCATTCGCCAAGCAATCCCCCCAAATATCCACCAGTCCGT
    GCTGTGGATTGGTTGCTGAAGCGTTTCGCGGAAACATATGCCCATCGCAAAACAAAGATG
    TGA
    >apidb|cds_TB927.5.300b-1
    MAHGSIPVIDVGPLFCDGEKGMMDVAKQIDHACRTWGVFLVVGHPIPRERTEKLMEMAKA
    FFSLPLEEKLKVDIRKSKHHRGYGCLDAENVDPTKPFDCKETFNMGCHLPEDHPDVAAGK
    PLRGPNNHPTQVKGWVELMNRHYREMQEFALVILRALALAIGLKKDFFDTKFDEPLSVFR
    MLHYPPQKQGTRYPIVCGEHTDYGIITLLYQDSVGGLQVRNLSDEWVDVEPLEGSFVVNI
    GDMMNMWSNGRYRSTPHRVRLTTTDRYSMPFFCQPNPYTVIKCLDHCHSPSNPPKYPPVR
    AVDWLLKRFAETYAHRKTKM

    Why does htseq-count gives now an error?
    Again, thanks for all!
    Sandra

    Comment

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #17
      In my understanding of the GFF standard, a GFF file is supposed to contain GFF lines and not FASTA lines. I've seen this before that a full FASTA file is concatenated to the end of the GFF file but this is really confusing for any software trying to parse the file. Please remove everything that is not GFF from your GFF file.

      Comment

      • Dedeusan
        Junior Member
        • Nov 2011
        • 7

        #18
        Hi SImon,

        I am really desiring now to be working with Drosophila or something similar...
        I applied what you told me:

        samtools view s_1_sequence_clipped_tophat.bam | htseq-count -s no -i ID - TbruceiTreu927_TriTrypDB-3.3.4.gff > count.txt


        Warning: Skipping read 'HWUSI-EAS582_211:1:106:1372:1342', because chromosome 'GeneDB|Tb927_01_v4', to which it has been aligned, did not appear in the GFF file

        And after I stopped the execution of the program:

        ^CError: 'itertools.chain' object has no attribute 'get_line_number_string'
        [Exception type: AttributeError, raised in count.py:198]

        Maybe it was not a good idea to eliminate the fasta part...

        Comment

        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          #19
          Well, it is a only warning that told you that a single read got skipped. I suppose you can ignore that. If you have contigs in your reference to which the GFF file does not assign any exons it is completely expected to get a few of these warnings.

          Comment

          • Dedeusan
            Junior Member
            • Nov 2011
            • 7

            #20
            Well, SImon, this is what I get:


            apidb|exon_Tb09.160.0220-1 0

            And over 12.000 similar, and:

            no_feature 0
            ambiguous 0
            too_low_aQual 0
            not_aligned 0
            alignment_not_unique 11678480

            But at the txt document I couldn't see anything else about the 11678480 aligments. Did it worked?? Can I see the aligments somewhere?

            And Soooo sorry for disturbing you so much. THanks for your attention!
            Sandra

            Comment

            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #21
              I would say, no, it didn't work, if you see only zeroes.

              I've just had another look at the excerpt from your GFF file that you posted above, and it does not seem right. htseq-count counts on the level of genes, not exons. Hence, for each "exon" line, the attribute "ID" (or whatever you have specified with "-i") has to be theID of the gene. All exons from the same gene must have the same ID. In your file, it seems that some exon or transcript number is appended to the ID. You may need some perl (or python or sed) script to remove these.

              Comment

              • Dedeusan
                Junior Member
                • Nov 2011
                • 7

                #22
                OK, Simon...
                I will ty it...
                Thanks for your help!
                SAndra

                Comment

                • m232
                  Junior Member
                  • Aug 2013
                  • 2

                  #23
                  HTSeq_ Error occured when reading first line of sam file.

                  Hello,

                  I am very new in bioinformatics and I am having some problems that can´t solve. I have seen similar threads to this one, but not exactly the same, and I am not sure how to solve my issue. When running HTSeq, I received this error:

                  100000 GFF lines processed.
                  200000 GFF lines processed.
                  217821 GFF lines processed.
                  Error occured when reading first line of sam file.

                  [Exception type: StopIteration, raised in count.py:79]
                  100000 GFF lines processed.
                  200000 GFF lines processed.
                  217821 GFF lines processed.
                  Error occured when reading first line of sam file.

                  [Exception type: StopIteration, raised in count.py:79]
                  100000 GFF lines processed.
                  200000 GFF lines processed.
                  217821 GFF lines processed.
                  Error occured when reading first line of sam file.

                  It does not tell me any specific error for why it can´t read the first line of the sam file, like in other threads that I have seen.

                  I have used Bowtie for the alignment to a reference metagenome, and samtools to create the nameSorted.sam files.

                  Any thoughts?

                  Thank you!

                  Comment

                  • dpryan
                    Devon Ryan
                    • Jul 2011
                    • 3478

                    #24
                    Please post the command you used and the first few lines of the SAM/BAM file in question.

                    Comment

                    • alvarani
                      Junior Member
                      • Mar 2013
                      • 5

                      #25
                      Htseq-count throws error

                      Dear All,


                      I am using htseq count to get exonic read count fro targeted seq data..
                      I have output BAM file with alignments and I need to get the read counts for the respective exonics..the exon gtf file looks like this,
                      chr9 bed2gff exon 133589697 133589852 0 + . ABL1
                      chr9 bed2gff exon 133709410 133709441 0 + . ABL1
                      chr9 bed2gff exon 133710251 133710279 0 + . ABL1

                      And the error I receive from HT SEQ-count is,
                      Error occurred when processing GFF file (line 1 of file Regions_new.GTF):
                      Failure parsing GFF attribute line
                      [Exception type: Value Error, raised in init.py:164]
                      Pleas elet me know why I am having this error, and the command line I used is,

                      htseq-count -a 10 -m intersection-strict -s yes mydata.sam Regions_new.GTF
                      Kind regards,
                      Alva

                      Comment

                      • dpryan
                        Devon Ryan
                        • Jul 2011
                        • 3478

                        #26
                        That's neither GFF nor GTF format. If bed2gff produced that then the program is broken.

                        Comment

                        • alvarani
                          Junior Member
                          • Mar 2013
                          • 5

                          #27
                          Dear Devon,
                          Thanks for the prompt reply. I would like at ask..So I have the targeted region in bed format and converted to this above mentioned format..

                          So in order to extract the read count for exons what would be most suggested tool and the format of the bed file. ??
                          Kind regards,
                          Alva

                          Comment

                          • dpryan
                            Devon Ryan
                            • Jul 2011
                            • 3478

                            #28
                            You're probably better off with featureCounts, since it allows counts for alignments to multiple features (this will treat spliced alignments properly).

                            Anyway, the following bit of awk code should work with your BED file:
                            Code:
                            awk '{printf("%s\tawk\texon\t%i\t%i\t.\t%s\t.\texon_id \"%s%i\"\n", $1, $2+1, $3, $6, $4, NR)}' foo.bed > foo.gff
                            Then the exons will be labeled with the associated gene and then a number (the line number). I haven't tested this, but something like that should work.

                            Comment

                            • alvarani
                              Junior Member
                              • Mar 2013
                              • 5

                              #29
                              Ok, Thanks
                              Kind regards,
                              Alva

                              Comment

                              • padmoo
                                Member
                                • Jun 2015
                                • 16

                                #30
                                Hi everyone,

                                I have a similar problem with my gff file. I get the error:

                                Error occured when processing GFF file (line 1 of file /gpfs/scratch/cbh12wsu/Thaps3_chromosomes_geneModels_FilteredModels2.gff):
                                Feature fgenesh1_pg.C_chr_1000001 does not contain a 'gene_id' attribute
                                [Exception type: ValueError, raised in count.py:53]

                                My gff files looks like this:
                                chr_1 JGI exon 300 1153 . - . name "fgenesh1_pg.C_chr_1000001"; transcriptId 867
                                chr_1 JGI CDS 300 1153 . - 0 name "fgenesh1_pg.C_chr_1000001"; proteinId 867; exonNumber 10
                                chr_1 JGI exon 1199 2425 . - . name "fgenesh1_pg.C_chr_1000001"; transcriptId 867
                                chr_1 JGI CDS 1199 2425 . - 2 name "fgenesh1_pg.C_chr_1000001"; proteinId 867; exonNumber 9
                                chr_1 JGI exon 2512 2935 . - . name "fgenesh1_pg.C_chr_1000001"; transcriptId 867
                                chr_1 JGI CDS 2512 2935 . - 2 name "fgenesh1_pg.C_chr_1000001"; proteinId 867; exonNumber

                                I'm guessing there is something missing? Can anyone help?

                                Thanks!
                                Padmoo

                                Comment

                                Latest Articles

                                Collapse

                                • SEQadmin2
                                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                  by SEQadmin2


                                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                  ...
                                  Yesterday, 10:05 AM
                                • SEQadmin2
                                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                  by SEQadmin2


                                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                  Introduction

                                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                  05-22-2026, 06:42 AM
                                • SEQadmin2
                                  Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                  by SEQadmin2

                                  Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                  Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                  05-06-2026, 09:04 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, Yesterday, 12:03 PM
                                0 responses
                                19 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, Yesterday, 11:40 AM
                                0 responses
                                14 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-28-2026, 11:40 AM
                                0 responses
                                29 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 05-26-2026, 10:12 AM
                                0 responses
                                31 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...