Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • himanshu04
    Member
    • Mar 2012
    • 35

    #31
    Dear Simon,
    I am using HT-Seq counts for summarizing the data of arabidopsis gene. When i use the command: 'htseq-count -m intersection-strict -s no aligned_074262 TAIR10_GTF.gtf > counts_074262'.
    I get an error saying unable to read first line of the sam file. unable to determine 11 delimited tabs in the file. What can be the possible solution for this?
    Thanks.

    Comment

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #32
      I get an error saying unable to read first line of the sam file.
      Well, how does the first line of your sam file look like?

      Comment

      • himanshu04
        Member
        • Mar 2012
        • 35

        #33
        htseq-count -m intersection-strict -s no aligned_074262 TAIR10_GTF.gtf > counts_074262
        100000 GFF lines processed.
        200000 GFF lines processed.
        300000 GFF lines processed.
        400000 GFF lines processed.
        478324 GFF lines processed.
        Error occured when reading first line of sam file.
        Error: ('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file aligned_074262')
        [Exception type: ValueError, raised in _HTSeq.pyx:1220]

        The file looks like
        SRR074262.3 HWI-EAS121:2:1:14:1646 + Chr3 328884 AGGGAGGGAGGGAGTGAGATTGNTTCGATCGCCAAT BCCCBCCBACCCBCACBCBBC:!<B@A<?>@:>8AA 0 22:A>N
        SRR074262.4 HWI-EAS121:2:1:15:1470 + Chr1 14296071 CTGGGTTTTTGTGTTATTGAGANTCTGAGTTTGAGA BBA>?96>@@>==?A?@A<<=4!.5;7268?@607= 0 22:G>N

        Comment

        • Camg
          Member
          • Jan 2011
          • 21

          #34
          I think your "SAM" file has a bit of a strange format. What alignment program did you use to create it?
          Take a look at the example of my SAM file a few lines back and you can see some differences right away.

          Comment

          • himanshu04
            Member
            • Mar 2012
            • 35

            #35
            I created it using Bowtie . I am actually following the same method as described in the seqwiki page. What do you think I should be doing?

            Comment

            • Camg
              Member
              • Jan 2011
              • 21

              #36
              Did you use the -S option when running Bowtie? That's what will give you the standard SAM output file. Otherwise I think it gives you the SAM-like file that you currently have. Try running your Bowtie command again with -S and then use that file with HTseq.

              Comment

              • himanshu04
                Member
                • Mar 2012
                • 35

                #37
                Thanks a lot for your replies. I had used the -S option funnily though when I used --sam option it did work out fine. Thanks a lot for your feedback I really appreciate it.

                Comment

                • rboettcher
                  Member
                  • Oct 2010
                  • 71

                  #38
                  Dear all,

                  I encounter the same error message ('SAM line does not contain at least 11 tab-delimited fields.') as mentioned before when running htseq-count with this command:
                  Code:
                  htseq-count merged_align.bam UCSC_annotation_hg19.gtf > htseq_test.txt
                  However, when I inspect the provided bam file via samtools view it looks like this:

                  HWI-ST301L:234:C0GECACXX:3:1102:2230:56147 355 chr1 11164 1 101M = 11274 211 GACCGCCCCTTGCTTGCAGCCGGGCACTACAGGACCCGCTTGCTCACGGTGCTGTGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGCAGGGCTCTCT * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:101 YT:Z:UU NH:i:3 CC:Z:chr15 CP:i:102519906 HI:i:0

                  The alignments have been done via Tophat2 + bowtie2 (most recent releases).
                  Does anyone have an idea why this error occurs?

                  Best regards

                  Comment

                  • Alex.Manuel
                    Junior Member
                    • Aug 2012
                    • 8

                    #39
                    Dear Seqanswer community, dear Simon,

                    I'm dealing with stickleback RNA-Seq data. Mapping was done with Tophat2. The ref.fasta and the anno.gtf (both from ensembl) were included in the tophat options.
                    ./tophat -p 8 -G stickle_gen_anno.gtf -o 193_ctrl24_CT_Shared_thout stickle_genome 193_ctrl24_CT_1_Shared.fastq 193_ctrl24_CT_2_Shared.fastq

                    I used samtools for sorting and converting the accepted_hits.bam (from tophat) to sam format and did the HTseq-count analysis with the anno.gtf I already used in tophat.
                    Coming to my recent problem. Is it normal that so many genes have no feature? So more than genes that are counted?
                    Here as an example (with one line added to see haw many genes are counted in total:

                    count sum 1266408
                    no_feature 2005250
                    ambiguous 9
                    too_low_aQual 0
                    not_aligned 0
                    alignment_not_unique 862188

                    So first I checked if the chromosomes displayed in the warning messages of htseq_count
                    e.g. Warning: Skipping read 'H108:2920J82ACXX:1:1101:3626:48576', because chromosome 'scaffold_893', to which it has been aligned, did not appear in the GFF file.
                    are not in the anno.gtf file. And indeed they are not. So they have to be just in the ref.fasta file.
                    But isn't there a possibility to count these genes? If not the loss of information is really big!?

                    Many thanks for an answer!

                    Cheers Alex

                    Comment

                    • Alex.Manuel
                      Junior Member
                      • Aug 2012
                      • 8

                      #40
                      Damn! I just saw that default option in htseq-count is strand-specific...and as I used the RNA TruSeq V2 PE Library prep. I can be as good as certain that it is unstranded... so did HTseq count with the --stranded=no option and get less no_feature counts. Here is the result:

                      count sum 2464902
                      no_feature 728598
                      ambiguous 78167
                      too_low_aQual 0
                      not_aligned 0
                      alignment_not_unique 862188

                      Before (without --stranded=no)

                      see above

                      So the result looks better. And I thought about my question, if it is not possible to get less no feature genes, but I think when the gene is not yet annotated then there is no Gene ID and it wont make sense to include genes without an ID. So I have to live with that...
                      Can you confirm my thoughts?

                      Cheers Alex
                      Last edited by Alex.Manuel; 12-11-2012, 09:19 AM.

                      Comment

                      • Camg
                        Member
                        • Jan 2011
                        • 21

                        #41
                        Yes, I think you already found your problem. You have reads mapping to contigs in your reference file, but these contigs must not have any genes annotated on them and therefore are not present in your .gff file.

                        You could check this by removing these contigs from your reference file and running your whole pipeline again, or just using an awk script to remove all the SAM lines with alignments to one of these contigs, then run HT-Seq again. That should get rid of a big proportion of the "no_feature" counts.

                        If you want to get useful information from these reads you're going to have to annotate genes on these contigs, which you could do with something like cufflinks.

                        Good luck

                        Comment

                        Latest Articles

                        Collapse

                        • SEQadmin2
                          Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                          by SEQadmin2


                          I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                          Here are nine questions we think about, in roughly the order they matter, before...
                          06-18-2026, 07:11 AM
                        • SEQadmin2
                          From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                          by SEQadmin2


                          Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                          The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                          ...
                          06-02-2026, 10:05 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by SEQadmin2, 06-17-2026, 06:09 AM
                        0 responses
                        25 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-09-2026, 11:58 AM
                        0 responses
                        42 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-05-2026, 10:09 AM
                        0 responses
                        48 views
                        0 reactions
                        Last Post SEQadmin2  
                        Started by SEQadmin2, 06-04-2026, 08:59 AM
                        0 responses
                        49 views
                        0 reactions
                        Last Post SEQadmin2  
                        Working...