Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • bioBob
    Member
    • Mar 2011
    • 72

    HTSeq error using STAR alignments

    Hi,
    I am trying to use STAR alignments as input to the HTSeq count tools suitable for either DESeq or DEXSeq.

    The error I am getting is:

    2181086 GFF lines processed.
    Error occured when reading first line of sam file.
    Error: ("SAM optional field with illegal type letter ':'", 'line 28 of file GRL1259_76_GAGATTCC_L005_R1_001_AT_QT.paire
    R_aligned.sam.bam.sorted.bam.sam')
    [Exception type: ValueError, raised in _HTSeq.pyx:1180]

    This is the first line of the samfile, the 28th line looks similar with a different read.
    HWI-1KL163:53:C2191ACXX:1:1204:19189:58727 99 1 155979169 255 51M = 155979289 171 CCTTCACTATGGCTGAGAGCAGGGCAGGATCCAGGAGAAAGTGGCCAAGGG CCCFFFFFHHHHHJJJJJJIJJJJJJJJIJJJJIJFHHJJJFHGIJJJJJJ NH:i:1 HI:i:1 AS:i:100 nM:i:0

    I am using the following commands:

    STAR --genomeDir $genome_DIR --runThreadN 12 --genomeLoad NoSharedMemory --outSAMmode Full --outFilterMismatchNmax 3 --outFilterMultimapNmax 5 --readFilesIn $align_FILES

    samtools view -h file.bam -o file.bam.sam

    python -m HTSeq.scripts.count -m intersection-strict -t exon -s no -i gene_id $BAM_NAME.sorted.bam.sam $GTF >$BAM_NAME.sorted.bam.sam.out

    I am using samtools 0.1.19, HTSeq 0.5.4p3, STAR 2.3.0

    Thanks,
    Bob
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Can you post the actual 28th line? I think that this will include the header, so it could be the line that you posted. One odd thing is the "nM" field, which should be "NM". You might also try just making a smaller version of the same file, excluding the line that causes issues. You can then see if this crops up elsewhere.

    Comment

    • bioBob
      Member
      • Mar 2011
      • 72

      #3
      Hi,
      it looks like it doesn't like the second sequence in all cases, 3000 jobs all choked.
      Here are the first two from one of the files.
      HWI-1KL163:5422YEACXX:5:1101:1627:2223 99 3 196083660 255 51M = 196089177 5568 TGAGAAGTTCAAAACGTTCAT
      TTGGGTATCCTTTAGACTGCACGTGCTTCA @CCFDDFDHHHHHJJJIJJJJJJJJJGHIJJJJJIHIJJJJJJJJJJJJJG NH:i:1 HI:i:1 AS:i:99 nM:i:0 jM:B:c,-1 jI:B:i,-1
      HWI-1KL163:5422YEACXX:5:1101:1627:2223 147 3 196089177 255 51M = 196083660 -5568 TCCCCTCCACTACTCCATCTG
      CTTTTTCAGGTGGCATCTCCAGTATCTCTG GJJHF?HGFHHDIGJJJIJJJJJJJJJJJJJIJJJJIJHHHHHFFFFFCCC NH:i:1 HI:i:1 AS:i:99 nM:i:0 jM:B:c,-1 jI:B:i,-1

      Comment

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        I wonder if it's choking on the jM and/or jI tags. Try making a miniature SAM file with just the header and a couple read-pairs and then try removing those two fields (optionally changing "nM" to "NM"). I recall in the STAR manual, there's mention made that some of the additional fields can break other programs.

        Comment

        • bioBob
          Member
          • Mar 2011
          • 72

          #5
          Good ideas. I will give that a try.

          Comment

          • bioBob
            Member
            • Mar 2011
            • 72

            #6
            Hi,
            thanks, with that as a hint, I removed the --outSAMmode Full option and went with the defaults, all seems to work now.
            Bob

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Originally posted by bioBob View Post
              Hi,
              thanks, with that as a hint, I removed the --outSAMmode Full option and went with the defaults, all seems to work now.
              Bob
              That must be the option I'm remembering, glad that everything's working.

              Devon

              Comment

              • alexdobin
                Senior Member
                • Feb 2009
                • 161

                #8
                Originally posted by dpryan View Post
                That must be the option I'm remembering, glad that everything's working.

                Devon
                Hi @bioBob and @dpryan

                you are right - HTseq cannot deal with jM:B:c,-1 and jI:B:i,-1 SAM attributes which are output if you use (non-default) --outSAMmode Full. These new type of "array" attributes are relatively new to SAM, they appeared in 0.1.17, I believe. HTseq (as well as Cufflinks and probably others) seem to have a problem with these attributes. These attributes are convenient but not really required, so you can either run STAR without them, or cut the corresponding columns before feeding them to HTseq or Cufflinks. I probably need to report this problem to HTseq authors.

                nM attribute is intentionally named differently than the standard NM attributes because nM contains the number of mismatches in both mates of a paired-end read, and does not include indels. It should not cause any problems with the downstream processing.

                Cheers
                Alex

                Comment

                • ellybelly
                  Junior Member
                  • Oct 2016
                  • 2

                  #9
                  HTSeq works fine with BAM

                  I had the same issue when trying to count sam files in HTSeq. However, if you got pysam installed you can run it with the -f bam flag and the new fields are no longer an issue.

                  Comment

                  Latest Articles

                  Collapse

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 06-05-2026, 10:09 AM
                  0 responses
                  12 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-04-2026, 08:59 AM
                  0 responses
                  24 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 12:03 PM
                  0 responses
                  28 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 11:40 AM
                  0 responses
                  22 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...