  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #76
    Hi

    Originally posted by agali
    2_512_865_F3 16 Esi0595_0002 conserved unknown protein [1335] f:2354-3688 613 255 3H47M * 0 0 * AS:i:347
    Your SAM file is incorrect. According to the specs, a SAM file has the following fields:

    Code:
    <QNAME> <FLAG> <RNAME> <POS> <MAPQ> <CIGAR> <MRNM> <MPOS> <ISIZE> <SEQ> <QUAL> [<TAG>:<VTYPE>:<VALUE> [...]]
    Let me try to align your fields with the field names:

    QNAME: 2_512_865_F3
    FLAG: 16
    RNAME: Esi0595_0002 conserved unknown protein [1335] f:2354-3688 (assuming these are all spaces and there are no tabs in here)
    POS: 613
    MAPQ: 255
    CIGAR: 3H47M
    MRNM: *
    MPOS: 0
    ISIZE: 0
    SEQ: *
    QUAL: AS:i:347
    TAG:VTYPE:VALUE:

    Obviously, "AS:i:347" is a tag and should hence be in the 12th column. It is, however, in the 11th column, and is hence read as the quality string.
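    To make such a column shift easy to spot, here is a minimal Python 3 sketch (not part of HTSeq) that labels the tab-separated columns of one line with the field names listed above and checks the spec rule that a QUAL other than '*' requires a SEQ other than '*' of the same length. It assumes the columns are tab-separated and that the spaces inside RNAME are real spaces; `label_sam_fields` and `qual_is_consistent` are hypothetical helper names.

```python
# Hypothetical helpers for inspecting one SAM line; assumes columns are
# separated by tabs (spaces inside RNAME stay part of that field).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]


def label_sam_fields(line):
    """Map the first 11 columns to their field names; return extra columns as tags."""
    cols = line.rstrip("\n").split("\t")
    mandatory = dict(zip(SAM_FIELDS, cols[:11]))
    optional = cols[11:]
    return mandatory, optional


def qual_is_consistent(mandatory):
    """Check that a QUAL other than '*' comes with a SEQ of the same length."""
    seq = mandatory.get("SEQ")
    qual = mandatory.get("QUAL")
    if seq is None or qual is None:
        return False  # fewer than 11 columns: the line is truncated
    if qual == "*":
        return True
    return seq != "*" and len(seq) == len(qual)


# The line from the post above, with tabs between the columns.
line = "\t".join(["2_512_865_F3", "16",
                  "Esi0595_0002 conserved unknown protein [1335] f:2354-3688",
                  "613", "255", "3H47M", "*", "0", "0", "*", "AS:i:347"])
fields, tags = label_sam_fields(line)
print(fields["QUAL"], tags, qual_is_consistent(fields))  # prints: AS:i:347 [] False
```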

    Where did you get this SAM file from?

    Simon


    • agali
      Junior Member
      • Jul 2010
      • 4

      #77
      Hi Simon,

      The SAM file is from SHRiMP. I looked up the file format specification, and I think there should be a '*' in the QUAL field when there is a '*' in the SEQ field.
      I will try to add the missing column to my SAM file and then run HTSeq on it.
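      A minimal Python 3 sketch of that repair, assuming (as suspected above) that SHRiMP simply omitted the QUAL column when SEQ is '*'. `add_missing_qual` and `repair_sam_file` are hypothetical helpers, not part of SHRiMP or HTSeq: a '*' QUAL is inserted only when SEQ is '*' and the 11th column is either missing or looks like an optional TAG:TYPE:VALUE field; all other lines pass through unchanged.

```python
import re

# Optional-field pattern from the SAM spec: two-character tag, type letter, value.
TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:")


def add_missing_qual(line):
    """Insert '*' as QUAL when SEQ is '*' and QUAL seems to be missing.

    Assumption: the only defect is an omitted QUAL column. Header lines and
    lines that already have a QUAL column are returned unchanged.
    """
    if line.startswith("@"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 10 and cols[9] == "*":
        cols.append("*")  # no QUAL and no tags at all
    elif len(cols) >= 11 and cols[9] == "*" and TAG_RE.match(cols[10]):
        cols.insert(10, "*")  # a tag sits where QUAL should be
    return "\t".join(cols) + newline


def repair_sam_file(in_path, out_path):
    """Write a copy of in_path with add_missing_qual applied to every line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_missing_qual(line))
```

Running htseq-count on the repaired copy should then read "AS:i:347" as an optional field rather than as the quality string.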

      Thanks!
      Aga


      • mmpillai
        Junior Member
        • Apr 2010
        • 6

        #78
        Hi,
        I am trying to install HTSeq 0.4.5p5 on Windows. I get this error when I run setup.py in a shell:
        Traceback (most recent call last):
          File "C:\Python26\Lib\site-packages\HTSeq-0.4.5p5\setup.py", line 62, in <module>
            'scripts/htseq-count',
          File "C:\Python26\lib\distutils\core.py", line 140, in setup
            raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
        SystemExit: usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
           or: setup.py --help [cmd1 cmd2 ...]
           or: setup.py --help-commands
           or: setup.py cmd --help

        error: no commands supplied
        Please help; any help is very much appreciated.
        Thanks
        Manoj


        • alvin
          Junior Member
          • Oct 2010
          • 9

          #79
          No Feature

          Hi!
          I wonder if it is possible to retrieve the IDs of the reads that htseq-count classifies as "no feature".
          I'm interested in those reads that do not overlap with any annotated gene.
          I would really appreciate any suggestion.
          Thanks
          Best regards.


          Alvaro Pena


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #80
            Hi

            Originally posted by mmpillai
            I am trying to install 04.5p5 on windows. I get this error when I run the setup.py on a shell
            ...
            error: no commands supplied
            Please read the installation instructions:



            I haven't made a Windows binary package for a while, though.

            (I still have trouble understanding why anybody would want to do HTS bioinformatics on Windows. Nearly all bioinformatics developers work on Unix-like systems (Linux or Mac OS). Ensuring that a tool developed on Linux works on a Mac, or vice versa, is trivial, but supporting Windows always takes extra work and hence has low priority for us developers, which in turn makes Windows a bad choice for users, too.)

            Simon


            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #81
              Hi Alvaro

              Originally posted by alvin
              I wonder if it possible to retrieve the id of the reads that has "no feature" in htseq-count.
              I'm interested in those reads that do not overlap with any annotated gene.
              I would really appreciate any suggestion.
              As you are now the fourth person to request this feature, I thought I'd stop merely promising to add it in the future and actually get it done. :-)

              The new version, 0.4.7, offers a new option, "-o", for htseq-count. If you add '-o' followed by a filename, htseq-count writes a SAM file of that name containing the same lines as the input SAM file, but with an optional field with tag 'XF' appended to each line. This field indicates how the read was counted: it holds either a gene name or a special counter name such as "no_feature". With grep and cut, you can then extract what you want.
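              As an alternative to grep and cut, a minimal Python 3 sketch of the same filtering step: it reads the SAM file written via '-o' and yields the names of reads whose XF field equals a given counter name. It assumes XF is written as a standard TAG:TYPE:VALUE optional field and compares only the value, so the exact type letter does not matter; `reads_with_assignment` is a hypothetical helper name.

```python
def reads_with_assignment(sam_lines, wanted="no_feature"):
    """Yield the QNAME of each alignment line whose XF field value equals `wanted`.

    Hypothetical helper for the SAM file written by htseq-count -o. The counter
    name "no_feature" is the one used in version 0.4.7; other versions may differ.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        for field in cols[11:]:
            # Optional fields are TAG:TYPE:VALUE; split at most twice so a
            # value containing ':' stays intact.
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == "XF" and parts[2] == wanted:
                yield cols[0]
                break
```

              For example, `sorted(set(reads_with_assignment(open("out.sam"))))` would list each such read name once (paired-end mates share a name), where "out.sam" stands for whatever filename was given to '-o'.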

              Simon


              • alvin
                Junior Member
                • Oct 2010
                • 9

                #82
                Originally posted by Simon Anders
                The new version, 0.4.7, now offers a new option, "-o", for htseq-count.
                ...
                Great! I found the -o option very useful.
                Thank you very much for your help.
                Best Regards


                Álvaro Pena


                • marcora
                  Member
                  • Jan 2010
                  • 52

                  #83
                  Originally posted by Simon Anders
                  Hi Keith
                  At the moment, HTSeq can only work natively with SAM files. Adding BAM support is on my to-do list, and of course, I would do it by simply wrapping samtools.

                  Cheers
                  Simon
                  Hi Simon,

                  Is BAM support in HTSeq coming soon?!

                  Keep up the good work!


                  • naluru
                    Member
                    • Jul 2010
                    • 16

                    #84
                    htseq-count for miRNA

                    I am using htseq-count to count miRNAs by their genomic coordinates, and it works very well. However, I am also interested in more detailed output: I want a list of every distinct aligned read together with its count. The reason is that there are many miRNA length variants as well as mature, star and precursor sequences, and it would be nice to see the proportions of the different reads. Right now, I can only see the total count for each precursor miRNA.

                    Is there any way to get that information? Any hints would be highly appreciated.
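                    One possible approach, sketched in Python 3 under the assumption that htseq-count is rerun with its '-o' option (see post #81): tally how often each distinct read sequence was assigned to each miRNA via the XF field. `count_sequences_per_feature` and `proportions` are hypothetical helpers, not part of HTSeq. Note that SAM stores SEQ on the forward strand of the reference, so reads from minus-strand miRNAs appear reverse-complemented, and that counter names such as "no_feature" show up as features too.

```python
from collections import Counter


def count_sequences_per_feature(sam_lines):
    """Count alignment lines per (XF value, SEQ) pair in an htseq-count -o SAM file."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # malformed line
        for field in cols[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == "XF":
                counts[(parts[2], cols[9])] += 1
                break
    return counts


def proportions(counts, feature):
    """Fraction of one feature's counted lines contributed by each distinct sequence."""
    per_seq = {seq: n for (f, seq), n in counts.items() if f == feature}
    total = sum(per_seq.values())
    return {seq: n / total for seq, n in per_seq.items()}
```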

                    Thank you in advance.


                    • mmpillai
                      Junior Member
                      • Apr 2010
                      • 6

                      #85
                      Hi Simon,
                      I took your advice regarding the OS: I have successfully installed HTSeq on my Linux system. I wanted to install it from a binary package on my Mac, but none is available for download on PyPI. (I don't want to download XCode; it seems to be more than 3.5 GB in size.)
                      Thanks again. With bioinformatics clearly being the bottleneck for high-throughput applications, packages such as yours are very helpful.
                      Manoj


                      • Simon Anders
                        Senior Member
                        • Feb 2010
                        • 995

                        #86
                        Hi Manoj,

                        I don't provide binary packages for MacOS -- it's too complicated as I don't have a Mac myself. Please install XCode and use it. (Actually, you only need GCC, but installing all of XCode is easiest.)

                        XCode comes with MacOS and can be found on the second of the two MacOS installation CDs. So, there is no need to download it.

                        (If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)

                        Simon


                        • marcora
                          Member
                          • Jan 2010
                          • 52

                          #87
                          Originally posted by Simon Anders
                          ...
                          (If a Mac user reading this wants to help out Manoj: Run the command 'python setup.py bdist' in the unpacked tarball, and a binary package will be built automatically and packed into a single file.)
                          Here we go!
                          Attached Files


                          • mmpillai
                            Junior Member
                            • Apr 2010
                            • 6

                            #88
                            Simon and Marcora: thanks very much!


                            • fennan
                              Member
                              • Apr 2010
                              • 19

                              #89
                              Hi Simon,


                              In one of my datasets, I'm getting a lot of these warnings:

                              Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

                              If I grep for this read in the SAM file, I do find both mates:

                              Code:
                              ILLUMINA-GA_0000:8:36:18294:7129#0    163     chrY    59342791        255     38M     =       59342801        0 CAGAGGGCAGCAGGAGCAGCAGCAGCAGCAGCAGCAGC hdhhehhhhhhgghhghghgahhff[fhacfdaahhgh  NM:i:0  NH:i:1  XS:A:+
                              ILLUMINA-GA_0000:8:36:18294:7129#0    83      chrY    59342801        255     38M     =       59342791        0 CAGGAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAACA abaQWdffRbWWffWfd]aa_ggfggcgfgfgggfggg  NM:i:1  NH:i:1  XS:A:+
                              Questions:

                              1) Why is this warning coming up?
                              2) When this warning appears, is the read discarded? I'm getting results that don't make much sense to me:
                              Code:
                              The command:
                              htseq-count -s yes -i gene_id -m intersection-nonempty accepted_hits.sam /scratch/fdgarcia/data/gtfs/Homo_sapiens.GRCh37.60.gtf > counts.txt
                              
                              
                              Results for ~210000000 reads:
                              
                              no_feature      130841007
                              ambiguous       51826
                              too_low_aQual   0
                              not_aligned     0
                              alignment_not_unique    66886614
                              Thanks!


                              • Simon Anders
                                Senior Member
                                • Feb 2010
                                • 995

                                #90
                                Hi Fennan

                                Originally posted by fennan
                                In one of my datasets, I'm getting a lot of these warnings:

                                Read ILLUMINA-GA_0000:8:36:18294:7129#0 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

                                If I grep for these reads in the SAM file I do find the two mates:
                                ...
                                Well, is the SAM file properly sorted?

                                If you use htseq-count on paired-end data, you need to make sure that all SAM lines referring to the same read pair are in adjacent lines. To this end, you need to sort the SAM file by read name. (Just run it through the standard Unix 'sort' command.)
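                                To illustrate the ordering htseq-count needs, here is a minimal Python 3 sketch (hypothetical helpers, not part of HTSeq) that moves header lines to the top, sorts the alignment lines by QNAME with a plain code-point comparison (similar in spirit to running `sort` with LC_ALL=C), and checks that all lines of each read name are contiguous. It holds the whole file in memory, so for a file with ~210 million reads the Unix 'sort' command remains the practical tool; the sketch only shows the required result.

```python
def sort_sam_by_name(lines):
    """Return header lines first, then alignment lines sorted by QNAME.

    list.sort is stable, so lines sharing a QNAME keep their original order.
    """
    header = [l for l in lines if l.startswith("@")]
    body = [l for l in lines if not l.startswith("@")]
    body.sort(key=lambda l: l.split("\t", 1)[0])
    return header + body


def mates_adjacent(lines):
    """True if the alignment lines of every QNAME form one contiguous block."""
    seen = set()
    previous = None
    for l in lines:
        if l.startswith("@"):
            continue  # header lines do not take part in the check
        name = l.split("\t", 1)[0]
        if name != previous:
            if name in seen:
                return False  # this name already appeared in an earlier block
            seen.add(name)
            previous = name
    return True
```

                                On a coordinate-sorted file such as the one above, where another read can sit between the two mates, `mates_adjacent` would return False; after `sort_sam_by_name` it returns True.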

                                Simon
