Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Trouble with HTSeq install
Trying to install HTSeq onto a machine running Windows7 and I got the following error:
C:\Users\ArrayStar>cd C:\HTSeq-0.5.3p9
C:\HTSeq-0.5.3p9>PATH=%PATH%;C:\Python27;C:\MinGW\bin
C:\HTSeq-0.5.3p9>python setup.py build --compiler=mingw32
Could not import 'setuptools', falling back to 'distutils'.
running build
running build_py
running build_ext
building 'HTSeq._HTSeq' extension
C:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python27\lib\site-packages\
numpy\core\include -IC:\Python27\include -IC:\Python27\PC -c src/_HTSeq.c -o bui
ld\temp.win-amd64-2.7\Release\src\_htseq.o -w
cc1.exe: error: unrecognized command line option '-mno-cygwin'
error: command 'gcc' failed with exit status 1
C:\HTSeq-0.5.3p9>
Any help with what I'm doing wrong and what I can do to fix it? Thanks!
Leave a comment:
-
Hi,
I usely use a command line like this, if this can help:
I use samtools to read the bam file and with the pipe, I use htseq-count. The "-" call the output of the samtools view and pass it to htseq-count
samtools view file.bam | python2.6 htseq-count - file.gtf -q > htseq_output &
Delphine
Leave a comment:
-
For those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam:
Leave a comment:
-
Originally posted by gringer View PostFor those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam
Code:def BAM_Reader( bamfile, samtools_exec="samtools" ): # derived from Simon Anders' code from # http://seqanswers.com/forums/showpost.php?p=22114&postcount=46 """Get Reader object for part of a BAM file bamfile -- a string, either a file name or an URL to a BAM file region -- a GenomicInterval samtools_exec -- file name (with path, if necessary) of the 'samtools' executable Returns a SAM_Reader object. """ cmd_line = ( samtools_exec , "view", bamfile ) samtools_view = subprocess.Popen( cmd_line, stdout = subprocess.PIPE ) for line in samtools_view.stdout: fields = line.split('\t') #columns from http://samtools.sourceforge.net/SAM1.pdf read = HTSeq.SequenceWithQualities(name = fields[1-1], seq = fields[10-1], qualstr = fields[11-1]) mapStrand = '+' if(int(fields[2-1]) & 0x10 == 0x10): mapStrand = '-' if(int(fields[2-1]) & 0x4 == 0x4): iv = None else: iv = HTSeq.GenomicInterval( chrom = fields[3-1], start = int(fields[4-1]) - 1, end = int(fields[4-1]) - 1 + len(fields[10-1]), strand = mapStrand) yield HTSeq.Alignment(read, iv)
Attached Files
Leave a comment:
-
For those who weren't already aware of it, HTSeq seems to be able to read BAM files now, using pysam:
edit: on that note, is there any way I can get the Query name from a BAM file line using the BAM_Reader interface? I need this because it contains an experiment ID, and I would like to split my output based on that ID. My current way to get around it is to use pysam directly. Looking at the HTSeq source code showed me how to convert the output into something that could be used for coverage analysis:
Code:bamFile = pysam.Samfile(bamFileName, "rb") coverage = dict() totalCount = 0 hitCount = 0 for read in bamFile: totalCount += 1 if(not(read.is_unmapped)): experiment = read.qname[:read.qname.find(".")] hitCount += 1 ## Taken from HTSeq source code chrom = bamFile.getrname(read.tid) strand = "-" if read.is_reverse else "+" iv = HTSeq.GenomicInterval( chrom, read.pos, read.aend, strand ) if(not(experiment in coverage)): coverage[experiment] = ( HTSeq.GenomicArray("auto", stranded = True, typecode = "i")) (coverage[experiment])[iv] += 1
Leave a comment:
-
Prequisites
To use HTSeq, you need at least version 2.5 of Python (Python 3 does not work yet),
Python 3 does not work yet....ooh no!!
Leave a comment:
-
No, there is no gene_id.
I even tried -i ID option and nothing is working.
Is there any way i can convert gff file to gtf or can directly get GTF file for my Genome?
Thanks
Shalabh
Leave a comment:
-
Well, there is no attribute called "gene_id" in your GFF file, or is there? The attributes are the entries in the last column, and their name is the part before the "=" sign. Try '-i ID'.
Leave a comment:
-
Thanks Simon,
So if i want to use gff file i need to mention the feature type to be used (3rd column) with -t option.
So this is what i did: (i want to use 'gene')
htseq-count -t gene CL1_forBWA_left_DSS3.sam NC_003911.gff
but now i am getting following error:
Error occured in line 6 of file NC_003911.gff.
Error: Feature NC_003911.11:gidA does not contain a 'gene_id' attribute
[Exception type: SystemExit, raised in count.py:55]
These are few lines from my gff:
##gff-version 3
#!gff-spec-version 1.14
#!source-version NCBI C++ formatter 0.2
##Type DNA NC_003911.11
NC_003911.11 RefSeq source 1 4109442 . + . organism=Ruegeria pomeroyi DSS-3;mol_type=genomic DNA;strain=DSS-3;db_xref=taxon:246200
NC_003911.11 RefSeq gene 1 1869 . + . ID=NC_003911.11:gidA;locus_tag=SPO0001;db_xref=GeneID:3194636
NC_003911.11 RefSeq CDS 1 1866 . + 0 ID=NC_003911.11:gidA:unknown_transcript_1;Parent=NC_003911.11:gidA;locus_tag=SPO0001;note=GidA%3B glucose-inhibited cell division protein A%3B involved in the 5-carboxymethylaminomethyl modification %28mnm%285%29s%282%29U%29 of the wobble uridine base in some tRNAs;transl_table=11;product=tRNA uridine 5-carboxymethylaminomethyl modification enzyme GidA;protein_id=YP_165274.1;db_xref=GI:56694929;db_xref=GeneID:3194636;exon_number=1
Leave a comment:
-
ssharma:
This line does not have a 'gene_id' attribute, but neither do the others. Please note that htseq-counts expects by default a GTF file. Yours is not one. (See this post for details.) See also the htseq-count documentation, especially then options '-t' and '-i'.
Leave a comment:
-
@joro
Here are the 3 lines from gff file.
Middle one is 2169
NC_003911.11 RefSeq gene 176231 176306 . + . locus_tag=SPO_tRNA-Gln-1;db_xref=GeneID:3196480
NC_003911.11 RefSeq exon 176231 176306 . + . gbkey=tRNA;locus_tag=SPO_tRNA-Gln-1;product=tRNA-Gln;db_xref=GeneID:3196480;exon_number=1
NC_003911.11 RefSeq gene 176312 176389 . + . locus_tag=SPO_tRNA-Pro-2;db_xref=GeneID:3196280
Leave a comment:
-
Sorry, my mistake. The comma at the end of the print line escaped the c&p.
Leave a comment:
-
Originally posted by Simon Anders View PostFor an explanation why htseq-count skips multiply matched reads, see post #4 of this thread: http://seqanswers.com/forums/showthread.php?t=9129
Now, if you have two mappings for a read and one has high quality and the other does not, it would indeed make sense to use the good one and discard the bad one rather than skipping both. Hence, you could sort you SAM file by read ID, so that multiple mappings are in adjacent lines and the write a script to filter the best one. Here is a quick attempt to do this with HTSeq:
Code:import sys, re import HTSeq insam = HTSeq.SAM_Reader( sys.stdin ) # Go through all reads, with their alignments bundled up: for bundle in HTSeq.bundle_multiple_alignments( insam ): bestAlmt = None # Go through all alignments of a given read, looking # for the one with the best alignment score for almt in bundle: if bestAlmt is None: bestAlmt = almt elif almt.aQual > bestAlmt.aQual: bestAlmt = almt elif almt.aQual == bestAlmt: # If there are more than one best alignment, # better skip the read bestAlmt = None if bestAlmt is not None: # Change the NH field to 1 and print the line print re.sub( "NH:i:\d+", "NH:i:1", bestAlmt.original_sam_line ),
Code:sort samfile.sam | python chooseBest.py > filtered.sam
Be aware hat I haven't really tested this thoroughly.
Code:[tobiasko@bs-bsse01 tophat_output_n_mode]$ ls -l total 269918 -rw-r--r-- 1 tobiasko bsse-paro 275526474 Sep 26 10:26 accepted_hits.bam -rw-r--r-- 1 tobiasko bsse-paro 52 Sep 26 10:24 deletions.bed -rw-r--r-- 1 tobiasko bsse-paro 54 Sep 26 10:24 insertions.bed -rw-r--r-- 1 tobiasko bsse-paro 52 Sep 26 10:24 junctions.bed -rw-r--r-- 1 tobiasko bsse-paro 68 Sep 26 10:09 left_kept_reads.info drwxr-xr-x 2 tobiasko bsse-paro 13 Sep 26 10:21 logs [tobiasko@bs-bsse01 tophat_output_n_mode]$ samtools sort -n accepted_hits.bam accepted_hits.sortedByName [bam_sort_core] merging from 2 files... [tobiasko@bs-bsse01 tophat_output_n_mode]$ samtools view accepted_hits.sortedByName.bam | python /usr/local/paro/scripts/selectBest.py > accepted_hits.sortedByName.best.sam
These are the sam file heads:
Code:[tobiasko@bs-bsse01 tophat_output_n_mode]$ samtools view accepted_hits.sortedByName.bam | head -1350550920:7:1:1:309 0 chr2R 12473665 255 35M * 0 0 AAAGTTGTGTTTTTCGTACATTTTTCCAATTCACG .<;:3I8=@2IFII2*G;2B?III%+,:7GI,2') NM:i:1 NH:i:1 -1350550920:7:1:1:862 16 chr3L 9776491 255 35M * 0 0 AGAAGAAGAAACCTAGTCCGCTGGAGAAAACTGGA I,>A'DC,IB0&/.G7.(&8(6>BI7IC@B:G39I NM:i:0 NH:i:1 -1350550920:7:1:1:1324 0 chr2L 9788256 255 35M * 0 0 TTAGATAAGAGAAGAGTATTCACCTCATCGCCCGG III6;III+<4ID%=)$@ID9I>1$88>5%667%) NM:i:2 NH:i:1 -1350550920:7:1:1:1598 0 chr2L 19448548 255 35M * 0 0 TGAGGCCGACTTCTGGCCCTTCAAAGCCTTCGCGG ;18,;E>/?7II:I<+;:>CI<11-.43;C.(2'' NM:i:0 NH:i:1 -1350550920:7:1:1:1881 16 chrX 6547455 255 35M * 0 0 TTCAAACGCTTCTGGATAGCGGTGAGCACATCCCA +*+CE.37,;+D5C;6;E4@53IGII85II36<2> NM:i:0 NH:i:1 -1350550920:7:1:2:92 0 chrX 13614346 255 35M * 0 0 ACGGTGGCAGCTGAGTCCGTGACATCGTGTTCGCT ?2CE16?/+>9I-4:=/7A+-4'(8(88++1$($6 NM:i:0 NH:i:1 -1350550920:7:1:2:195 0 chr3R 16964869 255 35M * 0 0 GACACCACCAACGCGGGCTCCTACGTGGCCTGTTC /7.1'36.+2I/<16:7/B.,-.+/)4*%++),*) NM:i:0 NH:i:1 -1350550920:7:1:2:198 0 chrX 9515569 255 35M * 0 0 AAGCCAAATGCACTGCCCCGTGCCCACACATGCTT A98''G:?05+;;>/30'/?05%$--,.,0+.-,- NM:i:0 NH:i:1 -1350550920:7:1:2:222 16 chr3R 10049214 255 35M * 0 0 GAGCTCATAGAAGCGATTAAAGAAGCTATTGGGCG )8+%6&?9C.9?5*7CI?III7EF<94:3C//6.5 NM:i:0 NH:i:1 -1350550920:7:1:2:237 16 chr3R 26711769 255 35M * 0 0 GTTGGCCACCAAGACGTCGCGAAGGTGCTTTACTA -242-(<+,:=8:6596DCFB>D:?176GI@6A:A NM:i:0 NH:i:1 [tobiasko@bs-bsse01 tophat_output_n_mode]$ head accepted_hits.sortedByName.best.sam -1350550920:7:1:1:309 0 chr2R 12473665 255 35M * 0 0 AAAGTTGTGTTTTTCGTACATTTTTCCAATTCACG .<;:3I8=@2IFII2*G;2B?III%+,:7GI,2') NM:i:1 NH:i:1 -1350550920:7:1:1:862 16 chr3L 9776491 255 35M * 0 0 AGAAGAAGAAACCTAGTCCGCTGGAGAAAACTGGA I,>A'DC,IB0&/.G7.(&8(6>BI7IC@B:G39I NM:i:0 NH:i:1 -1350550920:7:1:1:1324 0 chr2L 9788256 255 35M * 0 0 TTAGATAAGAGAAGAGTATTCACCTCATCGCCCGG III6;III+<4ID%=)$@ID9I>1$88>5%667%) NM:i:2 NH:i:1 -1350550920:7:1:1:1598 0 chr2L 19448548 255 35M * 0 0 TGAGGCCGACTTCTGGCCCTTCAAAGCCTTCGCGG ;18,;E>/?7II:I<+;:>CI<11-.43;C.(2'' NM:i:0 NH:i:1 -1350550920:7:1:1:1881 16 chrX 6547455 255 35M * 0 0 TTCAAACGCTTCTGGATAGCGGTGAGCACATCCCA +*+CE.37,;+D5C;6;E4@53IGII85II36<2> NM:i:0 NH:i:1
Leave a comment:
-
Hi Shalabh, perhaps it would help if you showed us line 2169 of the *gff file.
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Spatial biology is an exciting field that encompasses a wide range of techniques and technologies aimed at mapping the organization and interactions of various biomolecules in their native environments. As this area of research progresses, new tools and methodologies are being introduced, accompanied by efforts to establish benchmarking standards and drive technological innovation.
3D Genomics
While spatial biology often involves studying proteins and RNAs in their...-
Channel: Articles
01-01-2025, 07:30 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 07:35 AM
|
0 responses
16 views
0 likes
|
Last Post
by seqadmin
Yesterday, 07:35 AM
|
||
Started by seqadmin, 01-23-2025, 09:43 AM
|
0 responses
14 views
0 likes
|
Last Post
by seqadmin
01-23-2025, 09:43 AM
|
||
Started by seqadmin, 01-23-2025, 08:36 AM
|
0 responses
18 views
0 likes
|
Last Post
by seqadmin
01-23-2025, 08:36 AM
|
||
Started by seqadmin, 01-17-2025, 09:38 AM
|
0 responses
37 views
0 likes
|
Last Post
by seqadmin
01-17-2025, 09:38 AM
|
Leave a comment: