  • Simon Anders
    replied
    Originally posted by townway
    There is no binary version available. Could you give a link to download one?
    For which operating system?


  • townway
    replied
    I checked the HTSeq website:
    A framework to process and analyze data from high-throughput sequencing (HTS) assays


    There is no binary version available. Could you give a link to download one?

    Thank you

    Wei


  • Simon Anders
    replied
    Hi

    Originally posted by rkusko
    I've used htseq-count to extract raw counts from Cufflinks output, and it worked correctly for 40/42 of my samples. For some reason, two samples raised the error
    Could you maybe send me (by e-mail to anders(at)embl(dot)de) some excerpts from the data that produce the error? Then I could investigate.

    Cheers
    Simon


  • rkusko
    replied
    Hi Simon,
    I've used htseq-count to extract raw counts from Cufflinks output, and it worked correctly for 40/42 of my samples. For some reason, two samples raised the error:

    Error: 'generator' object has no attribute 'get_line_number_string'
    [Exception type: AttributeError, raised in count.py:126]


    I haven't been able to find anything wrong with the data in those two samples. Any ideas? Thank you!


  • yh253
    replied
    Hi Simon,

    Thanks very much for your reply! Quoting from BWA: "Internally BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA will flag the read as unmapped, but you will see position, CIGAR and all the tags". This happened to me: with BWA, some reads mapped to the end of chrY and the beginning of chrM. However, I didn't see these mappings with Bowtie. This might arise from the mapping strategy adopted by BWA, which converts "N" in the reference sequence into random bases. During mapping, BWA must have converted the "N"s at the end of chrY into random bases that happened to match the beginning bases of some reads.

    Cheers,
    Yuan
    Last edited by yh253; 06-05-2010, 06:17 AM.


  • Simon Anders
    replied
    Hi Abhijit

    Originally posted by gen2prot
    I made a Drosophila Gene Index file and ran all the Solexa reads against it. I have a 5.4 GB SAM file. Can I use the htseq-qa program on this SAM file, or will it crash because the reads were aligned against genes rather than chromosomes?
    Just try it out. But it should work. htseq-count does not care what you have aligned against.

    Secondly, how can I get the read count in my case? The GTF file that I have has the start and end coordinates with respect to the genome, so it won't work with the SAM output. Any suggestions?
    So, basically, wherever you usually have a chromosome name (i.e., in the third column of your SAM file) you now have a gene name, and you want to count how many reads fall onto each gene? Why don't you just cut out this third column and count how often each gene appears there?

    Of course, your approach has other dangers. For example, how does your aligner handle multiple transcripts? If the same exon appears in several transcripts, the aligner might think it is a repeat.

    Cheers
    Simon
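
The column-counting idea in the reply above could be sketched roughly like this in Python. It is only an illustration: `count_reads_per_reference` is a made-up helper name, and the sketch assumes a tab-separated SAM file in which header lines start with `@` and reads without a reference have `*` in the third column.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count how often each reference name (SAM column 3) occurs.

    Hypothetical helper for illustration; not part of HTSeq.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            # Header line, carries no alignment record.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Too short to be an alignment record; skip it.
            continue
        ref_name = fields[2]
        if ref_name != "*":
            counts[ref_name] += 1
    return counts
```

Passing an open file object (for example `open("accepted.sam")`) as `sam_lines` should work, since the function only iterates over lines. Note that this counts alignment records, so a read reported at several locations is counted once per record.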


  • Simon Anders
    replied
    Hi Yuan

    Originally posted by yh253
    I used BWA to do the alignment. If a read maps to a chromosomal junction (the end of one chromosome and the beginning of another), BWA produces flag = 4, but you can still see the other tags ("chr", "CIGAR", "MAPQ", etc.) in the SAM file. This causes a problem for htseq-qa, which can't process an alignment with flag = 4 whose mapping position does not equal "*". One option is to pre-process my SAM file and exclude all such alignments before giving it to htseq-qa, but I am wondering whether it is possible to turn off this requirement in htseq-qa so that I don't have to change my SAM file in advance. Thank you very much for your advice!
    I've changed the SAM parser such that it now only writes a warning rather than stopping with an error if this case is encountered. Please try again with version 0.4.4p2.

    However, I don't quite understand how such SAM lines come about. What is a chromosomal junction? How could a read get mapped partly to one and partly to another chromosome?

    Cheers
    Simon
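
For anyone stuck on an older version, the pre-processing option mentioned in the quoted post might look something like the sketch below. The function names are invented for this example, and it assumes the SAM convention that bit 0x4 of the FLAG field (column 2) marks a read as unmapped and that a truly unplaced read has `*` as its reference name (column 3).

```python
def is_unmapped_with_coordinates(sam_line):
    """True for a record flagged unmapped (0x4) that still names a reference.

    Hypothetical helper for illustration; not part of HTSeq or BWA.
    """
    if sam_line.startswith("@"):
        # Header lines are never alignment records.
        return False
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return False
    flag = int(fields[1])
    ref_name = fields[2]
    return bool(flag & 0x4) and ref_name != "*"


def drop_unmapped_with_coordinates(sam_lines):
    """Yield every line except the inconsistent records described above."""
    for line in sam_lines:
        if not is_unmapped_with_coordinates(line):
            yield line
```

Writing the surviving lines to a new file keeps the original SAM file untouched, which seems safer than editing it in place.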


  • gen2prot
    replied
    Hi Simon,

    I made a Drosophila Gene Index file and ran all the Solexa reads against it. I have a 5.4 GB SAM file. Can I use the htseq-qa program on this SAM file, or will it crash because the reads were aligned against genes rather than chromosomes? Secondly, how can I get the read count in my case? The GTF file that I have has the start and end coordinates with respect to the genome, so it won't work with the SAM output. Any suggestions?

    thanks
    Abhijit


  • yh253
    replied
    Hi Simon,

    I used BWA to do the alignment. If a read maps to a chromosomal junction (the end of one chromosome and the beginning of another), BWA produces flag = 4, but you can still see the other tags ("chr", "CIGAR", "MAPQ", etc.) in the SAM file. This causes a problem for htseq-qa, which can't process an alignment with flag = 4 whose mapping position does not equal "*". One option is to pre-process my SAM file and exclude all such alignments before giving it to htseq-qa, but I am wondering whether it is possible to turn off this requirement in htseq-qa so that I don't have to change my SAM file in advance. Thank you very much for your advice!

    Yuan


  • gen2prot
    replied
    Got it to work... Coffee helps.


  • Simon Anders
    replied
    Hi Abhijit

    Originally posted by gen2prot
    I have a mac 10.6, intel and python 2.6. Matplotlib is unavailable for 10.6, py 2.6.... as far as I know.
    Yours seems to be a pretty standard configuration, so it would be a pity if matplotlib was not available. If you have Xcode installed, building from source might work. Just try:

    Code:
    wget http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-0.99.1/matplotlib-0.99.1.2.tar.gz/download
    tar -xzvf matplotlib-0.99.1.2.tar.gz 
    cd matplotlib-0.99.1.1/
    python setup.py build
    sudo python setup.py install
    Simon


  • gen2prot
    replied
    Hi Simon,

    I have a mac 10.6, intel and python 2.6. Matplotlib is unavailable for 10.6, py 2.6.... as far as I know. I give up. The website (matplotlib) asks me to tinker around with System files which I don't want to tamper with. Thanks anyway for your help.

    Abhijit


  • gen2prot
    replied
    Hi Simon,

    I am having difficulty running the htseq-qa script. I think I have installed HTSeq correctly, since I get no error message for the "import HTSeq" command. But when I give the "htseq-qa -t sam accepted.sam" command, I get a syntax error. I have run the following export command in Unix:

    export PYTHONPATH=$PYTHONPATH:/Library/Python/2.6/

    Is this wrong? On giving the command "whereis python", I get /usr/local/python. I am confused.

    Thank you
    Abhijit
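
When several Python installations may be present, as the post above suggests, it can help to ask the interpreter itself which executable is running and where it would load a module from. The following is a minimal sketch, assuming a reasonably recent Python 3 (the Python 2.6 from the thread lacks `importlib.util.find_spec`); `describe_module` is a name made up for this example.

```python
import importlib.util
import sys


def describe_module(module_name):
    """Return (interpreter path, module file or None) for the running Python.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == "__main__":
    interpreter, location = describe_module("HTSeq")
    print("interpreter:", interpreter)
    print("HTSeq found at:", location)
```

If the interpreter printed here differs from the one named on the first line of the htseq-qa script, the script may be running under a Python that does not see the installed package.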


  • yh253
    replied
    Hi Simon,

    I am afraid not. Actually, I was not aware of this issue until now. I will try to figure out how to convert the quality strings first, and will give feedback on this later. Thanks again for your prompt reply!

    Yuan


  • Simon Anders
    replied
    Hi Yuan

    Originally posted by yh253
    ValueError: Too large quality value encountered.
    Usually, SAM files don't contain quality values exceeding 40. When you aligned your _sequence.txt file, did you convert the quality strings from Solexa to Sanger scale? If not, BWA probably did not find the optimal alignments, and htseq-qa gets confused, too.

    If you did, and the large quality values are legitimate, I'd be interested to see your SAM file.

    Simon
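
As a rough sketch of the Solexa-to-Sanger conversion asked about above: the function names are invented for this example, and which function applies depends on an assumption about the input. `solexa64_to_sanger33` assumes genuine Solexa-scaled scores with ASCII offset 64, and uses the standard relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). `phred64_to_sanger33` assumes the scores are already Phred-scaled with offset 64, so only the offset changes. Both write Sanger-style Phred scores with offset 33.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled score to the nearest integer Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def solexa64_to_sanger33(quality_string):
    """Re-encode a Solexa (offset 64) quality string as Sanger (offset 33)."""
    return "".join(
        chr(solexa_to_phred(ord(char) - 64) + 33) for char in quality_string
    )


def phred64_to_sanger33(quality_string):
    """Shift Phred scores from ASCII offset 64 to offset 33 (scale unchanged)."""
    return "".join(chr(ord(char) - 64 + 33) for char in quality_string)
```

For example, the character `h` (score 40 at offset 64) becomes `I` (score 40 at offset 33) under either function; the two scales only differ noticeably at low scores.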
