
  • sindrle
    replied
    I have made a SAM file with the --samout option and checked around a bit.

    From Tophat.log I get 21.8 million total kept reads.
    In my SAM I get a total of 20.9 million. I wonder where the rest are?

    Of the 20.9 million I have:

    17.7 million with NH:i:1, of which 158,000 are ambiguous
    3.4 million alignment_not_unique
    1.6 million no_feature

    Looking at the HTSeq output file I get:

    no_feature: 3.8 million
    ambiguous: 158,000
    alignment_not_unique: 3.4 million

    So the SAM has about 1 million fewer reads than the BAM. Also, "no_feature" differs between the SAM and the HTSeq output file.

    I tried to look at specific reads in IGV, but selecting reads by read name (right-click the BAM track and choose "select by name") does not change the view, which is annoying.

    Does anyone have something to add on this?
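
    One way to cross-check totals like these is to tally the XF tag that htseq-count writes to each record of the --samout file. The following is a minimal sketch, not part of HTSeq: the function name tally_xf and the "assigned" label are made up for illustration, and it assumes the special categories may appear with or without a leading "__" depending on the HTSeq version. It counts SAM records, so a multimapped read that appears on several lines is counted once per line; the set of read names gives the number of distinct reads.

```python
from collections import Counter

# Special counters reported by htseq-count; everything else is treated as a feature ID.
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique"}

def tally_xf(sam_lines):
    """Count SAM records per XF category and collect distinct read names."""
    counts = Counter()
    read_names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        read_names.add(fields[0])
        # Optional fields start at column 12 (index 11) in SAM.
        xf = next((f[5:] for f in fields[11:] if f.startswith("XF:Z:")), None)
        if xf is None:
            counts["(no XF tag)"] += 1
            continue
        # Strip an optional "__" prefix and an "[geneA+geneB]" suffix.
        category = xf.lstrip("_").split("[")[0]
        counts[category if category in SPECIAL else "assigned"] += 1
    return counts, read_names
```

    Usage would be along the lines of `counts, names = tally_xf(open("out.sam"))`, where out.sam is a placeholder for the --samout file. Comparing `sum(counts.values())` with `len(names)` shows how much of the record total comes from reads listed more than once.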



  • sindrle
    replied
    Oh yeah, that makes sense.

    How come you know so much about everything? Where did you learn it all?

    But it's pretty clear something is wrong here, right, so I should keep looking? I have checked my GTF, and the chromosome names are the same.



  • dpryan
    replied
    HTSeq-count also looks at the NH auxiliary tag. With a MAPQ of 3, it's likely that three of those are multimappers (this will be the case if you used tophat2) and would be (properly) ignored.
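
    To see how MAPQ and the NH tag line up in a given file, the alignments can be cross-tabulated. This is only a rough sketch: the function name nh_mapq_table is hypothetical, the example records are invented (including the NH:i:2 values), and it reads SAM text such as the output of `samtools view`.

```python
from collections import Counter

def nh_mapq_table(sam_lines):
    """Count alignments for each (MAPQ, NH) combination in SAM text."""
    table = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        # NH is an optional field; None means the aligner did not write it.
        nh = next((int(f[5:]) for f in fields[11:] if f.startswith("NH:i:")), None)
        table[(mapq, nh)] += 1
    return table

# Invented example: one unique alignment and three multimapper alignments.
template = "{name}\t0\tchr1\t100\t{mapq}\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:{nh}"
example = [template.format(name="r1", mapq=50, nh=1)] + [
    template.format(name="r%d" % i, mapq=3, nh=2) for i in (2, 3, 4)
]
table = nh_mapq_table(example)
# Alignments with NH == 1 are the ones that would not be flagged as non-unique.
unique = sum(n for (mapq, nh), n in table.items() if nh == 1)
```

    In this made-up case `unique` is 1 even though all four alignments pass a MAPQ threshold of 0, which mirrors the situation described in the thread.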



  • sindrle
    replied
    Hi again!

    I have now tested HTSeq with all modes, also upgraded to Python 2.7.6 and inspected using IGV.

    [Screenshot: Screen Shot 2014-01-20 at 22.43.27.png]

    Here there are 4 reads in total, one with mapping quality 50 and three with mapping quality 3.
    I used the HTSeq option -a 0, so they should have been picked up.

    All three modes count only 1 read. How can this be?



  • dpryan
    replied
    Perhaps you have a lot of immature mRNAs or a lot of expressed repeat regions. The general idea is to look at some of the alignments in IGV and see if they really don't match anything. Also ensure that the chromosome names in the BAM file and GTF file match (that probably causes this sort of thing half the time).
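
    The chromosome-name check can be done without opening either file in a viewer. A minimal sketch follows, assuming the BAM header has been dumped to text (for example with `samtools view -H accepted_hits.bam`) and the GTF is plain tab-separated text; the function names here are made up for illustration.

```python
def sam_header_chroms(header_lines):
    """Collect reference names from the SN: fields of @SQ header lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def gtf_chroms(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names

def compare_chroms(header_lines, gtf_lines):
    """Return (only in BAM, only in GTF, shared) as sorted lists of names."""
    bam = sam_header_chroms(header_lines)
    gtf = gtf_chroms(gtf_lines)
    return sorted(bam - gtf), sorted(gtf - bam), sorted(bam & gtf)
```

    If the shared list comes back empty or very short (for example "chr1" in one file and "1" in the other), a naming mismatch is the likely reason for a large no_feature count.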



  • sindrle
    started a topic HTseq: Very few counts recognised


    Hi!
    I've seen a lot of threads on this, but I can't figure it out. I have 16-60 million single-end reads in each library. I've used Tophat 2 with the UCSC GTF file for hg19.

    This is my code:

    samtools view accepted_hits.bam | \
    htseq-count -m intersection-nonempty -s no -a 10 \
    - UCSC/hg19/genes.gtf \
    > Out.txt

    Here is a typical result; the counts are proportional to the library size:

    no_feature 7013689
    ambiguous 269370
    too_low_aQual 0
    not_aligned 0
    alignment_not_unique 6645341

    How come, on average, 25-50% of my reads end up as "no_feature",
    "ambiguous" or "alignment_not_unique"?

    This is RNA-seq, and if I must inspect visually, how should I proceed?
