**Simon Anders** · 12-10-2013, 01:59 AM

htseq-count counts each read pair for the gene it maps to as long as it maps unambiguously to one gene. If one of the two mates could not be aligned, htseq-count uses the information from the aligned mate only.

What do you mean by "including single reads"? Do you have a sample with a mixture of paired-end and single-end reads?

**Dinesh Heisnam** · 12-10-2013, 02:55 AM

Hi Simon, thanks for the reply.

We generate paired-end reads, then run the Q20 dynamic trim of the SolexaQA package to filter out low-quality reads. Dynamic trim gives two outputs: one of single reads and one of paired reads. We used both the single reads and the paired reads.
So what is your suggestion? Should we remove the single reads before mapping?

**Simon Anders** · 12-10-2013, 04:21 AM

Why should you? I'm still unsure whether I understand where your problem is.

You will need to align the paired-end and the single-end FASTQ files in separate runs of the aligner anyway, because the aligner won't like it if you pass a mixture of both. You then get two SAM files and run htseq-count on each of them. If you like, you can add up the counts from the two files.

And as an aside: all modern alignment tools take the base-call qualities into account and will leave a bad-quality read unaligned. I believe there is no need to filter out low-quality reads beforehand. Better leave this to your aligner; it knows better when a read is too bad to be mappable.
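
A minimal sketch of adding up the counts from the paired-end and single-end runs, assuming each htseq-count output is a two-column, tab-separated table of feature name and count. The names `parse_counts` and `add_counts` are hypothetical helpers, not part of HTSeq. The special counter rows that htseq-count appends at the end of its output are simply summed like any other row.

```python
from collections import Counter


def parse_counts(lines):
    """Parse 'feature<TAB>count' lines into a Counter (assumed table layout)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        feature, value = line.split("\t")
        counts[feature] += int(value)
    return counts


def add_counts(*tables):
    """Sum several count tables feature by feature, keeping zero-count rows."""
    total = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts instead of replacing
    return total


# Toy input standing in for the two count files (values are made up).
paired = parse_counts(["geneA\t10\n", "geneB\t0\n"])
single = parse_counts(["geneA\t3\n", "geneB\t0\n"])
combined = add_counts(paired, single)
```

With real files, the same functions can be fed open file handles, since `parse_counts` only iterates over lines.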

**Dinesh Heisnam** · 12-10-2013, 05:42 AM

Thanks, Simon.
I went through the TopHat manual again and it says the same as you suggest.
Well, it seems to be right to align with the FASTQ reads.
One more problem:
We get consistent warnings saying 'not mate found. is SAM properly sorted?' We checked the sorted SAM and it seems fine. After ignoring these warnings we got "0" counts for many GTF transcripts (from Cufflinks) with HTSeq. We are not sure whether this is because the mates were not found.
Please suggest.

**Simon Anders** · 12-10-2013, 06:28 AM

Is your file sorted by read name (not by position)?
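
One way to answer this without eyeballing the file is to check that all records sharing a read name sit next to each other, which is the property a name-sorted file should have. The sketch below assumes a plain-text SAM stream. `qnames_are_grouped` is a hypothetical helper, and it keeps every read name in memory, so for a very large file it is better run on a sample of the first few million lines.

```python
def qnames_are_grouped(sam_lines):
    """Return True if records with the same read name (QNAME) are contiguous."""
    seen = set()
    previous = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        qname = line.split("\t", 1)[0]
        if qname != previous:
            if qname in seen:
                return False  # this name already appeared earlier, then was interrupted
            seen.add(qname)
            previous = qname
    return True


# Toy records: only the first four SAM columns are shown (values are made up).
by_name = ["@HD\tVN:1.0\n", "r1\t99\tchr1\t100\n", "r1\t147\tchr1\t300\n", "r2\t0\tchr1\t500\n"]
by_position = ["r1\t99\tchr1\t100\n", "r2\t0\tchr1\t200\n", "r1\t147\tchr1\t300\n"]
```

Here `qnames_are_grouped(by_name)` holds, while the position-ordered toy input fails the check because `r1` reappears after `r2`.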

**dpryan** · 12-10-2013, 07:23 AM

If you're trying to count by transcript rather than gene and there are multiple overlapping transcripts (quite common), then it'd be unsurprising to have 0 counts for most things.

BTW, the warning message is probably due to the odd tophat behaviour of leaving out unmapped mates if it can't map both as a pair and only one maps as a singleton.
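
To see how many read pairs are affected, one could count the read names for which only one mate of a pair has a record. This is a sketch under the assumption that the SAM is already grouped by read name. The flag bits are those defined in the SAM specification (0x1 paired, 0x40 first mate, 0x80 second mate), while `count_missing_mates` is a hypothetical helper.

```python
from itertools import groupby

PAIRED, FIRST_MATE, SECOND_MATE = 0x1, 0x40, 0x80


def count_missing_mates(sam_lines):
    """Count read names where a paired read has a record for only one mate.

    Assumes the records are already grouped by read name.
    """
    records = (
        line.split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    missing = 0
    for _qname, group in groupby(records, key=lambda fields: fields[0]):
        flags = [int(fields[1]) for fields in group]
        paired = [flag for flag in flags if flag & PAIRED]
        if not paired:
            continue  # single-end read, no mate expected
        has_first = any(flag & FIRST_MATE for flag in paired)
        has_second = any(flag & SECOND_MATE for flag in paired)
        if has_first != has_second:
            missing += 1  # exactly one of the two mates is present
    return missing


# Toy records: r1 has both mates, r2 is a singleton whose mate record is absent.
example = [
    "r1\t99\tchr1\t100\n",
    "r1\t147\tchr1\t300\n",
    "r2\t73\tchr1\t500\n",
]
```

A large number from such a check would be consistent with the singleton explanation rather than with a sorting problem.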

**Simon Anders** · 12-10-2013, 07:52 AM

Spotted with keen eye by Devon. Counting reads per transcript is rarely useful. (See my several earlier posts on this.) What did you intend to do with these counts?

**Dinesh Heisnam** · 12-10-2013, 10:17 PM

We sorted the SAM by read name with the command 'sort -s -k 1,1 A.SAM > Sorted.SAM'.

**Dinesh Heisnam** · 12-10-2013, 10:46 PM

We have RNA-Seq data for six different conditions. We mapped the filtered reads of each condition to the reference genome independently to get the SAM files, and with the mapping information we ran Cufflinks. After Cufflinks, the redundancy was removed with cuffmerge.
We are counting the transcripts in the GFF from cuffmerge. I am not sure whether we should call them genes or transcripts. We intend to use these counts as input for DESeq.
However, we are stuck at this counting step.

**dpryan** · 12-11-2013, 01:17 AM

If a gene has multiple transcripts then that won't work. If you ended up squishing all of the transcripts of a gene together into a single "union gene model" then that likely will work. When in doubt, (1) look at the data visually with IGV/a text editor and (2) run htseq-count with the -o option so you can see what happens to some reads that you think (from looking at things in IGV or a similar tool) should increase the count of a gene/transcript.
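
A toy illustration of why per-transcript counting produces zeros. It is a simplified sketch of union-style assignment (a read is counted only if it overlaps exactly one feature), ignoring strand, chromosome, and spliced alignments. `assign_read` and all coordinates are made up for illustration, and this is not htseq-count's actual code.

```python
def assign_read(read_start, read_end, features):
    """Assign a read to a feature, or report why it cannot be counted.

    features maps a feature name to a list of half-open (start, end) exons.
    """
    hits = {
        name
        for name, exons in features.items()
        if any(start < read_end and read_start < end for start, end in exons)
    }
    if not hits:
        return "no_feature"
    if len(hits) > 1:
        return "ambiguous"  # overlaps several features, so no count is added
    return next(iter(hits))


# Two transcripts of one gene that share their first exon.
transcripts = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(100, 200), (500, 600)],
}
# The same exons squished into a single union gene model.
union_gene = {"geneA": [(100, 200), (300, 400), (500, 600)]}

shared_exon_read = (120, 170)
per_transcript = assign_read(*shared_exon_read, transcripts)  # "ambiguous"
per_gene = assign_read(*shared_exon_read, union_gene)         # "geneA"
```

Only reads falling in an exon unique to one transcript, such as one at (310, 360) here, are counted per transcript, which is why most transcripts of multi-isoform genes can end up near zero.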

**Dinesh Heisnam** · 12-11-2013, 02:29 AM

Hi Ryan,
When we check the sorted SAM with "-s -c 1,1 sorted A.sam" we get an error.
So this suggests that the sorting is the problem.
Sorting the SAM file also takes a long time. Is there any alternative way to sort the SAM? Let me remind you that the SAM file size is ~120 GB.

**dpryan** · 12-11-2013, 02:40 AM

Yeah, if you get an error then likely something went wrong. Just stick to using BAM files (they're smaller anyway) and then just "samtools sort -n A.bam A.namesorted".

**Dinesh Heisnam** · 12-11-2013, 03:00 AM

We are trying samtools and will let you know.
Thanks.

**Dinesh Heisnam** · 12-12-2013, 01:25 AM

Hi Ryan,
We tried to sort the BAM file with samtools using the command 'samtools sort -n -m maxMem A.bam -0 sortA.bam'. However, the sorting has been running for more than 24 hours and is not complete yet. A number of BAM files are also being generated in the directory.

**dpryan** · 12-12-2013, 01:43 AM

Yeah, with 40 GB files that'll still take a while. There are some implementations of samtools that can do multithreaded sorting. Have a look through this thread for those. There's also biobambam (there's also a publication that I can't seem to find at the moment), which is supposed to be generally faster but I've never actually used.
