What command line options do you mean? stranded=yes is the default behaviour, is it not? I'm using the "-t exon" option and that's about it, really. The default strandedness and the default "overlap" mode both work for me...
-
Originally posted by apredeus: What command line options do you mean? stranded=yes is the default behaviour, is it not? I'm using the "-t exon" option and that's about it, really. The default strandedness and the default "overlap" mode both work for me...
Originally posted by gringer: The default options are fine for doing this.
-
Originally posted by dlepe: Since my protocol wasn't stranded, I should be losing half the counts with --stranded=yes, but as you can see this was not the case. I tried the same for some Illumina data I have access to and got this, which I think is alright.
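To put a number on the "losing half the counts" expectation, the outputs of two htseq-count runs on the same file (one with -s yes, one with -s no) can be compared directly. This is a hypothetical helper, not part of HTSeq; it assumes the usual two-column, tab-separated htseq-count output in which the special counters start with "__" (e.g. __no_feature). For an unstranded library the ratio should come out roughly 0.5; for a strand-specific library it should be near 1 or near 0, depending on which strand the protocol preserves.

```bash
# Hypothetical helper: sum the per-gene counts of an htseq-count output file,
# skipping the special summary counters whose names start with "__".
assigned_total() {
  awk -F '\t' '$1 !~ /^__/ { s += $2 } END { print s + 0 }' "$1"
}

# Hypothetical helper: ratio of assigned reads from a "-s yes" run (first file)
# to those from a "-s no" run (second file), printed with two decimals.
stranded_ratio() {
  local yes_total no_total
  yes_total=$(assigned_total "$1")
  no_total=$(assigned_total "$2")
  awk -v y="$yes_total" -v n="$no_total" \
    'BEGIN { if (n == 0) print "NA"; else printf "%.2f\n", y / n }'
}
```

Usage, with hypothetical file names: `stranded_ratio counts_stranded_yes.txt counts_stranded_no.txt`. The ratio is only approximate, because genes that overlap on opposite strands are counted differently in the two modes.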
-
Error: 'pair_alignments' needs a sequence of paired-end alignments
Hi Simon,
I'm new to HTSeq and encountered an error when analyzing our stranded paired-end RNA-seq data.
Here is the procedure of my analysis:
1. Use cutadapt to trim adapters and trim by quality
2. Use tophat2 to map to the mouse genome
3. Sort accepted_hits.bam by name with samtools
4. Count the reads with htseq-count, which reports this error:
'pair_alignments' needs a sequence of paired-end alignments.
I searched the forum: this error is reported when paired-end and single-end accepted_hits.bam files are combined into one input file for htseq-count. As far as I know, htseq-count handles orphan reads in TopHat's output .bam file well. Is it possible that cutadapt causes a problem in our case? Thanks.
Jason
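If the explanation found on the forum applies (paired-end and single-end records mixed in one file), a quick check is to count how many alignment records carry the "paired" FLAG bit (0x1) and how many do not. The helper below is a hypothetical awk sketch working on SAM text; with samtools alone, `samtools view -c -F 1 accepted_hits.bam` should report the number of records without that bit. A non-zero "single" count is only a hint, not a diagnosis.

```bash
# Hypothetical helper: count SAM records with and without FLAG bit 0x1 (paired).
# Reads a SAM file, or standard input when no argument is given.
count_paired_flags() {
  awk -F '\t' '
    /^@/ { next }                                  # skip header lines
    { if ($2 % 2 == 1) paired++; else single++ }   # lowest FLAG bit = read is paired
    END { printf "paired=%d single=%d\n", paired, single }
  ' "${1:--}"
}
```

Usage: `samtools view accepted_hits.bam | count_paired_flags`.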
-
Hi everyone,
sorry for sneaking into this thread, but I guess you can definitely help, and maybe it is an easy one (I am not a bioinformatician... yet I am trying to do my best!)
I am doing RNA-seq on human samples. I have run several samples as 2x75 bp reads on an Illumina HiSeq 2000, then processed the raw reads with TopHat, ran samtools to convert the BAM files into SAM files, sorted the SAM file and proceeded with HTSeq. The samtools flagstat results are all OK (excellent mapping and paired-mapping rates).
Here is my HTSeq command:
htseq-count -m intersection-nonempty -s yes -i gene_id file.sam annotation.gtf > HTSeq_counts.txt 2> HTSeq_sterr.txt
Although htseq-count produces the .txt file with all the Ensembl IDs and their counts, the stderr file is a huge (3-4 GB) text file full of warnings like this:
Warning: Read HWI-ST1144:606:H9U7WADXX:1:2208:15447:12819 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
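Rather than opening a multi-gigabyte stderr file, it can be summarised in one pass. The function below is a hypothetical helper (its name and output format are made up for illustration); it matches only the warning text quoted above and counts every other line as "other".

```bash
# Hypothetical helper: count the missing-mate warnings in an htseq-count stderr
# log, plus all remaining lines. Reads a file, or standard input if none given.
summarise_htseq_log() {
  awk '
    /claims to have an aligned mate which could not be found/ { missing++; next }
    { other++ }
    END { printf "missing_mate_warnings=%d other_lines=%d\n", missing, other }
  ' "${1:--}"
}
```

Usage: `summarise_htseq_log HTSeq_sterr.txt`.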
So I figured that maybe I had to sort the SAM file. I did it with this command, which I found in other threads:
sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
Then I ran HTSeq again (same command, sorted SAM file as input) and the stderr file came out as gigantic as before.
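The sort command above orders records by reference name (column 3) and position (column 4), so the two mates of a pair usually end up far apart. When htseq-count reads name-sorted paired-end input, it looks for each mate next to the other, which fits the "could not be found" warning. A hypothetical check, sketched in awk: group consecutive records that share a read name and count paired records (FLAG 0x1) whose mate is mapped (FLAG 0x8 not set) but that sit alone in their group. On a name-sorted file this should be zero or close to it.

```bash
# Hypothetical helper: count paired records whose mate is not on a neighbouring
# line (i.e. alone in a run of records sharing the same QNAME).
# Reads a SAM file, or standard input when no argument is given.
check_mate_adjacency() {
  awk -F '\t' '
    /^@/ { next }                        # skip header lines
    $1 != run_name {                     # a new read name starts a new run
      if (run_len == 1 && run_needs_mate) isolated++
      run_name = $1; run_len = 0; run_needs_mate = 0
    }
    {
      run_len++; total++
      # FLAG 0x1 (paired) set and 0x8 (mate unmapped) not set: mate expected nearby
      if ($2 % 2 == 1 && int($2 / 8) % 2 == 0) run_needs_mate = 1
    }
    END {
      if (run_len == 1 && run_needs_mate) isolated++
      printf "records=%d isolated_paired=%d\n", total, isolated
    }
  ' "${1:--}"
}
```

Usage: `check_mate_adjacency hits.sam.sorted`. A result where most paired records are isolated would be consistent with the warnings above.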
I really don't know how to proceed...
1) Is an error file like this acceptable?
2) Am I doing something terribly wrong that I am not aware of?
I appreciate any help...
thanks!!!
Manu
-
Originally posted by kmcarr: Are you positive that the method/kit used to prepare the library is not strand-specific? The Life Tech SOLiD™ Total RNA-Seq Kit does generate strand-specific libraries. How was your library prepared?
-
Originally posted by apredeus: flags 0/16 should be enough to tell the strand.
Did you try to visualize your BAM (or a small portion of it) in IGV or something similar? Make sure you see the picture you would expect.
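Following the flags 0/16 suggestion, a numeric complement to looking in IGV is to count read strands inside one gene whose strand you already know; counting the whole file would not help, since genes lie on both strands. This is a hypothetical awk sketch: the region arguments are placeholders you choose, the position filter uses only the leftmost mapping position (column 4), unmapped (0x4) and secondary (0x100) records are skipped, and only single-end reads or the first read of a pair are counted (0x80 skipped), because the two mates of a pair lie on opposite strands. For an unstranded library the two numbers should be similar; for a strand-specific one, one strand should dominate.

```bash
# Hypothetical helper: strand counts (FLAG bit 0x10) for reads starting inside
# one region. Arguments: SAM file, chromosome, start, end (1-based, inclusive).
count_strand_in_region() {
  awk -F '\t' -v chr="$2" -v s="$3" -v e="$4" '
    /^@/ { next }                                          # skip header lines
    $3 != chr || $4 + 0 < s + 0 || $4 + 0 > e + 0 { next } # outside the region
    int($2 / 4) % 2 == 1 { next }                          # 0x4: unmapped
    int($2 / 256) % 2 == 1 { next }                        # 0x100: secondary alignment
    int($2 / 128) % 2 == 1 { next }                        # 0x80: second read of a pair
    { if (int($2 / 16) % 2 == 1) rev++; else fwd++ }       # 0x10: reverse strand
    END { printf "forward=%d reverse=%d\n", fwd, rev }
  ' "$1"
}
```

Usage, with placeholder coordinates: `count_strand_in_region hits.sam chr1 1000000 1050000`.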
-
Originally posted by fanli: This sorts your alignment by genomic position. You want to sort by read name:
Thanks for your reply! In the meantime I had run samtools sort -n on my .bam file, used samtools view -h to create a .sam file, and run HTSeq again.
I guess it worked, as the resulting stderr file was much smaller and contains this:
100000 sam line pairs processed.
200000 sam line pairs processed.
300000 sam line pairs processed.
500000 sam line pairs processed.
600000 sam line pairs processed.
700000 sam line pairs processed.
800000 sam line pairs processed.
900000 sam line pairs processed.
1000000 sam line pairs processed.
1100000 sam line pairs processed.
1200000 sam line pairs processed.
1300000 sam line pairs processed.
1400000 sam line pairs processed.
1500000 sam line pairs processed.
1600000 sam line pairs processed.
1700000 sam line pairs processed.
1800000 sam line pairs processed.
1900000 sam line pairs processed.
2000000 sam line pairs processed.
2100000 sam line pairs processed.
2200000 sam line pairs processed.
2300000 sam line pairs processed.
2500000 sam line pairs processed.
[...]
31936941 sam line pairs processed.
So I guess this is correct, right?
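A log that contains only progress lines is a good sign. As a quick check, the hypothetical helper below reports the last "sam line pairs processed" count and how many lines are something else (warnings or errors). It recognises only the progress message format shown above.

```bash
# Hypothetical helper: last progress count in an htseq-count stderr log and the
# number of lines that are not progress messages.
# Reads a file, or standard input when no argument is given.
check_progress_log() {
  awk '
    / sam line pairs processed\.$/ { last = $1; next }   # progress message
    { other++ }                                          # anything else
    END {
      if (last == "") last = "none"
      printf "last_progress=%s non_progress_lines=%d\n", last, other
    }
  ' "${1:--}"
}
```

Usage: `check_progress_log HTSeq_sterr.txt`. A non-zero second number means there is something besides progress messages worth reading.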
So my question is: in practice, does it make any difference whether you sort the .bam and then convert it to .sam, or sort a .sam file with the command you suggested? I am too inexperienced to appreciate the difference!
Thanks again!!!
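On the SAM-versus-BAM question: what matters for htseq-count here is that records sharing a read name end up on neighbouring lines, and both routes can achieve that. The practical catch with a plain text sort is the header: sorting the whole file would reorder the "@" lines along with the alignments. The sketch below is one text-only way to name-sort a SAM file (a hypothetical helper, not necessarily the command suggested in the thread); it writes the header first, unchanged, and sorts alignment lines by QNAME in byte order. Unlike samtools sort -n, it does not rewrite the @HD SO: tag, so the header may still claim coordinate order.

```bash
# Hypothetical helper: name-sort a SAM file with plain text tools.
# Arguments: input SAM, output SAM.
sam_sort_by_name() {
  local in="$1" out="$2"
  {
    grep '^@' "$in"                      # header lines first, original order
    grep -v '^@' "$in" |
      LC_ALL=C sort -t "$(printf '\t')" -k 1,1   # alignments by QNAME, byte order
  } > "$out"
}
```

Usage: `sam_sort_by_name hits.sam hits.namesorted.sam`, then run htseq-count on hits.namesorted.sam.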