Seqanswers Leaderboard Ad

**dpryan** · 03-19-2014, 10:54 AM

My guess would be that you have some orphan alignments in the BAM files (i.e., you have some reads marked as being paired-end but lacking a mate in the file). HTSeq-count has a new option ("-r pos") that allows you to input a file that's coordinate sorted. Also, there's featureCounts, which I don't think has ever required name-sorted input (it's also faster than htseq-count).

**mparida85** · 03-19-2014, 12:37 PM

Hi dpryan
Thanks for you reply. You may be right about the orphaned alignment. It still doesn't answer where does that read comes from. I couldn't find a read such as the above from my raw data file. Also as suggested I tried the following:
a) python ~/Software/HTSeq-0.6.1/scripts/htseq-count -f bam -t gene -s no -a 20 $CONTROL1_1 $ANNOTATION
Error occured when processing SAM input (record #109 in file /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED_QUERY_NAME.bam):
'pair_alignments' needs a sequence of paired-end alignments

b) python ~/Software/HTSeq-0.6.1/scripts/htseq-count -f bam -t gene -r pos -s no -a 20 $CONTROL1_1 $ANNOTATION
Error:Error occured when processing SAM input (record #22 in file /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam)
Sequence of paired-end alignments expected, but got single-end alignment.

Do you suggest removing those orphaned alignments from my analysis that have no read IDs? Or is this another issue?

Once again thanks a ton for your time and replies.

**dpryan** · 03-19-2014, 02:10 PM

What's the output if you run:

Code:

samtools view -cF 1 /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam

htseq-count generally deals well with the orphaned reads (it just issues a warning, since how these should be handled is a bit ambiguous in the specification), but I wonder if mixing paired and single-end data in one file leads to this sort of error (in fact, looking through the source code suggests that that's the only circumstance that can lead to this).

**mparida85** · 03-22-2014, 12:41 PM

samtools view -cF 1 /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam = 524336
Yes I merged both my PE and SE files together to create /Users/mparida/Sample_1_code15801B2R/1_code15801B2R_ATCACG_L008_SORTED.bam

**dpryan** · 03-23-2014, 01:43 AM

htseq-count doesn't seem to handle that case. Perhaps featureCounts will, I've never tried. Otherwise, just count the paired-end and single-end separately.

**vkartha** · 06-08-2014, 09:33 AM

I was trying to process some paired-end RNASeq data and attempted to use the latest version of htseq (0.6.1). For some reason, it doesn't make it past a certain record complaining about expecting paired-end alignments (as your error suggests)

.....
.....
.....
52700000 SAM alignment record pairs processed.
52800000 SAM alignment record pairs processed.
52900000 SAM alignment record pairs processed.
53000000 SAM alignment record pairs processed.
Error occured when processing SAM input (line 53063638):
'pair_alignments' needs a sequence of paired-end alignments
[Exception type: ValueError, raised in __init__.py:603]

What's annoying is that there are a little more than 53 million records in this alignment file which means it almost finished running. I don't see why htseq can't support reads with unmapped mates (singletons), as you proposed (I did get one warning while it was running for a particular instance where a mate was not found: "Warning: Read DGL9ZZQ1:503:C28YCACXX:1:1101:1213:73341 claims to have an aligned mate which could not be found in an adjacent line.").

Looking at the SAM file line which threw the error - I don't see anything especially wrong :

"samtools view accepted_hits.bam | sed '53063638q;d'

DGL9ZZQ1:503:C28YCACXX:5:2104:4502:43742 161 Scaffold9997 1534367 50 101M Scaffold9970 1335931 0 CTCTGTTTCAGTAAATTCACATTCCTTTGACAAGATCACCGTCCATTAAATGCAACACTACAGCCCTTCATGTCTGGATTAGCACTTGCATAGACTTCGTA CCCFFDFFHHHHHJGHIJJJJIJIIJJJJJJIJJJJIHJJJIJJIIJJJJIJJJIJJJJJJJJJJJJJJIJJJJJJHHFHHHFFFFFEEEECEDDDDDDDC AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:98A2 YT:Z:UU NH:i:1"

I have no idea how to get around this and have never come across this issue, having run htseq-count several times in the past

Any help would be greatly appreciated

Topics	Statistics	Last Post
A Closer Look at the Enigmatic Genomes of Oikopleura dioica by seqadmin Started by seqadmin, Today, 06:35 AM	0 responses 9 views 0 likes	Last Post by seqadmin Today, 06:35 AM
Advanced Epigenome Editing Platform Explores Gene Regulation Mechanisms by seqadmin Started by seqadmin, Yesterday, 02:46 PM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 02:46 PM
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 14 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, 05-06-2024, 07:17 AM	0 responses 18 views 0 likes	Last Post by seqadmin 05-06-2024, 07:17 AM

Seqanswers Leaderboard Ad

Announcement

Counting alignments

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News