**Simon Anders** · 10-08-2012, 11:43 PM

There are good reasons to align against the genome, not the transcriptome (see various previous threads here). Please don't deviate from the accepted best practice unless you know what you are doing. In particular, using TopHat and htseq-count when aligning against a transcriptome does not make any sense. (Read up on what these tools do precisely to see why.)

**cpleis** · 10-10-2012, 09:50 AM

Simon,
Thank you for your response. I re-did the alignment with TopHat, using the complete nucleotide sequence (Gmax_109.fa) instead of the transcriptome. I then sorted the accepted_hits.bam file by read name (-n) using samtools and converted the sorted file into a SAM file. When I ran htseq-count on this file, I got this warning repeatedly:
Warning: Skipping read 'DBRHHJN1:263:C0REVACXX:5:1101:1886:109580', because chromosome 'scaffold_785', to which it has been aligned, did not appear in the GFF file.

I am still using the same htseq-count command as before and the same GFF3 file. I am not really sure how to tackle this error.

Thanks!

**Simon Anders** · 10-10-2012, 11:11 AM

So, does the scaffold appear in your GFF file?

Have you checked whether the chromosome names in the genome fasta file and in the gff file match?
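
One way to run that check is to compare the sequence names in the FASTA headers with the names in the first column of the GFF file. The following is only a minimal sketch: the helper names `fasta_names` and `gff_seqids` and the tiny in-memory demo data (including the chromosome name `Gm01`) are made up for illustration, and real files would be opened with `open(...)` instead.

```python
import io

def fasta_names(handle):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names

def gff_seqids(handle):
    """Collect column 1 (the sequence name) of every feature line in a GFF file."""
    seqids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # GFF3 files may embed sequence data after this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqids.add(line.split("\t", 1)[0])
    return seqids

# Hypothetical stand-ins for the genome FASTA and the GFF3 annotation.
demo_fasta = io.StringIO(">Gm01 some description\nACGT\n>scaffold_785\nACGT\n")
demo_gff = io.StringIO("##gff-version 3\nGm01\tdemo\tgene\t1\t4\t.\t+\t.\tID=g1\n")

fasta = fasta_names(demo_fasta)
gff = gff_seqids(demo_gff)

missing_from_gff = sorted(fasta - gff)  # sequences without any annotation
unknown_in_gff = sorted(gff - fasta)    # annotation for sequences not in the FASTA

print("In FASTA but not in GFF:", missing_from_gff)
print("In GFF but not in FASTA:", unknown_in_gff)
```

Names that appear only on the FASTA side are sequences without annotation. Names that appear only on the GFF side would point to a naming mismatch between the two files.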

**cpleis** · 10-10-2012, 12:45 PM

Simon,
It looks like my gff file does not contain information about scaffold_785, and that is why reads mapping to scaffold_785 are skipped for counting. I'm assuming that there is most likely no annotation information available for scaffold_785, so it is not present in the gff file. There are only a handful of scaffolds that are being skipped. My chromosome names are the same in both files.

I re-ran the analysis and got an output file with:
no_feature 2421689
ambiguous 345554
too_low_aQual 0
not_aligned 0
alignment_not_unique 19096489

I started with 44,613,416 reads so I'm assuming this output is "normal" (?).
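
To put these counters in proportion, they can be expressed as fractions of the starting read count. This is a rough sketch using only the numbers quoted above. It assumes every read is tallied exactly once, which may not hold exactly (reads skipped with a warning are not listed here, and multiply aligned reads may be tallied per alignment record), so the "assigned" figure is an estimate at best.

```python
total_reads = 44_613_416  # reads at the start, as reported in the post

# Special counters from the end of the htseq-count output quoted above.
special = {
    "no_feature": 2_421_689,
    "ambiguous": 345_554,
    "too_low_aQual": 0,
    "not_aligned": 0,
    "alignment_not_unique": 19_096_489,
}

not_counted = sum(special.values())
# Assumption: whatever is left over was assigned to a feature.
assigned = total_reads - not_counted

fractions = {name: n / total_reads for name, n in special.items()}

for name, n in special.items():
    print(f"{name:22s}{n:>12,d}{fractions[name]:>8.1%}")
print(f"{'assigned (estimate)':22s}{assigned:>12,d}{assigned / total_reads:>8.1%}")
```

Under that assumption, roughly half of the reads end up assigned to a feature, and most of the remainder falls under alignment_not_unique.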

Thanks for the help!

**Simon Anders** · 10-14-2012, 04:26 AM

Should be okay. Only the fraction of reads with ambiguous alignment is quite high. Transcribed sequence is usually not so repetitive that you would have so many reads mapping to more than one location. Maybe something went wrong in your assembly, and some contigs appear in several scaffolds.

**cpleis** · 10-15-2012, 09:06 AM

Would a large number of isoforms not account for the high number of ambiguous alignments? I was also going to try running htseq-count with a different mode. Currently I am using union, but I was also going to try running it with intersection-nonempty to see what kind of output I get.
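
For reference, the difference between the overlap modes can be illustrated with a toy model. This is a simplified sketch of the rules as described in the htseq-count documentation, not the tool's actual code: the function `assign` and the feature names `gene_A` and `gene_B` are hypothetical, and each read is represented as a list holding, for every position it covers, the set of features annotated there.

```python
def assign(per_position_sets, mode="union"):
    """Resolve one read to a feature name, 'ambiguous' or 'no_feature'."""
    if mode == "union":
        # Every feature touched anywhere along the read.
        candidates = set().union(*per_position_sets)
    elif mode == "intersection-strict":
        # Only features that cover every position of the read.
        candidates = set.intersection(*per_position_sets) if per_position_sets else set()
    elif mode == "intersection-nonempty":
        # Like strict, but positions without any feature are ignored.
        nonempty = [s for s in per_position_sets if s]
        candidates = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not candidates:
        return "no_feature"
    if len(candidates) > 1:
        return "ambiguous"
    return next(iter(candidates))

# Read lying inside gene_A, with one end also overlapping gene_B.
overlap_read = [{"gene_A"}, {"gene_A", "gene_B"}]
# Read that starts in gene_A and runs into unannotated sequence.
edge_read = [{"gene_A"}, set()]

modes = ("union", "intersection-strict", "intersection-nonempty")
results = {m: (assign(overlap_read, m), assign(edge_read, m)) for m in modes}
for m in modes:
    print(m, results[m])
```

In this toy model, switching from union to intersection-nonempty only changes how overlapping features are resolved. It covers the ambiguous and no_feature counters, not alignment_not_unique.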

If it is a problem with my alignment how would I troubleshoot that? Thanks!

**Simon Anders** · 10-15-2012, 09:25 AM

Did you align against the genome or the transcriptome? If the latter, using htseq-count is pointless, of course.

**cpleis** · 10-15-2012, 09:31 AM

I re-did the alignment against the genome.

Issue with htseq-count
