**GenoMax** · 07-11-2015, 01:12 PM

Having 80% of your data multi-map (when 93+% of the data is mapping) seems to indicate that there is some issue with either the data you started with or the assembly that was generated.

Was the quality of the sequences you started with good (reasonably clean FastQC results)? Is this data for an organism with no reference genome (or is a closely related genome available)? How did the assembly look (in terms of number of contigs, length of contigs, genes you expected to find)?

**Brian Bushnell** · 07-11-2015, 02:06 PM

Originally posted by GenoMax:

Having 80% of your data multi-map (when 93+% of the data is mapping) seems to indicate that there is some issue with either the data you started with or the assembly that was generated.

I think it's OK in this case, as he's mapping RNA-seq data to a transcriptome. I'm more concerned with the fact that almost 100% of the alignments are discordant, which indicates the reads don't go together, or pair ordering was broken.

First, you need to re-pair the reads, or perhaps even validate that they are from the same library. Alternatively, you could reprocess them from the raw reads. Be sure to process read 1 and read 2 at the same time to keep them together. In fact, it would help if you posted all the command lines you used for processing, up to and including mapping.
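As a rough illustration of what "validating that the reads go together" can mean, here is a minimal Python sketch that walks two FASTQ files in parallel and reports the first record where the read names disagree. It assumes plain four-line FASTQ records (optionally gzipped) and that mates share a name apart from an optional legacy `/1` or `/2` suffix; the function names are made up for this example and are not part of any tool mentioned in this thread.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0 or not line.strip():
                continue  # only header lines carry the read name
            # Drop the leading '@' and anything after the first whitespace.
            name = line[1:].split()[0]
            # Drop a legacy mate suffix so that 'read/1' and 'read/2' compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pair_mismatch(r1_path, r2_path):
    """Return (record_index, name_in_r1, name_in_r2) for the first record whose
    names differ, or None if both files list the same names in the same order.
    A name of None means that file ran out of records first."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

A result of `None` only says the names line up; it does not prove the two files contain different mates, so it is worth also confirming that the two paths really are distinct files.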

Secondly, for mapping RNA to a transcriptome, you should not use Tophat2, which is designed for spliced alignments to a genome; it's better to use Bowtie2 or something else.

EDIT: Looking at the statistics again, I think the mapping command was wrong, and the same file was specified twice. Note that the exact same number of read 1 was mapped as read 2, with the exact same number having multiple alignments, and the same number with mapq>20; that can't be a coincidence.
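To make that kind of slip harder to repeat, one option is to build the mapping command in a small wrapper that refuses to run when both mate arguments point at the same file. The Python sketch below assembles a paired-end Bowtie2 command using its standard `-x`, `-1`, `-2`, `-S` and `-p` options. The function name, the default thread count, and the file names in the usage note are hypothetical, and this is not the command that was actually run in this thread.

```python
import os


def bowtie2_paired_command(index_prefix, r1_path, r2_path, sam_out, threads=4):
    """Return an argv list for a paired-end Bowtie2 run.

    Raises ValueError if the read 1 and read 2 arguments refer to the same file.
    """
    same_path = os.path.abspath(r1_path) == os.path.abspath(r2_path)
    # samefile() also catches symlinks or hard links, but needs both files to exist.
    same_file = (
        os.path.exists(r1_path)
        and os.path.exists(r2_path)
        and os.path.samefile(r1_path, r2_path)
    )
    if same_path or same_file:
        raise ValueError(
            "read 1 and read 2 point at the same file: %s" % r1_path
        )
    return [
        "bowtie2",
        "-p", str(threads),   # number of alignment threads
        "-x", index_prefix,   # prefix of the bowtie2-build index
        "-1", r1_path,        # forward (read 1) file
        "-2", r2_path,        # reverse (read 2) file
        "-S", sam_out,        # SAM output file
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. For example, `bowtie2_paired_command("transcripts", "sample_R1.fq", "sample_R2.fq", "sample.sam")` builds the command, while passing `sample_R2.fq` for both mates raises an error before anything is mapped.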

**TLongi1** · 07-12-2015, 01:06 AM

Brian: Oh yes, very good observation! I completely missed that. I will check it right away. Thanks.

Max: The quality of the sequences was good. Yes, I sequenced the transcriptome of an ant species, and there are no genomes of closely related species available. That's why I had to do a de novo assembly. I agree that there seems to be a problem with the data, and I hope it is what Brian suggested (even though that would be a bit embarrassing).

**TLongi1** · 07-13-2015, 07:20 AM

Indeed, I accidentally used the reverse reads twice instead of one forward and one reverse file. Fixing that completely solved the problem. Thanks guys!

**GenoMax** · 07-13-2015, 07:24 AM

Ouch!

What did the new stats look like (compared to the ones posted above)?


Low hits in express
