**seqret** · 02-23-2012, 08:30 AM

I assume you used Newbler, and that you had about 370 thousand reads? Did you check the 454NewblerMetrics.txt file and/or the 454ReadStatus.txt file to determine how many reads the assembler thought it used? I would guess that the assembly was very fragmented, so that many of the reads ended up in contigs that were too small to report. When doing transcriptome assemblies, Newbler has some rules about what gets reported as isotigs, contigs, or not reported at all. I don't remember them all off the top of my head.

Also, you did tell the assembler that this is a cDNA assembly project, correct?
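If it helps, the 454ReadStatus.txt check can be scripted. This is only a minimal Python sketch: it assumes the file is tab-delimited with the read accession in the first column and the read status in the second, and that the header row starts with "Accno". The function name `tally_read_status` is made up for illustration; check the layout of your own file before trusting the numbers.

```python
from collections import Counter


def tally_read_status(lines):
    """Count how many reads fall under each status.

    Assumes tab-delimited rows: accession, status, then contig columns.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank/short lines and the header row (assumed to start with "Accno").
        if len(fields) < 2 or fields[0] == "Accno":
            continue
        counts[fields[1]] += 1
    return counts
```

Calling `tally_read_status(open("454ReadStatus.txt"))` returns a `Counter` keyed by whatever status strings appear in the file, so you can compare the count of assembled reads against the total number of reads.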

**westerman** · 02-23-2012, 09:50 AM

@seqret ... note his first line. He used Trinity, not newbler. Then he used Velvet and MIRA.

Originally posted by cerebralrust

I assembled plant transcriptome 454 data (non normalised) using trinity

I have been thinking about this problem. Hard to tell without looking at the data. However it is possible that Trinity, Velvet and MIRA are not up to the task. If you are recommending using Newbler then I heartily agree with that idea.

**kmcarr** · 02-23-2012, 02:37 PM

I'm wondering if the problem is not with the assembly but with the mapping. Is bwa the best tool to use here, or were the options used appropriate? (I'm asking because I'm not that familiar with bwa.) Frankly, if I had a set of contigs (putative transcripts) and wanted to map raw 454 reads back to them just to count I would use blat.

**lh3** · 02-23-2012, 09:42 PM

For 454, I recommend bwasw, bowtie2, smalt or tmap. Blat is a bit slow and does not output SAM.
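Once any of these mappers has produced SAM, the per-contig counting is easy to script. A minimal Python sketch follows, assuming plain-text SAM input (not BAM); the function name `count_mapped_reads` is hypothetical. It relies only on the standard SAM columns (FLAG in column 2, reference name in column 3) and the standard flag bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). Note that it counts alignment records, so a read that a mapper reports on several primary lines would be counted more than once.

```python
from collections import Counter


def count_mapped_reads(sam_lines):
    """Return (Counter of primary alignments per reference, unmapped count)."""
    per_ref = Counter()
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            unmapped += 1
            continue
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800) record
            continue
        per_ref[fields[2]] += 1
    return per_ref, unmapped
```

The fraction of reads mapping back is then `sum(per_ref.values()) / (sum(per_ref.values()) + unmapped)`, which is the number in question in this thread.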

**Jeremy** · 02-23-2012, 10:19 PM

I would recommend Newbler since it has been specifically designed for 454 data.
I am assuming that by mapping the reads back you are trying to get read counts per contig/isotig/isogroup, yes?

If you use Newbler you can get read counts per contig from the 454ReadStatus.txt file that is produced when you perform a transcriptome assembly. Just do a grep for 'Assembled' and count the number of times each contig appears. If you have different samples in different lanes, you can do the appropriate grep to subset them as well. This file lists the 3' and 5' match of each read, so you effectively count each read twice. I don't think that is a problem, since the reads are generally pretty long to begin with. This method means that some contigs may have a zero or low read count, but it does count every read, so that should not be a problem after you sum the read counts of contigs to form read counts per isotig.
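The grep-and-count approach above can be sketched in Python roughly like this. It is an illustration under assumptions, not a tested pipeline: it assumes tab-delimited rows with the status in column 2, the 5' contig in column 3 and the 3' contig in column 6, and the function name `count_reads_per_contig` is made up. As described, each assembled read contributes two counts, one for each end.

```python
from collections import Counter


def count_reads_per_contig(lines, status="Assembled"):
    """Count contig appearances among reads with the given status.

    Each read adds one count for its 5' contig and one for its 3' contig,
    so a read lying entirely within one contig counts twice for it.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Rows with fewer than six columns carry no contig placement; the
        # header row is skipped because its status field is not `status`.
        if len(fields) < 6 or fields[1] != status:
            continue
        counts[fields[2]] += 1  # 5' contig (assumed column 3)
        counts[fields[5]] += 1  # 3' contig (assumed column 6)
    return counts
```

To split by sample, filter the lines on the read accession prefix before passing them in. Summing these contig counts into isotig counts needs the contig-to-isotig layout from the assembly output, which is not shown here.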

Alternatively, you can grep 'Assembled' to make a subset of the assembled reads and then map them back to your contigs using GSMapper. I recommend only using reads with the Assembled status to minimise false mapping. I use mapping for SNP discovery also, so I set -ais 1, which means that the mapped read needs to be a very good match.
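Pulling out the accessions of the assembled reads for that subset could look like the sketch below. Same assumptions as before: tab-delimited rows with the accession in column 1 and the status in column 2; `assembled_read_ids` is a hypothetical name.

```python
def assembled_read_ids(lines, status="Assembled"):
    """Return the accessions of reads whose status equals `status`, in file order."""
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == status:
            ids.append(fields[0])
    return ids
```

Writing the returned list one accession per line gives an ID file. How that file is then used to subset the reads depends on your tools and is not shown here.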

**cerebralrust** · 02-25-2012, 01:27 AM

Thank you for all your suggestions, members!

@seqret: As Rick pointed out, I've never used Newbler.

@Rick: Using Newbler is not an option, I guess, since it is not open source and we got the sequenced data from a collaborator in the US. Perhaps my only option is to standardise MIRA parameters to improve the assembly?

@kmcarr: I was wondering about the mapping also. I will try mapping with bwasw and bowtie2 on the suggestion of lh3, since I also need the results in SAM format.

@lh3: I will try all, compare and pick the best one.

@Jeremy: As I mentioned before, Newbler is not an option since it is not open source and I'm a poor undergraduate student. But I will keep your suggestions in mind for the future.

I suppose I'm left with the option of using MIRA with various combinations of parameters to get the best assembly.

In case it is of help to anyone: I should not have used Trinity for this data, considering the opinion of Brian J. Haas, one of the key developers of Trinity:

"Ultimately, Trinity might not be the best tool for assembling 454 data, since coverage won't be anywhere near what is expected from Illumina in most cases, and Trinity exploits the high coverage data as part of reconstructing transcripts. The current version of Newbler is supposed to work especially well for 454 transcriptome data, so I encourage you to give that a try if you haven't already."

**kmcarr** · 02-25-2012, 07:56 AM

Originally posted by cerebralrust

@Jeremy: As I mentioned before, Newbler is not an option since it is not open source and I'm a poor undergraduate student. But I will keep your suggestions in mind for the future.

Newbler may be proprietary but proprietary != $. You can obtain Newbler free of charge by completing the software request at this webpage. Note: I'm not sure if there are any restrictions for non-USA distribution.

**Jeremy** · 02-26-2012, 11:25 PM

Once you do get Newbler, you should use the .sff file(s) for assembly and mapping. This file has the quality scores as well as the fasta sequence so it will produce much better results than just a .txt of the sequence.

Too few reads mapping back to contigs