**pmiguel** · 01-13-2011, 10:52 AM

Originally posted by FelipeAd

More specifically, I initially used blast2go (blastx) and then ran a Fisher's exact test on my two libraries of interest. I do not know whether this was the wrong part, but I will try a blastn of one against the other.

Another issue that I did not mention is that I used very strict assembly parameters. Namely, I set the minimum overlap length to 90 and the minimum overlap identity to 95.

Maybe this gives highly specific contigs (???) for each library?
I mean, could the fact that only a small part of the genome is highly expressed, combined with a requirement for strict alignment, lead to this problem?

Okay, non-mammal sequence generally does not get very comprehensive annotation from blast2go. It gets some, but it is limited. Further, your assembly parameters may have resulted in a lower percentage of your reads being assembled into contigs. So, in effect, you may be looking at a tiny percentage of your total sequence, and it doesn't happen to match.

Of course that does not rule out your sample getting mixed up with something else in the sequencing core. But blastn will tell you that.

--
Phillip

**pmiguel** · 01-14-2011, 04:06 AM

Originally posted by FelipeAd

This is my next step. I am in the process of re-running all the libraries as one. This is why I started another thread asking how to find which reads comprise which contigs. I could not find this in the output files, though I did not check the .ace files.
I suppose it is better also to have each contig in a different .ace file, is that right?

You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior with an RPKM approach or similar. Otherwise, I would advise that you turn it off.

.ace is pretty easy to parse. (You can find the specs at phrap.org -- Roche follows them fairly well.)

Even without doing individual .ace files you can find which reads belong to which contigs with

egrep '^(CO|RD) '

Not that you want to. But the above shows the record structure of .ace is easy to parse.
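To make the point about the record structure concrete, here is a minimal Python sketch that collects read names per contig. It assumes the usual phrap-style layout, where a `CO <contig name> ...` line opens a contig and each `RD <read name> ...` line that follows belongs to that contig. The function name `reads_by_contig` is made up for illustration, and free-text tag blocks that happen to start with `CO ` or `RD ` are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reads_by_contig(ace_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each contig name to the read names listed under it.

    Assumes 'CO <name> ...' starts a contig and every following
    'RD <name> ...' line belongs to the most recent contig.
    """
    contigs: Dict[str, List[str]] = defaultdict(list)
    current = None
    for line in ace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and bare sequence lines
        if fields[0] == "CO":
            current = fields[1]
            contigs[current]  # register the contig even if it has no reads
        elif fields[0] == "RD" and current is not None:
            contigs[current].append(fields[1])
    return dict(contigs)
```

With a real file this could be called as `reads_by_contig(open("454Contigs.ace"))`, assuming the assembly was run with .ace output enabled. A read name that shows up under more than one contig is one that was split.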

--
Phillip

**kmcarr** · 01-14-2011, 06:18 AM

Originally posted by pmiguel

You probably want to use the parameter that forces gsAssembler to put a read into no more than one contig. Its default behavior is to rip reads into parts and assemble those parts into different contigs. You could compensate for the ripping behavior with an RPKM approach or similar. Otherwise, I would advise that you turn it off.

If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.

**pmiguel** · 01-14-2011, 06:39 AM

Originally posted by kmcarr

If you are using the cDNA assembly mode of gsAssembler I'm pretty sure that this option is ignored. That is to say it will always allow reads to be split across multiple contigs. This is fundamental to the model that reads may cross exon junctions.

I am talking about gsAssembler, not gsMapper.

The switch works, even in combination with -cdna.

--
Phillip

**kmcarr** · 01-14-2011, 07:09 AM

Originally posted by pmiguel

I am talking about gsAssembler, not gsMapper.

The switch works, even in combination with -cdna.

--
Phillip

I am talking about gsAssembler too. I have tested using the -rip option in a -cdna project and can confirm by checking the 454ReadStatus.txt file that there are reads split between contigs.
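A check like this can be scripted. The sketch below is only an illustration and rests on an assumption about the file layout: that 454ReadStatus.txt is tab-delimited with a header line, and that assembled reads carry eight columns (accession, status, 5' contig, 5' position, 5' strand, 3' contig, 3' position, 3' strand). The function name `split_reads` is hypothetical.

```python
import csv
from typing import Iterable, List


def split_reads(status_lines: Iterable[str]) -> List[str]:
    """Return accessions whose 5' and 3' ends sit in different contigs.

    Assumed columns: accno, status, 5' contig, 5' pos, 5' strand,
    3' contig, 3' pos, 3' strand (tab-delimited, one header line).
    """
    reader = csv.reader(status_lines, delimiter="\t")
    next(reader, None)  # skip the header line
    split = []
    for row in reader:
        if len(row) < 8:
            continue  # unassembled reads have no contig columns
        accno, contig5, contig3 = row[0], row[2], row[5]
        if contig5 and contig3 and contig5 != contig3:
            split.append(accno)
    return split
```

Called on the open file, e.g. `split_reads(open("454ReadStatus.txt"))`, a non-empty result would indicate that reads were split between contigs despite the option.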

**dschika** · 01-14-2011, 07:36 AM

In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
(Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)

**kmcarr** · 01-14-2011, 07:43 AM

Originally posted by dschika

In Newbler version 2.5, the -rip option is no longer available with the -cdna option.
(Newbler says: Warning: The -rip option has no effect for cDNA assembly projects.)

Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got the warning.

I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.

**pmiguel** · 01-14-2011, 08:27 AM

Complaint about Newbler intra-read ripping.

Originally posted by kmcarr

Yes, I was just testing my newly installed 2.5.3, and when I tried to run a -cdna project with the -rip option I got the warning.

I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.

Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

But why is it desirable to ignore the information implicit in a read as to what sequence is directly juxtaposed next to another with this ripping behavior?

--
Phillip

**kmcarr** · 01-14-2011, 08:59 AM

Originally posted by pmiguel

Okay, I guess I'm being forced to accept the Newbler assembly model unless I want to switch to another assembler.

But why is it desirable to ignore the information implicit in a read as to what sequence is directly juxtaposed next to another with this ripping behavior?

--
Phillip

I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.

**pmiguel** · 01-18-2011, 06:06 AM

Originally posted by kmcarr

I would counter that the assembler does not ignore it. It uses this information as it is constructing isotigs by finding valid path traversals across contigs. The valid traversals are defined (in part) by reads which link contig ends.

But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?

--
Phillip

**flxlex** · 01-19-2011, 02:28 AM

Originally posted by pmiguel

But that would not be necessary if the assembler had not torn the read apart in the first place. What is the value added by tearing?

The actual tearing does not happen until the contigs are built. Newbler creates a graph of all read alignments, with the contigs as nodes, and reads that go from one to the next as edges. Repeats are one reason for the fact that a genome assembles into a graph in the first place.

In a way, creating contigs, i.e. tearing apart the graph and listing only the nodes, is a necessary evil, but the actual assembly is the whole graph. In contrast to other assemblers, Newbler chooses to tear apart within the reads instead of assigning each read to a single contig. The 'tearing' information can be used to find which contigs are neighbors of each other.

Transcriptome assembly is somewhat special, as each gene is expected to result in a small contig graph of its own. This graph is then traversed in order to create isotigs (transcript variants).
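As a toy illustration of the idea (not Newbler's actual algorithm), the sketch below treats contigs as nodes and read-supported links as directed edges, then lists every path from a start contig to an end contig. Each path plays the role of one isotig. The contig names, the graph, and the function `enumerate_paths` are all made up for the example.

```python
from typing import Dict, List


def enumerate_paths(graph: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """List every simple path from start to end in a small contig graph."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a contig within one path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical gene with a skippable middle exon: reads link c1->c2->c3
# and also c1->c3 directly.
toy_graph = {"c1": ["c2", "c3"], "c2": ["c3"], "c3": []}
print(enumerate_paths(toy_graph, "c1", "c3"))
```

In this toy case the two paths correspond to two transcript variants, one with the middle contig and one without it.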

Pardon the self-promotion, but I try to explain all of this here and here

**FelipeAd** · 01-19-2011, 06:37 AM

Originally posted by westerman

As for the assembly reducing 20,000,000 bases down to around 100,000 bases, I see nothing inherently wrong with that. It could mean that you simply had 200x coverage. Of course it could also indicate a problem. A better metric is "what percentage of my reads are found in the contigs."

A more detailed description of my 'metrics' for my four libraries is the following:
Bases: 20,000,000
contigs: 150
Bases in contigs: 100,000

Bases: 26,000,000
contigs: 220
Bases in contigs: 140,000

Bases: 25,000,000
contigs: 80
Bases in contigs: 40,000

Bases: 30,000,000
contigs: 120
Bases in contigs: 60,000

Again, I assume there is something wrong with my assembly, unless there is something that I did not take into consideration.
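For reference, the ratio of input bases to bases in contigs for the four libraries can be worked out as below. This is just the arithmetic behind the "200x" figure quoted above. It is an upper bound on average coverage, since it assumes every input base ended up in one of the counted contigs. The variable and function names are made up.

```python
# (input bases, bases in contigs) for the four libraries listed above
libraries = [
    (20_000_000, 100_000),
    (26_000_000, 140_000),
    (25_000_000, 40_000),
    (30_000_000, 60_000),
]


def fold_reduction(total_bases: int, contig_bases: int) -> float:
    """Input bases divided by bases in contigs."""
    return total_bases / contig_bases


ratios = [fold_reduction(total, in_contigs) for total, in_contigs in libraries]
for number, ratio in enumerate(ratios, start=1):
    print(f"library {number}: {ratio:.1f}x")
```

The ratios span roughly 186x to 625x, so the libraries differ by more than a factor of three in how much input sequence backs each assembled base.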

**dschika** · 01-19-2011, 06:52 AM

Originally posted by kmcarr

I believe this was also true in v2.3 but gsAssembler just silently ignored the -rip option.

v2.3 didn't ignore it. I started a thread about that a while ago...

FelipeAd:
What is the numAlignedReads or numAlignedBases in your 454NewblerMetrics.txt file (section consensusResults)?

**FelipeAd** · 01-19-2011, 07:02 AM

Originally posted by dschika

FelipeAd:
What is the numAlignedReads or numAlignedBases in your 454NewblerMetrics.txt file (section consensusResults)?

The number of aligned reads ranges from 75% to 90% across my libraries. But maybe I should not rely on this very much, since the output includes contigs as short as 2 nt. That is why I refer only to the number of bases that are included in 'large' contigs.

**Jeremy** · 01-19-2011, 09:15 PM

Since your samples all come from the same plant, an assembly using all of the samples together will give you the most information. This also has downstream benefits, as you will have only one set of isotigs to annotate using Blast2GO or similar.

I have done something very similar: I mapped the raw reads from each sample back against the contigs from the combined assembly to get read counts. Summing the read counts for each contig used in an isotig or isogroup gives you isotig and isogroup read counts. Since gsMapper maps each read only once, it should get around the problem of multiple contig assignment that you run into by just using the output file (although it potentially introduces another problem, where some reads will not be counted at all).
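The summing step might look like the sketch below. It assumes you already have per-contig read counts for one sample (for example, parsed from the mapping output) and a table of which contigs make up each isotig. Both inputs, and the function name `isotig_read_counts`, are hypothetical. Note that a contig shared by two isotigs contributes its full count to each.

```python
from typing import Dict, List


def isotig_read_counts(
    contig_counts: Dict[str, int],
    isotig_contigs: Dict[str, List[str]],
) -> Dict[str, int]:
    """Sum per-contig read counts over the contigs that form each isotig."""
    return {
        isotig: sum(contig_counts.get(contig, 0) for contig in contigs)
        for isotig, contigs in isotig_contigs.items()
    }


# Made-up counts for one sample and a made-up isotig layout
example_counts = {"contig00001": 120, "contig00002": 30, "contig00003": 75}
example_layout = {
    "isotig00001": ["contig00001", "contig00002", "contig00003"],
    "isotig00002": ["contig00001", "contig00003"],
}
print(isotig_read_counts(example_counts, example_layout))
```

The same function works for isogroups if the second argument maps each isogroup to its contigs instead.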

Also note that the 454Contigs.fna file has an error in the sequence, where it appends the previous contig's sequence for status=isotig contigs, as mentioned about halfway through this thread:

Detection of alternative splicing events from 454 output - SEQanswers

http://seqanswers.com/forums/showthread.php?t=4732

This thread is somewhat related to a question I have recently posted:

splitting 454 reads into kmers for diff expression - SEQanswers

http://seqanswers.com/forums/showthread.php?t=8956
