**sklages** · 09-02-2009, 11:39 PM

Originally posted by greigite View Post

I'm working in Consed with a hybrid Sanger/454 assembly that I generated using gsAssembler. I'm pretty familiar with consed and would like to use it to join contigs and analyze SNPs as I have done with Sanger-only assemblies. However, I'm running into some problems:

Welcome aboard

(1) When I try to view a read trace, consed calls "sff2scf" on Sanger reads as if they were 454 instead of reading the pre-existing scf file. This results in an error as there is no sff file for Sanger reads. This is causing problems when, for example, I want to extend or change the consensus sequence, since Consed requires this be done from the trace window.

In this case you should use your own script to catch the type of read requested by consed and react accordingly.
Or just write a simple script that copies your Sanger chromat to /tmp.
Consed resources you should consider:

Code:

consed.alwaysRunProgramToGetChromats
consed.programToRunToGetChromats
consed.uncompressedChromatDirectory
consed.programToRunToGetChromatsOf454Reads

The chromat_file is renamed following the Newbler convention of adding suffixes to reads based on the location of their mate pairs. For example, for a Sanger read named "ABCD.g1" the relevant lines in the ace file look like this:

DS CHROMAT_FILE: ABCD.g1.548-1.fm12429.pr12429 PHD_FILE: ABCD.g1.548-1.fm12429.pr12429.phd.1 TIME: Thu Jul 27 12:33:48 2000 CHEM: unknown DYE: unknown TEMPLATE: ABCD DIRECTION: rev.

Perhaps changing the chromat_file path in the ace file would help, but not if consed always calls "sff2scf".

That is a typical newbler "problem". I never understood how newbler could break reads and put parts of these reads in different locations.
That's a severe problem, especially for Sanger reads, as you lose all your read pair info.

Maybe you can change or copy your chromat file to ABCD.g1.548-1.fm12429.pr12429 to enable consed to open the file.

(2) Lengthy "unaligned" regions are present at the start and end of contigs. To my eye at least some of these regions look quite well aligned and frequently contain sequence overlapping with other contigs, which is necessary to manually join them using the "Compare Contigs" command. Why are these considered "unaligned" by newbler? And how can I use them to join contigs, since consed won't allow unaligned regions to be used in the "compare contigs" window?

Just another newbler thing ... you could manually extend the consensus ... but that's pretty annoying.

All these problems may vanish if the software suite gets updated sometime this year (probably by the end of the year ;-) )

... or not.

Anyone have some insight into these issues, or tools for hybrid Sanger/454 assemblies in general?

We use newbler for a quick overview of a de novo assembly/mapping, or as a reference assembly.

For routine work we go for MIRA3 (ESTs, metagenome, WGS) or Celera Assembler (large WGS projects). Both assemblers work very well with 454/Sanger hybrid data.

For finishing we use either Consed (large projects) or Staden's Gap4, depending on what we need to do with the data.

hth,
Sven

**maubp** · 09-03-2009, 01:39 AM

Originally posted by greigite View Post

(2) Lengthy "unaligned" regions are present at the start and end of contigs. To my eye at least some of these regions look quite well aligned and frequently contain sequence overlapping with other contigs, which is necessary to manually join them using the "Compare Contigs" command. Why are these considered "unaligned" by newbler? And how can I use them to join contigs, since consed won't allow unaligned regions to be used in the "compare contigs" window?

Originally posted by sklages View Post

Just another newbler thing ... you could manually extend the consensus ... but that's pretty annoying.

All these problems may vanish if the software suite gets updated sometime this year (probably by the end of the year ;-) )

... or not.

I've also seen this on Newbler 2.00.01 de novo assemblies of just 454 data (high coverage), but don't have an automated way of dealing with it yet.

**greigite** · 09-03-2009, 03:15 PM

Thanks very much for this information, sklages. I've been playing around with getting this to work but there is still an issue with displaying sanger chromats.

I wrote a small perl script to identify 454 vs Sanger chromats and redirect the chromat files to /tmp where Consed can find them (I can post this if there is interest). You have to set some consed parameters to find the script (chromat_redirect.pl):

Code:

consed.programToRunToGetChromats=chromat_redirect.pl
consed.alwaysRunProgramToGetChromats=last
consed.uncompressedChromatDirectory=/tmp
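For readers who want a starting point, a redirect helper of this kind could be sketched in Python as below (the script mentioned in this post is Perl and is not reproduced here). It rests on several assumptions that are not confirmed in the thread: that consed passes the requested chromat name as the first argument, that 454 accessions are exactly 14 upper-case alphanumeric characters, and that each Sanger chromat is stored under its original read name in a directory called `chromat_dir`. All function names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumption: 454 accessions look like FY3Z7SM02JNDKE (14 upper-case alphanumerics).
ACCESSION_454 = re.compile(r"^[A-Z0-9]{14}$")
# Newbler-style suffix as seen in this thread, e.g. ".548-1.fm12429.pr12429".
NEWBLER_SUFFIX = re.compile(r"\.\d+-\d+(?:\.[a-z]+\d+)*$")


def original_name(newbler_name):
    """Strip the Newbler suffix to recover the original read name."""
    return NEWBLER_SUFFIX.sub("", newbler_name)


def is_454_read(newbler_name):
    """Heuristic only: decide by the shape of the original read name."""
    return bool(ACCESSION_454.match(original_name(newbler_name)))


def stage_sanger_chromat(newbler_name, source_dir, target_dir):
    """Copy the original chromat into target_dir under the Newbler name."""
    source = Path(source_dir) / original_name(newbler_name)
    target = Path(target_dir) / newbler_name
    shutil.copyfile(source, target)
    return target


def main(argv, source_dir="chromat_dir", target_dir="/tmp"):
    # Assumption: argv[1] is the chromat name consed is asking for.
    name = argv[1]
    if is_454_read(name):
        # Placeholder: hand off to whatever converts 454 reads at your site.
        return 1
    stage_sanger_chromat(name, source_dir, target_dir)
    return 0
```

To use it as a script, call `sys.exit(main(sys.argv))` at the bottom after importing `sys`. Note that the copied file keeps the Newbler name, so consed finds it under the name recorded in the ace file.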

The script works fine to display 454 chromats and to copy Sanger chromats to /tmp, but there is a new problem relating to discrepancies between the phd files created by newbler and the original Sanger chromat files:

Code:

ace file: 454Contigs.ace.1
Version 19.0 (090206)
Sorry--the chromatogram file /tmp/ABCD12783.b1.482-291.fm24208.to24208 has 10349 trace array points while the phd file ABCD12783.b1.482-291.fm24208.to24208.phd.1
was made from a chromatogram with 15592.  This means that someone 
overwrote the original chromatogram file.   Check the file dates on the 
chromatogram file and the phd file.  To correct this, I would suggest deleting the phd file and running the phredPhrap script again.  To prevent this from happening again, find out who/why the chromatogram was switched.  Sorry.

The source of the problem is this line in the phd file:

Code:

TRACE_ARRAY_MAX_INDEX

which is written in by newbler, and differs from the info in the original chromat file.
Possibly this is happening because I used phred to trim the Sanger reads before putting them into newbler, so now the original chromats are a different length (though if this is the case, it doesn't make sense that there are fewer trace points in the original chromat than in the phd created by newbler). I may try manually editing the trace point info in the newbler phd files to see if that helps. If not I think I may give up on fixing this issue.
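Before editing any phd files by hand, it may help to check which reads actually disagree. The Python sketch below reads the sample count from an SCF header (the 4-byte magic `.scf` followed by a big-endian 32-bit sample count) and the `TRACE_ARRAY_MAX_INDEX` value from the phd text. The comparison `samples == max_index + 1` assumes the trace array is indexed from 0, which is an assumption and not something confirmed in this thread; the function names are hypothetical.

```python
import re
import struct


def scf_sample_count(data):
    """Return the number of trace samples recorded in an SCF header."""
    if len(data) < 8:
        raise ValueError("too short to be an SCF file")
    magic, samples = struct.unpack(">4sI", data[:8])
    if magic != b".scf":
        raise ValueError("not an SCF file")
    return samples


def phd_trace_max_index(phd_text):
    """Return TRACE_ARRAY_MAX_INDEX from phd text, or None if absent."""
    match = re.search(r"^TRACE_ARRAY_MAX_INDEX:\s*(\d+)", phd_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def traces_match(scf_data, phd_text):
    """Assumption: a 0-based trace array, so points == max index + 1."""
    max_index = phd_trace_max_index(phd_text)
    if max_index is None:
        return False
    return scf_sample_count(scf_data) == max_index + 1
```

Passing `open(path, "rb").read(8)` for the chromat and the full text of the phd file is enough to list the mismatching reads.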

That is a typical newbler "problem". I never understood how newbler could break reads and put parts of these reads in different locations.
That's a severe problem, especially for Sanger reads, as you lose all your read pair info.

I actually don't think it loses the pair info when it breaks the read; at least it still indicates the location of the mate pair in the read name using the suffix ".pr". The large "unaligned" regions are annoying though. Hopefully MIRA will work better once I figure out how to use it!

**sklages** · 09-03-2009, 11:24 PM

Originally posted by greigite View Post

Code:

TRACE_ARRAY_MAX_INDEX

which is written in by newbler, and differs from the info in the original chromat file.
Possibly this is happening because I used phred to trim the Sanger reads before putting them into newbler, so now the original chromats are a different length (though if this is the case, it doesn't make sense that there are fewer trace points in the original chromat than in the phd created by newbler). I may try manually editing the trace point info in the newbler phd files to see if that helps. If not I think I may give up on fixing this issue.

.... I am not really sure if it is worth fixing it ... you will probably run into the next problem ...

I actually don't think it loses the pair info when it breaks the read; at least it still indicates the location of the mate pair in the read name using the suffix ".pr". The large "unaligned" regions are annoying though. Hopefully MIRA will work better once I figure out how to use it!

One part of a read where it should be, the other part unaligned at this position but aligned to another contig .. so my mate pair only consists of parts of the reads? Not very convincing

Depending on the kind of project you're trying to assemble, MIRA does a very good job.

It writes two alignment formats that feed the most popular "finishing packages":

a CAF file, which is easily converted to Staden's Gap4 or Gap5 format, and
an ACE file, which probably needs some fixing (as MIRA doesn't create any phd.ball files, you probably need to fix the TIME stamps in the ACE file).

cheers,
Sven

**greigite** · 09-04-2009, 11:58 AM

reason for unaligned regions

Originally posted by maubp View Post

I've also seen this on Newbler 2.00.01 de novo assemblies of just 454 data (high coverage), but don't have an automated way of dealing with it yet.

I think this issue with extensive "unaligned" regions appearing in consed happens due to Newbler placing the same read in multiple locations. The read is renamed to indicate the portion found in a particular contig, and only that part is used to construct a consensus sequence, but the entire read is shown in the alignment. In consed this appears as greyed out sequence to the left of the contig start or the right of the contig end.
For example, the read name FY3Z7SM02JNDKE.456-510.fm1369 in contig 1370 indicates that positions 456-510 of the read are considered part of contig 1370 and the rest of the read belongs in contig 1369. However, looking at contig 1370, this read extends to position -454 before the start of the contig.
Presumably these overlaps are corrected in the scaffolds output by Newbler, but it makes manual contig joins by consed cumbersome for 454 reads (you have to change the consensus via the chromatogram) and impossible for Sanger reads (chromatogram can't be displayed).
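The renaming described above can be taken apart programmatically, for example to list every read that Newbler has split across contigs. The Python sketch below parses only the patterns that appear in this thread (a `start-end` range followed by tags such as `fm`, `pr` or `to`); what each tag means is taken from the interpretation given here, not from Newbler documentation, and the names `NewblerReadName` and `parse_newbler_read_name` are hypothetical.

```python
import re
from typing import NamedTuple


class NewblerReadName(NamedTuple):
    base: str    # original read name
    start: int   # first read position used in this contig
    end: int     # last read position used in this contig
    tags: dict   # e.g. {"fm": 1369} or {"fm": 12429, "pr": 12429}


_NAME = re.compile(
    r"^(?P<base>.+?)\.(?P<start>\d+)-(?P<end>\d+)(?P<tags>(?:\.[a-z]+\d+)*)$"
)


def parse_newbler_read_name(name):
    """Split a renamed read into base name, range and tags; None if unrenamed."""
    match = _NAME.match(name)
    if match is None:
        return None
    tags = {
        key: int(value)
        for key, value in re.findall(r"\.([a-z]+)(\d+)", match.group("tags"))
    }
    return NewblerReadName(
        match.group("base"), int(match.group("start")), int(match.group("end")), tags
    )
```

For `FY3Z7SM02JNDKE.456-510.fm1369` this yields the base name `FY3Z7SM02JNDKE`, the range 456 to 510, and the tag `fm` with value 1369.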

**Broadie** · 08-23-2010, 11:47 AM

I have recently resolved this problem with the help of Jim Knight (the creator of Newbler).

The solution to getting your Sanger read traces to pop up in consed is quite simple. The first step is to use a version of Newbler later than 2.3 (such as the 4/19/2010 release).

Then, you must add the path to the chromat for each read to the headers of the fasta file.

For example, if your read name is ABCDEFG and your chromat is located in /user/bin/chromats/, then your fasta header must look like this:

>ABCDEFG scf=/user/bin/chromats/ABCDEFG.scf
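For more than a handful of reads, the headers are easier to rewrite with a script. The Python sketch below appends an `scf=` field in the format shown above; it assumes every chromat is named after its read with an `.scf` extension and sits in a single directory, which may not match your layout. The function name `add_scf_paths` is hypothetical.

```python
from pathlib import PurePosixPath


def add_scf_paths(fasta_lines, chromat_dir, extension=".scf"):
    """Yield fasta lines with ' scf=<path>' appended to each header."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">") and " scf=" not in line:
            fields = line[1:].split()
            if fields:
                # Assumption: the chromat file is named <read name><extension>.
                path = PurePosixPath(chromat_dir) / (fields[0] + extension)
                line = f"{line} scf={path}"
        yield line
```

Feeding it an open fasta file and writing the yielded lines (each followed by a newline) to a new file gives the input for Newbler. Headers that already carry an `scf=` field are left alone.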

**sklages** · 08-23-2010, 11:54 AM

Originally posted by Broadie View Post

[...]
>ABCDEFG scf=/user/bin/chromats/ABCDEFG.scf

We do store our chromatograms in tarballs ... the solution is obviously just a quick hack ;-)

cheers,
Sven

**Broadie** · 08-23-2010, 11:57 AM

duct tape has many uses

consed issues with newbler generated hybrid assembly
