GenoMax · 04-17-2015, 02:03 PM

You should align only the two contigs that are most similar to each other. If you look closely at the PDF you posted, you can probably make out which contigs pair up as most similar.

Again, this is not going to help you a lot since you have 1.4M contigs.

If you don't know perl/python, find a friend who can help parse the blat result file.
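For anyone reading along, parsing the blat result file could start from something like the sketch below. It assumes the standard 21-column tab-separated PSL layout that BLAT writes by default, and it picks the "best" target per query simply by the number of matching bases (matches plus repMatches); that scoring rule and the function names `parse_psl` and `best_hits` are illustrative choices, not something from this thread.

```python
def parse_psl(handle):
    """Yield one dict per PSL alignment line, skipping the optional header."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Data lines have 21 tab-separated columns and start with an integer.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield {
            "matches": int(fields[0]),
            "mismatches": int(fields[1]),
            "rep_matches": int(fields[2]),
            "q_name": fields[9],
            "q_size": int(fields[10]),
            "q_start": int(fields[11]),
            "q_end": int(fields[12]),
            "t_name": fields[13],
            "t_size": int(fields[14]),
        }


def best_hits(records):
    """Return {query: (target, matched_bases)} keeping the highest-scoring hit."""
    best = {}
    for rec in records:
        # Assumption: rank hits by matching bases (matches + repMatches).
        score = rec["matches"] + rec["rep_matches"]
        current = best.get(rec["q_name"])
        if current is None or score > current[1]:
            best[rec["q_name"]] = (rec["t_name"], score)
    return best
```

Because `parse_psl` is a generator, it reads the file line by line, which matters when the PSL file covers more than a million contigs. Typical use would be `with open("result.psl") as fh: pairs = best_hits(parse_psl(fh))`, where `result.psl` is a placeholder file name.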

milo0615 · 04-17-2015, 01:20 PM

Originally posted by GenoMax

For those larger contigs that are noted to be similar by blat, try using Mauve. You can get additional information from Mauve alignments: http://darlinglab.org/mauve/user-guide/files.html

Hi GenoMax,

I did use Mauve. Attached is a PDF with the Mauve results for the 10 largest contigs of both assemblies (gerbera_alignment.pdf). However, Mauve does not work when I align the 10 largest contigs from assembly 1 to the whole of assembly 2. I guess it is mostly designed for bacterial genomes.

Attached Files

gerbera_alignment.pdf (258.1 KB, 4 views)

GenoMax · 04-17-2015, 11:31 AM

For those larger contigs that are noted to be similar by blat, try using Mauve. You can get additional information from Mauve alignments: http://darlinglab.org/mauve/user-guide/files.html

milo0615 · 04-17-2015, 11:06 AM

Originally posted by GenoMax

@Milo: With 1.4M+ input sequences those viewer programs are not going to be useful (and your results are not in blast format either).

You can get a coverage estimate by using Brian's suggestion in this thread: http://seqanswers.com/forums/showthread.php?t=44035 You will have to use the raw data for this. This suggestion is not directly related to the question you asked, but may be worthwhile to do, since you are working with an unknown genome.

Hi GenoMax,

I took the 10 longest contigs from each assembly (>8500 bp with a max length of 120 kb for assembly 1 and 104 kb for assembly 2) and aligned them to each other with BLAT. They seem to be pretty similar, but I would like more in-depth information about their differences. I am going to try what you suggested.

Thank you for your help and please let me know if you have any other ideas.

Thank you,

-Emilio

GenoMax · 04-16-2015, 05:15 PM

@Milo: With 1.4M+ input sequences those viewer programs are not going to be useful (and your results are not in blast format either).

You can get a coverage estimate by using Brian's suggestion in this thread: http://seqanswers.com/forums/showthread.php?t=44035 You will have to use the raw data for this. This suggestion is not directly related to the question you asked, but may be worthwhile to do, since you are working with an unknown genome.

GenoMax · 04-16-2015, 05:10 PM

Perhaps you should concentrate on the largest contigs first and see if they are related. That would make your search space smaller. What is the size range of the largest contigs? Are there any that are 10 kb and above?
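Pulling out the largest contigs can be done without any extra libraries. The sketch below is one way to do it, assuming the assembly is a plain (uncompressed) FASTA file; the names `read_fasta` and `longest_contigs` and the `min_len` parameter are made up for illustration.

```python
import heapq


def read_fasta(handle):
    """Yield (name, sequence) tuples from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the contig name.
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_contigs(handle, n=10, min_len=0):
    """Return the n longest contigs of at least min_len bases, longest first."""
    candidates = (rec for rec in read_fasta(handle) if len(rec[1]) >= min_len)
    # heapq.nlargest keeps only n records in memory while streaming the file.
    return heapq.nlargest(n, candidates, key=lambda rec: len(rec[1]))
```

With 1.4M contigs the streaming approach avoids loading the whole assembly at once. Something like `longest_contigs(open("assembly_a.fasta"), n=10, min_len=10000)` would answer the 10 kb question directly (`assembly_a.fasta` is a placeholder name).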

milo0615 · 04-16-2015, 05:06 PM

Originally posted by GenoMax

Looks like you are a ways away from having a real assembly. You likely will need some custom scripting to get a meaningful answer since I assume your blat result file (even in PSL format) is probably pretty large.

If the blat results were in blast format and you just wanted to visualize them then http://bioinformatics.oxfordjournals...31/8/1305.full or http://www.biomedcentral.com/1471-2105/15/128 would have been useful.

Do you have a related reference genome available? What is the expected genome size for your samples?

Hi GenoMax,

I do not have a related reference genome. Therefore, I aligned both assemblies against each other. My species is a diploid plant with a 2C DNA value estimated at 5.1 pg (about 5.0 Gb).

I am going to try the ones that you suggested. Is there a better way of comparing both assemblies? How can I get the percentage of alignment coverage? Please let me know.

Thank you,

-Milo

GenoMax · 04-16-2015, 04:49 PM

Looks like you are a ways away from having a real assembly. You likely will need some custom scripting to get a meaningful answer since I assume your blat result file (even in PSL format) is probably pretty large.

If the blat results were in blast format and you just wanted to visualize them then http://bioinformatics.oxfordjournals...31/8/1305.full or http://www.biomedcentral.com/1471-2105/15/128 would have been useful.

Do you have a related reference genome available? What is the expected genome size for your samples?
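As a starting point for the custom scripting mentioned above, the following sketch computes, for each query contig in a PSL file, the percentage of its bases covered by at least one aligned block. It assumes the standard 21-column PSL layout, in which block starts on the minus strand are given in reverse-complemented query coordinates; the function names are hypothetical, and queries with no hit at all do not appear in the PSL file, so they get no entry in the result.

```python
from collections import defaultdict


def query_blocks(fields):
    """Yield (start, end) of each aligned block in plus-strand query coordinates."""
    strand = fields[8]
    q_size = int(fields[10])
    sizes = [int(x) for x in fields[18].split(",") if x]
    starts = [int(x) for x in fields[19].split(",") if x]
    for size, start in zip(sizes, starts):
        if strand[0] == "-":
            # Minus-strand qStarts count from the end of the query.
            start = q_size - (start + size)
        yield start, start + size


def merged_length(intervals):
    """Total length of the union of half-open intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def query_coverage(handle):
    """Return {query_name: percent of query bases covered by any alignment}."""
    blocks = defaultdict(list)
    sizes = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and any malformed lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        sizes[fields[9]] = int(fields[10])
        blocks[fields[9]].extend(query_blocks(fields))
    return {
        q: 100.0 * merged_length(iv) / sizes[q]
        for q, iv in blocks.items()
        if sizes[q] > 0
    }
```

Merging overlapping blocks before summing keeps a region that hits several contigs from being counted more than once. This version holds all blocks in memory, so for a very large PSL file it may be worth sorting the file by query name first and processing one query at a time.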

milo0615 · 04-16-2015, 04:11 PM

Originally posted by GenoMax

How big are these genomes? How many contigs are there in each?

Hi GenoMax,

Thank you for replying.

Assembly a is 444.5 MB and contains 1.4M contigs
Assembly b is 526.1 MB and contains 1.6M contigs

Please advise.

Thank you,

-Milo

GenoMax · 04-16-2015, 03:09 PM

How big are these genomes? How many contigs are there in each?

Comparative Genomics - BLAT