**mastal** · 09-01-2010, 03:17 PM

Hi,

Are you using the free version of Novoalign?

I found that the free version of Novoalign was comparable to Maq for speed. The first alignment I tried ran for about 6 days, and hadn't quite finished after all that time, so your results seem about right.

The licensed version supports multithreading, but I haven't tried it.

Maria

**zee** · 09-01-2010, 06:55 PM

Hi Kristen,

I would definitely recommend the unlocked version: it performs better and has more features available, including multithreading and quality calibration.

Novoalign is designed to be sensitive and works best with good-quality data. I'm not sure what the quality of your data is, but it would help to know the average per-base quality along the read. If your data is poor quality, it would be good to check it with a tool like FastQC and remove erroneous read pairs in preprocessing.
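
For anyone who wants a quick look at per-base quality without a dedicated tool, here is a minimal Python sketch. It assumes plain four-line FASTQ records and a Phred+33 quality encoding; the `offset` parameter and the function name are illustrative assumptions, and some older Illumina pipelines used an offset of 64 instead, so check your own files first.

```python
def mean_quality_per_position(fastq_lines, offset=33):
    """Return the mean Phred score at each read position.

    Assumes strict 4-line FASTQ records (header, sequence, '+', qualities)
    and that `offset` matches the file's quality encoding (an assumption).
    """
    totals = []  # sum of Phred scores seen at each position
    counts = []  # number of reads that reach each position
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle.
    demo = ["@r1", "ACG", "+", "II5", "@r2", "AC", "+", "5!"]
    print(mean_quality_per_position(demo))
```

A mean that drops sharply toward the 3' end of the read is the usual sign that trimming or filtering is worth trying before alignment.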

If you would like to get more assistance and a license key please visit our website at www.novocraft.com and fill in a request form. We will be happy to assist you to obtain better performance.

Originally posted by Kristen

Hi,

I am a new member & am new to working with Novoalign. I am trying to align paired end data & was wondering if anyone knows approximately how long this typically takes?

My data is 25 million reads in each file, so total 50 million. It's a mosquito reference genome.

I've been running one file for about one week now, but it is still running, so I was wondering if there might be a problem.

The output SAM file is about 6 GB as of now, and the alignment efficiency is not high for our data. If anyone knows why this might be happening as well, I would appreciate your answers.

The computer I'm running novoalign on is a Mac server with 16 GB RAM & the processor is: 2 X 2.26 GHz Quad-Core Intel Xeon.

Again, I'd appreciate any help or links to websites that might help me figure out more.

Thanks, Kristen

**sparks** · 09-02-2010, 12:03 AM

Hi Kristen,

It should be a lot faster than that, as the mosquito genome isn't that large. Several things can affect performance.
First, get a trial license so that you have multithreading; you can request one via the web site www.novocraft.com. Even without this, though, 5 days is way too long.
Could you provide the commands that you used to build the index and to run novoalign? Things like setting the threshold too high may slow down the alignments. (If you're using the SAM report format, just the log written to stderr will do; if Native format, a head of the report.)
The alignment process may also slow down if the base calls have low qualities. You can filter out low-quality reads using the -l option (set to about 60% of read length) or, in the latest versions, by using the polyclonal filter (-p option).
You might also use a Unix command like top to see what CPU utilisation Novoalign is getting. If it's running single-threaded it should be 100%; if it's lower than this, then maybe some other processes are competing for resources. If you're using the multi-threaded version, CPU utilisation should be up around 800%.

Colin

**Kristen** · 09-02-2010, 08:26 AM

Yes, we're currently using the free version, version 2.06. We used FastQC to check the quality of our data. Our data is far above the 20 level. Doesn't this mean our quality should be high enough to work with novoalign?

Here are the commands used to build the index: novoindex -k 14 -s 1 m_index 2L.fa 2R.fa 3L.fa 3R.fa UNKN.fa X.fa Y_unplaced.fa

These are the command line arguments to run novoalign: novoalign -o SAM -f 1.fastq 2.fastq -d m_index > control.sam

Also, is there a way to set novoalign to allow higher mismatches for the reads? We're currently using the default 2; is there a way to set it to 5 or 10?

Thanks for all of your help & let me know if there's any other information you need. We are going to try it again after downloading the trial.

**sparks** · 09-02-2010, 04:19 PM

Hi Kristen,

OK, for novoindex you should try without setting the k and s options; the default values will give much better performance. With a genome the size of mosquito, most of the 14-mers in the index will not exist in the genome, and this reduces the efficiency of the algorithm. The default k and s are chosen so that each index entry has 5-20 references to the genome, and this gives good efficiency.
The 14/1 index will also be quite large, and you may have a problem with other processes competing for memory on your server, especially as you are running single-threaded.
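
To make the k-mer arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a genome of about 270 Mbp (a figure not given in this thread, used only for illustration), treats the step size s as "index every s-th position", and ignores everything about how the real index is stored; the function name is hypothetical.

```python
def refs_per_index_entry(genome_bp, k, s=1):
    """Average number of genome positions per possible k-mer.

    A simplification: positions sampled every `s` bases, spread evenly
    over all 4**k possible k-mers. Real genomes are far from uniform.
    """
    indexed_positions = genome_bp / s
    possible_kmers = 4 ** k
    return indexed_positions / possible_kmers


if __name__ == "__main__":
    assumed_genome_bp = 270_000_000  # assumption, not from the thread
    for k in (12, 14):
        avg = refs_per_index_entry(assumed_genome_bp, k, s=1)
        print(f"k={k}, s=1: about {avg:.1f} references per index entry")
```

Under that assumed genome size, k=12 gives about 16 references per entry, inside the 5-20 range, while k=14 gives about 1 on average, so a large share of the index entries end up empty.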

There was a problem in the Mac OS X version of Novoindex in choosing the default k and s; this was fixed in V2.06.00, so you should be OK, and the default should be around -k12 -s1.
With a default index I expect it should take about 10 hrs on a single thread and around 90 mins with multithreading.

Colin

**Kristen** · 09-03-2010, 10:18 AM

Thanks for all of your help. We are trying your recommendations.

Do you also happen to know if there is a way to allow higher mismatches for the reads? Are there any parameters that we can use to set this or is there a default already built in to the code? I am asking because the alignment efficiency is low; only about 10% of reads can be pair end aligned. For others, both of the ends cannot be aligned, or one will align but the other can't.

Thanks again for all of your help.

**zee** · 09-03-2010, 11:04 AM

Kristen,

That is interesting to know. I'm not sure of the source of your reads but I am guessing that you are trying to align slightly divergent genomes. Novoalign was designed primarily for resequencing and if there is very little similarity between your reads and the genome then it may not be the best tool to use. Could you provide more information about how these reads are related to the reference?

Another possible reason the reads do not map well is the presence of an adaptor sequence that the aligner does not know about. Novoalign can do three-prime and five-prime adaptor trimming of the read before it matches it to the genome. Have you checked for some sort of contamination of the library?
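
As an illustration of what three-prime adaptor trimming does, here is a toy Python sketch. It is not Novoalign's implementation: it uses exact matching only, and the adapter sequence in the example, the function name, and the `min_overlap` parameter are all assumptions chosen for the demonstration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove an adapter (or a prefix of it) from the 3' end of a read.

    Exact matching only; real trimmers tolerate mismatches and use
    base qualities. `min_overlap` guards against trimming on chance
    matches of just one or two bases.
    """
    # Case 1: the whole adapter occurs somewhere inside the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter.
    longest = min(len(adapter), len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # illustrative sequence, an assumption
    for read in ("ACGTAGATCGGAAGAGCTT", "ACGTTGCAAGATCGG", "ACGTACGT"):
        print(read, "->", trim_3prime_adapter(read, adapter))
```

When fragments are shorter than the read length, the untrimmed tail is adaptor rather than genome, which is why such reads fail to align until it is removed.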

The threshold parameter "-t" controls specificity; by default it is dynamic, with an upper bound of 250. Setting it to 250 will allow more mismatches on the read but could also lead to more repeat or low-scoring alignments. Try the -r All option to get a sense of all the possible locations a short read will map to.

**sparks** · 09-05-2010, 09:55 PM

Hi Kristen,

As Zee mentioned, Novoalign doesn't default to 2 mismatches; it defaults to an alignment score threshold that ranges from about 90 for a 32 bp read up to 250 for 75 bp reads.
That alignment score will allow up to 8 mismatches, or even more if they are at low-quality bases, so usually that isn't the cause of the 10% alignment rate. However, it is probably the reason the alignments are slow. Novoalign uses an iterative approach: it tries to align with no mismatches and then gradually increases the mismatch allowance until an alignment is found or it reaches an upper limit of around 8 mismatches, depending on read length and base qualities. If 90% aren't aligning, it means that at least 90% are getting to the final iteration, which is the slowest, and hence the long run time.
A couple of things you could try are:
1. Turn on quality calibration (just add option -k) and see if it increases the yield of alignments. It often helps if something went wrong with the sequencing run.
2. Turn on adapter trimming (-a). If by chance your fragments were shorter than the read length, you may have adapter on the reads. Trimming it off will improve alignment yield.
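
The iterative approach described above can be pictured with a toy Python sketch. This is only an illustration of why unalignable reads are the expensive ones: it scans a reference string by brute force with plain mismatch counts, whereas Novoalign works with an index and quality-aware scores. All names here are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def iterative_align(read, reference, max_mismatches=8):
    """Try 0 mismatches first, then relax one mismatch at a time.

    Returns (position, mismatches, passes). `passes` counts how many
    full scans were started, a stand-in for the work done per read.
    Position and mismatches are None when nothing aligns.
    """
    n = len(read)
    passes = 0
    for allowed in range(max_mismatches + 1):
        passes += 1
        for pos in range(len(reference) - n + 1):
            if hamming(read, reference[pos:pos + n]) <= allowed:
                return pos, allowed, passes
    return None, None, passes


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    print(iterative_align("TTGACC", ref))                     # exact match
    print(iterative_align("GGGGGG", ref, max_mismatches=2))   # never aligns
```

A read that matches exactly stops after the first pass, while a read that never aligns goes through every pass up to the limit, which mirrors the long run time when most reads fail to align.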

I also wonder if something is wrong with your reads, such as contamination or just a really bad run of the sequencer. Could you send me 10k reads taken from around the middle of the read file? Email to colin at novocraft .... com

Thanks, Colin

**ac422** · 07-14-2011, 05:15 AM

Bowtie

hi

Can anyone tell me how to input the sequence in bowtie? I don't know what to type on the command line to input the sequence.

Akash

**mastal** · 07-14-2011, 08:08 AM

Novoalign Alignment Time

Hi Akash,

Have a look at the bowtie website, the Manual and the Getting Started sections are very helpful.

Bowtie: An ultrafast, memory-efficient short read aligner

http://bowtie-bio.sourceforge.net

Best wishes,
Maria