
**nilshomer** · 04-08-2010, 09:18 PM

Originally posted by jmartin

I'm trying to get BFAST working as an aligner to detect human contamination in a bacterial metagenomic sample (everything will be 100mer Illumina reads). I am using the Ensembl build 36 human genome + some additional novel regions from 2 other human genomes sequenced at the BGI. The total db size is ~3.0Gb, but it consists of 24 chromosomes that are VERY large, and then several thousand small sequences in addition to that. So it's kind of a 'lopsided' database.

I successfully ran 'bfast fasta2brg' on the file, but now for the 'bfast index' step I was using the '-d 1' parameter to reduce the memory footprint. From other threads I'd gotten the idea that using '-d 1' would probably keep my memory footprint down to ~8Gb. But all my blade jobs keep dying when I request only 8Gb of memory. What kind of memory can I expect my job to require?

On another matter, I'm using the masks listed in the bfast manual for 'illumina reads > 40bp'. Should those be good enough for me to align Illumina 100mers, or would I be better off defining new masks? My goal is to pick out human reads from among bacterial sequences. So I believe I can be fairly relaxed in my search criteria without fear of falsely identifying bacterial reads as human.

I would not expect more than 8GB to be required when creating split indexes ("-d 1"). Nevertheless, it looks like more is needed in your case. Make sure you use the multi-threaded parameter as well. Can you test with more memory?

As for the masks, the ones you are using are great for 100bp data.

**jmartin** · 04-12-2010, 08:42 AM

I was able to successfully index using 24Gb memory per blade job. At some point I may throttle down the memory to find the minimum I can get by with for my db, which may grow somewhat.
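
One way to find that minimum without repeated trial and error is to run the indexing job once with a generous allocation and record its peak memory. The following is only a minimal sketch, assuming a Linux or macOS node with Python 3; the function name `peak_rss_gb` is made up for illustration, and the command to run (for example your existing 'bfast index' invocation) is supplied by you, not assumed here.

```python
import resource
import subprocess
import sys


def peak_rss_gb(cmd):
    """Run cmd to completion and return (exit_code, peak_resident_memory_in_GB).

    ru_maxrss for RUSAGE_CHILDREN is the largest peak among all child
    processes this interpreter has waited for, so measure one job per
    Python session.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return proc.returncode, peak_bytes / 1024**3


# Hypothetical usage: pass the exact command you already submit to the blade.
# code, gb = peak_rss_gb(["your_indexing_command", "arg1", "arg2"])
# print(f"exit code {code}, peak memory {gb:.2f} GB")
```

Note that this reports resident memory only. Some schedulers enforce limits on virtual memory instead, so the number is a lower bound on what to request, and some headroom is advisable if the db is expected to grow.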



BFAST indexing memory requirements
