How to convert genome.fa to chr*.fa

dpryan replied

07-26-2013, 12:22 AM
It's difficult to diagnose your problem when you use multiple pipes. Just run the enumerate command and save that to a file, then try the subsequent commands one at a time and then post which didn't work (probably sort-bed or bedops) and some of the input you're giving it.
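One way to do this is to run each stage on its own and keep its output in a file, so the first failing step is obvious. This is only a sketch: `run_stage` is a hypothetical helper, and the file names in the usage comments are placeholders rather than anything the Hotspot scripts define.

```shell
# Run one pipeline stage, save its stdout to a file, and report the result.
# Usage: run_stage OUTPUT_FILE COMMAND [ARGS...]
run_stage() {
    out=$1
    shift
    if "$@" > "$out"; then
        # Report how many lines the stage produced; 0 lines is a red flag.
        echo "ok: $* -> $out ($(wc -l < "$out" | tr -d ' ') lines)" >&2
    else
        echo "FAILED: $*" >&2
        return 1
    fi
}

# The pipeline from this thread, one stage at a time (placeholder file names):
# run_stage enumerated.txt ./enumerateUniquelyMappableSpace.pl 50 Genome genome.fa
# run_stage sorted.bed     sort-bed enumerated.txt
# run_stage merged.bed     bedops -m sorted.bed
```

If a stage reports 0 lines or fails, that stage's command line and a few lines of its input file are the things to post.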
krafiq replied

07-25-2013, 09:03 PM
I just tried building a bowtie index using the genome.fa file (without splitting it) and it gave me 4 .ebwt files. And then I used the script:

./enumerateUniquelyMappableSpace.pl 50 Genome | sort-bed - | bedops -m - > Genome.50.mappable_only.bed

but it gave the following error:

Failed to read 50
Warning: Could not find any reads in "-"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
dpryan replied

07-25-2013, 12:17 AM
Originally posted by krafiq:

EricHaugen: What's the "bowtie_index_prefix" in the 2nd option you gave me above?

It's the prefix for the output index files, so it can be anything you want. Normal examples would be "hg19", "mm9" and "mm10", for human and two mouse genome versions. When bowtie is later invoked to do alignments, this same prefix is given to it so it knows what to align things against.
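Since the prefix is what names the index files, a quick check before running the mappability script can confirm the index is where bowtie will look for it. A minimal sketch, assuming bowtie version 1 with a default `bowtie-build` run, which as far as I know writes six `.ebwt` files per prefix (builds with non-default options, or large indexes, may differ). `check_bowtie_index` is a hypothetical helper name.

```shell
# Check that the bowtie (v1) index files for a given prefix exist
# in the current directory. Returns non-zero if any are missing.
check_bowtie_index() {
    prefix=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "$prefix.$suffix.ebwt" ]; then
            echo "missing: $prefix.$suffix.ebwt" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use (the prefix "Genome" is just an example):
# bowtie-build genome.fa Genome && check_bowtie_index Genome
```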
dpryan replied

07-25-2013, 12:15 AM
Originally posted by krafiq:

dpryan: I'm sorry-could you please clarify a bit as to what exactly I should do?
Also, is there a way to get the source code for bowtie?

The source code for bowtie is available on its website here or here.

Regarding the remainder, enumerateUniquelyMappableSpace is just a perl script that executes a few other commands, some of which won't work for you because of how the script is structured. I already deleted the Hotspot code (I don't use it) so I can't immediately give you exact changes, but the gist is that you can just edit the code to have bowtie-build index genome.fa rather than a too-long list of scaffold.fa files. There may be a few other lines that will throw errors for similar reasons and you can likely use the same strategy. This all assumes that you know enough to edit a bit of code, of course.
krafiq replied

07-24-2013, 11:09 PM
EricHaugen: What's the "bowtie_index_prefix" in the 2nd option you gave me above?
krafiq replied

07-24-2013, 10:52 PM
dpryan: I'm sorry-could you please clarify a bit as to what exactly I should do?
Also, is there a way to get the source code for bowtie?
dpryan replied

07-24-2013, 01:13 AM
Well, if you read through the Perl scripts, it'll become pretty apparent that they only designed hotspot around human/mouse/etc. genomes (rather than your situation with scaffolds), so you're probably going to have to just edit the script. It's just trying to run bowtie-build, which will effectively concatenate everything together anyway (it looks like they normally use individual chromosome files so things can more easily be split to later run on a cluster). The script is pretty simple, so go ahead and change it to suit your needs.
krafiq replied

07-24-2013, 12:56 AM
dpryan: I had to split the genome.fa file to get the individual files in the first place to use the hotspot software. Should I still cat them? Won't that bring it back to genome.fa?
dpryan replied

07-24-2013, 12:52 AM
Try concatenating the files together first (you'll have to do it in a couple batches, since the command will be too long for "cat" too) and just use the multifasta file.
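The batching can be left to `find`, which splits a long file list into as many `cat` calls as needed so no single command line gets too long. A minimal sketch, assuming the scaffold files end in `.fa` and sit in one directory; `concat_fasta` and the directory name in the usage line are hypothetical.

```shell
# Concatenate every *.fa file under a directory into one multi-FASTA file.
# Usage: concat_fasta SCAFFOLD_DIR OUTPUT_FILE
# Write OUTPUT_FILE outside SCAFFOLD_DIR so find does not pick it up as input.
concat_fasta() {
    dir=$1
    out=$2
    # "-exec cat {} +" passes the files to cat in batches that fit within
    # the system's argument-length limit.
    find "$dir" -type f -name '*.fa' -exec cat {} + > "$out"
}

# Example: concat_fasta scaffolds genome.fa
```

The order of the scaffolds in the output is whatever order `find` returns, which should not matter for building an index.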
krafiq replied

07-24-2013, 12:48 AM
My enumerateUniquelyMappableSpace script is calling this line:
bowtie-build $chromosomeFiles $genome

The chromosome files variable in this case is a list of 30,000 file names. So when I run the script, it gives me the following error:

Argument list too long

is there a way around this?
sivasubramani replied

07-23-2013, 08:02 PM
The HHblits package includes a script that does what you want:

HHblits_src/scripts/splitfasta.pl
EricHaugen replied

07-23-2013, 07:23 PM
It looks like "bowtie-build" isn't in your PATH, so the shell couldn't find it.

Try adding a line near the top of "enumerateUniquelyMappableSpace" like:

export PATH=$PATH:/location/of/this/script/folder

Then it should be able to find bowtie-build, and the Perl script it calls later will be able to find your bowtie executable there also.
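If you would rather not hard-code the folder, the wrapper can work out its own location instead. A small sketch, assuming bowtie and bowtie-build sit in the same folder as the wrapper script; `add_to_path` is a hypothetical helper, not part of the Hotspot distribution.

```shell
# Append a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# Near the top of the wrapper script, add the script's own folder:
# add_to_path "$(cd "$(dirname "$0")" && pwd)"
# command -v bowtie-build   # prints the path if the shell can now find it
```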
krafiq replied

07-23-2013, 06:17 PM
Thanks all!!

EricHaugen: I'm trying option 1 for now. I'm trying to run the script again with bowtie and bowtie-build in the same folder as the script. But it's giving me the following error:
./enumerateUniquelyMappableSpace: line 30: bowtie-build: command not found

And then it goes on to give the following error multiple times:
Failed to find bowtie index file Genome.1.ebwt

Does anyone know why and what I should change?

Thanks!

Last edited by krafiq; 07-25-2013, 09:02 PM.
fengqi replied

07-23-2013, 04:30 PM
Did you try this?

samtools faidx genome.fasta chrX > chrX.fasta
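That command extracts one sequence per call. To split a whole multi-FASTA file in one pass, an awk-only alternative (a different technique from samtools, shown here as a sketch) is below. It assumes each sequence name is unique and contains no characters that are awkward in file names, such as `/`; `split_fasta` and the output directory are hypothetical names.

```shell
# Split a multi-FASTA file into one file per sequence, named after the
# first word of each header line (">chr1 some text" -> chr1.fa).
# Usage: split_fasta GENOME_FASTA OUTPUT_DIR
split_fasta() {
    fasta=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            # Close the previous file so thousands of scaffolds do not
            # exhaust the open-file limit.
            if (out != "") close(out)
            out = dir "/" substr($1, 2) ".fa"
        }
        out != "" { print > out }
    ' "$fasta"
}

# Example: split_fasta genome.fa chromosomes
```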
EricHaugen replied

07-22-2013, 04:30 PM
Two options:

1. Change "chr" to "scaffold" in the enumerateUniquelyMappableSpace bash wrapper script, to list the individual fasta files.

2. Just run the whole genome fasta file, after building a bowtie index, with:

enumerateUniquelyMappableSpace.pl read_length bowtie_index_prefix genome.fa | sort-bed - | bedops -m - > genome.read_length.mappable_only.bed

If "sort-bed" runs out of memory here, the BEDOPS suite includes a "bbms" script that can be used in place of sort-bed.
