**GenoMax** · 04-22-2017, 12:41 PM

If you are a proficient programmer, the source code for BWA is available on SourceForge. If you need technical help, you could post to the BWA mailing list.

**dpryan** · 04-22-2017, 02:05 PM

I thought BWA (like a lot of aligners) ends up concatenating the chromosomes together when indexing (with a bunch of Ns in between). Anyway, BWA at least used to occasionally produce alignments that extended beyond the chromosome bounds, so if you have an example that makes it crash, post it on GitHub as an issue.
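To make the concatenation idea concrete, here is a minimal sketch of how a position in a concatenated reference could be mapped back to a chromosome, and how an alignment that runs past a chromosome end could be detected. It illustrates the general technique only and is not taken from BWA's source; the function names, the 0-based coordinates, and the assumption of no padding between sequences are all mine.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(lengths):
    """Start offset of each sequence in the concatenation, plus the total length.

    Assumes sequences are joined back to back with no padding (an assumption).
    """
    return [0] + list(accumulate(lengths))


def locate(names, offsets, global_pos):
    """Map a 0-based position in the concatenation to (name, local position)."""
    if not 0 <= global_pos < offsets[-1]:
        raise ValueError("position outside the concatenated reference")
    idx = bisect_right(offsets, global_pos) - 1
    return names[idx], global_pos - offsets[idx]


def spans_boundary(offsets, start, length):
    """True if an alignment of `length` bases starting at `start` crosses
    the end of the sequence it starts in."""
    if not 0 <= start < offsets[-1]:
        raise ValueError("position outside the concatenated reference")
    idx = bisect_right(offsets, start) - 1
    return start + length > offsets[idx + 1]


# Hypothetical toy reference with three short sequences.
names = ["chrA", "chrB", "chrC"]
offsets = build_offsets([100, 50, 75])
print(locate(names, offsets, 120))      # position 120 falls in the second sequence
print(spans_boundary(offsets, 95, 10))  # starts in chrA, ends past its last base
```

A check like `spans_boundary` is one way downstream code could flag the out-of-bounds alignments mentioned above before trying to fetch reference bases for them.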

**Greeeeb** · 04-22-2017, 05:57 PM

Thanks!

I will go ahead and check the code.

The problem happens when our code is running, not BWA. Also, I tried to read the output files from the indexing step. The large ones are not in a simple text format. If they were, I would input small, tailored reference genomes to see what the indexing step outputs.

As I noted, our program can handle one chromosome without problems. So I merged the chromosomes of a human genome into one, but BWA exits without indexing the whole genome. The largest single-chromosome reference that BWA indexed successfully was about 160M bytes.

**GenoMax** · 04-23-2017, 04:29 AM

Not a direct answer, but if you are open to substituting the aligner, then BBMap from the BBMap suite is an option. It is written in pure Java, and the author of BBMap (Brian Bushnell) participates here regularly.

**Greeeeb** · 04-23-2017, 09:13 AM

Actually, it may not be possible to use something else. Since I am not very familiar with BWA, I am afraid I would not be able to preserve compatibility with the code if I replaced the BWA commands with BBMap. Figuring out how to read the reference the way BWA does is the safest way.

Thanks.

**Greeeeb** · 04-25-2017, 07:57 AM

I found a way to reconstruct the reference subsequence for an aligned read from the alignment information in the SAM file, using the "MD" field in the output. Here is an example:

"MD: A string which summarizes the mismatch positions between the aligned read and the reference genome.

Example:
MD:Z:8G61 indicates a single base pair mismatch. Specifically, the aligned read matches the first 8 bases of the reference, after which it fails to match a G in the reference sequence, followed by 61 exact matches to the reference.
"
Source: http://biobits.org/samtools_primer.html
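As a rough sketch of how that reconstruction could work: the MD string alone only names the mismatched and deleted reference bases, so the read sequence (SEQ) and the CIGAR string are needed as well. The function below is my own illustration, not code from BWA or samtools; its name is hypothetical, and it assumes forward-strand SEQ as stored in the SAM record and an alignment without `N` (reference skip) operations.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# One MD token: a match count, a deletion (^ plus bases), or one mismatched base.
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def reference_from_md(seq, cigar, md):
    """Rebuild the reference bases covered by one alignment.

    seq, cigar: SEQ and CIGAR columns of the SAM record.
    md: value of the MD tag without the 'MD:Z:' prefix, e.g. '8G61'.
    """
    # Step 1: keep only the read bases that are aligned to reference bases.
    aligned = []
    pos = 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            aligned.append(seq[pos:pos + length])
            pos += length
        elif op in "IS":
            pos += length  # inserted or soft-clipped: not in the reference
        # D, N, H and P consume no read bases
    aligned = "".join(aligned)

    # Step 2: walk the MD string, copying matches and substituting
    # the reference base wherever MD reports a mismatch or deletion.
    ref = []
    i = 0
    for match_len, deleted, mismatch in MD_RE.findall(md):
        if match_len:
            n = int(match_len)
            ref.append(aligned[i:i + n])
            i += n
        elif deleted:
            ref.append(deleted.upper())  # reference bases missing from the read
        else:
            ref.append(mismatch.upper())  # reference base at a mismatch
            i += 1
    if i != len(aligned):
        raise ValueError("MD and CIGAR disagree on the aligned length")
    return "".join(ref)


# Made-up 70-base read matching the MD:Z:8G61 example: the read has a T
# where the reference has a G.
read = "A" * 8 + "T" + "C" * 61
print(reference_from_md(read, "70M", "8G61"))
```

For the quoted example, the result is the read with its ninth base replaced by the reference G. Insertions and soft clips are dropped because those bases have no counterpart in the reference.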

Thanks to all.

BWA indexing mechanism
