No reads mapped using BWA SOLiD

lanner

Member

Join Date: Apr 2014

Posts: 29
- Share
- Tweet
#1

No reads mapped using BWA SOLiD

05-05-2014, 11:57 AM

I am trying to troubleshoot why no reads were mapped in my call to BWA SoLiD in Galaxy. I also posted about this on Biostar here (https://biostar.usegalaxy.org/p/7444/#7459).

Basically, below is the background of my data
1) Downloaded .sra file.
2) Used fastq-dump to convert .sra file to .fastq file
3) Only kept 1/10 sequences from .fastq file (it was too large).
4) On Galaxy: Ran "Fastq Groomer" tool to remove the leading quality score for the adaptor base (input 'Color Space Sanger', default for the rest)
5) On Galaxy: Ran "Manipulation Reads" to convert to base-space "0123. -> ATCGN" and remove the adaptor itself. This generated fastqcssanger file.
6) On Galaxy: Ran ""Map with BWA for SOLiD" tool on the fastqcssanger file and mrna.fa file

I downloaded the mrna.fa file from: (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)

Jen, a member from Galaxy, noted: "It appears that your mapping run did not produce any hits. Look further down in the SAM file (past the @SQ lines). You can also generate general counts in Galaxy: run a tool from the SAMTools or Picard group such as 'flagstat' or 'SAM/BAM Alignment Summary Metrics'. From there, you may need to adjust the BWA setting to capture hits. If I remember correctly, the sequence fragments in your mrna.fa custom reference genome were quite short."

So, I ran both of her suggestions, and these were the outputs:

1) I ran "flagstat" on one of my BAM files, and it gave me an 11-line output:

6263972 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
406 + 0 mapped (0.01%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

2) Below is the output "log" of the SAM/BAM Alignment Summary Metrics on a BAM file (I bolded a line that does not look too good):

INFO:root:## executing java -Xmx4g -Djava.io.tmpdir='/galaxy/main/scratch' -jar /galaxy/main/deps/picard/1.56.0/devteam/picard/e0232cbac965/jars/CreateSequenceDictionary.jar REFERENCE=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.fasta OUTPUT=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.dict URI=dataset_8089824.dat TRUNCATE_NAMES_AT_WHITESPACE=None returned status 1 and stderr:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Mon May 05 10:49:41 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.fasta OUTPUT=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.dict URI=dataset_8089824.dat TRUNCATE_NAMES_AT_WHITESPACE=false NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Mon May 05 10:49:41 CDT 2014] Executing as [email protected] on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43
[Mon May 05 10:49:41 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2025324544
Exception in thread "main" net.sf.samtools.SAMException: Sequence name contains invalid character: AF001540 1
at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:80)
at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceRecord(CreateSequenceDictionary.java:147)
at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:138)
at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)

INFO:root:## executing samtools sort /galaxy-repl/main/files/008/119/dataset_8119150.dat dataset_8119150.dat.sorted returned status 0 and nothing on stderr

INFO:root:## executing java -Xmx4g -Djava.io.tmpdir='/galaxy/main/scratch' -jar /galaxy/main/deps/picard/1.56.0/devteam/picard/e0232cbac965/jars/CollectAlignmentSummaryMetrics.jar VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true ADAPTER_SEQUENCE= IS_BISULFITE_SEQUENCED=false MAX_INSERT_SIZE=100000 OUTPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetrics.metrics.txt R=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetricsNnX2Ds.fasta_fake.fasta TMP_DIR=/galaxy/main/scratch INPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/dataset_8119150.dat.sorted.bam returned status 0 and nothing on stderr

And the output "txt" of the same program:

## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectAlignmentSummaryMetrics MAX_INSERT_SIZE=100000 ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG, IS_BISULFITE_SEQUENCED=false] INPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/dataset_8119150.dat.sorted.bam OUTPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetrics.metrics.txt REFERENCE_SEQUENCE=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetricsNnX2Ds.fasta_fake.fasta ASSUME_SORTED=true TMP_DIR=[/galaxy/main/scratch] VALIDATION_STRINGENCY=LENIENT METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
## net.sf.picard.metrics.StringHeader
# Started on: Mon May 05 10:50:52 CDT 2014

## METRICS CLASS net.sf.picard.analysis.AlignmentSummaryMetrics
CATEGORY TOTAL_READS PF_READS PCT_PF_READS PF_NOISE_READS PF_READS_ALIGNED PCT_PF_READS_ALIGNED PF_ALIGNED_BASES PF_HQ_ALIGNED_READS PF_HQ_ALIGNED_BASES PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE PF_HQ_ERROR_RATE PF_INDEL_RATE MEAN_READ_LENGTH READS_ALIGNED_IN_PAIRS PCT_READS_ALIGNED_IN_PAIRS BAD_CYCLES STRAND_BALANCE PCT_CHIMERAS PCT_ADAPTER SAMPLE LIBRARY READ_GROUP
UNPAIRED 6263972 6263972 1 0 406 0.000065 19883 63 3078 2442 0 0.00176 0.004873 0.001257 49.999925 0 0 0 0.578818 0 0

The seq fragments in my mrna.fa reference genome varied in length. In my .fastq files, they were 50 nt.

I have no idea how to adjust my BWA to capture this, though. Could anyone help me with how to interpret this? I guess there is no solution but to remap (but if there is a faster solution, I am going for faster time right now), but what should I change in my BWA SOLiD to remap , and hope to get some hits this time?

Many thanks...
Tags: bwa align color solid, count, galaxy
lanner

Member

Join Date: Apr 2014

Posts: 29
- Share
- Tweet
#2

05-05-2014, 11:58 AM

In case it is useful, below is the head of some of my files:

Below is the head output of some of my files:

hg19.gtf:

chr1 hg19_refGene start_codon 67000042 67000044 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67000042 67000051 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 66999825 67000051 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67091530 67091593 0.000000 + 2 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67091530 67091593 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67098753 67098777 0.000000 + 1 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67098753 67098777 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67101627 67101698 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67101627 67101698 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67105460 67105516 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";

mrna.fasta:

>AF001540 1
ggcacgaggcaggtctgtctgttctgttggcaagtaaatgcagtactgtt
ctgatcccgctgctattagaatgcattgtgaaacgactggagtatgatta
aaagttgtgttccccaatgcttggagtagtgattgttgaaggaaaaaatc
cagctgagtgataaggctgagtgttgaggaaatttctgcagttttaagca
gtcgtatttgtgattgaagctgagtacatttgctggtgtatttttaggta
aaatgcttttttgttcatttctgggtggtgggaggggactgaagccttta
gtcttttccagatgcaaccttaaaatcagtgacaagaaacattccaaaca
agcaacagtcttcaagaaattaaactggcaagtggaaatgtttaaacagt
tcagtgatctttagtgcattgtttatgtgtgggtttctctctcccctccc

FourA.sam:

@SQ SN:AF001540 LN:1781
@SQ SN:AF001541 LN:1138
@SQ SN:AF001542 LN:2992
@SQ SN:AF001543 LN:903
@SQ SN:AF001544 LN:434
@SQ SN:AF001545 LN:370
@SQ SN:AF001546 LN:1142
@SQ SN:AF001547 LN:1092
@SQ SN:AF034176 LN:7232
@SQ SN:AF038950 LN:2384

FourA.count (head and tail)

head:
NM_000014 0
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 0
NM_000020 0
NM_000021 0
NM_000022 0
NM_000023 0

tail:
NR_046431 0
NR_046432 0
NR_046433 0
NR_046435 0
NR_046436 0
no_feature 257
ambiguous 0
too_low_aQual 0
not_aligned 6366159
alignment_not_unique 0
Comment

Previous template Next

Advanced Sequencing Platforms Tackle Neuroscience’s Toughest Genomics Problems

by SEQadmin2

Genomics studies in neuroscience face a special challenge due to the brain’s complexity and scarcity of samples. Mapping changes in cell type and state using conventional next-generation sequencing methods remains challenging. Advances in technologies like single-cell sequencing, spatial transcriptomics, and long-read sequencing have opened the door to deeper studies of the brain and diseases like Alzheimer’s, amyotrophic lateral sclerosis (ALS), and schizophrenia.
...
- Channel: Articles
07-09-2026, 11:10 AM
Cancer Drug Resistance: The Lingering Barrier to Rising Survival

by SEQadmin2

Cancer survival rates have significantly increased in the last few decades in the United States, reaching a combined 70% 5-year survival rate by 2021. Behind this number, there are years of research to find new therapies, drug targets, and early detection methods. But there is one core challenge that keeps slowing down these advances, and it’s about drug resistance.

There is no single reason why many patients don’t respond to treatment as expected. Cancer is...
- Channel: Articles
07-08-2026, 05:17 AM
Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing

by GATTACAT

Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
- Channel: Articles
07-01-2026, 11:43 AM

Topics	Statistics	Last Post
UC San Diego Bioengineers Map Gene Function in Human Stem Cells by SEQadmin2 Started by SEQadmin2, 07-13-2026, 10:26 AM	0 responses 17 views 0 reactions	Last Post by SEQadmin2 07-13-2026, 10:26 AM
New Analysis Splits Leukemia Into 16 Epigenomic Subgroups by SEQadmin2 Started by SEQadmin2, 07-09-2026, 10:04 AM	0 responses 30 views 0 reactions	Last Post by SEQadmin2 07-09-2026, 10:04 AM
Genome-Wide CRISPR Screen Uncovers Unlikely Psoriasis Target by SEQadmin2 Started by SEQadmin2, 07-08-2026, 10:08 AM	0 responses 16 views 0 reactions	Last Post by SEQadmin2 07-08-2026, 10:08 AM
Engineered Protein Motor Takes Its First Steps Along DNA Track by SEQadmin2 Started by SEQadmin2, 07-07-2026, 11:05 AM	0 responses 34 views 0 reactions	Last Post by SEQadmin2 07-07-2026, 11:05 AM

Unconfigured Ad

No reads mapped using BWA SOLiD

Comment

Latest Articles

ad_right_rmr

News