Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lanner
    Member
    • Apr 2014
    • 29

    No reads mapped using BWA SOLiD

    I am trying to troubleshoot why no reads were mapped in my call to BWA SoLiD in Galaxy. I also posted about this on Biostar here (https://biostar.usegalaxy.org/p/7444/#7459).

    Basically, below is the background of my data
    1) Downloaded .sra file.
    2) Used fastq-dump to convert .sra file to .fastq file
    3) Only kept 1/10 sequences from .fastq file (it was too large).
    4) On Galaxy: Ran "Fastq Groomer" tool to remove the leading quality score for the adaptor base (input 'Color Space Sanger', default for the rest)
    5) On Galaxy: Ran "Manipulation Reads" to convert to base-space "0123. -> ATCGN" and remove the adaptor itself. This generated fastqcssanger file.
    6) On Galaxy: Ran ""Map with BWA for SOLiD" tool on the fastqcssanger file and mrna.fa file

    I downloaded the mrna.fa file from: (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/)

    Jen, a member from Galaxy, noted: "It appears that your mapping run did not produce any hits. Look further down in the SAM file (past the @SQ lines). You can also generate general counts in Galaxy: run a tool from the SAMTools or Picard group such as 'flagstat' or 'SAM/BAM Alignment Summary Metrics'. From there, you may need to adjust the BWA setting to capture hits. If I remember correctly, the sequence fragments in your mrna.fa custom reference genome were quite short."

    So, I ran both of her suggestions, and these were the outputs:

    1) I ran "flagstat" on one of my BAM files, and it gave me an 11-line output:

    6263972 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    406 + 0 mapped (0.01%:-nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (-nan%:-nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (-nan%:-nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    2) Below is the output "log" of the SAM/BAM Alignment Summary Metrics on a BAM file (I bolded a line that does not look too good):

    INFO:root:## executing java -Xmx4g -Djava.io.tmpdir='/galaxy/main/scratch' -jar /galaxy/main/deps/picard/1.56.0/devteam/picard/e0232cbac965/jars/CreateSequenceDictionary.jar REFERENCE=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.fasta OUTPUT=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.dict URI=dataset_8089824.dat TRUNCATE_NAMES_AT_WHITESPACE=None returned status 1 and stderr:
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
    [Mon May 05 10:49:41 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.fasta OUTPUT=/galaxy/main/scratch/CollectAlignmentSummaryMetricsNnX2Ds.dict URI=dataset_8089824.dat TRUNCATE_NAMES_AT_WHITESPACE=false NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Mon May 05 10:49:41 CDT 2014] Executing as [email protected] on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43
    [Mon May 05 10:49:41 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=2025324544
    Exception in thread "main" net.sf.samtools.SAMException: Sequence name contains invalid character: AF001540 1
    at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:80)
    at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceRecord(CreateSequenceDictionary.java:147)
    at net.sf.picard.sam.CreateSequenceDictionary.makeSequenceDictionary(CreateSequenceDictionary.java:138)
    at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)


    INFO:root:## executing samtools sort /galaxy-repl/main/files/008/119/dataset_8119150.dat dataset_8119150.dat.sorted returned status 0 and nothing on stderr

    INFO:root:## executing java -Xmx4g -Djava.io.tmpdir='/galaxy/main/scratch' -jar /galaxy/main/deps/picard/1.56.0/devteam/picard/e0232cbac965/jars/CollectAlignmentSummaryMetrics.jar VALIDATION_STRINGENCY=LENIENT ASSUME_SORTED=true ADAPTER_SEQUENCE= IS_BISULFITE_SEQUENCED=false MAX_INSERT_SIZE=100000 OUTPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetrics.metrics.txt R=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetricsNnX2Ds.fasta_fake.fasta TMP_DIR=/galaxy/main/scratch INPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/dataset_8119150.dat.sorted.bam returned status 0 and nothing on stderr



    And the output "txt" of the same program:

    ## net.sf.picard.metrics.StringHeader
    # net.sf.picard.analysis.CollectAlignmentSummaryMetrics MAX_INSERT_SIZE=100000 ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG, IS_BISULFITE_SEQUENCED=false] INPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/dataset_8119150.dat.sorted.bam OUTPUT=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetrics.metrics.txt REFERENCE_SEQUENCE=/galaxy/main/jobdir/006/922/6922621/dataset_8126499_files/CollectAlignmentSummaryMetricsNnX2Ds.fasta_fake.fasta ASSUME_SORTED=true TMP_DIR=[/galaxy/main/scratch] VALIDATION_STRINGENCY=LENIENT METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    ## net.sf.picard.metrics.StringHeader
    # Started on: Mon May 05 10:50:52 CDT 2014

    ## METRICS CLASS net.sf.picard.analysis.AlignmentSummaryMetrics
    CATEGORY TOTAL_READS PF_READS PCT_PF_READS PF_NOISE_READS PF_READS_ALIGNED PCT_PF_READS_ALIGNED PF_ALIGNED_BASES PF_HQ_ALIGNED_READS PF_HQ_ALIGNED_BASES PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE PF_HQ_ERROR_RATE PF_INDEL_RATE MEAN_READ_LENGTH READS_ALIGNED_IN_PAIRS PCT_READS_ALIGNED_IN_PAIRS BAD_CYCLES STRAND_BALANCE PCT_CHIMERAS PCT_ADAPTER SAMPLE LIBRARY READ_GROUP
    UNPAIRED 6263972 6263972 1 0 406 0.000065 19883 63 3078 2442 0 0.00176 0.004873 0.001257 49.999925 0 0 0 0.578818 0 0



    The seq fragments in my mrna.fa reference genome varied in length. In my .fastq files, they were 50 nt.


    I have no idea how to adjust my BWA to capture this, though. Could anyone help me with how to interpret this? I guess there is no solution but to remap (but if there is a faster solution, I am going for faster time right now), but what should I change in my BWA SOLiD to remap , and hope to get some hits this time?

    Many thanks...
  • lanner
    Member
    • Apr 2014
    • 29

    #2
    In case it is useful, below is the head of some of my files:

    Below is the head output of some of my files:

    hg19.gtf:

    chr1 hg19_refGene start_codon 67000042 67000044 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67000042 67000051 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 66999825 67000051 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67091530 67091593 0.000000 + 2 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67091530 67091593 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67098753 67098777 0.000000 + 1 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67098753 67098777 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67101627 67101698 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene exon 67101627 67101698 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
    chr1 hg19_refGene CDS 67105460 67105516 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";


    mrna.fasta:

    >AF001540 1
    ggcacgaggcaggtctgtctgttctgttggcaagtaaatgcagtactgtt
    ctgatcccgctgctattagaatgcattgtgaaacgactggagtatgatta
    aaagttgtgttccccaatgcttggagtagtgattgttgaaggaaaaaatc
    cagctgagtgataaggctgagtgttgaggaaatttctgcagttttaagca
    gtcgtatttgtgattgaagctgagtacatttgctggtgtatttttaggta
    aaatgcttttttgttcatttctgggtggtgggaggggactgaagccttta
    gtcttttccagatgcaaccttaaaatcagtgacaagaaacattccaaaca
    agcaacagtcttcaagaaattaaactggcaagtggaaatgtttaaacagt
    tcagtgatctttagtgcattgtttatgtgtgggtttctctctcccctccc


    FourA.sam:

    @SQ SN:AF001540 LN:1781
    @SQ SN:AF001541 LN:1138
    @SQ SN:AF001542 LN:2992
    @SQ SN:AF001543 LN:903
    @SQ SN:AF001544 LN:434
    @SQ SN:AF001545 LN:370
    @SQ SN:AF001546 LN:1142
    @SQ SN:AF001547 LN:1092
    @SQ SN:AF034176 LN:7232
    @SQ SN:AF038950 LN:2384

    FourA.count (head and tail)

    head:
    NM_000014 0
    NM_000015 0
    NM_000016 0
    NM_000017 0
    NM_000018 0
    NM_000019 0
    NM_000020 0
    NM_000021 0
    NM_000022 0
    NM_000023 0

    tail:
    NR_046431 0
    NR_046432 0
    NR_046433 0
    NR_046435 0
    NR_046436 0
    no_feature 257
    ambiguous 0
    too_low_aQual 0
    not_aligned 6366159
    alignment_not_unique 0

    Comment

    Latest Articles

    Collapse

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by SEQadmin2, 06-05-2026, 10:09 AM
    0 responses
    10 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-04-2026, 08:59 AM
    0 responses
    22 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-02-2026, 12:03 PM
    0 responses
    28 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-02-2026, 11:40 AM
    0 responses
    22 views
    0 reactions
    Last Post SEQadmin2  
    Working...