BWA sampe fails - invalid BAM binary header (


  • BWA sampe fails - invalid BAM binary header (

    Hi,
    I am trying to use BWA to analyse human whole-exome paired-end data (Illumina HiSeq) for the first time.
    I downloaded the hg19 genome as a reference and indexed it with bwa index.
    Then I aligned with bwa aln, using the following form of command for each FASTQ file:

    nohup bwa aln ~/work_area/hg19_genome/hg19_chromFa.tar.gz SG1177_2_sequence.txt > SG1177_2_aln_sa.sai &

    From the ~10 Gb FASTQ files I got ~1.3 Gb .sai files.

    But when I try to use sampe to get the SAM files, it doesn't work, and the output file contains the following messages:
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bwa_read_seq] the maximum barcode length is 63.
    followed by the list of chromosomes, and then the output ends.

    Here is the sampe command I used:
    nohup bwa sampe ~/work_area/hg19_genome/hg19_chromFa.tar.gz SG1177_1_aln_sa.sai SG1177_2_aln_sa.sai SG1177_1_sequence.txt SG1177_2_sequence.txt &

    What did I do wrong?

    Many thanks in advance!

  • #2
    Hello,

    According to the bwa documentation, the reference should be a FASTA file.


    http://bio-bwa.sourceforge.net/bwa.shtml


    Your reference is hg19_chromFa.tar.gz -- this is a compressed archive likely containing many files.
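
One way to act on this suggestion is to unpack the archive and concatenate its contents into a single FASTA file before indexing. The following is a minimal sketch, not a confirmed fix for this thread: it assumes the archive holds plain-text per-chromosome `*.fa` files, and the function name `build_reference` and the output name `hg19.fa` are made up for illustration.

```shell
# Sketch: build one multi-FASTA reference from a tar.gz of per-chromosome files.
# Assumption: the archive contains plain-text *.fa files (names are illustrative).
build_reference() {
    archive="$1"      # e.g. hg19_chromFa.tar.gz
    out_fasta="$2"    # e.g. hg19.fa (hypothetical output name)

    workdir="$(mktemp -d)" || return 1
    tar -xzf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }

    # Concatenate every extracted .fa file into one file.
    # The order of sequences follows find's traversal order, which is not sorted.
    find "$workdir" -type f -name '*.fa' -exec cat {} + > "$out_fasta"

    rm -rf "$workdir"
}

# Example usage (the bwa step needs bwa installed, so it is left commented out):
# build_reference hg19_chromFa.tar.gz hg19.fa
# bwa index -a bwtsw hg19.fa
```

The resulting file would then be passed to bwa index, bwa aln and bwa sampe in place of the `.tar.gz` path.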

    Sébastien Boisvert

    In reply to the original post by Lilach.


    • #3
      Solution

      Now that I have solved the problem, I am writing the solution here in case somebody else searches for it:

      The problem was not caused by giving hg19_chromFa.tar.gz as the reference file. BWA can work with it.

      The reason for the problem was that I tried to run the bwa commands in the background (with nohup ... &) or via a PuTTY SSH terminal (which crashed in the middle of the run).
      This command should NOT be executed in the background, so the only way I found to run it on a remote computer is over VPN. Then run it without nohup ... &.



      • #4
        Yeah, incompatibility with nohup has been reported in other threads on SEQanswers.



        • #5
          The program screen is useful for that kind of workflow.

          You can start your jobs inside a screen session and then close your window. After that, you can SSH back into
          your box and reattach to your screen. A single screen session can also hold several windows (tabs).
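
The workflow described above could look roughly like the sketch below. It assumes GNU screen is installed on the remote machine. The helper `run_in_screen`, the session name `bwa_sampe`, the file names `out.sam` and `sampe.log`, and the `DRY_RUN` switch are all made up for illustration. Because bwa sampe writes its SAM output to standard output, the sketch sends standard output and standard error to separate files.

```shell
# Sketch: start a long-running command in a detached screen session.
# Assumptions: GNU screen is installed; all names below are illustrative.
run_in_screen() {
    session="$1"    # screen session name
    outfile="$2"    # file that receives the command's stdout (e.g. the SAM)
    logfile="$3"    # file that receives the command's stderr (log messages)
    shift 3

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print what would be started; does not require screen.
        echo "screen -dmS $session sh -c '$* > $outfile 2> $logfile'"
        return 0
    fi

    # -d -m starts the session detached, -S gives it a name.
    screen -dmS "$session" sh -c "$* > '$outfile' 2> '$logfile'"
}

# Example usage (arguments shortened here; use your own reference and read files):
# run_in_screen bwa_sampe out.sam sampe.log bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq
# screen -ls            # list running sessions
# screen -r bwa_sampe   # reattach; press Ctrl-a then d to detach again
```

The job keeps running after the SSH window is closed, and `screen -r` picks it up again from a new login.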

          Sébastien
