Announcement

Collapse
No announcement yet.

Sam to Bam using bowtie and using the shell script

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sam to Bam using bowtie and using the shell script

    Hi ,
    I am writing a script to convert sam file to bam files
    I wrote script like this


    #!/bin/bash
    for files in *.sra
    do
    OUT=(${files%.sra}.bam)
    ref=($"/raid/references-and-indexes/hg19/bwa/hg19.fa")
    fastq-dump -Z $files|bwa aln -t 4 $ref -|bwa samse $ref -| samtools view -bS -o > $OUT
    done

    I got the following doubts
    bwa aln database.fasta short_read.fastq > aln_sa.sai

    bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam

    How would my script can understand that I am taking fastq dump output and giving to bwa samse . Can some body help me to improve my code and tell me where I am going wrong .
    And also I want output my file to specific directory instead of current directory like this
    I want all my bam files in /raid/development/anusha/python_test/fetal_brain_bam_bwa

    Thanks in advance ,
    Anusha.Ch

  • #2
    Hi All ,
    Sorry I meant bwa alignment

    Thanks in advance,
    Anusha.Ch

    Comment


    • #3
      Originally posted by AnushaC View Post
      fastq-dump -Z $files | bwa aln -t 4 $ref - | bwa samse $ref - | samtools view -bS -o > $OUT
      ...
      I want all my bam files in /raid/development/anusha/python_test/fetal_brain_bam_bwa
      I Don't really understand the specifics of your question, so will assume that the bit before the 'samtools' pipe produces a properly-formatted SAM file.

      I like sorting the BAM files at the same time (in preparation for indexing), because it can all be done in the one pipe, so you don't need to keep the unsorted BAM file around:

      Code:
      fastq-dump -Z $files | bwa aln -t 4 $ref -|bwa samse $ref  - | \
         samtools view -bS - | \
         samtools sort - /raid/development/anusha/python_test/fetal_brain_bam_bwa/$OUT
      Last edited by gringer; 10-31-2013, 03:32 PM.

      Comment


      • #4
        Hi ,
        I wrote script similar this one like fastq-dump -Z SRR203400.copy.sra|bwa aln -t 7 /raid/references-and-indexes/hg19/bwa/hg19.fa -|bwa samse /raid/references-and-indfexes/hg19/bwa/hg19.fa - -|samtools view -bT /raid/references-and-indfexes/hg19/bwa/hg19.fa - > scrapSRR203400.copy.bam
        The sam file generating from bwa samse is not generating an indexed samfile so i think samtools view -bS option is not working . I tried use -T with reference file and -t option it did not work.

        Thanks,
        Anusha.Ch

        Comment


        • #5
          .sam files are never indexed, only .bam files are. samtools view -bS will work fine with a sam file. I use it all the time.

          If you are going to pipe from .sam | .bam | sorted.bam, I would use -Shu instead of -bS. -b compresses the .bam file, -u leaves it uncompressed, which makes the sort faster. In some versions of samtools, samtools sort will complain that the EOF is missing, but it's nothing to worry about.

          Comment


          • #6
            Hi ,
            I am getting multiple errors while running pipeline these are following errors
            Usage: bwa samse [-n max_occ] [-f out.sam] [-r RG_line] <prefix> <in.sai> <in.fq>
            [samopen] no @SQ lines in the header.
            [sam_read1] missing header? Abort!
            [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [bwa_aln_core] calculate SA coordinate... 71.48 sec
            [bwa_aln_core] write to the directory

            [samopen] no @SQ lines in the header.
            [sam_read1] missing header? Abort!
            [bwa_aln_core] calculate SA coordinate... 53.11 sec
            [bwa_aln_core] write to the disk... [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            Usage: bwa samse [-n max_occ] [-f out.sam] [-r RG_line] <prefix> <in.sai> <in.fq>
            [samopen] no @SQ lines in the header.
            [sam_read1] missing header? Abort!
            [bwa_aln_core] calculate SA coordinate... 53.71 sec
            [bwa_aln_core] write to the disk... [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            Usage: bwa samse [-n max_occ] [-f out.sam] [-r RG_line] <prefix> <in.sai> <in.fq>
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [samopen] no @SQ lines in the header.
            [sam_read1] missing header? Abort!

            Then i changed my code little bit like this

            fastq-dump -Z $files | bwa aln -t 8 $ref -|bwa samse $ref - | \samtools view -Shu - | \
            # samtools sort - >/raid/development/anusha/python_test/fetalbrain_bam_bwa/$OUT


            fastq-dump -Z $files | bwa aln -t 4 $ref -|bwa samse $ref - - | \
            samtools view -bS - >/raid/development/anusha/python_test/fetal_brainbam_bwa/$OUT
            done
            samtbam_bwa.copy.sh: line 3: 33602 Broken pipe fastq-dump -Z $files
            33603 | bwa aln -t 4 $ref -
            33604 Exit 1 | bwa samse $ref - -
            33605 Aborted | samtools view -bS - > /raid/development/anusha/python_test/fetal_brainbam_bwa/$OUT
            [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [bwa_aln_core] calculate SA coordinate... 49.96 sec
            [bwa_aln_core] write to the disk... [fclose] Bad file descriptor
            [samopen] SAM header is present: 93 sequences.
            [sam_read1] reference 'SN:chr18_gl000207_random LN:4262


            ' is recognized as '*'.
            [main_samview] truncated file.
            [bwa_aln] 17bp reads: max_diff = 2
            [bwa_aln] 38bp reads: max_diff = 3
            [bwa_aln] 64bp reads: max_diff = 4
            [bwa_aln] 93bp reads: max_diff = 5
            [bwa_aln] 124bp reads: max_diff = 6
            [bwa_aln] 157bp reads: max_diff = 7
            [bwa_aln] 190bp reads: max_diff = 8
            [bwa_aln] 225bp reads: max_diff = 9
            [bwa_aln_core] calculate SA coord

            In a weird way

            Comment


            • #7
              bwa_aln_core] calculate SA coordinate... 47.61 sec
              [bwa_aln_core] write to the disk... [bwa_aln_core] convert to sequence coordinate... 4.23 sec
              [bwa_aln_core] refine gapped alignments... 0.55 sec
              [bwa_aln_core] print alignments... [samopen] SAM header is present: 93 sequences.
              [sam_read1] reference '0' is recognized as '*'.
              Parse warning at line 95: mapped sequence without CIGAR
              Parse error at line 95: sequence and quality are inconsistent
              Aborted

              Comment


              • #8
                Hi ,
                Can anybody can explain me in detail samse requires three inputs one is fastq file ,fast file and fai file . While I am specifying the reference but I how it can take fastq dump out and take it as input since in the intermediate there is a sai file. So what do for piping part

                Comment

                Working...
                X