  • tonio100680
    Member
    • Apr 2010
    • 25

    samtools & flagstat & awk

    Hi,
    Here are my commands:
    $ samtools merge L002_LBCO1.bam L002_LBCO1_chr*.bam

    $ samtools view L002_LBCO1.bam | awk '$3== "chr1.fa" && $4>= 45787123 && $4<= 45787316 || $3== "chr1.fa" && $4>= 45790335 && $4<= 45790528' > essai_chr1.bam

    $ samtools flagstat L002_LBCO1.bam
    3933498 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    3868863 + 0 mapped (98.36%:nan%)
    3933498 + 0 paired in sequencing
    1966749 + 0 read1
    1966749 + 0 read2
    3787076 + 0 properly paired (96.28%:nan%)
    3804228 + 0 with itself and mate mapped
    64635 + 0 singletons (1.64%:nan%)
    7262 + 0 with mate mapped to a different chr
    7121 + 0 with mate mapped to a different chr (mapQ>=5)

    $ samtools flagstat essai_chr1.bam
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_flagstat_core] Truncated file? Continue anyway.
    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (nan%:nan%)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (nan%:nan%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (nan%:nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    I would like to compare L002_LBCO1.bam (the merge result) with essai_chr1.bam (the awk result), but samtools apparently does not recognize essai_chr1.bam as a BAM file.

    Do you have a solution?

    Best regards
  • chenyao
    Member
    • Jul 2011
    • 74

    #2
    Indeed, the awk command cannot generate a BAM file. You could try writing a 'sam' file instead and then converting it to BAM with samtools, but I am not sure it will work.
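The SAM route can work as long as the SAM text keeps its header: without -h, samtools view drops the @SQ lines, and samtools generally cannot rebuild a BAM from header-less SAM unless it is given the reference names and lengths separately. A minimal sketch, assuming the reference is named "chr1.fa" (taken from the original awk filter) and using a hypothetical helper name, filter_sam_regions, that wraps the same position windows:

```shell
#!/bin/sh
# Hypothetical helper name. Reads SAM text on stdin and keeps:
#   - every header line (@HD, @SQ, @PG, ...), so the output can be converted to BAM
#   - records on "chr1.fa" whose leftmost position (column 4) lies in either window
# The reference name "chr1.fa" is assumed from the original awk filter.
filter_sam_regions() {
  awk -F'\t' '
    /^@/ { print; next }
    $3 == "chr1.fa" &&
      (($4 >= 45787123 && $4 <= 45787316) ||
       ($4 >= 45790335 && $4 <= 45790528))
  '
}

# Run the real pipeline only if samtools and the merged BAM are present.
if command -v samtools >/dev/null 2>&1 && [ -f L002_LBCO1.bam ]; then
  # -h keeps the header lines in the SAM text
  samtools view -h L002_LBCO1.bam | filter_sam_regions > essai_chr1.sam
  # -b writes BAM; -S declares SAM input (required by old samtools, ignored by 1.x)
  samtools view -bS -o essai_chr1.bam essai_chr1.sam
  samtools flagstat essai_chr1.bam
fi
```

Because the header survives the filter, the resulting essai_chr1.bam is a real BAM file that samtools flagstat can read and that can be compared with the merged file.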


    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      Tonio,

      As chenyao already correctly stated, the output of your samtools view | awk pipe is plain SAM text, not BAM. You could save that to essai_chr1.sam and then convert the SAM file to BAM; for the conversion to work, add -h to samtools view so the header lines are kept.

      There is an easier way, though. You don't need awk to filter the alignments you want: samtools can do this for you and write a BAM file directly. Use region specifiers in your samtools view command. They define regions by chromosome name and coordinates, and you can give several regions on one command line. They are written in the format
      Code:
      <chromosome_name>:<start_position>-<end_position>
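      and are placed after the input BAM file name. Region queries require a coordinate-sorted BAM that has been indexed with samtools index (which writes L002_LBCO1.bam.bai). Your command pipe above could be rewritten: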

      Code:
      samtools view -bh -o essai_chr1.bam L002_LBCO1.bam chr1:45787123-45787316 chr1:45790335-45790528
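      -b tells samtools to output in BAM format
      -h includes the header lines in the output file
      -o gives the name of the output file
      The chromosome names must match those in your BAM header exactly. Your awk filter matched "chr1.fa", so you may need to write the regions as chr1.fa:45787123-45787316 and chr1.fa:45790335-45790528.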
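When comparing the two outputs, keep in mind that a region query such as chr1:45787123-45787316 returns every alignment that overlaps the interval, whereas the awk filter only keeps alignments whose leftmost position (column 4) falls inside it, so the region query can report more reads. The sketch below approximates overlap semantics on SAM text by deriving each alignment's reference span from its CIGAR string (summing the M, D, N, = and X operations, which consume the reference). It is a hedged illustration, not how samtools implements the query; the function name overlaps_region is hypothetical and the reference name chr1.fa is assumed from the original awk filter.

```shell
#!/bin/sh
# Hypothetical helper name. Usage: overlaps_region REF START END  (1-based, inclusive)
# Reads SAM text on stdin, passes header lines through, and keeps records on REF
# whose aligned span overlaps [START, END], approximating a region query.
overlaps_region() {
  awk -F'\t' -v ref="$1" -v rbeg="$2" -v rend="$3" '
    BEGIN { rbeg += 0; rend += 0 }
    /^@/ { print; next }
    $3 == ref {
      # Reference span = sum of CIGAR operations that consume the reference (M, D, N, =, X)
      cigar = $6; reflen = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) reflen += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (reflen == 0) reflen = 1    # no usable CIGAR (e.g. "*"): treat as a one-base span
      if ($4 + 0 <= rend && $4 + reflen - 1 >= rbeg) print
    }
  '
}

# Compare with the real region query when samtools and an indexed BAM are available.
if command -v samtools >/dev/null 2>&1 && [ -f L002_LBCO1.bam.bai ]; then
  samtools view -c L002_LBCO1.bam chr1.fa:45787123-45787316
  samtools view L002_LBCO1.bam | overlaps_region chr1.fa 45787123 45787316 | wc -l
fi
```

The guarded lines at the end print the region-query count next to the awk-based count for the first window, which gives a quick way to see how much the two selection rules differ on your data.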

