
.SAM to .BAM with SAM file header @PG


    Hi
    I used export2sam.pl to convert export.txt to .sam, and I checked that the newly generated SAM file has an @PG header line. When I tried to generate a BAM file with the command
    "samtools view -b in.sam -o out.bam"
    I got these errors:

    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [main_samview] fail to read the header from "in.sam".

    Does anybody know what's wrong? What command should I use to convert SAM to BAM?

    Thanks

  • #2
    Use the -S parameter:

    Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

    Options: -b output BAM
    -h print header for the SAM output
    -H print header only (no alignments)
    -S input is SAM
    -u uncompressed BAM output (force -b)
    -1 fast compression (force -b)
    -x output FLAG in HEX (samtools-C specific)
    -X output FLAG in string (samtools-C specific)
    -c print only the count of matching records
    -L FILE output alignments overlapping the input BED FILE [null]
    -t FILE list of reference names and lengths (force -S) [null]
    -T FILE reference sequence file (force -S) [null]
    -o FILE output file name [stdout]
    -R FILE list of read groups to be outputted [null]
    -f INT required flag, 0 for unset [0]
    -F INT filtering flag, 0 for unset [0]
    -q INT minimum mapping quality [0]
    -l STR only output reads in library STR [null]
    -r STR only output reads in read group STR [null]
    -? longer help
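A minimal sketch of the fix Richard suggests, assuming samtools 0.1.x (the thread uses 0.1.16), where "samtools view" reads BAM by default and needs -S to accept SAM input; samtools 1.x detects the input format on its own and ignores -S. The helper names looks_like_bam and sam_to_bam are hypothetical. The format check relies on BAM files being BGZF-compressed, so they begin with the gzip magic bytes 1f 8b, while SAM is plain text.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes
# 1f 8b. BAM files are BGZF (gzip-compatible) compressed, so they always do;
# a plain-text SAM file does not.
looks_like_bam() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Hypothetical wrapper for samtools 0.1.x: convert SAM ($1) to BAM ($2).
# -b writes BAM output, -S declares that the input is SAM (samtools 1.x
# ignores -S), and -o names the output file.
sam_to_bam() {
    if looks_like_bam "$1"; then
        echo "$1 already looks like BAM; no conversion needed" >&2
        return 1
    fi
    samtools view -bS "$1" -o "$2"
}
```

For example, "sam_to_bam in.sam out.bam" runs "samtools view -bS in.sam -o out.bam". If the SAM file has no @SQ header lines, samtools 0.1.x also needs the reference names and lengths via -t (which implies -S), e.g. "samtools view -bt ref.fa.fai in.sam -o out.bam", as in post #5.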



    • #3
      Hi Richard,

      I do want to convert SAM to BAM, but I get an error when I use "samtools view -b in.sam -o out.bam". I checked the header of the SAM file and it contains @PG. I don't know how to deal with it.

      Thanks



      • #4
        Originally posted by emilyjia2000
        Hi Richard,

        I do want to convert SAM to BAM, but I get an error when I use "samtools view -b in.sam -o out.bam". I checked the header of the SAM file and it contains @PG. I don't know how to deal with it.

        Thanks
        Emily,

        As Richard said, you need to use the -S option (in addition to your other options) to tell samtools view that the INPUT is in SAM format. By default samtools view expects a BAM file as input, but you are giving it a SAM file; that's what is causing the error.



        • #5
          I am dealing with the same kind of SAM file (with an @PG header line).
          I tried the -S option, but it didn't work.
          First I got a segmentation fault. When I fixed that and ran

          samtools view -bt my.fa.fai my.sam > my.bam

          it showed the following:

          [sam_read1] reference 'chr3.fa' is recognized as '*'.
          [sam_read1] reference 'chr1.fa' is recognized as '*'.
          [sam_read1] reference 'chr19.fa' is recognized as '*'.
          [sam_read1] reference 'chr3.fa' is recognized as '*'.

          Then I ran sed s/.fa// on the input file before running export2sam.pl again, and export2sam.pl threw the following error:

          ERROR: Unexpected number of fields in export record on line 285 of read1 export file. Found 21 fields but expected 22.
          ...erroneous export record:
          ABC-GA2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC

          Any insight will be helpful.
          Are there any other SAM-to-BAM tools known to work with SAM files that have an @PG header?
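The "[sam_read1] reference 'chr3.fa' is recognized as '*'" warnings mean that the reference name in column 3 (RNAME) of those SAM records does not match any name in the first column of my.fa.fai, so samtools treats those reads as having no reference. The sketch below lists the unmatched names so they can be compared with the .fai; it assumes both files are tab-delimited, and the function name missing_refs is hypothetical.

```shell
# Hypothetical helper: print each distinct RNAME (SAM column 3) that does
# not appear in column 1 of the FASTA index (.fai). Header lines (@...) and
# unmapped records (RNAME '*') are skipped.
missing_refs() {
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }   # first file: the .fai index
        /^@/ || $3 == "*" { next }          # SAM header or unmapped read
        !($3 in known) { print $3 }         # name samtools cannot resolve
    ' "$1" "$2" | sort -u
}
```

Running "missing_refs my.fa.fai my.sam" prints one unmatched name per line; if the names differ only by a ".fa" suffix, either the SAM records or the reference names need to be made consistent before conversion.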



          • #6
            What is the full command you are using for export2sam.pl?
            Beware that the input is supposed to be a "GERALD" type of file (also known as an "Illumina export file").



            • #7
              perl export2sam.pl --read1=my_export.txt > my_export.sam



              • #8
                What version of samtools?



                • #9
                  samtools-0.1.16



                  • #10
                    The relevant Perl code in export2sam.pl is:

                    # abort if the export record has fewer than EXPORT_SIZE fields
                    if(scalar(@t) < EXPORT_SIZE) {
                        my $msg="\nERROR: Unexpected number of fields in export record on line $line_no of read$read_no export file. Found " . scalar(@t) . " fields but expected " . EXPORT_SIZE . ".\n";
                        $msg.="\t...erroneous export record:\n" . $line . "\n\n";
                        die($msg);
                    }

                    EXPORT_SIZE is 22 (EXPORT_SIZE => 22).

                    It's complaining that line 285 has only 21 fields.

                    What is on lines 284 and 285?
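Rather than checking records one at a time, the same test export2sam.pl applies (fewer than EXPORT_SIZE = 22 fields is an error) can be run over the whole export file in one pass. A minimal awk sketch, assuming the export file is tab-delimited; the function name bad_export_lines is hypothetical, and awk and Perl may count trailing empty fields differently, so treat the result as a guide.

```shell
# Hypothetical helper: mirror export2sam.pl's "scalar(@t) < EXPORT_SIZE"
# check by printing "line_number<TAB>field_count" for every record with
# fewer than 22 tab-separated fields.
bad_export_lines() {
    awk -F '\t' -v want=22 'NF < want { print NR "\t" NF }' "$1"
}
```

For example, "bad_export_lines my_export.txt" lists every short record at once, so no line-by-line inspection is needed.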



                    • #11
                      Line 284:

                      ABC-DE2 1 4 1 3 119 0 1 GAGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC N


                      Line 285:
                      ABC-DE2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN fa_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC N


                      I see that there is an extra 'fa' in line 285.
                      Can I try deleting it?
                      Will deleting it work?



                      • #12
                        Sorry, the lines above are from the file where I did not remove the .fa.

                        Below is the same line from the file I am working on:

                        ABC-DE2 1 4 1 3 1347 0 1 TTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC



                        • #13
                          On another line I see:

                          ABC-DE2 1 4 1 3 1978 0 1 CAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN _]_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB QC

                          Will I have to go through the whole file like this?



                          • #14
                            Hmmmm.

                            You might be messing things up with the sed command:

                            sed s/.fa//

                            That says: change "any character + f + a" to nothing.

                            "f" and "a" appear to be legitimate GERALD ("export") quality values, so they will be deleted unintentionally, along with the intended strings like "chr1.fa" --> "chr1".

                            Glance at the input file for legitimate quality values (the field after the sequence field).

                            In sed, putting a backslash before the dot (i.e. \.) means a literal period, as distinct from a bare dot (.), which means "any character".
                            Last edited by Richard Finney; 06-14-2011, 12:24 PM.
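Comparing line 285 in posts #11 and #12 suggests exactly what went wrong: the unescaped dot matched the tab in front of the quality string "fa_BBB...", so s/.fa// deleted that tab together with "fa". The sequence and quality fields merged into one, leaving 21 fields instead of 22. Two safer variants are sketched below, assuming a tab-delimited export file; the function names are hypothetical. The sed version uses the escaped dot Richard describes; the awk version only touches whole fields ending in ".fa", so it can never remove a tab.

```shell
# Escaped dot: removes every literal ".fa" (e.g. "chr1.fa" -> "chr1")
# without touching quality characters such as "f" and "a".
strip_fa_sed() {
    sed 's/\.fa//g'
}

# Field-aware variant: strip a trailing ".fa" only from tab-separated
# fields that end in ".fa"; all other fields and the tabs between them
# are written back unchanged.
strip_fa_fields() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /\.fa$/) sub(/\.fa$/, "", $i)
             print
         }'
}
```

For example, "strip_fa_fields < my_export.txt > my_export.nofa.txt" produces a file that can then be passed to export2sam.pl; start from the original export file, not one already edited with s/.fa//.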
