  • awayihaha
    Junior Member
    • Mar 2014
    • 7

    Error converting SAM files to BAM files

    Hi all,

    When I use samtools to convert a SAM file to a BAM file, I get the following errors:
    samtools view -h -F 4 -q 1 -bS C.filsa.sam >C.filsa.bam
    [samopen] SAM header is present: 7 sequences.
    [sam_read1] reference 'SR' is recognized as '*'.
    [main_samview] truncated file.

    I have also seen "missing colon in auxiliary data" and "CIGAR and sequence length are inconsistent" reported for individual rows. My SAM files are the output of gsnap. I am not sure whether these problems are caused by gsnap or by samtools. How can I deal with them?

    Any suggestions and answers are appreciated. Thank you.
  • awayihaha
    Junior Member
    • Mar 2014
    • 7

    #2
    The following is a sample of my SAM file. I don't understand where the reference 'SR' comes from.
    SRR019035.130 16 Chr5 9804788 40 36M * 0 0 CAGCCTCAAACGGCGCCGTCTTATACGGTGAGTTAC IIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.131 16 Chr1 753661 40 30M * 0 0 TGAAGATATTGAACCTCTCCGTTAGGGAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:30 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.132 16 Chr3 7844307 40 36M * 0 0 ATGCTGGTAATTCACGAGCTTGATGAAACATTTCAC I3IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.133 0 Chr1 28835502 40 36M * 0 0 GTTTTAGTTTCGTCTGCAACTGAGTCATCACCTACT IIIIIIIIIIIIIIIIIIIIIIDIIIIIIDIII-II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.134 0 Chr1 28836313 40 36M * 0 0 GAAAATTTCAGGTCTGGTTCAGAATTGGTTCCGAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII7II MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.135 0 Chr5 22542176 40 25M * 0 0 CGTGGTTCTAGGACATCATCTGATA IIIIIIIIIIIIIIIIIIIIIIIII MD:Z:25 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.136 0 ChrC 100327 3 36M * 0 0 GAATAAAGGATTAATCCGTATCATCTTGACTTGGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:1 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
    SRR019035.136 272 ChrC 138287 3 36M * 0 0 AACCAAGTCAAGATGATACGGATTAATCCTTTATTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:2 HI:i:2 NM:i:0 SM:i:3 XQ:i:40 X2:i:40 XO:Z:UM PG:Z:A
    SRR019035.137 16 Chr1 28835623 40 36M * 0 0 TATTTTCGTCGTCTCTAGAGTTTGAAGCATCAGTCC IIBI61IIIIIHIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.138 16 Chr5 19304066 40 36M * 0 0 ATCAATGATATGTTTAAGCAAGACGACTCTTTCAGC IIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIII MD:Z:36 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A
    SRR019035.139 0 Chr4 162871 40 26M * 0 0 TGATTTCGTTGTGCTATGTAAACTTT IIIIIIIIIIIIIIIIIIII1IIIII MD:Z:26 NH:i:1 HI:i:1 NM:i:0 SM:i:40 XQ:i:40 X2:i:0 XO:Z:UU PG:Z:A

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #3
      The SR... stuff is just the name of the read, which I see you downloaded from SRA (or ENA). Out of curiosity, what happens if you just run:

      Code:
      samtools view -F 0x4 -q 1 -Sbo C.filsa.bam C.filsa.sam
      I wonder if giving the -h option is just screwing things up (it shouldn't do anything when you write a BAM file).

      • awayihaha
        Junior Member
        • Mar 2014
        • 7

        #4
        Thanks dpryan.
        I tried your command, but "reference 'SR' is recognized as '*'" still occurred. My SRA data was downloaded from: http://www.ncbi.nlm.nih.gov/sra/?term=SRR019035

        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          If the first 1000 lines or so are sufficient to reproduce this, could you attach that (you have to edit in "advanced" mode and click on the paperclip)? That'd provide a reproducible example. To get the first 1000 (or whatever) lines, just:

          Code:
          head -n 1000 file.sam > excerpt.txt

          • awayihaha
            Junior Member
            • Mar 2014
            • 7

            #6
            I tried the first 1000 rows and there was no problem, so I have attached the first 500 rows and the last 500 rows for you, but I am not sure the problem will appear in them.

            Whenever I process large SAM files, only a very few lines have problems such as 'missing colon in auxiliary data' or 'CIGAR and sequence length are inconsistent'. Those two messages always identify the specific line, so I can find the problem. Only for "reference '***' is recognized as '*'" can I not tell which lines are at fault.

            My SAM files come from gsnap alignments, so I am unsure whether the problems are caused by gsnap or by samtools. If gsnap is the cause, 99% of the data is still fine. How can I avoid these problems and filter out the bad records in advance?
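
On locating the lines behind "reference '...' is recognized as '*'": as far as I can tell, that message means column 3 (RNAME) of some record holds a name that no @SQ header line declares. A minimal sketch of how such lines could be listed with awk is below. It assumes a tab-delimited SAM file with its header present; `find_unknown_refs` is a hypothetical helper name, not part of samtools.

```shell
#!/bin/sh
# Print "line_number<TAB>RNAME" for every alignment record whose reference
# name (column 3) is neither "*" nor declared in an @SQ header line.
# Records cut off before column 3 are reported with an empty RNAME.
# find_unknown_refs is a hypothetical helper name, not part of samtools.
find_unknown_refs() {
    awk -F '\t' '
        /^@SQ/ {
            # Remember each reference name declared as SN:<name>.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) known[substr($i, 4)] = 1
            next
        }
        /^@/ { next }    # skip the remaining header lines
        $3 != "*" && !($3 in known) { print NR "\t" $3 }
    ' "$1"
}
```

Running `find_unknown_refs C.filsa.sam | head` should then show the line numbers to inspect, for example with `sed -n '<line>p' C.filsa.sam`.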
            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              That doesn't seem to reproduce the problem either. It's very likely that the problem is with gsnap, which apparently produces corrupt output on occasion. You might consider upgrading, if that's an option, or reporting the issue to the developer.
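
If upgrading is not an option, one possible workaround is to drop the malformed records before handing the file to samtools. The following is only a sketch, assuming tab-delimited SAM input; `filter_sam` is a hypothetical name. It passes header lines through, writes records that look well formed to stdout, and reports rejected ones on stderr. It checks just three things (at least 11 fields, a CIGAR whose query length matches SEQ, and a TAG:TYPE: prefix on every optional field), so it is not a full validator.

```shell
#!/bin/sh
# filter_sam FILE: hypothetical pre-filter, not part of samtools or gsnap.
# Writes the header and the plausible records to stdout; writes each rejected
# record, prefixed with its line number, to stderr.
filter_sam() {
    awk -F '\t' '
        # Query length implied by a CIGAR string, or -1 if it cannot be parsed.
        function cigar_qlen(cigar,   len, op, total) {
            total = 0
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MIS=X]/) total += len    # operations that consume the read
                cigar = substr(cigar, RLENGTH + 1)
            }
            return (cigar == "") ? total : -1
        }
        /^@/ { print; next }    # header lines pass through unchanged
        {
            ok = (NF >= 11)     # the 11 mandatory SAM columns must be present
            if (ok && $6 != "*" && $10 != "*" && cigar_qlen($6) != length($10))
                ok = 0          # CIGAR and sequence length disagree
            for (i = 12; ok && i <= NF; i++)
                if ($i !~ /^[A-Za-z][A-Za-z0-9]:[AifZHB]:/)
                    ok = 0      # optional field lacks the TAG:TYPE: prefix
            if (ok) print
            else print "line " NR ": " $0 > "/dev/stderr"
        }
    ' "$1"
}
```

The output could then be piped into the conversion, for example `filter_sam C.filsa.sam 2> rejected.txt | samtools view -F 0x4 -q 1 -Sbo C.filsa.bam -`, and `rejected.txt` would be a ready-made excerpt to send to the gsnap developer.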

              • awayihaha
                Junior Member
                • Mar 2014
                • 7

                #8
                Thank you for your good advice. It really helped me.
