  • wdt
    Member
    • Oct 2009
    • 19

    SAM to .bai conversion

    Hi,

    I performed the following steps to go from a SAM file to a sorted BAM and its .bai index:

    # Convert SAM to BAM using the reference list
    samtools import ${REF_LIST} ${SAM_FILE} ${BAM_FILE}

    # Sort BAM
    samtools sort ${BAM_FILE} ${BAM_FILE}.sorted

    # Generated 20 GB sorted.bam file

    # Index sorted bam, output in default file ${BAM_FILE}.sorted.bai
    samtools index ${BAM_FILE}.sorted.bam

    The last step generated a .bai file of just 200 bytes. Is this the expected size?
    I did not see any error messages.

    Thanks!!!
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    The BAMs I work with are 80-200 GB in size and have BAI files (BAM indexes) of about 5-10 MB. 200 bytes seems rather small. Can you try "samtools view ${BAM_FILE}.sorted.bam chr10"? If the index is usable, it should start printing alignments at chromosome 10.

    • wdt
      Member
      • Oct 2009
      • 19

      #3
      Thanks, Nils. Here are the outputs:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr10

      [bam_view] fail to get the reference name. Abort!

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr_10_validated

      This generated nothing!

      I named the FASTA headers in the reference differently; could this be the problem?
      Also, just FYI, this is single-fragment data, not mate-pair.


      Here is the header for the BAM:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam -H
      @HD VN:1.0 SO:coordinate
      @PG ID:SOLID-GffToSam VN:1.0
      @RG ID:EGAN SM:COMBINED_556
      @SQ SN:chr_1_validated LN:247249719
      @SQ SN:chr_2_validated LN:242951149
      @SQ SN:chr_3_validated LN:199501827
      @SQ SN:chr_4_validated LN:191273063
      @SQ SN:chr_5_validated LN:180857866
      @SQ SN:chr_6_validated LN:170899992
      @SQ SN:chr_7_validated LN:158821424
      @SQ SN:chr_8_validated LN:146274826
      @SQ SN:chr_9_validated LN:140273252
      @SQ SN:chr_10_validated LN:135374737
      @SQ SN:chr_11_validated LN:134452384
      @SQ SN:chr_12_validated LN:132349534
      @SQ SN:chr_13_validated LN:114142980
      @SQ SN:chr_14_validated LN:106368585
      @SQ SN:chr_15_validated LN:100338915
      @SQ SN:chr_16_validated LN:88827254
      @SQ SN:chr_17_validated LN:78774742
      @SQ SN:chr_18_validated LN:76117153
      @SQ SN:chr_19_validated LN:63811651
      @SQ SN:chr_20_validated LN:62435964
      @SQ SN:chr_21_validated LN:46944323
      @SQ SN:chr_22_validated LN:49691432
      @SQ SN:chr_X_validated LN:154913754
      @SQ SN:chr_Y_validated LN:57772954
      Last edited by wdt; 10-02-2009, 11:40 AM.
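
The two results above appear consistent with how region queries behave: a name that is not in the @SQ lines is rejected outright, while a name that is listed but has no indexed alignments prints nothing. As a rough sketch, a region name can be checked against the header before querying. The helper names `sq_names` and `has_ref` are made up for illustration, and the header text is assumed to be piped in, for example from `samtools view -H`:

```shell
# sq_names: print the SN value of every @SQ line in a SAM header read from stdin.
# Default awk field splitting is used, so tabs or spaces between fields both work.
sq_names() {
    awk '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# has_ref NAME: succeed if NAME is exactly one of the @SQ names in the header on stdin.
has_ref() {
    sq_names | grep -Fxq -- "$1"
}

# Hypothetical usage (file name is a placeholder):
#   samtools view -H sorted.bam | has_ref chr10 || echo "chr10 is not in the header"
```

With the header shown above, `chr10` would fail this check and `chr_10_validated` would pass it.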

      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        Does

        samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam

        work?

        This seems like a perfect question for the samtools help mailing list.

        • wdt
          Member
          • Oct 2009
          • 19

          #5
          Thanks, Nils, for the suggestion.

          Here are the first few lines. Is there something obvious I am missing?

          Thanks.


          samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | head
          5692_30_692 16 * 1 255 50M * 0 0 ACTCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGT !$$$"@:7@@@<@@@@;@@@?@@@@@@@@@<@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T31102031030112131310303322331312020112001122201221 CQ:Z:=:@;;?:><5@7;9:=>9?&7=892:5:<(8/72*<8-7&=8)/,7.$$% MD:Z:1TC47
          690_1188_353 16 * 1 255 50M * 0 0 CCCCAACCCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAAC !=6-&'2?##058<'&67/.&&&&-8@@:664@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T11032001032001032001032000100000103000103100101000 CQ:Z:3;:8<49?>;,;8;:5:,+,/<2''&&&&)'1&',1(.##2.%#$*-1 MD:Z:3A46
          2580_1784_289 16 * 2 255 50M * 0 0 NACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACC !@@1@@@=>@="""@@@))@@=6@@@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10112200112200123010320010320010020011120010320012 CQ:Z:6>9:8>7;:86@(>:A;89<==6,7><1&8:8;1;,()59&8,8,&<' MD:Z:T49
          4592_669_181 16 * 2 255 50M * 0 0 AGCAACTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTC !7'))*61@,'00:##.0@?:@@@<?@@@22@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T12110203103011213131130332233131202111201112210132 CQ:Z:;=<;=<<;;;=<8:<<=66<2;0:<$9577$<*'(#4'*',:/#4**)'1 MD:Z:3AA45
          276_439_1731 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCGAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@<@@@>7=@@7114?<,""?@/,&&"""+! RG:Z:EGAN CS:Z:T22301002301002301002301002301002310032110023311320 CQ:Z:=9:8:;8::367<895669/6,17,6)//6,,&,)7&')'9)'&&&&&&& MD:Z:36T13
          1109_603_622 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@@@@@<<((@?7)(8<16&&@8&,**@'&! RG:Z:EGAN CS:Z:T22301002301002301002301002300002300023000020000000 CQ:Z:@<=AA?>;><;>>9;68:85151631,1(61/)1(1,&1&;3&,3*95'& MD:Z:50
          5612_1858_1717 16 * 4 255 50M * 0 0 TAAAACTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCT !""%%%**@<186/11;>"""69<@@@@::@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T32001032001032001032001032000102200103200100210033 CQ:Z:=:<<>7=>7<8;:;379<18/,/5/51,.)/&,3)))'0))45**%&&)/ MD:Z:3GG45
          5818_1717_156 0 * 6 255 50M * 0 0 AACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCAAAA @@@@@@@@@@@@@@@@@@@;@@=@@8=@"""@@><>%%@9;>$$@('''! RG:Z:EGAN CS:Z:T30100230100230100230100230100001002302002300001000 CQ:Z:8:4<;99569A::11;;843):,233&83$+,80/.1%31)3,$:8(0'5 MD:Z:46TT2
          5482_1175_1664 0 * 7 255 50M * 0 0 CCCTAACCCTAACCCTTACCCTTAACCCTAACCCTAACCCTAACCCTAAC @@@@@@@@@@@4@@@@18>:===7?@147@@@@@''67@9&&&&?:&&7! RG:Z:EGAN CS:Z:T20023010023010020310020301002301002001002000002201 CQ:Z:1;686;53=25,):3;,&3,////)7,&/)9+6+9'1&21)&1&1/,&2& MD:Z:21C28
          3491_1092_1469 16 * 8 255 50M * 0 0 CCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCTAACC !"""<=@"""619.1?+<@<)3@=<@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10103200103200103200103200103000013000010000010000 CQ:Z:;5?A<>A;A@<<:=:<@687698397&853)47&&:1.,&189/6(5,)& MD:Z:50
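
One thing that stands out in this output is that column 3 (RNAME) is `*` on every record, so none of these reads names a reference sequence even though each one carries a position. A minimal sketch for tallying records by RNAME follows. It assumes tab-separated SAM text on stdin (the forum rendering above shows spaces), and `count_by_rname` is a hypothetical helper name:

```shell
# count_by_rname: tally SAM alignment records by RNAME (column 3).
# Header lines (starting with "@") are skipped; output is "RNAME<TAB>count".
count_by_rname() {
    awk -F'\t' '
        !/^@/ { n[$3]++ }
        END   { for (r in n) printf "%s\t%d\n", r, n[r] }
    ' | LC_ALL=C sort
}

# Hypothetical usage (file name is a placeholder):
#   samtools view sorted.bam | count_by_rname
```

If the only line printed is for `*`, no read was assigned to a reference during import, which would fit an index with nothing in it.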

          • wdt
            Member
            • Oct 2009
            • 19

            #6
            I fixed the problem by following Heng Li's suggestion from the samtools mailing list.

            The SAM file (generated from SOLiD GFF) had different FASTA IDs than the ones in the REF_FASTA. Changing the REF_LIST names to match the SAM entries did the trick.
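
A mismatch of this kind can be spotted before importing. The sketch below is only an illustration under assumptions: `unknown_rnames` is a made-up helper, the reference list is taken to be a tab-delimited file with the sequence name in its first column (as in a .fai file), and the SAM file is plain tab-separated text:

```shell
# unknown_rnames SAM REF_LIST: print each RNAME used in SAM that is not
# listed in the first column of REF_LIST. Each missing name is printed once.
unknown_rnames() {
    awk -F'\t' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: reference list
        /^@/ || $3 == "*"   { next }                  # skip header lines and unplaced reads
        !($3 in known) && !seen[$3]++ { print $3 }
    ' "$2" "$1"
}

# Hypothetical usage, with the variables from the first post:
#   unknown_rnames "${SAM_FILE}" "${REF_LIST}"
```

Any name it prints is one the import step would not be able to match against the reference list.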

            • UNCKidney
              Junior Member
              • Apr 2010
              • 5

              #7
              Similar problem

              I am also getting a 200-byte file when I index. I used the same technique on the other 6 lanes, so I am not sure why this lane produces a 200-byte index file. What did you do to correct this?
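
A note on the number itself: 200 bytes is what an index with no alignments in it would come to for the 24-sequence header shown earlier in the thread. This assumes the BAI layout described in the SAM specification (a 4-byte magic string and a 4-byte reference count, then a 4-byte bin count and a 4-byte interval count per reference) and no optional trailing field. The arithmetic as a small sketch, with `empty_bai_size` being a made-up helper name:

```shell
# empty_bai_size N_REF: size in bytes of a BAI index that holds no alignments,
# assuming the SAM-spec layout without the optional trailing n_no_coor field.
empty_bai_size() {
    # 4-byte magic + 4-byte n_ref, then per reference: 4-byte n_bin + 4-byte n_intv
    echo $(( 8 + 8 * $1 ))
}

empty_bai_size 24   # prints 200
```

If a lane's index matches this size, it may be worth checking whether the RNAME column in that lane's alignments is `*`, as happened earlier in this thread.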
