  • wdt
    Member
    • Oct 2009
    • 19

    SAM to .bai conversion

    Hi,

    I performed the following steps to convert a SAM file to a sorted BAM file and build its .bai index:

    samtools import ${REF_LIST} ${SAM_FILE} ${BAM_FILE}

    # Sort BAM
    samtools sort ${BAM_FILE} ${BAM_FILE}.sorted

    # Generated 20 GB sorted.bam file

    # Index sorted BAM, output in default file ${BAM_FILE}.sorted.bam.bai
    samtools index ${BAM_FILE}.sorted.bam

    The last step generated a .bai file with just 200 bytes in it. Is this the expected size?
    I did not see any error messages.

    Thanks!!!
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    The BAMs I work with are 80-200GB in size and have "BAI" (BAM indexes) that are about 5-10MB in size. 200 bytes seems rather small. Can you try "samtools view ${BAM_FILE}.sorted.bam chr10", which should start printing alignments at chromosome 10?


    • wdt
      Member
      • Oct 2009
      • 19

      #3
      Thanks, Nils. Here are the outputs:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr10

      [bam_view] fail to get the reference name. Abort!

      $ samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr_10_validated

      generated nothing!

      I named the FASTA headers in the reference differently. Could this be the problem?
      Also, just FYI, this is single frag data, not mate pair.


      Here is the header for the BAM:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam -H
      @HD VN:1.0 SO:coordinate
      @PG ID:SOLID-GffToSam VN:1.0
      @RG ID:EGAN SM:COMBINED_556
      @SQ SN:chr_1_validated LN:247249719
      @SQ SN:chr_2_validated LN:242951149
      @SQ SN:chr_3_validated LN:199501827
      @SQ SN:chr_4_validated LN:191273063
      @SQ SN:chr_5_validated LN:180857866
      @SQ SN:chr_6_validated LN:170899992
      @SQ SN:chr_7_validated LN:158821424
      @SQ SN:chr_8_validated LN:146274826
      @SQ SN:chr_9_validated LN:140273252
      @SQ SN:chr_10_validated LN:135374737
      @SQ SN:chr_11_validated LN:134452384
      @SQ SN:chr_12_validated LN:132349534
      @SQ SN:chr_13_validated LN:114142980
      @SQ SN:chr_14_validated LN:106368585
      @SQ SN:chr_15_validated LN:100338915
      @SQ SN:chr_16_validated LN:88827254
      @SQ SN:chr_17_validated LN:78774742
      @SQ SN:chr_18_validated LN:76117153
      @SQ SN:chr_19_validated LN:63811651
      @SQ SN:chr_20_validated LN:62435964
      @SQ SN:chr_21_validated LN:46944323
      @SQ SN:chr_22_validated LN:49691432
      @SQ SN:chr_X_validated LN:154913754
      @SQ SN:chr_Y_validated LN:57772954
      Last edited by wdt; 10-02-2009, 11:40 AM.
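
The 200-byte figure is consistent with an index that contains no alignments at all. Assuming the BAI layout in the SAM specification (a 4-byte magic string, a 4-byte reference count, then a bin count and an interval count for each reference) and no optional trailing fields, an empty index for the 24 @SQ entries above takes 8 + 24 × 8 = 200 bytes. A small Python sketch of that arithmetic follows; `build_empty_bai` is a hypothetical helper for illustration, not part of samtools:

```python
import struct


def build_empty_bai(n_ref: int) -> bytes:
    """Build the bytes of a BAI index that records no alignments.

    Assumed layout: magic "BAI\\1", int32 n_ref, then for every
    reference an int32 n_bin (0) and an int32 n_intv (0).
    """
    data = b"BAI\x01" + struct.pack("<i", n_ref)
    for _ in range(n_ref):
        # No bins and no linear-index intervals for this reference.
        data += struct.pack("<ii", 0, 0)
    return data


# 24 references: chr 1-22 plus X and Y, as in the @SQ header lines.
print(len(build_empty_bai(24)))  # 200
```

Newer samtools releases may write extra fields, so an empty index from a current version could be somewhat larger.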


      • nilshomer
        Nils Homer
        • Nov 2008
        • 1283

        #4
        Does this work?
        Code:
        samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam

        This seems perfect for the samtools help mailing list ([email protected]).


        • wdt
          Member
          • Oct 2009
          • 19

          #5
          Thanks, Nils, for the suggestion.

          Here are the first few lines. Is there something obvious I am missing?

          Thanks.


          samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | head
          5692_30_692 16 * 1 255 50M * 0 0 ACTCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGT !$$$"@:7@@@<@@@@;@@@?@@@@@@@@@<@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T31102031030112131310303322331312020112001122201221 CQ:Z:=:@;;?:><5@7;9:=>9?&7=892:5:<(8/72*<8-7&=8)/,7.$$% MD:Z:1TC47
          690_1188_353 16 * 1 255 50M * 0 0 CCCCAACCCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAAC !=6-&'2?##058<'&67/.&&&&-8@@:664@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T11032001032001032001032000100000103000103100101000 CQ:Z:3;:8<49?>;,;8;:5:,+,/<2''&&&&)'1&',1(.##2.%#$*-1 MD:Z:3A46
          2580_1784_289 16 * 2 255 50M * 0 0 NACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACC !@@1@@@=>@="""@@@))@@=6@@@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10112200112200123010320010320010020011120010320012 CQ:Z:6>9:8>7;:86@(>:A;89<==6,7><1&8:8;1;,()59&8,8,&<' MD:Z:T49
          4592_669_181 16 * 2 255 50M * 0 0 AGCAACTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTC !7'))*61@,'00:##.0@?:@@@<?@@@22@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T12110203103011213131130332233131202111201112210132 CQ:Z:;=<;=<<;;;=<8:<<=66<2;0:<$9577$<*'(#4'*',:/#4**)'1 MD:Z:3AA45
          276_439_1731 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCGAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@<@@@>7=@@7114?<,""?@/,&&"""+! RG:Z:EGAN CS:Z:T22301002301002301002301002301002310032110023311320 CQ:Z:=9:8:;8::367<895669/6,17,6)//6,,&,)7&')'9)'&&&&&&& MD:Z:36T13
          1109_603_622 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@@@@@<<((@?7)(8<16&&@8&,**@'&! RG:Z:EGAN CS:Z:T22301002301002301002301002300002300023000020000000 CQ:Z:@<=AA?>;><;>>9;68:85151631,1(61/)1(1,&1&;3&,3*95'& MD:Z:50
          5612_1858_1717 16 * 4 255 50M * 0 0 TAAAACTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCT !""%%%**@<186/11;>"""69<@@@@::@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T32001032001032001032001032000102200103200100210033 CQ:Z:=:<<>7=>7<8;:;379<18/,/5/51,.)/&,3)))'0))45**%&&)/ MD:Z:3GG45
          5818_1717_156 0 * 6 255 50M * 0 0 AACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCAAAA @@@@@@@@@@@@@@@@@@@;@@=@@8=@"""@@><>%%@9;>$$@('''! RG:Z:EGAN CS:Z:T30100230100230100230100230100001002302002300001000 CQ:Z:8:4<;99569A::11;;843):,233&83$+,80/.1%31)3,$:8(0'5 MD:Z:46TT2
          5482_1175_1664 0 * 7 255 50M * 0 0 CCCTAACCCTAACCCTTACCCTTAACCCTAACCCTAACCCTAACCCTAAC @@@@@@@@@@@4@@@@18>:===7?@147@@@@@''67@9&&&&?:&&7! RG:Z:EGAN CS:Z:T20023010023010020310020301002301002001002000002201 CQ:Z:1;686;53=25,):3;,&3,////)7,&/)9+6+9'1&21)&1&1/,&2& MD:Z:21C28
          3491_1092_1469 16 * 8 255 50M * 0 0 CCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCTAACC !"""<=@"""619.1?+<@<)3@=<@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10103200103200103200103200103000013000010000010000 CQ:Z:;5?A<>A;A@<<:=:<@687698397&853)47&&:1.,&189/6(5,)& MD:Z:50
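
One detail worth noticing in the records above: the third column (RNAME) is `*` in every line, even though the fourth column (POS) is set. A BAM index is organized by reference sequence, so reads without a reference name would plausibly leave it with nothing to record. Below is a minimal Python sketch for tallying RNAME values, assuming whitespace-separated SAM text such as the output of `samtools view`. The function name `rname_counts` is illustrative, the first two sample records are truncated to six columns, and the third record is invented for contrast:

```python
from collections import Counter


def rname_counts(sam_lines):
    """Count how often each RNAME (third SAM column) occurs."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and @-prefixed header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        counts[fields[2]] += 1
    return counts


sample = [
    "@HD VN:1.0 SO:coordinate",
    "5692_30_692 16 * 1 255 50M",               # truncated record from the thread
    "690_1188_353 16 * 1 255 50M",              # truncated record from the thread
    "read_x 0 chr_10_validated 100 255 50M",    # hypothetical mapped record
]
print(dict(rname_counts(sample)))  # {'*': 2, 'chr_10_validated': 1}
```

If `*` dominates the tally for a file that is supposed to contain mapped reads, the reference names are a reasonable first thing to check.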


          • wdt
            Member
            • Oct 2009
            • 19

            #6
            I fixed the problem by following Heng Li's suggestion from the samtools mailing list.

            The SAM file (generated from SOLiD GFF) had different FASTA IDs than the ones in the REF_FASTA. Changing the REF_LIST names to match the SAM entries did the trick.
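
A name mismatch like this can be checked before conversion. Here is a rough Python sketch, assuming the reference list is a `.fai`-style text file with the sequence name in the first column and that the SAM file is plain text. The helper names and the sample names (`chr10`, `chr_10_validated`) are hypothetical, since the thread does not show the names the SAM file actually used:

```python
def reference_names(ref_list_lines):
    """Collect the first column of each non-empty reference-list line."""
    return {line.split()[0] for line in ref_list_lines if line.strip()}


def unmatched_rnames(sam_lines, ref_list_lines):
    """Return RNAME values used in the SAM text but absent from the list."""
    known = reference_names(ref_list_lines)
    seen = set()
    for line in sam_lines:
        # Skip blank lines and @-prefixed header lines.
        if not line.strip() or line.startswith("@"):
            continue
        rname = line.split()[2]
        if rname != "*":  # "*" means the read has no reference name
            seen.add(rname)
    return sorted(seen - known)


# Hypothetical inputs: the SAM says "chr10", the list says "chr_10_validated".
ref_list = ["chr_10_validated\t135374737"]
sam = ["read_1\t0\tchr10\t100\t255\t50M"]
print(unmatched_rnames(sam, ref_list))  # ['chr10']
```

An empty result means every reference name in the SAM records also appears in the list. Any name that is printed needs to be renamed on one side or the other.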


            • UNCKidney
              Junior Member
              • Apr 2010
              • 5

              #7
              Similar problem

              I am also getting a 200-byte file when I index. I used the same technique for the other 6 lanes, so I am not sure why this lane is generating a 200-byte index file. What did you do to correct this?
