  • SAM to .bai conversion

    Hi,

    I performed the following steps to convert a SAM file to BAM and index it:

    # Convert SAM to BAM
    samtools import ${REF_LIST} ${SAM_FILE} ${BAM_FILE}

    # Sort BAM (writes ${BAM_FILE}.sorted.bam)
    samtools sort ${BAM_FILE} ${BAM_FILE}.sorted

    # Generated a 20 GB sorted BAM file

    # Index sorted BAM; output goes to the default file ${BAM_FILE}.sorted.bam.bai
    samtools index ${BAM_FILE}.sorted.bam

    The last step generated a .bai file of just 200 bytes. Is this the expected size?
    I did not see any error messages.

    Thanks!!!
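For reference, the samtools documentation for `view -t` (which `import` is an alias of) describes REF_LIST as a tab-delimited file with the reference name in the first column and its length in the second, and notes that the `.fai` file written by `samtools faidx` can be used for it. A minimal Python sketch of how those names and lengths come out of a reference FASTA, assuming faidx-style naming (the first whitespace-delimited word of each `>` header line); `ref_list_from_fasta` and `format_ref_list` are hypothetical helpers, not part of samtools:

```python
def ref_list_from_fasta(lines):
    """Return (name, length) pairs for each sequence in FASTA text.

    Assumption: names follow the faidx convention of taking the first
    whitespace-delimited word after '>'.
    """
    entries = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                entries.append((name, length))
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif name is not None:
            length += len(line)  # count residues; blank lines add nothing
    if name is not None:
        entries.append((name, length))
    return entries


def format_ref_list(entries):
    """Render entries as tab-delimited 'name<TAB>length' lines."""
    return "".join(f"{name}\t{length}\n" for name, length in entries)


if __name__ == "__main__":
    fasta = [">chr_1_validated some description", "ACGTACGT", "ACG",
             ">chr_2_validated", "GGGG"]
    print(format_ref_list(ref_list_from_fasta(fasta)), end="")
```

Whatever names appear in REF_LIST become the @SQ SN: values in the BAM header, so they have to match the reference names used in the SAM records.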

  • #2
    The BAMs I work with are 80-200 GB in size and have "BAI" files (BAM indexes) that are about 5-10 MB in size. 200 bytes seems rather small. Can you try "samtools view ${BAM_FILE}.sorted.bam chr10", which should start printing alignments on chromosome 10?
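A region argument such as `chr10` is looked up against the @SQ SN: names in the BAM header, so it only works when it matches one of them exactly. A minimal Python sketch of that lookup, assuming header text as printed by `samtools view -H` and a simplified region syntax of NAME or NAME:START-END; `sq_lengths` and `region_reference` are hypothetical helpers, not samtools functions:

```python
def sq_lengths(header_text):
    """Map each @SQ SN: name to its LN: length."""
    refs = {}
    for line in header_text.splitlines():
        # Real headers are tab-delimited; split() also tolerates pasted spaces.
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        refs[tags["SN"]] = int(tags["LN"])
    return refs


def region_reference(region):
    """Return the reference name in a NAME or NAME:START-END region (simplified)."""
    name, sep, span = region.rpartition(":")
    if sep and span.replace("-", "").replace(",", "").isdigit():
        return name
    return region


if __name__ == "__main__":
    header = "@SQ\tSN:chr_10_validated\tLN:135374737"
    refs = sq_lengths(header)
    for region in ("chr10", "chr_10_validated:1-1000"):
        print(region, region_reference(region) in refs)
```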



    • #3
      Thanks, Nils. Here are the outputs:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr10

      [bam_view] fail to get the reference name. Abort!

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam chr_10_validated

      This printed nothing.

      I named the FASTA headers in the reference differently; could this be the problem?
      Also, just FYI, this is single-fragment data, not mate-pair.


      Here is the header for the BAM:

      samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam -H
      @HD VN:1.0 SO:coordinate
      @PG ID:SOLID-GffToSam VN:1.0
      @RG ID:EGAN SM:COMBINED_556
      @SQ SN:chr_1_validated LN:247249719
      @SQ SN:chr_2_validated LN:242951149
      @SQ SN:chr_3_validated LN:199501827
      @SQ SN:chr_4_validated LN:191273063
      @SQ SN:chr_5_validated LN:180857866
      @SQ SN:chr_6_validated LN:170899992
      @SQ SN:chr_7_validated LN:158821424
      @SQ SN:chr_8_validated LN:146274826
      @SQ SN:chr_9_validated LN:140273252
      @SQ SN:chr_10_validated LN:135374737
      @SQ SN:chr_11_validated LN:134452384
      @SQ SN:chr_12_validated LN:132349534
      @SQ SN:chr_13_validated LN:114142980
      @SQ SN:chr_14_validated LN:106368585
      @SQ SN:chr_15_validated LN:100338915
      @SQ SN:chr_16_validated LN:88827254
      @SQ SN:chr_17_validated LN:78774742
      @SQ SN:chr_18_validated LN:76117153
      @SQ SN:chr_19_validated LN:63811651
      @SQ SN:chr_20_validated LN:62435964
      @SQ SN:chr_21_validated LN:46944323
      @SQ SN:chr_22_validated LN:49691432
      @SQ SN:chr_X_validated LN:154913754
      @SQ SN:chr_Y_validated LN:57772954
      Last edited by wdt; 10-02-2009, 11:40 AM.



      • #4
        Does this work?

        samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam

        This seems perfect for the samtools help mailing list.



        • #5
          Thanks, Nils, for the suggestion.

          Here are the first few lines. Is there something obvious I am missing?

          Thanks.


          samtools view combined_556.csfasta.ma.50.6.v2.gff.bam.sorted.bam | head
          5692_30_692 16 * 1 255 50M * 0 0 ACTCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGT !$$$"@:7@@@<@@@@;@@@?@@@@@@@@@<@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T31102031030112131310303322331312020112001122201221 CQ:Z:=:@;;?:><5@7;9:=>9?&7=892:5:<(8/72*<8-7&=8)/,7.$$% MD:Z:1TC47
          690_1188_353 16 * 1 255 50M * 0 0 CCCCAACCCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAAC !=6-&'2?##058<'&67/.&&&&-8@@:664@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T11032001032001032001032000100000103000103100101000 CQ:Z:3;:8<49?>;,;8;:5:,+,/<2''&&&&)'1&',1(.##2.%#$*-1 MD:Z:3A46
          2580_1784_289 16 * 2 255 50M * 0 0 NACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACC !@@1@@@=>@="""@@@))@@=6@@@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10112200112200123010320010320010020011120010320012 CQ:Z:6>9:8>7;:86@(>:A;89<==6,7><1&8:8;1;,()59&8,8,&<' MD:Z:T49
          4592_669_181 16 * 2 255 50M * 0 0 AGCAACTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTC !7'))*61@,'00:##.0@?:@@@<?@@@22@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T12110203103011213131130332233131202111201112210132 CQ:Z:;=<;=<<;;;=<8:<<=66<2;0:<$9577$<*'(#4'*',:/#4**)'1 MD:Z:3AA45
          276_439_1731 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCGAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@<@@@>7=@@7114?<,""?@/,&&"""+! RG:Z:EGAN CS:Z:T22301002301002301002301002301002310032110023311320 CQ:Z:=9:8:;8::367<895669/6,17,6)//6,,&,)7&')'9)'&&&&&&& MD:Z:36T13
          1109_603_622 0 * 4 255 50M * 0 0 CTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTT @@@@@@@@@@@@@@@@@@@@@@@@@<<((@?7)(8<16&&@8&,**@'&! RG:Z:EGAN CS:Z:T22301002301002301002301002300002300023000020000000 CQ:Z:@<=AA?>;><;>>9;68:85151631,1(61/)1(1,&1&;3&,3*95'& MD:Z:50
          5612_1858_1717 16 * 4 255 50M * 0 0 TAAAACTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCT !""%%%**@<186/11;>"""69<@@@@::@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T32001032001032001032001032000102200103200100210033 CQ:Z:=:<<>7=>7<8;:;379<18/,/5/51,.)/&,3)))'0))45**%&&)/ MD:Z:3GG45
          5818_1717_156 0 * 6 255 50M * 0 0 AACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCAAAA @@@@@@@@@@@@@@@@@@@;@@=@@8=@"""@@><>%%@9;>$$@('''! RG:Z:EGAN CS:Z:T30100230100230100230100230100001002302002300001000 CQ:Z:8:4<;99569A::11;;843):,233&83$+,80/.1%31)3,$:8(0'5 MD:Z:46TT2
          5482_1175_1664 0 * 7 255 50M * 0 0 CCCTAACCCTAACCCTTACCCTTAACCCTAACCCTAACCCTAACCCTAAC @@@@@@@@@@@4@@@@18>:===7?@147@@@@@''67@9&&&&?:&&7! RG:Z:EGAN CS:Z:T20023010023010020310020301002301002001002000002201 CQ:Z:1;686;53=25,):3;,&3,////)7,&/)9+6+9'1&21)&1&1/,&2& MD:Z:21C28
          3491_1092_1469 16 * 8 255 50M * 0 0 CCTAACCCTAACCCTTACCCCTAACCCTAACCCTAACCCTAACCCTAACC !"""<=@"""619.1?+<@<)3@=<@@@@@@@@@@@@@@@@@@@@@@@@@ RG:Z:EGAN CS:Z:T10103200103200103200103200103000013000010000010000 CQ:Z:;5?A<>A;A@<<:=:<@687698397&853)47&&:1.,&189/6(5,)& MD:Z:50
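In these records the third column (RNAME) is `*` even though POS is filled in, which suggests the reads were not assigned to any of the @SQ references. To check the whole file rather than the first ten lines, one option is to tally the RNAME values. A minimal Python sketch over SAM text such as the output of `samtools view`; `rname_counts` is a hypothetical helper:

```python
from collections import Counter


def rname_counts(sam_lines):
    """Count alignment records per RNAME (third SAM column); header lines are skipped."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        # Only the first three fields are needed, so stop splitting early.
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:coordinate",
        "5692_30_692\t16\t*\t1\t255\t50M\t*\t0\t0\tACGT\t!!!!",
        "276_439_1731\t0\t*\t4\t255\t50M\t*\t0\t0\tACGT\t!!!!",
    ]
    for rname, n in rname_counts(example).most_common():
        print(f"{rname}\t{n}")
```

If every record is counted under `*`, no read was placed on a named reference.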



          • #6
            I fixed the problem by following Heng Li's suggestion on the samtools mailing list.

            The SAM file (generated from the SOLiD GFF) used different sequence names than the FASTA IDs in the reference. Changing the names in REF_LIST to match the SAM entries did the trick.
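The fix amounts to making every RNAME used in the SAM records appear among the names in REF_LIST. A minimal Python sketch for listing names that would not match before running `samtools import`, assuming REF_LIST is tab-delimited with the name in the first column and the SAM input is plain text; `unmatched_rnames` and the example names are hypothetical:

```python
def unmatched_rnames(sam_lines, ref_list_lines):
    """Return sorted RNAME values used in SAM records but absent from REF_LIST."""
    known = set()
    for line in ref_list_lines:
        name = line.split("\t")[0].strip()
        if name:
            known.add(name)
    used = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split(None, 3)
        if len(fields) >= 3 and fields[2] != "*":  # '*' means no reference
            used.add(fields[2])
    return sorted(used - known)


if __name__ == "__main__":
    ref_list = ["chr_1_validated\t247249719\n", "chr_10_validated\t135374737\n"]
    sam = ["r1\t0\tchr10\t5\t255\t4M\t*\t0\t0\tACGT\t!!!!"]
    print(unmatched_rnames(sam, ref_list))
```

An empty result means every reference named in the SAM records is present in REF_LIST.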



            • #7
              Similar problem

              I am also getting a 200-byte file when I index. I use the same procedure for six other lanes, so I am not sure why this one lane produces a 200-byte index file. What did you do to correct this?
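One hedged way to see why the size comes out at exactly 200 bytes: under the BAI layout in the SAM/BAM format specification, an index starts with the 4-byte magic `BAI\1` and an int32 reference count, and each reference with no placed alignments contributes only two int32 zeros (its bin count and its linear-index interval count). With the 24 @SQ entries in the header posted in #3, that gives 8 + 24 × 8 = 200 bytes, which is what you would expect if no reads were assigned to any reference. This assumes no optional trailing `n_no_coor` field (8 more bytes), which some samtools versions append. A Python sketch that builds such an empty index in memory; `empty_bai` is a hypothetical helper:

```python
import struct


def empty_bai(n_ref, with_n_no_coor=False):
    """Build BAI bytes for n_ref references that have no placed alignments."""
    data = b"BAI\x01" + struct.pack("<i", n_ref)  # magic + int32 n_ref
    for _ in range(n_ref):
        data += struct.pack("<ii", 0, 0)  # n_bin = 0, n_intv = 0
    if with_n_no_coor:
        data += struct.pack("<Q", 0)  # optional uint64 count of unplaced reads
    return data


if __name__ == "__main__":
    # 24 references, as in the @SQ lines of the header in #3
    print(len(empty_bai(24)))  # prints 200
```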
