  • veena
    Junior Member
    • Feb 2009
    • 9

    Error indexing BAM file using samtools

    Hi,

    I downloaded a bam file from NCBI and am unable to index it. Here is what I've done:
    samtools index file.bam

    Error message:
    [bam_header_read] EOF marker is absent.

    I haven't really found anything in thread archives that give this error with a bam file so am pretty sure I'm doing something fundamentally wrong. I'm a samtools and bam file newbie so any help would be much appreciated!

    Thanks!
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    You're fine; this is just a warning. Newer versions of samtools and Picard append an EOF marker when writing a BAM file. Earlier BAMs did not have one.

    Nils
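To see whether a particular file carries the marker, one option is to compare its last 28 bytes against the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. This is a minimal sketch, not how samtools itself performs the check; the function name `has_bgzf_eof` is made up for illustration.

```python
import os

# The 28-byte empty BGZF block defined by the SAM/BAM specification as the EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the final 28 bytes and compare them with the marker.
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

A file for which this returns False can still be a perfectly valid older BAM, which is why the message is only a warning. It can also mean the download was truncated, so comparing the file size or checksum against the source is a sensible extra step.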


    • veena
      Junior Member
      • Feb 2009
      • 9

      #3
      Thanks Nils!
      Another newbie question: I'm trying to get the subset of reads from publicly available unmapped data that align to my sequence of interest.

      I'm told that a read's sequence should be available in a BAM file. But isn't a BAM file by definition an alignment file (as in an aligned-to-something file) to begin with? Can I run another alignment program (say, BLAST) on a pre-existing BAM file with a completely different query? I'm very confused and would appreciate any help!


      • veena
        Junior Member
        • Feb 2009
        • 9

        #4
        Also, any help on how to run BLAST on a BAM file would be much appreciated!


        • nilshomer
          Nils Homer
          • Nov 2008
          • 1283

          #5
          The SAM format has support for reads that are not aligned. For example, if one end of a paired end read does not map, it can be flagged as unmapped and given the co-ordinate of the other end. I would study the SAM spec carefully. By filtering on the FLAG field, you can pull out reads that are unmapped (assuming that the aligner was kind enough to include unmapped reads).
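As a rough illustration of filtering on the FLAG field, the sketch below reads SAM text lines (for example, the output of `samtools view`) and keeps only records whose FLAG has the 0x4 bit set, which the SAM spec defines as "segment unmapped". The function name `unmapped_records` is hypothetical, and the sketch assumes well-formed, tab-separated SAM lines.

```python
UNMAPPED = 0x4  # FLAG bit set when the read itself is unmapped


def unmapped_records(sam_lines):
    """Yield SAM alignment lines whose FLAG has the 0x4 (unmapped) bit set."""
    for line in sam_lines:
        # Skip header lines (they start with '@') and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second column
        if flag & UNMAPPED:
            yield line
```

On the command line, `samtools view -f 4 file.bam` applies the same filter without any scripting.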

          To run BLAST on a BAM file, you would have to convert the BAM file into whatever format (FASTA?) BLAST requires. This can be done with a quick script or by bugging your local bioinformatician.


          • veena
            Junior Member
            • Feb 2009
            • 9

            #6
            Thanks so much again, Nils! The scary thought is that I'm the "local bioinformatician", and I've googled my fingers silly trying to figure out how to get a FASTA file (that's all I really need!) from the publicly available .bam file. Nobody else around me cares to work with .bam files (yet). Is it best to convert from BAM to SAM and then format the read name and sequence into FASTA? Or is there a better way?


            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Look at Picard's SamToFastq.jar. That will get you to FASTQ, and from there it is smooth sailing to FASTA. Alternatively, you can use one of the many APIs (Perl, Python, C, Java, etc.) to read SAM/BAM natively. I have personally used all of them successfully.
              Last edited by nilshomer; 03-03-2010, 08:55 PM. Reason: speak and spell failed
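The FASTQ-to-FASTA step mentioned above could look something like the sketch below. It assumes plain four-line FASTQ records (header, sequence, `+` line, qualities) with no line-wrapped sequences; the function name `fastq_to_fasta` is made up for illustration.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (header, then sequence) from four-line FASTQ records."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence = record[0], record[1]
            if not header.startswith("@"):
                raise ValueError("FASTQ header does not start with '@': " + header)
            # Swap the leading '@' for '>' and drop the '+' and quality lines.
            yield ">" + header[1:]
            yield sequence
            record = []
```

To convert a file, iterate over an open FASTQ handle and write each yielded line followed by a newline.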


              • veena
                Junior Member
                • Feb 2009
                • 9

                #8
                Thanks Nils, I'll give it a try!


                • krobison
                  Senior Member
                  • Nov 2007
                  • 734

                  #9
                  Using Picard's tool is probably better, but it's worth studying the line below as an example of a very quick-and-dirty SAM-to-FASTA generator.

                  Code:
                  samtools view myalign.bam | perl -n -e 'unless (/^\@/) { @f=split(/\t/); print ">$f[0]|$f[1] $f[2]:$f[3]\n$f[9]\n"; }'

                  I used the flag field to disambiguate the two ends of a read.

                  (any bugs were clearly deliberate attempts to educate the student! :-)
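For comparison, here is a Python counterpart to the one-liner above that reads SAM text from `samtools view`. It is only a sketch under a few assumptions: the name `sam_to_fasta` is hypothetical, secondary and supplementary alignments (FLAG bits 0x100 and 0x800) are skipped so that each read end appears once, and reads flagged as reverse strand (0x10) are reverse-complemented, because SAM stores their sequence relative to the reference rather than as sequenced.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fasta(sam_lines):
    """Yield one FASTA record (header and sequence) per primary SAM record."""
    for line in sam_lines:
        # Skip header lines (they start with '@') and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # Skip secondary/supplementary alignments and records with no stored sequence.
        if flag & 0x900 or seq == "*":
            continue
        if flag & 0x10:
            # Reverse-strand alignment: restore the read as it was sequenced.
            seq = seq.translate(COMPLEMENT)[::-1]
        if flag & 0x40:
            name += "/1"  # first read of the pair
        elif flag & 0x80:
            name += "/2"  # second read of the pair
        yield ">" + name + "\n" + seq + "\n"
```

One way to use it is to save this in a script that ends with `sys.stdout.writelines(sam_to_fasta(sys.stdin))` (after `import sys`) and pipe `samtools view myalign.bam` into that script.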


                  • veena
                    Junior Member
                    • Feb 2009
                    • 9

                    #10
                    That's what I get for not reading the manual well enough. Thanks, krobison! And disclaimer duly noted!
                    Last edited by veena; 06-03-2010, 12:38 PM. Reason: being an idiot!
