Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • knostrov
    Junior Member
    • Oct 2012
    • 7

    Converting ELAND export -> SAM -> BAM

    Hi everyone,

    I apologize because a variation of this question has been asked several times before, but I still can't seem to figure out what's wrong.

    I have ELAND export files that I converted to SAM (I believe), using Samtools 0.1.18 and the command

    perl samtools-0.1.18/misc/export2sam.pl --read1=exportfile > exportfile.sam

    When I try to make a bam file using

    samtools view -Sbh exportfile.sam > exportfile.bam

    I get the following error message:
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!

    This is day 1 for me, so I'd appreciate any pointers.

    Thanks
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    My guess is that the sam didn't get made with headers. Find out what they are supposed to look like, and add them yourself.

    Comment

    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      Are these unaligned reads? If so it is legitimate to have a SAM file with no @SQ lines.

      Comment

      • knostrov
        Junior Member
        • Oct 2012
        • 7

        #4
        They are aligned reads (using ELAND). I am now trying to make my own header, as suggested, but additionally I suppose I would like to understand WHY there is no @SQ header -- is there something I need to supply to the export2sam.pl script that I am not supplying?

        Comment

        • BAMseek
          Senior Member
          • Apr 2011
          • 124

          #5
          I don't believe eland2sam.pl will give the header. For one, it doesn't know the length of the sequences - so at best it could give you a length as long as the largest alignment position. Also, it would need to go through the Eland file twice: once to calculate the header and the second to write the reads.

          This post suggests using the FASTA reference file to get the headers. One issue would be if your data has splice junctions - since the "chromosome" in the Eland alignment would be represented as a concatenation of exon coordinates. I think you would be able to create a legitimate BAM file, but the splice junctions will have coordinates relative to the exon junction rather than the chromosome.

          Justin

          Comment

          • knostrov
            Junior Member
            • Oct 2012
            • 7

            #6
            Thank you, that is very helpful. This is DNA sequence, so I think there should be no issues.

            So what I need is a ref.fa file. Do I make this myself by concatenating the FASTA files for all chromosomes?

            Comment

            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Originally posted by knostrov View Post
              Thank you, that is very helpful. This is DNA sequence, so I think there should be no issues.

              So what I need is a ref.fa file. Do I make this myself by concatenating the FASTA files for all chromosomes?
              Yes, you can do that. You shouldn't need a single reference fasta to make the header.

              Is it possible that lying around somewhere is an old sample that someone analyzed the right way? You can grab the headers off of that. It's something you'll only need once.

              Comment

              • knostrov
                Junior Member
                • Oct 2012
                • 7

                #8
                So this is basically a shot in the dark, but what I did is to make a ref.fa file by concatenating fasta files for all chromosomes.
                Then I ran

                samtools faidx chromFa/ref.fa

                and got back a file that looked like this:
                chr1 249250621 6 50 51
                chr10 135534747 254235647 50 51
                chr11 135006516 392481096 50 51
                chr12 133851895 530187750 50 51
                chr13 115169878 666716690 50 51
                chr14 107349540 784189973 50 51
                chr15 102531392 893686511 50 51
                chr16 90354753 998268538 50 51
                chr17 81195210 1090430394 50 51
                chr18 78077248 1173249516 50 51

                etc.

                Then I ran

                samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam -o exportfile.bam

                and got back this very impressive error message (verbatim!):
                [sam_header_read2] 24 sequences loaded.
                ?1`?-!??2?p??yF??b?ZQ?_TbT???W???ge?g?g???c?Z?ZT???X?Z?_?????VR?X??T\???S?k?g?gh???Y???b???nQjb????_???a?s?{??{????y|?!T????u?@??Qd?`?|??
                ߳~??x?
                aQw?`<?48ϔ??Q?3cpܒ?
                [main_samview] random alignment retrieval only works for indexed BAM files.
                ?3??p?Ƴ`p?]?Y2L?ic?:̈??m?aF
                ?x?
                ????`<#??0m?
                rS?rC9&
                )'R`S??\P??5?1?cΐ?\? ?X0????r,??˂q"?:”E2???

                Really!

                If there are any comments, thank you. Unfortunately I don't have any files to steal headers from...

                Comment

                • swbarnes2
                  Senior Member
                  • May 2008
                  • 910

                  #9
                  Then I ran

                  samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam -o exportfile.bam
                  Why? What you need to do is make header lines, and append them to the top of your .sam file.

                  For each chromosome in your reference, you need a line like

                  @SQ SN:chr10 LN:135534747
                  You can make the header file, and then cat it to your .sam file.

                  Comment

                  • BAMseek
                    Senior Member
                    • Apr 2011
                    • 124

                    #10
                    I think you are on the home stretch. According to the samtools reference, you can use the .fai file as input for the -t option. I think the message you see being printed out is actually the BAM output.

                    I would try this command instead

                    PHP Code:
                    samtools view -t chromFa/ref.fa.fai -bSh exportfile.sam exportfile.bam 
                    To check the bam file, you can do

                    PHP Code:
                    samtools view -h exportfile.bam less -
                    You should see the header lines (starting with the @ symbol) followed by the alignments.

                    Justin

                    Comment

                    • knostrov
                      Junior Member
                      • Oct 2012
                      • 7

                      #11
                      Yes! It worked! Thank you very much!

                      Ah, the satisfaction...

                      Comment

                      Latest Articles

                      Collapse

                      • SEQadmin2
                        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                        by SEQadmin2


                        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                        ...
                        Yesterday, 10:05 AM
                      • SEQadmin2
                        Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                        by SEQadmin2


                        With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                        Introduction

                        Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                        05-22-2026, 06:42 AM
                      • SEQadmin2
                        Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                        by SEQadmin2

                        Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                        Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                        05-06-2026, 09:04 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, Yesterday, 12:03 PM
                      0 responses
                      19 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, Yesterday, 11:40 AM
                      0 responses
                      14 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 05-28-2026, 11:40 AM
                      0 responses
                      29 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 05-26-2026, 10:12 AM
                      0 responses
                      31 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...