  • Desiree Wilson
    Junior Member
    • Mar 2012
    • 9

    Samtools error: the alignment was not sorted

    Hello everyone! How are you all doing? I hope all is well with you!

    I'm having a difficult time indexing my sorted bam files. I would like to use IGV to visualize my bam file. I used samtools to sort and index the bam file. Here's my code:

    samtools sort -n <path of .bam file> <sorted .bam file>
    samtools index <sorted .bam file>


    But then I get this message:
    [bam_index_core] the alignment is not sorted .....
    [bam_index_build2] fail to index the BAM file.


    So I tried re-sorting the BAM file by coordinate instead of by read name (which is what -n does):
    samtools sort <path of .bam file> <sorted .bam file>
    samtools index <sorted .bam file>


    This time indexing worked, but when I loaded my BAM file (both the sorted and the unsorted one) into IGV, I couldn't see any reads. I could really use someone's help here.
  • Alex Renwick
    Member
    • Jul 2011
    • 44

    #2
    As you've discovered, BAM files need to be sorted by coordinate before they can be indexed, and IGV needs that index.

    IGV won't display reads until you've zoomed in a good deal. Have you tried that? Are you sure you have reads in the place you're looking?
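    To see the difference between the two sort orders for yourself, here is a minimal bash sketch. Both functions are hypothetical helpers written for this illustration, not samtools commands. sam_sort_order reads SAM header text and prints the sort order declared in the @HD SO: tag. is_coordinate_sorted does a simplified ordering check in the spirit of the "[bam_index_core] the alignment is not sorted" error, on SAM text you pipe in from samtools view.

```shell
# Hypothetical helper (not part of samtools): print the sort order declared
# in the @HD line's SO: tag of SAM header text on stdin, or "unknown".
sam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    END { if (!found) print "unknown" }
  '
}

# Hypothetical helper: check the actual record order of SAM text on stdin.
# Simplification: it only checks that each reference's records are contiguous
# and that POS never decreases within a reference. It does not check that
# references follow the @SQ order, and it skips records whose RNAME is "*".
is_coordinate_sorted() {
  awk -F'\t' '
    /^@/ { next }
    $3 == "*" { next }
    {
      if ($3 != prev_ref) {
        if ($3 in seen) { bad = 1; exit }
        seen[$3] = 1; prev_ref = $3; prev_pos = 0
      }
      if ($4 + 0 < prev_pos) { bad = 1; exit }
      prev_pos = $4 + 0
    }
    END { print (bad ? "not sorted" : "sorted") }
  '
}
```

    Example use: samtools view -H sorted.bam | sam_sort_order, or samtools view sorted.bam | head -n 100000 | is_coordinate_sorted (which checks only the first 100,000 records). With newer samtools 1.x releases, the coordinate sort and index would be samtools sort -o sorted.bam input.bam followed by samtools index sorted.bam. A file sorted with -n should report queryname rather than coordinate.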


    • Desiree Wilson
      Junior Member
      • Mar 2012
      • 9

      #3
      Thank you so much for the prompt reply.

      I've zoomed in to the nucleotide level and I still cannot see my reads. According to the IGV user guide, I should be able to see reads once the view is 30 kb or smaller. I believe that I'm looking in the right place. I downloaded the chromosome 6 data from the 1000 Genomes website to learn how to process NGS data, so when I loaded the BAM file, I selected "chr6" from the drop-down box, and I still couldn't see my reads.

      Thank you so much. Now I know that I can't index a BAM file that is sorted by read name.


      • Alex Renwick
        Member
        • Jul 2011
        • 44

        #4
        You can use samtools mpileup to check read coverage:

        samtools mpileup -r chr6:28543200-28543220 file.bam
        This looks at reads covering positions 28,543,200 to 28,543,220 of chromosome 6 (the BAM must be indexed for -r to work). If you have the reference handy, you can specify it with the "-f" argument to see whether the reads match the reference.
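        If you just want a count rather than pileup output, samtools view -c file.bam chr6:28543200-28543220 prints the number of reads overlapping that region (again, the BAM needs an index). The bash sketch below is a hypothetical helper, not a samtools feature, that does a similar overlap count on SAM text to show how a read's span is worked out: its end position comes from adding up the CIGAR operations that consume the reference (M, D, N, =, X).

```shell
# Hypothetical helper: count records in SAM text on stdin that overlap
# CHROM:START-END (1-based, inclusive). Records with CIGAR "*" are skipped.
reads_overlapping() {
  local chrom=$1 start=$2 end=$3
  awk -F'\t' -v chrom="$chrom" -v start="$start" -v end="$end" '
    BEGIN { s = start + 0; e = end + 0 }
    /^@/ { next }
    $3 != chrom || $6 == "*" { next }
    {
      # Reference span = sum of lengths of M, D, N, = and X operations
      cigar = $6; reflen = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) reflen += len
        cigar = substr(cigar, RLENGTH + 1)
      }
      first = $4 + 0
      last = first + reflen - 1
      if (first <= e && last >= s) n++
    }
    END { print n + 0 }
  '
}
```

        Example: samtools view file.bam | reads_overlapping chr6 28543200 28543220. This scans the whole file, so it is slow, but it does not need an index.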


        • Desiree Wilson
          Junior Member
          • Mar 2012
          • 9

          #5
          Thanks! Wow, you reply so quickly! I will try this. Thank you so much.


          • swbarnes2
            Senior Member
            • May 2008
            • 910

            #6
            I use IGV to look at samtools sorted files all the time.

            This may be a silly question, but you know that in order to get sort.bam, you run samtools like this:

            samtools sort unsorted.bam sort
            and not

            samtools sort unsorted.bam sort.bam
            And have you spot-checked your BAM to see that you really have reads aligning, and that the genome they were aligned to is exactly the same as the reference genome you loaded into IGV?
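            The naming trap above comes from older samtools (0.1.x), where the last argument to "samtools sort" is an output prefix and ".bam" is appended to it. Passing sort.bam therefore produces sort.bam.bam. The bash sketch below uses two hypothetical helpers, written only to illustrate the rule, and notes the newer syntax in comments.

```shell
# Hypothetical helper: the file that legacy samtools (0.1.x) "sort" writes,
# given its last argument, which that version treats as a PREFIX.
legacy_sort_output() {
  printf '%s.bam\n' "$1"
}

# Hypothetical helper: flag the classic doubled-extension mistake.
check_bam_name() {
  case "$1" in
    *.bam.bam) echo "warning: $1 looks like a prefix/extension mix-up" ;;
    *.bam)     echo "ok" ;;
    *)         echo "warning: $1 does not end in .bam" ;;
  esac
}

# Legacy 0.1.x syntax (last argument is a prefix; this writes sort.bam):
#   samtools sort unsorted.bam sort
# Newer samtools 1.x releases name the output file explicitly with -o:
#   samtools sort -o sort.bam unsorted.bam
```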


            • Desiree Wilson
              Junior Member
              • Mar 2012
              • 9

              #7
              Thanks swbarnes2! This is really good advice! Man I am soooo green! I'm so new to bioinformatics but I'm more than willing to learn!

              I do not know how to spot-check whether I really have reads. Could you please direct me to some literature that can teach me how to do this? I do think there are reads in my BAM file, because I downloaded it from the 1000 Genomes website. I think I chose the correct reference genome in IGV, but I will double-check to make sure.
              Last edited by Desiree Wilson; 03-28-2012, 12:58 PM.


              • Desiree Wilson
                Junior Member
                • Mar 2012
                • 9

                #8
                Wow. I feel so foolish right now. I selected the wrong reference build in IGV. Thank you so much swbarnes2! And thank you Alex Renwick for helping me! Thank you both!


                • swbarnes2
                  Senior Member
                  • May 2008
                  • 910

                  #9
                  Originally posted by Desiree Wilson
                  Thanks swbarnes2! This is really good advice! Man I am soooo green! I'm so new to bioinformatics but I'm more than willing to learn!

                  I do not know how to spot-check whether I really have reads. Could you please direct me to some literature that can teach me how to do this? I do think there are reads in my BAM file, because I downloaded it from the 1000 Genomes website. I think I chose the correct reference genome in IGV, but I will double-check to make sure.
                  Well, you can't actually eyeball a .bam; it's binary, so it looks like gibberish. But you can convert it to .sam (samtools view out.bam > out.sam) and you'll get a very large file that is human-readable. (Hitting Control-C will halt the process early, which is good if you just want to spot-check a bit of the file.)
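                  Instead of Control-C, you can pipe samtools view into something that stops after a few records, for example samtools view out.bam | head -n 20. The bash sketch below is a hypothetical helper along the same lines. It skips header lines and prints only the first six SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR) of the first N records.

```shell
# Hypothetical helper: print QNAME, FLAG, RNAME, POS, MAPQ and CIGAR for the
# first N alignment records (default 5) of SAM text on stdin, skipping headers.
peek_sam() {
  local n="${1:-5}"
  awk -F'\t' -v OFS='\t' -v n="$n" '
    /^@/ { next }
    { print $1, $2, $3, $4, $5, $6; if (++shown >= n + 0) exit }
  '
}
```

                  Example: samtools view out.bam | peek_sam 10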

                  Look up the .sam format and learn it, at least the first 8 columns or so. Learn to interpret the flags. The flags you want to see are 83, 99, 147, and 163. Figure out why those are the good ones.
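                  One way to work out the answer is to decode the FLAG bits defined in the SAM specification. The bash function below is a hypothetical helper written for this example. Newer samtools releases also offer a samtools flags subcommand, if your version has it. Decoding the four flags shows that each describes a read in a properly paired pair, first or second in the pair, with the read and its mate on opposite strands.

```shell
# Hypothetical helper: list the SAM FLAG bits set in a decimal flag value.
# Bit order follows the SAM spec: 0x1, 0x2, 0x4, ... 0x800.
decode_sam_flag() {
  local flag=$1 i
  local -a names=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                  first_in_pair second_in_pair secondary qc_fail duplicate supplementary)
  local -a set_bits=()
  for i in "${!names[@]}"; do
    if (( flag & (1 << i) )); then
      set_bits+=("${names[i]}")
    fi
  done
  local IFS=,
  printf '%s\n' "${set_bits[*]}"
}

for f in 83 99 147 163; do
  printf '%s\t%s\n' "$f" "$(decode_sam_flag "$f")"
done
```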

                  The thing is, when you are doing everything right but stuff isn't working, it's probably because you are making some assumption that is wrong. That's when you stop and double-check all of your assumptions, and one of those assumptions is that your data is good.

                  If you had let samtools view convert the .bam to .sam for a little while, you could have looked at that .sam and confirmed that yes, the file is not corrupted; yes, most of the reads really did align; yes, the chromosome names in the .sam match the chromosome names in your reference file; and yes, there are supposed to be reads visible in this region of chr1, etc.
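                  For the "most of the reads really did align" check, samtools flagstat file.bam gives a full summary. The bash sketch below is a hypothetical helper that counts mapped and unmapped primary records in a sample of SAM text. It shows which flag bits answer that question: 0x4 means unmapped, and records with 0x100 (secondary) or 0x800 (supplementary) are skipped so each read is counted once.

```shell
# Hypothetical helper: count mapped vs unmapped primary records in SAM text.
# Uses integer division instead of bitwise and() so it runs in any POSIX awk.
count_mapped() {
  awk -F'\t' '
    /^@/ { next }
    int($2 / 256) % 2 == 1 { next }    # skip secondary (0x100)
    int($2 / 2048) % 2 == 1 { next }   # skip supplementary (0x800)
    { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }   # 0x4 = unmapped
    END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
  '
}
```

                  Example: samtools view file.bam | head -n 100000 | count_mapped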

                  You might have spotted the discrepancy between your genome version and the version used to make the .bam at that point.
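                  A hypothetical bash sketch for catching that kind of discrepancy: it compares the contig names and lengths in the BAM header with a reference FASTA index (.fai, as written by samtools faidx, whose first two columns are name and length). It assumes you have the FASTA for the build you intend to use. Chromosome lengths generally differ between assembly builds, so a length mismatch is a strong hint that the BAM was aligned to a different build.

```shell
# Hypothetical helper: print "name<TAB>length" for each @SQ line of SAM
# header text on stdin.
sam_contigs() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        else if ($i ~ /^LN:/) len = substr($i, 4)
      }
      print name "\t" len
    }
  '
}

# Hypothetical helper: report contigs in the header on stdin that are missing
# from, or have a different length in, the given .fai file.
contig_mismatches() {
  local fai=$1
  sam_contigs | awk -F'\t' '
    NR == FNR { ref[$1] = $2; next }
    !($1 in ref) { print "missing in reference: " $1; next }
    ref[$1] != $2 { print "length differs: " $1 " (bam " $2 ", ref " ref[$1] ")" }
  ' "$fai" -
}
```

                  Example: samtools faidx reference.fa, then samtools view -H file.bam | contig_mismatches reference.fa.fai. No output means every contig in the BAM header matches the reference by name and length.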
