  • samtools mpileup segfault with positions list

    Hello,

    I am using samtools v 1.2 on a machine with Ubuntu 14.04.2 and I am trying
    to generate a SNP table from sorted bam files (generated from a GBS library
    prep) aligned to a reference. So far I have used bwa for the alignment,
    samtools view to convert sam to bam, samtools sort, and samtools index.
    Those are all working fine, but when I use samtools mpileup, I get a
    segmentation fault. I think it is because I am trying to pass mpileup a
    list of positions to include from the reference with the -l flag (I don't
    get a segfault if I leave out the list of positions):

    samtools mpileup -g -b ~/Data/alignments/bamfilenames.txt -f
    /Data/alignments/reference.fa -l ~/Data/alignments/positions_to_include.txt
    > fish_snps.bcf

    This returns "segmentation fault (core dumped)." It also segfaults if I just
    pass it a single bam file, rather than a list of bam files. How does the
    list of positions file need to be formatted? Is this what could be causing
    the segfault?

    Thanks for any help,

    Sierra

  • #2
    What does your regions file look like?



    • #3
      It's just a list of the chromosome names in one column and the position along the chromosome in the other, like so ...

      chrUn 1
      chrUn 2
      chrUn 3
      chrUn 4
      ...
      chr_Sex 353216
      chr_Sex 353217
      chr_Sex 353218
      chr_Sex 353219
      chr_Sex 353220
      chr_Sex 353221
      chr_Sex 353222

      I'm trying to exclude SNPs called in regions of my reference genome that are potential paralogs, which I've identified by read depth of the original short reads used to build the reference.



      • #4
        I'm not sure what's going on.

        Looks like it needs a "bed file".


        The code for option 'l' ("el", not the digit "1"), at line 846 of "bam_plcmd.c", says this:


        case 'l':

        // In the original version the whole BAM was streamed which is inefficient
        // with few BED intervals and big BAMs. Todo: devise a heuristic to determine
        // best strategy, that is streaming or jumping.
        mplp.bed = bed_read(optarg);
        if (!mplp.bed) { print_error_errno("Could not read file \"%s\"", optarg); return 1; }
        break;

        _________

        If you know C or gdb, you can always debug with printfs, or compile with -g (debug symbols on) and step through with gdb (the Linux debugger). That is a little advanced for most, but very easy for Linux wizards.
        Last edited by Richard Finney; 05-07-2015, 01:10 PM.



        • #5
          Hmm, that might be it. Based on the vague man page, I thought the -l flag could take a list of positions or a BED file:

          -l FILE list of positions (chr pos) or regions (BED) [null]

          but possibly it needs to be BED formatted. I will look into that. Thanks!

          I'm very far from a linux wizard, but will enlist the help of the nice wizards down the hall to help me debug with printfs and gdb.



          • #6
            You can find *where* it's segfaulting using printf (and an immediate fflush) very easily.

            Figuring out *why* it segfaulted is another matter.



            • #7
              I realize that stackoverflow or another venue might be more appropriate for this question, but what's the usage for printf() and fflush()? Do they go right before my call to samtools? Inexperienced linux user here...



              • #8
                samtools is written in C.
                The main author of samtools, Heng Li, provides the source, which is *very* easy to compile.
                You can edit this code, type "make" and, if you didn't make a syntax error,
                have a new samtools executable with your modifications.

                The printf function is a convenient way to output strings.
                So, you can do things like this:

                printf("hey, i'm here before call to bed_read() function\n"); fflush(stdout);

                Output is not always written immediately, so the fflush forces it out before the next
                statement is executed.

                If you see your custom output and then the segfault, you know the crash happened *after* your printf call. If you had another printf expected to execute later but did not see it, then
                your program bombed *before* that statement.

                Strategic and refined placement of printfs can quickly tell you where the bomb blew up.
                It takes about 2 seconds to compile a modified samtools, so this is often faster and easier than cranking up gdb (a famous linux debugger). gdb can be used for tricky situations where more robust monitoring of the states of variables is required.

                ____
                If you did ask on stackoverflow, your question would get promptly closed as "off topic" ... and you'd get ridiculed. The most frequently asked questions seem to suffer this fate over there.
                Last edited by Richard Finney; 05-08-2015, 04:50 AM.



                • #9
                  Sweet, thanks!

