  • Fedster
    Junior Member
    • Mar 2012
    • 5

    saving samtools mpileup output from a cluster

    Hi,

    I am trying to use bwa and samtools to see if in the CEU transcriptome there are multiple splice variants of a couple of genes.

    I got the CEU transcriptome from here:



    (I used the fastq files), and I went on to use bwa to

    1) align the fastq reads to my reference file with the possible splices
    2) create a SAM file from (1)

    I have now converted the SAM files to BAM files, and additionally sorted and indexed them.

    What I want to do now is the following: see a (m)pileup of my reference splices against the bam files (161 bam files in total).

    My problem is that I am running everything on a cluster, so I cannot run samtools mpileup to give me an interactive view of the alignment.

    What I'd like to do is to get out of samtools some output text file that tells me, for every bam file, if there is anything aligning to my splices, and some sort of read depth/other quality score/p-value/whatever.

    Any ideas on how to do that? I taught myself bwa and samtools over the last 3 days, so I feel I am reaching the limit of my intuition.
  • Heisman
    Senior Member
    • Dec 2010
    • 534

    #2
    You can generate a consensus like this:

    Code:
    # Pile up reads at the listed positions and call the consensus (binary BCF output)
    /samtools-0.1.18/samtools mpileup -q 5 -Q 15 -l [Interval_File] -uABf [reference_sequence.fa] [aligned_file.bam] | /samtools-0.1.18/bcftools/bcftools view -bcg - > [intermediate_file.bcf]

    # Convert the BCF into readable text (VCF)
    /samtools-0.1.18/bcftools/bcftools view [intermediate_file.bcf] > [consensus.txt]

    Your [Interval_File] must list single positions as:

    chr1 3301721
    chr1 3313108
    chr1 3319339

    and intervals are denoted as:

    chr1 2985720 2985880
    chr1 3102667 3103058
    chr1 3160629 3160721
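
Once [consensus.txt] has been written, the read depth asked about in the first post can be pulled out of it with standard text tools. This is a minimal sketch, assuming the file is VCF text whose INFO column carries a DP= entry (which is what bcftools view from the 0.1.x series normally writes). summarize_depth is only an illustrative helper name, not a samtools command.

```shell
# Sketch: print sequence name, position and raw depth (INFO "DP=") for each
# record in the VCF text file written by the second bcftools command.
# summarize_depth is a hypothetical helper name.
summarize_depth() {
    # $1 = VCF text file, e.g. [consensus.txt]
    awk -F '\t' '
        /^#/ { next }                      # skip VCF header lines
        {
            dp = "NA"                      # reported when INFO has no DP entry
            n = split($8, field, ";")      # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (field[i] ~ /^DP=/) dp = substr(field[i], 4)
            print $1 "\t" $2 "\t" dp
        }
    ' "$1"
}
```

Running summarize_depth [consensus.txt] > depth.txt for each BAM file gives one small table per sample. An empty table would suggest that nothing passed the -q/-Q filters in the listed regions for that sample.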


    • Fedster
      Junior Member
      • Mar 2012
      • 5

      #3
      Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

      chr6 3301721
      chr19 3301721
      chr6 2985720 2985880
      chr19 2985720 2985880

      (the positions/intervals above are made up), or should it have a position/interval for each splice?

      Many thanks!


      • Heisman
        Senior Member
        • Dec 2010
        • 534

        #4
        You can do either; using one interval for each splice site region would yield a smaller output file that would be a lot easier to look through visually.
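
With 15 splices, writing the interval file by hand is error-prone, so here is one way it could be generated. This is only a sketch: write_intervals and the name:start-end notation are made up for illustration, and the sequence names have to match the headers in your reference FASTA exactly. As far as I know, samtools reads three-column lines as BED (0-based start, end not included) and two-column lines as 1-based positions, so it is worth checking the first and last base of one region in the output.

```shell
# Sketch: write one interval line per splice-site region.
# write_intervals is a hypothetical helper; regions are passed as
# "name:start-end", so sequence names must not contain a colon.
write_intervals() {
    # $1 = output interval file; remaining arguments = regions
    out=$1
    shift
    for region in "$@"; do
        name=${region%%:*}       # text before the first colon
        range=${region#*:}       # text after the first colon
        # tab-separated: name, start, end
        printf '%s\t%s\t%s\n' "$name" "${range%-*}" "${range#*-}"
    done > "$out"
}
```

For example, using the placeholder coordinates from this thread:

    write_intervals [Interval_File] chr6:2985720-2985880 chr19:2985720-2985880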


        • Fedster
          Junior Member
          • Mar 2012
          • 5

          #5
          Thanks a lot! Final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the bam files at once? (I just want to know if, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

          Again, many thanks!


          • Heisman
            Senior Member
            • Dec 2010
            • 534

            #6
            You don't need to change the suffix of your files.

            You can specify multiple bam files, see this: http://samtools.sourceforge.net/samtools.shtml

            Alternatively, you could merge all of the bam files together and then run mpileup on the merged bam file. I don't know if one would be faster than the other.
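
Both routes could look roughly like the sketch below. It assumes samtools and bcftools 0.1.18 are on the PATH, that all sorted BAM files sit in one directory, and that their paths contain no spaces. The function names are made up for illustration. As far as I recall, the 0.1.18 mpileup usage text lists a -b option (a file naming one BAM per line), but check the output of samtools mpileup on your cluster before relying on it.

```shell
# Sketch only: function names and directory layout are assumptions.

# Write the path of every BAM file in a directory to a list file.
make_bam_list() {
    # $1 = directory holding the BAM files, $2 = list file to write
    for f in "$1"/*.bam; do
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done > "$2"
}

# Route 1: hand all BAM files to a single mpileup run.
# Each BAM normally appears as its own sample column in the output.
pileup_all() {
    # $1 = interval file, $2 = reference FASTA, $3 = BAM list, $4 = output BCF
    samtools mpileup -q 5 -Q 15 -l "$1" -uABf "$2" -b "$3" |
        bcftools view -bcg - > "$4"
}

# Route 2: merge first, then run mpileup on the merged file as in post #2.
merge_all() {
    # $1 = merged output BAM, $2 = BAM list
    # Unquoted $(cat ...) is intentional: one argument per listed path.
    samtools merge "$1" $(cat "$2")
    samtools index "$1"
}
```

A run would then be make_bam_list /path/to/bams bam_list.txt followed by pileup_all [Interval_File] [reference_sequence.fa] bam_list.txt [intermediate_file.bcf]. Timing both routes on a handful of files first should show which is faster on your cluster.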
