  • saving samtools mpileup output from a cluster

    Hi,

    I am trying to use bwa and samtools to see whether the CEU transcriptome contains multiple splice variants of a couple of genes.

    I got the CEU transcriptome from here:



    (I used the fastq files), and I went on to use bwa to

    1) align the fastq reads to my reference file containing the possible splices
    2) create a SAM file from (1)

    I have since converted the SAM files to BAM files, and sorted and indexed them.

    What I want to do now is the following: see a (m)pileup of my reference splices against the bam files (161 bam files in total).

    My problem is that I am running everything on a cluster, so I cannot run samtools mpileup to give me an interactive view of the alignment.

    What I'd like to do is to get out of samtools some output text file that tells me, for every bam file, if there is anything aligning to my splices, and some sort of read depth/other quality score/p-value/whatever.

    Any idea how to do that? I taught myself bwa and samtools over the last 3 days, so I am running out of ideas and intuition.

  • #2
    You can generate a consensus like this:

    Code:
    /samtools-0.1.18/samtools mpileup -q 5 -Q 15 -l [Interval_File] -uABf [reference_sequence.fa] [aligned_file.bam] | /samtools-0.1.18/bcftools/bcftools view -bcg - > [intermediate_file.bcf]

    /samtools-0.1.18/bcftools/bcftools view [intermediate_file.bcf] > [consensus.txt]

    Run the second command only after the first has finished. In [Interval_File], single positions are written as:

    chr1 3301721
    chr1 3313108
    chr1 3319339

    and intervals are denoted as:

    chr1 2985720 2985880
    chr1 3102667 3103058
    chr1 3160629 3160721
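
A minimal sketch of how an interval file in the layout above could be written from a list of targets. The function names and the `Target` tuple are hypothetical helpers, not part of samtools, and the sketch assumes tab-separated columns are accepted in place of spaces. It does not check the coordinate convention, which is worth confirming against the documentation for your samtools version.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical target type: (sequence name, start, optional end).
# An end of None means a single position rather than an interval.
Target = Tuple[str, int, Optional[int]]


def format_interval_lines(targets: Iterable[Target]) -> List[str]:
    """Return one line per target: 'name<TAB>pos' or 'name<TAB>start<TAB>end'."""
    lines = []
    for name, start, end in targets:
        if end is None:
            lines.append(f"{name}\t{start}")
        else:
            if end < start:
                raise ValueError(f"end before start for {name}: {start} > {end}")
            lines.append(f"{name}\t{start}\t{end}")
    return lines


def write_interval_file(path: str, targets: Iterable[Target]) -> None:
    """Write the formatted lines to path, one target per line."""
    with open(path, "w") as handle:
        for line in format_interval_lines(targets):
            handle.write(line + "\n")
```

For example, `write_interval_file("splices.txt", [("chr1", 2985720, 2985880)])` would produce a one-line file that can be passed to `-l`.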



    • #3
      Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

      chr6 3301721
      chr19 3301721
      chr6 2985720 2985880
      chr19 2985720 2985880

      (the positions/intervals above are made up), or should it have a position/interval for each splice?

      Many thanks!



      • #4
        You can do either; using one interval for each splice site region would yield a smaller output file that is a lot easier to look through visually.
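
If looking through the output by eye is still too much for 161 BAM files, the plain-text pileup can be summarised per sequence instead. The sketch below is one possible approach, not something from this thread: it assumes the text output of `samtools mpileup` run without `-u` or `-g` and with default columns, i.e. sequence name, position, reference base, then a (depth, bases, qualities) triplet for each BAM file. `summarise_pileup` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def summarise_pileup(lines: Iterable[str]) -> Dict[str, List[List[int]]]:
    """Map each sequence name to one [covered_positions, summed_depth] pair per BAM."""
    summary: Dict[str, List[List[int]]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[0]
        # Depth columns sit at index 3, 6, 9, ... (one per BAM file).
        depths = [int(value) for value in fields[3::3]]
        per_bam = summary.setdefault(name, [[0, 0] for _ in depths])
        for stats, depth in zip(per_bam, depths):
            if depth > 0:
                stats[0] += 1      # one more position with coverage
                stats[1] += depth  # running total of read depth
    return summary
```

Dividing the summed depth by the number of covered positions gives a rough mean depth per splice and per BAM file, which can then be written to a text file on the cluster.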



        • #5
          Thanks a lot! Two final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the BAM files at once? (I just want to know whether, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

          Again, many thanks!



          • #6
            You don't need to change the suffix of your files.

            You can specify multiple bam files, see this: http://samtools.sourceforge.net/samtools.shtml

            Alternatively, you could merge all of the bam files together and then run mpileup on the merged bam file. I don't know if one would be faster than the other.
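
The two options above can be sketched as argument lists built in Python. The helper names are hypothetical, the mpileup flags are simply copied from the command earlier in this thread, and the sketch assumes a `samtools` executable on the PATH. Nothing is executed here; each list can be passed to `subprocess.run` when you are ready.

```python
from typing import List, Sequence


def mpileup_command(reference: str, intervals: str, bams: Sequence[str],
                    samtools: str = "samtools") -> List[str]:
    """Build one mpileup call that takes every BAM file as an input."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    return [samtools, "mpileup", "-q", "5", "-Q", "15",
            "-l", intervals, "-uABf", reference, *bams]


def merge_command(merged_bam: str, bams: Sequence[str],
                  samtools: str = "samtools") -> List[str]:
    """Build a merge call that combines the sorted BAM files into one."""
    if len(bams) < 2:
        raise ValueError("merging needs at least two BAM files")
    return [samtools, "merge", merged_bam, *bams]
```

With the first option each BAM file keeps its own columns in the output; with the second, the merged file is passed to `mpileup_command` as a single input, so it needs to be indexed first.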

