Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extracting a subset of bam files

    Hi everyone,

    What I have:
    A bam file with 10kb MP reads mapped back to an in house reference genome assembled in Allpaths-LG. Going to be used for visual inspection in IGV and Geneious to evaluate a possible scaffold break up suggested by linkage map.

    What I want:
    To extract a new bam file that contains the mapped reads for only one of the scaffolds in my reference genome.

    Problem:
    samtools view -b mybamfile.bam 'scaffold000046' > scf000046.bam

    The resulting file lists all the original scaffolds in the header, like this:

    @SQ SN:scaffold_0 LN:21965366
    @SQ SN:scaffold_1 LN:18670197
    @SQ SN:scaffold_2 LN:16657575
    @SQ SN:scaffold_3 LN:15264954
    @SQ SN:scaffold_4 LN:14680565
    ...

    I may have misunderstood the output of the above command, but I expected to find only the target scaffold in the new bam-file header.

    Any help would be much appreciated. Feel free to direct me if the solution for my problem has already been posted elsewhere. However, have not succeeded in finding one.

    Best,
    Tore
    Last edited by toreoe; 04-14-2014, 10:17 AM.

  • #2
    Index scf000046.bam ('samtools index scf000046.bam') and then see what 'samtools idxstats scf000046.bam' says
    Are there reads reported for all the scaffolds or just scaffold000046?

    Comment


    • #3
      I extracted another, scaffold0008, from a different bam file. I then indexed scaffold00008.bam as suggested. samtools idxstats reported reads only for the target scaffold, none for the others:

      scaffold00001 21966007 0 0
      scaffold00002 18670670 0 0
      scaffold00003 16657927 0 0
      scaffold00004 15265287 0 0
      scaffold00005 14680965 0 0
      scaffold00006 14166276 0 0
      scaffold00007 14112451 0 0
      scaffold00008 13353334 2289885 55567
      scaffold00009 13059713 0 0

      scaffold00008 loaded properly in IGV, something I couldn't do with my previous extracted file. So it might be that there was something wrong with my original bam file.

      Anyways, it seems like this is simply the way the samtools view extract command outputs the subset bam, i.e. contains all scaffolds from original bam in header. And it works fine.

      Thank you very much for your help, jeales.

      Comment


      • #4
        Thats good, if there were reads from other scaffolds an awful lot of my work would be wrong as well!

        I don't have a good solution for trimming the header to only include scaffolds/chromosomes that have reads present in the bam, other than manual editing

        But it's not realyl doing any harm as long as the counts are zero

        Comment


        • #5
          The header is left as-is for technical reasons. In BAM files (and internally by samtools and likely picard too) the chromosome/contig field of an alignment is just a numeric index into this list in the header. Trimming the header, then would result in needing to alter every read as it was output, which would be annoying (this is also why people need to be careful when they reheader BAM files, since changing the order of the contigs/chromosomes in the header will make all of the alignments wrong), though I guess not that hard to implement (it would affect performance, though).

          Comment


          • #6
            Very informative thanks
            So if all the alignments in a bam are mapped to the 10th chromosome/contig in the header their chromosome/contig will be 10 (or probably 9, if you start counting at 0), irrespective of chromosome name?

            Comment


            • #7
              Exactly. As you surmised, everything is 0-based internally, so the value is 9. For those curious, the value is unsigned, so a value of -1 used for unmapped reads.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Recent Advances in Sequencing Analysis Tools
                by seqadmin


                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                05-06-2024, 07:48 AM
              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 05-10-2024, 06:35 AM
              0 responses
              20 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-09-2024, 02:46 PM
              0 responses
              26 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-07-2024, 06:57 AM
              0 responses
              21 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-06-2024, 07:17 AM
              0 responses
              21 views
              0 likes
              Last Post seqadmin  
              Working...
              X