Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pysam Fetching Incorrect Coordinates

    Hello everyone. I am using pysam to fetch the reads overlapping a given region, however, I am finding that pysam returns a list of reads whose positions do not correspond with the region coordinates that I input. The BAM file is indexed with samtools index 'file.bam'

    For example, I want to fetch this region: chr3:50617849-50618103

    I do so with the following code (inside python command line):
    Code:
    import pysam
    bam = pysam.AlignmentFile('file.bam')
    reads = bam.fetch('chr3',50617849,50618103)
    for read in reads:
        print read.pos
    The first coordinate printed is 50335428, which is outside the scope of my input region completely.

    Does anybody know why this is happening? The BAM file was created from a STAR alignment of paired-end reads against the hg19 human reference genome in case that matters.

  • #2
    Right, but does the read stretch into or span that region? You're not asking for reads starting in that interval, you're asking for those that intersect it in any way (including splicing over it).

    Comment


    • #3
      Well the distance from that read to the actual start location is ~280,000 nucleotides. I suppose then the read would have to be spliced? Here is one of the reads in question:

      Code:
      ABC-ST978_0126:2:1307:16045:21543#0     403     2       50335428 3
              91M317386N10M   2       50335207        101     GGGGTCTGTAACCACCAGGTCTATAAAACAGCCGACCCAATCTACTTGCTGGCCTTCTGTCTTCCAGTAGTCCTCCTAGTCCACCACTAGGAGGACTAGGA       array('B', [66, 66, 66, 64, 66, 66, 66, 65, 63, 64, 64, 66, 66, 66, 66, 65, 66, 66, 66, 66, 66, 67, 66, 65, 66, 66, 66, 66, 66, 66, 66, 68, 68, 70, 70, 72, 72, 71, 72, 71, 71, 71, 69, 70, 72, 71, 71, 72, 71, 70, 70, 71, 72, 72, 72, 72, 72, 71, 71, 71, 72, 71, 71, 72, 72, 72, 72, 71, 72, 71, 69, 72, 71, 69, 72, 71, 71, 72, 72, 72, 71, 70, 70, 72, 70, 69, 72, 72, 70, 70, 70, 70, 70, 68, 66, 68, 68, 68, 65, 64, 64])    [('NH', 2), ('HI', 2), ('AS', 189), ('nM', 0)]
      I suppose that 317386N region is the gap?

      Comment


      • #4
        Yup, it's a spliced read, which you can tell from the N cigar operator.

        Comment


        • #5
          Interesting, glad I know this now. Thanks!

          Comment


          • #6
            No problem, I've been tripped up by this as well. In case you ever use the pileup() functions, keep in mind that this will happen there as well.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Choosing Between NGS and qPCR
              by seqadmin



              Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
              10-18-2024, 07:11 AM
            • seqadmin
              Non-Coding RNA Research and Technologies
              by seqadmin




              Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

              Nobel Prize for MicroRNA Discovery
              This week,...
              10-07-2024, 08:07 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 11-01-2024, 06:09 AM
            0 responses
            18 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 10-30-2024, 05:31 AM
            0 responses
            18 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 10-24-2024, 06:58 AM
            0 responses
            24 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 10-23-2024, 08:43 AM
            0 responses
            53 views
            0 likes
            Last Post seqadmin  
            Working...
            X