Header Leaderboard Ad

Collapse

Tophat: find junction spanning reads

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tophat: find junction spanning reads

    Perhaps I missed something, but is there a straightforward way to extract reads which span an exon/exon junction (ie, spliced reads) from the accepted_hits.bam produced by Tophat? I thought I could filter to find reads that have an alignment distance (alignment start - alignment end) greater than the read length, but the .bam file only lists the start position. Looking at the .bam file and the SAM specification, I didn't find a flag that is set which I could filter on.

    Am I missing something?

    Thanks!

  • #2
    You can use bamtools filter to find reads with an N in the CIGAR string.

    Comment


    • #3
      Thanks, that's just what I was looking for. Sorry for not looking more carefully at the specification -- I checked the FLAGs, but not the CIGAR string.

      Comment


      • #4
        Does not match?

        Hmm, is it just me, or does the extracted reads (with N:s in the CIGAR) not match the junctions.bed in the tophat results? When i look at the positions from junctions.bed in the BAM file with IGV genome viewer, i cannot see the 400+ reads supposedly supporting that splice junction.

        I have scrolled all the way down in IGV to see all the reads, but nowhere can i see anything that even remotely matches any of the junctions reported in the bed file.

        Python code: Picks out all reads from chr4 that contains N:s (3 since it is a BAM file) in the CIGAR. Should be working just fine. Using pysam library.

        Code:
        .
        .
        .
        # get all the reads from the specified chromosome
        for read in bamFile.fetch("4"):
            
            # check if there is a splice junction in the read
            spliced = re.search('\(3, \d+\)',str(read.cigar))
        
            if spliced:
                # write to outfile
                out.write(read)
        .
        .
        .
        Any ideas?

        Best regards
        Martin
        Last edited by dahlo; 09-27-2011, 06:36 AM.

        Comment


        • #5
          I don't quite follow your logic. If I read the spec correct, the cigar string should look like 15M230N19M (15 matched, 230 intron bp, 19 matched).
          But your regexps matches '(3, 34)' for example.
          I guess you could just check with 'in':
          if 'N' in read.cigar: output.write(read)

          Comment

          Latest Articles

          Collapse

          • seqadmin
            How RNA-Seq is Transforming Cancer Studies
            by seqadmin



            Cancer research has been transformed through numerous molecular techniques, with RNA sequencing (RNA-seq) playing a crucial role in understanding the complexity of the disease. Maša Ivin, Ph.D., Scientific Writer at Lexogen, and Yvonne Goepel Ph.D., Product Manager at Lexogen, remarked that “The high-throughput nature of RNA-seq allows for rapid profiling and deep exploration of the transcriptome.” They emphasized its indispensable role in cancer research, aiding in biomarker...
            09-07-2023, 11:15 PM
          • seqadmin
            Methods for Investigating the Transcriptome
            by seqadmin




            Ribonucleic acid (RNA) represents a range of diverse molecules that play a crucial role in many cellular processes. From serving as a protein template to regulating genes, the complex processes involving RNA make it a focal point of study for many scientists. This article will spotlight various methods scientists have developed to investigate different RNA subtypes and the broader transcriptome.

            Whole Transcriptome RNA-seq
            Whole transcriptome sequencing...
            08-31-2023, 11:07 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 09-22-2023, 09:05 AM
          0 responses
          14 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 09-21-2023, 06:18 AM
          0 responses
          12 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 09-20-2023, 09:17 AM
          0 responses
          13 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 09-19-2023, 09:23 AM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Working...
          X