Method for detecting soft clipped read pileup?


  • Method for detecting soft clipped read pileup?

    I'm looking for a way to detect positions in the genome where there is a pileup of soft-clipped reads. I've attached an image with an example of what this situation looks like in IGV. Essentially, I'm looking for a tool similar to samtools mpileup, but with the important difference that I want to see soft-clipped reads. I think the issue is that soft-clipped bases are not technically aligned at these positions. I know that I could write a script to parse the CIGAR string of each read and detect locations like these, but is there a tool that can quickly report the locations where reads start getting soft-clipped?


    I'm imagining a version of samtools mpileup that would report something like this:
    10 141352 N 105 a$A$A$a$aSASASaSaaaaAAaAAaAaAaaaAaAaAAAaAAaaAAaAAaaAaAaAAaAAAaAAaAaAAAAAAaAAaaAaaaaaaaaaAaaaAaAAaAaaAaaAAaaaaAaA^]a @<@?;?>...

    where the "$", as usual, means that reads are ending at this position, whereas the "S" would mean that bases are "aligned" and soft-clipped at this position.
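    A minimal sketch of the CIGAR-parsing script mentioned above, in plain Python with no BAM library: it reads SAM text (for example the output of `samtools view`) and, for each mapped read, counts a soft clip at the reference position where the clipped bases meet the aligned part of the read. The function names, the `left`/`right` labels, the `min_reads` threshold, and the convention of reporting a left clip at POS and a right clip at the last aligned reference base are illustrative assumptions, not an existing tool.

```python
import re
from collections import Counter

# One CIGAR operation = a length followed by an op code, e.g. "10S50M" -> (10, "S"), (50, "M").
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases according to the SAM specification.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def clip_positions(pos, cigar):
    """Yield (reference_position, side) for each soft-clipped end of one alignment.

    Assumed convention (illustrative only): a clip on the left end is reported at
    POS, the first aligned reference base; a clip on the right end is reported at
    the last aligned reference base. Positions are 1-based, like the SAM POS field.
    """
    # Hard clips (H) may sit outside soft clips, so drop them before checking the ends.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if ops[0][1] == "S":
        yield pos, "left"
    if ops[-1][1] == "S":
        yield pos + ref_len - 1, "right"


def count_soft_clips(sam_lines):
    """Count soft-clipped reads per (chromosome, position, side) from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read; secondary/supplementary records are still counted
        for ref_pos, side in clip_positions(pos, cigar):
            counts[(rname, ref_pos, side)] += 1
    return counts


def clipped_sites(sam_lines, min_reads=5):
    """Return sorted (chrom, pos, side, n_reads) rows with at least min_reads clipped reads."""
    counts = count_soft_clips(sam_lines)
    return sorted((c, p, s, n) for (c, p, s), n in counts.items() if n >= min_reads)
```

    To run it on a BAM file, pipe `samtools view reads.bam` into a small script that calls `clipped_sites(sys.stdin)` and prints the rows. Working from SAM text keeps the sketch dependency-free; a BAM library such as pysam would avoid the text round-trip on large files.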

  • #2
    This is implemented as part of the REAPR pipeline.
    http://www.sanger.ac.uk/resources/software/reapr/

    You can run the pipeline with:
    reapr pipeline reference.fasta mapped_reads.bam output_directory

    Note that it assumes the read pairs in the input BAM point towards each other.

    One of its early stages is to write a file of stats at each base of the reference:
    01.stats.per_base.gz
    (If you don't care about other results, then you can kill the pipeline when it starts writing files with names beginning with 02.)

    Columns 1-2 are the chromosome name and position.
    Columns 17-20 have the number of reads that had their ends soft-clipped at that position. There are 4 columns because the count is broken down into the 4 combinations of strand (read mapped to the fwd or rev strand) and clipped end (start or end of the read, w.r.t. the reference).
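    One possible way to pull pileup-like sites out of that file, sketched in Python with only the standard library: sum columns 17-20 at each position and keep the positions where the total reaches a threshold. The function name, the threshold, the tab separator, and treating lines that start with `#` as headers are assumptions rather than anything documented by REAPR, so check them against your own 01.stats.per_base.gz.

```python
import gzip


def high_clip_positions(path, min_clipped=5):
    """Yield (chrom, pos, total_clipped, [col17, col18, col19, col20]) for busy positions.

    Assumptions, not checked against REAPR's code: the file is tab-separated,
    lines starting with '#' are headers, and columns 17-20 hold integer
    soft-clip counts. The min_clipped threshold is arbitrary.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # header or blank line
            fields = line.rstrip("\n").split("\t")
            # Columns 1-2 are chromosome and position; 1-based columns 17-20 are indices 16..19.
            clips = [int(x) for x in fields[16:20]]
            total = sum(clips)
            if total >= min_clipped:
                yield fields[0], int(fields[1]), total, clips
```

    For example, loop over `high_clip_positions("output_directory/01.stats.per_base.gz", 10)` and print each row, assuming the file is written into the output directory given to `reapr pipeline`. Summing all four columns ignores strand and clipped end; keep the individual counts if you need to tell them apart.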

    Hope that helps,
    Martin
