  • How to mask and not remove "adapter" sequence?

    Hello,

    Our core recently acquired an Illumina system and we've been working through the bench and computational workflows of genome, exome, ChIP, and RNA-sequencing.

    This question is about RNA-seq. My boss designed a modified random primer (not actually a hexamer) so that we can preserve strand information. The sequence is NNNNN[AGCT]NNNNNN, where a defined 12-base sequence (shown here as [AGCT]) sits between the random bases. The point was to see which end of the read this sequence shows up on, which would tell us which strand the read came from.

    Unfortunately, when he came up with the design he didn't anticipate that reads containing this sequence (half of the reads) would fail to map. I can easily trim the primer sequence with the usual tools, but he wants to retain it so that we know which end of the read it was on. Essentially, we just want the aligner to ignore it.

    I have not found software that will soft-clip reads based on adapter, vector, or other contaminating sequences; every tool I've tried shortens the reads instead. The only workaround I've thought of is to identify the primer sequence in each read and lower the Q-scores of those bases so that they can be masked out and soft-clipped in the next step. This would keep the sequence around but stop that portion of the read from contributing to the alignment. That's the idea, anyway.

    But I'm hoping there is a more straightforward way to do this, as it's quite a hack of the data. Does anyone have any ideas they care to share?

    Thanks!
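
    For reference, the Q-score workaround described in the post above could look roughly like the sketch below. It is only an illustration under stated assumptions: the thread never gives the real 12-base core, so `CORE` is a made-up placeholder, the function name `mask_primer_quality` is hypothetical, and the qualities are assumed to be Phred+33 encoded (so `!` is Phred 0). Whether a given aligner actually soft-clips low-quality bases is not something this sketch checks.

```python
# Placeholder only: the real defined 12-base core is not given in the thread.
CORE = "ACGTACGTACGT"
# Assumes Phred+33 encoding, where "!" corresponds to Phred 0.
LOW_Q = "!"


def mask_primer_quality(seq, qual, core=CORE, low_q=LOW_Q):
    """Return the quality string with the bases covered by `core` set to low_q.

    The read sequence itself is left untouched, so the primer stays in the
    read; only the quality values over the first occurrence are lowered.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    start = seq.find(core)
    if start == -1:
        # Primer core not present: leave the qualities as they are.
        return qual
    end = start + len(core)
    return qual[:start] + low_q * (end - start) + qual[end:]
```

    The read length does not change, which is the point of the hack: the primer is still there to show which end it was on, and only its qualities are altered.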

  • #2
    You could preprocess the FASTQ file: record in the read name which end the strandedness sequence is on, and remove the sequence at the same time.



    • #3
      That's a great idea, thank you! This will obviously require some scripting, but I think it's doable. Basically, when I grep, I can anchor the regex, and then store the end it's coming from and use that to modify the read name.

      I don't suppose anyone could point me to an existing library (a Biopython module, something in Bioconductor, etc.) that has a function to rename reads? I'm assuming the typical tools out there (FASTX, cutadapt, etc.) don't have this functionality built in.

      Thanks...

      EDIT: Actually, it appears that FASTX will rename the reads, so I'll start there!
      Last edited by GeGnome; 04-20-2012, 08:22 AM.
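
      A minimal sketch of the anchored-regex idea from this post, in plain Python with no external library. Several things here are assumptions rather than facts from the thread: `CORE` is a placeholder for the undisclosed 12-base sequence, the primer is assumed to appear in the same orientation at either end of the read (at the 3' end it may really show up reverse-complemented), and the tags `primer5`, `primer3`, and `noprimer` plus the function name `tag_and_trim` are invented for illustration.

```python
import re

# Placeholder only: the real defined 12-base core is not given in the thread.
CORE = "ACGTACGTACGT"
# NNNNN + core + NNNNNN, anchored to the start or the end of the read.
FIVE_PRIME = re.compile(r"^[ACGTN]{5}" + CORE + r"[ACGTN]{6}")
THREE_PRIME = re.compile(r"[ACGTN]{5}" + CORE + r"[ACGTN]{6}$")


def _tag_name(name, tag):
    """Append ':tag' to the read ID, keeping any header comment after it."""
    parts = name.split(None, 1)
    tagged = parts[0] + ":" + tag
    return tagged if len(parts) == 1 else tagged + " " + parts[1]


def tag_and_trim(name, seq, qual):
    """Record which end the primer was on in the read name, then trim it."""
    m = FIVE_PRIME.search(seq)
    if m:
        return _tag_name(name, "primer5"), seq[m.end():], qual[m.end():]
    m = THREE_PRIME.search(seq)
    if m:
        return _tag_name(name, "primer3"), seq[:m.start()], qual[:m.start()]
    return _tag_name(name, "noprimer"), seq, qual
```

      The tag is attached to the read ID rather than to the header comment because many aligners keep only the part of the name before the first whitespace.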



      • #4
        cutadapt now has this functionality.
