  • How to mask and not remove "adapter" sequence?

    Hello,

    Our core recently acquired an Illumina system and we've been working through the bench and computational workflows of genome, exome, ChIP, and RNA-sequencing.

    This question is about RNA-seq. My boss came up with a modified random primer (not actually a hexamer) so that we can preserve strand information. The sequence is NNNNN[AGCT]NNNNNN, where [AGCT] is a defined 12-base sequence flanked by the random bases. The point was to see which end of the read this sequence shows up on, which would tell us which strand the read came from.

    Unfortunately, when he came up with the design he didn't count on the fact that reads with this sequence (half of the reads) would fail to map. I can easily trim this primer sequence with the usual tools, but he wants to retain it so that we know which end of the read it came from...we just want to ignore it, essentially.

    I have not found software that will soft clip reads based on adapter, vector, or other contaminating sequences...they all shorten the reads. The only work-around I've thought of is to identify this primer sequence in the reads, and to modify [lower] the Q-score of the base so that it can be masked out and soft clipped in the next step. This would then keep the sequence around, but not allow that portion of the read to contribute to alignment...that's the idea anyways.

    But I'm hoping there is a more straightforward way to do this, as it's quite a hack of the data. Does anyone have any ideas they care to share?
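    For reference, the quality-lowering workaround described above might look roughly like the sketch below. It is only an illustration under assumptions: the motif is a hypothetical placeholder for the defined 12-base sequence (the real one is not given here), qualities are assumed to be Phred+33 encoded, and whether a downstream tool actually masks or soft clips low-quality bases depends on that tool.

```python
LOW_QUAL = "!"  # Phred+33 character for quality 0 (assumes Phred+33 encoding)

# Hypothetical placeholder for the defined 12-base sequence; not the real primer.
EXAMPLE_MOTIF = "ACGTACGTACGT"


def mask_primer_quality(seq, qual, motif, left_random=5, right_random=6):
    """Return the quality string with the primer region set to LOW_QUAL.

    The primer region is the first occurrence of `motif` plus the flanking
    random bases (5 before, 6 after, per NNNNN[motif]NNNNNN). The sequence
    itself is left untouched, so the primer stays in the read.
    """
    pos = seq.find(motif)
    if pos == -1:
        return qual  # no primer found, nothing to mask
    start = max(0, pos - left_random)
    end = min(len(seq), pos + len(motif) + right_random)
    return qual[:start] + LOW_QUAL * (end - start) + qual[end:]
```

    Only the quality string changes, so the read keeps its original length and the primer position can still be inspected later.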

    Thanks!

  • #2
    You could preprocess the FASTQ file: record in the read name which end of the read the strandedness sequence is on, and remove the sequence at the same time.

    • #3
      That's a great idea, thank you! This will obviously require some scripting, but I think it's doable. Basically, when I grep, I can anchor the regex, and then store the end it's coming from and use that to modify the read name.
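      A minimal sketch of that idea, for illustration only. The motif is a hypothetical placeholder for the defined 12-base sequence, the tag names (`primer5`, `primer3`, `noprimer`) are made up, and it assumes the primer appears in the same orientation at either end of the read, which may not hold for real data (the 3' copy could be reverse-complemented).

```python
import re

# Hypothetical placeholder for the defined 12-base sequence; not the real primer.
MOTIF = "ACGTACGTACGT"

# NNNNN[motif]NNNNNN anchored to the start or to the end of the read.
FIVE_PRIME = re.compile(r"^[ACGTN]{5}" + MOTIF + r"[ACGTN]{6}")
THREE_PRIME = re.compile(r"[ACGTN]{5}" + MOTIF + r"[ACGTN]{6}$")


def add_tag(header, tag):
    """Append a tag to the read ID (the first whitespace-delimited token)."""
    read_id, sep, comment = header.partition(" ")
    return f"{read_id}_{tag}{sep}{comment}"


def tag_and_trim(header, seq, qual):
    """Trim the primer and record in the read name which end it was on."""
    m = FIVE_PRIME.search(seq)
    if m:
        return add_tag(header, "primer5"), seq[m.end():], qual[m.end():]
    m = THREE_PRIME.search(seq)
    if m:
        return add_tag(header, "primer3"), seq[:m.start()], qual[:m.start()]
    return add_tag(header, "noprimer"), seq, qual
```

      The tag goes on the read ID rather than at the end of the header line because many aligners keep only the first whitespace-delimited token as the read name.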

      I don't suppose anyone could point me to an existing library (Biopython, something in BioC, etc.) that has a function to rename reads? I'm assuming the typical tools out there (FASTX, cutadapt, etc.) don't have this functionality built in.

      Thanks...

      EDIT: Actually, it appears that FASTX will rename the reads, so I'll start there!
      Last edited by GeGnome; 04-20-2012, 08:22 AM.

      • #4
        cutadapt now has this functionality.
