  • How to mask and not remove "adapter" sequence?

    Hello,

    Our core recently acquired an Illumina system and we've been working through the bench and computational workflows of genome, exome, ChIP, and RNA-sequencing.

    This question is about RNA-seq... My boss came up with a modified random hexamer [not a hexamer, actually] so that we can conserve strandedness. The sequence is actually NNNNN[AGCT]NNNNNN, where [AGCT] stands for a defined 12-base sequence flanked by the random bases. The point was to see which end of the read this sequence shows up on, which would tell us which strand the read came from.

    Unfortunately, when he came up with the design he didn't count on the fact that reads with this sequence (half of the reads) would fail to map. I can easily trim this primer sequence with the usual tools, but he wants to retain it so that we know which end of the read it came from...we just want to ignore it, essentially.

    I have not found software that will soft clip reads based on adapter, vector, or other contaminating sequences...they all shorten the reads. The only work-around I've thought of is to identify this primer sequence in the reads and to lower the Q-scores of those bases so that they can be masked out and soft clipped in the next step. This would keep the sequence around, but not allow that portion of the read to contribute to the alignment...that's the idea, anyway.

    But I'm hoping there is a more straightforward way to do this, as it's quite a hack of the data. Does anyone have any ideas they care to share?

    Thanks!
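
    A minimal Python sketch of the Q-score work-around described above. Everything named here is an assumption for illustration: `CORE` is a made-up stand-in for the defined 12-base sequence (the thread does not give it), the qualities are assumed to be Phred+33 encoded, and the primer is assumed to sit flush against one end of the read. It only lowers the qualities; whether the aligner then masks or soft clips those bases depends on the aligner.

```python
import re

# Hypothetical stand-in for the defined 12-base core (not given in the thread).
CORE = "ACGTACGTACGT"
# 5 random bases + defined core + 6 random bases, as in NNNNN[core]NNNNNN.
PRIMER = r"[ACGTN]{5}" + CORE + r"[ACGTN]{6}"
# Anchor the pattern to either end of the read.
PRIMER_AT_END_RE = re.compile("^" + PRIMER + "|" + PRIMER + "$")

LOW_Q = "!"  # Phred 0, assuming Phred+33 encoding


def mask_primer_quality(seq, qual, low_q=LOW_Q):
    """Return qual with the primer positions set to low_q.

    The sequence itself is not modified, so the primer stays in the read.
    If no primer is found at either end, qual is returned unchanged.
    """
    match = PRIMER_AT_END_RE.search(seq)
    if match is None:
        return qual
    start, end = match.span()
    return qual[:start] + low_q * (end - start) + qual[end:]
```

    For older Illumina output that uses Phred+64, `low_q` would need to be a different character (for example `"B"`).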

  • #2
    You could preprocess the fastq file to remove the strandedness sequence and, at the same time, change the read name to record which end of the read it was found on.
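
    A rough Python sketch of that preprocessing, under stated assumptions: `CORE` is a hypothetical placeholder for the real 12-base sequence, the `_start` / `_end` / `_none` suffixes are an arbitrary naming choice, and the FASTQ is assumed to use four lines per record. The tag is appended to the first token of the header because many aligners drop everything after the first space.

```python
import re

# Hypothetical stand-in for the defined 12-base core (not given in the thread).
CORE = "ACGTACGTACGT"
PRIMER = r"[ACGTN]{5}" + CORE + r"[ACGTN]{6}"
START_RE = re.compile("^" + PRIMER)  # primer flush with the start of the read
END_RE = re.compile(PRIMER + "$")    # primer flush with the end of the read


def add_tag(header, tag):
    """Append _tag to the read ID (the first space-delimited token)."""
    ident, sep, rest = header.partition(" ")
    return f"{ident}_{tag}{sep}{rest}"


def tag_and_trim(header, seq, qual):
    """Trim the primer and record in the read name which end it was on."""
    match = START_RE.search(seq)
    if match:
        cut = match.end()
        return add_tag(header, "start"), seq[cut:], qual[cut:]
    match = END_RE.search(seq)
    if match:
        cut = match.start()
        return add_tag(header, "end"), seq[:cut], qual[:cut]
    return add_tag(header, "none"), seq, qual


def rewrite_fastq(lines):
    """Yield rewritten FASTQ lines (no trailing newlines) from input lines."""
    it = iter(lines)
    # Pull four lines at a time: header, sequence, '+' separator, qualities.
    for header, seq, _plus, qual in zip(it, it, it, it):
        new_header, new_seq, new_qual = tag_and_trim(
            header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")
        )
        yield from (new_header, new_seq, "+", new_qual)
```

    Usage would look something like `out.writelines(line + "\n" for line in rewrite_fastq(open("reads.fastq")))`, where `reads.fastq` is a placeholder file name. A read that consists only of the primer comes out empty, so a minimum-length filter afterwards is probably needed.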


    • #3
      That's a great idea, thank you! This will obviously require some scripting, but I think it's doable. Basically, when I grep, I can anchor the regex, and then store the end it's coming from and use that to modify the read name.

      I don't suppose anyone could point me to an existing library (biopython lib, something in BioC, etc) that has a function to re-name reads? I'm assuming the typical tools out there (fastx, cutadapt, etc) don't have this functionality built in.

      Thanks...

      EDIT: Actually, it appears that FASTX will rename the reads, so I'll start there!
      Last edited by GeGnome; 04-20-2012, 08:22 AM.


      • #4
        cutadapt now has this functionality.
