
  • Replace N's in R2 read, if I know the sequence

    Hi,

    I have a library that was constructed using a restriction enzyme (ApeKI) and sequenced with paired-end 100 bp Illumina reads.
    The R1 reads (the first read of each pair) consist of a barcode followed by sequence and look fine. No barcode was added to the R2 reads, so every R2 read should start with the same sequence (C A/T GC). In most cases that is true, but about 15% of the reads have an 'N' at position 1 and 30% have an 'N' at position 4. In my filtering steps I usually remove sequences containing at least one 'N', which at the moment costs me more than 30% of the reads.

    The other reads look good and can be mapped against the reference without problem, i.e. there was no general problem with the sequencing (else I would not expect the remaining reads to match the reference).

    Since I haven't yet seen a publication that describes something like this, my question is:

    Can I replace the N's at position 1 and 4 since I know what the sequence should be, before performing quality filtering? Do you see any problems with that approach?

    Some programs need the restriction enzyme cut site for analysis, which is why I do not want to just trim the first 4 nucleotides...
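
For illustration only, the replacement asked about above could be sketched like this in Python. The function name `restore_cut_site` and the IUPAC string "CWGC" (W = A or T, standing in for "C A/T GC") are assumptions made for this example, not part of any existing tool. The sketch only fills in an 'N' where the expected base is unambiguous (positions 1 and 4), and only when the other cut-site positions already match. Quality strings are left untouched, so a patched base keeps its original low quality score.

```python
# Bases allowed by each IUPAC code used in the expected read start.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}


def restore_cut_site(seq, site="CWGC"):
    """Return seq with 'N' in the leading cut-site remnant replaced where possible.

    The read is returned unchanged if it is shorter than the site, if a
    called base disagrees with the expected site, or if an 'N' falls on an
    ambiguous position (e.g. W), where the true base cannot be known.
    """
    if len(seq) < len(site):
        return seq
    prefix = list(seq[:len(site)])
    for i, (base, code) in enumerate(zip(prefix, site)):
        allowed = IUPAC[code]
        if base == "N":
            if len(allowed) != 1:
                return seq  # ambiguous position: do not guess
            prefix[i] = allowed
        elif base not in allowed:
            return seq  # read does not start with the expected site
    return "".join(prefix) + seq[len(site):]


if __name__ == "__main__":
    for read in ["NAGNTTGA", "CTGNACGT", "CNGCAAAA", "NGGCAAAA"]:
        print(read, "->", restore_cut_site(read))
```

Writing the patched reads to a new file and keeping the original FASTQ untouched would leave the raw data available for comparison.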

  • #2
    Rather than changing primary data (which reviewers may not look upon kindly down the road), you should consider adding phiX to this sample and re-running it (if you do not have enough data). Low nucleotide diversity is never good for Illumina runs, and one of its manifestations is the problem (N's) you experienced.

    Alternatively, give up on the 15% of reads that have this problem and use the remainder (if you have enough good data).



    • #3
      Thanks for your comment, GenoMax. I also think that reviewers would not be too happy with anyone changing raw data, and I do not much like the idea myself.

      Unfortunately, re-running is not a possibility. The run was spiked with PhiX, but only 1%.

      To circumvent low-diversity problems, the R1 reads start with barcodes of different lengths and compositions, which ensures high diversity. As far as I know, low diversity at the start of the R2 reads should not be too problematic, since "calibration" is performed at the start of the R1 reads and the resulting parameters are then reused for R2 sequencing. But I am not too sure about that.

      I was not precise enough in my first post: until now, I have trimmed the first 5 positions of all reads, performed further quality checks (more than 90% of nucleotides must have a quality higher than 20), and then mapped the reads. Apart from the 'N's themselves, I could not see any difference between reads with and without an 'N' in the first 5 positions.
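
The trimming and quality check described above could look roughly like the following Python sketch. The function name `trim_and_check`, its parameters, and the Phred+33 quality encoding are assumptions for illustration; the actual tools and settings used are not stated in the thread.

```python
def trim_and_check(seq, qual, trim=5, min_q=20, min_frac=0.9, offset=33):
    """Trim the first `trim` positions, then apply a per-read quality filter.

    Returns the trimmed (seq, qual) pair if more than `min_frac` of the
    remaining bases have a quality higher than `min_q`, otherwise None.
    Qualities are decoded assuming Phred+33 (an assumption of this sketch).
    """
    seq, qual = seq[trim:], qual[trim:]
    if not qual:
        return None  # nothing left after trimming
    good = sum(1 for ch in qual if ord(ch) - offset > min_q)
    if good / len(qual) > min_frac:
        return seq, qual
    return None


if __name__ == "__main__":
    # The leading 'N' and its low quality are trimmed away before the check.
    print(trim_and_check("NAGCTACGTACGTAC", "#####IIIIIIIIII"))
    # Two of ten remaining bases are low quality, so the read is dropped.
    print(trim_and_check("CAGCTACGTACGTAC", "IIIIIIIIIIIII##"))
```

Because the first 5 positions are removed before the check, an 'N' at position 1 or 4 never reaches the quality filter in this workflow.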
