  • Replace N's in R2 reads if I know the sequence

    Hi,

    I have a library that was constructed using a restriction enzyme (ApeKI) and sequenced on Illumina with 100 bp paired-end reads.
    The R1 reads (the first read of each pair) consist of a barcode followed by the insert sequence, and they are fine. No barcode was added to the R2 reads, so every R2 read should start with the same sequence, C A/T G C (CWGC). That is true for most reads, but unfortunately about 15% of the reads have an 'N' at position 1 and 30% have an 'N' at position 4. In my filtering steps I usually remove every read that contains at least one 'N', so at the moment I lose more than 30% of the reads.

    Apart from this, the reads look good and map to the reference without problems, so there seems to be no general problem with the run (otherwise I would not expect the remaining reads to match the reference).

    Since I haven't seen a publication that describes doing something like this, my question is:

    Can I replace the N's at positions 1 and 4 with the expected bases before quality filtering, since I know what the sequence should be? Do you see any problems with that approach?

    Some programs need the restriction enzyme cut site for the analysis, which is why I do not want to simply trim the first 4 nucleotides.
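If you do try the replacement, a minimal sketch of a conservative version could look like this. It assumes uncompressed 4-line FASTQ and the remnant described above (ApeKI cuts G^CWGC, so R2 should begin with CWGC). It only fills an N at position 1 or 4 with C when positions 2 and 3 already read A/T and G, leaves every other base alone, and keeps the original quality string, so the filled bases still carry the low score the basecaller gave the N. The function names (`read_fastq`, `fill_cut_site`, `process`) are hypothetical and not part of any existing tool.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (wrapped multi-line records are not supported)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def fill_cut_site(seq: str) -> Tuple[str, bool]:
    """Replace an N at position 1 or 4 (indices 0 and 3) with C, the base the
    CWGC remnant predicts, but only if positions 2 and 3 read A/T and G."""
    if len(seq) < 4 or seq[1] not in "AT" or seq[2] != "G":
        return seq, False
    bases = list(seq)
    changed = False
    for i in (0, 3):
        if bases[i] == "N":
            bases[i] = "C"
            changed = True
    return "".join(bases), changed


def process(in_handle: TextIO, out_handle: TextIO) -> int:
    """Copy records to out_handle with filled cut sites. Quality strings are
    left unchanged. Returns the number of reads that were modified."""
    n_changed = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        new_seq, changed = fill_cut_site(seq)
        n_changed += int(changed)
        out_handle.write(f"{header}\n{new_seq}\n{plus}\n{qual}\n")
    return n_changed


if __name__ == "__main__":
    demo = io.StringIO("@r1\nNAGNACGT\n+\n#III#III\n@r2\nCTGCACGT\n+\nIIIIIIII\n")
    out = io.StringIO()
    print(process(demo, out))  # → 1 (only r1 had N's to fill)
    print(out.getvalue(), end="")
```

Returning the count of modified reads makes it easy to report exactly how many reads were changed, should a reviewer ask.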

  • #2
    Rather than changing primary data (which reviewers may not look upon kindly down the road), you should consider adding PhiX to this sample and re-running it (if you do not have enough data). Low nucleotide diversity is never good for Illumina runs, and the N's you are seeing are one of its manifestations.

    Alternatively, give up on the 15% of reads that have this problem and use the remainder (if you have enough good data).
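To judge whether giving up the affected reads is affordable, it may help to quantify the problem first. This sketch, again assuming uncompressed 4-line FASTQ and using a hypothetical function name (`scan_read_starts`), tallies the base composition at the first few cycles and counts how many reads have at least one N there, i.e. how many reads an "any N" filter would discard. A position where a single base dominates (positions 1, 3 and 4 of these R2 reads) is the low-diversity situation described above.

```python
import io
from collections import Counter
from typing import List, TextIO, Tuple


def scan_read_starts(handle: TextIO, n_positions: int = 5) -> Tuple[List[Counter], int, int]:
    """Tally bases at the first n_positions cycles of a 4-line FASTQ file.

    Returns (per-position Counters, total reads, reads with >= 1 N in those cycles).
    """
    counts = [Counter() for _ in range(n_positions)]
    total = with_n = 0
    for line_no, line in enumerate(handle):
        if line_no % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        start = line.rstrip("\n")[:n_positions]
        total += 1
        with_n += int("N" in start)
        for i, base in enumerate(start):
            counts[i][base] += 1
    return counts, total, with_n


if __name__ == "__main__":
    demo = io.StringIO("@r1\nNAGNA\n+\n#II#I\n@r2\nCTGCA\n+\nIIIII\n")
    counts, total, with_n = scan_read_starts(demo, n_positions=4)
    for pos, c in enumerate(counts, start=1):
        n = sum(c.values())
        # Share of each base at this cycle; one base near 100% means low diversity.
        print(pos, {base: f"{k / n:.0%}" for base, k in sorted(c.items())})
    print(f"{with_n}/{total} reads have an N in the first 4 positions")
```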

    • #3
      Thanks for your comment, GenoMax. I agree that reviewers would not be happy with anyone changing raw data, and I am not too fond of the idea myself.

      Unfortunately, re-running is not an option. The run was spiked with PhiX, but only at 1%.

      To avoid low-diversity problems, the R1 reads start with barcodes of varying length and composition. As far as I know, low diversity at the start of the R2 reads should not be too problematic, since the "calibration" is performed at the start of R1 and the R1 parameters are then reused when sequencing R2. But I am not sure about that.

      I was not precise enough in my first post: so far, I have trimmed the first 5 positions of all reads, applied further quality filtering (more than 90% of the bases must have a quality score above 20), and then mapped the reads. Apart from the N's themselves, I could not see any difference between reads with and without an 'N' in the first 5 positions.
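For comparison, the per-read logic of the current workflow (trim the first 5 positions, then require more than 90% of the remaining bases to have a quality above 20) can be sketched as follows. It assumes Phred+33 quality encoding (Illumina 1.8+) and that the 90% criterion is applied after trimming; the function name `trim_and_filter` is hypothetical.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def trim_and_filter(seq: str, qual: str, trim: int = 5, min_q: int = 20,
                    min_frac: float = 0.9) -> Optional[Tuple[str, str]]:
    """Trim the first `trim` bases, then keep the read only if more than
    `min_frac` of the remaining bases have a quality strictly above `min_q`.
    Returns the trimmed (seq, qual) pair, or None if the read is rejected."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    seq, qual = seq[trim:], qual[trim:]
    if not seq:
        return None
    good = sum(1 for ch in qual if ord(ch) - PHRED_OFFSET > min_q)
    return (seq, qual) if good / len(qual) > min_frac else None


if __name__ == "__main__":
    # 5 leading bases (with two N's) are trimmed; the 20 remaining are Q40 ('I').
    print(trim_and_filter("NAGNC" + "A" * 20, "#II#I" + "I" * 20))
    # 3 of the 20 remaining bases are Q2 ('#'): 85% good, so the read is dropped.
    print(trim_and_filter("CAGCA" + "A" * 20, "IIIII" + "#" * 3 + "I" * 17))
```

Because the trimmed span covers both N positions, this filter treats reads with and without N's at the start identically.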