  • Replace N's in R2 read, if I know the sequence

    Hi,

    I have a library that was constructed using a restriction enzyme (ApeKI). Paired-end 100 bp Illumina sequencing was performed.
    The R1 reads (the first read of each pair) consist of a barcode followed by sequence and are fine. No barcode was added to the R2 reads, so R2 reads should always start with the same sequence (C A/T GC). In most cases that is true, but unfortunately about 15% of the reads have an 'N' at position 1, and 30% of the reads have an 'N' at position 4. In the filtering steps I usually remove sequences with at least one 'N', which at the moment results in a loss of more than 30% of the reads.

    The other reads look good and can be mapped against the reference without problems, i.e. there was no general problem with the sequencing (otherwise I would not expect the remaining reads to match the reference).

    Since I have not yet seen a publication that describes anything like this, my question is:

    Can I replace the N's at positions 1 and 4 before performing quality filtering, since I know what the sequence should be? Do you see any problems with that approach?

    Some programs need the restriction enzyme cut site for analysis, which is why I do not want to just trim the first 4 nucleotides...
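
    A minimal sketch of the replacement being asked about, assuming the reads are handled as plain Python strings and that the expected R2 start is C, A/T, G, C as described above. The names `EXPECTED_START`, `n_positions`, `repair_prefix`, and `prefix_matches` are hypothetical and chosen for illustration only. An 'N' is replaced only where a single base is expected (positions 1, 3, and 4); position 2 can be A or T, so an 'N' there is left alone.

```python
# Expected start of every R2 read, per the post: C, A/T, G, C.
# Each entry lists the bases allowed at that position (assumption for this sketch).
EXPECTED_START = ("C", "AT", "G", "C")


def n_positions(seq, prefix_len=4):
    """Return the 1-based positions within the prefix where the read has an N."""
    return [i + 1 for i, base in enumerate(seq[:prefix_len]) if base == "N"]


def repair_prefix(seq, expected=EXPECTED_START):
    """Replace an N in the prefix only where exactly one base is expected."""
    bases = list(seq)
    for i, allowed in enumerate(expected):
        if i < len(bases) and bases[i] == "N" and len(allowed) == 1:
            bases[i] = allowed
    return "".join(bases)


def prefix_matches(seq, expected=EXPECTED_START):
    """Check whether the read starts with the expected cut-site remnant."""
    if len(seq) < len(expected):
        return False
    return all(base in allowed for base, allowed in zip(seq, expected))
```

    For example, `repair_prefix("NAGNTTGA")` gives `"CAGCTTGA"`, while `repair_prefix("CNGCTT")` is returned unchanged because position 2 is ambiguous. The sketch does not touch the quality string, so a later quality filter would still see the original scores at the repaired positions. Writing the repaired reads to a separate file keeps the raw data intact.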

  • #2
    Rather than changing primary data (which reviewers may not look upon kindly down the road), you should consider adding phiX to this sample and re-running it (if you do not have enough data). Low nucleotide diversity is never good for Illumina runs, and one of its manifestations is the problem (N's) you experienced.

    Alternatively, give up on the 15% of reads that have this problem and use the remainder (if you have enough good data).

    • #3
      Thanks for your comment, GenoMax. I also think that reviewers would not be too happy with anyone changing raw data, and I do not like the idea much either.

      Unfortunately, re-running is not a possibility. The run was spiked with PhiX, but only at 1%.

      To circumvent the low diversity problems, R1 reads started with barcodes of different length and composition, to ensure high diversity. As far as I know, low diversity at the start of the R2 reads should not be too problematic, since "calibration" is performed at the start of R1 reads. The parameters of R1 reads are then also used in R2 read sequencing. But I am not too sure about that.

      I was not precise enough in my first post: until now, I have trimmed the first 5 positions of all reads, performed further quality checks (more than 90% of nucleotides must have a quality higher than 20), and then mapped the reads. Apart from the 'N's themselves, I could not see any difference between reads with and without an 'N' in the first 5 positions.
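
      The trimming and quality check described above can be sketched roughly as follows. This is only an illustration, assuming Phred+33 encoded quality strings; the function name `trim_and_check` and its parameters are hypothetical and not taken from any particular tool.

```python
def trim_and_check(seq, qual, trim=5, min_q=20, min_fraction=0.9, offset=33):
    """Trim the first `trim` positions, then keep the read only if more than
    `min_fraction` of the remaining bases have a quality higher than `min_q`.

    Returns the trimmed (seq, qual) pair, or None if the read is rejected.
    Assumes Phred+33 quality encoding (offset=33).
    """
    seq, qual = seq[trim:], qual[trim:]
    if not seq:
        return None  # nothing left after trimming
    good = sum(1 for ch in qual if ord(ch) - offset > min_q)
    if good / len(qual) > min_fraction:
        return seq, qual
    return None
```

      With these defaults, a read whose N's fall within the first 5 positions is judged only on the bases that remain after trimming, which matches the observation that such reads behave like any other read once trimmed.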
