  • Replace N's in R2 read, if I know the sequence

    Hi,

    I have a library that was constructed using a restriction enzyme (ApeKI). Paired-end 100 bp Illumina sequencing was performed.
    The R1 reads (first read of each pair) consisted of a barcode plus sequence and were fine. No barcode was added to the R2 reads, so R2 reads should always start with the same sequence (C A/T GC). In most cases that is true, but unfortunately about 15% of the reads have an 'N' at position 1, and 30% of the reads have an 'N' at position 4. In my filtering steps I usually remove sequences with at least one 'N', which currently costs me more than 30% of the reads.

    The other reads look good and can be mapped against the reference without problem, i.e. there was no general problem with the sequencing (else I would not expect the remaining reads to match the reference).

    Since I have not yet seen a publication describing something like this, my question is:

    Can I replace the N's at positions 1 and 4 before performing quality filtering, since I know what the sequence should be? Do you see any problems with that approach?

    Some programs need the restriction enzyme cut site for analysis, which is why I do not want to just trim the first 4 nucleotides...
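Before deciding anything, the scale of the problem can be measured without touching the raw data. The following is only a minimal sketch: it assumes an uncompressed FASTQ file with exactly four lines per record, and the function names and the file path passed in are hypothetical. It reports the fraction of reads with an 'N' at each of the first four positions and checks whether a read is still consistent with the expected start (C A/T GC) when 'N' is treated as a no-call rather than a mismatch.

```python
import re
from collections import Counter

# Expected start of R2 per the post: C, then A or T, then G, then C.
# 'N' is accepted at every position so that no-calls are not counted as mismatches.
CUTSITE_WITH_N = re.compile(r"^[CN][ATN][GN][CN]")


def read_fastq_sequences(path):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def n_fraction_by_position(seqs, n_positions=4):
    """Return {position (1-based): fraction of reads with 'N' at that position}."""
    counts = Counter()
    total = 0
    for seq in seqs:
        total += 1
        for i, base in enumerate(seq[:n_positions]):
            if base == "N":
                counts[i + 1] += 1
    return {
        pos: (counts[pos] / total if total else 0.0)
        for pos in range(1, n_positions + 1)
    }


def consistent_with_cutsite(seq):
    """True if the read start matches C A/T GC, allowing 'N' at any position."""
    return CUTSITE_WITH_N.match(seq) is not None
```

For a real file, `n_fraction_by_position(read_fastq_sequences("R2.fastq"))` would give the per-position fractions, where `"R2.fastq"` is a placeholder name. Nothing here rewrites the reads; it only describes them.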

  • #2
    Rather than changing primary data (which reviewers may not look upon kindly down the road), you should consider adding phiX to this sample and re-running it (if you do not have enough data). Low nucleotide diversity is never good for Illumina runs, and one of its manifestations is the problem (N's) you experienced.

    Alternatively, give up on the 15% of reads that have this problem and use the remainder (if you have enough good data).


    • #3
      Thanks for your comment, GenoMax. I also think that reviewers would not be too happy with anyone changing raw data, and I do not much like the idea myself.

      Unfortunately, re-running is not a possibility. The run was spiked with PhiX, but only 1%.

      To circumvent the low diversity problems, R1 reads started with barcodes of different length and composition, to ensure high diversity. As far as I know, low diversity at the start of the R2 reads should not be too problematic, since "calibration" is performed at the start of R1 reads. The parameters of R1 reads are then also used in R2 read sequencing. But I am not too sure about that.

      I was not precise enough in my first post: until now, I have trimmed the first 5 positions of all reads, performed further quality checks (more than 90% of nucleotides must have a quality higher than 20), and then mapped the reads. Apart from the 'N's themselves, I could not see any difference between reads with and without an 'N' in the first 5 positions.
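The trim-then-filter step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual pipeline used: it assumes Phred+33 quality encoding, and the function name `trim_and_check` and its parameters are hypothetical. The thresholds (5 positions trimmed, more than 90% of bases with quality above 20) are taken from the post.

```python
def trim_and_check(seq, qual, trim=5, min_q=20, min_fraction=0.9, offset=33):
    """Trim the first `trim` positions, then apply the quality criterion.

    Returns (trimmed_seq, trimmed_qual) if more than `min_fraction` of the
    remaining bases have a quality strictly above `min_q`, otherwise None.
    Assumes Phred+33 encoded quality strings (offset=33).
    """
    seq, qual = seq[trim:], qual[trim:]
    if not seq:
        return None  # nothing left after trimming
    good = sum(1 for ch in qual if ord(ch) - offset > min_q)
    if good / len(qual) > min_fraction:
        return seq, qual
    return None
```

Because the first 5 positions are removed before the check, an 'N' at position 1 or 4 never reaches the filter, which matches the observation that the affected reads behave like the others downstream.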
