  • Replace N's in R2 read, if I know the sequence

    Hi,

    I have a library that was constructed using a restriction enzyme (ApeKI). It was sequenced on Illumina as paired-end 100 bp.
    The R1 reads (the first read of each pair) consist of a barcode followed by the insert sequence, and they were fine. No barcode was added to the R2 reads, so every R2 read should start with the same sequence (C A/T GC). In most cases that is true, but unfortunately about 15% of the reads have an 'N' at position 1, and 30% of the reads have an 'N' at position 4. In my filtering steps I usually remove any sequence with at least one 'N', which at the moment costs me more than 30% of the reads.

    The other reads look good and can be mapped against the reference without problem, i.e. there was no general problem with the sequencing (else I would not expect the remaining reads to match the reference).

    Since I haven't seen a publication that describes something like this, my question is:

    Can I replace the N's at positions 1 and 4 before performing quality filtering, since I know what the sequence should be? Do you see any problems with that approach?

    Some programs need the restriction enzyme cut site for their analysis, which is why I do not want to simply trim the first 4 nucleotides.
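
For readers who want to see what the replacement described above would look like in practice, here is a minimal sketch. It assumes the R2 reads start with C, then A or T, then G, then C, and that an 'N' is only tolerated at positions 1 and 4, as stated in the post. The function names `patch_cutsite` and `patch_fastq` are made up for this illustration and do not come from any existing tool. Quality strings are left untouched, so a patched base keeps its original (low) quality score.

```python
import re

# Cut-site remnant expected at the start of R2: C, then A or T, then G, then C.
# 'N' is tolerated only at positions 1 and 4 (0-based 0 and 3), as in the post.
CUTSITE_WITH_N = re.compile(r"^[CN][AT]G[CN]")


def patch_cutsite(seq):
    """Return (patched_seq, n_replaced).

    The read is returned unchanged (n_replaced == 0) when its first four
    bases do not fit the expected pattern, so unrelated reads are never edited.
    """
    if not CUTSITE_WITH_N.match(seq):
        return seq, 0
    head = seq[:4]
    n_replaced = (head[0] == "N") + (head[3] == "N")
    # Positions 2 and 3 are kept as sequenced; only positions 1 and 4 are set to C.
    return "C" + head[1:3] + "C" + seq[4:], n_replaced


def patch_fastq(lines):
    """Yield FASTQ lines with patched sequence lines (4-line records assumed)."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 1:  # second line of each record is the sequence
            line, _ = patch_cutsite(line)
        yield line
```

Writing the patched reads to a new file and keeping the original FASTQ untouched (plus the per-read `n_replaced` count) would make the modification traceable, which matters given the concern about altering primary data raised in the replies.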

  • #2
    Rather than changing primary data (which reviewers may not look kindly upon down the road), you should consider adding PhiX to this sample and re-running it (if you do not have enough data). Low nucleotide diversity is never good for Illumina runs, and the N's you experienced are one manifestation of that problem.

    Alternatively, give up on the 15% of reads that have this problem and use the remainder (if you have enough good data).


    • #3
      Thanks for your comment, GenoMax. I also think that reviewers would not be too happy with anyone changing raw data, and I do not like the idea much either.

      Unfortunately, re-running is not a possibility. The run was spiked with PhiX, but only 1%.

      To circumvent low-diversity problems, the R1 reads started with barcodes of different lengths and compositions. As far as I know, low diversity at the start of the R2 reads should not be too problematic, since "calibration" is performed at the start of the R1 reads and the resulting parameters are then reused for R2 sequencing. But I am not too sure about that.

      In my first post, I was not precise enough: until now, I have trimmed the first 5 positions of all reads, performed further quality checks (more than 90% of nucleotides must have a quality higher than 20), and then mapped the reads. Apart from the 'N's themselves, I could not see any difference between reads with and without an 'N' in the first 5 positions.
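
The trim-then-filter step described above can be sketched as follows. This is only an illustration under stated assumptions: qualities are taken to be Phred+33 encoded (an assumption, not stated in the post), and `passes_quality` and `trim_and_filter` are hypothetical helper names rather than functions from an existing package.

```python
def passes_quality(qual, min_q=20, min_fraction=0.9, offset=33):
    """True when MORE than min_fraction of bases have quality HIGHER than min_q.

    offset=33 assumes Phred+33 encoding (an assumption for this sketch).
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset > min_q)
    return good / len(qual) > min_fraction


def trim_and_filter(seq, qual, trim=5):
    """Drop the first `trim` positions, then apply the quality check.

    Returns (seq, qual) for reads that pass, or None for reads that fail.
    """
    seq, qual = seq[trim:], qual[trim:]
    if not passes_quality(qual):
        return None
    return seq, qual
```

Because the first 5 positions are removed before the check, an 'N' at position 1 or 4 (and its low quality score) no longer influences whether the read is kept.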
