Hi there,
I just received some 50 bp paired-end RNA-Seq data generated on an Illumina HiSeq. Strangely, the last base of every read in one file of the pair (R2) is N. For example:
$ head -4 m1.R1.fastq
@HISEQ:54:C1YCKACXX:1:1210:8881:74275/1
TACCTGGTTGATTCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCC
+
CCCFFFFDHHHHGJJJJJJJIJJHIJJJJIJJIIIJJJJJJIIJGIIIJI
$ head -48 m1R2.fastq
@HISEQ:54:C1YCKACXX:1:1210:8881:74275/2
AATAAATACACCCCTTCCAGAAGTCGGGGCTTGAATGCATGTATTAGCTN
+
CCCFFFFFHHHHHJJJJJJJJJJHHIJJJJJJIJJIJIJIIGGHJIIIE#
...
CCCCTTCGCGGGGGTCAGCGCCCGTCGGCATGTATTAGCTCTAGAATTAN
...
ATGAGCCATTCGCAGTTTCACAGTACATAGTTGCTTATACTTAGACATGN
What causes these unknown Ns, and why do they occur in only one read of each pair? Is it worth trimming them? If I don't, what side effects might there be?
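In case it helps anyone checking their own files, here is a minimal Python sketch of how the trailing Ns could be counted and trimmed. It assumes plain, uncompressed four-line FASTQ records; the function names (`read_fastq`, `count_trailing_n`, `trim_trailing_n`) are my own and do not come from any library. It is only an illustration, not a recommendation on whether trimming is needed.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ text (assumes no wrapped lines)."""
    it = iter(lines)
    # zip over the same iterator consumes four lines per record;
    # an incomplete record at the end is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def count_trailing_n(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (total reads, reads whose last base is N)."""
    total = 0
    ending_in_n = 0
    for _, seq, _, _ in records:
        total += 1
        if seq.endswith("N"):
            ending_in_n += 1
    return total, ending_in_n


def trim_trailing_n(record: Record) -> Record:
    """Remove trailing Ns and cut the quality string to the same length."""
    header, seq, plus, qual = record
    trimmed = seq.rstrip("N")
    return header, trimmed, plus, qual[:len(trimmed)]


if __name__ == "__main__":
    # The first R2 record shown above, used as in-memory example input.
    example: List[str] = [
        "@HISEQ:54:C1YCKACXX:1:1210:8881:74275/2\n",
        "AATAAATACACCCCTTCCAGAAGTCGGGGCTTGAATGCATGTATTAGCTN\n",
        "+\n",
        "CCCFFFFFHHHHHJJJJJJJJJJHHIJJJJJJIJJIJIJIIGGHJIIIE#\n",
    ]
    records = list(read_fastq(example))
    print(count_trailing_n(records))
    for rec in map(trim_trailing_n, records):
        print("\n".join(rec))
```

To run it on a real file, the lines can come from `open("m1R2.fastq")` instead of the in-memory list. If reads are trimmed this way, both files should still be handled so that the mates stay in the same order.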
Thanks for any suggestions.