What I see (and show in the attached images) is that after filtering the first end of a set of paired-end reads (the file named like _1.fastq), there is an increase in %T at the 3' end. This only occurs in the first-end (_1) reads, not in the second-end reads.
I first noticed this in some data of my own, then pulled a few files down from the Sequence Read Archive and found that some (not all) show the same pattern. I'm using FastQC to generate the images, but I also checked with the FASTX-Toolkit plotting. I'm using the FASTX-Toolkit to do the filtering, but I've also used a custom script. So the plotting and filtering tools themselves can be ruled out.
Here's what I do; the exact commands are in the code block below. (The FASTQ file is from a study that uses paired-end BS-Seq.)
Before filtering, the per-base sequence content image from FastQC looks like the image labelled as such below. Even before filtering, there is some increase in %T at the final base of the read.
In the image named post_filter_per_base_sequence_content, you can see that after filtering, the %T at the 3' end of the read increases greatly.
Any ideas on why this would happen?
Code:
wget ftp://ftp.ncbi.nlm.nih.gov/sra/Submissions/SRA012/SRA012457/SRX019113/SRR039814_1.fastq.bz2
bunzip2 SRR039814_1.fastq.bz2
/usr/local/src/fastqc/FastQC/fastqc SRR039814_1.fastq
fastq_quality_trimmer -Q 33 -t 20 -l 30 -i SRR039814_1.fastq > SRR039814_1.trim.fastq
/usr/local/src/fastqc/FastQC/fastqc SRR039814_1.trim.fastq
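For anyone who wants to check the numbers without relying on either plotting tool, here is a minimal sketch of a per-position %T count. It is not the custom script mentioned above. The function name pct_t_by_position is hypothetical, and the sketch assumes plain four-line FASTQ records (no wrapped sequence lines). It also prints how many reads cover each position, since reads have different lengths after trimming and the 3' positions are then covered by fewer reads.

```shell
# Hypothetical helper: per-position %T for a four-line-per-record FASTQ file.
# Output columns (tab-separated): position, reads covering that position, %T.
pct_t_by_position() {
    awk 'NR % 4 == 2 {                      # sequence line of each record
             n = length($0)
             if (n > maxlen) maxlen = n
             for (i = 1; i <= n; i++) {
                 depth[i]++                 # reads long enough to reach position i
                 if (toupper(substr($0, i, 1)) == "T") t[i]++
             }
         }
         END {
             for (i = 1; i <= maxlen; i++)
                 printf "%d\t%d\t%.1f\n", i, depth[i], 100 * t[i] / depth[i]
         }' "$1"
}

# Usage with the file names from the commands above:
# pct_t_by_position SRR039814_1.fastq      > before.tsv
# pct_t_by_position SRR039814_1.trim.fastq > after.tsv
```

Comparing the coverage column between the two outputs shows whether the 3' positions of the trimmed file are drawn from a much smaller subset of reads than the 5' positions.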