I'm pretty new to NGS. I have Illumina GAII 36bp paired reads for several bacterial genomes. The sequencing was carried out in one run, using 2 lanes.
I have been using the FASTX toolkit to produce quality statistics for both sets of reads of each isolate from the Solexa fastq files. From that output I have produced a boxplot and a nucleotide distribution graph for each set of reads of each isolate, which has prompted 2 main questions:
1. The boxplot shows the median quality score at each nucleotide position. For both sets of reads, for all isolates, the median score is 34 at all 36 positions. Is it normal to get this much consistency? I had been told that the median score tends to tail off towards the 3' end of the read.
2. The nucleotide distribution graph shows the ACGT distribution at each of the 36 read positions. For each set of reads of every isolate, the first 3 nucleotide positions are skewed in comparison to the rest of the read. I expected the distribution to be fairly constant along the read, since the reads should cover the whole genome.
Is this due to adaptor contamination? Again, is it normal to get this sort of consistency?
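In case it helps anyone check the skew independently of FASTX, here is a minimal sketch of how the per-position base composition could be tallied. It assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names `base_composition` and `base_fractions` are my own illustrative choices and are not part of FASTX.

```python
from collections import Counter


def base_composition(fastq_lines, read_length=36):
    """Count bases at each read position.

    Assumes every record is exactly 4 lines (header, sequence, '+', quality).
    Returns a list with one Counter per position.
    """
    counts = [Counter() for _ in range(read_length)]
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            seq = line.strip().upper()
            for pos, base in enumerate(seq[:read_length]):
                counts[pos][base] += 1
    return counts


def base_fractions(counts):
    """Convert per-position counts into per-position fractions of A, C, G, T, N."""
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGTN"})
    return fractions
```

Passing an open file handle (for example `base_composition(open("reads_1.fastq"))`, where the file name is hypothetical) should give one set of fractions per position, so positions 1 to 3 can be compared directly with the rest of the read.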
The bacteria are closely related and the sequencing was carried out in one run. Consequently I have trimmed the first 3 nucleotides from each read prior to assembly against a reference genome.
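For reference, the trimming step amounts to something like the sketch below. This is not the FASTX implementation, only an illustration under the assumption of 4-line FASTQ records; `trim_five_prime` is a hypothetical name.

```python
def trim_five_prime(fastq_lines, n=3):
    """Yield FASTQ lines with the first n bases and quality values removed.

    Assumes every record is exactly 4 lines (header, sequence, '+', quality).
    The sequence and quality lines are trimmed by the same amount so they
    stay the same length as each other.
    """
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line and quality line
            yield line[n:]
        else:
            yield line
```

With paired reads, the same trimming would need to be applied to both files so that the two reads of each pair stay in the same order.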
I'd be grateful if anyone can explain what I'm seeing here.