
**TonyBrooks** · 06-04-2014, 06:24 AM

Originally posted by salamay

I have some metagenomic data obtained from whole-genome shotgun sequencing on an Illumina HiSeq. The reads are 100 bp paired-end, and when I examine them in FastQC I see a couple of things. Firstly, the per base sequence content and per base GC content seem to be very skewed at the beginning of the reads (~ bp 1-16), and the per base N content has a spike at bp 4. I also have overrepresented k-mers at the beginning of the reads which do not belong to any adapters (as far as I can tell). I know that these trends are sometimes seen in RNA-seq data due to the (not so) random hexamer priming, but I am confused as to why I see this in whole-genome data. I am also not sure about the N spike at bp 4. I have attached images of what I mentioned and would appreciate any insight.

thanks.

I'm assuming these were sequenced on a HiSeq? The spike at cycle 4 is most likely a phenomenon known as Bottom Middle Swath (or BMS in Illumispeak). Before scanning, the HiSeq attempts to find focus at a fixed point near the inlet port. If a bubble is sitting over that point, the focus is off and that particular swath is scanned out of focus. You should be able to see this if you look at the thumbnail images for cycle 4. Basecalling can't be done on these images, so each cluster in the affected swath is given an N at this position.
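To confirm the spike outside of FastQC, the fraction of N calls at each read position can be tallied directly from the FASTQ file. This is only a minimal sketch: the file name `reads_R1.fastq` is hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record.

```python
from collections import Counter


def read_sequences(fastq_path):
    """Yield the sequence line of each 4-line FASTQ record (plain text, not gzipped)."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def n_fraction_by_position(sequences):
    """Return a list where entry i is the fraction of reads with an N at base i + 1."""
    n_counts = Counter()
    totals = Counter()
    for seq in sequences:
        for i, base in enumerate(seq):
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [n_counts[i] / totals[i] for i in range(len(totals))]


# Example usage (hypothetical file name):
# fractions = n_fraction_by_position(read_sequences("reads_R1.fastq"))
# print(fractions[3])  # fraction of reads with an N at bp 4
```

A value at index 3 that is far above its neighbours would match the N spike at bp 4 described above.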

**salamay** · 06-04-2014, 06:49 AM

Thanks TonyBrooks, yes, it was on a HiSeq. I had not heard about this issue before, so thanks for bringing it to my attention.

**TonyBrooks** · 06-04-2014, 06:56 AM

Originally posted by TonyBrooks

I'm assuming these were sequenced on a HiSeq? The spike at cycle 4 is most likely a phenomenon known as Bottom Middle Swath (or BMS in Illumispeak). Before scanning, the HiSeq attempts to find focus at a fixed point near the inlet port. If a bubble is sitting over that point, the focus is off and that particular swath is scanned out of focus. You should be able to see this if you look at the thumbnail images for cycle 4. Basecalling can't be done on these images, so each cluster in the affected swath is given an N at this position.

See here for more info:

Bottom Middle Swath and other focus issues - SEQanswers
http://seqanswers.com/forums/showthread.php?t=15356
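If the thumbnails are not available, one rough way to check whether the Ns come from a single swath is to group reads by the tile number in the read header. The sketch below assumes Casava 1.8+ style headers (`@instrument:run:flowcell:lane:tile:x:y ...`); older header formats would need different parsing. It also assumes the usual HiSeq tile numbering, where the first digit of the tile is the surface and the second is the swath, so please check that against your own run.

```python
from collections import defaultdict


def n_rate_by_tile(records, cycle=4):
    """records: iterable of (header, sequence) pairs.

    Returns {tile: fraction of reads on that tile with an N at `cycle`}.
    Assumes headers like '@instrument:run:flowcell:lane:tile:x:y ...'.
    """
    n_reads = defaultdict(int)
    all_reads = defaultdict(int)
    for header, seq in records:
        fields = header.split(" ")[0].split(":")
        if len(fields) < 7 or len(seq) < cycle:
            continue  # header not in the assumed format, or read too short
        tile = fields[4]
        all_reads[tile] += 1
        if seq[cycle - 1] == "N":
            n_reads[tile] += 1
    return {tile: n_reads[tile] / all_reads[tile] for tile in all_reads}
```

If the N rate at cycle 4 is close to 1 on a handful of tiles that share the same first two digits and near 0 everywhere else, that would be consistent with one out-of-focus swath rather than a library problem.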

**lac302** · 06-04-2014, 08:31 AM

I've seen the same fluctuation in GC content over the first 20 or so bases on samples run on both the HiSeq and MiSeq. I typically have enough coverage to just trim them off, even though the Q scores are always above 30.
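For anyone who wants to put numbers on that fluctuation, here is a minimal sketch of a per-position GC calculation, roughly what the FastQC per base plots summarise. The function name is my own, and N calls are left out of the denominator, which is an assumption about how you want them handled.

```python
def gc_fraction_by_position(sequences):
    """Return a list where entry i is the GC fraction at base i + 1 (N calls ignored)."""
    gc = []
    total = []
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(total):
                # First time this position is seen: extend both tallies
                gc.append(0)
                total.append(0)
            if base == "N":
                continue
            total[i] += 1
            if base in "GC":
                gc[i] += 1
    return [g / t if t else float("nan") for g, t in zip(gc, total)]
```

Comparing the values for the first 20 positions with the rest of the read shows how far into the read the skew extends.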

**salamay** · 06-04-2014, 10:16 AM

Originally posted by lac302

I've seen the same fluctuation in GC content over the first 20 or so bases on samples run on both the HiSeq and MiSeq. I typically have enough coverage to just trim them off, even though the Q scores are always above 30.

Thanks lac302. So far I have trimmed the sequences up to bp 16 and worked from there, as you seem to have done, but I can't figure out the cause, or whether it is a bit wasteful to trim off 15 bp of useful sequence.
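For reference, fixed-length trimming from the start of each read can be sketched as follows. The default of 16 bases and the file names are assumptions taken from the numbers mentioned above, and the input is assumed to be an uncompressed FASTQ with four lines per record.

```python
def trim_five_prime(record, n_bases=16):
    """Drop the first n_bases from the sequence and quality strings of one record."""
    header, seq, plus, qual = record
    return header, seq[n_bases:], plus, qual[n_bases:]


def trim_fastq(in_path, out_path, n_bases=16):
    """Write a copy of in_path with the first n_bases removed from every read."""
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break  # end of file
            dst.write("\n".join(trim_five_prime(record, n_bases)) + "\n")


# Example usage (hypothetical file names):
# trim_fastq("reads_R1.fastq", "reads_R1.trimmed.fastq", n_bases=16)
```

Because no reads are dropped, running the same trim on both mate files keeps the pairs in the same order.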

**mastal** · 06-04-2014, 10:47 AM

Was the library prep done using a Nextera kit?

**salamay** · 06-04-2014, 12:26 PM

Originally posted by mastal

Was the library prep done using a Nextera kit?

I believe so, but I am not sure and have asked those responsible for generating the data. Would using a Nextera kit explain what is seen?

**mastal** · 06-04-2014, 03:24 PM

Originally posted by salamay

I believe so, but I am not sure and have asked those responsible for generating the data. Would using a Nextera kit explain what is seen?

Yes. There was a recent thread discussing this. I will post a link if I can find it.

FASTQC trends