Hi all,
I'm new here; I was referred to this forum by a wonderful post-doc. I'm a first-year PhD student, so go easy on me...
So I'm working with NGS data (duh) and I have a couple of questions.
A little background: I have ~350 paired-end FASTQ files. Illumina HiSeq, I'm guessing.
1. Is the adapter sequence the same for each file?
2. How do I determine the adapter sequence? I've been using FastQC, and under "Overrepresented sequences" I get this: "GATCGGAAGAGCGTCGTGTAGGGAAAGAGGGTAGATCTCGGTGGTCGCCG"
Now, when I run my adapter filtering program (cutadapt, for example), is the whole sequence my adapter? The post-doc truncated this sequence to "AGATCGGAAGAGC". Is he right?
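To make the comparison concrete, here is a small Python sketch of how the two sequences line up. The helper `longest_common_substring` is my own illustration, not part of FastQC or cutadapt. It only shows that the post-doc's sequence, minus its leading "A", matches the first 12 bases of the overrepresented sequence; it doesn't tell me which one I should pass to the trimmer.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of bases shared by a and b (simple dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous base of both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Sequences copied from the post above.
overrepresented = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGGGTAGATCTCGGTGGTCGCCG"
truncated = "AGATCGGAAGAGC"

shared = longest_common_substring(truncated, overrepresented)
offset = overrepresented.find(shared)

print(f"shared bases: {shared} (length {len(shared)})")
print(f"position in the overrepresented sequence: {offset}")
```

So the two agree on "GATCGGAAGAGC" right at the start of the overrepresented sequence, and the only difference at that end is the extra leading "A" in the post-doc's version.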
I'm a little confused about the details of adapter filtering (I understand why you do it: to remove the adapter region, which is not part of the query sequence).
Thanks for all the help.