I'm not sure if I am doing this right...
I have paired-end reads from Serratia m.
Here are the steps I took so far:
1. I ran FastQC on every read file to check whether adapters were a source of contamination. I looked at "Overrepresented Sequences" to see whether an adapter was listed. If every sequence there was labeled "No Hit", I left that file alone.
2. I used cutadapt to remove adapters, which left me with 2 adapter-trimmed paired-end files as well as a file of single-end reads.
I did the same with scythe, except scythe didn't produce a single-end reads file.
3. I noticed that the paired-end files were no longer in matching order by header, so I wrote a script to correct this. My script takes the 2 paired-end files and outputs 2 properly ordered paired-end files, plus a file containing all the single-end reads that did not have a mate.
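For illustration, a re-pairing step like the one in step 3 might look roughly like the sketch below. This is not the actual script: the function names (`read_fastq`, `base_id`, `repair_pairs`) are hypothetical, and it assumes complete 4-line FASTQ records, read IDs that match once a trailing `/1` or `/2` and any header comment are dropped, and a second file small enough to hold in memory.

```python
def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of FASTQ lines.

    Assumes every record is exactly 4 lines (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def base_id(header):
    """Return the read ID shared by both mates.

    Drops the leading '@', anything after the first whitespace,
    and a trailing /1 or /2 (assumed naming convention).
    """
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def repair_pairs(r1_records, r2_records):
    """Split records into ordered pairs and unpaired singles.

    Returns (paired1, paired2, singles); paired1[i] and paired2[i] are mates.
    """
    # Index the second file by read ID (held entirely in memory).
    r2_by_id = {base_id(rec[0]): rec for rec in r2_records}
    paired1, paired2, singles = [], [], []
    for rec in r1_records:
        mate = r2_by_id.pop(base_id(rec[0]), None)
        if mate is None:
            singles.append(rec)      # R1 read whose mate is missing
        else:
            paired1.append(rec)
            paired2.append(mate)
    singles.extend(r2_by_id.values())  # R2 reads left without a mate
    return paired1, paired2, singles
```

The paired output follows the order of the first file, so the two paired files stay in sync line for line, which is what paired-end mappers generally expect.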
What do I do now?
I'm confused about what to do with the single-end reads produced by cutadapt, and also about what to do with the single-end reads produced by my script that re-pairs the FASTQ files by header.
When I move on to the trimming stage, do I trim ONLY my paired-end reads and ignore the single-end reads? Or do I trim the paired-end reads as well as the single-end reads?
When I am looking for the SNPs of 1 replicate, do I map the paired-end reads as well as any single-end reads onto the reference genome?
Edit:
TL;DR
Are these the right steps to take before I start mapping my reads?
1. Cut adapters (gives SE file)
2. Quality Trim (gives SE file)
3. Organize pairs (gives SE file)
So by the end of this whole process I am left with 3 SE files and 2 processed paired-end files, for a total of 5 files.
Do I need to quality trim the single-end reads, or do I just take all 5 of my files and map them to the reference?