Hello, I'm currently learning how to process NGS data using the Galaxy platform. This is the first time I've worked with NGS data, and I'm overwhelmed by the abundance of variant-calling workflows and available tools. I have a molecular biology background and I'm learning this on my own through online courses, so I'd like some feedback in case I'm making mistakes. While I can code in Python, I want to build this workflow in Galaxy as part of a course.
For the purpose of learning, I was given raw FASTQ reads from an Illumina MiSeq, sequenced as paired ends, 125 bp in length. The data are targeted re-sequencing data for a father, mother and child trio. I need to create a workflow to identify polymorphic sites in all three individuals.
I started a workflow based on the references below:
My current, incomplete attempt is available at the link below. Some steps from the references were skipped for the sake of simplicity. I'm making my best effort to understand what each step really does and why it is used. You can import the workflow into Galaxy for a better view:
Briefly, the paired-end reads had 10 bp trimmed from the 3' end (based on the FASTQ quality report; this step is not in the workflow), resulting in high-quality reads of about 140 bp. The paired reads for each individual were aligned to the reference human_g1k_v37 with BWA-MEM, with different read group information assigned to each individual. The resulting alignment BAM for each individual was pre-processed with Picard: sorting, removal of ambiguous reads and duplicates, and updating of mate-pair information. I'm omitting indel realignment and base quality recalibration on purpose. The resulting three BAMs could be used for variant calling, but now I have some questions.
I'm expected to count the number of variants of different types above a certain quality threshold.
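As a rough illustration of that counting step, a minimal Python sketch could look like the following. It assumes the variant caller produces a standard tab-separated VCF with a numeric QUAL column; the function names `classify` and `count_variants` and the threshold of 30 are hypothetical choices for the example, not part of the workflow or the course material. Symbolic alleles are skipped to keep it simple.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variants(vcf_lines, min_qual=30.0):
    """Count variant types among VCF records with QUAL >= min_qual.

    min_qual=30.0 is an arbitrary example threshold (assumption).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, qual = fields[3], fields[4], fields[5]
        # Skip records with missing or low quality.
        if qual == "." or float(qual) < min_qual:
            continue
        # A record can list several ALT alleles separated by commas.
        for alt in alts.split(","):
            # Ignore missing, spanning-deletion and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            counts[classify(ref, alt)] += 1
    return counts
```

With an uncompressed VCF file, this could be called as `count_variants(open("calls.vcf"))`, where `calls.vcf` is a placeholder file name.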
I'm not sure whether it was a good choice to align the data for each individual separately. Is it correct to do variant calling on each individual separately? Can I still merge these BAM files with Picard and then do variant calling, and will they retain the correct alignment information? Or should I merge the read information before the alignment? Can these choices alter the results of the workflow? I've read about converting FASTQ to SAM/BAM and merging the files into an unmapped BAM before the alignment and subsequent pre-processing. Do I really need to do that?
Is my workflow actually producing useful data? Please let me know if I'm making a mistake; I'm a little unsure whether what I did is right. Please explain things in detail, because I'm still unfamiliar with NGS data processing.
Thanks in advance

Eduardo