Hello,
I am new to ChIP-seq data mining (by myself) and I am having a hard time comparing two ChIP-seq data sets. Could you tell me if my pipeline is correct and if I should add a normalization step somewhere? I am simply going through this with each one of my IP-Input files and looking at the genome browser. The problem is the IP of sample B is sequenced at a bigger depth than sample A and it seems the peaks are not all being called. When I compare to previously published data it seems I am missing some peaks on sample B. I thought about changing some thresholds on sample B but then the pipeline would be different from sample A... So, what should I do to ensure all my peaks are present in sample B but the data is still comparable to sample A? Thanks!!!
Samples : H3K4me3 and Input data on sample A and H3K4me3 and Input data on sample B.
Original file : BAM (from the sequences).
My pipeline :
Sort pair-ended BAM files by read name
samtools sort –n c3h.input.bam c3h.input.sorted
Convert BAM files to FASTQ format
bam2fastx -q -Q -A -o c3h.input.fastq -P -N c3h.input.sorted.bam
Map reads to the genome
bowtie2 –p 8 -x ~/Data/Genomes/mm10/bt2idx/bt2idx -1 c3h.input.1.fastq -2 c3h.input.2.fastq -S c3h.input.sam&
Bowtie2 options:
End-to-end alignment (default) vs. local alignment (--local)
Paired-end alignment: -1 mate1.fq -2 mate2.fq
Concordant + discordant alignment (default) vs. concordant only (--no-discordant)
Mixed mode: paired where possible; otherwise unpaired (default) vs. paired-only mode (--no-mixed)
Overlapping with the other mate is concordant (default) vs. (--no-overlap)
Containing the other mate is concordant (default) vs. (--no-contain)
Dovetailing the other mate is discordant (default) vs. (--dovetail)
Bowtie2 output:
Mapping quality: MAPQ
Peak analysis using MACS2
macs2 callpeak -t ~/Data/BCCRC/chipseq/mapping/tt2.k4m3.sam -c ~/Data/BCCRC/chipseq/mapping/tt2.input.sam -f SAM -g mm -n tt2.k4m3 -B -q 0.01 --outdir ~/Data/BCCRC/chipseq/tt2.k4m3.out
MACS2 options:
-g (genome size, mm=1.87e9)
-B/-bdg (bedGraph format for pile-up files)
-q (The qvalue (minimum FDR) cutoff to call significant regions)
Thank you! I am a bit at lost as it has been already a couple of months we are trying to get this done...
I am new to ChIP-seq data mining (by myself) and I am having a hard time comparing two ChIP-seq data sets. Could you tell me if my pipeline is correct and if I should add a normalization step somewhere? I am simply going through this with each one of my IP-Input files and looking at the genome browser. The problem is the IP of sample B is sequenced at a bigger depth than sample A and it seems the peaks are not all being called. When I compare to previously published data it seems I am missing some peaks on sample B. I thought about changing some thresholds on sample B but then the pipeline would be different from sample A... So, what should I do to ensure all my peaks are present in sample B but the data is still comparable to sample A? Thanks!!!
Samples : H3K4me3 and Input data on sample A and H3K4me3 and Input data on sample B.
Original file : BAM (from the sequences).
My pipeline :
Sort pair-ended BAM files by read name
samtools sort –n c3h.input.bam c3h.input.sorted
Convert BAM files to FASTQ format
bam2fastx -q -Q -A -o c3h.input.fastq -P -N c3h.input.sorted.bam
Map reads to the genome
bowtie2 –p 8 -x ~/Data/Genomes/mm10/bt2idx/bt2idx -1 c3h.input.1.fastq -2 c3h.input.2.fastq -S c3h.input.sam&
Bowtie2 options:
End-to-end alignment (default) vs. local alignment (--local)
Paired-end alignment: -1 mate1.fq -2 mate2.fq
Concordant + discordant alignment (default) vs. concordant only (--no-discordant)
Mixed mode: paired where possible; otherwise unpaired (default) vs. paired-only mode (--no-mixed)
Overlapping with the other mate is concordant (default) vs. (--no-overlap)
Containing the other mate is concordant (default) vs. (--no-contain)
Dovetailing the other mate is discordant (default) vs. (--dovetail)
Bowtie2 output:
Mapping quality: MAPQ
Peak analysis using MACS2
macs2 callpeak -t ~/Data/BCCRC/chipseq/mapping/tt2.k4m3.sam -c ~/Data/BCCRC/chipseq/mapping/tt2.input.sam -f SAM -g mm -n tt2.k4m3 -B -q 0.01 --outdir ~/Data/BCCRC/chipseq/tt2.k4m3.out
MACS2 options:
-g (genome size, mm=1.87e9)
-B/-bdg (bedGraph format for pile-up files)
-q (The qvalue (minimum FDR) cutoff to call significant regions)
Thank you! I am a bit at lost as it has been already a couple of months we are trying to get this done...