No, mateOrientation is not relevant when you use pileup. Still, you need to set this parameter to something
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Hi Valeu,
Sorry to be so insistent in this aspect. I re-run control-freec on an mpileup file of one my samples with and without BAF options and I found a few differences between both runs. First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample? This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
Is this a fair comparison? Am I missing something which prevents me from understanding these results?
Thanks in advance.
Cheers,
Fernando
Ps: find below the parameters of my config file. As I said, I run it plus and minus BAF, i.e., BAF commented.
[general]
chrLenFile = hg19.len
coefficientOfVariation = 0.05
outputDir = ./ch209_cnv_CNA_only
degree = 3
ploidy = 2
samtools = /usr/local/biotools/bin/samtools
sex = XY
chrFiles = /home/fernandr/biotools/references/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
# step = 5000
# window = 20000
[sample]
mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/sample_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR
[control]
mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/control_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR
# [BAF]
#
# SNPfile = /home/fernandr/biotools/references/freec/hg19/hg19_snp131.SingleDiNucl.1based.txt
# minimalCoveragePerPosition = 1
# minimalQualityPerPosition = 0
# shiftInQuality = 33
Comment
-
Posted by fjrossello
Hi Valeu,
Thanks for your explanation and in regards to the R plots, I downloaded the latest makeGraph.R and works perfectly.
Cheers,
Fernando
I´m having problems finding the most recent version of this.
Thanks in advance
Comment
-
Help with FREEC output
First of, I must say FREEC is a great tool for CNV detection in exome seq data!
I have a few questions about the output files I obtained.
I have two _cnv, _ratio, _BAF files.
For instance, how is *_mpileup_CNV different from *mpileup_normal_CNV? and depending on the file I use my R plots are so different! why would this be?
PS: I have paired end tumor-normal illumina data from exome sequencing.
~Thanks for your help,
Rini
Comment
-
First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample?
This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
Is this a fair comparison? Am I missing something which prevents me from understanding these results?
Another reason it can be different: imagine you have a region present in 3 copies in the normal and in 9 copies in the tumor. If you don't use BAF, you will get ratio 3 for this region. Since it is 3>1, this region will be called "gain". If you use [BAF], this region will be identified as "gain" for both normal and tumor samples and thus this gain will be called germline.
Comment
-
SegFault when running Freec
Hi all. First, I wanted to say thanks to valeu for taking the time to answer questions on here, I've found some of the advice to be very useful.
Second, I've been having a problem with segmentation faults when trying to use Freec to compute BAF. I had previously been using Freec to analyze exome samples without calculating BAF, and had success calling CNVs from the .bam files. However, when I tried to compute BAF (which required converting the .bams into pileup files, as well as using a file of known SNPs) I ran into some problems.
Specifically, the program dies after about 1 second of runtime with the following as the specific error:
line 13: 269410 Segmentation fault
This occurs after the program outputs the following:
..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions
Which suggests that it might be something with my SNP file. However, I have omitted this file and still get a segfault when it tries to read my sample pileups.
My configuration file is below. Any help on what may be causing this would be appreciated.
[general]
window = 5000
step = 1000
ploidy = 2
samtools = /home/sf062971/programs/samtools-0.1.18/samtools
minCNAlength = 4
BedGraphOutput = TRUE
chrLenFile = /home/sf062971/resources/freec_resources/mm10.len
chrFiles = /data/sf062971/data/reference/chr_files
noisyData = TRUE
printNA=FALSE
maxThreads=6
sex=XX
breakPointType=4
outputDir = 1148T_1205N_V3
contamination = 0.5
contaminationAdjustment = TRUE
[sample]
mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351148/1148_ALIGN_RECAL_V3/1148_EXOME.mpileup
inputFormat = pileup
mateOrientation = 0
[control]
mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351205/1205_ALIGN_RECAL_V3/1205_EXOME.mpileup
inputFormat = pileup
mateOrientation = 0
[BAF]
SNPfile = /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt
minimalCoveragePerPosition = 10
[target]
captureRegions = /home/sf062971/resources/agilent_data/covered_regions_mm10_merged_sorted.bed
Comment
-
Originally posted by smapdy View Post..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions
Which suggests that it might be something with my SNP file.
To create my dbSNP files, I downloaded a file with SNPs from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), from “Variation and Repeats”/”All SNPs” table. And I kept columns 2, 4, 10, 7, 8 and 5. And I kept only entries with “genomic single”.
When you are sure you use the correct SNPfile, check you pileups. They should look like this (http://samtools.sourceforge.net/pileup.shtml):
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
Comment
-
Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:
chr1 3000568 A/G + T rs29444956
chr1 3000621 A/C + C rs31439779
chr1 3001490 A/C + C rs31521921
chr1 3001579 A/T + A rs30468828
chr1 3001712 C/G + C rs32793997
chr1 3003268 A/G + A rs30748911
chr1 3003414 A/G + A rs31953890
chr1 3003449 C/T + T rs32186899
chr1 3003464 A/G + G rs31079645
chr1 3003508 C/T + C rs32044173
My .pileup files are formatted as follows:
chr1 3216016 T 44 ......................,.........,..,,,...,^].^];=>>?>?<=??>>???>=>?>>??>>??>>>>@=>?<<::=;<<
chr1 3216017 G 45 ......................,.........,..,,,...,..^].=@@ACCC?ACC@ACCC<BAAB5?CCBCCAAABAAB?:?>?=;?><
chr1 3216018 T 47 ......................,.........,..,,,...,...^].^], :9?=>@9==@>>>@>?=><@>=<==>@?>>=?>>?=;<===9;;;<9
chr1 3216019 A 48 .$.....................,.........,..,,,...,....,^]. 99><<??3<??<<???<<;><<<=?=?><<<>?>>=8<<<=6;:9;:<
chr1 3216020 T 47 .....................,.........,..,,,...,....,.:==???<=?>==???===?=4?<?=??===?@>??<??=?>=;;:;<
I believe this is standard .pileup format. Despite appearances both of the above are actually tab-delimited.
In the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
Comment
-
Originally posted by smapdy View PostThanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:
Originally posted by smapdy View PostIn the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
Comment
-
I ended up figuring out what was going on. I had some multiallelic variants in the .snp file that were causing it to fail to load, and my sex variable in the configuration file didn't match up with the actual sample sex which caused problems as well. I ended up dropping the sex argument and using the following general configuration file for my samples:
[general]
window = 8000
step = 2500
samtools = samtools
minCNAlength = 4
BedGraphOutput = TRUE
chrLenFile = NCBIM37_um.fa.len
chrFiles = chrfiles
outputDir = 31208T_31668N_FREEC_V1
printNA = FALSE
maxThreads = 6
ploidy = 2
breakPointType = 4
contaminationAdjustment = TRUE
noisyData = TRUE
[sample]
mateFile = 31208_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0
[control]
mateFile = 31668_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0
[target]
captureRegions = S0276129_Merged_Sorted_Probes.bed
[BAF]
SNPfile = snp128.singlebases.monoalleleic.freec_baf.txt
minimalCoveragePerPosition = 5
If anyone is interested I also have the commands I used to generate the pileups from the .bams, as well as the script I used to generate a working Mm9 and Mm10 .snp file.
Comment
-
Hi ,
I am using control-FREEC with exome sequencing data, so far I have been successful in implementing it on my normal control tumor pairs for CNA detection. I am now curious to apply it further for CNA-LOH detection , how ever when am trying to run it, it undergoes segmentation fault. I checked the files and everything is fine. Below am attaching the config file. Anyone who has already applied it and overcome this problem can give me some suggestions.
[general]
chrLenFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
window = 500
step = 250
ploidy = 2
outputDir = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
BedGraphOutput=TRUE
breakPointType=4
gemMappabilityFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/out100m1_hg19.gem
chrFiles = /scratch/GT/vdas/test_exome/exome/
maxThreads=6
breakPointThreshold=1.5
noisyData=TRUE
printNA=FALSE
#breakPointThreshold = -.002;
#window = 50000
#chrFiles = hg18/hg18_per_chromosome
#outputDir = test
#degree=3
#intercept = 0
[sample]
mateFile = /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
inputFormat = pileup
mateOrientation = 0
[control]
mateFile = /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
inputFormat = pileup
mateOrientation = 0
[BAF]
SNPfile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt
minimalCoveragePerPosition = 5
[target]
captureRegions = /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed
Comment
-
Hi,
The config looks good. Could you please share the complete output into the command line with me? [email protected]
Thank you
Valentina
Comment
-
Dear Velntina,
Please find the output for the above call, I have made 6 such calls but all had the same problem. Below am attaching the log of the output run and at what stage I get the segmentation fault.
/scratch/GT/softwares/FREEC_Linux64/freec -conf config_ctrl5.txt
Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
MT-mode using 6 threads
..Minimal CNA length (in windows) is 1
..Breakpoint threshold for segmentation of copy number profiles is 1.5
..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
..telocenromeric set to 50000
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Window = 50000 was set
..Step: 250
..Output directory: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
..Directory with files containing chromosome sequences: /scratch/GT/vdas/test_exome/exome/
..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
..Sample file: /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
..Sample input format: pileup
..Control file: /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
..Input format for the control file: pileup
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..File with chromosome lengths: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 4
Warning: I would not recommend using '[general] noisyData=true' for whole genome data; you can miss some real CNAs in this case
..minimal number of reads per window in the control sample is set to 10
..will use SNP positions from /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to calculate BAF profiles
..Starting reading /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to get SNP positions
Segmentation fault
Comment
Latest Articles
Collapse
-
by seqadmin
Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.
Long-Read Sequencing
Long-read sequencing has seen remarkable advancements,...-
Channel: Articles
12-02-2024, 01:49 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Today, 07:59 AM
|
0 responses
6 views
0 likes
|
Last Post
by seqadmin
Today, 07:59 AM
|
||
Newborn Genomic Screening Shows Promise in Reducing Infant Mortality and Hospitalization
by seqadmin
Started by seqadmin, Yesterday, 08:22 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
Yesterday, 08:22 AM
|
||
Started by seqadmin, 12-02-2024, 09:29 AM
|
0 responses
171 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 09:29 AM
|
||
Started by seqadmin, 12-02-2024, 09:06 AM
|
0 responses
61 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 09:06 AM
|
Comment