Control-FREEC: a tool for assessing copy number and allelic content using NGS data

bhdavis1978 replied

08-11-2014, 01:42 PM
Hi Valeu,

I want to be able to use the copy number data generated using control freec as input for a regression of copy number against other genomic and epigenetic features, so having higher precision is very useful to me.

Assuming 30X coverage & read length = 100 bp, and a desire to have 400 reads / window suggests to me that the minimum recommended window size is about 1333 =(400 / 30 * 100). I was hoping to have the window size set to about 500 bp, which would imply about 150 reads per window.

What would be the consequences of this? More variability in the copy number estimation? More breakpoints? Less confidence in identifying break points?
Leave a comment:
valeu replied

08-11-2014, 01:32 PM
Originally posted by bhdavis1978 View Post

Beyond the issue of additional running time, are there any reasons not to run control-freec with the degree setting set to 7 or 9 instead of 3 or 5?

I did not try, but I think with degree >3 the result will be very similar to that with degree==3.

Similarly, are there any problems with setting the window and step size relatively small (I was thinking of 500 bp and 250 bp respectively)?
Thanks

The ideal window size depends on read density. It is good to have about 400 reads per window. Alternatively, the window size can be evaluated automatically by FREEC using the read count and based on Poisson distribution.
Leave a comment:
bhdavis1978 replied

08-11-2014, 01:11 PM
Disadvantages to setting degree to 7, or 9 and small window and step sizes

Hello,

Beyond the issue of additional running time, are there any reasons not to run control-freec with the degree setting set to 7 or 9 instead of 3 or 5?

Similarly, are there any problems with setting the window and step size relatively small (I was thinking of 500 bp and 250 bp respectively)?

Thanks
Leave a comment:
valeu replied

07-14-2014, 06:31 AM
There is a problem with this SNP file. Please try another one from the FREEC website.
Leave a comment:
vd4mindia replied

07-14-2014, 03:48 AM
Dear Velntina,

Please find the output for the above call, I have made 6 such calls but all had the same problem. Below am attaching the log of the output run and at what stage I get the segmentation fault.

/scratch/GT/softwares/FREEC_Linux64/freec -conf config_ctrl5.txt
Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
MT-mode using 6 threads
..Minimal CNA length (in windows) is 1
..Breakpoint threshold for segmentation of copy number profiles is 1.5
..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
..telocenromeric set to 50000
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Window = 50000 was set
..Step: 250
..Output directory: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
..Directory with files containing chromosome sequences: /scratch/GT/vdas/test_exome/exome/
..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
..Sample file: /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
..Sample input format: pileup
..Control file: /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
..Input format for the control file: pileup
..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
..File with chromosome lengths: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
..Using the default minimal mappability value of 0.85
..uniqueMatch = FALSE
..average ploidy set to 2
..break-point type set to 4
Warning: I would not recommend using '[general] noisyData=true' for whole genome data; you can miss some real CNAs in this case
..minimal number of reads per window in the control sample is set to 10
..will use SNP positions from /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to calculate BAF profiles
..Starting reading /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to get SNP positions
Segmentation fault

Just after the Segmentation fault it stops and I do not get any outputs in the directory where I have mentioned. Do you want to me to mail again the log status?
Leave a comment:
valeu replied

07-14-2014, 03:43 AM
Hi,

The config looks good. Could you please share the complete output into the command line with me? [email protected]

Thank you
Valentina
Leave a comment:
vd4mindia replied

07-14-2014, 02:37 AM
Hi ,

I am using control-FREEC with exome sequencing data, so far I have been successful in implementing it on my normal control tumor pairs for CNA detection. I am now curious to apply it further for CNA-LOH detection , how ever when am trying to run it, it undergoes segmentation fault. I checked the files and everything is fine. Below am attaching the config file. Anyone who has already applied it and overcome this problem can give me some suggestions.

[general]

chrLenFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
window = 500

step = 250
ploidy = 2

outputDir = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
BedGraphOutput=TRUE
breakPointType=4

gemMappabilityFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/out100m1_hg19.gem

chrFiles = /scratch/GT/vdas/test_exome/exome/

maxThreads=6

breakPointThreshold=1.5
noisyData=TRUE
printNA=FALSE
#breakPointThreshold = -.002;
#window = 50000
#chrFiles = hg18/hg18_per_chromosome
#outputDir = test
#degree=3
#intercept = 0

[sample]

mateFile = /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
inputFormat = pileup
mateOrientation = 0

[control]

mateFile = /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
inputFormat = pileup
mateOrientation = 0

[BAF]

SNPfile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt
minimalCoveragePerPosition = 5

[target]

captureRegions = /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed
Leave a comment:
ymc replied

09-01-2013, 11:22 PM
Does Control-FREEC allows normal and tumor with different coverage? (ExomeCNV doesn't allow different coverage, that's why I ask)

Also, does it allow estimation of contamination rate in the tumor sample? (Probably via LOH route like ExomeCNV?)
Leave a comment:
smapdy replied

07-23-2013, 10:58 AM
I ended up figuring out what was going on. I had some multiallelic variants in the .snp file that were causing it to fail to load, and my sex variable in the configuration file didn't match up with the actual sample sex which caused problems as well. I ended up dropping the sex argument and using the following general configuration file for my samples:
[general]
window = 8000
step = 2500
samtools = samtools
minCNAlength = 4
BedGraphOutput = TRUE
chrLenFile = NCBIM37_um.fa.len
chrFiles = chrfiles
outputDir = 31208T_31668N_FREEC_V1
printNA = FALSE
maxThreads = 6
ploidy = 2
breakPointType = 4
contaminationAdjustment = TRUE
noisyData = TRUE

[sample]
mateFile = 31208_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0

[control]
mateFile = 31668_EXOME.pileup.gz
inputFormat = pileup
mateOrientation = 0

[target]
captureRegions = S0276129_Merged_Sorted_Probes.bed

[BAF]
SNPfile = snp128.singlebases.monoalleleic.freec_baf.txt
minimalCoveragePerPosition = 5

If anyone is interested I also have the commands I used to generate the pileups from the .bams, as well as the script I used to generate a working Mm9 and Mm10 .snp file.
Leave a comment:
valeu replied

07-04-2013, 08:44 AM
Originally posted by smapdy View Post

Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:

Both formats look correct. If you want me to debug the code please send me an email with the config file, command line output and other files necessary to run FREEC.

Originally posted by smapdy View Post

In the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?

If you disable [BAF] you may get less accurate calls. However, the result should be almost the same as the one you will obtain with [BAF] and noisyData=FALSE.
Leave a comment:
smapdy replied

07-04-2013, 08:30 AM
Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:

chr1 3000568 A/G + T rs29444956
chr1 3000621 A/C + C rs31439779
chr1 3001490 A/C + C rs31521921
chr1 3001579 A/T + A rs30468828
chr1 3001712 C/G + C rs32793997
chr1 3003268 A/G + A rs30748911
chr1 3003414 A/G + A rs31953890
chr1 3003449 C/T + T rs32186899
chr1 3003464 A/G + G rs31079645
chr1 3003508 C/T + C rs32044173

My .pileup files are formatted as follows:

chr1 3216016 T 44 ......................,.........,..,,,...,^].^];=>>?>?<=??>>???>=>?>>??>>??>>>>@=>?<<::=;<<
chr1 3216017 G 45 ......................,.........,..,,,...,..^].=@@ACCC?ACC@ACCC<BAAB5?CCBCCAAABAAB?:?>?=;?><
chr1 3216018 T 47 ......................,.........,..,,,...,...^].^], :9?=>@9==@>>>@>?=><@>=<==>@?>>=?>>?=;<===9;;;<9
chr1 3216019 A 48 .$.....................,.........,..,,,...,....,^]. 99><<??3<??<<???<<;><<<=?=?><<<>?>>=8<<<=6;:9;:<
chr1 3216020 T 47 .....................,.........,..,,,...,....,.:==???<=?>==???===?=4?<?=??===?@>??<??=?>=;;:;<

I believe this is standard .pileup format. Despite appearances both of the above are actually tab-delimited.

In the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
Leave a comment:
valeu replied

07-04-2013, 02:32 AM
Originally posted by smapdy View Post

..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

Which suggests that it might be something with my SNP file.

Hi, why don't you use the provided hg19_snp137.SingleDiNucl.1based.txt (created by Niklas Malmqvist)? Historically, the order of columns is different from the UCSC file.

To create my dbSNP files, I downloaded a file with SNPs from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), from “Variation and Repeats”/”All SNPs” table. And I kept columns 2, 4, 10, 7, 8 and 5. And I kept only entries with “genomic single”.

When you are sure you use the correct SNPfile, check you pileups. They should look like this (http://samtools.sourceforge.net/pileup.shtml):

seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
Leave a comment:
smapdy replied

07-03-2013, 12:44 PM
SegFault when running Freec

Hi all. First, I wanted to say thanks to valeu for taking the time to answer questions on here, I've found some of the advice to be very useful.

Second, I've been having a problem with segmentation faults when trying to use Freec to compute BAF. I had previously been using Freec to analyze exome samples without calculating BAF, and had success calling CNVs from the .bam files. However, when I tried to compute BAF (which required converting the .bams into pileup files, as well as using a file of known SNPs) I ran into some problems.
Specifically, the program dies after about 1 second of runtime with the following as the specific error:

line 13: 269410 Segmentation fault

This occurs after the program outputs the following:

..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

Which suggests that it might be something with my SNP file. However, I have omitted this file and still get a segfault when it tries to read my sample pileups.

My configuration file is below. Any help on what may be causing this would be appreciated.

[general]

window = 5000
step = 1000

ploidy = 2

samtools = /home/sf062971/programs/samtools-0.1.18/samtools

minCNAlength = 4

BedGraphOutput = TRUE

chrLenFile = /home/sf062971/resources/freec_resources/mm10.len

chrFiles = /data/sf062971/data/reference/chr_files

noisyData = TRUE

printNA=FALSE
maxThreads=6
sex=XX
breakPointType=4

outputDir = 1148T_1205N_V3

contamination = 0.5
contaminationAdjustment = TRUE

[sample]

mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351148/1148_ALIGN_RECAL_V3/1148_EXOME.mpileup

inputFormat = pileup

mateOrientation = 0

[control]

mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351205/1205_ALIGN_RECAL_V3/1205_EXOME.mpileup

inputFormat = pileup

mateOrientation = 0

[BAF]

SNPfile = /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt

minimalCoveragePerPosition = 10

[target]

captureRegions = /home/sf062971/resources/agilent_data/covered_regions_mm10_merged_sorted.bed
Leave a comment:
valeu replied

07-02-2013, 01:41 AM
First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample?

"Yes", if you don't use [BAF] and "No" if you do.

This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
Is this a fair comparison? Am I missing something which prevents me from understanding these results?

I would say that it is normal that you get different results. When you use [BAF], you use more information to (1) segment your data and (2) to annotate the resulting CNAs.

Another reason it can be different: imagine you have a region present in 3 copies in the normal and in 9 copies in the tumor. If you don't use BAF, you will get ratio 3 for this region. Since it is 3>1, this region will be called "gain". If you use [BAF], this region will be identified as "gain" for both normal and tumor samples and thus this gain will be called germline.
Leave a comment:
valeu replied

07-02-2013, 01:22 AM
Originally posted by rduarte View Post

Can someone tell me the link to download makeGraph.R?
I´m having problems finding the most recent version of this.

Thanks in advance

This is the latest version so far: makeGraph.R
Leave a comment:

Previous 1 2 3 4 5 6 template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Today, 08:47 AM	0 responses 11 views 0 likes	Last Post by seqadmin Today, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 59 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News