Control-FREEC: a tool for assessing copy number and allelic content using NGS data

rpauly replied

06-26-2013, 09:27 AM
Help with FREEC output

First off, I must say FREEC is a great tool for CNV detection in exome-seq data!
I have a few questions about the output files I obtained.

I have two of each of the _cnv, _ratio and _BAF files.
For instance, how is *_mpileup_CNV different from *mpileup_normal_CNV? Also, my R plots look very different depending on which file I use. Why would this be?

PS: I have paired-end tumor-normal Illumina data from exome sequencing.

~Thanks for your help,
Rini
Attached Files

brother_tumor_ffpe_CCDS_only.mpileup_normal_ratio.txt.png (29.8 KB, 59 views)

brother_tumor_ffpe_CCDS_only.mpileup_ratio.txt.png (64.7 KB, 64 views)
rduarte replied

06-10-2013, 05:21 AM
Posted by fjrossello
Hi Valeu,
Thanks for your explanation and in regards to the R plots, I downloaded the latest makeGraph.R and works perfectly.
Cheers,
Fernando

Can someone tell me the link to download makeGraph.R?
I'm having trouble finding the most recent version.

Thanks in advance
fjrossello replied

01-21-2013, 03:36 PM
Hi Valeu,

Sorry to be so insistent on this point. I re-ran Control-FREEC on an mpileup file of one of my samples with and without the BAF options, and I found a few differences between the two runs. First, a simple and rather obvious question: if you have a matched control file, does the CNA-only analysis output only the somatic gain/loss regions of the sample? I ask because the CNA+BAF run outputs a CNVs file that reports genotype information and gain/loss/normal for the predicted copy number. When I filter this file to keep only somatic gains/losses and compare it to the CNA-only output, the results are not quite the same.
Is this a fair comparison? Am I missing something that prevents me from understanding these results?
Thanks in advance.
Cheers,
Fernando

PS: the parameters of my config file are below. As I said, I ran it with and without BAF, i.e., with the [BAF] section commented out.

[general]
chrLenFile = hg19.len
coefficientOfVariation = 0.05
outputDir = ./ch209_cnv_CNA_only
degree = 3
ploidy = 2
samtools = /usr/local/biotools/bin/samtools
sex = XY
chrFiles = /home/fernandr/biotools/references/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
# step = 5000
# window = 20000

[sample]

mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/sample_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR

[control]

mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/control_bwa_wg.mpileup
inputFormat = pileup
mateOrientation = FR

# [BAF]
#
# SNPfile = /home/fernandr/biotools/references/freec/hg19/hg19_snp131.SingleDiNucl.1based.txt
# minimalCoveragePerPosition = 1
# minimalQualityPerPosition = 0
# shiftInQuality = 33
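
One way to make the comparison described above concrete is to count how many base pairs of one call set are covered by calls with the same status in the other. The following is only a sketch: the tuple layout, the function names and the example coordinates are hypothetical, and it does not reproduce FREEC's output format or anything FREEC does internally. It also assumes that segments within one call set do not overlap each other.

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end, status), with inclusive coordinates.
Segment = Tuple[str, int, int, str]


def total_bases(segments: List[Segment]) -> int:
    """Sum the lengths of all segments (inclusive coordinates)."""
    return sum(end - start + 1 for _, start, end, _ in segments)


def shared_bases(a: List[Segment], b: List[Segment]) -> int:
    """Count bases of `a` covered by a segment of `b` with the same status."""
    shared = 0
    for chrom_a, start_a, end_a, status_a in a:
        for chrom_b, start_b, end_b, status_b in b:
            if chrom_a != chrom_b or status_a != status_b:
                continue
            lo = max(start_a, start_b)
            hi = min(end_a, end_b)
            if lo <= hi:
                shared += hi - lo + 1
    return shared


def concordance(a: List[Segment], b: List[Segment]) -> float:
    """Fraction of the bases called in `a` that `b` calls with the same status."""
    denominator = total_bases(a)
    return shared_bases(a, b) / denominator if denominator else 0.0


# Made-up example calls, only to show the arithmetic.
cna_only = [("1", 1000, 1999, "gain"), ("2", 5000, 5999, "loss")]
cna_baf = [("1", 1200, 2199, "gain"), ("2", 5000, 5999, "gain")]
print(shared_bases(cna_only, cna_baf), concordance(cna_only, cna_baf))
```

A base-pair measure like this is less sensitive to small shifts in start and end positions than a segment-by-segment comparison, which is why it may be a fairer way to compare two runs.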
valeu replied

01-19-2013, 08:15 AM
No, mateOrientation is not relevant when you use pileup. Still, you need to set this parameter to something.
fjrossello replied

01-18-2013, 03:00 PM
Originally posted by valeu View Post

Hi Fernando,

I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC will keep only pairs mapped in the correct orientation and insert size.

Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)

Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control, BAF, in pileup.gz) in one hour

Thanks for your prompt answer. I understand. I will eagerly await the next version; speed improvements and bug fixes are always good news.
Just to be clear, when you use a pileup file, should the mateOrientation parameter be set to 0? Is that parameter relevant at all when using this format?
Thanks in advance.

Cheers,

Fernando
valeu replied

01-18-2013, 06:39 AM
Hi Fernando,

I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC will keep only pairs mapped in the correct orientation and insert size.

Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)
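
To illustrate why allelic information can break such a tie, here is a small sketch. It is not FREEC's algorithm: it assumes a pure tumor sample, considers only the two integers surrounding the float copy number, and uses the fact that with n copies the minor allele can only make up k/n of the reads (k = 0..n//2). The function names and the example BAF values are hypothetical.

```python
import math
from typing import List


def expected_minor_fractions(copy_number: int) -> List[float]:
    """Possible minor-allele fractions k/n for n copies, assuming no normal contamination."""
    return [k / copy_number for k in range(copy_number // 2 + 1)]


def pick_copy_number(float_cn: float, observed_minor_baf: float) -> int:
    """Choose between floor and ceil of float_cn using the observed minor-allele fraction."""
    candidates = sorted({math.floor(float_cn), math.ceil(float_cn)})
    candidates = [n for n in candidates if n > 0]
    if not candidates:
        return 0

    def distance(n: int) -> float:
        # Distance from the observation to the closest fraction that n copies can produce.
        return min(abs(observed_minor_baf - f) for f in expected_minor_fractions(n))

    return min(candidates, key=distance)


print(pick_copy_number(2.5, 0.33))  # a minor fraction near 1/3 fits 3 copies (AAB)
print(pick_copy_number(2.5, 0.50))  # a minor fraction near 1/2 fits 2 copies (AB)
```

Note that a minor-allele fraction of 0 (complete LOH) is compatible with any copy number, so in that case this toy rule cannot resolve the ambiguity.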

Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control, BAF, in pileup.gz) in one hour
fjrossello replied

01-17-2013, 07:44 PM
Hi Valeu,

This is Fernando again. I have re-run FREEC on one of my samples for which I previously ran a CNA analysis from a SAM file (unsorted; I used the FR mateOrientation parameter). The difference this time was that I wanted to run CNA + BAF analyses. To run BAF, I first created a pileup from the sample SAM file and then ran it using exactly the same parameters.
Even though the results look the same graphically (R plots), when I compared the CNVs text files produced by both analyses the results look slightly different. The differences are in the start and end positions (the regions are roughly the same) and in the predicted copy number.
Are there any reasons why this could be happening? Which one should be more reliable?
Thanks in advance.

Cheers,

Fernando

Last edited by fjrossello; 01-17-2013, 07:45 PM. Reason: typo
valeu replied

01-10-2013, 09:36 AM
You need to define window size (window=1000) and you have to run it with a control dataset when you use the "target" option
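
The two requirements in this answer can be checked before launching a run. The sketch below is a hypothetical pre-flight check, not part of Control-FREEC: it parses the config with Python's standard configparser and assumes the section and option names seen in this thread ([general] window, [control] mateFile, [target] captureRegions). The file names in the example are placeholders.

```python
import configparser
from typing import List


def check_target_config(config_text: str) -> List[str]:
    """Return a list of problems for a config that uses the [target] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as captureRegions case-sensitive
    parser.read_string(config_text)

    problems = []
    if parser.has_section("target"):
        if not parser.has_option("general", "window"):
            problems.append("[general] window is not set")
        if not parser.has_option("control", "mateFile"):
            problems.append("[control] mateFile is not set")
    return problems


incomplete = """
[general]
chrLenFile = hg19.len

[sample]
mateFile = sample.bam

[target]
captureRegions = capture.bed
"""

complete = """
[general]
chrLenFile = hg19.len
window = 1000

[sample]
mateFile = sample.bam

[control]
mateFile = control.bam

[target]
captureRegions = capture.bed
"""

print(check_target_config(incomplete))
print(check_target_config(complete))
```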

stephwen replied

01-10-2013, 01:50 AM

Error while specifying target BED file

Hello everyone,

I have been trying out Control-FREEC with some test data (exome samples), and I encountered an error when trying to specify a target BED file.

Basically, Control-FREEC seems to run fine, whether I use a control sample or not (I tried both options), but when I add these lines :

Code:

[target]

captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed

to my config file, the program crashes (exits with code 255), and outputs the following lines:

Code:

FREEC v5.9 (Control-FREEC v2.9) : calling copy number alterations and LOH regions using deep-sequencing data
..Using 1 process(es)
..Minimal CNA length (in windows) was set to 4
..consider the sample being male
..breakPointThreshold set to 0.8
..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
..FREEC is not going to output normalized copy number profiles into a BedGraph file. Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Output directory:	/home/volatile/swe/2013-01-10/Test-FREEC5
..Directory with files containing chromosome sequences:	/home/genmol/genomes/homo_sapiens/hg19/chromosomes
..Sample file:	/home/volatile/swe/exomes/exome2.bam
..Sample input format:	BAM
..will use this instance of samtools: samtools to read BAM files
..Control file:	/home/volatile/swe/exomes/exome1.bam
..Input format for the control file:	BAM
..File with chromosome lengths:	hg19.len
..Coefficient Of Variation set equal to 0.062
..Note, this coefficient won't be used if "window" is set
..File hg19.len was read
	 total genome size:	3.09568e+09
..samtools should be installed to be able to read BAM files
	 read number:	76963934
	 coefficientOfVariation:	0.062
	 evaluated window size:	10464
..Starting reading /home/volatile/swe/exomes/exome2.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome2.bam
76963934 lines read..
75080830 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome2.bam_sample.cpn
..Window size:	10464
	..Will use hg19.len to calculate RC for the control sample
..File hg19.len was read
..Starting reading /home/volatile/swe/exomes/exome1.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome1.bam
51311982 lines read..
50082356 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome1.bam_control.cpn
..FREEC will take into account only regions from /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Mappability and GC-content won't be used
..Control-FREEC won't use minimal mappability. All windows overlaping capture regions will be considered
..Reading /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Your file must be in .BED format, and it must be sorted
..Reading capture for chromosome 1
..Reading capture for chromosome 2
..Reading capture for chromosome 3
..Reading capture for chromosome 4
..Reading capture for chromosome 5
..Reading capture for chromosome 6
..Reading capture for chromosome 7
..Reading capture for chromosome 8
..Reading capture for chromosome 9
..Reading capture for chromosome 10
..Reading capture for chromosome 11
..Reading capture for chromosome 12
..Reading capture for chromosome 13
..Reading capture for chromosome 14
..Reading capture for chromosome 15
..Reading capture for chromosome 16
..Reading capture for chromosome 17
..Reading capture for chromosome 18
..Reading capture for chromosome 19
..Reading capture for chromosome 20
..Reading capture for chromosome 21
..Reading capture for chromosome 22
..Reading capture for chromosome X
..Reading capture for chromosome Y
file /home/volatile/swe/exomes/TruSeq-for-FREEC.bed is read
..Setting read counts to Zero for all windows outside of capture
..Total size of captured regions 6.18842e+07bp
..processing chromosome 1
..processing chromosome 2
..processing chromosome 3
..processing chromosome 4
..processing chromosome 5
..processing chromosome 6
..processing chromosome 7
..processing chromosome 8
..processing chromosome 9
..processing chromosome 10
..processing chromosome 11
..processing chromosome 12
..processing chromoso..At this point you need to profide window size, option 'window' in group of parameters [general] in your config file
me 13
..processing chromosome 14
..processing chromosome 15
..processing chromosome 16
..processing chromosome 17
..processing chromosome 18
..processing chromosome 19
..processing chromosome 20
..processing chromosome 21
..processing chromosome 22
..processing chromosome X
..processing chromosome Y
..telocenromeric set to 1 since it is a minimal capture region

(This is the output when I use a control sample, but I get basically the same thing without control sample)

I formatted my BED file as follows:

chr start end
(tab-delimited), and it's ordered by chr (chr1, chr2, ... chr22, chrX, chrY), and then by start position.
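
The layout described above (three tab-delimited columns, ordered by chromosome and then by start) can be verified with a few lines of Python. This is a minimal sketch under those stated assumptions; the function name and the example intervals are made up, and it does not check anything else FREEC may require of the file.

```python
from typing import Iterable, List

# Chromosome order as described in the post: chr1 ... chr22, chrX, chrY.
CHROM_ORDER = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def check_bed(lines: Iterable[str], chrom_order: List[str] = CHROM_ORDER) -> List[str]:
    """Report lines that are malformed or out of (chromosome, start) order."""
    rank = {chrom: index for index, chrom in enumerate(chrom_order)}
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            problems.append(f"line {number}: expected 3 tab-separated fields")
            continue
        chrom, start, end = fields[0], fields[1], fields[2]
        if chrom not in rank or not start.isdigit() or not end.isdigit():
            problems.append(f"line {number}: unknown chromosome or non-numeric position")
            continue
        key = (rank[chrom], int(start))
        if previous is not None and key < previous:
            problems.append(f"line {number}: out of order")
        previous = key
    return problems


sorted_bed = ["chr1\t100\t200\n", "chr1\t300\t400\n", "chr2\t50\t90\n"]
unsorted_bed = ["chr2\t50\t90\n", "chr1\t100\t200\n"]
print(check_bed(sorted_bed))
print(check_bed(unsorted_bed))
```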

Am I doing something wrong here?

Thanks in advance.

Regards,

Stephane

PS : Since samtools' pileup function is now deprecated, it's not possible to generate pileup files anymore. Do you plan on supporting BAM or VCF files as input for the BAF calculation function? Or do you know how I can work around this limitation? Thanks.

Last edited by stephwen; 01-10-2013, 05:08 AM. Reason: added question about BAM or VCF support for BAF calculation


fjrossello replied

12-21-2012, 04:05 PM
Hi Valeu,
Thanks for your explanation and in regards to the R plots, I downloaded the latest makeGraph.R and works perfectly.
Cheers,
Fernando
valeu replied

12-21-2012, 02:55 AM
Hi Fernando,

Originally posted by fjrossello

Are they the output obtained when CNV and LOH were calculated on the control sample when using the GC_profile.cnp?

Yes, you are right.

Originally posted by fjrossello

Any idea why this is happening?

I recently updated makeGraph.R. Can you download the latest version from the site and see if it produces the same error?

What does it write to the command line?
fjrossello replied

12-20-2012, 02:51 PM
Hi Valeu,

I am using Control-FREEC to detect CNV and LOH in normal vs tumor samples (low-pass whole genome).
I had no problems running it at all. However, I would like to ask you a couple of questions regarding the output files and the plotting process.
First, when I run CNV + LOH using SAM pileups, apart from creating the standard _CNVs, _ratio.txt, _BAF.txt, _sample.cpn, _control.cpn and GC_profile.cnp output files, it also generates three extra files with the suffixes _normal_CNVs, _normal_ratio.txt and _normal_BAF.txt. Are they the output obtained when CNV and LOH were calculated on the control sample when using the GC_profile.cnp?
Second, even though it works flawlessly for the ratio (CNV) data, I cannot get the makeGraph.R script to plot the LOH _BAF.txt file.

I used the following line:

cat /usr/local/biotools/freec/scripts/makeGraph.R | R --slave --args 2 sample_bwa_wg.mpileup_ratio.txt sample_bwa_wg.mpileup_BAF.txt

Any idea why this is happening?

Thanks in advance.

Cheers,

Fernando

Last edited by fjrossello; 12-20-2012, 02:52 PM. Reason: Typo
valeu replied

08-09-2012, 03:29 AM
Hi Hao,

You know, two cell lines for the same type of cancer can be very different, especially for "non-copy-number" tumors.

But even for "copy-number" tumors, such as neuroblastoma, CNA regions can be different. See, for example, the sequencing data for neuroblastoma samples in the supplementary figures of Molenaar et al., 2012.
yuhao replied

08-09-2012, 03:13 AM
Hi, valeu,

I currently have data for two human cancer cell samples (the same cancer), with coverage depths of about 33 and 39 and depth statistics for each base. In this case, what is the best software for CNV detection? I used FREEC and got results with the parameters window=3000 and step=1000 (other parameters as in the test config file provided on the website). My problem is how to inspect the CNVs and how to compare the two results. Instead of listing all the CNVs with CNV type, start and end positions and copy number, what other statistics are usually used to analyze CNVs?

I find that the CNVs detected for these two cancer cells have nothing in common: the breakpoints are different and the copy numbers are different. It seems strange that two cancer cells from the same cancer would have completely different CNVs. Could something be wrong here?

Thank you !
valeu replied

08-06-2012, 12:59 AM
Hi Hao,

Originally posted by yuhao View Post

The output intervals have some overlaps, e.g., 58000, 8387999, 3 gain, 8386000, 9404999 5 gain, so 8386000 < 8387999. How could this happen?

This can happen if you use overlapping windows (e.g., step=1000; window=3000). Most likely the breakpoint occurred in the overlapping area of the two windows (8386000;8386000+window.size) and (8387999-window.size;8387999), i.e., in (8386000;8387999).
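
The arithmetic behind this answer can be sketched in a few lines. This is only an illustration with hypothetical helper names, not FREEC code: it enumerates sliding windows for a given window size and step, and computes the region shared by the two reported segments. With window=3000 and step=1000, neighbouring windows share 2000 bp, which is exactly the length of the reported overlap.

```python
from typing import Iterator, Optional, Tuple

Interval = Tuple[int, int]  # inclusive (start, end)


def sliding_windows(start: int, end: int, window: int, step: int) -> Iterator[Interval]:
    """Yield inclusive windows of length `window`, advancing by `step`."""
    position = start
    while position + window - 1 <= end:
        yield (position, position + window - 1)
        position += step


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the inclusive interval shared by a and b, or None if they are disjoint."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Overlapping windows for a toy 5 kb region.
print(list(sliding_windows(0, 4999, window=3000, step=1000)))

# The two segments quoted in the question.
shared = overlap((58000, 8387999), (8386000, 9404999))
print(shared, shared[1] - shared[0] + 1)
```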

Originally posted by yuhao View Post

What does control database mean here? Normally we just have a test genome and a reference genome.

If you analyze a cancer sample, you are interested in somatic gains and losses. In this case you use patient's normal DNA (e.g. from blood) as a control.

Originally posted by yuhao View Post

As far as I know, there are typically two different methods to call CNVs: segmentation-based and hidden Markov model. Is FREEC based on a segmentation method?

The method has been published:

Pubmed links

Both papers are in open access. Have a look!

FREEC uses Lasso-based segmentation.

Originally posted by yuhao View Post

How do we determine the window size and step parameters? Which parameters can affect the accuracy of the result? This is crucial for the result, so I care a lot about it.

Window size can be determined automatically, if you use parameter "coefficient of variation". See Supplementary Methods of (the first publication)
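
As a rough illustration of how a coefficient of variation can translate into a window size, the sketch below assumes Poisson-distributed read counts, so that CV = 1/sqrt(mean reads per window). This formula and the rounding are assumptions made here for illustration, not taken from the FREEC source or the publication; the function name is hypothetical. With the figures from the log posted in this thread (genome size 3.09568e+09, 76963934 reads, coefficientOfVariation 0.062) it gives 10464, the same value as the "evaluated window size" line in that log.

```python
import math


def window_from_cv(genome_size: float, read_count: int, cv: float) -> int:
    """Window size at which the expected read count per window gives the target CV.

    Assumes Poisson counts: CV = 1 / sqrt(reads per window),
    so reads per window = 1 / CV**2.
    """
    reads_per_window = 1.0 / (cv * cv)
    bases_per_read = genome_size / read_count
    return math.ceil(bases_per_read * reads_per_window)


# Values copied from the log earlier in this thread.
print(window_from_cv(3.09568e9, 76963934, 0.062))
```

A smaller coefficient of variation demands more reads per window and therefore larger windows, which is the trade-off between noise and resolution.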

Using "step" will help to improve sensitivity and get prettier graphs, but it can be time consuming.

One of the most important parameters is "breakpoint threshold" (positive, default 0.8). Use smaller values to get more segments, if by eye you see that segmentation was not sensitive enough.

Originally posted by yuhao View Post

Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection (I have many choices but I don't know which ones are best)? I also tried CNVnator, but the result seems very different from FREEC.

It is better to ask the community this question. You need to be more precise about your data: whether you have paired ends, your coverage, whether it is human data, a normal individual or a cancer patient, whether you have a control sample, etc.
