Control-FREEC: a tool for assessing copy number and allelic content using NGS data - SEQanswers

yuhao

Member

Join Date: Jul 2012

Posts: 33
#31

08-02-2012, 10:40 PM

Thank you for your help! I have another question: I want to plot the graph using makeGraph.R, but when I run it, it shows:

null device
1
Error in if (type.convert(args[6])) { :
missing value where TRUE/FALSE needed
Execution halted

Can you give me some help? Thank you!
yuhao

Member

Join Date: Jul 2012

Posts: 33
#32

08-03-2012, 05:14 AM

May I ask a question, what does the "ratio" mean in FREEC? Thanks!
valeu

Member

Join Date: Sep 2008

Posts: 69
#33

08-03-2012, 06:17 AM

"ratio" is actually "normalized read count". Values around 1 correspond to the main ploidy of the sample.

If you use a control sample and you set degree=1, then "ratio" is simply the ratio of read count in the sample and read count in the control.
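To make the degree=1 case concrete, here is a minimal Python sketch of a per-window sample/control ratio. It is only an illustration, not FREEC's actual code: the function name `window_ratios`, the example counts, and the choice to rescale by the median so that typical windows sit near 1 are all assumptions.

```python
from statistics import median

def window_ratios(sample_counts, control_counts):
    """Per-window sample/control read-count ratio, rescaled so the median is 1.

    Windows with no control reads get None (no ratio can be computed).
    Median rescaling is an assumption made for this illustration.
    """
    raw = [s / c if c > 0 else None
           for s, c in zip(sample_counts, control_counts)]
    usable = [r for r in raw if r is not None]
    scale = median(usable)
    return [None if r is None else r / scale for r in raw]

# Hypothetical read counts for five windows.
sample = [120, 118, 180, 60, 122]
control = [100, 100, 100, 100, 0]
for ratio in window_ratios(sample, control):
    print(ratio if ratio is None else round(ratio, 2))
```

With a main ploidy of 2, a ratio near 1.5 would suggest about 3 copies and a ratio near 0.5 about 1 copy.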
yuhao

Member

Join Date: Jul 2012

Posts: 33
#34

08-05-2012, 06:57 PM

I really appreciate your patient help! I have some other questions that I hope you can help with:

The output intervals have some overlaps, e.g., (58000, 8387999, 3, gain) and (8386000, 9404999, 5, gain), so 8386000 < 8387999. How can this happen?

What does "control database" mean here? Normally we just have a test genome and a reference genome.

As far as I know, there are typically two different methods to call CNVs: segmentation-based methods and hidden Markov models. Is FREEC a segmentation-based method?

How do we determine the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial for me, so I care a lot about it.

Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but the result seems very different from FREEC's.

I appreciate your help!
valeu

Member

Join Date: Sep 2008

Posts: 69
#35

08-06-2012, 12:59 AM

Hi Hao,

Originally posted by yuhao

The output intervals have some overlaps, e.g., (58000, 8387999, 3, gain) and (8386000, 9404999, 5, gain), so 8386000 < 8387999. How can this happen?

This can happen if you use overlapping windows (e.g., step=1000; window=3000). Most likely the breakpoint occurred in the overlapping area of the two windows (8386000; 8386000+window.size) and (8387999-window.size; 8387999), i.e., in (8386000; 8387999).
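The overlap can be reproduced with a few lines of Python. This is a sketch under stated assumptions: windows are taken to start at multiples of `step` with inclusive end coordinates, and a segment is assumed to be reported from the start of its first window to the end of its last one. The helper names are hypothetical.

```python
def sliding_windows(chrom_length, window, step):
    """Yield (start, end) for overlapping windows, with inclusive end coordinates."""
    start = 0
    while start < chrom_length:
        yield start, min(start + window, chrom_length) - 1
        start += step

def window_at(start, window):
    """Window beginning at `start`, with an inclusive end coordinate."""
    return start, start + window - 1

# With window=3000 and step=1000, consecutive windows share 2000 bp.
print(list(sliding_windows(10000, window=3000, step=1000))[:3])

# Last window of the first segment and first window of the second segment,
# chosen to match the coordinates reported in the question.
last_of_first = window_at(8385000, 3000)    # ends at 8387999
first_of_second = window_at(8386000, 3000)  # starts at 8386000
overlap = (max(last_of_first[0], first_of_second[0]),
           min(last_of_first[1], first_of_second[1]))
print(overlap)
```

The overlap is exactly the interval in which the two reported segments intersect, so the breakpoint can only be located to within that interval.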

Originally posted by yuhao

What does "control database" mean here? Normally we just have a test genome and a reference genome.

If you analyze a cancer sample, you are interested in somatic gains and losses. In this case you use the patient's normal DNA (e.g., from blood) as a control.

Originally posted by yuhao

As far as I know, there are typically two different methods to call CNVs: segmentation-based methods and hidden Markov models. Is FREEC a segmentation-based method?

The method has been published:

Pubmed links

Both papers are in open access. Have a look!

FREEC uses Lasso-based segmentation.

Originally posted by yuhao

How do we determine the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial for me, so I care a lot about it.

Window size can be determined automatically if you use the parameter "coefficient of variation". See the Supplementary Methods of the first publication.
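As a rough illustration of how a coefficient of variation can translate into a window size, here is a small Python sketch. It assumes Poisson-distributed read counts (so CV = 1/sqrt(mean reads per window)), which gives window = genome_size / (read_count * CV^2). This formula and the rounding are assumptions, not taken from the FREEC source; they do reproduce the "evaluated window size" in a FREEC log posted later in this thread (genome size 3.09568e+09, 76963934 reads, coefficientOfVariation 0.062).

```python
def window_from_cv(genome_size, read_count, cv):
    """Window size giving the requested coefficient of variation.

    Assumes Poisson read counts: CV = 1 / sqrt(reads per window),
    hence reads per window = 1 / CV**2.
    """
    reads_per_window = 1.0 / cv ** 2
    return round(genome_size * reads_per_window / read_count)

# Values copied from the log of post #41 in this thread.
print(window_from_cv(3.09568e9, 76963934, 0.062))  # 10464
```

A smaller coefficient of variation asks for more reads per window and therefore yields larger windows.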

Using "step" will help to improve sensitivity and get prettier graphs, but it can be time consuming.

One of the most important parameters is "breakpoint threshold" (positive, default 0.8). Use smaller values to get more segments, if by eye you see that segmentation was not sensitive enough.

Originally posted by yuhao

Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but the result seems very different from FREEC's.

It is better to put this question to the community. You need to be more precise about your data: whether you have paired-end reads, your coverage, whether it is human data, whether it comes from a normal individual or a cancer patient, whether you have a control sample, etc.
yuhao

Member

Join Date: Jul 2012

Posts: 33
#36

08-09-2012, 03:13 AM

Hi valeu,

I currently have data from two human cancer cell samples (the same cancer). The coverage depths are about 33 and 39, and I have depth statistics for each base. In this case, what is the best software for CNV detection? I used FREEC with window=3000, step=1000, and otherwise the same parameters as in the test config file provided on the website. My problem is how to view the CNVs and how to compare the two results. Instead of listing all the CNVs with their type, start and end positions, and copy number, what other statistics do we usually use to analyze CNVs?

I find that the CNVs detected in these two samples have nothing in common: the breakpoints are different and the copy numbers are different. It seems strange that two cancer samples with the same cancer have completely different CNVs. Could something be wrong here?

Thank you !
valeu

Member

Join Date: Sep 2008

Posts: 69
#37

08-09-2012, 03:29 AM

Hi Hao,

You know, two cell lines for the same type of cancer can be very different, especially for "non-copy-number" tumors.

But even for "copy-number" tumors, such as neuroblastoma, CNA regions can be different. See, for example, sequencing data for neuroblastoma samples: suppl.figures from Molenaar et al., 2012
fjrossello

Member

Join Date: Sep 2011

Posts: 30
#38

12-20-2012, 02:51 PM

Hi Valeu,

I am using Control-FREEC to detect CNV and LOH in normal vs. tumor samples (low-pass whole genome).
I had no problems running it at all. However, I would like to ask you a couple of questions regarding the output files and the plotting process.
First, when I run CNV + LOH using SAM pileups, apart from creating the standard _CNVs, _ratio.txt, _BAF.txt, _sample.cpn, _control.cpn and GC_profile.cnp output files, it also generates three extra files with the suffixes _normal_CNVs, _normal_ratio.txt and _normal_BAF.txt. Are they the output obtained when CNV and LOH were calculated on the control sample using GC_profile.cnp?
Second, even though it works flawlessly for the CNV ratio data, I cannot get the makeGraph.R script to plot the LOH _BAF.txt file.

I used the following line:

cat /usr/local/biotools/freec/scripts/makeGraph.R | R --slave --args 2 sample_bwa_wg.mpileup_ratio.txt sample_bwa_wg.mpileup_BAF.txt

Any idea why this is happening?

Thanks in advance.

Cheers,

Fernando

Last edited by fjrossello; 12-20-2012, 02:52 PM. Reason: Typo
valeu

Member

Join Date: Sep 2008

Posts: 69
#39

12-21-2012, 02:55 AM

Hi Fernando,

Are they the output obtained when CNV and LOH were calculated on the control sample using GC_profile.cnp?

Yes, you are right.

Any idea why this is happening?

I recently updated makeGraph.R, can you download the latest version from the site and see if it produces the same error?

What does it write into the command line?
fjrossello

Member

Join Date: Sep 2011

Posts: 30
#40

12-21-2012, 04:05 PM

Hi Valeu,
Thanks for your explanation. Regarding the R plots, I downloaded the latest makeGraph.R and it works perfectly.
Cheers,
Fernando

stephwen

Junior Member

Join Date: Jun 2011
Posts: 4

#41

01-10-2013, 01:50 AM

Error while specifying target BED file

Hello everyone,

I have been trying out Control-FREEC with some test data (exome samples), and I encountered an error when trying to specify a target BED file.

Basically, Control-FREEC seems to run fine, whether I use a control sample or not (I tried both options), but when I add these lines :

Code:

[target]

captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed

to my config file, the program crashes (exits with code 255), and outputs the following lines:

Code:

FREEC v5.9 (Control-FREEC v2.9) : calling copy number alterations and LOH regions using deep-sequencing data
..Using 1 process(es)
..Minimal CNA length (in windows) was set to 4
..consider the sample being male
..breakPointThreshold set to 0.8
..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
..FREEC is not going to output normalized copy number profiles into a BedGraph file. Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
..FREEC is not going to adjust profiles for a possible contamination by normal cells
..Output directory:	/home/volatile/swe/2013-01-10/Test-FREEC5
..Directory with files containing chromosome sequences:	/home/genmol/genomes/homo_sapiens/hg19/chromosomes
..Sample file:	/home/volatile/swe/exomes/exome2.bam
..Sample input format:	BAM
..will use this instance of samtools: samtools to read BAM files
..Control file:	/home/volatile/swe/exomes/exome1.bam
..Input format for the control file:	BAM
..File with chromosome lengths:	hg19.len
..Coefficient Of Variation set equal to 0.062
..Note, this coefficient won't be used if "window" is set
..File hg19.len was read
	 total genome size:	3.09568e+09
..samtools should be installed to be able to read BAM files
	 read number:	76963934
	 coefficientOfVariation:	0.062
	 evaluated window size:	10464
..Starting reading /home/volatile/swe/exomes/exome2.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome2.bam
76963934 lines read..
75080830 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome2.bam_sample.cpn
..Window size:	10464
	..Will use hg19.len to calculate RC for the control sample
..File hg19.len was read
..Starting reading /home/volatile/swe/exomes/exome1.bam
..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome1.bam
51311982 lines read..
50082356 reads used to compute copy number profile
printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome1.bam_control.cpn
..FREEC will take into account only regions from /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Mappability and GC-content won't be used
..Control-FREEC won't use minimal mappability. All windows overlaping capture regions will be considered
..Reading /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
..Your file must be in .BED format, and it must be sorted
..Reading capture for chromosome 1
..Reading capture for chromosome 2
..Reading capture for chromosome 3
..Reading capture for chromosome 4
..Reading capture for chromosome 5
..Reading capture for chromosome 6
..Reading capture for chromosome 7
..Reading capture for chromosome 8
..Reading capture for chromosome 9
..Reading capture for chromosome 10
..Reading capture for chromosome 11
..Reading capture for chromosome 12
..Reading capture for chromosome 13
..Reading capture for chromosome 14
..Reading capture for chromosome 15
..Reading capture for chromosome 16
..Reading capture for chromosome 17
..Reading capture for chromosome 18
..Reading capture for chromosome 19
..Reading capture for chromosome 20
..Reading capture for chromosome 21
..Reading capture for chromosome 22
..Reading capture for chromosome X
..Reading capture for chromosome Y
file /home/volatile/swe/exomes/TruSeq-for-FREEC.bed is read
..Setting read counts to Zero for all windows outside of capture
..Total size of captured regions 6.18842e+07bp
..processing chromosome 1
..processing chromosome 2
..processing chromosome 3
..processing chromosome 4
..processing chromosome 5
..processing chromosome 6
..processing chromosome 7
..processing chromosome 8
..processing chromosome 9
..processing chromosome 10
..processing chromosome 11
..processing chromosome 12
..processing chromoso..At this point you need to profide window size, option 'window' in group of parameters [general] in your config file
me 13
..processing chromosome 14
..processing chromosome 15
..processing chromosome 16
..processing chromosome 17
..processing chromosome 18
..processing chromosome 19
..processing chromosome 20
..processing chromosome 21
..processing chromosome 22
..processing chromosome X
..processing chromosome Y
..telocenromeric set to 1 since it is a minimal capture region

(This is the output when I use a control sample, but I get basically the same thing without control sample)

I formatted my BED file as follows:

chr start end
(tab-delimited), and it's ordered by chr (chr1, chr2, ... chr22, chrX, chrY), and then by start position.
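Since the log above says the BED file must be sorted, a quick sanity check can rule out ordering problems. The following Python sketch checks a three-column, tab-delimited file against an expected chromosome order. The exact ordering FREEC expects is not stated in this thread, so the order used here (chr1 to chr22, chrX, chrY, then by start) is an assumption taken from the description above, and `check_bed_sorted` is a hypothetical helper.

```python
def check_bed_sorted(lines, chrom_order):
    """Return None if the BED lines look sorted, else a message for the first problem."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            return f"line {number}: expected at least 3 tab-separated fields"
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in rank:
            return f"line {number}: unexpected chromosome name {chrom!r}"
        if end <= start:
            return f"line {number}: end must be greater than start"
        key = (rank[chrom], start)
        if previous is not None and key < previous:
            return f"line {number}: out of order"
        previous = key
    return None

# Assumed ordering: chr1..chr22, chrX, chrY.
HG19_ORDER = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

good = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t10\t20"]
bad = ["chr2\t10\t20", "chr1\t100\t200"]
print(check_bed_sorted(good, HG19_ORDER))
print(check_bed_sorted(bad, HG19_ORDER))
```

To check a real file, pass an open file handle as `lines`, for example `check_bed_sorted(open(path), HG19_ORDER)`.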

Am I doing something wrong here?

Thanks in advance.

Regards,

Stephane

PS : Since samtools' pileup function is now deprecated, it's not possible to generate pileup files anymore. Do you plan on supporting BAM or VCF files as input for the BAF calculation function? Or do you know how I can work around this limitation? Thanks.

Last edited by stephwen; 01-10-2013, 05:08 AM. Reason: added question about BAM or VCF support for BAF calculation


valeu

Member

Join Date: Sep 2008

Posts: 69
#42

01-10-2013, 09:36 AM

You need to define the window size (window=1000), and you have to run it with a control dataset when you use the "target" option.
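These two requirements can be checked before launching a run. Below is a minimal Python sketch that parses a config file and reports what is missing. The `[general] window` and `[target] captureRegions` names come from this thread; the `[control] mateFile` key is an assumption about how the control file is declared, and `check_target_config` is a hypothetical helper, not part of FREEC.

```python
import configparser

def check_target_config(text):
    """Return a list of problems for a FREEC-style config that uses [target]."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (e.g. captureRegions)
    cfg.read_string(text)
    problems = []
    if cfg.has_option("target", "captureRegions"):
        if not cfg.has_option("general", "window"):
            problems.append("[general] window must be set when [target] is used")
        # Assumption: the control dataset is declared as [control] mateFile.
        if not cfg.has_option("control", "mateFile"):
            problems.append("a control dataset is required when [target] is used")
    return problems

example = """
[general]
outputDir = out

[sample]
mateFile = exome2.bam

[target]
captureRegions = TruSeq-for-FREEC.bed
"""
for problem in check_target_config(example):
    print(problem)
```

The example config reports both problems; adding `window = 1000` under `[general]` and a `[control]` section with a `mateFile` entry clears them.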
fjrossello

Member

Join Date: Sep 2011

Posts: 30
#43

01-17-2013, 07:44 PM

Hi Valeu,

This is Fernando again. I have re-run FREEC on one of my samples for which I previously ran a CNA analysis from a SAM file (unsorted, with the mateOrientation parameter set to FR). The difference this time was that I wanted to run CNA + BAF analyses. To run BAF, I first created a pileup from the sample SAM file and then ran FREEC using exactly the same parameters.
Even though the results look the same graphically (R-generated plots), when I compared the CNV text files produced by both analyses, the results looked slightly different. The differences are in the start and end positions (the regions are roughly the same) and in the predicted copy number.
Are there any reasons why this could be happening? Which one should be more reliable?
Thanks in advance.

Cheers,

Fernando

Last edited by fjrossello; 01-17-2013, 07:45 PM. Reason: typo
valeu

Member

Join Date: Sep 2008

Posts: 69
#44

01-18-2013, 06:39 AM

Hi Fernando,

I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC keeps only pairs mapped with the correct orientation and insert size.
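To show why the two settings can give different read counts, here is a small Python sketch of an FR-orientation filter based on the standard SAM flag bits. It is not FREEC's implementation: the function name, the use of leftmost positions as a stand-in for insert size, and the `max_insert` cutoff are all assumptions for illustration.

```python
# Standard SAM flag bits.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20

def is_fr_pair(flag, rname, pos, rnext, pnext, max_insert=1000):
    """True if a SAM record belongs to a forward/reverse pair within max_insert bp."""
    if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    if rnext not in ("=", rname):      # mates on different chromosomes
        return False
    read_rev = bool(flag & REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)
    if read_rev == mate_rev:           # FF or RR orientation
        return False
    # In an FR pair the forward-strand mate lies to the left of the reverse-strand mate.
    fwd_pos, rev_pos = (pnext, pos) if read_rev else (pos, pnext)
    return 0 <= rev_pos - fwd_pos <= max_insert

print(is_fr_pair(99, "chr1", 100, "=", 300))   # forward read, reverse mate downstream
print(is_fr_pair(99, "chr1", 300, "=", 100))   # same strands but outward-facing (RF)
print(is_fr_pair(65, "chr1", 100, "=", 300))   # both mates on the forward strand
```

With a filter like this, reads from discordant pairs are dropped, whereas a pileup (or mateOrientation=0) counts every mapped read, so per-window counts near rearrangements can differ.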

Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)
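The 2-versus-3 example can be illustrated with expected B-allele frequencies: at heterozygous SNPs, two copies (AB) give a BAF of 0.5, while three copies (AAB or ABB) give 1/3 or 2/3. The Python sketch below picks whichever neighbouring integer copy number better explains an observed BAF. It is a toy illustration, not FREEC's model; the function names and the use of a "folded" BAF (values above 0.5 mirrored to below 0.5) are assumptions.

```python
import math

def expected_bafs(copy_number):
    """Folded B-allele frequencies (<= 0.5) possible for a given copy number."""
    if copy_number == 0:
        return [0.0]
    return sorted({min(b, copy_number - b) / copy_number
                   for b in range(copy_number + 1)})

def pick_copy_number(float_cn, observed_baf):
    """Choose floor or floor+1 of float_cn, whichever fits the observed folded BAF best."""
    low = math.floor(float_cn)
    def distance(cn):
        return min(abs(observed_baf - e) for e in expected_bafs(cn))
    return min([low, low + 1], key=distance)

print(expected_bafs(2))               # [0.0, 0.5]
print(pick_copy_number(2.5, 0.34))    # BAF near 1/3 points to 3 copies
print(pick_copy_number(2.5, 0.49))    # BAF near 1/2 points to 2 copies
```

A BAF of 0 (loss of heterozygosity) is compatible with both candidates, so in that case the BAF alone does not break the tie.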

Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control and BAF, in pileup.gz format) in one hour.
fjrossello

Member

Join Date: Sep 2011

Posts: 30
#45

01-18-2013, 03:00 PM

Originally posted by valeu

Hi Fernando,

I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC keeps only pairs mapped with the correct orientation and insert size.

Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)

Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control and BAF, in pileup.gz format) in one hour.

Thanks for your prompt answer. I understand. I will anxiously wait for the next version; speed improvements and bug fixes are always good news.
Just to be clear, when you use a pileup file, should the mateOrientation parameter be set to 0? Is that parameter relevant at all when using this format?
Thanks in advance.

Cheers,

Fernando
