Control-FREEC: a tool for assessing copy number and allelic content using NGS data


  • #31
    Thank you for your help! I have another question: I want to plot the graph using makeGraph.R, but when I run it, it shows:

    null device
    1
    Error in if (type.convert(args[6])) { :
    missing value where TRUE/FALSE needed
    Execution halted

    Can you give me some help ? Thank you !



    • #32
      May I ask a question, what does the "ratio" mean in FREEC? Thanks!



      • #33
        "ratio" is actually "normalized read count". Values around 1 correspond to the main ploidy of the sample.

        If you use a control sample and you set degree=1, then "ratio" is simply the ratio of read count in the sample and read count in the control.
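As a rough illustration of the degree=1 case described above, the sketch below divides per-window read counts in the sample by those in the control and rescales the result so that its median is 1. The function name, the median rescaling, and the toy counts are assumptions made for illustration; FREEC's actual normalization is described in its publications.

```python
from statistics import median

def window_ratios(sample_counts, control_counts):
    """Per-window sample/control read-count ratio, rescaled so the median is 1.

    Windows with zero control reads are reported as None (no estimate).
    Median rescaling is an assumption used here to centre the main ploidy at 1.
    """
    raw = [s / c if c > 0 else None for s, c in zip(sample_counts, control_counts)]
    valid = [r for r in raw if r is not None]
    centre = median(valid)
    return [None if r is None else r / centre for r in raw]

# Toy data: the sample is sequenced twice as deeply as the control,
# and the fourth window carries extra reads in the sample.
sample = [200, 210, 190, 300, 0]
control = [100, 105, 95, 100, 0]
ratios = window_ratios(sample, control)
print(ratios)
```

Windows that follow the main ploidy come out at 1, while the fourth window comes out at 1.5, that is, one and a half times the main ploidy.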



        • #34
          I really appreciate your patient help! I have some other questions that I hope you can help with:

          The output intervals have some overlaps, e.g., (58000, 8387999, 3, gain) and (8386000, 9404999, 5, gain), where 8386000 < 8387999. How can this happen?

          What does control database mean here? Normally we just have a test genome and a reference genome.

          As far as I know, there are typically two kinds of methods for calling CNVs: segmentation-based methods and hidden Markov models. Is FREEC segmentation-based?

          How do we determine the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial for me.

          Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but its results seem very different from FREEC's.

          I appreciate your help!



          • #35
            Hi Hao,

            Originally posted by yuhao View Post
            The output intervals have some overlaps, e.g., (58000, 8387999, 3, gain) and (8386000, 9404999, 5, gain), where 8386000 < 8387999. How can this happen?
            This can happen if you use overlapping windows (e.g., step=1000; window=3000). Most likely the breakpoint occurred in the overlapping area of the two windows (8386000; 8386000+window.size) and (8387999-window.size; 8387999), i.e., in (8386000; 8387999).
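The overlap described above can be reproduced with a small sketch. It assumes windows start at multiples of `step` and end at `start + window - 1` (inclusive), which matches the coordinates quoted in the question; the helper names are hypothetical and this is not FREEC's code.

```python
def windows(start, end, window, step):
    """Yield inclusive (start, end) coordinates of sliding windows."""
    for s in range(start, end, step):
        yield (s, s + window - 1)

def overlap(a, b):
    """Return the inclusive overlap of two intervals, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

window, step = 3000, 1000  # the example values from the reply
wins = list(windows(8385000, 8388000, window, step))

last_of_first_segment = wins[0]    # window ending at 8387999
first_of_second_segment = wins[1]  # window starting at 8386000
shared = overlap(last_of_first_segment, first_of_second_segment)
print(last_of_first_segment, first_of_second_segment, shared)
```

With step=1000 and window=3000, adjacent windows share 2000 bp, so a breakpoint falling in the shared part can be reported at the end of one segment and the start of the next.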

            Originally posted by yuhao View Post
            What does control database mean here? Normally we just have a test genome and a reference genome.
            If you analyze a cancer sample, you are interested in somatic gains and losses. In this case you use patient's normal DNA (e.g. from blood) as a control.

            Originally posted by yuhao View Post
            As far as I know, there are typically two kinds of methods for calling CNVs: segmentation-based methods and hidden Markov models. Is FREEC segmentation-based?
            The method has been published:

            Pubmed links

            Both papers are in open access. Have a look!

            FREEC uses Lasso-based segmentation.

            Originally posted by yuhao View Post
            How do we determine the window size and step parameters? Which parameters affect the accuracy of the result? This is crucial for me.
            Window size can be determined automatically if you use the parameter "coefficient of variation". See the Supplementary Methods of (the first publication).

            Using "step" will help to improve sensitivity and get prettier graphs, but it can be time consuming.

            One of the most important parameters is "breakpoint threshold" (positive, default 0.8). Use smaller values to get more segments, if by eye you see that segmentation was not sensitive enough.
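To give a feel for how a coefficient of variation can translate into a window size, here is a back-of-the-envelope sketch. It assumes read counts per window are roughly Poisson, so that the coefficient of variation is about 1/sqrt(mean reads per window). This is an approximation made for illustration, not taken from FREEC's source; the Supplementary Methods describe the actual derivation. The function name is hypothetical. With the figures printed in the log in post #41 (genome size 3.09568e+09, read number 76963934, coefficientOfVariation 0.062), it gives the same window size that FREEC reports there.

```python
def estimate_window_size(genome_size, read_count, coefficient_of_variation):
    """Window size at which the expected reads per window give the requested CV.

    Assumes Poisson-distributed counts: CV = 1 / sqrt(reads_per_window).
    """
    reads_per_window = 1.0 / coefficient_of_variation ** 2
    return round(genome_size * reads_per_window / read_count)

# Figures taken from the log in post #41 of this thread.
window = estimate_window_size(3.09568e9, 76963934, 0.062)
print(window)  # 10464, the "evaluated window size" in that log
```

A smaller coefficient of variation asks for more reads per window and therefore a larger window; deeper sequencing allows a smaller window for the same coefficient.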

            Originally posted by yuhao View Post
            Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? I have many choices, but I don't know which ones are best. I also tried CNVnator, but its results seem very different from FREEC's.
            It is better to ask this question to the community. You need to be more precise about your data: whether you have paired-ends, your coverage, whether it is human data, normal individual or a cancer patient, whether you have control sample, etc.



            • #36
              Hi valeu,

              I currently have two human cancer cell datasets (the same cancer). The coverage depths are about 33 and 39, and I have depth statistics for each base. In this case, what is the best software for CNV detection? I ran FREEC with window=3000 and step=1000 (other parameters as in the test config file provided on the website). My problem is how to view the CNVs and how to compare the two results. Instead of listing all CNVs with their type, start and end positions, and copy number, what other statistics are usually used to analyze CNVs?

              The CNVs detected in these two datasets have nothing in common: the breakpoints are different and the copy numbers are different. It seems strange that two cancer cell samples with the same cancer have completely different CNVs. Could something be wrong here?

              Thank you !



              • #37
                Hi Hao,

                You know, two cell lines for the same type of cancer can be very different, especially for "non-copy-number" tumors.

                But even for "copy-number" tumors, such as neuroblastoma, CNA regions can be different. See, for example, sequencing data for neuroblastoma samples: supplementary figures from Molenaar et al., 2012.



                • #38
                  Hi Valeu,

                  I am using Control-FREEC to detect CNV and LOH in normal vs tumor samples (low-pass whole genome).
                  I had no problems running it at all. However, I would like to ask you a couple of questions regarding the output files and the plotting process.
                  First, when I run CNV + LOH using SAM pileups, apart from creating the standard _CNVs, _ratio.txt, _BAF.txt, _sample.cpn, _control.cpn and GC_profile.cnp output files, it also generates three extra files with the suffixes _normal_CNVs, _normal_ratio.txt and _normal_BAF.txt. Are they the output obtained when CNV and LOH were calculated on the control sample when using the GC_profile.cnp?
                  Second, even though it works flawlessly for the CNV ratio data, I cannot get the script makeGraph.R to plot the LOH _BAF.txt file.

                  I used the following line:

                  cat /usr/local/biotools/freec/scripts/makeGraph.R | R --slave --args 2 sample_bwa_wg.mpileup_ratio.txt sample_bwa_wg.mpileup_BAF.txt

                  Any idea why this is happening?

                  Thanks in advance.

                  Cheers,

                  Fernando
                  Last edited by fjrossello; 12-20-2012, 02:52 PM. Reason: Typo



                  • #39
                    Hi Fernando,

                    Are they the output obtained when CNV and LOH were calculated on the control sample when using the GC_profile.cnp?
                    Yes, you are right.

                    Any idea why this is happening?
                    I recently updated makeGraph.R. Can you download the latest version from the site and see if it produces the same error?

                    What does it print to the command line?



                    • #40
                      Hi Valeu,
                      Thanks for your explanation. Regarding the R plots, I downloaded the latest makeGraph.R and it works perfectly.
                      Cheers,
                      Fernando



                      • #41
                        Error while specifying target BED file

                        Hello everyone,

                        I have been trying out Control-FREEC with some test data (exome samples), and I encountered an error when trying to specify a target BED file.

                        Basically, Control-FREEC seems to run fine, whether I use a control sample or not (I tried both options), but when I add these lines :

                        Code:
                        [target]
                        
                        captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
                        to my config file, the program crashes (exits with code 255), and outputs the following lines:

                        Code:
                        FREEC v5.9 (Control-FREEC v2.9) : calling copy number alterations and LOH regions using deep-sequencing data
                        ..Using 1 process(es)
                        ..Minimal CNA length (in windows) was set to 4
                        ..consider the sample being male
                        ..breakPointThreshold set to 0.8
                        ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                        ..FREEC is not going to output normalized copy number profiles into a BedGraph file. Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
                        ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                        ..Output directory:	/home/volatile/swe/2013-01-10/Test-FREEC5
                        ..Directory with files containing chromosome sequences:	/home/genmol/genomes/homo_sapiens/hg19/chromosomes
                        ..Sample file:	/home/volatile/swe/exomes/exome2.bam
                        ..Sample input format:	BAM
                        ..will use this instance of samtools: samtools to read BAM files
                        ..Control file:	/home/volatile/swe/exomes/exome1.bam
                        ..Input format for the control file:	BAM
                        ..File with chromosome lengths:	hg19.len
                        ..Coefficient Of Variation set equal to 0.062
                        ..Note, this coefficient won't be used if "window" is set
                        ..File hg19.len was read
                        	 total genome size:	3.09568e+09
                        ..samtools should be installed to be able to read BAM files
                        	 read number:	76963934
                        	 coefficientOfVariation:	0.062
                        	 evaluated window size:	10464
                        ..Starting reading /home/volatile/swe/exomes/exome2.bam
                        ..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome2.bam
                        76963934 lines read..
                        75080830 reads used to compute copy number profile
                        printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome2.bam_sample.cpn
                        ..Window size:	10464
                        	..Will use hg19.len to calculate RC for the control sample
                        ..File hg19.len was read
                        ..Starting reading /home/volatile/swe/exomes/exome1.bam
                        ..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome1.bam
                        51311982 lines read..
                        50082356 reads used to compute copy number profile
                        printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome1.bam_control.cpn
                        ..FREEC will take into account only regions from /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
                        ..Mappability and GC-content won't be used
                        ..Control-FREEC won't use minimal mappability. All windows overlaping capture regions will be considered
                        ..Reading /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
                        ..Your file must be in .BED format, and it must be sorted
                        ..Reading capture for chromosome 1
                        ..Reading capture for chromosome 2
                        ..Reading capture for chromosome 3
                        ..Reading capture for chromosome 4
                        ..Reading capture for chromosome 5
                        ..Reading capture for chromosome 6
                        ..Reading capture for chromosome 7
                        ..Reading capture for chromosome 8
                        ..Reading capture for chromosome 9
                        ..Reading capture for chromosome 10
                        ..Reading capture for chromosome 11
                        ..Reading capture for chromosome 12
                        ..Reading capture for chromosome 13
                        ..Reading capture for chromosome 14
                        ..Reading capture for chromosome 15
                        ..Reading capture for chromosome 16
                        ..Reading capture for chromosome 17
                        ..Reading capture for chromosome 18
                        ..Reading capture for chromosome 19
                        ..Reading capture for chromosome 20
                        ..Reading capture for chromosome 21
                        ..Reading capture for chromosome 22
                        ..Reading capture for chromosome X
                        ..Reading capture for chromosome Y
                        file /home/volatile/swe/exomes/TruSeq-for-FREEC.bed is read
                        ..Setting read counts to Zero for all windows outside of capture
                        ..Total size of captured regions 6.18842e+07bp
                        ..processing chromosome 1
                        ..processing chromosome 2
                        ..processing chromosome 3
                        ..processing chromosome 4
                        ..processing chromosome 5
                        ..processing chromosome 6
                        ..processing chromosome 7
                        ..processing chromosome 8
                        ..processing chromosome 9
                        ..processing chromosome 10
                        ..processing chromosome 11
                        ..processing chromosome 12
                        ..processing chromoso..At this point you need to profide window size, option 'window' in group of parameters [general] in your config file
                        me 13
                        ..processing chromosome 14
                        ..processing chromosome 15
                        ..processing chromosome 16
                        ..processing chromosome 17
                        ..processing chromosome 18
                        ..processing chromosome 19
                        ..processing chromosome 20
                        ..processing chromosome 21
                        ..processing chromosome 22
                        ..processing chromosome X
                        ..processing chromosome Y
                        ..telocenromeric set to 1 since it is a minimal capture region
                        (This is the output when I use a control sample, but I get basically the same thing without control sample)

                        I formatted my BED file as follows:

                        chr start end
                        (tab-delimited), and it's ordered by chr (chr1, chr2, ... chr22, chrX, chrY), and then by start position.
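The ordering described above (chr1 ... chr22, chrX, chrY, then start position) can be checked with a short sketch like the following. The helper names and the toy records are hypothetical, and the chromosome list is an assumption based on the order given in the post; whether Control-FREEC requires exactly this order is not established here.

```python
# Chromosome order as described in the post: chr1..chr22, chrX, chrY.
CHROM_ORDER = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
RANK = {name: i for i, name in enumerate(CHROM_ORDER)}

def parse_bed(lines):
    """Parse tab-delimited 'chr start end' lines into (chrom, start, end) tuples."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and common BED header lines
        chrom, start, end = line.split("\t")[:3]
        records.append((chrom, int(start), int(end)))
    return records

def sort_key(record):
    """Order by chromosome rank, then start, then end.

    Raises KeyError for chromosome names outside CHROM_ORDER.
    """
    chrom, start, end = record
    return (RANK[chrom], start, end)

def is_sorted(records):
    keys = [sort_key(r) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Toy example: plain text sorting puts chr10 before chr2.
example = ["chr1\t100\t200\n", "chr10\t50\t80\n", "chr2\t10\t20\n"]
records = parse_bed(example)
sorted_records = sorted(records, key=sort_key)
print(is_sorted(records), is_sorted(sorted_records))
```

A purely alphabetical sort (for example a plain `sort` on the first column) yields chr1, chr10, chr11, ..., which is a different order from the one described in the post, so it is worth checking explicitly.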

                        Am I doing something wrong here?

                        Thanks in advance.

                        Regards,

                        Stephane

                        PS : Since samtools' pileup function is now deprecated, it's not possible to generate pileup files anymore. Do you plan on supporting BAM or VCF files as input for the BAF calculation function? Or do you know how I can work around this limitation? Thanks.
                        Last edited by stephwen; 01-10-2013, 05:08 AM. Reason: added question about BAM or VCF support for BAF calculation



                        • #42
                          You need to define the window size (window=1000), and you have to run it with a control dataset when you use the "target" option.
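The two requirements in this reply can be turned into a small pre-flight check on the config file. The sketch below uses Python's `configparser`; the function name is hypothetical, and it assumes the control dataset is given through `mateFile` in a `[control]` section (a control supplied in another way would not be detected by this check).

```python
import configparser

def check_target_config(text):
    """Return a list of problems for a config that uses the [target] section."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (e.g. captureRegions)
    cfg.read_string(text)
    problems = []
    if cfg.has_option("target", "captureRegions"):
        if not cfg.has_option("general", "window"):
            problems.append("set 'window' in [general]")
        # Assumption: the control is specified via mateFile in [control].
        if not cfg.has_option("control", "mateFile"):
            problems.append("provide a control dataset in [control]")
    return problems

incomplete = """[general]
chrLenFile = hg19.len

[sample]
mateFile = /home/volatile/swe/exomes/exome2.bam

[target]
captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
"""

complete = """[general]
chrLenFile = hg19.len
window = 1000

[sample]
mateFile = /home/volatile/swe/exomes/exome2.bam

[control]
mateFile = /home/volatile/swe/exomes/exome1.bam

[target]
captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
"""

print(check_target_config(incomplete))
print(check_target_config(complete))
```

The paths are the ones from the log in post #41; the rest of a real config (input formats, output directory, and so on) is omitted here.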



                          • #43
                            Hi Valeu,

                            This is Fernando again. I have re-run FREEC on one of my samples, on which I previously ran a CNA analysis from a SAM file (unsorted, I use the FR mateOrientation parameter). The difference this time was that I wanted to run CNA + BAF analyses. To run BAF, I first created a pileup from the sample SAM file and then ran it using exactly the same parameters.
                            Even though the results look the same graphically (R-created plots), when I compared the CNV text files produced by both analyses, the results look slightly different. The differences are in the start and end positions (the regions are roughly the same) and in the predicted copy number.
                            Are there any reasons why this could be happening? Which one should be more reliable?
                            Thanks in advance.

                            Cheers,

                            Fernando
                            Last edited by fjrossello; 01-17-2013, 07:45 PM. Reason: typo



                            • #44
                              Hi Fernando,

                              I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC keeps only pairs mapped in the correct orientation and with the correct insert size.
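The difference between counting all reads and counting only properly oriented pairs can be illustrated with toy data. The `Read` type, the `is_fr_pair` criterion, and the `max_insert` cut-off below are assumptions made for this sketch; they are not FREEC's actual filtering rules.

```python
from collections import Counter
from typing import NamedTuple

class Read(NamedTuple):
    pos: int            # leftmost mapped position of the read
    reverse: bool       # True if the read maps to the reverse strand
    mate_pos: int       # leftmost mapped position of the mate
    mate_reverse: bool  # True if the mate maps to the reverse strand

def is_fr_pair(read, max_insert):
    """True if the forward read lies upstream of its reverse mate and the
    two are no further apart than max_insert (hypothetical criterion)."""
    if read.reverse == read.mate_reverse:
        return False
    fwd, rev = (read.mate_pos, read.pos) if read.reverse else (read.pos, read.mate_pos)
    return fwd <= rev and rev - fwd <= max_insert

def counts_per_window(reads, window, keep=lambda r: True):
    """Count reads per window index, keeping only reads accepted by `keep`."""
    return Counter(r.pos // window for r in reads if keep(r))

reads = [
    Read(100, False, 400, True),     # proper FR pair
    Read(400, True, 100, False),     # its mate
    Read(1200, False, 1500, False),  # both forward: wrong orientation
    Read(2100, False, 90000, True),  # FR orientation, but mates far apart
]

all_counts = counts_per_window(reads, window=1000)
fr_counts = counts_per_window(
    reads, window=1000, keep=lambda r: is_fr_pair(r, max_insert=1000)
)
print(all_counts, fr_counts)
```

In this toy case the second and third windows lose all their reads under the FR filter, which is the kind of per-window difference that can shift breakpoints and copy number estimates between the two runs.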

                              Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)

                              Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control, BAF, in pileup.gz) in one hour.
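The BAF remark in this post (a float copy number of 2.5, hesitating between 2 and 3 copies) can be illustrated as follows. The sketch assumes a pure sample without normal-cell contamination, where a heterozygous SNP in a region with n copies has a B-allele frequency of k/n for some k between 1 and n-1. The function names and the least-squares scoring are illustrative choices, not FREEC's algorithm.

```python
def expected_het_bafs(copy_number):
    """B-allele frequencies possible at a heterozygous SNP with this copy number.

    Requires copy_number >= 2 (one copy cannot be heterozygous).
    """
    return [k / copy_number for k in range(1, copy_number)]

def best_copy_number(observed_bafs, candidates):
    """Pick the candidate whose expected BAF values best fit the observations."""
    def cost(cn):
        expected = expected_het_bafs(cn)
        return sum(min(abs(b - e) for e in expected) ** 2 for b in observed_bafs)
    return min(candidates, key=cost)

# Toy BAF values clustered near 1/3 and 2/3.
observed = [0.31, 0.68, 0.35, 0.66, 0.33]
choice = best_copy_number(observed, candidates=[2, 3])
print(choice)
```

BAF values near 1/3 and 2/3 point to 3 copies, whereas values near 0.5 would point to 2 copies, which is how allelic information can break a tie that read depth alone cannot.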



                              • #45
                                Originally posted by valeu View Post
                                Hi Fernando,

                                I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC keeps only pairs mapped in the correct orientation and with the correct insert size.

                                Also, in some cases having BAF info can improve predictions (e.g., when float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region)

                                Also, in version 5.9 and earlier there was a bug that did not allow FREEC to get the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. Also, the new version works ~10x faster on an 8-core computer. It can process a 30x genome (with control, BAF, in pileup.gz) in one hour.
                                Thanks for your prompt answer. I understand. I will anxiously wait for the next version, speed improvements and bug corrections are always good news.
                                Just to be clear, when you use a pileup file, should the mateOrientation parameter be set to 0? Is that parameter relevant at all when using this format?
                                Thanks in advance.

                                Cheers,

                                Fernando
