Control-FREEC: a tool for assessing copy number and allelic content using NGS data


  • Control-FREEC: a tool for assessing copy number and allelic content using NGS data

    Control-FREEC enables automatic calculation of copy number and allelic content profiles from next generation sequencing data, and consequently predicts regions of genomic alteration such as gains, losses, and loss of heterozygosity (LOH).

    Taking aligned reads as input, Control-FREEC constructs copy number and B-allele frequency (BAF) profiles. The profiles are then normalized, segmented and analyzed in order to assign a genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events.
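The normalization step can be pictured with a minimal sketch. It assumes read counts and GC fractions per window are already computed, and it models expected read count as a least-squares polynomial of GC content (the 2011 paper listed below describes GC-content normalization; the polynomial form and the degree of 3 here are assumptions for illustration). Each count is divided by its fitted value, so a ratio near 1 means "as expected for this GC content". The function names are hypothetical; this is not Control-FREEC's code.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients c0..c_degree."""
    n = degree + 1
    # Normal equations: (A^T A) c = A^T y, with A the Vandermonde matrix of xs.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row + [b] for row, b in zip(ata, aty)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / m[r][r]
    return coeffs


def gc_normalize(counts: Sequence[float], gc: Sequence[float], degree: int = 3) -> List[float]:
    """Observed/expected ratio per window, 'expected' being the fitted GC curve."""
    coeffs = polyfit(gc, counts, degree)
    ratios = []
    for count, g in zip(counts, gc):
        expected = sum(a * g ** i for i, a in enumerate(coeffs))
        ratios.append(count / expected if expected > 0 else float("nan"))
    return ratios
```

In real data, altered regions also pull on the fit, so a robust fit (or excluding outlier windows) would be preferable; this sketch deliberately omits that.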

    Control-FREEC is able to analyze over-diploid tumor samples and samples contaminated by normal cells.
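Why ploidy and normal-cell contamination matter can be illustrated with a simple two-population mixture model. This is an assumed generic model (diploid normal cells mixed with tumor cells), not necessarily the exact formulation Control-FREEC uses, and the function names are hypothetical.

```python
def expected_ratio(tumor_cn: float, ploidy: float, contamination: float) -> float:
    """Expected normalized read-count ratio of a region in a mixed sample.

    Assumes a fraction `contamination` of diploid normal cells, and tumor cells
    with copy number `tumor_cn` in the region and `ploidy` genome-wide.
    """
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    p = contamination
    region = (1 - p) * tumor_cn + p * 2   # copies contributed by both cell types
    baseline = (1 - p) * ploidy + p * 2   # genome-wide average copies
    return region / baseline


def copy_number_from_ratio(ratio: float, ploidy: float, contamination: float) -> float:
    """Invert expected_ratio: tumor copy number implied by an observed ratio."""
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    p = contamination
    baseline = (1 - p) * ploidy + p * 2
    return (ratio * baseline - 2 * p) / (1 - p)
```

Under this model, 4 copies at 50% contamination give the same ratio (1.5) as 3 copies in a pure diploid sample, so copy numbers cannot be assigned from ratios alone without knowing ploidy and contamination.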

    Low-mappability regions can be excluded from the analysis using the provided mappability tracks.

    Publications:


    Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics. 2011;27(2):268-9. PMID: 21081509.

    Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data. Bioinformatics. 2011 Dec 6 [Epub ahead of print]. PMID: 22155870.

    Input for detection of copy number alterations (CNAs):

    Aligned single-end, paired-end or mate-pair data in SAM, BAM, SAMtools pileup, Eland, BED, SOAP, arachne, psl (BLAT) and Bowtie formats. Control-FREEC also accepts gzipped (.gz) files.

    Input for CNA+LOH detection:

    Aligned reads in SAMtools pileup format. The file can be gzipped.
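The BAF at a known SNP is the fraction of reads supporting one of its two alleles, which is why the pileup (per-base read content) is needed for LOH detection. A minimal sketch, assuming the samtools pileup column layout (chromosome, position, reference base, depth, read bases, qualities) and that the two SNP alleles come from a SNP file; the function names are hypothetical, and Control-FREEC's own parsing and filtering (e.g. by depth or base quality) are not reproduced.

```python
from typing import Dict, Optional


def count_bases(ref: str, bases: str) -> Dict[str, int]:
    """Count A/C/G/T support in a pileup read-bases string."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    ref = ref.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # read start; the next char is its mapping quality
            i += 2
        elif ch == "$":          # read end
            i += 1
        elif ch in "+-":         # indel: +<length><sequence> or -<length><sequence>
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if ch in ".,":       # match to the reference on either strand
                if ref in counts:
                    counts[ref] += 1
            elif ch.upper() in counts:
                counts[ch.upper()] += 1
            # '*', 'N', '<', '>' and anything else are ignored
            i += 1
    return counts


def baf_from_pileup(line: str, allele_a: str, allele_b: str) -> Optional[float]:
    """B-allele frequency at one pileup position: reads(B) / (reads(A) + reads(B))."""
    _chrom, _pos, ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    counts = count_bases(ref, bases)
    a, b = counts[allele_a.upper()], counts[allele_b.upper()]
    if a + b == 0:
        return None               # no informative reads at this SNP
    return b / (a + b)
```

In a pure sample, heterozygous SNPs give BAF values near 0.5, while a region where BAF sits near 0 or 1 at SNPs that should be heterozygous points to LOH.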

    Output:

    Regions of gains, losses and LOH; copy number and BAF profiles.

    Availability:

    http://bioinfo.curie.fr/projects/freec/

  • #2
    I have a few questions about Control-FREEC:
    1. How does it use the normal sample if provided?
    2. If one provides a normal sample as well as chromosome FASTA files, does it account for GC-content bias using the normal sample, the FASTA files, or both?
    3. Does Control-FREEC get its bin size from genomic coordinates, or does it use read density?


    • #3
      Hi!

      How does it use the normal sample if provided?
      If you do not want to predict allelic status (the [BAF] group of parameters is empty), then the normal sample will be used instead of GC-content to normalize the read count in the tumor sample.
      If you want to calculate BAF and allelic status, then the normalization is done using GC-content but CNVs will be annotated as somatic or germline using information from the normal sample.
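The two uses of the normal sample described above can be sketched under simplifying assumptions: read counts are scaled to equal total depth before taking the tumor/normal ratio, and a tumor CNV is labeled germline if it overlaps any CNV called in the control. Both rules and all names here are hypothetical illustrations, not Control-FREEC's actual criteria.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def control_ratio(tumor: Sequence[int], normal: Sequence[int]) -> List[float]:
    """Per-window tumor/normal ratio after scaling both samples to the same depth."""
    t_total, n_total = sum(tumor), sum(normal)
    if t_total == 0 or n_total == 0:
        raise ValueError("both samples need reads")
    ratios = []
    for t, n in zip(tumor, normal):
        # A window without control reads has no defined ratio.
        ratios.append((t / t_total) / (n / n_total) if n > 0 else float("nan"))
    return ratios


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def annotate(tumor_calls: Sequence[Interval],
             control_calls: Sequence[Interval]) -> List[Tuple[Interval, str]]:
    """Label each tumor CNV 'germline' if it overlaps a control CNV, else 'somatic'."""
    return [
        (call, "germline" if any(overlaps(call, c) for c in control_calls) else "somatic")
        for call in tumor_calls
    ]
```

Scaling by total depth keeps the ratio near 1 in unaltered windows when the tumor was simply sequenced deeper than the normal.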

      If one provides a normal sample as well as chromosome FASTA files, does it account for GC-content bias using the normal sample, the FASTA files, or both?
      Not if you only look for CNVs: then only the normal sample is used. If you also look for allelic status, it uses both.
      In the first case, you can force GC-content normalization with the option "forceGCcontentNormalization" (see http://bioinfo-out.curie.fr/projects...al.html#CONFIG).

      Does Control-FREEC get its bin size from genomic coordinates, or does it use read density?
      It uses genomic coordinates.
      BTW, if you are not sure what window size to use, set the option "coefficientOfVariation" to have it evaluated.
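How a target coefficient of variation can translate into a window size is sketched below, assuming reads are spread uniformly and per-window counts are Poisson (mean equals variance, so CV = 1/sqrt(mean)). This is a simplifying assumption; Control-FREEC's own computation may differ (for instance by accounting for ploidy or contamination), and the function name is hypothetical.

```python
import math


def window_size_for_cv(n_reads: int, genome_length: int, target_cv: float) -> int:
    """Smallest window (bp) whose expected read count has CV <= target_cv.

    Poisson model: mean = variance = n_reads * window / genome_length, so
    CV = 1 / sqrt(mean), and CV <= target_cv requires mean >= 1 / target_cv**2.
    """
    if target_cv <= 0:
        raise ValueError("target_cv must be positive")
    return math.ceil(genome_length / (n_reads * target_cv ** 2))
```

For example, a target CV of 0.062 corresponds to roughly 1/0.062² ≈ 260 reads per window under this model; halving the CV requires four times as many reads per window, hence four times larger windows at the same depth.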


      • #4
        Thank you for the reply, that is helpful to know.

        Another question:
        How would one get or create a text file of SNPs, like the one you provide on the website, in order to use the [BAF] parameters?


        • #5
          I downloaded it from UCSC. It should be their standard format.


          • #6
            Could you point out where you found it? When I look at the hg18 downloads, all I see are FASTA files of SNPs.


            • #7
              I downloaded it through "Tables".


              • #8
                Could you be more specific? I have looked at the tables and gone through many SNP tables, but none of them match the format of the file you include on the tutorial page that describes the BAF parameters. Could you go through it step by step?


                • #9
                  Hello valeu,

                  This is a very nice tool, especially the fact that it can analyze CNVs without a control sample. In your opinion, could it be used for certain custom capture experiments? If so, would it be better to provide the lengths of the targeted regions (within chromosomes) instead of the chromosome lengths?

                  Thanks,
                  Praful


                  • #10
                    I managed to find out which file you were using: it is under tables > your assembly > all tracks > SNP130 (if you're using hg18; for hg19 it's 131) > hg18.snp130OrthoPt2Pa2Rm2

                    You then have to pick the columns you want which are:
                    chrom, chromstart, humanObserved, humanAllele, humanStrand

                    Even if you do this, the file is not the same size as the one provided on the Control-FREEC website, and the columns are in a slightly different order.


                    • #11
                      You are right: I changed the order of the columns. It should be 2, 4, 10, 7, 8 and 5.

                      Sorry, I should have mentioned it.
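Reordering columns needs a small script, because `cut -f 2,4,10,7,8,5` prints the selected fields in their original file order, not in the order listed. A minimal sketch, assuming a tab-separated export whose header line starts with "#"; the column order is copied from the post above, but the thread does not say exactly which export it applies to, so check it against your own file's header. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List

# 1-based column order quoted in the thread; verify against your export's header.
ORDER = [2, 4, 10, 7, 8, 5]


def reorder_columns(lines: Iterable[str], order: List[int]) -> Iterator[str]:
    """Yield tab-separated lines with fields rearranged into `order` (1-based)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and the header line
            continue
        fields = line.split("\t")
        yield "\t".join(fields[i - 1] for i in order)
```

To use it on a file, iterate over `reorder_columns(open(path), ORDER)` and write each yielded line to the output.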


                      • #12
                        Hi, I have been using this software for multiple analyses. However, there are some quirks.

                        Here is a template of my config file:
                        Code:
                        [general]
                        
                        chrLenFile = /projects/copy_num_ana/x07_controlfreec/hg18/res_all_genome.len
                        coefficientOfVariation = 0.062
                        ploidy = 2
                        outputDir = /projects/copy_num_ana/apollo_freec/4.24/RG/RG014/output
                        chrFiles = /projects/copy_num_ana/x07_controlfreec/hg18_fastas
                        forceGCcontentNormalization = 2
                        
                        [sample]
                        
                        mateFile = /projects/DLBCL/CNV/RG014/tumour/A01414_10_lanes_dupsFlagged.bam
                        inputFormat = BAM
                        mateOrientation = FR
                        
                        [control]
                        
                        mateFile = /projects/DLBCL/CNV/RG014/normal/A01443_9_lanes_dupsFlagged.bam
                        inputFormat = BAM
                        mateOrientation = FR

                        I am getting strange copy number calls: almost point-like gains in the middle of otherwise neutral regions. Here is an example of what I am talking about (red = gain, blue = neutral, green = loss):

                        [IGV screenshot not preserved]

                        Notice how these spots seem to coincide with areas of high GC-content and low read counts. The IGV view above corresponds to chromosome 17, specifically the point right above RG034 at 3.98e10bp with a ratio of 2.75, shown below:

                        [ratio plot not preserved]
                        What is causing this? Has it been encountered before? And is there a way to correct this?


                        • #13
                          Hi!

                          To avoid "point" CNVs, you can set "minCNAlength=4" or higher. By default, it is 1.

                          Does it help?
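What a minimum-length constraint does can be illustrated with a simplified model, assuming per-window copy-number states are already called: any run of altered windows shorter than the threshold is reset to the baseline ploidy. The function name is hypothetical, and Control-FREEC may handle short segments differently (for example by merging them with neighbouring segments).

```python
from itertools import groupby
from typing import List


def suppress_short_cnas(states: List[int], ploidy: int = 2,
                        min_cna_length: int = 4) -> List[int]:
    """Reset altered runs shorter than min_cna_length windows to the baseline ploidy."""
    out: List[int] = []
    for state, run in groupby(states):
        run_len = len(list(run))
        if state != ploidy and run_len < min_cna_length:
            out.extend([ploidy] * run_len)   # too short: treated as noise
        else:
            out.extend([state] * run_len)
    return out
```

With a threshold of 1 every run is kept, which matches the "point" calls described above; raising it removes isolated single-window gains while keeping longer losses or gains.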


                          • #14
                            Yes, this helps. I was also wondering whether there is a way to smooth out calls near centromeres. It seems to call very high or very short CNVs close to centromeres.


                            • #15
                              Do you use the options "gemMappabilityFile" and "minMappabilityPerWindow"? Usually this helps, since centromeric regions are not uniquely mappable. See http://bioinfo-out.curie.fr/projects...al.html#CONFIG
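Mappability filtering can be sketched as follows, assuming the uniquely mappable positions are already available as non-overlapping intervals (parsing an actual GEM mappability file is not shown). Windows whose mappable fraction falls below the threshold are masked out before segmentation; the threshold is a free parameter here, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def mappable_fraction(start: int, end: int, unique: Sequence[Tuple[int, int]]) -> float:
    """Fraction of [start, end) covered by non-overlapping uniquely mappable intervals."""
    covered = sum(max(0, min(e, end) - max(s, start)) for s, e in unique)
    return covered / (end - start)


def mask_low_mappability(ratios: Sequence[float], fractions: Sequence[float],
                         min_mappability: float) -> List[Optional[float]]:
    """Replace ratios of poorly mappable windows with None so they are ignored."""
    return [r if f >= min_mappability else None for r, f in zip(ratios, fractions)]
```

Windows near centromeres typically have low unique mappability, so masking them removes the extreme ratios that would otherwise be segmented into spurious CNVs.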
