Announcement

Collapse
No announcement yet.

Control-FREEC: a tool for assessing copy number and allelic content using NGS data

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #46
    No, mateOrientation is not relevant when you use pileup. Still, you need to set this parameter to something

    Comment


    • #47
      Hi Valeu,

      Sorry to be so insistent in this aspect. I re-run control-freec on an mpileup file of one my samples with and without BAF options and I found a few differences between both runs. First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample? This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
      Is this a fair comparison? Am I missing something which prevents me from understanding these results?
      Thanks in advance.
      Cheers,
      Fernando

      Ps: find below the parameters of my config file. As I said, I run it plus and minus BAF, i.e., BAF commented.

      [general]
      chrLenFile = hg19.len
      coefficientOfVariation = 0.05
      outputDir = ./ch209_cnv_CNA_only
      degree = 3
      ploidy = 2
      samtools = /usr/local/biotools/bin/samtools
      sex = XY
      chrFiles = /home/fernandr/biotools/references/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
      # step = 5000
      # window = 20000

      [sample]

      mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/sample_bwa_wg.mpileup
      inputFormat = pileup
      mateOrientation = FR

      [control]

      mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/control_bwa_wg.mpileup
      inputFormat = pileup
      mateOrientation = FR

      # [BAF]
      #
      # SNPfile = /home/fernandr/biotools/references/freec/hg19/hg19_snp131.SingleDiNucl.1based.txt
      # minimalCoveragePerPosition = 1
      # minimalQualityPerPosition = 0
      # shiftInQuality = 33

      Comment


      • #48
        Posted by fjrossello
        Hi Valeu,
        Thanks for your explanation and in regards to the R plots, I downloaded the latest makeGraph.R and works perfectly.
        Cheers,
        Fernando
        Can someone tell me the link to download makeGraph.R?
        I´m having problems finding the most recent version of this.

        Thanks in advance

        Comment


        • #49
          Help with FREEC output

          First of, I must say FREEC is a great tool for CNV detection in exome seq data!
          I have a few questions about the output files I obtained.

          I have two _cnv, _ratio, _BAF files.
          For instance, how is *_mpileup_CNV different from *mpileup_normal_CNV? and depending on the file I use my R plots are so different! why would this be?


          PS: I have paired end tumor-normal illumina data from exome sequencing.

          ~Thanks for your help,
          Rini
          Attached Files

          Comment


          • #50
            Originally posted by rduarte View Post
            Can someone tell me the link to download makeGraph.R?
            I´m having problems finding the most recent version of this.

            Thanks in advance
            This is the latest version so far: makeGraph.R

            Comment


            • #51
              First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample?
              "Yes", if you don't use [BAF] and "No" if you do.

              This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
              Is this a fair comparison? Am I missing something which prevents me from understanding these results?
              I would say that it is normal that you get different results. When you use [BAF], you use more information to (1) segment your data and (2) to annotate the resulting CNAs.

              Another reason it can be different: imagine you have a region present in 3 copies in the normal and in 9 copies in the tumor. If you don't use BAF, you will get ratio 3 for this region. Since it is 3>1, this region will be called "gain". If you use [BAF], this region will be identified as "gain" for both normal and tumor samples and thus this gain will be called germline.

              Comment


              • #52
                SegFault when running Freec

                Hi all. First, I wanted to say thanks to valeu for taking the time to answer questions on here, I've found some of the advice to be very useful.

                Second, I've been having a problem with segmentation faults when trying to use Freec to compute BAF. I had previously been using Freec to analyze exome samples without calculating BAF, and had success calling CNVs from the .bam files. However, when I tried to compute BAF (which required converting the .bams into pileup files, as well as using a file of known SNPs) I ran into some problems.
                Specifically, the program dies after about 1 second of runtime with the following as the specific error:

                line 13: 269410 Segmentation fault

                This occurs after the program outputs the following:

                ..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

                Which suggests that it might be something with my SNP file. However, I have omitted this file and still get a segfault when it tries to read my sample pileups.

                My configuration file is below. Any help on what may be causing this would be appreciated.

                [general]

                window = 5000
                step = 1000

                ploidy = 2

                samtools = /home/sf062971/programs/samtools-0.1.18/samtools

                minCNAlength = 4

                BedGraphOutput = TRUE

                chrLenFile = /home/sf062971/resources/freec_resources/mm10.len

                chrFiles = /data/sf062971/data/reference/chr_files

                noisyData = TRUE

                printNA=FALSE
                maxThreads=6
                sex=XX
                breakPointType=4

                outputDir = 1148T_1205N_V3

                contamination = 0.5
                contaminationAdjustment = TRUE

                [sample]

                mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351148/1148_ALIGN_RECAL_V3/1148_EXOME.mpileup

                inputFormat = pileup

                mateOrientation = 0

                [control]

                mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351205/1205_ALIGN_RECAL_V3/1205_EXOME.mpileup

                inputFormat = pileup

                mateOrientation = 0

                [BAF]

                SNPfile = /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt

                minimalCoveragePerPosition = 10

                [target]

                captureRegions = /home/sf062971/resources/agilent_data/covered_regions_mm10_merged_sorted.bed

                Comment


                • #53
                  Originally posted by smapdy View Post
                  ..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

                  Which suggests that it might be something with my SNP file.
                  Hi, why don't you use the provided hg19_snp137.SingleDiNucl.1based.txt (created by Niklas Malmqvist)? Historically, the order of columns is different from the UCSC file.

                  To create my dbSNP files, I downloaded a file with SNPs from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), from “Variation and Repeats”/”All SNPs” table. And I kept columns 2, 4, 10, 7, 8 and 5. And I kept only entries with “genomic single”.

                  When you are sure you use the correct SNPfile, check you pileups. They should look like this (http://samtools.sourceforge.net/pileup.shtml):

                  seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
                  seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
                  seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
                  seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
                  seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
                  seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
                  seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
                  seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<

                  Comment


                  • #54
                    Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:

                    chr1 3000568 A/G + T rs29444956
                    chr1 3000621 A/C + C rs31439779
                    chr1 3001490 A/C + C rs31521921
                    chr1 3001579 A/T + A rs30468828
                    chr1 3001712 C/G + C rs32793997
                    chr1 3003268 A/G + A rs30748911
                    chr1 3003414 A/G + A rs31953890
                    chr1 3003449 C/T + T rs32186899
                    chr1 3003464 A/G + G rs31079645
                    chr1 3003508 C/T + C rs32044173


                    My .pileup files are formatted as follows:

                    chr1 3216016 T 44 ......................,.........,..,,,...,^].^];=>>?>?<=??>>???>=>?>>??>>??>>>>@=>?<<::=;<<
                    chr1 3216017 G 45 ......................,.........,..,,,...,..^][email protected]@[email protected]<BAAB5?CCBCCAAABAAB?:?>?=;?><
                    chr1 3216018 T 47 ......................,.........,..,,,...,...^].^], :9?=>@[email protected]>>>@>?=><@>=<==>@?>>=?>>?=;<===9;;;<9
                    chr1 3216019 A 48 .$.....................,.........,..,,,...,....,^]. 99><<??3<??<<???<<;><<<=?=?><<<>?>>=8<<<=6;:9;:<
                    chr1 3216020 T 47 .....................,.........,..,,,...,....,.:==???<=?>==???===?=4?<[email protected]>??<??=?>=;;:;<


                    I believe this is standard .pileup format. Despite appearances both of the above are actually tab-delimited.

                    In the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?

                    Comment


                    • #55
                      Originally posted by smapdy View Post
                      Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:
                      Both formats look correct. If you want me to debug the code please send me an email with the config file, command line output and other files necessary to run FREEC.

                      Originally posted by smapdy View Post
                      In the event that I can't get BAF calculation to work, what is are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
                      If you disable [BAF] you may get less accurate calls. However, the result should be almost the same as the one you will obtain with [BAF] and noisyData=FALSE.

                      Comment


                      • #56
                        I ended up figuring out what was going on. I had some multiallelic variants in the .snp file that were causing it to fail to load, and my sex variable in the configuration file didn't match up with the actual sample sex which caused problems as well. I ended up dropping the sex argument and using the following general configuration file for my samples:
                        [general]
                        window = 8000
                        step = 2500
                        samtools = samtools
                        minCNAlength = 4
                        BedGraphOutput = TRUE
                        chrLenFile = NCBIM37_um.fa.len
                        chrFiles = chrfiles
                        outputDir = 31208T_31668N_FREEC_V1
                        printNA = FALSE
                        maxThreads = 6
                        ploidy = 2
                        breakPointType = 4
                        contaminationAdjustment = TRUE
                        noisyData = TRUE

                        [sample]
                        mateFile = 31208_EXOME.pileup.gz
                        inputFormat = pileup
                        mateOrientation = 0

                        [control]
                        mateFile = 31668_EXOME.pileup.gz
                        inputFormat = pileup
                        mateOrientation = 0

                        [target]
                        captureRegions = S0276129_Merged_Sorted_Probes.bed

                        [BAF]
                        SNPfile = snp128.singlebases.monoalleleic.freec_baf.txt
                        minimalCoveragePerPosition = 5

                        If anyone is interested I also have the commands I used to generate the pileups from the .bams, as well as the script I used to generate a working Mm9 and Mm10 .snp file.

                        Comment


                        • #57
                          Does Control-FREEC allows normal and tumor with different coverage? (ExomeCNV doesn't allow different coverage, that's why I ask)

                          Also, does it allow estimation of contamination rate in the tumor sample? (Probably via LOH route like ExomeCNV?)

                          Comment


                          • #58
                            Hi ,

                            I am using control-FREEC with exome sequencing data, so far I have been successful in implementing it on my normal control tumor pairs for CNA detection. I am now curious to apply it further for CNA-LOH detection , how ever when am trying to run it, it undergoes segmentation fault. I checked the files and everything is fine. Below am attaching the config file. Anyone who has already applied it and overcome this problem can give me some suggestions.

                            [general]

                            chrLenFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
                            window = 500

                            step = 250
                            ploidy = 2

                            outputDir = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
                            BedGraphOutput=TRUE
                            breakPointType=4

                            gemMappabilityFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/out100m1_hg19.gem

                            chrFiles = /scratch/GT/vdas/test_exome/exome/

                            maxThreads=6

                            breakPointThreshold=1.5
                            noisyData=TRUE
                            printNA=FALSE
                            #breakPointThreshold = -.002;
                            #window = 50000
                            #chrFiles = hg18/hg18_per_chromosome
                            #outputDir = test
                            #degree=3
                            #intercept = 0

                            [sample]

                            mateFile = /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
                            inputFormat = pileup
                            mateOrientation = 0

                            [control]

                            mateFile = /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
                            inputFormat = pileup
                            mateOrientation = 0

                            [BAF]

                            SNPfile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt
                            minimalCoveragePerPosition = 5

                            [target]

                            captureRegions = /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed

                            Comment


                            • #59
                              Hi,

                              The config looks good. Could you please share the complete output into the command line with me? [email protected]

                              Thank you
                              Valentina

                              Comment


                              • #60
                                Dear Velntina,

                                Please find the output for the above call, I have made 6 such calls but all had the same problem. Below am attaching the log of the output run and at what stage I get the segmentation fault.

                                /scratch/GT/softwares/FREEC_Linux64/freec -conf config_ctrl5.txt
                                Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
                                MT-mode using 6 threads
                                ..Minimal CNA length (in windows) is 1
                                ..Breakpoint threshold for segmentation of copy number profiles is 1.5
                                ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                                ..telocenromeric set to 50000
                                ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                                ..Window = 50000 was set
                                ..Step: 250
                                ..Output directory: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
                                ..Directory with files containing chromosome sequences: /scratch/GT/vdas/test_exome/exome/
                                ..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
                                ..Sample file: /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
                                ..Sample input format: pileup
                                ..Control file: /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
                                ..Input format for the control file: pileup
                                ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
                                ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
                                ..File with chromosome lengths: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
                                ..Using the default minimal mappability value of 0.85
                                ..uniqueMatch = FALSE
                                ..average ploidy set to 2
                                ..break-point type set to 4
                                Warning: I would not recommend using '[general] noisyData=true' for whole genome data; you can miss some real CNAs in this case
                                ..minimal number of reads per window in the control sample is set to 10
                                ..will use SNP positions from /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to calculate BAF profiles
                                ..Starting reading /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to get SNP positions
                                Segmentation fault
                                Just after the Segmentation fault it stops and I do not get any outputs in the directory where I have mentioned. Do you want to me to mail again the log status?

                                Comment

                                Working...
                                X