  • rpauly
    replied
    Help with FREEC output

    First off, I must say FREEC is a great tool for CNV detection in exome-seq data!
    I have a few questions about the output files I obtained.

    I have two sets of _cnv, _ratio and _BAF files.
    For instance, how is *_mpileup_CNV different from *mpileup_normal_CNV? And why are my R plots so different depending on which file I use?


    PS: I have paired-end tumor-normal Illumina data from exome sequencing.

    ~Thanks for your help,
    Rini

  • rduarte
    replied
    Originally posted by fjrossello
    Hi Valeu,
    Thanks for your explanation. Regarding the R plots, I downloaded the latest makeGraph.R and it works perfectly.
    Cheers,
    Fernando
    Can someone tell me where to download makeGraph.R?
    I'm having trouble finding the most recent version.

    Thanks in advance


  • fjrossello
    replied
    Hi Valeu,

    Sorry to be so insistent on this point. I re-ran Control-FREEC on an mpileup file from one of my samples with and without the BAF options, and I found a few differences between the two runs. First, a simple and rather obvious question: if you have a matched control file, does the CNA-only analysis output only the somatic gain/loss regions of the sample? I ask because the CNA+BAF run outputs a CNVs file that reports genotype information and a gain/loss/normal status for the predicted copy number. When I filter this file to keep only somatic gains/losses and compare it with the CNA-only output, the results are not quite the same.
    Is this a fair comparison? Am I missing something which prevents me from understanding these results?
    Thanks in advance.
    Cheers,
    Fernando

    PS: my config file parameters are below. As I said, I ran it with and without BAF, i.e., with the [BAF] section commented out.

    [general]
    chrLenFile = hg19.len
    coefficientOfVariation = 0.05
    outputDir = ./ch209_cnv_CNA_only
    degree = 3
    ploidy = 2
    samtools = /usr/local/biotools/bin/samtools
    sex = XY
    chrFiles = /home/fernandr/biotools/references/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes
    # step = 5000
    # window = 20000

    [sample]

    mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/sample_bwa_wg.mpileup
    inputFormat = pileup
    mateOrientation = FR

    [control]

    mateFile = /media/data/projects/wg_fr_20121024/sample_mpileup_files/control_bwa_wg.mpileup
    inputFormat = pileup
    mateOrientation = FR

    # [BAF]
    #
    # SNPfile = /home/fernandr/biotools/references/freec/hg19/hg19_snp131.SingleDiNucl.1based.txt
    # minimalCoveragePerPosition = 1
    # minimalQualityPerPosition = 0
    # shiftInQuality = 33
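Because breakpoints can shift slightly between runs, comparing the CNA-only and CNA+BAF results row by row tends to exaggerate differences; measuring base-pair overlap between the two call sets is a gentler check. Below is a minimal Python sketch under stated assumptions: each _CNVs row is tab-delimited with chromosome, start, end, copy number and gain/loss/normal status in the first five columns, end coordinates are inclusive, and the index of any somatic/germline column is not assumed but passed in as the hypothetical `status_col` parameter. Check these against your own files before relying on the numbers.

```python
"""Compare two Control-FREEC _CNVs call sets by base-pair overlap (sketch)."""
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]        # (chromosome, "gain"/"loss")
Interval = Tuple[int, int]   # half-open [start, end)


def read_calls(lines: Iterable[str], status_col: Optional[int] = None,
               keep: str = "somatic") -> Dict[Key, List[Interval]]:
    """Parse rows assumed to start with: chrom, start, end, copy number, status.

    If status_col is given (hypothetical index of a somatic/germline column),
    only rows whose value in that column equals `keep` are retained.
    """
    calls: Dict[Key, List[Interval]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[4] == "normal":
            continue  # skip malformed rows and copy-neutral calls
        if status_col is not None and (len(fields) <= status_col
                                       or fields[status_col] != keep):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        calls[(chrom, fields[4])].append((start, end + 1))  # assume inclusive end
    return calls


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def total_bp(calls: Dict[Key, List[Interval]]) -> int:
    """Total bases covered by the calls (after merging per key)."""
    return sum(e - s for ivs in calls.values() for s, e in merge(ivs))


def overlap_bp(a: Dict[Key, List[Interval]], b: Dict[Key, List[Interval]]) -> int:
    """Bases called on the same chromosome with the same direction in both sets."""
    total = 0
    for key in a.keys() & b.keys():
        x, y = merge(a[key]), merge(b[key])
        i = j = 0
        while i < len(x) and j < len(y):
            lo, hi = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
            total += max(0, hi - lo)
            # advance whichever interval ends first
            if x[i][1] < y[j][1]:
                i += 1
            else:
                j += 1
    return total


def jaccard(a: Dict[Key, List[Interval]], b: Dict[Key, List[Interval]]) -> float:
    """Shared bases divided by bases called in either set (1.0 if both empty)."""
    inter = overlap_bp(a, b)
    union = total_bp(a) + total_bp(b) - inter
    return inter / union if union else 1.0
```

With real files this would be called as something like `jaccard(read_calls(open(path_a)), read_calls(open(path_b)))`. Keying intervals by chromosome and gain/loss means a region called as a gain in one run and a loss in the other counts as disagreement, while a small shift in a breakpoint only lowers the score slightly.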


  • valeu
    replied
    No, mateOrientation is not relevant when you use pileup. Still, you need to set this parameter to some value.


  • fjrossello
    replied
    Originally posted by valeu
    Hi Fernando,

    I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC will keep only pairs mapped with the correct orientation and insert size.

    Also, in some cases having BAF info can improve predictions (e.g., when the float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region).

    Also, in version 5.9 and earlier there was a bug that prevented FREEC from getting the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. The new version is also ~10x faster on an 8-core computer: it can process a 30x genome (with control and BAF, in pileup.gz) in one hour.

    Thanks for your prompt answer. I understand. I will eagerly wait for the next version; speed improvements and bug fixes are always good news.
    Just to be clear, when you use a pileup file, should the mateOrientation parameter be set to 0? Is that parameter relevant at all when using this format?
    Thanks in advance.

    Cheers,

    Fernando


  • valeu
    replied
    Hi Fernando,

    I think running FREEC on a pileup should be more or less identical to running it on a BAM file with "mateOrientation=0". In this case, all reads are taken into account when calculating the read count per window. When you select "mateOrientation=FR" for a BAM file, FREEC will keep only pairs mapped with the correct orientation and insert size.
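The difference between the two counting modes can be illustrated with a toy Python sketch. The `Read` record is a hypothetical simplification of a SAM record, and the FR rule and the default insert-size bounds (100 to 600 bp) are assumptions chosen for illustration; FREEC's own filtering criteria may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Read:
    """Hypothetical, minimal stand-in for one mapped read of a pair."""
    pos: int            # leftmost mapped position of this read
    is_reverse: bool    # this read maps to the reverse strand
    mate_pos: int       # leftmost mapped position of its mate
    mate_reverse: bool  # the mate maps to the reverse strand
    tlen: int           # signed template (insert) length, as in the SAM TLEN field


def is_proper_fr(r: Read, min_insert: int, max_insert: int) -> bool:
    """FR: forward-strand read lies left of its reverse-strand mate, insert in range."""
    if r.is_reverse == r.mate_reverse:
        return False  # FF or RR pair
    fwd_pos, rev_pos = (r.mate_pos, r.pos) if r.is_reverse else (r.pos, r.mate_pos)
    if fwd_pos > rev_pos:
        return False  # RF orientation
    return min_insert <= abs(r.tlen) <= max_insert


def count_per_window(reads: Iterable[Read], window: int, filter_fr: bool = False,
                     min_insert: int = 100, max_insert: int = 600) -> Dict[int, int]:
    """Count reads per window index; optionally keep only proper FR pairs."""
    counts: Dict[int, int] = {}
    for r in reads:
        if filter_fr and not is_proper_fr(r, min_insert, max_insert):
            continue
        idx = r.pos // window
        counts[idx] = counts.get(idx, 0) + 1
    return counts
```

With `filter_fr=False` every read contributes, which corresponds to the pileup case described above; with `filter_fr=True` pairs with the wrong orientation or an implausible insert size are dropped, so per-window counts (and therefore ratios and breakpoints) can shift somewhat between the two modes.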

    Also, in some cases having BAF info can improve predictions (e.g., when the float copy number is 2.5 and FREEC hesitates between assigning 2 or 3 copies to the region).
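The 2-versus-3 ambiguity can be seen from expected B-allele frequencies: for a pure sample with copy number n and k copies of the B allele, the expected BAF at a SNP is k/n, so heterozygous SNPs sit near 0.5 for two copies but near 1/3 and 2/3 for three. The Python sketch below scores candidate copy numbers by how close observed heterozygous BAFs are to the nearest expected value. It is only an illustration that ignores normal-cell contamination and noise modelling, not FREEC's actual BAF procedure.

```python
from typing import List, Sequence


def expected_het_bafs(copy_number: int) -> List[float]:
    """Expected BAFs for heterozygous genotypes (0 < k < n), pure tumour assumed."""
    return [k / copy_number for k in range(1, copy_number)]


def baf_score(observed: Sequence[float], copy_number: int) -> float:
    """Mean absolute distance from each observed BAF to its nearest expected value."""
    expected = expected_het_bafs(copy_number)
    if not expected:
        raise ValueError("copy number must be >= 2 to have heterozygous genotypes")
    if not observed:
        raise ValueError("need at least one observed BAF")
    return sum(min(abs(b - e) for e in expected) for b in observed) / len(observed)


def best_copy_number(observed: Sequence[float], candidates=(2, 3)) -> int:
    """Candidate copy number whose expected BAF pattern fits best (lowest score)."""
    return min(candidates, key=lambda n: baf_score(observed, n))
```

Note that BAF alone cannot separate every pair of copy numbers: an AB genotype (2 copies) and an AABB genotype (4 copies) both give 0.5, which is why read-count ratios and BAF are informative together.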

    Also, in version 5.9 and earlier there was a bug that prevented FREEC from getting the correct read count in windows with extremely high coverage (> 1000x per position) when using .pileup files. This bug is fixed in 6.0, which should be available next week. The new version is also ~10x faster on an 8-core computer: it can process a 30x genome (with control and BAF, in pileup.gz) in one hour.


  • fjrossello
    replied
    Hi Valeu,

    This is Fernando again. I have re-run FREEC on one of my samples for which I had previously run a CNA analysis from a SAM file (unsorted, using mateOrientation = FR). The difference this time was that I wanted to run CNA + BAF analyses. To run BAF, I first created a pileup from the sample SAM file and then ran FREEC with exactly the same parameters.
    Even though the results look the same graphically (plots created in R), when I compared the CNVs text files produced by the two analyses, the results were slightly different. The differences are in the start and end positions (the regions are roughly the same) and in the predicted copy number.
    Are there any reasons why this could be happening? Which one should be more reliable?
    Thanks in advance.

    Cheers,

    Fernando
    Last edited by fjrossello; 01-17-2013, 07:45 PM. Reason: typo


  • valeu
    replied
    You need to define the window size (window=1000), and you have to run it with a control dataset when you use the "target" option.
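As a pre-flight check before launching a long run, the two requirements valeu describes can be expressed with Python's standard `configparser` module. This is a hypothetical helper, not part of Control-FREEC; it assumes the config file uses the unindented `[section]` and `key = value` layout shown in this thread, and it checks only for a `window` option in `[general]` and a `[control]` section whenever `[target]` is present.

```python
import configparser
from typing import List


def check_target_config(text: str) -> List[str]:
    """Return a list of problems for a FREEC-style config that uses [target]."""
    # interpolation=None so '%' characters in paths are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems: List[str] = []
    if parser.has_section("target"):
        # has_option returns False if the section itself is missing
        if not parser.has_option("general", "window"):
            problems.append("[general] window is not set (e.g. window = 1000)")
        if not parser.has_section("control"):
            problems.append("[target] is used without a [control] section")
    return problems
```

Usage would be something like `check_target_config(open("config.txt").read())`, where the file name is a placeholder; an empty list means neither condition was violated, not that the config is otherwise valid.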


  • stephwen
    replied
    Error while specifying target BED file

    Hello everyone,

    I have been trying out Control-FREEC with some test data (exome samples), and I encountered an error when trying to specify a target BED file.

    Basically, Control-FREEC seems to run fine whether or not I use a control sample (I tried both options), but when I add these lines:

    Code:
    [target]
    
    captureRegions = /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
    to my config file, the program crashes (exits with code 255), and outputs the following lines:

    Code:
    FREEC v5.9 (Control-FREEC v2.9) : calling copy number alterations and LOH regions using deep-sequencing data
    ..Using 1 process(es)
    ..Minimal CNA length (in windows) was set to 4
    ..consider the sample being male
    ..breakPointThreshold set to 0.8
    ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
    ..FREEC is not going to output normalized copy number profiles into a BedGraph file. Use "[general] BedGraphOutput=TRUE" if you want a BedGraph file
    ..FREEC is not going to adjust profiles for a possible contamination by normal cells
    ..Output directory:	/home/volatile/swe/2013-01-10/Test-FREEC5
    ..Directory with files containing chromosome sequences:	/home/genmol/genomes/homo_sapiens/hg19/chromosomes
    ..Sample file:	/home/volatile/swe/exomes/exome2.bam
    ..Sample input format:	BAM
    ..will use this instance of samtools: samtools to read BAM files
    ..Control file:	/home/volatile/swe/exomes/exome1.bam
    ..Input format for the control file:	BAM
    ..File with chromosome lengths:	hg19.len
    ..Coefficient Of Variation set equal to 0.062
    ..Note, this coefficient won't be used if "window" is set
    ..File hg19.len was read
    	 total genome size:	3.09568e+09
    ..samtools should be installed to be able to read BAM files
    	 read number:	76963934
    	 coefficientOfVariation:	0.062
    	 evaluated window size:	10464
    ..Starting reading /home/volatile/swe/exomes/exome2.bam
    ..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome2.bam
    76963934 lines read..
    75080830 reads used to compute copy number profile
    printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome2.bam_sample.cpn
    ..Window size:	10464
    	..Will use hg19.len to calculate RC for the control sample
    ..File hg19.len was read
    ..Starting reading /home/volatile/swe/exomes/exome1.bam
    ..samtools should be installed to be able to read BAM files; will use the following command for samtools: samtools view /home/volatile/swe/exomes/exome1.bam
    51311982 lines read..
    50082356 reads used to compute copy number profile
    printing counts into /home/volatile/swe/2013-01-10/Test-FREEC5/exome1.bam_control.cpn
    ..FREEC will take into account only regions from /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
    ..Mappability and GC-content won't be used
    ..Control-FREEC won't use minimal mappability. All windows overlaping capture regions will be considered
    ..Reading /home/volatile/swe/exomes/TruSeq-for-FREEC.bed
    ..Your file must be in .BED format, and it must be sorted
    ..Reading capture for chromosome 1
    ..Reading capture for chromosome 2
    ..Reading capture for chromosome 3
    ..Reading capture for chromosome 4
    ..Reading capture for chromosome 5
    ..Reading capture for chromosome 6
    ..Reading capture for chromosome 7
    ..Reading capture for chromosome 8
    ..Reading capture for chromosome 9
    ..Reading capture for chromosome 10
    ..Reading capture for chromosome 11
    ..Reading capture for chromosome 12
    ..Reading capture for chromosome 13
    ..Reading capture for chromosome 14
    ..Reading capture for chromosome 15
    ..Reading capture for chromosome 16
    ..Reading capture for chromosome 17
    ..Reading capture for chromosome 18
    ..Reading capture for chromosome 19
    ..Reading capture for chromosome 20
    ..Reading capture for chromosome 21
    ..Reading capture for chromosome 22
    ..Reading capture for chromosome X
    ..Reading capture for chromosome Y
    file /home/volatile/swe/exomes/TruSeq-for-FREEC.bed is read
    ..Setting read counts to Zero for all windows outside of capture
    ..Total size of captured regions 6.18842e+07bp
    ..processing chromosome 1
    ..processing chromosome 2
    ..processing chromosome 3
    ..processing chromosome 4
    ..processing chromosome 5
    ..processing chromosome 6
    ..processing chromosome 7
    ..processing chromosome 8
    ..processing chromosome 9
    ..processing chromosome 10
    ..processing chromosome 11
    ..processing chromosome 12
    ..processing chromosome 13
    ..At this point you need to profide window size, option 'window' in group of parameters [general] in your config file
    ..processing chromosome 14
    ..processing chromosome 15
    ..processing chromosome 16
    ..processing chromosome 17
    ..processing chromosome 18
    ..processing chromosome 19
    ..processing chromosome 20
    ..processing chromosome 21
    ..processing chromosome 22
    ..processing chromosome X
    ..processing chromosome Y
    ..telocenromeric set to 1 since it is a minimal capture region
    (This is the output when I use a control sample, but I get basically the same output without a control sample.)

    I formatted my BED file as follows:

    chr start end
    (tab-delimited), and it's ordered by chr (chr1, chr2, ... chr22, chrX, chrY), and then by start position.
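A quick way to rule out formatting problems is to check the BED file programmatically. The Python sketch below is a hypothetical checker, not part of Control-FREEC: it verifies that each record has at least three tab-separated fields with integer start and end (0 <= start < end), that each chromosome's records form one contiguous block, and that starts are non-decreasing within a chromosome.

```python
from typing import Iterable, List, Optional


def check_bed(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in BED-like lines (empty if none)."""
    errors: List[str] = []
    seen_chroms = set()
    current: Optional[str] = None
    last_start = -1
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 3:
            errors.append(f"line {n}: fewer than 3 tab-separated fields")
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            errors.append(f"line {n}: start/end are not integers")
            continue
        if start < 0 or start >= end:
            errors.append(f"line {n}: expected 0 <= start < end")
        if chrom != current:
            # a chromosome reappearing after another one breaks sorting
            if chrom in seen_chroms:
                errors.append(f"line {n}: {chrom} appears in more than one block")
            seen_chroms.add(chrom)
            current, last_start = chrom, -1
        if start < last_start:
            errors.append(f"line {n}: start {start} < previous start {last_start}")
        last_start = start
    return errors
```

For a real file, `check_bed(open("capture.bed"))` (placeholder path) lists problems such as space-separated columns, which are easy to miss by eye.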

    Am I doing something wrong here?

    Thanks in advance.

    Regards,

    Stephane

    PS: Since samtools' pileup function is now deprecated, it's not possible to generate pileup files anymore. Do you plan to support BAM or VCF files as input for the BAF calculation? Or do you know how I can work around this limitation? Thanks.
    Last edited by stephwen; 01-10-2013, 05:08 AM. Reason: added question about BAM or VCF support for BAF calculation


  • fjrossello
    replied
    Hi Valeu,
    Thanks for your explanation. Regarding the R plots, I downloaded the latest makeGraph.R and it works perfectly.
    Cheers,
    Fernando


  • valeu
    replied
    Hi Fernando,

    Originally posted by fjrossello
    Are they the output obtained when CNV and LOH were calculated for the control sample using GC_profile.cnp?
    Yes, you are right.

    Originally posted by fjrossello
    Any idea why this is happening?
    I recently updated makeGraph.R; can you download the latest version from the site and see if it produces the same error?

    What does it print on the command line?


  • fjrossello
    replied
    Hi Valeu,

    I am using Control-FREEC to detect CNV and LOH in normal vs. tumor samples (low-pass whole genome).
    I had no problems running it at all. However, I would like to ask you a couple of questions about the output files and the plotting process.
    First, when I run CNV + LOH using SAM pileups, apart from creating the standard _CNVs, _ratio.txt, _BAF.txt, _sample.cnp, _control.cnp and GC_profile.cnp output files, it also generates three extra files with the suffixes _normal_CNVs, _normal_ratio.txt and _normal_BAF.txt. Are they the output obtained when CNV and LOH were calculated for the control sample using GC_profile.cnp?
    Second, even though makeGraph.R works flawlessly for the CNV ratio data, I cannot get it to plot the LOH _BAF.txt file.

    I used the following line:

    cat /usr/local/biotools/freec/scripts/makeGraph.R | R --slave --args 2 sample_bwa_wg.mpileup_ratio.txt sample_bwa_wg.mpileup_BAF.txt

    Any idea why this is happening?

    Thanks in advance.

    Cheers,

    Fernando
    Last edited by fjrossello; 12-20-2012, 02:52 PM. Reason: Typo


  • valeu
    replied
    Hi Hao,

    You know, two cell lines from the same type of cancer can be very different, especially for "non-copy-number" tumors.

    But even for "copy-number" tumors, such as neuroblastoma, CNA regions can differ. See, for example, the sequencing data for neuroblastoma samples in the supplementary figures of Molenaar et al., 2012.


  • yuhao
    replied
    Hi, valeu,

    I currently have data from two human cancer cell samples (the same cancer type), with coverage depths of about 33x and 39x and depth statistics for each base. In this case, what is the best software for CNV detection? I used FREEC with window=3000, step=1000 and otherwise the same parameters as the test config file provided on the website. My problem is how to view the CNVs and how to compare these two results. Instead of listing all the CNVs with their type, start and end positions and copy number, what other statistics are usually used to analyze CNVs?

    I find that the CNVs detected in these two cancer samples have nothing in common: the breakpoints are different and the copy numbers are different. It seems strange that two samples of the same cancer would have completely different CNVs. Could something be wrong here?

    Thank you !


  • valeu
    replied
    Hi Hao,

    Originally posted by yuhao
    The output intervals have some overlaps, e.g., (58000, 8387999, 3, gain) and (8386000, 9404999, 5, gain): since 8386000 < 8387999, how can this happen?
    This can happen if you use overlapping windows (e.g., step=1000; window=3000). Most likely the breakpoint occurred in the overlapping area of the two windows (8386000; 8386000+window.size) and (8387999-window.size; 8387999), i.e., in (8386000; 8387999).
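The arithmetic can be checked directly. Assuming window i covers positions i*step to i*step + window - 1, and that a segment is reported from the start of its first window to the last base of its last window, two adjacent segments overlap by window - step bp. This coordinate convention is an assumption made for illustration, but with step=1000 and window=3000 it reproduces the intervals quoted above; the window indices are hypothetical.

```python
def segment_coords(first_win: int, last_win: int, step: int, window: int):
    """Return (start, end) of a segment spanning windows first_win..last_win.

    Assumes window i covers [i*step, i*step + window - 1] (inclusive).
    """
    return first_win * step, last_win * step + window - 1


step, window = 1000, 3000
# Hypothetical breakpoint: segment 1 ends with window 8385, segment 2 starts at window 8386
seg1 = segment_coords(58, 8385, step, window)
seg2 = segment_coords(8386, 9402, step, window)
overlap = seg1[1] - seg2[0] + 1
print(seg1, seg2, overlap)  # (58000, 8387999) (8386000, 9404999) 2000
```

The 2000 bp overlap equals window - step, so with non-overlapping windows (step equal to window) adjacent segments would simply abut.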

    Originally posted by yuhao
    What does control database mean here? Normally we just have a test genome and a reference genome.
    If you analyze a cancer sample, you are interested in somatic gains and losses. In this case you use patient's normal DNA (e.g. from blood) as a control.

    Originally posted by yuhao
    As far as I know, there are typically two different approaches to calling CNVs: segmentation-based methods and hidden Markov models. Is FREEC based on segmentation?
    The method has been published:

    Pubmed links

    Both papers are open access. Have a look!

    FREEC uses Lasso-based segmentation.

    Originally posted by yuhao
    How do we determine the window size and step parameters? Which parameters affect the accuracy of the results? This is crucial, so I care a lot about it.
    Window size can be determined automatically if you use the "coefficientOfVariation" parameter. See the Supplementary Methods of the first publication.
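One way to see how a target coefficient of variation translates into a window size: if read counts per window are Poisson with mean N*w/G (N reads, window w, genome size G), their CV is 1/sqrt(N*w/G), and solving for w gives w = G/(N*CV^2). This formula is an assumption here rather than a quote from the paper, but plugging in the values from the log Stephane posted in this thread (total genome size 3.09568e+09, 76963934 reads, coefficientOfVariation 0.062) gives the same "evaluated window size" of 10464 that FREEC reported.

```python
def window_size(genome_size: float, n_reads: int, cv: float) -> int:
    """Window length w such that a Poisson count with mean n_reads*w/genome_size has CV = cv."""
    mean_reads_needed = 1.0 / cv ** 2  # CV = 1/sqrt(mean)  =>  mean = 1/CV^2 (about 260 reads here)
    return round(genome_size * mean_reads_needed / n_reads)


w = window_size(3.09568e9, 76963934, 0.062)
print(w)  # 10464
```

The same relation explains why deeper data yields smaller windows at a fixed CV: doubling the read count halves the window size.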

    Using "step" helps to improve sensitivity and produces prettier graphs, but it can be time-consuming.

    One of the most important parameters is "breakPointThreshold" (positive, default 0.8). If, by eye, the segmentation does not look sensitive enough, use smaller values to get more segments.

    Originally posted by yuhao
    Finally, aside from FREEC, can you recommend other software that is widely used for CNV detection? There are many choices and I don't know which ones are best. I also tried CNVnator, but the results seem very different from FREEC's.
    It is better to ask the community this question. You need to be more precise about your data: whether you have paired-end reads, your coverage, whether it is human data, a normal individual or a cancer patient, whether you have a control sample, etc.
