  • bhdavis1978
    replied
    Hi Valeu,

    I want to be able to use the copy number data generated using control freec as input for a regression of copy number against other genomic and epigenetic features, so having higher precision is very useful to me.

    Assuming 30X coverage and a read length of 100 bp, a target of 400 reads per window suggests that the minimum recommended window size is about 1333 bp (= 400 / 30 × 100). I was hoping to set the window size to about 500 bp, which would imply about 150 reads per window.
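
The arithmetic in this post can be sketched as follows. This is only a back-of-the-envelope illustration that assumes uniform coverage, so that reads per window ≈ coverage × window size / read length. The function names are hypothetical and are not part of Control-FREEC.

```python
import math

def reads_per_window(coverage, window_bp, read_length_bp):
    """Expected number of reads in a window, assuming uniform coverage."""
    return coverage * window_bp / read_length_bp

def min_window_bp(target_reads, coverage, read_length_bp):
    """Smallest window (in bp) expected to contain `target_reads` reads."""
    return math.ceil(target_reads * read_length_bp / coverage)

print(min_window_bp(400, 30, 100))     # 1334
print(reads_per_window(30, 500, 100))  # 150.0
```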

    What would be the consequences of this? More variability in the copy number estimates? More breakpoints? Less confidence in identifying breakpoints?


  • valeu
    replied
    Originally posted by bhdavis1978:
    > Beyond the issue of additional running time, are there any reasons not to run control-freec with the degree setting set to 7 or 9 instead of 3 or 5?
    I did not try, but I think with degree >3 the result will be very similar to that with degree==3.
    > Similarly, are there any problems with setting the window and step size relatively small (I was thinking of 500 bp and 250 bp respectively)?
    > Thanks
    The ideal window size depends on read density. It is good to have about 400 reads per window. Alternatively, the window size can be evaluated automatically by FREEC using the read count and based on a Poisson distribution.
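
One way to read the "about 400 reads per window" guideline: for Poisson-distributed counts with mean N, the coefficient of variation is 1/sqrt(N), so N = 400 corresponds to roughly 5% relative noise per window. The sketch below assumes a pure Poisson model (real read counts are usually noisier) and uses a hypothetical function name; it is not FREEC's own code.

```python
import math

def poisson_cv(mean_reads):
    """Coefficient of variation (sd / mean) of a Poisson count with this mean."""
    return 1.0 / math.sqrt(mean_reads)

# Compare the suggested 400 reads per window with the 150 reads per window
# that a 500 bp window would give at 30X coverage and 100 bp reads.
for n in (400, 150):
    print(n, round(poisson_cv(n), 3))
# 400 0.05
# 150 0.082
```

Under this assumption, dropping from 400 to 150 reads per window raises the per-window relative noise from about 5% to about 8%.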


  • bhdavis1978
    replied
    Disadvantages of setting degree to 7 or 9, and of small window and step sizes

    Hello,

    Beyond the issue of additional running time, are there any reasons not to run control-freec with the degree setting set to 7 or 9 instead of 3 or 5?

    Similarly, are there any problems with setting the window and step size relatively small (I was thinking of 500 bp and 250 bp respectively)?

    Thanks


  • valeu
    replied
    There is a problem with this SNP file. Please try another one from the FREEC website.


  • vd4mindia
    replied
    Dear Valentina,

    Please find below the output for the above call. I have made 6 such calls, and all had the same problem. I am attaching the log of the run, which shows the stage at which I get the segmentation fault.

    /scratch/GT/softwares/FREEC_Linux64/freec -conf config_ctrl5.txt
    Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
    MT-mode using 6 threads
    ..Minimal CNA length (in windows) is 1
    ..Breakpoint threshold for segmentation of copy number profiles is 1.5
    ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
    ..telocenromeric set to 50000
    ..FREEC is not going to adjust profiles for a possible contamination by normal cells
    ..Window = 50000 was set
    ..Step: 250
    ..Output directory: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
    ..Directory with files containing chromosome sequences: /scratch/GT/vdas/test_exome/exome/
    ..will use a threshold of 5 read(s) per SNP position to calculate beta allel frequency (BAF) values
    ..Sample file: /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
    ..Sample input format: pileup
    ..Control file: /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
    ..Input format for the control file: pileup
    ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
    ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
    ..File with chromosome lengths: /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
    ..Using the default minimal mappability value of 0.85
    ..uniqueMatch = FALSE
    ..average ploidy set to 2
    ..break-point type set to 4
    Warning: I would not recommend using '[general] noisyData=true' for whole genome data; you can miss some real CNAs in this case
    ..minimal number of reads per window in the control sample is set to 10
    ..will use SNP positions from /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to calculate BAF profiles
    ..Starting reading /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt to get SNP positions
    Segmentation fault
    The run stops right after the segmentation fault, and I do not get any output in the directory I specified. Do you want me to mail the log again?


  • valeu
    replied
    Hi,

    The config looks good. Could you please share the complete command-line output with me? [email protected]

    Thank you
    Valentina


  • vd4mindia
    replied
    Hi,

    I am using Control-FREEC with exome sequencing data. So far I have been successful in applying it to my normal/tumor pairs for CNA detection. I am now curious to apply it to CNA+LOH detection; however, when I try to run it, it ends with a segmentation fault. I checked the files and everything seems fine. I am attaching the config file below. If anyone has already applied it and overcome this problem, I would appreciate some suggestions.

    [general]

    chrLenFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hs19_chr.len
    window = 500

    step = 250
    ploidy = 2

    outputDir = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/CNA_LOH/out1/
    BedGraphOutput=TRUE
    breakPointType=4

    gemMappabilityFile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/out100m1_hg19.gem

    chrFiles = /scratch/GT/vdas/test_exome/exome/

    maxThreads=6

    breakPointThreshold=1.5
    noisyData=TRUE
    printNA=FALSE
    #breakPointThreshold = -.002;
    #window = 50000
    #chrFiles = hg18/hg18_per_chromosome
    #outputDir = test
    #degree=3
    #intercept = 0

    [sample]

    mateFile = /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.pileup.gz
    inputFormat = pileup
    mateOrientation = 0

    [control]

    mateFile = /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.pileup.gz
    inputFormat = pileup
    mateOrientation = 0

    [BAF]

    SNPfile = /scratch/GT/vdas/pietro/exome_seq/test_Control_FREEC/hg19_snp137.SingleDiNucl.1based.txt
    minimalCoveragePerPosition = 5

    [target]

    captureRegions = /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed
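
Before debugging a crash like this, it can help to rule out missing input files. The sketch below is a hypothetical pre-flight check, not part of Control-FREEC: it assumes the config can be read as an INI file by Python's `configparser`, and the set of path-valued keys is an assumption taken from the config shown in this post.

```python
import configparser
import os

# Keys assumed to hold filesystem paths (taken from the config in this post).
PATH_KEYS = {"chrLenFile", "gemMappabilityFile", "chrFiles",
             "mateFile", "SNPfile", "captureRegions"}

def missing_paths(config_text):
    """Return (section, key, path) triples for configured paths that do not exist."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, e.g. "SNPfile"
    parser.read_string(config_text)
    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if key in PATH_KEYS and not os.path.exists(value):
                missing.append((section, key, value))
    return missing
```

Calling `missing_paths(open("config_ctrl5.txt").read())` would list any configured input that is not on disk. It does not check file contents, so a malformed SNP file would still pass.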


  • ymc
    replied
    Does Control-FREEC allow normal and tumor samples with different coverage? (ExomeCNV doesn't allow different coverage, which is why I ask.)

    Also, does it allow estimation of contamination rate in the tumor sample? (Probably via LOH route like ExomeCNV?)


  • smapdy
    replied
    I ended up figuring out what was going on. I had some multiallelic variants in the .snp file that were causing it to fail to load, and my sex variable in the configuration file didn't match up with the actual sample sex which caused problems as well. I ended up dropping the sex argument and using the following general configuration file for my samples:
    [general]
    window = 8000
    step = 2500
    samtools = samtools
    minCNAlength = 4
    BedGraphOutput = TRUE
    chrLenFile = NCBIM37_um.fa.len
    chrFiles = chrfiles
    outputDir = 31208T_31668N_FREEC_V1
    printNA = FALSE
    maxThreads = 6
    ploidy = 2
    breakPointType = 4
    contaminationAdjustment = TRUE
    noisyData = TRUE

    [sample]
    mateFile = 31208_EXOME.pileup.gz
    inputFormat = pileup
    mateOrientation = 0

    [control]
    mateFile = 31668_EXOME.pileup.gz
    inputFormat = pileup
    mateOrientation = 0

    [target]
    captureRegions = S0276129_Merged_Sorted_Probes.bed

    [BAF]
    SNPfile = snp128.singlebases.monoalleleic.freec_baf.txt
    minimalCoveragePerPosition = 5

    If anyone is interested I also have the commands I used to generate the pileups from the .bams, as well as the script I used to generate a working Mm9 and Mm10 .snp file.
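
The script mentioned in this post is not shown in the thread. As a rough illustration of the multiallelic fix only, the sketch below keeps SNP rows whose third column lists exactly two distinct single bases. It assumes the tab-delimited six-column layout shown elsewhere in this thread (chromosome, position, alleles such as `A/G`, strand, reference base, rsID); the function names are hypothetical.

```python
VALID_BASES = set("ACGT")

def is_biallelic_single_base(observed):
    """True for strings like 'A/G': two distinct single-base alleles."""
    alleles = observed.split("/")
    return (len(alleles) == 2
            and alleles[0] != alleles[1]
            and all(a in VALID_BASES for a in alleles))

def filter_snp_lines(lines):
    """Yield only the rows whose allele column is biallelic and single-base."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and is_biallelic_single_base(fields[2]):
            yield line
```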


  • valeu
    replied
    Originally posted by smapdy:
    > Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:
    Both formats look correct. If you want me to debug the code, please send me an email with the config file, the command-line output and the other files necessary to run FREEC.

    Originally posted by smapdy:
    > In the event that I can't get BAF calculation to work, what are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?
    If you disable [BAF] you may get less accurate calls. However, the result should be almost the same as the one you will obtain with [BAF] and noisyData=FALSE.


  • smapdy
    replied
    Thanks for the reply valeu. I can't use the hg19 file because my exomes are from mice. I downloaded and generated a SNP file that, I believe, matches the formatting of the human files:

    chr1 3000568 A/G + T rs29444956
    chr1 3000621 A/C + C rs31439779
    chr1 3001490 A/C + C rs31521921
    chr1 3001579 A/T + A rs30468828
    chr1 3001712 C/G + C rs32793997
    chr1 3003268 A/G + A rs30748911
    chr1 3003414 A/G + A rs31953890
    chr1 3003449 C/T + T rs32186899
    chr1 3003464 A/G + G rs31079645
    chr1 3003508 C/T + C rs32044173


    My .pileup files are formatted as follows:

    chr1 3216016 T 44 ......................,.........,..,,,...,^].^];=>>?>?<=??>>???>=>?>>??>>??>>>>@=>?<<::=;<<
    chr1 3216017 G 45 ......................,.........,..,,,...,..^].=@@ACCC?ACC@ACCC<BAAB5?CCBCCAAABAAB?:?>?=;?><
    chr1 3216018 T 47 ......................,.........,..,,,...,...^].^], :9?=>@9==@>>>@>?=><@>=<==>@?>>=?>>?=;<===9;;;<9
    chr1 3216019 A 48 .$.....................,.........,..,,,...,....,^]. 99><<??3<??<<???<<;><<<=?=?><<<>?>>=8<<<=6;:9;:<
    chr1 3216020 T 47 .....................,.........,..,,,...,....,.:==???<=?>==???===?=4?<?=??===?@>??<??=?>=;;:;<


    I believe this is standard .pileup format. Despite appearances both of the above are actually tab-delimited.
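
Since whitespace is easy to misjudge by eye, a quick sanity check can confirm the delimiter. The sketch below is a hypothetical helper, not part of FREEC or samtools. It assumes the simple single-sample six-column pileup layout (chromosome, position, reference base, depth, read bases, base qualities), in which the quality string has one character per read.

```python
def check_pileup_line(line):
    """Return None if the line looks like a 6-column pileup record, else a reason."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return f"expected 6 tab-separated fields, found {len(fields)}"
    chrom, pos, ref, depth, bases, quals = fields
    if not pos.isdigit() or not depth.isdigit():
        return "position and depth must be non-negative integers"
    # Zero-depth positions carry placeholder columns, so skip the length check.
    if int(depth) > 0 and len(quals) != int(depth):
        return "quality string length does not match depth"
    return None
```

Running this over the first few thousand lines of a pileup catches space-delimited or truncated records before they reach FREEC.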

    In the event that I can't get BAF calculation to work, what are the repercussions? I know there are a few options which are explicitly dependent upon BAF (like noisyData). How much will it impact the analysis if these options are disabled?


  • valeu
    replied
    Originally posted by smapdy:
    > ..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions
    >
    > Which suggests that it might be something with my SNP file.
    Hi, why don't you use the provided hg19_snp137.SingleDiNucl.1based.txt (created by Niklas Malmqvist)? Historically, the order of columns is different from the UCSC file.

    To create my dbSNP files, I downloaded a file with SNPs from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), from the “Variation and Repeats”/“All SNPs” table. I kept columns 2, 4, 10, 7, 8 and 5, and only entries with “genomic single”.
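
The column selection described here can be sketched as below. This is an illustration, not the script actually used: it assumes a tab-delimited UCSC snp table dump in which columns 11 and 12 hold the molecule type and the variant class (the "genomic" and "single" values), which should be checked against the table schema for your snp build. The function name is hypothetical.

```python
def ucsc_to_freec(line):
    """Convert one UCSC snp-table row to the 6-column layout, or return None."""
    f = line.rstrip("\n").split("\t")
    # Assumed positions: f[10] is the molecule type, f[11] is the variant class.
    if len(f) < 12 or f[10] != "genomic" or f[11] != "single":
        return None
    # 1-based columns 2, 4, 10, 7, 8, 5 -> 0-based indices 1, 3, 9, 6, 7, 4
    return "\t".join(f[i] for i in (1, 3, 9, 6, 7, 4))
```

Rows for which the function returns `None` are dropped; the remaining rows have the same column order as the SNP lines quoted in this thread.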

    When you are sure you are using the correct SNPfile, check your pileups. They should look like this (http://samtools.sourceforge.net/pileup.shtml):

    seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
    seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
    seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
    seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
    seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
    seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
    seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
    seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<


  • smapdy
    replied
    SegFault when running Freec

    Hi all. First, I wanted to say thanks to valeu for taking the time to answer questions on here, I've found some of the advice to be very useful.

    Second, I've been having a problem with segmentation faults when trying to use Freec to compute BAF. I had previously been using Freec to analyze exome samples without calculating BAF, and had success calling CNVs from the .bam files. However, when I tried to compute BAF (which required converting the .bams into pileup files, as well as using a file of known SNPs) I ran into some problems.
    Specifically, the program dies after about 1 second of runtime with the following error:

    line 13: 269410 Segmentation fault

    This occurs after the program outputs the following:

    ..Starting reading /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt to get SNP positions

    Which suggests that it might be something with my SNP file. However, I have omitted this file and still get a segfault when it tries to read my sample pileups.

    My configuration file is below. Any help on what may be causing this would be appreciated.

    [general]

    window = 5000
    step = 1000

    ploidy = 2

    samtools = /home/sf062971/programs/samtools-0.1.18/samtools

    minCNAlength = 4

    BedGraphOutput = TRUE

    chrLenFile = /home/sf062971/resources/freec_resources/mm10.len

    chrFiles = /data/sf062971/data/reference/chr_files

    noisyData = TRUE

    printNA=FALSE
    maxThreads=6
    sex=XX
    breakPointType=4

    outputDir = 1148T_1205N_V3

    contamination = 0.5
    contaminationAdjustment = TRUE

    [sample]

    mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351148/1148_ALIGN_RECAL_V3/1148_EXOME.mpileup

    inputFormat = pileup

    mateOrientation = 0

    [control]

    mateFile = /data/sf062971/LUNG_BAMS/SC_GCIM5351205/1205_ALIGN_RECAL_V3/1205_EXOME.mpileup

    inputFormat = pileup

    mateOrientation = 0

    [BAF]

    SNPfile = /home/sf062971/resources/ucsc_snps/snp137.no_dashes.freec_baf.txt

    minimalCoveragePerPosition = 10

    [target]

    captureRegions = /home/sf062971/resources/agilent_data/covered_regions_mm10_merged_sorted.bed


  • valeu
    replied
    > First, a simple and rather obvious question, if you have a control match file, does the CNA only analysis output only the somatic gain/loss regions of the sample?
    "Yes", if you don't use [BAF] and "No" if you do.

    > This question arises because the CNA+BAF run outputs a CNVs file which reports genotype information and gain/loss/normal in the predicted copy number. When I filter this file to report only somatic gains/losses and compare this output to the CNA only analysis output, the results are not quite the same.
    > Is this a fair comparison? Am I missing something which prevents me from understanding these results?
    I would say that it is normal that you get different results. When you use [BAF], you use more information to (1) segment your data and (2) to annotate the resulting CNAs.

    Another reason it can be different: imagine you have a region present in 3 copies in the normal and in 9 copies in the tumor. If you don't use BAF, you will get a ratio of 3 for this region. Since 3 > 1, this region will be called "gain". If you use [BAF], this region will be identified as "gain" for both normal and tumor samples and thus this gain will be called germline.
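
The 3-copy versus 9-copy example can be written out as a toy model. This is only an illustration of the reasoning in this post, not FREEC's actual calling logic; the function names and the simple comparison against ploidy are assumptions.

```python
def call_by_ratio(tumor_cn, normal_cn):
    """Ratio-only view: compare tumor copy number to the matched normal."""
    ratio = tumor_cn / normal_cn
    if ratio > 1:
        return "gain"
    if ratio < 1:
        return "loss"
    return "normal"

def call_by_absolute(cn, ploidy=2):
    """Absolute view: compare one sample's copy number to the expected ploidy."""
    if cn > ploidy:
        return "gain"
    if cn < ploidy:
        return "loss"
    return "normal"

def classify(tumor_cn, normal_cn, ploidy=2):
    """Label a tumor alteration germline if the normal shows the same state."""
    tumor = call_by_absolute(tumor_cn, ploidy)
    normal = call_by_absolute(normal_cn, ploidy)
    if tumor == "normal":
        return "normal"
    return f"{tumor} (germline)" if tumor == normal else f"{tumor} (somatic)"

print(call_by_ratio(9, 3))  # gain
print(classify(9, 3))       # gain (germline)
print(classify(4, 2))       # gain (somatic)
```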


  • valeu
    replied
    Originally posted by rduarte:
    > Can someone tell me the link to download makeGraph.R?
    > I'm having problems finding the most recent version of this.
    >
    > Thanks in advance
    This is the latest version so far: makeGraph.R
