  • FREEC GC content file

    Hello all,

    I'm planning to use FREEC to get CNVs without a control. For that I need to annotate GC content for consecutive 50 kb windows. I see FREEC provides such a file, but it's unclear to me what the fourth column of that file means. This is what the file looks like:

    chr window GC_content 'the fourth column'
    1 12900000 0.47042 1
    1 12950000 0.4453 0.5117

    Any idea whether that fourth column is relevant/mandatory?

    Thanks
    Jorge

  • #2
    Sorry, I have just seen your question.

    The fourth column represents mappability.



    • #3
      Originally posted by valeu:
      Sorry, I have just seen your question.

      The fourth column represents mappability.

      Doesn't the fourth column mean the percentage of ACGT letters per window (1-poly(N)%)?



      thanks!



      • #4
        Right, it does. The 5th column must be "mappability". But if the 5th column is empty, FREEC will use (1-poly(N)%) as mappability.
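
To make the column layout discussed above concrete, here is a minimal Python sketch. It assumes whitespace-separated columns in the order chromosome, window start, GC content, fraction of ACGT letters (1-poly(N)%), and an optional fifth mappability column. The names `GCWindow` and `parse_gc_line` are hypothetical and not part of FREEC; the fallback only mirrors the behaviour described in this thread.

```python
from typing import NamedTuple


class GCWindow(NamedTuple):
    chrom: str
    start: int
    gc: float
    acgt_fraction: float  # 1 - poly(N)% for the window
    mappability: float


def parse_gc_line(line: str) -> GCWindow:
    """Parse one line of a GC-content profile (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    chrom = fields[0]
    start = int(fields[1])
    gc = float(fields[2])
    acgt = float(fields[3])
    # As described in the thread: if the 5th column is missing,
    # the 4th column (1 - poly(N)%) stands in for mappability.
    mappability = float(fields[4]) if len(fields) >= 5 else acgt
    return GCWindow(chrom, start, gc, acgt, mappability)


if __name__ == "__main__":
    for row in ("1 12900000 0.47042 1", "1 12950000 0.4453 0.5117"):
        print(parse_gc_line(row))
```

For the two example rows from the first post there is no fifth column, so the sketch reuses the fourth column as mappability.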



        • #5
          I used FREEC to get CNVs without a control. The result is:
          Chr1 0 101199 3 gain
          Chr1 98600 124699 0 loss
          Chr1 122100 156299 3 gain

          window=2700
          My question: 124699 > 122100, so consecutive regions overlap. I understand this is not an error given FREEC's algorithm, but the breakpoints are not contiguous. I am planning to filter the result, but how should I do it?

          Thank you very much!



          • #6
            This can happen if step<window. You may use the first coordinates of regions: in your case, 98600 and 122100.
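
A minimal Python sketch of the suggestion above, assuming the regions are sorted by start position and that each region should simply end where the next region on the same chromosome begins. The name `trim_overlaps` and the tuple layout are hypothetical; this is a post-processing idea, not something FREEC does itself.

```python
from typing import List, Tuple

# chrom, start, end, copy number, status
Region = Tuple[str, int, int, int, str]


def trim_overlaps(regions: List[Region]) -> List[Region]:
    """Clip each region's end to the next region's start when they overlap."""
    trimmed = []
    for i, (chrom, start, end, cn, status) in enumerate(regions):
        if i + 1 < len(regions):
            next_chrom, next_start = regions[i + 1][0], regions[i + 1][1]
            # Only clip within the same chromosome, and only on overlap.
            if next_chrom == chrom and next_start < end:
                end = next_start
        trimmed.append((chrom, start, end, cn, status))
    return trimmed


if __name__ == "__main__":
    calls = [
        ("Chr1", 0, 101199, 3, "gain"),
        ("Chr1", 98600, 124699, 0, "loss"),
        ("Chr1", 122100, 156299, 3, "gain"),
    ]
    for region in trim_overlaps(calls):
        print(*region)
```

With the three calls from post #5, the region boundaries become 0-98600, 98600-122100 and 122100-156299. Whether the clipped end should be the next start or the next start minus one depends on the coordinate convention you use downstream.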



            • #7
              Originally posted by valeu:
              This can happen if step<window. You may use the first coordinates of regions: in your case, 98600 and 122100.

              Thanks for your help!

              If step > window, the step has no effect, does it?



              • #8
                I hope so.
                But it is better to set step equal to window, or to delete the step line or comment it out.



                • #9
                  Hi,

                  I am trying to use Control-FREEC for fungal genomes.

                  My config file looks like this
                  ======
                  [general]

                  chrFiles = individual_Chr_fastaFiles
                  chrLenFile = res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  coefficientOfVariation = 0.062
                  ploidy = 1
                  outputDir = ./
                  maxThreads = 12
                  BedGraphOutput = TRUE
                  minCNAlength = 4
                  GCcontentProfile = 1

                  [sample]

                  mateFile = bwa_S1_vs_WM276_reference.sam
                  inputFormat = SAM
                  mateOrientation = FR

                  [control]

                  ======

                  Using the Perl script get_fasta_lengths.pl, I got my chromosome lengths file:

                  Chr1 1984823
                  Chr2 2187695
                  Chr3 1961512
                  Chr4 2233618
                  Chr5 1333124
                  Chr6 1325755
                  Chr7 1324677
                  Chr8 1265488
                  Chr9 989306
                  Chr10 522727
                  Chr11 1040760
                  Chr12 820445
                  Chr13 705823
                  Chr14 679007

                  (It does not have the first column that the sample file in the manual has. Anyway, I have tried changing that as well.)


                  When I run my script it keeps giving me this error:


                  ---------------------------------------------------------
                  Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
                  MT-mode using 12 threads
                  ..Minimal CNA length (in windows) is 4
                  ..Breakpoint threshold for segmentation of copy number profiles is 0.8
                  ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                  ..telocenromeric set to 50000
                  ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                  ..Coefficient Of Variation set equal to 0.062
                  ..it will be used to evaluate window size
                  ..Output directory: ./
                  ..Directory with files containing chromosome sequences: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/Cryptococcus_gattii_WM276_reference/for_Control_FreeC_CNV
                  ..Sample file: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam
                  ..Sample input format: SAM
                  ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
                  ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
                  ..File with chromosome lengths: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  ..Using the default minimal mappability value of 0.85
                  ..uniqueMatch = FALSE
                  ..average ploidy set to 1
                  ..break-point type set to 2
                  ..noisyData set to 0
                  ..File /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna was read
                  total genome size: 1.83748e+07
                  read number: 6076119
                  coefficientOfVariation: 0.062
                  evaluated window size: 787
                  ..Starting reading /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam
                  PROFILING [tid=47829959698848]: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam read in 10 seconds [fillMyHash]
                  6076119 lines read..
                  4666119 reads used to compute copy number profile
                  printing counts into ./bwa_S1_vs_WM276_reference.sam_sample.cpn
                  ..Window size: 787
                  ..using GC-content to normalize copy number profiles
                  Unable to open file 1
                  ---------------------------------------------------------

                  When I place my chromosome files in the current directory I get a different error:

                  -----
                  Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
                  MT-mode using 12 threads
                  ..Minimal CNA length (in windows) is 4
                  ..Breakpoint threshold for segmentation of copy number profiles is 0.8
                  ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                  ..telocenromeric set to 50000
                  ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                  ..Coefficient Of Variation set equal to 0.062
                  ..it will be used to evaluate window size
                  ..Output directory: ./
                  ..Directory with files containing chromosome sequences: individual_Chr_fastaFiles
                  ..Sample file: bwa_S1_vs_WM276_reference.sam
                  ..Sample input format: SAM
                  ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
                  ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
                  ..File with chromosome lengths: res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  ..Using the default minimal mappability value of 0.85
                  ..uniqueMatch = FALSE
                  ..average ploidy set to 1
                  ..break-point type set to 2
                  ..noisyData set to 0
                  ..File res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna was read
                  total genome size: 1.83748e+07
                  read number: 6076119
                  coefficientOfVariation: 0.062
                  evaluated window size: 787
                  ..Starting reading bwa_S1_vs_WM276_reference.sam
                  PROFILING [tid=47981594524064]: bwa_S1_vs_WM276_reference.sam read in 10 seconds [fillMyHash]
                  6076119 lines read..
                  4666119 reads used to compute copy number profile
                  printing counts into ./bwa_S1_vs_WM276_reference.sam_sample.cpn
                  ..Window size: 787
                  ..using GC-content to normalize copy number profiles
                  file 1 is read
                  Your GC-content file 1 is empty or is in a wrong format

                  Please use chomosome sequences (option "chrFiles") to recreate it!
                  ----

                  My individual chromosome files look like this

                  >Chr1
                  AAAGTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTA
                  ACCCCTAACCCCTAAGTTCAAGGCTGTTGTCATGTCCACAGGGGGGCTAGTGAGCAGGGAAACAGCAGAT
                  GAGATGCGAAGATGGAGGAAGGAGATGAGCCCAGCGGTATTTGAGCGGATGATGAGGAAAATTAGTCTGG
                  AATTAGTGAGGGCAAGAGCTAGGACGTTTGCGATGTGAGGAGGAAGGAGAGGGTAAAGAAGTGACTGTAG
                  ATATAGAAATAAAAACAGAAAATCATTAAATACAAAAGCTCGACACAAAACCAATAAACTAGAAAGCTTT
                  TATGTATCCTCTTTCACCCTTCCAGTAGATTAAATGCATTGGCCCCAAACCAGCTTAGTCTTCTCACAAA
                  GCTCAGTCTCGCTGAACTCGTGCTCAAAACACAGGATATCCCTTTAAAACTGCAGACAAACCACCGAGGA
                  CCACCAAACTATAGTAGCATCACGTGCCTCCGACTTCTATGAGGTTAAGGCATCGCTACGAAAAATGTCT
                  TTTTCTTTACGATGACTCCGATCTCCTGCTCCGCTTCTTTGTGGTCTCAGGCACCGTACTCTTCTCCCCT



                  It could be a small thing I am missing, I'm not sure. Can anyone help?

                  regards,

                  Nandan



                  • #10
                    Originally posted by ndeshpan:
                    Hi,

                    I am trying to use Control-FREEC for fungal genomes..

                    My config file looks like this
                    ======
                    [general]

                    GCcontentProfile = 1



                    When I run my script it keeps giving me the error


                    ..using GC-content to normalize copy number profiles
                    Unable to open file 1
                    ---------------------------------------------------------

                    When I place my chromosome files in the current directory I get a different error


                    ..using GC-content to normalize copy number profiles
                    file 1 is read
                    Your GC-content file 1 is empty or is in a wrong format

                    Please use chomosome sequences (option "chrFiles") to recreate it!

                    Nandan

                    GCcontentProfile should be a file with GC-content per window (not a TRUE or FALSE value). Just remove this line, and the GC-content profile will be created from the fasta files of your chromosomes.

                    Check http://bioinfo-out.curie.fr/projects...al.html#CONFIG
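
As a rough pre-flight check for the mistake described above, here is a minimal Python sketch that reads a FREEC-style config with the standard `configparser` module and warns when `GCcontentProfile` is set to something that is not an existing file. The function name `check_gc_profile` is hypothetical and not part of Control-FREEC. The sketch assumes the config parses as INI, and it only checks that the file exists, not that its format is right.

```python
import configparser
import os
from typing import Optional


def check_gc_profile(config_text: str) -> Optional[str]:
    """Return a warning if GCcontentProfile is set but is not an existing file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(config_text)
    if not parser.has_option("general", "GCcontentProfile"):
        return None  # option absent, nothing to check
    value = parser.get("general", "GCcontentProfile")
    if not os.path.isfile(value):
        return f"GCcontentProfile = {value!r} is not an existing file"
    return None


if __name__ == "__main__":
    config = """
[general]
ploidy = 1
GCcontentProfile = 1

[sample]
mateFile = bwa_S1_vs_WM276_reference.sam
inputFormat = SAM

[control]
"""
    print(check_gc_profile(config))
```

Note that this would not catch the second error in the quoted post, where a file named 1 happened to exist in the working directory but was not a GC-content profile.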
