  • jorge
    Member
    • Jun 2011
    • 25

    FREEC GC content file

    Hello all,

    I'm planning to use FREEC to call CNVs without a control. For that I need to annotate GC content for consecutive 50 kb windows. I see FREEC provides such a file, but it's unclear to me what the fourth column of that file means. This is what the file looks like:

    chr window GC_content 'the fourth column'
    1 12900000 0.47042 1
    1 12950000 0.4453 0.5117

    Any idea whether that fourth column is relevant or mandatory?

    Thanks
    Jorge
  • valeu
    Member
    • Sep 2008
    • 69

    #2
    Sorry, I have just seen your question.

    The fourth column represents mappability.

    • juanjingmiao
      Junior Member
      • Aug 2013
      • 8

      #3
      Originally posted by valeu
      Sorry, I have just seen your question.

      The fourth column represents mappability.

      Doesn't the fourth column give the percentage of ACGT letters per window (1 - poly(N)%)?



      thanks!

      • valeu
        Member
        • Sep 2008
        • 69

        #4
        Right, it does. The 5th column must be "mappability". But if the 5th column is empty, FREEC will use (1 - poly(N)%) as mappability.
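
For anyone scripting against this file, here is a minimal sketch (not part of FREEC) of how the columns described in this thread could be read: chromosome, window start, GC content, fraction of ACGT letters, and an optional fifth mappability column. The names `GCWindow`, `parse_gc_line` and `effective_mappability` are made up for illustration, and the whitespace-separated layout is assumed from the example lines in the first post.

```python
from typing import NamedTuple, Optional


class GCWindow(NamedTuple):
    chrom: str
    start: int
    gc: float
    acgt_fraction: float          # 4th column: 1 - fraction of N per window
    mappability: Optional[float]  # 5th column, may be absent


def parse_gc_line(line: str) -> GCWindow:
    """Parse one whitespace-separated line of a GC-content profile."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    chrom, start, gc, acgt = fields[:4]
    mappability = float(fields[4]) if len(fields) > 4 else None
    return GCWindow(chrom, int(start), float(gc), float(acgt), mappability)


def effective_mappability(window: GCWindow) -> float:
    """Mirror the fallback described above: with no 5th column,
    the ACGT fraction stands in for mappability."""
    if window.mappability is None:
        return window.acgt_fraction
    return window.mappability


if __name__ == "__main__":
    w = parse_gc_line("1 12950000 0.4453 0.5117")
    print(w, effective_mappability(w))
```

The fallback in `effective_mappability` only restates what the post above says; it is not taken from the FREEC source.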

        • juanjingmiao
          Junior Member
          • Aug 2013
          • 8

          #5
          I used FREEC to call CNVs without a control. The result is:
          Chr1 0 101199 3 gain
          Chr1 98600 124699 0 loss
          Chr1 122100 156299 3 gain

          window=2700;
          My question: 124699 > 122100, so consecutive regions overlap. I understand this is not an error given FREEC's algorithm, but the breakpoints are not contiguous. I am planning to filter the results; how should I do it?

          Thank you very much!

          • valeu
            Member
            • Sep 2008
            • 69

            #6
            This can happen if step<window. You may use the first coordinates of regions: in your case, 98600 and 122100.
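
One way to apply this when filtering is sketched below: keep each region's start coordinate and, where a region runs past the start of the next one on the same chromosome, cut its end back to just before that start. This is only one reading of the advice above, not something FREEC does itself; `trim_overlaps` is a hypothetical helper, and the input is assumed to be sorted by chromosome and start.

```python
def trim_overlaps(segments):
    """Trim each segment's end so it stops right before the next segment starts.

    segments: list of (chrom, start, end, copy_number, status) tuples,
    assumed sorted by chromosome and start coordinate.
    """
    trimmed = []
    for i, (chrom, start, end, copy_number, status) in enumerate(segments):
        if i + 1 < len(segments):
            next_chrom, next_start = segments[i + 1][0], segments[i + 1][1]
            # Only trim when the next segment is on the same chromosome
            # and actually overlaps this one.
            if next_chrom == chrom and next_start <= end:
                end = next_start - 1
        trimmed.append((chrom, start, end, copy_number, status))
    return trimmed


if __name__ == "__main__":
    calls = [
        ("Chr1", 0, 101199, 3, "gain"),
        ("Chr1", 98600, 124699, 0, "loss"),
        ("Chr1", 122100, 156299, 3, "gain"),
    ]
    for seg in trim_overlaps(calls):
        print(*seg, sep="\t")
```

With the three regions from post #5, the breakpoints land at 98600 and 122100, as suggested.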

            • juanjingmiao
              Junior Member
              • Aug 2013
              • 8

              #7
              Originally posted by valeu
              This can happen if step<window. You may use the first coordinates of regions: in your case, 98600 and 122100.

              Thanks for your help!

              If step > window, the step has no effect, right?

              • valeu
                Member
                • Sep 2008
                • 69

                #8
                I hope so.
                But it is better to set step equal to window, or to delete the step line or comment it out.

                • ndeshpan
                  Member
                  • Nov 2009
                  • 29

                  #9
                  Hi,

                  I am trying to use Control-FREEC for fungal genomes.

                  My config file looks like this
                  ======
                  [general]

                  chrFiles = individual_Chr_fastaFiles
                  chrLenFile = res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  coefficientOfVariation = 0.062
                  ploidy = 1
                  outputDir = ./
                  maxThreads = 12
                  BedGraphOutput = TRUE
                  minCNAlength = 4
                  GCcontentProfile = 1

                  [sample]

                  mateFile = bwa_S1_vs_WM276_reference.sam
                  inputFormat = SAM
                  mateOrientation = FR

                  [control]

                  ======

                  Using the Perl script get_fasta_lengths.pl, I generated my chromosome lengths file:

                  Chr1 1984823
                  Chr2 2187695
                  Chr3 1961512
                  Chr4 2233618
                  Chr5 1333124
                  Chr6 1325755
                  Chr7 1324677
                  Chr8 1265488
                  Chr9 989306
                  Chr10 522727
                  Chr11 1040760
                  Chr12 820445
                  Chr13 705823
                  Chr14 679007

                  (It does not have the first column that the sample file in the manual has. Anyway, I have tried changing that as well.)


                  When I run my script it keeps giving me the error


                  ---------------------------------------------------------
                  Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
                  MT-mode using 12 threads
                  ..Minimal CNA length (in windows) is 4
                  ..Breakpoint threshold for segmentation of copy number profiles is 0.8
                  ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                  ..telocenromeric set to 50000
                  ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                  ..Coefficient Of Variation set equal to 0.062
                  ..it will be used to evaluate window size
                  ..Output directory: ./
                  ..Directory with files containing chromosome sequences: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/Cryptococcus_gattii_WM276_reference/for_Control_FreeC_CNV
                  ..Sample file: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam
                  ..Sample input format: SAM
                  ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
                  ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
                  ..File with chromosome lengths: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  ..Using the default minimal mappability value of 0.85
                  ..uniqueMatch = FALSE
                  ..average ploidy set to 1
                  ..break-point type set to 2
                  ..noisyData set to 0
                  ..File /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna was read
                  total genome size: 1.83748e+07
                  read number: 6076119
                  coefficientOfVariation: 0.062
                  evaluated window size: 787
                  ..Starting reading /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam
                  PROFILING [tid=47829959698848]: /share/bioinfo/nandan/scratch02/DCarter/Stage_2/CNV/Control_FreeC_CNV/bwa_S1_vs_WM276_reference.sam read in 10 seconds [fillMyHash]
                  6076119 lines read..
                  4666119 reads used to compute copy number profile
                  printing counts into ./bwa_S1_vs_WM276_reference.sam_sample.cpn
                  ..Window size: 787
                  ..using GC-content to normalize copy number profiles
                  Unable to open file 1
                  ---------------------------------------------------------

                  When I place my chromosome files in the current directory I get a different error

                  -----
                  Control-FREEC v7.0 : calling copy number alterations and LOH regions using deep-sequencing data
                  MT-mode using 12 threads
                  ..Minimal CNA length (in windows) is 4
                  ..Breakpoint threshold for segmentation of copy number profiles is 0.8
                  ..Polynomial degree for "ReadCount ~ GC-content" or "Sample ReadCount ~ Control ReadCount" is 3
                  ..telocenromeric set to 50000
                  ..FREEC is not going to adjust profiles for a possible contamination by normal cells
                  ..Coefficient Of Variation set equal to 0.062
                  ..it will be used to evaluate window size
                  ..Output directory: ./
                  ..Directory with files containing chromosome sequences: individual_Chr_fastaFiles
                  ..Sample file: bwa_S1_vs_WM276_reference.sam
                  ..Sample input format: SAM
                  ..minimal expected GC-content (general parameter "minExpectedGC") was set to 0.35
                  ..maximal expected GC-content (general parameter "maxExpectedGC") was set to 0.55
                  ..File with chromosome lengths: res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna
                  ..Using the default minimal mappability value of 0.85
                  ..uniqueMatch = FALSE
                  ..average ploidy set to 1
                  ..break-point type set to 2
                  ..noisyData set to 0
                  ..File res_Cryptococcus_gattii_WM276_ALL_chromosomes_NumericalChrNumbers.fna was read
                  total genome size: 1.83748e+07
                  read number: 6076119
                  coefficientOfVariation: 0.062
                  evaluated window size: 787
                  ..Starting reading bwa_S1_vs_WM276_reference.sam
                  PROFILING [tid=47981594524064]: bwa_S1_vs_WM276_reference.sam read in 10 seconds [fillMyHash]
                  6076119 lines read..
                  4666119 reads used to compute copy number profile
                  printing counts into ./bwa_S1_vs_WM276_reference.sam_sample.cpn
                  ..Window size: 787
                  ..using GC-content to normalize copy number profiles
                  file 1 is read
                  Your GC-content file 1 is empty or is in a wrong format

                  Please use chomosome sequences (option "chrFiles") to recreate it!
                  ----

                  My individual chromosome files look like this

                  >Chr1
                  AAAGTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTA
                  ACCCCTAACCCCTAAGTTCAAGGCTGTTGTCATGTCCACAGGGGGGCTAGTGAGCAGGGAAACAGCAGAT
                  GAGATGCGAAGATGGAGGAAGGAGATGAGCCCAGCGGTATTTGAGCGGATGATGAGGAAAATTAGTCTGG
                  AATTAGTGAGGGCAAGAGCTAGGACGTTTGCGATGTGAGGAGGAAGGAGAGGGTAAAGAAGTGACTGTAG
                  ATATAGAAATAAAAACAGAAAATCATTAAATACAAAAGCTCGACACAAAACCAATAAACTAGAAAGCTTT
                  TATGTATCCTCTTTCACCCTTCCAGTAGATTAAATGCATTGGCCCCAAACCAGCTTAGTCTTCTCACAAA
                  GCTCAGTCTCGCTGAACTCGTGCTCAAAACACAGGATATCCCTTTAAAACTGCAGACAAACCACCGAGGA
                  CCACCAAACTATAGTAGCATCACGTGCCTCCGACTTCTATGAGGTTAAGGCATCGCTACGAAAAATGTCT
                  TTTTCTTTACGATGACTCCGATCTCCTGCTCCGCTTCTTTGTGGTCTCAGGCACCGTACTCTTCTCCCCT



                  It could be a small thing I am missing; I'm not sure. Can anyone help?

                  regards,

                  Nandan
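
As a side check on the log above: the "evaluated window size: 787" line looks consistent with window = genome_size / (read_number * CV^2). That relation is an assumption about how FREEC derives the window from coefficientOfVariation (check the manual before relying on it); the sketch below only redoes the arithmetic with the numbers posted here. `evaluate_window` is a hypothetical name, and rounding to the nearest integer is also a guess.

```python
# Chromosome lengths as listed in the post above.
CHR_LENGTHS = {
    "Chr1": 1984823, "Chr2": 2187695, "Chr3": 1961512, "Chr4": 2233618,
    "Chr5": 1333124, "Chr6": 1325755, "Chr7": 1324677, "Chr8": 1265488,
    "Chr9": 989306, "Chr10": 522727, "Chr11": 1040760, "Chr12": 820445,
    "Chr13": 705823, "Chr14": 679007,
}


def evaluate_window(genome_size: int, read_number: int, cv: float) -> int:
    """Assumed relation: window = genome_size / (read_number * cv**2),
    rounded to the nearest integer."""
    return round(genome_size / (read_number * cv ** 2))


if __name__ == "__main__":
    genome_size = sum(CHR_LENGTHS.values())  # log reports 1.83748e+07
    window = evaluate_window(genome_size, read_number=6076119, cv=0.062)
    print(genome_size, window)
```

So the chromosome lengths file and the read count are being picked up as expected; the failure comes later, at the GC-content step.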

                  • valeu
                    Member
                    • Sep 2008
                    • 69

                    #10
                    Originally posted by ndeshpan
                    Hi,

                    I am trying to use Control-FREEC for fungal genomes..

                    My config file looks like this
                    ======
                    [general]

                    GCcontentProfile = 1



                    When I run my script it keeps giving me the error


                    ..using GC-content to normalize copy number profiles
                    Unable to open file 1
                    ---------------------------------------------------------

                    When I place my chromosome files in the current directory I get a different error


                    ..using GC-content to normalize copy number profiles
                    file 1 is read
                    Your GC-content file 1 is empty or is in a wrong format

                    Please use chomosome sequences (option "chrFiles") to recreate it!

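                    Nandan

                    GCcontentProfile should be the path to a file with GC content per window, not a flag such as TRUE/FALSE or 1. FREEC is treating "1" as a file name, which is why the log says "Unable to open file 1". Just remove this line and FREEC will create the GC-content profile from the FASTA files of your chromosomes (option "chrFiles").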

                    Check http://bioinfo-out.curie.fr/projects...al.html#CONFIG
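
A quick pre-flight check along these lines can catch the mistake before a long run. This is a rough sketch, not a FREEC feature: `check_gc_profile` is a hypothetical helper that reads the config with Python's standard `configparser` and flags a `GCcontentProfile` value that does not point to an existing file. It assumes the config lines are not indented, since `configparser` treats indented lines as continuations.

```python
import configparser
import os


def check_gc_profile(config_text, file_exists=os.path.isfile):
    """Return a list of warnings about GCcontentProfile in a FREEC-style config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)

    problems = []
    if parser.has_option("general", "GCcontentProfile"):
        value = parser.get("general", "GCcontentProfile")
        if not file_exists(value):
            problems.append(
                f"GCcontentProfile = {value!r} is not an existing file; "
                "remove the line to let FREEC build the profile from chrFiles"
            )
    return problems


if __name__ == "__main__":
    config = """\
[general]
chrFiles = individual_Chr_fastaFiles
ploidy = 1
GCcontentProfile = 1

[sample]
mateFile = bwa_S1_vs_WM276_reference.sam
inputFormat = SAM

[control]
"""
    for warning in check_gc_profile(config):
        print(warning)
```

The `file_exists` argument is there only so the check can be exercised without touching the file system.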
