  • HelenM
    Junior Member
    • Nov 2011
    • 4

    Programs for GC content and CpG Islands

    Hi everyone,

    I am interested in determining G+C rich regions in a whole genome sequence as well as identifying possible CpG Islands.

    Can anyone recommend their favourite resources for either of these tasks?

    So far, for G+C content, I have tried Picard's CollectGCBiasMetrics (doesn't give me the right info) and GATK's GCContentByInterval walker (gives me a persistent error message) and I am just in the process of trying to run GCProfile.

    If anyone has used the GCContentByInterval walker, could you perhaps give me an example of your command so that I can compare and see where mine is going wrong?

    For CpG Islands I have found 'CpGIslands' but have not yet tried it.

    I am new to programming so any help would be much appreciated.

    Many thanks
    Helen
  • PeteH
    Member
    • Jun 2010
    • 64

    #2
    If you are interested in identifying CpG islands I can recommend reading Wu et al. Biostatistics (2010) (http://www.ncbi.nlm.nih.gov/pubmed/20212320). The paper argues that some common definitions of CpG islands are too restrictive (such as the definition used by the UCSC genome browser). The authors develop a hidden Markov model to define CpG islands for arbitrary genomes.

    The paper is accompanied by software that implements their method and tables of pre-computed CpG islands using their software for many popular genomes (see http://rafalab.jhsph.edu/CGI/index.html).
    Pete
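For comparison with the HMM approach in the paper, here is a minimal Python sketch of the older window-based style of definition that the paper argues is too restrictive. It assumes the commonly cited thresholds (window of at least 200 bp, G+C fraction of at least 0.5, observed/expected CpG ratio of at least 0.6). The function names `cpg_stats` and `candidate_windows` are made up for illustration; this is not the method from Wu et al. and not the UCSC code.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_frac = (c + g) / n if n else 0.0
    # Observed/expected ratio: (number of CpG * length) / (number of C * number of G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def candidate_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Yield (start, end, gc, obs_exp) for 0-based windows passing both thresholds."""
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_oe:
            yield start, start + window, gc, oe
```

Overlapping windows that pass would still need to be merged into islands, and Ns are counted in the window length here, so treat the output as a rough screen only.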


    • HelenM
      Junior Member
      • Nov 2011
      • 4

      #3
      Pete,

      Great, I think this will be very useful indeed!
      I had been trying to find an existing set of CpG Islands for Bos taurus as well.
      Many thanks!


      • jamal
        Member
        • Jan 2010
        • 10

        #4
        Hi Helen

        I used "makeCGI" for Sus scrofa and got an .rda file in the results folder. Did you use this software for Bos taurus, and if so, how did you extract the results from the .rda file?
        Thank you in advance.

        Jamal


        • cjp
          Member
          • Jun 2011
          • 58

          #5
          The GATK command worked for me (did you make the picard ".dict" file for your reference fasta file?):

          % java -Xmx2g -Djava.io.tmpdir=/path/to/tmp -jar /path/to/GenomeAnalysisTK-1.1-23-g8072bd9/GenomeAnalysisTK.jar -T GCContentByInterval -R /path/to/human_g1k_v37.fasta -L 1:1-100000 -o chr1_1_100000_gc.txt

          ...

          % cat chr1_1_100000_gc.txt
          1:1-100000 0.38207

          Chris
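If the GATK walker keeps failing, a number like the one above can be cross-checked with a few lines of Python. This is only a sketch under some assumptions: the whole FASTA fits in memory, coordinates are 1-based and inclusive as in the `-L 1:1-100000` interval, and Ns are counted in the denominator (GATK may treat them differently, so small differences would not be surprising). `read_fasta` and `gc_fraction` are names invented for this example.

```python
def read_fasta(path):
    """Parse a FASTA file into {name: sequence}; name is the first word of the header."""
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gc_fraction(seq, start, end):
    """GC fraction over 1-based inclusive coordinates start..end."""
    region = seq[start - 1:end].upper()
    gc = region.count("G") + region.count("C")
    return gc / len(region) if region else 0.0


# Example call (the path is a placeholder, as in the command above):
# genome = read_fasta("/path/to/human_g1k_v37.fasta")
# print(gc_fraction(genome["1"], 1, 100000))
```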


          • jamal
            Member
            • Jan 2010
            • 10

            #6
            Hi Chris,

            I didn't make the Picard file for my genome. Please tell me how I can do that,
            and please tell me more about GATK.

            Thanks a lot

            Jamal


            • cjp
              Member
              • Jun 2011
              • 58

              #7
              There is a link here about making the picard dict file for GATK:



              Download the latest picard from here into a new directory (for me $HOME/src on a Linux machine) and unzip it:



              Something like this works for me:

              java -jar /home/cjp64/src/picard-tools-1.53/CreateSequenceDictionary.jar R=/data/refs/archive/hg19/bowtie/hg19.fasta O=/data/refs/archive/hg19/bowtie/hg19.dict

              GATK help starts here (it's on many pages though and is more for doing SNP calls):



              Chris


              • oria34
                Member
                • Feb 2013
                • 15

                #8
                Hi all,

                Did anyone try "makeCGI" recently?

                I am having some problems with this package.

                First, it has a lot of trouble reading chromosome/scaffold headers from the FASTA files and crashes. I reduced the headers to just the chromosome/scaffold name (deleting everything else) and it seemed to work, but then it stopped with a new warning message:

                Warning message:
                In rm(pattern = "Ngc") : object 'Ngc' not found

                Apparently, it does not like finding "Ns" along the sequence.

                It creates the result file, but the file appears to be empty.

                Any suggestions? I am really new to all of this, so any advice would be very welcome.

                Thanks in advance

                jamal, maybe it is a bit late, but I have found this to convert RDA to CSV. I thought it might be useful for other people.
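The header clean-up described in this post (keeping only the chromosome/scaffold name) can be scripted rather than done by hand. A minimal Python sketch, assuming the name is the first whitespace-delimited word after the ">" and that this is the form makeCGI accepts, which is not confirmed here; `simplify_headers` is a made-up helper name.

```python
def simplify_headers(in_path, out_path):
    """Copy a FASTA file, keeping only the first word of each header line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                # Drop everything after the first word of the header
                line = ">" + (fields[0] if fields else "") + "\n"
            dst.write(line)


# Example call (file names are placeholders):
# simplify_headers("genome.fa", "genome.simple_headers.fa")
```

Sequence lines are copied unchanged, so any runs of Ns remain in the output.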


                • jfeicheng
                  Junior Member
                  • Feb 2014
                  • 2

                  #9
                  makeCGI: object 'Ngc' not found

                  Hi
                  I've tried this program recently, and I ran into the same problem as you.

                  Warning message:
                  In rm(pattern = "Ngc") : object 'Ngc' not found

                  I would like to know if you have found a solution to this problem.
                  Thank you in advance.

