  • TheSeqGeek
    Member
    • Feb 2014
    • 40

    Promoter Analysis

    I did ChIP-seq on a TF.

    Now I have a consensus binding site, something like ATGNNNCGCNNNCAT.

    What I want to do is predict where else this consensus binding site occurs besides the ChIP sites. I used Virtual Footprint (with the sequences that made up the consensus site as input) and got 400 possible matches elsewhere in my genome. I have the start and stop locations with respect to my FASTA file.
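
For readers who want to reproduce this kind of scan without Virtual Footprint, here is a minimal sketch in Python. It is not how Virtual Footprint works internally; it is only an exact-match scan that treats N as "any base" and searches both strands. The function names (`revcomp`, `consensus_to_regex`, `find_sites`) and the toy sequence are made up for illustration, and coordinates are reported 0-based, half-open, as in BED.

```python
import re

# Only A, C, G, T and N are assumed to occur in the consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus such as ATGNNNCGCNNNCAT into a regex.

    The zero-width lookahead lets finditer() report overlapping matches too.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus):
    """Return sorted (start, end, strand) hits, 0-based half-open."""
    sequence = sequence.upper()
    hits = set()
    for strand, motif in (("+", consensus), ("-", revcomp(consensus))):
        for match in consensus_to_regex(motif).finditer(sequence):
            hits.add((match.start(), match.start() + len(consensus), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: one forward-strand site, then one reverse-strand site.
    toy = "GG" + "ATGAAACGCTTTCAT" + "CC" + "ATGAAAGCGTTTCAT"
    for start, end, strand in find_sites(toy, "ATGNNNCGCNNNCAT"):
        print("Chromosome", start, end, strand, sep="\t")
```

Printing the hits tab-separated gives lines that can be pasted straight into a BED-like file. A real motif tool would normally score mismatches rather than require exact matches, so treat this only as a quick cross-check.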

    Now I want to take those locations and identify which genes are in the vicinity of each binding site (within +/- 50 bp). I don't know how to do this other than by looking through IGV manually.

    How can I automate this process? Thank you for your help.

    I tried to use ChIPpeakAnno in R, but there are issues just loading the libraries. Any Perl scripts or something similar would be useful. Thank you.
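
The lookup described in this post (which genes lie within 50 bp of each site) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sites and genes are held in memory as `(chrom, start, end, name)` tuples, the function names `interval_gap` and `genes_near_sites` are hypothetical, and the brute-force double loop is chosen because a few hundred sites against one genome's gene list is small. The coordinates in the demo are the example lines posted later in this thread.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap between two intervals in coordinate units; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def genes_near_sites(sites, genes, window=50):
    """For each site, list the genes whose gap to the site is <= window.

    sites and genes are lists of (chrom, start, end, name) tuples.
    Returns a dict mapping site name -> list of (gene name, gap), nearest first.
    """
    result = {}
    for s_chrom, s_start, s_end, s_name in sites:
        nearby = []
        for g_chrom, g_start, g_end, g_name in genes:
            if g_chrom != s_chrom:
                continue  # features on different sequences are never "near"
            gap = interval_gap(s_start, s_end, g_start, g_end)
            if gap <= window:
                nearby.append((g_name, gap))
        result[s_name] = sorted(nearby, key=lambda item: item[1])
    return result


if __name__ == "__main__":
    sites = [
        ("Chromosome", 2985, 2998, "Site1"),
        ("Chromosome", 6738, 6751, "Site2"),
    ]
    genes = [
        ("Chromosome", 351, 1724, "Gene1"),
        ("Chromosome", 1828, 2946, "Gene2"),
    ]
    for site, hits in genes_near_sites(sites, genes).items():
        print(site, hits)
```

Whether the gap computed here matches the distance a given tool reports exactly (or differs by one) depends on whether the coordinates are 0-based half-open or 1-based inclusive, so check that before comparing numbers.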
  • TheSeqGeek
    Member
    • Feb 2014
    • 40

    #2
    I tried using Excel and quickly realized I need to run loops.


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      This sounds like a problem you can use Galaxy for.


      • sarvidsson
        Senior Member
        • Jan 2015
        • 137

        #4
        So you have the chromosome and start/stop for your ~400 positions? Put them in BED format (tab-separated lines with "chromosome start stop"), get a GFF/GTF file with the genes for your genome (possibly filtering it with grep for the features you are interested in), and use BEDTools, a Swiss army knife for all annotation comparison needs, e.g. its "closest" command.
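
The "filter the GFF and get it into BED" step can be sketched in Python as follows. This assumes a GFF3-style file with nine tab-separated columns and `key=value` attributes; GTF attributes (`gene_id "x";`) would need different parsing. The function name `gff_genes_to_bed` and the example lines are made up for illustration. The one detail worth getting right is the coordinate shift: GFF is 1-based inclusive, BED is 0-based half-open, so only the start is decremented.

```python
def gff_genes_to_bed(gff_lines, feature_type="gene"):
    """Yield BED lines (chrom, start, end, name) for one GFF3 feature type."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # GFF3 attributes look like "ID=gene0001;Name=dnaA"; prefer Name, then ID.
        attrs = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
        )
        name = attrs.get("Name") or attrs.get("ID") or "."
        # GFF is 1-based inclusive; BED is 0-based half-open.
        yield "\t".join([chrom, str(start - 1), str(end), name])


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "Chromosome\texample\tgene\t352\t1724\t.\t+\t.\tID=gene1;Name=Gene1\n",
        "Chromosome\texample\tCDS\t352\t1724\t.\t+\t0\tID=cds1;Parent=gene1\n",
        "Chromosome\texample\tgene\t1829\t2946\t.\t-\t.\tID=gene2\n",
    ]
    for bed_line in gff_genes_to_bed(example):
        print(bed_line)
```

To run it on a real annotation, pass an open file handle instead of the list, for example `gff_genes_to_bed(open("genome.gff"))`, where `genome.gff` is a placeholder file name.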


        • TheSeqGeek
          Member
          • Feb 2014
          • 40

          #5
          I did as "sarvidsson" suggested.

          Both files contain chromosome name, start position, stop position, and name of the feature/gene, without header lines.

          Here is an example.

          My list of 400 positions is in a file called "toanno.bed", in the following format:
          Chromosome 2985 2998 Site1
          Chromosome 6738 6751 Site2

          The list of genes I want to match them with is in a file called "genome.bed", in the following format:
          Chromosome 351 1724 Gene1
          Chromosome 1828 2946 Gene2


          When I use the command
          closestBed -a toanno.bed -b genome.bed > features.bed

          I get a file containing both inputs concatenated head to tail... basically the output of a long concatenate command...

          I figured out that my files were not in proper .bed format. Basically, the problem was the text encoding.

          Excel saved my data as UTF-16, so it has to be re-saved as UTF-8. Wow, ridiculous.
          Last edited by TheSeqGeek; 02-15-2015, 02:39 PM.
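
For anyone hitting the same encoding problem, the re-saving step can also be scripted. The sketch below is one possible approach, not the method used in this thread: it reads a UTF-16 text file, rewrites it as UTF-8 with Unix line endings, and joins the columns with single tabs. The function name `utf16_to_utf8_bed` and the file names are hypothetical, and splitting on any whitespace assumes that no field (for example a gene name) contains a space.

```python
import os
import re
import tempfile


def utf16_to_utf8_bed(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8, tab-separated, with LF line ends."""
    # The "utf-16" codec uses the byte-order mark to pick the byte order.
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in text.splitlines():
            fields = re.split(r"\s+", line.strip())
            if fields != [""]:  # drop empty lines
                dst.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "toanno_utf16.txt")
    dst = os.path.join(workdir, "toanno.bed")
    # Simulate a UTF-16 export with Windows line endings.
    with open(src, "w", encoding="utf-16", newline="") as handle:
        handle.write("Chromosome\t2985\t2998\tSite1\r\nChromosome\t6738\t6751\tSite2\r\n")
    utf16_to_utf8_bed(src, dst)
    with open(dst, "rb") as handle:
        print(handle.read())
```

Printing the raw bytes at the end makes it easy to confirm that the output contains plain tabs and `\n` line ends and no byte-order mark.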


          • AlliCox
            Member
            • Nov 2012
            • 10

            #6
            You could probably annotate the base pair positions using a tool that annotates lists of variants from NGS - if the position is near a gene, it would get annotated as upstream, downstream, intronic, etc. That would probably work for some of the positions. You could also align the bp positions to annotation information from 1000 genomes to find out if the site is in or near a gene.


            • TheSeqGeek
              Member
              • Feb 2014
              • 40

              #7
              Originally posted by AlliCox
              You could probably annotate the base pair positions using a tool that annotates lists of variants from NGS.
              So what's the tool?


              • sarvidsson
                Senior Member
                • Jan 2015
                • 137

                #8
                Originally posted by TheSeqGeek
                So what's the tool?
                You could use SnpEff, but then you'd need to fake some VCF to get there. BEDTools is the tool for the job.


                • TheSeqGeek
                  Member
                  • Feb 2014
                  • 40

                  #9
                  Originally posted by sarvidsson
                  You could use SnpEff, but then you'd need to fake some VCF to get there. BEDTools is the tool for the job.
                  Yeah, I already got it to work with the BEDTools closestBed command. The only issue was the text editor I was using to generate the .bed file, as I described above for anyone else having similar issues.

