  • Promoter Analysis

    I did ChIP-seq on a TF.

    Now I have a consensus binding site. ATGNNNCGCNNNCAT (whatever)

    What I want to do is predict where else this consensus binding site is aside from the ChIP sites. I used Virtual Footprint (put in sequences that made up the consensus site) and got 400 possible matches for where else such DNA sequences exist within my genome. I have the start and stop locations with respect to my fasta file.

    Now I want to take those locations and identify what genes are in the vicinity of the binding site (+/- 50 bp from these sites). I don't know how to do this except manually look through IGV.

    How can I automate this process? Thank you for your help.

    I tried to use ChIPpeakAnno with R, but there were issues just loading the libraries. Any Perl scripts or something similar would be useful. Thank you.
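The lookup described above (genes within +/- 50 bp of each predicted site) can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for a dedicated tool: the function name `genes_near_sites` is made up for this example, the coordinates are assumed to be BED-style (0-based start, exclusive end), and the two sites and two genes are the sample rows quoted later in this thread.

```python
from typing import List, Tuple

# (chromosome, start, end, name); BED-style half-open coordinates are an assumption
Interval = Tuple[str, int, int, str]


def genes_near_sites(sites: List[Interval], genes: List[Interval],
                     flank: int = 50) -> List[Tuple[str, str]]:
    """Return (site_name, gene_name) pairs where the gene overlaps the site +/- flank bp."""
    hits = []
    for s_chrom, s_start, s_end, s_name in sites:
        win_start = max(0, s_start - flank)  # do not run off the start of the sequence
        win_end = s_end + flank
        for g_chrom, g_start, g_end, g_name in genes:
            # Two half-open intervals overlap when each starts before the other ends.
            if g_chrom == s_chrom and g_start < win_end and win_start < g_end:
                hits.append((s_name, g_name))
    return hits


sites = [("Chromosome", 2985, 2998, "Site1"), ("Chromosome", 6738, 6751, "Site2")]
genes = [("Chromosome", 351, 1724, "Gene1"), ("Chromosome", 1828, 2946, "Gene2")]

print(genes_near_sites(sites, genes))  # [('Site1', 'Gene2')]
```

The nested loop is quadratic, which is fine for about 400 sites against a few thousand genes; for much larger inputs an interval tool such as BEDTools is the better choice.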

  • #2
    I tried using Excel and quickly realized I need to run loops.



    • #3
      This sounds like a problem you can use Galaxy for.



      • #4
        So you have the chromosome and start/stop for your ~400 positions? Put them in BED format (tab-separated lines with "chromosome start stop") and get a GFF/GTF file with the genes for your genome (possibly filtering it with grep for the features you are interested in). Then use BEDTools (a Swiss army knife for all annotation comparison needs), for example its "closest" command.
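To make the idea behind a "closest" lookup concrete, here is a small pure-Python sketch that reports, for each site, the nearest gene on the same chromosome and the gap in bp. It is a stand-in written for illustration, not BEDTools' implementation, and its distance convention may differ from what BEDTools prints. The name `closest_gene` and the half-open coordinate convention are assumptions. With BEDTools itself the call usually has the form `bedtools closest -a sites.bed -b genes.bed -d` (file names are placeholders), and recent versions expect both inputs sorted by chromosome and start.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, name); half-open BED-style coordinates are an assumption
Interval = Tuple[str, int, int, str]


def closest_gene(site: Interval, genes: List[Interval]) -> Optional[Tuple[str, int]]:
    """Return (gene_name, gap_in_bp) for the nearest gene on the site's chromosome.

    The gap is 0 when the gene overlaps the site. Returns None when no gene
    lies on the same chromosome. Ties keep the first gene encountered.
    """
    chrom, start, end, _name = site
    best = None
    for g_chrom, g_start, g_end, g_name in genes:
        if g_chrom != chrom:
            continue
        # Positive only for the side on which the gene lies; 0 if they overlap.
        gap = max(g_start - end, start - g_end, 0)
        if best is None or gap < best[1]:
            best = (g_name, gap)
    return best


sites = [("Chromosome", 2985, 2998, "Site1"), ("Chromosome", 6738, 6751, "Site2")]
genes = [("Chromosome", 351, 1724, "Gene1"), ("Chromosome", 1828, 2946, "Gene2")]

for site in sites:
    print(site[3], closest_gene(site, genes))
# Site1 ('Gene2', 39)
# Site2 ('Gene2', 3792)
```

Note that "closest" always returns a gene, however far away it is, so the gap should be filtered afterwards (for example, keep only gaps of 50 bp or less) if only nearby genes are of interest.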



        • #5
          I did as "sarvidsson" suggested.

          Both files contain the chromosome name, start position, stop position, and name of the feature/gene, without headings.

          Here is an example.

          My list of 400 positions is in a file called "toanno.bed", in the following format:
          Chromosome 2985 2998 Site1
          Chromosome 6738 6751 Site2

          My list of genes to match them against is in a file called "genome.bed", in the following format:
          Chromosome 351 1724 Gene1
          Chromosome 1828 2946 Gene2


          When I use the command
          closestBed -a toanno.bed -b genome.bed > features.bed

          I get a file containing both input files joined head to tail... basically the output of a long concatenate command...

          I figured out that my files were not really in .bed format. Basically, the problem was the text encoding.

          If you save your data with Excel as Unicode text, it is written as UTF-16; re-save it as UTF-8. Wow, ridiculous.
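For anyone hitting the same encoding problem, the clean-up can be scripted instead of re-saving by hand. The sketch below is a minimal example under stated assumptions: the helper name `to_clean_bed` is made up, a UTF-16 file is recognised only by its byte-order mark, anything else is treated as UTF-8, and fields are assumed to contain no internal spaces (each line is re-split on whitespace and re-joined with tabs).

```python
import codecs


def to_clean_bed(raw: bytes) -> str:
    """Decode UTF-16 or UTF-8 bytes and return tab-separated, LF-terminated lines."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")     # the BOM selects the byte order and is dropped
    else:
        text = raw.decode("utf-8-sig")  # also drops a UTF-8 BOM if one is present
    lines = []
    for line in text.splitlines():      # handles both \r\n and \n line endings
        fields = line.split()           # assumes no spaces inside a field
        if fields:                      # skip blank lines
            lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


# Simulated UTF-16 export with Windows line endings and mixed separators.
raw = "Chromosome\t2985\t2998\tSite1\r\nChromosome 6738 6751 Site2\r\n".encode("utf-16")
print(repr(to_clean_bed(raw)))
# 'Chromosome\t2985\t2998\tSite1\nChromosome\t6738\t6751\tSite2\n'

# Applied to a file (the output name "toanno.utf8.bed" is a hypothetical choice):
# with open("toanno.bed", "rb") as fh:
#     clean = to_clean_bed(fh.read())
# with open("toanno.utf8.bed", "w", encoding="utf-8", newline="\n") as out:
#     out.write(clean)
```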
          Last edited by TheSeqGeek; 02-15-2015, 02:39 PM.



          • #6
            You could probably annotate the base pair positions using a tool that annotates lists of variants from NGS - if the position is near a gene, it would get annotated as upstream, downstream, intronic, etc. That would probably work for some of the positions. You could also align the bp positions to annotation information from 1000 genomes to find out if the site is in or near a gene.



            • #7
              Originally posted by AlliCox
              You could probably annotate the base pair positions using a tool that annotates lists of variants from NGS.
              So what's the tool?



              • #8
                Originally posted by TheSeqGeek
                So what's the tool?
                You could use SnpEff, but then you'd need to fake some VCF to get there. BEDTools is the tool for the job.



                • #9
                  Originally posted by sarvidsson
                  You could use SnpEff, but then you'd need to fake some VCF to get there. BEDTools is the tool for the job.
                  Yeah, I already got it to work with the BEDTools closestBed command. The only issue was the type of text editor I was using to generate the .bed file, as I described above for anyone else having similar issues.
