  • Displaying ChIP-seq data

    Hi all,

    I have some sequencing output from an Illumina Solexa machine. It looks like Gerald was run using "ANALYSIS eland_extended" and the output is mapped to the human genome (though I'm not completely sure). ex:

    HWI-EAS344 90324 3 86 1654 1007 0 1 TTTTGAGCCCAGGAGATTCATGCTGAAGAGCTAGG aaaa[a]]X_aYYa[^`_]`_WU]N^]SXUUX^RU chr10.fa 103778 R 35 28

    Is there any way I can quickly convert this into a format that's viewable as a histogram? Like on the UCSC genome browser? Or a program that can run through this data and produce a histogram?

    -Kevin
    Last edited by kevinlu; 04-03-2009, 02:06 PM.

  • #2
    Hi Kevin,

    That does look like an eland Extended file: that row looks like it mapped to chr10, position 103778, on the reverse strand. FindPeaks can convert these into bed files or wig files, which can be viewed using the UCSC browser. I think MACS can as well.

    Anthony
    The more you know, the more you know you don't know. —Aristotle
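For readers who want to see what that conversion involves, here is a minimal Python sketch that turns the example row from the first post into a BED line. The field positions are assumptions read off that single row (not a format specification), the position is assumed to be a 1-based leftmost coordinate, and `eland_row_to_bed` is a hypothetical helper name. This is not how FindPeaks or MACS do it internally.

```python
# Field positions below are assumptions read off the single example row
# in this thread, not a format specification.
def eland_row_to_bed(row):
    """Convert one whitespace-separated aligned row to a BED6 tuple."""
    fields = row.split()
    sequence = fields[8]
    chrom = fields[10]
    if chrom.endswith(".fa"):
        chrom = chrom[:-3]           # "chr10.fa" -> "chr10"
    position = int(fields[11])       # assumed 1-based leftmost coordinate
    strand = "-" if fields[12] == "R" else "+"
    start = position - 1             # BED starts are 0-based
    end = start + len(sequence)      # BED ends are exclusive
    name = "_".join(fields[:6])      # instrument, run, lane, tile, x, y
    return (chrom, start, end, name, 0, strand)


example = ("HWI-EAS344 90324 3 86 1654 1007 0 1 "
           "TTTTGAGCCCAGGAGATTCATGCTGAAGAGCTAGG "
           "aaaa[a]]X_aYYa[^`_]`_WU]N^]SXUUX^RU chr10.fa 103778 R 35 28")
bed = eland_row_to_bed(example)
print("\t".join(str(value) for value in bed))
```

Rows that did not align would need to be skipped before calling this, since they carry no chromosome or position.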



    • #3
      Anthony,
      Thanks. I've been trying to use FindPeaks3.3.1.1, starting off with the 22.test.eland file and instructions you gave in the 3.2.2 manual (only online documentation I could find...a bit out of date) to run it through and display on the genome browser. Unfortunately, when I load the wig output the UCSC website keeps on giving me this error message:

      "Error File '22test_triangle_standard.wig' - track load error (track name='ct_22testduplicatesstandardlentriangle'):
      Couldn't find size of chromosome 22 (note: chrom names are case sensitive)"

      I went in and edited the wig file, changing "chrom=22" to "chrom=chr22", thinking it would help, but it didn't do anything. So frustrated.



      • #4
        Hi Kevin,

        First of all, I should let you know that the whole 3.3.x line is the "unstable" line towards version 4.0. I recommend getting the 3.3.1.8 version, which has a LOT of bugs fixed compared to 3.3.1.1, which I took off the FindPeaks web page a LONG time ago.

        I strongly recommend running a more current version. You can get them here:



        If you'd like to be notified of new releases, I do announce them on the mailing list (https://sourceforge.net/mail/?group_id=232586), and you can subscribe at (https://lists.sourceforge.net/lists/...ortr-findpeaks)

        To solve the problem you're seeing above, you'll probably want to use the flag "-prepend chr". The problem is that each fixedStep line has the name of the chromosome in it (which is the wig file standard), so you'd have to change all of the "fixedStep" lines throughout the file. Hence the -prepend option, which does it for you.

        I'll add that to the manual to make it clear that it's required in the test example.

        Let me know if you run into any other problems, though. I really do try to keep on top of problems people find with the code - and I'm always happy to see it improve.

        Anthony
        The more you know, the more you know you don't know. —Aristotle
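For anyone who already has a wig file with bare chromosome names, the same fix can be sketched in a few lines of Python. This is only an illustration of rewriting every declaration line, not FindPeaks's own implementation; `prepend_chr` and the sample lines are made up for the example.

```python
import re

# Matches chrom=<name> unless the name already starts with "chr".
_CHROM = re.compile(r"\bchrom=(?!chr)(\S+)")


def prepend_chr(lines, prefix="chr"):
    """Yield wig lines with the prefix added to each declaration's chrom name."""
    for line in lines:
        # Only fixedStep/variableStep declaration lines carry a chrom= field.
        if line.startswith(("fixedStep", "variableStep")):
            line = _CHROM.sub(lambda m: "chrom=" + prefix + m.group(1), line)
        yield line


# Made-up sample data for illustration.
wig = [
    "track type=wiggle_0 name=example\n",
    "fixedStep chrom=22 start=100 step=1\n",
    "3\n",
    "fixedStep chrom=chr22 start=500 step=1\n",
]
fixed = list(prepend_chr(wig))
print("".join(fixed), end="")
```

Editing only the first declaration is not enough, because the browser reads the chromosome name from every fixedStep line in the file.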



        • #5
          I should also add that the documentation is online in a wiki for 3.3/4.0:



          You can also find it by googling FindPeaks4.
          The more you know, the more you know you don't know. —Aristotle



          • #6
            Worked like a charm. Thank you.

            I have another data set that when run through eland (unfortunately) left unaligned reads in the file. You have outlined a quick way to get rid of them if using Linux/Unix, but we don't have any of those machines in our lab. Do you know of another simple way to do this?



            • #7
              We aim to please. (-;

              As for removing the reads in a non linux/unix system, I'm a little stumped. (I haven't really used windows since ~2001.) I'm sure you could build an environment or get a linux/unix emulator going, although that seems a bit excessive.

              If you have access to a Mac, the instructions should work the same way.

              Although, personally, I'd be tempted to download a liveCD for Ubuntu or another distribution and use that to access and process the data. For the cost of burning a CD and the bandwidth, you'd probably get the biggest bang for your buck. The method for doing this is pretty easy, but you'd probably be best off if there's someone nearby to help with getting it set up, since things work a little bit differently under linux than in windows. It's not hard, just different, so this might not be an ideal solution either.

              I've asked a couple of people in the lab if there's any way to do this in windows, and none of them seem to know off hand. There seem to be rumours of free grep (qgrep?) programs available, though.
              Last edited by apfejes; 04-07-2009, 11:54 AM. Reason: clarity
              The more you know, the more you know you don't know. —Aristotle



              • #8
                Anthony, why not just include a filter on U(012) in the preprocessing, or better yet, allow direct use of .export files? It would probably increase runtime slightly, but it is plenty fast anyway.



                • #9
                  Hi Chipper,

                  Actually, FindPeaks does already support the export file, under the anachronistic name of "elandextended". I suppose I should probably just do a complete rename on that, at this point.

                  I'm now up to about 25kloc, so occasionally I forget to go back and change strings unless someone reminds me. (-;

                  As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.
                  The more you know, the more you know you don't know. —Aristotle



                  • #10
                    Originally posted by apfejes
                    Hi Chipper,

                    As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.
                    Probably correct assumption, it's just that a lot of non-bioinformaticians want to do ChIP-seq...

                    Kevin, if your PC has perl installed, it can be fixed with a few lines. If not, install it and try to learn the basics, and your (sequencing) life will be easier. As long as you don't ask Anthony for advice on it.



                    • #11
                      (=

                      Or you could install python... but you probably still don't want to ask for my advice. I've only ever done a few simple scripts - like grepping and sorting files with it. (-;

                      Say, how about this script?

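                      Code:
                      import re

                      # Raw strings keep the backslashes in Windows paths literal.
                      readfile = open(r'c:\input\filename.eland', "r")
                      writefile = open(r'c:\filtered_file.eland', "w")

                      # U0, U1 and U2 mark uniquely matched reads.
                      Unique = re.compile(r"U[012]")

                      for line in readfile:
                          # search() looks anywhere in the line; match() only checks the start.
                          if Unique.search(line):
                              writefile.write(line)
                      readfile.close()
                      writefile.close()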
                      I should mention that I haven't actually tested this script out... use at your own risk.
                      Last edited by apfejes; 04-08-2009, 12:26 PM. Reason: disclaimer added.
                      The more you know, the more you know you don't know. —Aristotle



                      • #12
                        An easy way to get some linux functionality for windows is to use UnxUtils, see


                        This is easy to install and has very low overhead. Basically, you can run unix commands (like grep "U[012]" Input.eland > Input.um.eland) in the dos command window. You could also use cygwin, but that has more overhead.

                        Vince



                        • #13
                          Anthony, thanks for the script. It's been edited a bit and works smoothly.
                          The new script is below.

                          #!/usr/bin/python
                          import re

                          # Tuple of input files; the trailing comma keeps a single path a tuple.
                          files = ('F:\\path\\to\\files',)
                          # A base, a tab, then a U0, U1 or U2 code.
                          regex = re.compile(r"[GTAC]\tU[012]")

                          for filepath in files:
                              rfobj = open(filepath, 'r')
                              wfobj = open("%s_out.txt" % filepath.split('.')[0], 'w')
                              for l in rfobj:
                                  if regex.search(l):
                                      wfobj.write(l)
                              rfobj.close()
                              wfobj.close()
                          You can grep multiple files at once if desired. Just separate their paths using a comma.
                          Last edited by kevinlu; 04-13-2009, 08:34 PM. Reason: added something to the script



                          • #14
                            Hi Kevin,

                            Thanks - that's much cleaner than what I'd done. As I said, I really haven't done much in python before. That's a great resource for anyone else who's looking to do filtering on eland files.
                            The more you know, the more you know you don't know. —Aristotle
