  • kevinlu
    Junior Member
    • Oct 2008
    • 6

    Displaying ChIP-seq data

    Hi all,

    I have some sequencing output from an Illumina Solexa machine. It looks like Gerald was run using "ANALYSIS eland_extended" and the output is mapped to the human genome (though I'm not completely sure). ex:

    HWI-EAS344 90324 3 86 1654 1007 0 1 TTTTGAGCCCAGGAGATTCATGCTGAAGAGCTAGG aaaa[a]]X_aYYa[^`_]`_WU]N^]SXUUX^RU chr10.fa 103778 R 35 28

    Is there any way I can quickly convert this into a format that's viewable as a histogram? Like on the UCSC genome browser? Or a program that can run through this data and produce a histogram?

    -Kevin
    Last edited by kevinlu; 04-03-2009, 02:06 PM.
  • apfejes
    Senior Member
    • Feb 2008
    • 236

    #2
    Hi Kevin,

    That does look like an eland Extended file: that row looks like it mapped to chr10, position 103778, on the reverse strand. FindPeaks can convert these into bed files or wig files, which can be viewed using the UCSC browser. I think MACS can as well.
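
For anyone who wants to see what that conversion involves, here is a minimal Python sketch. It is not taken from FindPeaks or MACS. It assumes the row is whitespace-separated exactly as in the sample above (read at index 8, chromosome at 10, position at 11, strand at 12), that the reported position is the 1-based leftmost base of the alignment, and that a trailing ".fa" should be dropped from the chromosome name. The function name `eland_row_to_bed` and the placeholder read name are made up for illustration. In a real tab-delimited file, empty columns would shift these indices.

```python
def eland_row_to_bed(row):
    """Turn one whitespace-separated eland extended row into a BED line.

    Column positions are an assumption based on the sample row in this thread.
    """
    fields = row.split()
    seq = fields[8]                      # the read sequence
    chrom = fields[10]                   # e.g. "chr10.fa"
    if chrom.endswith(".fa"):
        chrom = chrom[:-3]               # UCSC expects "chr10", not "chr10.fa"
    start = int(fields[11]) - 1          # BED starts are 0-based
    end = start + len(seq)               # BED ends are exclusive
    strand = "-" if fields[12] == "R" else "+"
    # "read" and "0" are placeholder name and score columns.
    return "\t".join([chrom, str(start), str(end), "read", "0", strand])


row = ("HWI-EAS344 90324 3 86 1654 1007 0 1 "
       "TTTTGAGCCCAGGAGATTCATGCTGAAGAGCTAGG "
       "aaaa[a]]X_aYYa[^`_]`_WU]N^]SXUUX^RU chr10.fa 103778 R 35 28")
print(eland_row_to_bed(row))
```

Writing one such line per aligned read gives a file the UCSC browser can load as a custom track. A histogram (wig) view additionally needs the reads piled up per position, which is the part FindPeaks handles.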

    Anthony
    The more you know, the more you know you don't know. —Aristotle

    • kevinlu
      Junior Member
      • Oct 2008
      • 6

      #3
      Anthony,
      Thanks. I've been trying to use FindPeaks3.3.1.1, starting off with the 22.test.eland file and instructions you gave in the 3.2.2 manual (only online documentation I could find...a bit out of date) to run it through and display on the genome browser. Unfortunately, when I load the wig output the UCSC website keeps on giving me this error message:

      "Error File '22test_triangle_standard.wig' - track load error (track name='ct_22testduplicatesstandardlentriangle'):
      Couldn't find size of chromosome 22 (note: chrom names are case sensitive)"

      I went in and edited the wig file, changing "chrom=22" to "chrom=chr22", thinking it would help, but it didn't do anything. So frustrated.

      • apfejes
        Senior Member
        • Feb 2008
        • 236

        #4
        Hi Kevin,

        First of all, I should let you know that the whole 3.3.x line is the "unstable" line towards version 4.0. I recommend getting the 3.3.1.8 version, which has a LOT of bugs fixed compared to 3.3.1.1, which I took off the FindPeaks web page a LONG time ago.

        I strongly recommend running a more current version. You can get them here:



        If you'd like to be notified of new releases, I announce them on the mailing list (https://sourceforge.net/mail/?group_id=232586), and you can subscribe at (https://lists.sourceforge.net/lists/...ortr-findpeaks)

        To solve the problem you're seeing above, you'll probably want to use the flag "-prepend chr". The issue is that each fixedStep line has the name of the chromosome in it (which is the wig file standard), so you'd have to change all of the "fixedStep" lines throughout the file, not just the first one. Hence the -prepend option, which does it for you.
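
If a wig file has already been written without the prefix, the same fix can be applied after the fact. The following is a rough Python sketch of that idea, not what FindPeaks does internally. The function name `prepend_chr` and the sample lines are made up for illustration, and it assumes every declaration line starts with "fixedStep" or "variableStep" and carries a `chrom=` field.

```python
import re

# Matches the chrom=NAME field on a wig declaration line.
_CHROM = re.compile(r"\bchrom=(\S+)")


def prepend_chr(lines, prefix="chr"):
    """Yield wig lines with `prefix` added to every chromosome name."""
    def fix(match):
        name = match.group(1)
        if name.startswith(prefix):      # already "chr22": leave it alone
            return match.group(0)
        return "chrom=" + prefix + name

    for line in lines:
        # Only declaration lines name a chromosome; data lines pass through.
        if line.startswith(("fixedStep", "variableStep")):
            line = _CHROM.sub(fix, line)
        yield line


example = [
    "track type=wiggle_0 name=example\n",
    "fixedStep chrom=22 start=100 step=1\n",
    "3\n",
    "fixedStep chrom=22 start=500 step=1\n",
    "7\n",
]
print("".join(prepend_chr(example)))
```

To use it on a real file, pass an open file object as `lines` and write each yielded line to a new file.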

        I'll add that to the manual to make it clear that it's required in the test example.

        Let me know if you run into any other problems, though. I really do try to keep on top of problems people find with the code - and I'm always happy to see it improve.

        Anthony

        • apfejes
          Senior Member
          • Feb 2008
          • 236

          #5
          I should also add that the documentation is online in a wiki for 3.3/4.0:



          You can also find it by googling FindPeaks4.

          • kevinlu
            Junior Member
            • Oct 2008
            • 6

            #6
            Worked like a charm. Thank you.

            I have another data set that when run through eland (unfortunately) left unaligned reads in the file. You have outlined a quick way to get rid of them if using Linux/Unix, but we don't have any of those machines in our lab. Do you know of another simple way to do this?

            • apfejes
              Senior Member
              • Feb 2008
              • 236

              #7
              We aim to please. (-;

              As for removing the reads in a non linux/unix system, I'm a little stumped. (I haven't really used windows since ~2001.) I'm sure you could build an environment or get a linux/unix emulator going, although that seems a bit excessive.

              If you have access to a Mac, the instructions should work the same way.

              Personally, I'd be tempted to download a liveCD for Ubuntu or another distribution and use that to access and process the data. For the cost of burning a CD and the bandwidth, you'd probably get the biggest bang for your buck. The method itself is pretty easy, but you'd be best off with someone nearby to help set it up, since things work a little differently under linux than in windows. It's not hard, just different, so this might not be an ideal solution either.

              I've asked a couple of people in the lab if there's any way to do this in windows, and none of them seem to know off hand. There seem to be rumours of free grep (qgrep?) programs available, though.
              Last edited by apfejes; 04-07-2009, 11:54 AM. Reason: clarity

              • Chipper
                Senior Member
                • Mar 2008
                • 323

                #8
                Anthony, why not just include a filter on U(012) in the preprocessing, or better yet allow direct use of .export files? It would probably increase runtime slightly, but it is plenty fast anyway.

                • apfejes
                  Senior Member
                  • Feb 2008
                  • 236

                  #9
                  Hi Chipper,

                  Actually, FindPeaks does already support the export file, under the anachronistic name of "elandextended". I suppose I should probably just do a complete rename on that, at this point.

                  I'm now up to about 25kloc, so occasionally I forget to go back and change strings unless someone reminds me. (-;

                  As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.

                  • Chipper
                    Senior Member
                    • Mar 2008
                    • 323

                    #10
                    Originally posted by apfejes:
                    Hi Chipper,

                    As for providing the filtering, I could do that in the SortFiles.jar. I guess I had just assumed that anyone doing bioinformatics would have access to a linux live CD or linux box these days. Bad assumption on my part! I'll make these changes when I get a chance, and hopefully include them in the next tag.
                    Probably correct assumption, it's just that a lot of non-bioinformaticians want to do ChIP-seq...

                    Kevin, if your PC has perl installed, this can be fixed with a few lines. If not, install it and try to learn the basics, and your (sequencing) life will be easier. As long as you don't ask Anthony for advice on it.

                    • apfejes
                      Senior Member
                      • Feb 2008
                      • 236

                      #11
                      (=

                      Or you could install python... but you probably still don't want to ask for my advice. I've only ever done a few simple scripts - like grepping and sorting files with it. (-;

                      Say, how about this script?

                      Code:
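                      import re

                      # Raw strings keep the Windows backslashes literal ('\f' would otherwise become a form feed).
                      readfile = open(r'c:\input\filename.eland', "r")
                      writefile = open(r'c:\filtered_file.eland', "w")

                      # U0, U1 and U2 flag reads with a unique match at 0, 1 or 2 mismatches.
                      Unique = re.compile(r"U[012]")

                      for line in readfile:
                          # search() scans the whole line; match() would only test its start.
                          if Unique.search(line):
                              writefile.write(line)
                      readfile.close()
                      writefile.close()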
                      I should mention that I haven't actually tested this script out... use at your own risk.
                      Last edited by apfejes; 04-08-2009, 12:26 PM. Reason: disclaimer added.

                      • vschulz
                        Junior Member
                        • Apr 2009
                        • 8

                        #12
                        An easy way to get some linux functionality for windows is to use UnxUtils, see


                        This is easy to install and has very low overhead. Basically, you can run unix commands (like grep "U[012]" Input.eland > Input.um.eland) in the DOS command window. You could also use cygwin, but that has more overhead.

                        Vince

                        • kevinlu
                          Junior Member
                          • Oct 2008
                          • 6

                          #13
                          Anthony, thanks for the script. It's been edited a bit and works smoothly.
                          The new script is below.

                          #!/usr/bin/python
                          import re

                          # Tuple of input paths; the trailing comma keeps a single path a tuple.
                          files = ('F:\\path\\to\\files',)
                          # Last base of the read, a tab, then a unique-match code (U0, U1 or U2).
                          regex = re.compile(r"[GTAC]\tU[012]")

                          for filepath in files:
                              rfobj = open(filepath, 'r')
                              wfobj = open("%s_out.txt" % filepath.split('.')[0], 'w')
                              for l in rfobj:
                                  if regex.search(l):
                                      wfobj.write(l)
                              rfobj.close()
                              wfobj.close()
                          You can grep multiple files at once if desired. Just separate their paths using a comma.
                          Last edited by kevinlu; 04-13-2009, 08:34 PM. Reason: added something to the script

                          • apfejes
                            Senior Member
                            • Feb 2008
                            • 236

                            #14
                            Hi Kevin,

                            Thanks - that's much cleaner than what I'd done. As I said, I really haven't done much in python before. That's a great resource for anyone else who's looking to do filtering on eland files.