
  • Calling peaks from NRSF ChIP-Seq with USeq

    Hello!

    I am having some format troubles with the NRSF dataset downloaded from here:

    The files are in txt format like:

    Uniq files: all tags have a single alignment in the genome
    Column 1: tag sequence
    Column 2: alignment score
    Column 3: # of hits in the genome, 1 = unique hit
    Column 4: Chr position
    Column 5: Chr direction
    Column 6: matched genome sequence
    Column 7: next best possible alignment score


    example:
    head -5 GSM327023_chipFC1592_uniq_hg17.txt
    GCAGAGTAACCCGCCCCACCCCACC 10406 1 chr6:156964520 F GCAGAGTAACTCTCCCCACCCCACC 9359
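The column list above can be turned into a small parser. This is only a sketch: the names `UniqTag` and `parse_uniq_line` are made up for illustration, and it assumes the fields are whitespace-separated and that column 4 always has the form `chr:position`, as in the example line.

```python
from typing import NamedTuple


class UniqTag(NamedTuple):
    """One record of the 7-column 'uniq' format (hypothetical helper type)."""
    tag: str          # column 1: tag sequence
    score: str        # column 2: alignment score (kept as text)
    hits: int         # column 3: number of hits in the genome
    chrom: str        # column 4, part before the colon
    position: int     # column 4, part after the colon
    direction: str    # column 5: F or R
    matched: str      # column 6: matched genome sequence
    next_best: str    # column 7: next best alignment score (kept as text)


def parse_uniq_line(line: str) -> UniqTag:
    fields = line.split()  # ASSUMPTION: whitespace-separated columns
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    chrom, pos = fields[3].rsplit(":", 1)  # split chr6:156964520 at the last colon
    return UniqTag(fields[0], fields[1], int(fields[2]), chrom, int(pos),
                   fields[4], fields[5], fields[6])


example = ("GCAGAGTAACCCGCCCCACCCCACC 10406 1 chr6:156964520 F "
           "GCAGAGTAACTCTCCCCACCCCACC 9359")
record = parse_uniq_line(example)
print(record.chrom, record.position, record.direction)
```

Splitting column 4 into chromosome and position is the step any later conversion (ELAND-like, BED, or otherwise) would need first.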


    Now I want to run USeq on it. Because I have to run the program many times, I would like to run it with a single command (the ChIPSeq application), which supports the ELAND format. So I have been trying to convert my files to ELAND.

    I was wondering if someone can help me here. Apparently my conversion is not correct: I don't seem to have produced the right number of columns.
    So my question is: what does the ELAND format read by USeq look like?

    Here is a line from the error message:

    Error: line does not contain enough columns -> GCGCCGAGCATTCCGGCCTGAGGAG CTTCCCAGGCCGGAATGCTCGGCGC U1 1 0 0 chr2.fa 55756496 R .
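One thing worth ruling out with a "not enough columns" error is the delimiter. The sketch below counts the fields of a line when split on tabs versus on any whitespace. Whether USeq's ELAND parser splits on tabs is an assumption here, not something confirmed in this thread, and `count_columns` is a hypothetical helper name. The line is copied as it appears in the post, where it is space-separated (the forum may also have converted tabs to spaces).

```python
def count_columns(line: str) -> dict:
    """Count fields when splitting on tabs versus on any run of whitespace."""
    stripped = line.rstrip("\n")
    return {
        "tab": len(stripped.split("\t")),
        "whitespace": len(stripped.split()),
    }


# Line taken from the error message, exactly as shown in the post.
rejected = ("GCGCCGAGCATTCCGGCCTGAGGAG CTTCCCAGGCCGGAATGCTCGGCGC "
            "U1 1 0 0 chr2.fa 55756496 R .")
print(count_columns(rejected))
```

If the tab count is 1 while the whitespace count is larger, the file is space-delimited, and a parser that expects tabs would see a single column per line.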

    If anyone could help here that would be great! As this NRSF dataset is widely used I might not be the only one having these troubles...
    Thanks a lot!
    dani

  • #2
    I know USeq accepts ELAND export format.

    I don't think the format you posted is a standard ELAND format; it may be an older one. Your best bet would probably be to convert it to a BED file and use that.

    I use the command below to convert my ELAND files into QuEST files. It should be pretty straightforward to alter it to convert your files into a BED file.

    I'd try this, with no guarantee that it will work: load your file onto Galaxy and convert the colons to tabs with the "convert delimiters to tab" function, then export the result from Galaxy.


    For your file I'd try this (I use something similar to convert my ELAND files into QuEST files; disclaimer: I'm not really sure what it does):

    cat yourfile.txt | awk '{if($6 == "F"){print $4" "$5" +"}else{print $4" "$5+25" -"}}' | awk 'BEGIN{FS="[. ]"}{print $1" "$3" "$4}' > yourfile.bed

    Then load the output back onto Galaxy and convert the spaces to tabs, as you did before.

    Yeah, not the most straightforward approach, but it should work.
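As a cross-check on the awk pipeline: its second stage splits on "." and prints fields 1, 3 and 4, which suits chromosome names like chr2.fa; with names such as chr6 (no ".fa") the position field would be dropped, and the output has three columns (chromosome, position, strand) rather than BED's chromosome, start and end. A minimal Python sketch of a direct conversion to tab-separated 6-column BED follows. It rests on assumptions that are not confirmed by the dataset description: that the position is the 1-based leftmost base of the alignment on both strands, and that the read length equals the tag length. `uniq_line_to_bed` is a hypothetical name.

```python
def uniq_line_to_bed(line: str) -> str:
    """Convert one 'uniq' format line to a tab-separated 6-column BED line."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    tag, location, direction = fields[0], fields[3], fields[4]
    chrom, pos = location.rsplit(":", 1)
    # ASSUMPTION: pos is 1-based and marks the leftmost aligned base on both strands.
    start = int(pos) - 1          # BED starts are 0-based
    end = start + len(tag)        # BED ends are exclusive; read length = tag length
    strand = "+" if direction == "F" else "-"
    # BED scores must lie in 0-1000, so the alignment score is not carried over.
    return "\t".join([chrom, str(start), str(end), tag, "0", strand])


example = ("GCAGAGTAACCCGCCCCACCCCACC 10406 1 chr6:156964520 F "
           "GCAGAGTAACTCTCCCCACCCCACC 9359")
print(uniq_line_to_bed(example))
```

Because the fields are joined with tabs directly, no second pass through Galaxy is needed to fix the delimiters. If the coordinate convention turns out to differ, only the two lines computing `start` and `end` need to change.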

    Publicly available data is a real mess, and the methodology is poorly documented. That is something the community should work on improving.
    --------------
    Ethan

    • #3
      Hi,
      thanks for your answer. But BED is not accepted by the USeq ChIPSeq application.
      About your way of converting to BED: if you replace " " with "\t" in the command, you don't have to use Galaxy.

      Thanks anyway!

      • #4
        With USeq I think it's better to just put the data through the modules separately. It doesn't take much more time, and the ChIPSeq wrapper gets hung up too easily.

        Thanks for the tip.
        --------------
        Ethan
