  • Nix
    replied
    Yes, I've built a rather sophisticated tool for doing just this sort of thing. See IntersectRegions in the USeq package.

    **************************************************************************************
    ** Intersect Regions: August 2008 **
    **************************************************************************************
    IR intersects lists of regions (tab delimited: chrom start stop(inclusive)). Random
    regions can also be used to calculate a p-value and fold enrichment.

    -f First regions files, a single file, or a directory of files.
    -s Second regions files, a single file, or a directory of files.
    -g Max gap, defaults to 0. A max gap of 0 = regions must abut, negative values force
    overlap (ie -1= 1bp overlap, be careful not to exceed the length of the smaller
    region), positive values enable gaps (ie 1=1bp gap).
    -e Score intersections where second regions are entirely contained by first regions.
    -r Make random regions matched to the second regions file(s) and intersect with the
    first. Enter the full path directory text containing chromosome specific
    interrogated regions files (ie named: chr1, chr2 ...: chrom start stop(inclusive)).
    -c Match GC content of second regions file(s) when selecting random regions, rather
    slow. Provide a full path directory text containing chromosome specific genomic
    sequences. To speed the matching place the fraction GC in the last column of
    your region file(s).
    -n Number of random region trials, defaults to 1000.
    -w Write intersections and differences.
    -x Write paired intersections.
    -p Print length distribution histogram for gaps between first and closest second.
    -q Parameters for histogram, comma delimited list, no spaces:
    minimum length, maximum length, number of bins. Defaults to -100, 2400, 100.

    Example: java -Xmx1500M -jar pathTo/Apps/IntersectRegions -f /data/miRNAs.txt
    -s /data/DroshaLists/ -g 500 -n 1000 -r /data/InterrogatedRegions/
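
    The max gap rule described for -g can be sketched in a few lines of Python. This is only an illustration of the stated semantics (inclusive coordinates, 0 = abutting, negative = required overlap, positive = allowed gap); it is not USeq's code, and the function names `gap_between` and `intersects` are made up for this example.

```python
def gap_between(first, second):
    """Signed gap in bp between two inclusive (start, stop) regions.

    0 means the regions abut, a negative value is the size of the
    overlap, a positive value is the number of bases separating them.
    """
    (s1, e1), (s2, e2) = first, second
    return max(s1, s2) - min(e1, e2) - 1


def intersects(first, second, max_gap=0):
    """True if the two regions are within max_gap of each other."""
    return gap_between(first, second) <= max_gap


# Abutting regions (stop 10, start 11) pass with the default max gap of 0.
print(intersects((1, 10), (11, 20)))       # True
# A 1 bp gap needs max_gap=1.
print(intersects((1, 10), (12, 20)))       # False
print(intersects((1, 10), (12, 20), 1))    # True
# max_gap=-1 demands at least 1 bp of overlap.
print(intersects((1, 10), (11, 20), -1))   # False
print(intersects((1, 10), (10, 20), -1))   # True
```

    Under this reading, a contained region always yields a negative gap equal to minus its own length, which is why the help text warns against negative values larger than the smaller region.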


  • avilella
    replied
    comparing peak set profiles in chip-seq datasets

    Hi,

    Is there any tool that will tell me how different/similar two chip-seq peak sets are in two different parts of the genome?

    E.g. if I have a ~10Kb region in the genome with a series of peaks and another ~10Kb region in the genome with another set of peaks from the same experiment, can I calculate a distance measure between these two peak set profiles with any available tool?

    Cheers


  • Nix
    replied
    Hello Gonghong,

    Yes, Novoalign is its own beast (an excellent one at that) and is from Novocraft. So first run your reads through their aligner and then process your data with USeq. For ChIP-seq you can probably get by with little loss in resolution using the xxx.sorted.gz alignments that came off the default Eland aligner that runs with the Illumina pipeline. Or, barring those, use Bowtie for fast ungapped alignments.

    -cheers, D


  • weigonghong
    replied
    Hello David,

    I'm a new USeq user. For ChIP-seq analysis, the first step is genome mapping with Novoalign. However, I can't find Novoalign in USeq_5.6/Apps. Should I use NovoalignParser instead?
    You gave an example for mRNA-seq by using NovoalignParser: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignParser -f /Novo/Run7/
    -v H_sapiens_Mar_2006 -p 20 -q 30 -r /Novo/Run7/mRNASeq/ -i -g
    /Anno/Hg18/mergedUCSCKnownGenes.bed

    Then I put together this command: java -jar USeq_5.6/Apps/NovoalignParser -f /wrk/data/biomedicum_solexa-090805/s_4_sequence.txt / -v /wrk/data/genomes/homo_sapiens/dna/Homo_sapiens.NCBI36.49.dna.all_chromosomes.fasta -p 20 -q 30 -r /wrk/data/gonghong/useq –i

    It then printed the following output:
    20.0 Posterior probability threshold
    30.0 Alignment score threshold

    Parsing and filtering...
    /wrk/data/biomedicum_solexa-090805/s_4_sequence.txt
    Problem identifing chromosome column? No '>chr' found in 1st 1000 lines?

    Could you please help me figure out what happened? I'm a wet-lab postdoc and would very much like to use USeq for ChIP-seq data analysis.

    I'm looking forward to your reply.

    Thanks a lot.
    Gonghong


  • Nix
    replied
    There are a lot of files associated with the results. I also wanted this archived so follow the link above and download the README_Report.doc.zip file for the summary.


  • golharam
    replied
    Any chance someone can post a summary of the results of the challenge on here? I know this is late, but it would be interesting for others to see.


  • ewilbanks
    replied
    OK thanks!


  • Nix
    replied
    I would cite this thread and the archive on sourceforge via html links https://sourceforge.net/projects/use...PSeqChallenge/ .


  • ewilbanks
    replied
    Hi David,

    This is a great resource! If we were to cite it, how would you like us to do that?

    Thanks!
    Lizzy


  • bioinfosm
    replied
    Well, there is a different one, but it is also a ChIP-seq challenge: http://camda2009.bioinformatics.northwestern.edu/


  • inesdesantiago
    replied
    I would like to hear about the Chip-Seq Challenge 2.0!


  • simulation11
    replied
    Wow... those are great posts. Thanks a lot for sharing.


  • Nix
    replied
    Final report

    Hello Folks,

    The ChIP-Seq Challenge 1.0 is over! It's been a resounding success with 13 submissions representing 12 analysis packages. Many congrats and thanks to both the players and Illumina and Applied Biosystems for providing prizes.

    The datasets, submissions, analysis, and results have been archived on SourceForge on the USeq project site under CommunityChIPSeqChallenge (https://sourceforge.net/project/show...kage_id=317544).

    -cheers, David


  • Nix
    replied
    JSP, you are correct: there are a couple of key regions in close proximity that can be intersected by one candidate, so it is possible to hit 501 key regions in the top 500 list.

    As far as I am aware, folks' candidate regions aren't excessively large, all under 500bp.

    The number of double hits is small and won't affect the overall results.

    And no, multiple hits to the same key only count once.

    I'll put together a list of the actual centers used to generate the random fragments and let those interested calculate the intersections. There are several problems with this approach, namely the observed center is not the same as the actual center since read distribution is skewed by the presence of poorly alignable repeats and low complexity regions. Which do you use? Again, I very much doubt it will change the overall results.

    As for additional methods, by all means run them using the simulated data and I can add them to the charts.
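
    For readers following the counting rule discussed above (one candidate may intersect two neighboring key regions, but repeated hits to the same key count once), here is a minimal Python sketch. It is an illustration only, not the scoring code used for the challenge; `count_key_hits` and the toy coordinates are invented for this example, and regions are assumed to be (chrom, start, stop) with inclusive coordinates.

```python
def count_key_hits(candidates, keys):
    """Count distinct key regions intersected by at least one candidate.

    Regions are (chrom, start, stop) tuples with inclusive coordinates.
    A key hit by several candidates is still counted only once.
    """
    hit = set()
    for chrom, c_start, c_stop in candidates:
        for i, (k_chrom, k_start, k_stop) in enumerate(keys):
            if chrom == k_chrom and c_start <= k_stop and k_start <= c_stop:
                hit.add(i)
    return len(hit)


keys = [("chr1", 100, 200), ("chr1", 250, 350), ("chr2", 10, 50)]

# One candidate spanning two nearby keys scores two hits,
# which is how a top 500 list can reach 501 key regions.
print(count_key_hits([("chr1", 150, 300)], keys))                      # 2

# A second candidate landing on an already-hit key adds nothing.
print(count_key_hits([("chr1", 150, 300), ("chr1", 180, 190)], keys))  # 2
```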


  • jsp
    replied
    Hello David,

    I saw that some top 500 peak lists can identify 501 key regions, and this doesn’t make any sense to me. The reason is either that two key regions overlap too much or that the identified peak region is too big. So I propose the following suggestions:

    1. Cleaner key regions -- neighboring key regions with too much overlap (for example more than 40%) should be merged into a single key region. (A good method should be able to identify key regions with some limited amount of overlap, and that might be the theme for Community ChIP-Seq Challenge 2.0?)

    2. A more objective criterion (related to the resolution of the submitted binding regions): take the midpoint of each identified peak region and check whether it falls within a key region. ChipMaster raised a question about submitting a list with “chr1:1-lengthOfChr1” before, and the “1kb rule” still favors results with larger peak regions.

    3. The above two are meant to avoid cases where one peak covers two key regions; we also need to avoid cases where a single key region is identified multiple times by small peaks (I don’t know whether this has been taken care of already).

    It will be interesting to see the distribution of distances b/t the identified peak centers and their corresponding key region centers.
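
    The midpoint criterion and the center-to-center distances proposed above could be computed along these lines. This is a rough Python sketch under stated assumptions: regions are (chrom, start, stop) with inclusive coordinates, and the names `midpoint` and `score_by_midpoint` are hypothetical, not part of any package mentioned in this thread.

```python
def midpoint(start, stop):
    """Integer midpoint of an inclusive (start, stop) region."""
    return (start + stop) // 2


def score_by_midpoint(peaks, keys):
    """Score peaks by whether their midpoint falls inside a key region.

    Returns {key_index: distance}, where distance is the smallest
    absolute offset (bp) between a qualifying peak midpoint and the key
    region's own midpoint. Each key appears at most once, so several
    small peaks on the same key do not inflate the count.
    """
    best = {}
    for chrom, p_start, p_stop in peaks:
        p_mid = midpoint(p_start, p_stop)
        for i, (k_chrom, k_start, k_stop) in enumerate(keys):
            if chrom == k_chrom and k_start <= p_mid <= k_stop:
                dist = abs(p_mid - midpoint(k_start, k_stop))
                if i not in best or dist < best[i]:
                    best[i] = dist
    return best


keys = [("chr1", 100, 200), ("chr1", 250, 350)]

# A chromosome-sized "peak" no longer scores: its midpoint misses every key.
print(score_by_midpoint([("chr1", 1, 1000000)], keys))                 # {}

# Two small peaks on the same key count once; the closer midpoint is kept.
print(score_by_midpoint([("chr1", 140, 180), ("chr1", 120, 140)], keys))  # {0: 10}
```

    The values of the returned dictionary are exactly the distances whose distribution is asked about above.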

    Please change “ParkLab” to “BPC” (which stands for binding profile construction) in the report. My lab mate published a package (spp: http://compbio.med.harvard.edu/Supplements/ChIP-seq/) on ChIP-seq peak detection, and it performed really well on many published real ChIP-seq data sets. I hope that my participation in this challenge with my beta version of BPC won’t mislead people into thinking it’s the best method from Park Lab.

    Thanks for putting all these together.

    Looking forward to challenge 2.0
