  • nclement
    Junior Member
    • Jun 2008
    • 3

    spike-in data set

    Does anyone know where I can find an Illumina spike-in data set? We're trying to benchmark some programs, but haven't been able to find any available.
  • Nix
    Member
    • Jun 2008
    • 60

    #2
    For ChIP-seq?

    • nclement
      Junior Member
      • Jun 2008
      • 3

      #3
      Yes. For ChIP-seq.

      • zee
        NGS specialist
        • Apr 2008
        • 249

        #4
        I've used this dataset previously to test our alignment algorithm.

        ChIP-Seq Transcription Factor Data, by Steven Jones, last modified Dec 05, 2008


        There are some .wig files available as well.

        • Nix
          Member
          • Jun 2008
          • 60

          #5
          Pseudo-simulated spike-in datasets for ChIP-seq

          I've posted some pseudo-simulated spike-in data sets to http://bioserver.hci.utah.edu/Supple...aperInfo/2008/ under Nix_EmpricalMethods. Basically, localized random reads are added to input sequencing data. Keys are provided so you can calculate true positive rates (TPRs) and false discovery rates (FDRs).
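
Scoring a peak caller against the keys comes down to overlap counting. A minimal sketch, assuming (hypothetically; the post does not describe the key file format or the matching rule) that both the key and the peak calls are lists of half-open (chromosome, start, end) intervals and that any overlap of at least one base counts as a hit:

```python
from typing import List, Tuple

# Hypothetical interval representation: (chromosome, start, end), half-open.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def tpr_fdr(called: List[Region], key: List[Region]) -> Tuple[float, float]:
    """Score peak calls against a spike-in key.

    TPR = fraction of key regions hit by at least one call.
    FDR = fraction of calls that hit no key region.
    """
    hit_keys = sum(1 for k in key if any(overlaps(k, c) for c in called))
    false_calls = sum(1 for c in called if not any(overlaps(c, k) for k in key))
    tpr = hit_keys / len(key) if key else 0.0
    fdr = false_calls / len(called) if called else 0.0
    return tpr, fdr


if __name__ == "__main__":
    key = [("chr1", 1000, 1500), ("chr1", 5000, 5500), ("chr2", 200, 700)]
    called = [("chr1", 1200, 1400), ("chr2", 650, 900), ("chr3", 0, 100)]
    print(tpr_fdr(called, key))  # prints (0.6666666666666666, 0.3333333333333333)
```

The quadratic scan is fine for a few thousand regions; for genome-wide call sets you would sort intervals per chromosome or use an interval tree instead.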

          This is from a paper we just submitted:

          Spike-in Data Set Generation
          An application was developed to simulate ChIP regions containing a single binding site. It works by randomly selecting center positions from a genome. These are expanded to a maximum defined size (500 bp) and then filtered to remove regions with a RepeatMasker base content greater than 0.2 and a fraction of non-GATC bases greater than 0.5. For each remaining region, random fragments 150 to 500 bp in size are generated around each center position. Each end of each simulated fragment is taken as a read, and each base in the read is mutated according to the published per-cycle error frequency [12]. The reads are then aligned to the genome.
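
The generation steps above can be sketched in Python as follows. This is not the authors' application; every name is hypothetical, and several details are assumptions the post does not state: lowercase (soft-masked) bases stand in for RepeatMasker annotation, a region is rejected if either threshold is exceeded, fragment lengths are drawn uniformly from 150 to 500 bp and placed so they span the center, reads are 36 bp, and the per-cycle error rates are placeholders rather than the published values from [12]. Alignment is left to an external aligner.

```python
import random
from typing import List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def passes_filters(window: str, max_repeat: float = 0.2, max_non_gatc: float = 0.5) -> bool:
    """Keep a window unless it is too repetitive or has too many non-GATC bases.

    Assumption: lowercase (soft-masked) bases mark RepeatMasker content, and a window
    is rejected if EITHER threshold is exceeded.
    """
    n = len(window)
    repeat_frac = sum(b.islower() for b in window) / n
    non_gatc_frac = sum(b.upper() not in "GATC" for b in window) / n
    return repeat_frac <= max_repeat and non_gatc_frac <= max_non_gatc


def pick_centers(genome: str, n: int, rng: random.Random, half_width: int = 250,
                 margin: int = 500, max_tries: int = 100_000) -> List[int]:
    """Randomly pick up to n centers whose window (center +/- half_width) passes the filters."""
    centers: List[int] = []
    for _ in range(max_tries):
        if len(centers) == n:
            break
        c = rng.randint(margin, len(genome) - margin)
        if passes_filters(genome[c - half_width:c + half_width]):
            centers.append(c)
    return centers


def mutate(read: str, per_cycle_error: List[float], rng: random.Random) -> str:
    """Substitute the base at cycle i with a different base with probability per_cycle_error[i]."""
    out = []
    for i, base in enumerate(read.upper()):
        if rng.random() < per_cycle_error[i]:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def simulate_reads(genome: str, center: int, n_fragments: int, read_len: int,
                   per_cycle_error: List[float], rng: random.Random,
                   min_len: int = 150, max_len: int = 500) -> List[str]:
    """Make fragments that span `center` and take both ends of each as reads."""
    reads: List[str] = []
    for _ in range(n_fragments):
        frag_len = rng.randint(min_len, max_len)
        start = rng.randint(center - frag_len + 1, center)  # fragment covers the center base
        frag = genome[start:start + frag_len]
        reads.append(mutate(frag[:read_len], per_cycle_error, rng))            # forward end
        reads.append(mutate(revcomp(frag[-read_len:]), per_cycle_error, rng))  # reverse end
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice(BASES) for _ in range(20_000))  # toy genome, not real sequence
    errors = [0.002 + 0.0005 * i for i in range(36)]  # placeholder rates, NOT the published ones
    for center in pick_centers(genome, 3, rng):
        reads = simulate_reads(genome, center, 1000, 36, errors, rng)
        print(center, len(reads))  # 1000 fragments give 2000 reads per region
```

Keeping centers at least `max_len` bases from either end of the genome guarantees every fragment lies fully inside the sequence, so no fragment is silently truncated.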

          For the human spike-in dataset, 1000 regions, each with 1000 simulated ChIP fragments (2000 reads per region), were selected. These were mapped to hg17 using the ELAND Extended aligner from Solexa. Only regions with more than 1000 mapped reads were used to generate the spike-in set. 60 groups, each containing 30 spike-in regions, were created. For each group, from 2 to 60 reads were randomly drawn from each of its 30 spike-in regions. Together these represent 900 spike-in regions containing 2 to 60 reads each, 27,900 reads in total. To create the actual spike-in datasets, Johnson et al.'s control input data were combined, randomized, and split into thirds of 1,698,713 reads each. The reads from the 900 spike-in regions were added to one of the thirds; this third represents the simulated ChIP data, and the other two represent simulated input data sets.
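
A minimal sketch of the assembly step, with hypothetical function names and toy strings standing in for mapped reads. Two assumptions go beyond the post: the read counts step by 2 (the post gives 60 groups, but 900 regions at 30 per group works out to 30 groups, and 30 groups drawing 2, 4, ..., 60 reads per region reproduces the stated 27,900 reads), and any remainder left after splitting the input into thirds is dropped.

```python
import random
from typing import Dict, List, Sequence, Tuple


def draw_spike_ins(region_reads: Dict[str, List[str]], reads_per_region: Sequence[int],
                   regions_per_group: int,
                   rng: random.Random) -> Tuple[List[str], Dict[str, int]]:
    """Split shuffled regions into groups; group g draws reads_per_region[g] reads per region.

    Returns the pooled spike-in reads and a key mapping region -> number of reads drawn.
    """
    regions = list(region_reads)
    rng.shuffle(regions)
    needed = regions_per_group * len(reads_per_region)
    if len(regions) < needed:
        raise ValueError(f"need {needed} regions, have {len(regions)}")
    spike_reads: List[str] = []
    key: Dict[str, int] = {}
    for g, k in enumerate(reads_per_region):
        for region in regions[g * regions_per_group:(g + 1) * regions_per_group]:
            spike_reads.extend(rng.sample(region_reads[region], k))  # without replacement
            key[region] = k
    return spike_reads, key


def split_input(input_reads: List[str], rng: random.Random,
                parts: int = 3) -> List[List[str]]:
    """Shuffle pooled input reads and cut them into `parts` equal sets (remainder dropped)."""
    pooled = list(input_reads)
    rng.shuffle(pooled)
    size = len(pooled) // parts
    return [pooled[i * size:(i + 1) * size] for i in range(parts)]


def build_datasets(region_reads: Dict[str, List[str]], reads_per_region: Sequence[int],
                   regions_per_group: int, input_reads: List[str], seed: int = 0):
    """Return (simulated ChIP, input A, input B, key); spike-ins go into one third only."""
    rng = random.Random(seed)
    spikes, key = draw_spike_ins(region_reads, reads_per_region, regions_per_group, rng)
    chip, input_a, input_b = split_input(input_reads, rng)
    return chip + spikes, input_a, input_b, key


if __name__ == "__main__":
    # Toy data: 900 regions with 100 mapped reads each, plus 3,000 input reads.
    regions = {f"region{i}": [f"region{i}_read{j}" for j in range(100)] for i in range(900)}
    background = [f"input_read{j}" for j in range(3000)]
    # Assumed design: 30 groups of 30 regions drawing 2, 4, ..., 60 reads per region.
    chip, input_a, input_b, key = build_datasets(regions, range(2, 61, 2), 30, background)
    print(len(key), len(chip) - len(input_a))  # prints 900 27900
```

Because all three thirds come from the same shuffled pool, the only systematic difference between the simulated ChIP set and the two input sets is the added spike-in reads, which is what makes the key usable as ground truth.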

          The mouse spike-in dataset was generated in a similar fashion. Reads from 1000 regions, each with 1000 fragments, were mapped to mm8. Regions with more than 500 mapped reads were used to derive 87 groups of 10 regions each (870 regions in total), each region containing 1 to 87 randomly drawn reads (38,280 reads in total). To generate a larger simulated input dataset, data from Mikkelsen et al. that showed little to no significant enrichment (ES.H3, ES.K9, ES.RPol, ESHyb.K9, MEF.K9, NP.K9, NP.K27, and NP.K36) were pooled along with their actual whole-cell extract input data (ES.WCE, MEF.WCE, and NP.WCE), randomized, and split into thirds of 16,383,950 reads each.
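
The mouse totals can be checked with a few lines of arithmetic. A small sketch, assuming group g (g = 1 to 87) draws exactly g reads from each of its 10 regions; that is one reading of "1 to 87 randomly drawn reads", and it matches the stated 38,280 total. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def spike_in_totals(reads_per_region: Sequence[int], regions_per_group: int) -> Tuple[int, int]:
    """Total regions and spike-in reads when group g draws reads_per_region[g] reads per region."""
    n_regions = regions_per_group * len(reads_per_region)
    n_reads = regions_per_group * sum(reads_per_region)
    return n_regions, n_reads


# Mouse design: 87 groups of 10 regions, group g drawing g reads per region (assumed).
print(spike_in_totals(range(1, 88), 10))  # prints (870, 38280)
# The three input thirds together hold 3 * 16,383,950 reads.
print(3 * 16_383_950)  # prints 49151850
```

In closed form the read total is 10 * (87 * 88 / 2) = 10 * 3828 = 38,280.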


          You will need to combine the appropriate files to generate datasets that suit your purposes. There are a bunch of utilities at USeq (http://useq.sourceforge.net/) for manipulating the binary bar files if you'd rather go that route instead of using the txt files. Good luck! I'm glad you're going to benchmark your methods. -cheers, David

          • Jason.Lu
            Junior Member
            • Dec 2010
            • 1

            #6
            Spike-in data for RNA-Seq

            Hi,
            This is my first post, so first I want to say hi to everyone.
            I wonder whether anyone knows of any publicly available RNA-Seq spike-in datasets (like the Latin Square or Golden Spike data from the Affymetrix platform).
            Thanks,
