Sample short read data set?

  • Sample short read data set?

    Anyone have a sample data set that they'd be willing to share? I'll host!

  • #2
    A sample of what type of data?
    The more you know, the more you know you don't know. —Aristotle


    • #3
      Ideally, for my application, short-read genomic data (i.e., Solexa/ABI). I'm playing around with some of the software packages listed in this forum and need some data!


      • #4
        Sorry, I should have been clearer in my message: what experiment type? ChIP-Seq, Genome shotgun, Transcriptome shotgun, etc. Getting generic data is easy - getting results for a particular type of data may not be.

        • #5
          Genome shotgun, or better yet, amplicon-enriched genomic reads; i.e., not transcriptome or ChIP data.

          Actually, now that I think about it, it would be nice to have sample data sets for all applications, but I only have 6 TB of bandwidth per month!


          • #6
            Hrm.. I'm not sure we have many (good quality) genome shotgun data sets kicking around that I'd be able to get permission to release. At least, I personally don't have any, yet. I'll poke around, though, and maybe I can find something for you. If you don't mind poor quality, just for playing around, that might be feasible.

            • #7
              That would be great. I'm not concerned with base quality or re-calling, etc. I'm working on analysis platforms... and it's no fun to just randomly generate short reads.

              Let me know!


              • #8
                Hi ECO,

                I brought this up yesterday, and was told that there is no point in making any data available. Supposedly there are open repositories (NCBI?) collecting and making this data available. I spent the last 15 minutes looking for said repositories, but couldn't find anything remotely like what I expected.

                On the other hand, by doing a Google search for ".seq.txt", which is the common file-name suffix for sequence files produced by the Illumina pipeline, I came up with a set of histone ChIP experiments that the BC Genome Sciences Centre has made available anyhow:

                http://www.bcgsc.ca/downloads/histone

                I did confirm that they were intentionally released, so I'm sure there's no problem with using them. On the other hand, I don't know a lot about these particular sets of data. I do know they're not new: they were analysed a while ago, and I've seen several presentations on this information over the last year or so.

                The files themselves are post-base-calling, but not yet aligned, so they may be good for testing aligners or whole pipelines. (Then again, they're old, so they may not be a good test for the latest Illumina software; interested parties can try that themselves.)

                I suspect the wig files (where available) were created with Findpeaks 2.1.x, though I haven't verified this.

                Cheers,

                Anthony

                • #9
                  Data DVD

                  Applied Biosystems have a sample data DVD of S. suis reads together with a few compiled executables (for UNIX), some Perl code and a workflow document.

                  • #10
                    Anthony & sci_guy, thanks much! I'll take a look!

                    I'll probably be starting a thread soon about the best OSS solutions for putting together one's own analysis platform: not really to support an instrument, but to analyze a small number of runs for a specific project.
