  • Rare NCBI SRA dataset - how can I use it?

    Hi to all,

    I wanted to check this data and use cufflinks to get the FPKM values, but I haven't been able to. The SRA files are too small (approximately 200-300 MB), and I know they're usually around 1.5 GB.
    I have little to no experience processing this kind of data, and I am having a hard time coming up with a solution. I would really appreciate it if someone could explain how to do that.

    http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE68789 - This is the dataset.

    Thanks a lot.
    Last edited by cebehind; 10-14-2015, 06:09 AM.

  • #2
    Originally posted by mastal
    It looks like the files you are referring to are files with the counts, not the raw data.


    That's the link to the SRA data, which is supposed to be the raw files. But I'm not sure.
    Last edited by cebehind; 10-14-2015, 06:53 AM. Reason: Quote missing.



    • #3
      The first file listed looked like it was a summary of the counts.

      The other files are .sra files, and I think they might be in a compressed format.

      I think you need the sra toolkit to convert the files to something like fastq.

      Have a look at:




      • #4
        Originally posted by cebehind
        http://www.ncbi.nlm.nih.gov/sra?link...rom_uid=283850

        That's the link to the SRA data, which is supposed to be the raw files. But I'm not sure.
        Yes, that is the link to the datafile in the SRA. There you can download the files and convert them to fastq via fastq-dump from the sra toolkit. After that you can go for mapping->cufflinks->whatsoever...

        Alternatively, you can follow your first link to the geo deposit and download the counts table.

        The read numbers are indeed not that high, but for DE they were probably enough; at least it was published with this amount of data.
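
The conversion step described above could be sketched roughly as follows. This is only a minimal sketch: it builds the fastq-dump command line without running it, and the file name `SRR0000001.sra` and the `fastq` output directory are placeholders, not values taken from this dataset.

```python
import shlex


def fastq_dump_command(sra_file, out_dir="fastq", paired=False):
    """Build (but do not run) a fastq-dump command for one .sra file."""
    cmd = ["fastq-dump", "--gzip", "--outdir", str(out_dir)]
    if paired:
        # --split-files writes the reads of a pair to separate FASTQ files.
        cmd.append("--split-files")
    cmd.append(str(sra_file))
    return cmd


if __name__ == "__main__":
    # Placeholder file name; substitute the runs downloaded from the SRA.
    print(shlex.join(fastq_dump_command("SRR0000001.sra")))
```

Once the printed command looks right, it can be pasted into a terminal or passed to `subprocess.run`, assuming the sra toolkit is installed and on the PATH.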



        • #5
          Originally posted by WhatsOEver
          Yes, that is the link to the datafile in the SRA. There you can download the files and convert them to fastq via fastq-dump from the sra toolkit. After that you can go for mapping->cufflinks->whatsoever...

          Alternatively, you can follow your first link to the geo deposit and download the counts table.

          The read numbers are indeed not that high, but for DE they were probably enough; at least it was published with this amount of data.
          I used the European Bioinformatics Institute (http://www.ebi.ac.uk/) to download the fastq files. Would that work?



          • #6
            Yes, that should be fine.



            • #7
              I'm planning on using Galaxy then, since I have no idea how to use R. Is there something that I must know beforehand?



              • #8
                The EBI hosts those as fastq? That's surprising but should be fine if you are sure the files are the correct ones. From the geo site you get some info on their analysis procedure which will be important if you want to (more or less) exactly reproduce the results (tophat --max-intron-length 10000 --max-multihits 1; Expression table done using ESAT http://garberlab.umassmed.edu/software/esat/; Genome_build: mm9).
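
For reference, the TopHat settings quoted from the GEO record could be assembled into a command line along these lines. This is a hedged sketch: only the two options `--max-intron-length 10000` and `--max-multihits 1` come from the post above; the Bowtie index base name `mm9`, the read file name, and the output directory are assumptions for illustration.

```python
import shlex


def tophat_command(index_base, reads, out_dir="tophat_out"):
    """Build (but do not run) a TopHat command with the settings from the GEO record."""
    return [
        "tophat",
        "--max-intron-length", "10000",  # from the GEO record
        "--max-multihits", "1",          # from the GEO record
        "--output-dir", out_dir,         # assumed output location
        index_base,                      # assumed Bowtie index base name
        ",".join(reads),                 # TopHat takes a comma-separated read list
    ]


if __name__ == "__main__":
    # Placeholder read file name, not one of the real runs.
    print(shlex.join(tophat_command("mm9", ["sample1.fastq.gz"])))
```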



                • #9
                  Originally posted by WhatsOEver
                  The EBI hosts those as fastq? That's surprising but should be fine if you are sure the files are the correct ones. From the geo site you get some info on their analysis procedure which will be important if you want to (more or less) exactly reproduce the results (tophat --max-intron-length 10000 --max-multihits 1; Expression table done using ESAT http://garberlab.umassmed.edu/software/esat/; Genome_build: mm9).
                  The thing is, I wasn't able to run ESAT, and I don't know why. So I'm aiming to get FPKM values instead. I'm having problems with the gene names in cufflinks, and I don't know how to fix that.
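
On the gene-name problem, one possible way to inspect what cufflinks produced is to read its `genes.fpkm_tracking` output and list the `gene_short_name` column next to the FPKM values. This is a minimal sketch, not a diagnosis of the problem in this thread: it assumes the standard tab-delimited tracking format, and the example rows are made up. As far as I know, `gene_short_name` shows `-` when cufflinks was run without a reference annotation, so supplying a GTF (for example with `-G`) may be worth checking.

```python
import csv
import io

# Made-up example in the genes.fpkm_tracking layout (not real results).
EXAMPLE_TRACKING = (
    "tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\t"
    "tss_id\tlocus\tlength\tcoverage\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\tFPKM_status\n"
    "GENE1\t-\t-\tGENE1\tActb\t-\tchr5:1-100\t-\t-\t12.5\t10.0\t15.0\tOK\n"
    "CUFF.2\t-\t-\tCUFF.2\t-\t-\tchr1:1-100\t-\t-\t3.0\t1.0\t5.0\tOK\n"
)


def read_gene_fpkm(handle):
    """Return (gene_id, gene_short_name, FPKM) tuples from a tracking file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (rec["gene_id"], rec["gene_short_name"], float(rec["FPKM"]))
        for rec in reader
    ]


def unnamed_genes(rows):
    """Return the gene_ids whose gene_short_name is the '-' placeholder."""
    return [gene_id for gene_id, name, _ in rows if name == "-"]


if __name__ == "__main__":
    rows = read_gene_fpkm(io.StringIO(EXAMPLE_TRACKING))
    for gene_id, name, fpkm in rows:
        print(gene_id, name, fpkm)
    print("Genes without a name:", unnamed_genes(rows))
```

With a real run, `io.StringIO(EXAMPLE_TRACKING)` would be replaced by `open("genes.fpkm_tracking")` in the cufflinks output directory.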
