  • Concatenate several SRA reads to a single fastq file

    Hi all,

    I want to re-analyse a dataset available in GEO:

    I've downloaded all the files, however it seems that each replicate/experiment has several files corresponding to several runs. The question is: how do I concatenate these runs once they are converted into fastq? Would a simple cat command work?

    Also, is this a silly thing to do? (I'm a newbie) Ultimately my goal is to determine gene expression changes.

    Cheers.

  • #2
    Originally posted by krespim
    I've downloaded all the files, however it seems that each replicate/experiment has several files corresponding to several runs. The question is: how do I concatenate these runs once they are converted into fastq? Would a simple cat command work?

    I'm sorry, why exactly do you want to cat all these files? A simple
    cat *.fastq >> consolidated_fastq.fq

    will be fine, but what's your need for doing so? Every single run usually corresponds to a different sample, so why merge them all?
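
    For anyone who wants to try the concatenation, here is a minimal sketch. It assumes the runs are already converted to plain-text FASTQ with the usual four-line record layout. The file names (run1.fastq, sample_merged.fq) and the tiny demo records are made up so the example runs on its own. It uses '>' instead of '>>' so that rerunning it does not append a second copy to an existing output file.

```shell
#!/bin/sh
# Hypothetical demo data: three tiny single-record FASTQ files standing in
# for the three runs of one sample (real files would come from the SRA conversion).
workdir=$(mktemp -d)
for i in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i" > "$workdir/run$i.fastq"
done

# Concatenate the runs. '>' overwrites the output, so rerunning is safe.
# The output uses a .fq extension so it is not picked up by the *.fastq glob.
cat "$workdir"/run*.fastq > "$workdir/sample_merged.fq"

# Sanity check: with four lines per record, the merged line count should equal
# the summed line count of the inputs and be divisible by 4.
in_lines=$(cat "$workdir"/run*.fastq | wc -l | tr -d ' ')
out_lines=$(wc -l < "$workdir/sample_merged.fq" | tr -d ' ')
echo "input lines: $in_lines, merged lines: $out_lines, records: $((out_lines / 4))"
```

    If the data are paired-end, the mate files would need to be concatenated separately and in the same run order, so that the read pairs stay in sync.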

    Originally posted by krespim
    Also, is this a silly thing to do? (I'm a newbie) Ultimately my goal is to determine gene expression changes.

    Cheers.

    Process the files separately, then run the comparative analysis on the resulting SAM/BAM files!



    • #3
      Originally posted by arkal
      will be fine, but what's your need for doing so? Every single run usually corresponds to a different sample, so why merge them all?

      Well, this is actually my main issue, as I don't know whether each run is a different sample or the same sample run across multiple lanes. The sample GEO page lists 3 SRA files.

      The paper does not mention biological or technical replicates.



      • #4
        Originally posted by krespim
        Well, this is actually my main issue, as I don't know whether each run is a different sample or the same sample run across multiple lanes. The sample GEO page lists 3 SRA files.

        The paper does not mention biological or technical replicates.

        It seems to be the same sample in different lanes/runs, so you can either merge the FASTQ files and then align, or align the 3 separately and merge the SAM files. I recommend the latter, as it will take less time (provided you have the resources to align them in parallel)!
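
        A rough sketch of the second option (align each run separately in parallel, then merge). The aligner call is only a placeholder: `my_aligner` is a hypothetical name, and the function below just writes a stub file so the sketch runs without any aligner installed. The file names and demo records are made up as well.

```shell
#!/bin/sh
# Placeholder for the real aligner call. 'my_aligner' is a hypothetical name;
# swap in the actual command and its options. Here the function only writes
# a stub file so that the parallel pattern can be run as-is.
align_one() {
    fastq=$1
    sam=${fastq%.fastq}.sam
    # e.g. my_aligner <index> "$fastq" > "$sam"   (hypothetical)
    printf 'aligned from %s\n' "$(basename "$fastq")" > "$sam"
}

# Hypothetical demo data standing in for the three runs of one sample.
workdir=$(mktemp -d)
for i in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i" > "$workdir/run$i.fastq"
done

# Launch one alignment per run in the background, then wait for all of them.
for fq in "$workdir"/run*.fastq; do
    align_one "$fq" &
done
wait

# Count the per-run outputs that are now ready to be merged.
n_sam=$(ls "$workdir"/run*.sam | wc -l | tr -d ' ')
echo "per-run alignments ready to merge: $n_sam"
```

        With real alignments, the per-run files could then be combined with something like `samtools merge merged.bam run1.bam run2.bam run3.bam`, which expects sorted inputs.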



        • #5
          Originally posted by arkal
          It seems to be the same sample in different lanes/runs, so you can either merge the FASTQ files and then align, or align the 3 separately and merge the SAM files. I recommend the latter, as it will take less time (provided you have the resources to align them in parallel)!

          I have recently been granted access to a server with a very decent number of processors, so that should be easy.

          Thanks a lot arkal!



          • #6
            No worries, all the best!
