  • newBioinfo
    Member
    • Mar 2012
    • 36

    submitting data to SRA

    Hi,
    I am trying to submit 16S rRNA reads from an Illumina run to SRA. I have reached the step where it asks for the following:
    Flowcell, Lane, Filename, md5checksum.

    I have this information, but the same lane also contains other samples that do not belong to me. How should I submit a file that contains other data in addition to mine?
    The demultiplexed file I have is in fasta format, so I don't know how to deal with this.
    Please help!!!
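
For the md5checksum field mentioned above, a minimal Python sketch of how such a value could be computed for a file before upload. This is not from the original post: the function name `md5_of_file` and the chunk size are illustrative assumptions, and the checksum should be taken on exactly the file that gets uploaded (for example the compressed fastq, if that is what is submitted).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in binary chunks so that large fastq files
    do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result should match what the standard `md5sum` command-line utility reports for the same file.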
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    I am not sure why your de-multiplexed files are in fasta format (did you never get fastq files?). Did these samples have "in-line" barcodes (i.e. a custom/home-brew multiplexing scheme), and were they de-multiplexed outside of the Illumina CASAVA pipeline?

    There is no point in submitting data that does not belong to your study. Looks like you are going to have to go back and do some parsing/re-creating the sample file(s) that you need for submission.
    Last edited by GenoMax; 02-11-2013, 01:10 PM.


    • newBioinfo
      Member
      • Mar 2012
      • 36

      #3
      Thanks GenoMax for looking into my problem.
      I got a demultiplexed file which looks like this:
      >R.1_00001
      TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
      >R.2_00001
      TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
      >R.1_00002
      TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

      This file was derived from the original file that was uploaded to the sequencing facility website, which also contained other samples.

      I got two files from there:
      one containing the reads and the other the barcodes; both files were in fastq format.

      Now the question is how to get my sequences in fastq format, since after demultiplexing I only have fasta, and to upload to SRA we need fastq.


      • Kennels
        Senior Member
        • Feb 2011
        • 149

        #4
        If the original files you downloaded were in fastq format, then you need to use a script that demultiplexes and also writes its output in fastq format. What script/program did you use to demultiplex? The sequencing facility should have done this for you with the Illumina pipeline, as mentioned above.


        • newBioinfo
          Member
          • Mar 2012
          • 36

          #5
          Thanks Kennels,
          The sequencing facility did it for me, and I got the demultiplexed file which is in fasta format. Also, if I do it myself which program should I use?


          Thanks!!!


          • Kennels
            Senior Member
            • Feb 2011
            • 149

            #6
            If your sequencing facility was able to demultiplex it, then they should also be able to produce the fastq format for you. Can't you ask them to do it again?

            You could try the FASTX-Toolkit (barcode splitter) or Reaper, but a general search on this forum or Google should turn up more choices.
            If you are not very familiar with the command line, you could try Galaxy: https://main.g2.bx.psu.edu/ and use the barcode splitter tool under NGS manipulation in the left panel.

            Good luck.
            Last edited by Kennels; 02-11-2013, 06:56 PM.


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by newBioinfo View Post
              Thanks GenoMax for looking into my problem.
              I got demultiplexed file which looks like this:
              >R.1_00001
              TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT

              This file I got from the original file which was uploaded to the sequencing facility website, in which there were other samples also.
              This is slightly confusing. So it sounds like you are saying that you did receive a "fastq" format file that had someone else's data (along with yours). You then de-multiplexed the data from this original fastq file.

              Originally posted by newBioinfo View Post
              I got two files from here:
              one containing the reads and other the barcode, the files were in fastq format.
              What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?


              Originally posted by newBioinfo View Post
              Now the point is how to get my sequences in fastq format, as after demultiplexing I am getting in fasta format and to upload on SRA we need fastq.
              It may be clear once you answer the above two questions but in any case you are going to have to go back to the original fastq file that you received from your sequencing facility to create the files you need to submit to SRA.
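
To illustrate why a fasta file cannot simply be turned back into fastq, here is a minimal Python sketch (not from the thread; `read_fastq` and `to_fasta` are illustrative names). It assumes the common four-lines-per-record fastq layout with no line wrapping. A fastq record carries a quality string alongside the sequence, and a fastq-to-fasta conversion keeps only the header and the sequence, so the quality values are gone for good.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record fastq stream."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated fastq record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        # Drop the leading '@' from the header line.
        yield header[1:], seq, qual


def to_fasta(header, seq):
    """Format one fasta record; note that no quality string is kept."""
    return f">{header}\n{seq}\n"
```

This is why the fastq for the submission has to be regenerated from the original fastq file rather than from the demultiplexed fasta.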


              • newBioinfo
                Member
                • Mar 2012
                • 36

                #8
                Thanks GenoMax,
                I did get the original file from the facility, but as I was new to the field I asked them to demultiplex it for me and got the file I showed above. So now I have both files, but for the SRA submission I need a fastq file.
                I think they used their own program to demultiplex it.

                I didn't understand what you meant by this; can you please explain it to me?
                """What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

                Also, do I need to write code for this, given that I have the original fastq file, the barcode file and the mapping file?


                Thanks for help!!!


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  Originally posted by newBioinfo View Post

                  I didn't understand what you mean by this, can you please explain it to me
                  """What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

                  Also, do I need to write a code for doing this as I have original fastq file, barcode file and mapping file.


                  Thanks for help!!!
                  I was asking what software was used for the de-multiplexing. But it sounds like this was done for you by the sequencing facility, which resulted in the plain fasta file you have.

                  Did you use the standard Illumina tag protocol (where the tag is not part of the actual sequence but is read separately), or were the "tags" incorporated within the actual sequence? If you had used the Illumina protocol you would not have a separate barcode file. Since you do have one, I am not sure exactly what you did for multiplexing.

                  Either you (or someone who knows how) may indeed have to write some code to parse out the data for your sample(s) from the original fastq file if you did not use the standard Illumina multiplex protocol. Perhaps you can ask the facility to split the fastq file and give you your part of the data.
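
A minimal Python sketch of the kind of code meant here, under several assumptions that are not confirmed anywhere in the thread: the reads fastq and the barcode fastq list the same reads in the same order, each record occupies exactly four lines, the read ID is the first whitespace-delimited token of the header, and the sample's barcodes (taken from the mapping file) match the barcode reads exactly and in the same orientation. The function names are illustrative only.

```python
from itertools import islice


def fastq_records(handle):
    """Yield each fastq record as a list of its four raw lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated fastq record")
        yield record


def read_id(header_line):
    """Return the read ID: the first token of the header without '@'."""
    return header_line[1:].split()[0]


def extract_sample(reads, barcodes, wanted_barcodes, out):
    """Copy reads whose barcode read is in `wanted_barcodes` to `out`.

    `reads`, `barcodes` and `out` are open text handles. Records are
    written unchanged, so quality values are preserved. Returns the
    number of reads kept. Only exact barcode matches are accepted.
    """
    wanted = {barcode.upper() for barcode in wanted_barcodes}
    kept = 0
    for read, barcode in zip(fastq_records(reads), fastq_records(barcodes)):
        # Guard against the two files being out of sync.
        if read_id(read[0]) != read_id(barcode[0]):
            raise ValueError(f"read/barcode ID mismatch at {read[0].strip()}")
        if barcode[1].strip().upper() in wanted:
            out.writelines(read)
            kept += 1
    return kept
```

For real files the handles could come from `open(path)` or, for gzipped data, `gzip.open(path, "rt")`. If the barcodes in the mapping file are stored as reverse complements of the barcode reads, they would need to be converted before matching.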


                  • newBioinfo
                    Member
                    • Mar 2012
                    • 36

                    #10
                    Thanks GenoMax,
                    I contacted the facility and they have provided me the data in fastq files.
                    Thanks for all the help.
