  • #31
    Originally posted by Xi Wang View Post
    You can use the script below (name it qseq2fastq.pl and replace the former one):

    Code:
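    #!/usr/bin/perl
    # Convert Illumina qseq (tab-separated) records to FASTQ.
    
    use warnings;
    use strict;
    
    while (<>) {
    	chomp;
    	# qseq columns: 0 machine, 1 run, 2 lane, 3 tile, 4 x, 5 y, 6 index, 7 read number, 8 sequence, 9 quality
    	my @parts = split /\t/;
    	# Header line: machine:lane:tile:x:y#index/read
    	print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
    	print "$parts[8]\n";
    	# The '+' line repeats the header; the quality string follows
    	print "+","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
    	print "$parts[9]\n";
    }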
    Greetings Xi Wang,

    I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

    $ perl qseq2fastq.pl sequence.fastq > test.fastq

    However at each attempt, I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance as I have only very limited knowledge of perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

    Many thanks!



    • #32
      Originally posted by labrat73 View Post
      Greetings Xi Wang,

      I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

      $ perl qseq2fastq.pl sequence.fastq > test.fastq

      However at each attempt, I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance as I have only very limited knowledge of perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

      Many thanks!
      You are trying to convert FASTQ to FASTQ, which is not what the script is for. The script above converts qseq format to FASTQ.
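
A quick way to tell the two formats apart before running a converter is to look at the first line. The sketch below is in Python rather than the Perl used earlier in the thread, and `guess_format` is a hypothetical helper name. It assumes the usual qseq layout of 11 tab-separated columns and a FASTQ header that starts with '@'; it is a rough heuristic, not a validator.

```python
def guess_format(line: str) -> str:
    """Guess whether the first line of a file looks like qseq or FASTQ."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: a qseq line has 11 tab-separated columns
    if len(fields) == 11:
        return "qseq"
    # A FASTQ record starts with an '@' header line
    if line.startswith("@"):
        return "fastq"
    return "unknown"


# Made-up qseq line for illustration only
qseq_line = "\t".join(
    ["M1", "1", "6", "1", "1063", "16736", "0", "1", "GCGTA", "DDDBD", "1"]
)
print(guess_format(qseq_line))
print(guess_format("@SRR101483.1 SCS_0014:6:1:1063:16736/1"))
```

If the first line of the input is reported as FASTQ, the qseq converter has nothing to split on tabs, which would explain the uninitialized-value warnings.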



      • #33
        Originally posted by sklages View Post
        You are trying to convert FASTQ to FASTQ, which is not what the script is for. The script above converts qseq format to FASTQ.
        sklages-

        thanks so much for your reply. i'm a bit confused because my file has the fastq extension and it looks like this:

        @SRR101483.1 SCS_0014:6:1:1063:16736/1
        GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
        +
        DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C:>5?==@A@

        when i try to run it, though, i keep getting an error. i compared it to other files that i've run and that's when i noticed that in other files, the title name appears again after the "+", immediately before the base qualities. i'm trying to convert or edit this file so that it looks like this:

        @SRR101483.1 SCS_0014:6:1:1063:16736/1
        GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
        +SRR101483.1 SCS_0014:6:1:1063:16736/1
        DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C:>5?==@A@

        i hope this makes sense and appreciate any advice you could offer.

        best-

        labrat73



        • #34
          Use [ code ] and [ /code ] tags to prevent the forum messing up the display of examples.

          Your file is already in FASTQ format, just without the redundant, optional repeated identifier on the plus lines. You don't need to make that change.

          As sklages said earlier, the script this thread is about converts the Illumina qseq format into FASTQ.
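
If some downstream tool does insist on the repeated identifier, a minimal sketch of that edit could look like the following. It is in Python rather than Perl, `repeat_header_on_plus` is a hypothetical helper name, and it assumes strict four-line records (no wrapped sequences). The record used here is shortened for illustration.

```python
def repeat_header_on_plus(lines):
    """Yield FASTQ lines with the '@' identifier copied onto each '+' line.

    Assumes every record is exactly four lines: header, sequence, '+', quality.
    """
    header = ""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        position = i % 4
        if position == 0:
            # Remember the identifier without the leading '@'
            header = line[1:]
            yield line
        elif position == 2:
            # Replace whatever was on the '+' line with '+' plus the identifier
            yield "+" + header
        else:
            yield line


record = [
    "@SRR101483.1 SCS_0014:6:1:1063:16736/1",
    "GCGTA",
    "+",
    "DDDBD",
]
for out_line in repeat_header_on_plus(record):
    print(out_line)
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new file.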



          • #35
            fastq validator

            has anyone tried using this to test?

            I have a very similar problem here where my .txt is in this format
            where there is no line break after the '+'... however this is still in fastq format because the '+' line is optional... however some people here were still getting errors in the format i have posted below

            has anyone used http://genome.sph.umich.edu/wiki/FastQValidator ?


            @HWI-ST604_0134:4:1101:1391:1882#0/1
            NATAGTGCTTTAGCATCATATCTAAGGCTGTTCGTCCTACATTGTTGAGGAAACAACTATGACCTCCCTTGGGTCGGTTGCTATGCAA AGCAATGCTAACA
            +HWI-ST604_0134:4:1101:1391:1882#0/1
            BUXRMZ[Z[[cccccccccccccccccccccccccccccc\cccccccccc_cccUYcccccccaccUYccccc_ccc__a\cac\_V __^X^^^\^^[^\
            @HWI-ST604_0134:4:1101:1493:1886#0/1
            NTAGATAATGATGCCACTGTTACAACTCTGTGCTTTGGGGTACCTAACAAGTCTCCCTCAGTGCCTCTCTGATTTGTAGCTAGTCAAT AGAATGAATAAAG
            +HWI-ST604_0134:4:1101:1493:1886#0/1
            BUXYX[[Z[[cccccc_cccccccc_ccccccccccc\ccZ____ccc_ccccccccccc[____ccccc_[cc_c_ccc_c_c_cc_ \_BBBBBBBBBBB



            • #36
              Originally posted by arcolombo698 View Post
              has anyone tried using this to test?

              I have a very similar problem here where my .txt is in this format
              where there is no line break after the '+'... however this is still in fastq format because the '+' line is optional... however some people here were still getting errors in the format i have posted below

              has anyone used http://genome.sph.umich.edu/wiki/FastQValidator ?


              @HWI-ST604_0134:4:1101:1391:1882#0/1
              NATAGTGCTTTAGCATCATATCTAAGGCTGTTCGTCCTACATTGTTGAGGAAACAACTATGACCTCCCTTGGGTCGGTTGCTATGCAA AGCAATGCTAACA
              +HWI-ST604_0134:4:1101:1391:1882#0/1
              BUXRMZ[Z[[cccccccccccccccccccccccccccccc\cccccccccc_cccUYcccccccaccUYccccc_ccc__a\cac\_V __^X^^^\^^[^\
              @HWI-ST604_0134:4:1101:1493:1886#0/1
              NTAGATAATGATGCCACTGTTACAACTCTGTGCTTTGGGGTACCTAACAAGTCTCCCTCAGTGCCTCTCTGATTTGTAGCTAGTCAAT AGAATGAATAAAG
              +HWI-ST604_0134:4:1101:1493:1886#0/1
              BUXYX[[Z[[cccccc_cccccccc_ccccccccccc\ccZ____ccc_ccccccccccc[____ccccc_[cc_c_ccc_c_c_cc_ \_BBBBBBBBBBB
              I don't get it. There is a "linebreak" (newline) after your '+' line. So this is normal fastq format.

              Btw, the '+' line is *not* optional, its content is! There must always be at least the '+' sign as header for the quality line. But it is optional to write any information after that (in the same line).
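
The rule above can be sketched as a small per-record check, written here in Python. `check_record` is a hypothetical helper, not part of FastQValidator. It assumes four-line records and, following the usual description of FASTQ, that any text after the '+' must repeat the header.

```python
def check_record(header, seq, plus, qual):
    """Return a list of problems found in one four-line FASTQ record."""
    problems = []
    if not header.startswith("@"):
        problems.append("header does not start with '@'")
    if not plus.startswith("+"):
        # The '+' itself is mandatory
        problems.append("third line does not start with '+'")
    elif len(plus) > 1 and plus[1:] != header[1:]:
        # Content after '+' is optional, but assumed to repeat the header
        problems.append("text after '+' does not match the header")
    if len(seq) != len(qual):
        problems.append("sequence and quality lengths differ")
    return problems


# A bare '+' and a '+' with the repeated identifier are both accepted
print(check_record("@read1", "ACGT", "+", "IIII"))
print(check_record("@read1", "ACGT", "+read1", "IIII"))
# A missing '+' is reported
print(check_record("@read1", "ACGT", "", "IIII"))
```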



              • #37
                The problem I see is that the bases and qualities both have spaces in them, but otherwise it looks fine.
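
To check whether the spaces are really in the file and not just in the forum post, something like this Python sketch could be run over it. `lines_with_spaces` is a hypothetical helper and assumes four-line records; the record below is a shortened, made-up example.

```python
def lines_with_spaces(lines):
    """Return 1-based numbers of sequence/quality lines containing spaces or tabs."""
    hits = []
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        # In a four-line record, positions 1 and 3 are sequence and quality
        if i % 4 in (1, 3) and (" " in text or "\t" in text):
            hits.append(i + 1)
    return hits


record = ["@read1", "NATAG TGC", "+read1", "BUXRM Z[Z"]
print(lines_with_spaces(record))
```

An empty list would suggest the spaces came from copy and paste, not from the file itself.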



                • #38
                  Originally posted by Brian Bushnell View Post
                  The problem I see is that the bases and qualities both have spaces in them, but otherwise it looks fine.
                  You're right, maybe a copy&paste issue ..?
