  • Prinseq not accepting my FASTQ format?

    Hi,
    I am trying to use PRINSEQ to examine the quality of an Illumina dataset. It is in FASTQ format, 100bp SE. Strangely, when I try to open it and examine some basic stats, PRINSEQ won't open it... It says it is not in the expected FASTQ format. Even more strangely, PRINSEQ won't open the example FASTQ dataset provided with the release, and throws the same error...

    Has anyone else had similar issues? I must be missing something. The format of the FASTQ file seems normal to me and is as follows:

    @HWI-ST1085:16629UYACXX:8:1101:1064:1845 1:Y:0:
    NTGANACCTGCACGGGCGAGGTGACGGGCGCCGGCGGCGGCGACGCGTGCGGCAGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTA
    +
    #0;;#4@@@@><@@@@@@??????????<????=<<:::7:::::::::::::::<<<<<<:<<:<<<:9::<===<<<<<==<=95::66:9:9<=<<2
    @HWI-ST1085:16629UYACXX:8:1101:1185:1845 1:N:0:
    NTACGGACAGCGTGCTGTCGCCGGGCCTGTGGGTGTCCCATGGTGGCAGAGATCGGAAAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTC
    +
    #1=DDFFFFHFHDIJJJIJJJJJJJJJJJIIJJGGIIJJIJHHHGGFFFDDDECDD@BBBABDBB@DDDDDDDDCCCDB>@@9<>BBDDDDBBDCCCBDB
    @HWI-ST1085:16629UYACXX:8:1101:1231:1845 1:N:0:
    **Forgot to add the error I get when I try to run the prinseq-lite.pl script:
    Can't find string terminator "'" anywhere before EOF at -e line 1
    ???
    Last edited by lplough81; 08-13-2013, 08:29 AM. Reason: new information
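A quick way to rule out a structurally broken file is to check the four-line record layout directly. The sketch below is a hypothetical stand-alone check (it is not prinseq's own format detection, and the function name is made up): each record should be a header starting with '@', a sequence line, a separator starting with '+', and a quality line as long as the sequence. The pasted sample above ends on a lone header line; that is probably just where the paste was cut, but a file truncated the same way would fail this check.

```python
from typing import List


def check_fastq_text(text: str) -> List[str]:
    """Return a list of structural problems in FASTQ text (empty list = looks OK).

    Assumes plain 4-line records (no line-wrapped sequences). splitlines()
    accepts both LF and CRLF endings, so this checks layout, not line endings.
    """
    lines = text.splitlines()
    problems = []
    if len(lines) % 4 != 0:
        problems.append(
            f"{len(lines)} lines is not a multiple of 4 (truncated last record?)"
        )
    complete = len(lines) - len(lines) % 4  # ignore a trailing partial record
    for start in range(0, complete, 4):
        header, seq, sep, qual = lines[start:start + 4]
        rec = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {rec}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {rec}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(
                f"record {rec}: sequence length {len(seq)} != quality length {len(qual)}"
            )
    return problems


# Usage sketch (the file name is hypothetical; for a big file, pass only
# the first few thousand lines instead of the whole text):
# with open("reads.fastq") as fh:
#     for problem in check_fastq_text(fh.read()):
#         print(problem)
```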

  • #2
    I think the problem arises from trying to convert line endings and pipe this to open, which is failing. I would expect this to produce another message, as well. Is that the only error message you see when you run prinseq?

    Also, did you edit or save this file on a PC? That may be related.
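Files saved on Windows usually end their lines with CR+LF, while Unix tools expect a bare LF. The "Can't find string terminator "'"" message also fits a Windows cause: it is what Perl typically prints when a one-liner written with single quotes (perl -e '...') is run from the Windows command prompt, which does not treat single quotes as quoting characters. That would be consistent with the line-ending conversion step suggested above being what fails, though that is an inference, not something checked against the prinseq source. As a minimal sketch (hypothetical helper names), this classifies the line endings in a file's raw bytes so you can see which kind you actually have:

```python
def detect_line_endings(data: bytes) -> str:
    """Classify line endings in raw bytes as 'unix', 'dos', 'mixed' or 'none'.

    Old Mac-style bare CR endings are not considered.
    """
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # every CRLF also contains one LF
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "dos"
    if lf_only:
        return "unix"
    return "none"


def sniff_file(path: str, n_bytes: int = 1 << 20) -> str:
    """Check only the first n_bytes (1 MiB by default) of a possibly huge file."""
    with open(path, "rb") as fh:  # binary mode: no newline translation
        return detect_line_endings(fh.read(n_bytes))
```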

    • #3
      RE:

      Hi,
      Thanks for the reply. I was indeed attempting to use prinseq on a PC; I am moving and only have access to my PC laptop. But I thought I had run prinseq on a PC before... perhaps it was just the web version.

      The only error is the one listed above, which is followed by the unrecognized-FASTQ error I described:
      ERROR: input file for -fastq is in UNKNOWN format not in fastq format
      Perhaps it works better on Linux, but it is simply a Perl script with few dependencies...

      • #4
        I have not tried prinseq on a PC, but it looks like that is related to the issues you are experiencing (line endings not being recognized). As for the format error, I would need to know which version you are using to figure out whether there really is something wrong with the format.

        • #5
          In this case the sequence file may indeed be in "unix" format, which may be causing the error.

          If you only have access to a PC now try using this utility to convert the file to PC format to see if it works: http://www.efgh.com/software/unix2dos.htm
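If a converter like unix2dos is not at hand, the conversion is easy to do yourself. This is a minimal sketch with hypothetical function names, not part of prinseq or unix2dos; which direction you need depends on where the script runs (LF for Unix, CRLF for Windows), and it rewrites every line ending in the file.

```python
_EOL = {"unix": b"\n", "dos": b"\r\n"}


def _eol_for(target: str) -> bytes:
    if target not in _EOL:
        raise ValueError(f"unknown target {target!r}; expected 'unix' or 'dos'")
    return _EOL[target]


def convert_line_endings(data: bytes, target: str = "unix") -> bytes:
    """Normalise every line ending in data to LF ('unix') or CRLF ('dos')."""
    eol = _eol_for(target)
    return data.replace(b"\r\n", b"\n").replace(b"\n", eol)


def convert_file(src: str, dst: str, target: str = "unix") -> None:
    """Stream src to dst line by line, so large FASTQ files need not fit in memory."""
    eol = _eol_for(target)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:  # binary iteration splits on b"\n"
            # Strip any existing CR/LF and append the requested ending. A last
            # line without a newline gains one, which is harmless for FASTQ.
            fout.write(line.rstrip(b"\r\n") + eol)
```

Writing to a new file rather than in place keeps the original intact if something goes wrong.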

          • #6
            Unfortunately, the unix2dos conversion didn't change anything when running on my PC. I was able to run the prinseq-lite.pl script via remote access to a Unix-based server (which suggests that it is a DOS/Unix format issue), but the prinseq-graphs.pl script, which generates the graphics, now won't run.

            The error is
            Can't locate JSON.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at prinseq-graphs.pl line 32.
            BEGIN failed--compilation aborted at prinseq-graphs.pl line 32.
            I installed JSON through CPAN, yet the error persists. Perhaps I will move on to other QC programs. In my hands, prinseq has been difficult and may not be worth the effort.

            • #7
              You will find FastQC easier to use, and it can run on your local PC too. http://www.bioinformatics.babraham.a...ojects/fastqc/

              • #8
                Originally posted by lplough81
                I installed JSON through CPAN, yet the error persists. Perhaps I will move on to other QC programs. In my hands, prinseq has been difficult and may not be worth the effort.
                That error message ("Can't locate JSON in ...") says you did not install JSON. What did you do to install it? Run the command:

                Code:
                perl -MJSON -e 1
                If you see that message again, it definitely is not installed. If you see nothing, it is installed (unlikely, given that prinseq-graphs.pl can't load it).
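The same check can be scripted, e.g. to test a specific perl binary. A minimal sketch with hypothetical function names: it runs the command from this post, passed as an argument list so no shell quoting is involved. One possible cause worth ruling out (an assumption, not something established in this thread) is that prinseq-graphs.pl runs under a different perl than the one CPAN installed into; the perl parameter lets you point the check at a particular interpreter.

```python
import subprocess
from typing import Callable, List


def perl_module_check_cmd(module: str, perl: str = "perl") -> List[str]:
    """Build the check from the post: load the module, then run an empty program."""
    return [perl, f"-M{module}", "-e", "1"]


def perl_module_installed(
    module: str, perl: str = "perl", run: Callable = subprocess.run
) -> bool:
    """True if `perl -M<module> -e 1` exits with status 0.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    try:
        result = run(perl_module_check_cmd(module, perl), capture_output=True, text=True)
    except FileNotFoundError:  # the perl executable itself was not found
        return False
    return result.returncode == 0


# Usage sketch:
# print("JSON", "OK" if perl_module_installed("JSON") else "MISSING")
```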

                Note that the issues you are having, a Unix/PC line-ending problem and some missing dependencies, are really quite simple issues, and these are things people who do computer work deal with every day. I encourage you to be a bit more patient and not get discouraged and give up on this program because of these things alone. That being said, try other programs and pick what works best for you. For example, FastQC (as mentioned above) may be more intuitive.
                Last edited by SES; 08-14-2013, 12:13 PM.

                • #9
                  The only reason I see to use prinseq is if you need a low-complexity filter (i.e. DUST). FastQC uses Java, which would likely make it easier to run.
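For context, DUST-style filters score a read by how often its 3-mers (triplets) repeat. This is a simplified whole-read version of that triplet score, written as a hypothetical sketch and not as prinseq's implementation (prinseq's windowing, scaling and default threshold are not reproduced). For l overlapping triplets where triplet t occurs c_t times, the score is the sum of c_t(c_t - 1)/2 divided by (l - 1), so a homopolymer scores high and a read with no repeated triplet scores 0.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Simplified DUST-style score over a whole read (higher = lower complexity).

    score = sum_t c_t * (c_t - 1) / 2 / (n - 1), where c_t is the count of
    triplet t and n is the number of overlapping triplets in the read.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    n_triplets = len(triplets)
    if n_triplets < 2:
        return 0.0  # too short to score
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n_triplets - 1)


print(dust_score("AAAAAAAAAA"))  # 8 identical triplets: 28 / 7 = 4.0
print(dust_score("ACGTTGCA"))    # no repeated triplet: 0.0
```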

                  • #10
                    Originally posted by jwag
                    The only reason I see to use prinseq is if you need a low-complexity filter (i.e. DUST). FastQC uses Java, which would likely make it easier to run.
                    FastQC is great, but it is just for plotting; prinseq will actually do various types of trimming. The plots and statistics reported by the two programs are also different, so I would not say to just use one or the other. Of course, if one program does everything you need, then there's really no need to use another method.
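As an illustration of the kind of trimming meant here, a minimal hypothetical sketch (not prinseq's code) of 3'-end quality trimming: bases are removed from the right end while their Phred score is below a cutoff. It assumes Phred+33 quality encoding, which the sample quality strings in the first post appear to use (the '#' character, Q2 in Phred+33, falls below any +64 range); older Illumina pipelines used an offset of 64, so the offset is a parameter.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Trim bases from the 3' end while their Phred quality is below min_q.

    offset=33 assumes Phred+33 (Sanger / Illumina 1.8+) quality characters.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3prime("ACGTAC", "IIII##"))  # ('ACGT', 'IIII'): '#' is Q2, 'I' is Q40
```

Only the trailing run of low-quality bases is removed; a low-quality base in the middle of the read is kept.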

                    • #11
                      Originally posted by SES
                      FastQC is great, but it is just for plotting; prinseq will actually do various types of trimming. The plots and statistics reported by the two programs are also different, so I would not say to just use one or the other. Of course, if one program does everything you need, then there's really no need to use another method.
                      I've used Prinseq for trimming, but it doesn't maintain paired read order, so I stopped using it.

                      I have not used Prinseq for statistics (I've only used FastQC) so I couldn't comment on any differences in that respect.

                      • #12
                        Originally posted by jwag
                        I've used Prinseq for trimming, but it doesn't maintain paired read order, so I stopped using it.
                        Yeah, that is a major pain. I ended up writing a program to re-pair the files after trimming because we were doing this very often. I think the latest prinseq actually has an option to maintain the pair order (according to a colleague), but I haven't actually tested this feature so I can't confirm it.
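The re-pairing program mentioned here isn't shown in the thread, so this is only a minimal hypothetical sketch of the idea: after trimming, match mates by read ID, keep pairs in R1 order, and set aside reads whose mate was discarded. It assumes plain 4-line FASTQ, unique read IDs, and IDs that either match in their first whitespace-separated token (as in the CASAVA 1.8-style headers in the first post) or differ only by a /1 and /2 suffix. It holds all R2 records in memory.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into 4-line records; a trailing partial record is dropped."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\r\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []


def read_id(header: str) -> str:
    """Pairing key: first token after '@', minus any trailing /1 or /2."""
    parts = header[1:].split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def repair_pairs(r1: Iterable[Record], r2: Iterable[Record]):
    """Return (pairs, singles_r1, singles_r2); pairs follow the R1 file order."""
    mates: Dict[str, Record] = {read_id(rec[0]): rec for rec in r2}
    pairs: List[Tuple[Record, Record]] = []
    singles_r1: List[Record] = []
    for rec in r1:
        mate = mates.pop(read_id(rec[0]), None)
        if mate is None:
            singles_r1.append(rec)
        else:
            pairs.append((rec, mate))
    singles_r2 = list(mates.values())  # R2 reads whose mate was trimmed away
    return pairs, singles_r1, singles_r2


# Usage sketch (file names are hypothetical):
# with open("trimmed_1.fastq") as f1, open("trimmed_2.fastq") as f2:
#     pairs, s1, s2 = repair_pairs(read_fastq(f1), read_fastq(f2))
```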

                        • #13
                          Originally posted by SES
                          Yeah, that is a major pain. I ended up writing a program to re-pair the files after trimming because we were doing this very often. I think the latest prinseq actually has an option to maintain the pair order (according to a colleague), but I haven't actually tested this feature so I can't confirm it.
                          I just checked the prinseq website and it looks like it does indeed support paired output now. I might start using it again, because I like how it lets you choose the number of N's to leave in a data set.
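The N-limit idea is simple to illustrate. A hypothetical sketch (not prinseq's own options or code) that keeps a read only if its number of N bases, or their percentage of the read length, is within a chosen limit; the example read is the first 10 bases of the first sequence in the original post.

```python
from typing import Optional


def passes_n_filter(
    seq: str, max_n: Optional[int] = None, max_pct: Optional[float] = None
) -> bool:
    """Keep a read only if its N count (and/or N percentage) is within the limits.

    Either limit may be None to disable it.
    """
    n = seq.upper().count("N")
    if max_n is not None and n > max_n:
        return False
    if max_pct is not None and seq and 100.0 * n / len(seq) > max_pct:
        return False
    return True


read = "NTGANACCTG"  # 2 Ns out of 10 bases
print(passes_n_filter(read, max_n=1))     # False
print(passes_n_filter(read, max_n=2))     # True
print(passes_n_filter(read, max_pct=10))  # False (20% N)
```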
