
  • #16
    This message appeared:

    You need to install the perl-doc package to use this program.

    Then I ran the command:

    sudo aptitude install perl-doc

    That worked, thanks.

    Comment


    • #17
      How to bring .csfasta and .qual in the same order?

      I have the same problem as jeferson (see below),
      so how can the reads in .csfasta files be ordered in the same way as in the .qual files?

      Originally posted by jeferson View Post
      when I run the script solid2fastq with the following command:

      $ bfast-0.6.1c/scripts/solid2fastq n-10000000 reads barcode2/barcode2_F3.csfasta barcode2/barcode2_F3.qual

      This error message appeared:

      Outputting, currently on:
      0csfasta_name = [> 9_42_916_F3]
      qual_name = [> 9_42_20_F3]
      ************************************************************
      In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.
      Message: Read names did not match.
      ***** Exiting due to errors *****
      ************************************************************

      What could be causing this?
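One possible way to bring the two files into the same order is sketched below. It is only an illustration, not part of bfast: the helper names `read_records` and `reorder_qual` are made up, and it assumes each read is one `>name` line followed by one data line, with optional `#` comment lines.

```python
def read_records(lines):
    """Yield (name, body) pairs from CSFASTA/QUAL lines, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def reorder_qual(csfasta_lines, qual_lines):
    """Yield QUAL records in the order the read names appear in the CSFASTA.

    Assumption: the QUAL records fit in memory. Reads that are missing
    from the QUAL input are skipped rather than reported.
    """
    quals = dict(read_records(qual_lines))
    for name, _ in read_records(csfasta_lines):
        if name in quals:
            yield name, quals[name]
```

For files with tens of millions of reads the in-memory dictionary may be too large, so this is better suited to small test subsets. If the names differ because one file contains reads the other lacks, reordering alone will not fix the mismatch.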

      Comment


      • #18
        Hi guys,
        I know it's an old thread, but it's still the appropriate one.

        I get this error while processing SOLID data:

        Outputting, currently on:
        108300000read->name=[>1272^300_899^F3]
        qual_name=[>1272_300_899_F3]
        ************************************************************
        In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.
        Message: Read names did not match.
        ***** Exiting due to errors *****
        ************************************************************

        I did not do anything to the files before that, though. So I suppose I have to write a script that just replaces the '^' with '_'. That's OK, but do these different symbols mean different things? Or is it just an error? I can't imagine how that could happen at all, as the reads come directly from the SOLID machine.

        Comment


        • #19
          Hi again,
          I found some more of those weird spelling errors:

          Outputting, currently on:
          83900000read->name=[>1171_196?_1958_F5-P2]
          qual_name=[>1171_1967_1958_F5-P2]

          What is going on at all?

          Comment


          • #20
            Originally posted by kenietz View Post
            Hi again,
            I found some more of those weird spelling errors:

            Outputting, currently on:
            83900000read->name=[>1171_196?_1958_F5-P2]
            qual_name=[>1171_1967_1958_F5-P2]

            What is going on at all?
            Make sure that the CSFASTA/QUAL files have the same number of lines and that the read names are in the same order.

            Nils
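A minimal sketch of that check: walk the header lines of both files in parallel and report the first place where they disagree. The function names are hypothetical, and the sketch assumes header lines start with `>`.

```python
from itertools import zip_longest


def headers(lines):
    """Yield the '>' header lines, stripped of surrounding whitespace."""
    for line in lines:
        if line.startswith(">"):
            yield line.strip()


def first_mismatch(csfasta_lines, qual_lines):
    """Return (entry_number, csfasta_name, qual_name) for the first
    disagreement, or None if both files list the same names in order.

    A name of None means that file ran out of entries first.
    """
    pairs = zip_longest(headers(csfasta_lines), headers(qual_lines))
    for i, (cs_name, q_name) in enumerate(pairs, start=1):
        if cs_name != q_name:
            return i, cs_name, q_name
    return None
```

Both arguments can be open file objects, so the files are streamed rather than loaded into memory. Running it twice on the same files and getting different answers would point at the hardware rather than at the data.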

            Comment


            • #21
              @Nils: I think kenietz is doing exactly that, i.e., trying to make sure that the files have the same read names. However, the names are popping up with weird characters in them. This could be a symptom of a deeper problem: data corruption of the files, such as bit-flipping due to bad disks, bad memory, bad data transfer, etc.

              One item to check is whether the sequence data itself (and not just the names) has corruption problems, e.g., something besides the expected 0, 1, 2, 3 and T (if that is your initial base). I hope I am wrong about data corruption being the problem, because this would be nasty to fix, but it is something to be checked. md5sums of the original data and the working data could also be in order.
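Both checks could be sketched roughly as follows. The names `SEQ_RE`, `bad_sequence_lines` and `md5_of` are made up for illustration, and the pattern is an assumption: one leading base letter, then color calls 0 to 3, with `.` accepted as a missing call.

```python
import hashlib
import re

# Assumption: a read is one base letter followed by color calls 0-3;
# '.' is accepted as a missing color call.
SEQ_RE = re.compile(r"^[ACGT][0-3.]+$")


def bad_sequence_lines(lines):
    """Return (line_number, text) for sequence lines with unexpected characters."""
    bad = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip()
        # Skip blank lines, '#' comments and '>' read-name lines.
        if not line or line.startswith(("#", ">")):
            continue
        if not SEQ_RE.match(line):
            bad.append((n, line))
    return bad


def md5_of(path, chunk=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing `md5_of` for the copy on the instrument and the working copy shows whether the transfer changed the file. If the digest of the same local file changes between runs, the local disk or memory is the more likely suspect.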

              Comment


              • #22
                Originally posted by westerman View Post
                @Nils: I think kenietz is doing exactly that, i.e., trying to make sure that the files have the same read names. However, the names are popping up with weird characters in them. This could be a symptom of a deeper problem: data corruption of the files, such as bit-flipping due to bad disks, bad memory, bad data transfer, etc.

                One item to check is whether the sequence data itself (and not just the names) has corruption problems, e.g., something besides the expected 0, 1, 2, 3 and T (if that is your initial base). I hope I am wrong about data corruption being the problem, because this would be nasty to fix, but it is something to be checked. md5sums of the original data and the working data could also be in order.
                As a temporary fix, you could replace offending characters with a "Z" symbol.

                Comment


                • #23
                  Hi,
                  Of course I checked the number of lines etc.
                  As westerman said, and I was having the same thoughts: bad transfer, bad disk or bad memory. But I have no idea how to check for these problems. Is it also possible that 'solid2fastq' (the C variant) has some weird bug? I am not sure.
                  But for example, just now I got this error:

                  85500000read->name=[>308_1887_1544_F5-P2T21112303032103:132120201210102:2002]
                  qual_name=[>308_1887_1544_F5-P2]
                  ************************************************************
                  In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.

                  So because I'm not sure what's going on, I ran the following command, which gave no result, which means that the CSFASTA should be correct:

                  grep -m 1 -P -n '>308_1887_1544_F5-P2T' sl0453_20120208_PE_DUKE_NUS_SURESELECT_4gDNA_Kato_lll_F5-P2.csfasta

                  Then I ran solid2fastq again on the same file and got an error at the same read number, but a totally different error. Amazing:

                  85500000read->name=[>308_1887_9551_F5-X2]
                  qual_name=[>308_1887_1551_F5-P2]
                  ************************************************************
                  In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.

                  So I'm baffled. Is it my PC or solid2fastq that is playing games with me?

                  Any kind of help is appreciated.
                  Thank you in advance

                  Comment


                  • #24
                    Hi again,
                    Another problem could be the software on the SOLID machine. There was a power failure recently and they had to re-run the experiment. After that I took the resulting csfasta and qual files twice, and each time I had errors in different sets. Then I had them redo the primary analysis and took the result for the third time, and still got errors.

                    The person from the support team suggested that solid2fastq could possibly meddle with the input files, but I highly doubt that.

                    My feeling is that the PC on the machine is having trouble of some kind.

                    Update on the case shown in my previous post: it turned out that the qual file has 6 more entries than the csfasta. But that is in the files which I took after the primary re-analysis. In the files which I took the second time, the number of entries is the same.

                    Confusing and frustrating.

                    Comment


                    • #25
                      Hi again,
                      sorry for the spam

                      Good read names are like this:
                      >1272_300_1473_F3
                      >1171_196_1958_F5-P2

                      So I made a Perl script which checks every line starting with '>' against the following patterns, depending on the file type I am checking for errors:

                      my $patF3='\d+_\d+_\d+_F3';
                      my $patF5='\d+_\d+_\d+_F5-P2';

                      It's SOLID PE data with F3 and F5-P2 reads. The F3 file finished with no errors, while the F5-P2 file had 37 errors.

                      So I suppose it's some sort of error on the SOLID machine.
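For readers who prefer Python, the same kind of header check might look like the sketch below. It is not the Perl script from this post: the name `bad_headers` is hypothetical, and requiring the whole header line to match (so that trailing garbage after the tag is also caught) is an assumption of this sketch.

```python
import re

# The two patterns quoted above, with the leading '>' added.
PATTERNS = {
    "F3": re.compile(r">\d+_\d+_\d+_F3"),
    "F5-P2": re.compile(r">\d+_\d+_\d+_F5-P2"),
}


def bad_headers(lines, tag):
    """Return (line_number, header) for '>' lines that do not fully match
    the pattern for the given read tag ('F3' or 'F5-P2')."""
    pattern = PATTERNS[tag]
    return [
        (n, line.rstrip())
        for n, line in enumerate(lines, start=1)
        if line.startswith(">") and not pattern.fullmatch(line.rstrip())
    ]
```

Applied to the names reported earlier in the thread, this flags `>1171_196?_1958_F5-P2`, `>308_1887_9551_F5-X2` and the header with sequence characters appended to it, while `>1171_196_1958_F5-P2` passes. A header whose `>` itself was corrupted would not be examined by this check.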
                      Last edited by kenietz; 03-01-2012, 12:45 AM.

                      Comment
