**jeferson** · 02-06-2010, 12:41 PM

The following message appeared:

You need to install the perl-doc package to use this program.

I then ran the command:

sudo aptitude install perl-doc

That worked, thanks.

**ale** · 03-25-2011, 08:12 AM

How can I bring .csfasta and .qval files into the same order?

I have the same problem as jeferson (see below).
How can the reads in the .csfasta file be put in the same order as in the .qval file?

Originally posted by jeferson

When I run the script solid2fastq with the following command:

$ bfast-0.6.1c/scripts/solid2fastq -n 10000000 reads barcode2/barcode2_F3.csfasta barcode2/barcode2_F3.qual

This error message appeared:

Outputting, currently on:
0csfasta_name = [>9_42_916_F3]
qual_name = [>9_42_20_F3]
************************************************************
In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.
Message: Read names did not match.
***** Exiting due to errors *****
************************************************************

What could be causing this?

**kenietz** · 02-21-2012, 10:43 PM

Hi guys,
I know it's an old thread, but it's still the appropriate one.

I get this error while processing SOLID data:

Outputting, currently on:
108300000read->name=[>1272^300_899^F3]
qual_name=[>1272_300_899_F3]
************************************************************
In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.
Message: Read names did not match.
***** Exiting due to errors *****
************************************************************

I did not do anything to the files before that, though. So I suppose I have to write a script that just replaces the '^' with '_'. That's OK, but do these different symbols mean different things? Or is it just an error? I can't imagine how it might happen at all, as the reads come directly from the SOLID machine.

**kenietz** · 02-21-2012, 11:15 PM

Hi again,
I found some more of those weird spelling errors:

Outputting, currently on:
83900000read->name=[>1171_196?_1958_F5-P2]
qual_name=[>1171_1967_1958_F5-P2]

What is going on at all?

**nilshomer** · 02-23-2012, 07:35 AM

Originally posted by kenietz

Hi again,
I found some more of those weird spelling errors:

Outputting, currently on:
83900000read->name=[>1171_196?_1958_F5-P2]
qual_name=[>1171_1967_1958_F5-P2]

What is going on at all?

Make sure that the CSFASTA/QUAL files have the same number of lines and that the read names are in the same order.

Nils
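The check Nils describes can be scripted. The following is only a minimal sketch in Python, not part of BFAST: the function names `read_names` and `first_mismatch` are made up for illustration, and it assumes that every read name sits on its own line starting with '>' in both files.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the header lines (those starting with '>') of a csfasta or qual file."""
    # errors="replace" keeps the scan going even if corrupted bytes are present.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line.rstrip("\r\n")


def first_mismatch(csfasta_path, qual_path):
    """Return (entry_index, csfasta_name, qual_name) for the first disagreement.

    Returns None when both files list the same names in the same order.
    A missing name (one file is shorter) is reported as None.
    """
    pairs = zip_longest(read_names(csfasta_path), read_names(qual_path))
    for index, (cs_name, qual_name) in enumerate(pairs, start=1):
        if cs_name != qual_name:
            return index, cs_name, qual_name
    return None
```

Calling `first_mismatch("reads_F3.csfasta", "reads_F3.qual")` (hypothetical file names) streams both files once, so it should cope with large runs. A result where one of the two names is None means one file has more entries than the other.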

**westerman** · 02-23-2012, 07:50 AM

@Nils: I think kenietz is doing exactly that, i.e., trying to make sure that the files have the same read names. However, the names are popping up with weird characters in them. This could be a symptom of a deeper problem: data corruption of the files, such as bit-flipping due to bad disks, bad memory, bad data transfer, etc.

One item to check is whether the sequence data itself (and not just the names) has corruption problems, e.g., something besides the expected 0, 1, 2, 3 and T (if that is your initial base). I hope I am wrong about data corruption being the problem, because it would be nasty to fix, but it should be checked. Comparing md5sums of the original data and the working data could also be in order.
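A scan of the sequence lines for unexpected characters could look roughly like this. It is a sketch under stated assumptions, not a validated SOLiD parser: the name `suspicious_lines` is invented here, and the pattern assumes each sequence line is a single leading base (A, C, G or T) followed by color calls 0-3, with '.' tolerated as a missing call. Lines starting with '#' are treated as comments.

```python
import re

# Assumption: one primer base, then color calls 0-3; '.' is accepted as a no-call.
SEQUENCE_LINE = re.compile(r"[ACGT][0-3.]+")


def suspicious_lines(path):
    """Yield (line_number, text) for sequence lines that do not look like color space."""
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            # Skip blank lines, '#' comment lines and '>' read-name lines.
            if not text or text.startswith(("#", ">")):
                continue
            if not SEQUENCE_LINE.fullmatch(text):
                yield number, text
```

If the list of hits is empty, the corruption (if any) would seem to be limited to the read names. Any hit gives a line number that can be compared against the original copy of the file.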

**nilshomer** · 02-23-2012, 08:31 AM

Originally posted by westerman

@Nils: I think kenietz is doing exactly that, i.e., trying to make sure that the files have the same read names. However, the names are popping up with weird characters in them. This could be a symptom of a deeper problem: data corruption of the files, such as bit-flipping due to bad disks, bad memory, bad data transfer, etc.

One item to check is whether the sequence data itself (and not just the names) has corruption problems, e.g., something besides the expected 0, 1, 2, 3 and T (if that is your initial base). I hope I am wrong about data corruption being the problem, because it would be nasty to fix, but it should be checked. Comparing md5sums of the original data and the working data could also be in order.

As a temporary fix, you could replace offending characters with a "Z" symbol.

**kenietz** · 02-29-2012, 10:46 PM

Hi,
of course I checked the number of lines etc.
Like Westerman, I was having the same thoughts: bad transfer, bad disk or bad memory. But I have no idea how to check for these problems. Is it also possible that 'solid2fastq' (the C variant) has some weird bug? I am not sure.
For example, just now I got this error:

85500000read->name=[>308_1887_1544_F5-P2T21112303032103:132120201210102:2002]
qual_name=[>308_1887_1544_F5-P2]
************************************************************
In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.

Because I'm not sure what's going on, I ran the following command. It returned no match, which suggests that the CSFASTA file on disk is correct:

grep -m 1 -P -n '>308_1887_1544_F5-P2T' sl0453_20120208_PE_DUKE_NUS_SURESELECT_4gDNA_Kato_lll_F5-P2.csfasta

Then I ran solid2fastq again on the same file and got an error at the same read count, but a totally different error. Amazing:

85500000read->name=[>308_1887_9551_F5-X2]
qual_name=[>308_1887_1551_F5-P2]
************************************************************
In function "fastq_read": Fatal Error[OutOfRange]. Variable/Value: read->name != qual_name.

So I'm baffled. Is it my PC or solid2fastq that is playing games with me?

Any kind of help is appreciated.
Thank you in advance
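Since the same file produced two different errors at the same read, one cheap test is to checksum the file several times and see whether the result is reproducible. The sketch below is an illustration only: `md5_of` and `reads_are_stable` are made-up helper names, and it uses nothing beyond Python's standard `hashlib`.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_stable(path, passes=3):
    """Hash the same file several times and report whether all digests agree."""
    digests = {md5_of(path) for _ in range(passes)}
    return len(digests) == 1
```

If `reads_are_stable` returns False, the file contents change between reads, which would point at memory or disk rather than at solid2fastq. A True result is weaker evidence, because repeated reads may be served from the operating system's cache. Comparing `md5_of` on the working copy with the digest of the original on the SOLID machine covers the transfer step.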

**kenietz** · 02-29-2012, 11:21 PM

Hi again,
another problem could be the software on the SOLID machine. There was a power failure recently and they had to re-run the experiment. After that I fetched the resulting csfasta and qual files twice, and each time I got errors in different sets. Then I had them redo the primary analysis and fetched the result a third time, and there were still errors.

The person from the support team suggested that solid2fastq could possibly meddle with the input files, but I highly doubt that.

My feeling is that the PC attached to the machine is having trouble of some kind.

Update on the case shown in my previous post: it turned out that the qual file has 6 more entries than the csfasta. But that is in the files I fetched after the primary re-analysis. In the same files from the second fetch, the number of entries is the same.

Confusing and frustrating.

**kenietz** · 03-01-2012, 12:43 AM

Hi again,
sorry for the spam

Good read names are like this:
>1272_300_1473_F3
>1171_196_1958_F5-P2

So I wrote a Perl script that checks every line starting with '>' against one of the following patterns, depending on the type of file being checked:

my $patF3='\d+_\d+_\d+_F3';
my $patF5='\d+_\d+_\d+_F5-P2';

It's SOLID PE data with F3 and F5-P2 reads. The F3 file passed with no errors, while the F5-P2 file had 37 errors.

So I suppose it's some sort of error on the SOLID machine.
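For anyone who wants to repeat this check without the Perl script (which was not posted), here is a rough Python equivalent. It is a sketch, not the original script: the names `NAME_PATTERNS` and `bad_read_names` are invented, and the two patterns are the ones given above with a leading '>' added.

```python
import re

# The two patterns from the post, prefixed with '>' so the whole line is checked.
NAME_PATTERNS = {
    "F3": re.compile(r">\d+_\d+_\d+_F3"),
    "F5-P2": re.compile(r">\d+_\d+_\d+_F5-P2"),
}


def bad_read_names(path, tag):
    """Yield (line_number, name) for '>' lines that do not fully match the pattern."""
    pattern = NAME_PATTERNS[tag]
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                name = line.rstrip("\r\n")
                # fullmatch also rejects names with trailing junk appended.
                if not pattern.fullmatch(name):
                    yield number, name
```

Using `fullmatch` rather than a plain search is a deliberate choice here: a name such as `>308_1887_1544_F5-P2T2111...` contains a valid name as a prefix, so an unanchored pattern would let it through. A header whose '>' itself is damaged would not be seen by this check at all.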
