**kcchan** · 01-27-2014, 09:49 AM

That sounds more like a SAM file than a FASTQ file.

**Nino** · 01-27-2014, 10:16 AM

No, it is not a SAM file. A FASTQ file has four lines per read: the first is a header from the machine it was sequenced on, including genomic coordinates plus other information; the second is the read itself; the third gives the strand position; and the fourth holds the base quality scores. Have you ever looked into a FASTQ file before? SAM files hold much, much more detailed information.

**GenoMax** · 01-27-2014, 10:21 AM

I am not sure that would be fixable unless you had the images/intensities from the original run and could redo the basecalling to recreate the sequence file.

If that is not possible, you could modify the read IDs in some way to make them unique, and then you should be able to use all the sequences (if this is a single-end dataset; with paired-end data it would be difficult if both files are corrupt).

Hopefully no other part of the file was corrupted by whatever messed up the IDs.
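As a rough illustration of the ID-rewriting idea, here is a minimal Python sketch. It assumes the file still has exactly four lines per read and that the read name is the first space-delimited token of the header; the function name `make_ids_unique` is made up for this example. It appends a running record number to each name and resets the third line to a bare `+` so it cannot disagree with the new ID.

```python
from typing import Iterable, Iterator


def make_ids_unique(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with a record counter appended to each read name.

    Assumes strict four-line records (header, sequence, '+', quality).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        position = i % 4
        if position == 0:
            # Header line: split the read name from any trailing description
            # and append the 1-based record number to the name.
            name, sep, rest = line.partition(" ")
            yield f"{name}_{i // 4 + 1}{sep}{rest}"
        elif position == 2:
            # Separator line: drop any repeated ID so it cannot mismatch.
            yield "+"
        else:
            # Sequence and quality lines pass through unchanged.
            yield line
```

To apply it to a file, open the input and output in text mode and write each yielded line followed by a newline. For paired-end data the same numbering would have to be applied to both files in the same record order, which only works if both files are otherwise intact.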

**mastal** · 01-28-2014, 04:12 AM

Originally posted by Nino

No, it is not a SAM file. A FASTQ file has four lines per read: the first is a header from the machine it was sequenced on, including genomic coordinates plus other information; the second is the read itself; the third gives the strand position; and the fourth holds the base quality scores. Have you ever looked into a FASTQ file before? SAM files hold much, much more detailed information.

See the FASTQ format description on Wikipedia:

http://en.wikipedia.org/wiki/FASTQ_format

The coordinates in the first line are not genomic coordinates, they are coordinates of the read on the flow cell, and include things like flow cell ID, lane number, tile number, and x,y coordinates.

The third line has nothing to do with strand, it's always a +, optionally followed by the sequence identifier in the first line.
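To make the header layout concrete, here is a small Python sketch that splits a header into its flow cell fields. It assumes the Casava 1.8+ style of Illumina header (`@instrument:run:flowcell:lane:tile:x:y`, followed by a space and a description); older pipelines and other platforms use different layouts, so treat this as an illustration only. The function name and dictionary keys are made up for the example.

```python
from typing import Dict

# Field names for the colon-separated part of a Casava 1.8+ style header.
HEADER_KEYS = ("instrument", "run_number", "flowcell_id", "lane", "tile", "x", "y")


def parse_illumina_header(header: str) -> Dict[str, str]:
    """Split a Casava 1.8+ style FASTQ header into its flow cell fields."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # Everything after the first space is the description (read number,
    # filter flag, control bits, index sequence); it is ignored here.
    location, _, _description = header[1:].partition(" ")
    fields = location.split(":")
    if len(fields) != len(HEADER_KEYS):
        raise ValueError("header does not have the seven expected fields")
    return dict(zip(HEADER_KEYS, fields))
```

For example, `parse_illumina_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")` gives lane `2`, tile `2104`, and x,y of `15343`, `197393`: positions on the flow cell, not on the genome.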

**BruceyB** · 01-28-2014, 04:20 AM

Do you still have four lines for every read? Perhaps you could post a small section of the file so we can fully understand what you actually have.
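One quick way to answer that question is a structural check like the Python sketch below. It assumes plain, unwrapped four-line FASTQ (no multi-line sequences), and the function name `first_bad_record` is made up for this example. It reports the first record that breaks the expected pattern.

```python
from typing import Iterable, List, Optional, Tuple


def first_bad_record(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (record number, problem) for the first malformed record, else None."""
    record: List[str] = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        # A full group of four lines has been collected; check its shape.
        count += 1
        header, seq, plus, qual = record
        record = []
        if not header.startswith("@"):
            return count, "header does not start with '@'"
        if not plus.startswith("+"):
            return count, "third line does not start with '+'"
        if len(seq) != len(qual):
            return count, "sequence and quality lengths differ"
    if record:
        # Leftover lines mean the line count is not a multiple of four.
        return count + 1, "incomplete record at end of file"
    return None
```

Passing an open file handle, as in `first_bad_record(open("reads.fastq"))` with a hypothetical file name, returns `None` if every read has four well-formed lines. A file whose headers were dropped would typically fail on record 1.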

Missing header Fastq file
