**GenoMax** · 02-02-2016, 06:06 AM

Fastq headers should always start with an "@" ~~so what you have is not following the standard. Have you asked the folks who gave you this data as to whether it has been post-processed in some way?~~ And there should be no duplicates (let alone multiples) in raw sequence files, as far as the fastq header IDs are concerned.

**spabinger** · 02-02-2016, 06:24 AM

Hi,

that's not the problem. See "head" result (Sequence and quality trimmed) and also the grep result I posted.

> head R1.fastq
@XXX:5:YYY:1:11101:12923:1051 1:N:0:AGGCAGAA+NCGATCTA
CTT...TTC
+
AAA...</<
@XXX:5:YYY:1:11101:4797:1055 1:N:0:AGGCAGAA+NCGATCTA
ACC...CTA
+
AAA...<A/

Thanks,
Stephan

**GenoMax** · 02-02-2016, 06:43 AM

My apologies.

If the order of the reads in your files is messed up, you can "re-pair" the reads using the repair tool from the BBMap suite, as follows:

Code:

$ repair.sh in1=r1.fq in2=r2.fq out1=fixed1.fq out2=fixed2.fq outsingle=singletons.fq
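To see whether the two files are actually out of sync before re-pairing, a quick check along these lines might help. This is only a sketch: `count_mismatched_pairs` is a made-up helper name, and it assumes 4-line FASTQ records and Illumina-style headers like the ones posted above, where the read ID is everything before the first space.

```shell
# Hypothetical helper (not part of BBMap): print how many record positions
# have different read IDs in the two files. 0 means the files are in sync.
# Assumptions: 4 lines per record; ID = header text before the first space.
# Note: all IDs of the first file are held in memory, and extra records at
# the end of the first file are not counted.
count_mismatched_pairs() {
    awk '
        FNR % 4 != 1 { next }               # look at header lines only
        NR == FNR    { id[FNR] = $1; next } # first file: remember ID by position
        id[FNR] != $1 { n++ }               # second file: compare same position
        END { print n + 0 }
    ' "$1" "$2"
}
```

Usage would be something like `count_mismatched_pairs r1.fq r2.fq`; any non-zero result suggests the files need re-pairing.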

That said, each fastq sequence header should be unique within a sequence file. If that is not the case, then something is wrong with this data.
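One way to check for repeated headers is to count how often each read ID occurs. The snippet below is a minimal sketch, not a vetted tool: `list_duplicate_ids` is a hypothetical helper name, and it assumes 4-line records with the read ID being the header text before the first space.

```shell
# Hypothetical helper: list read IDs that occur more than once in a FASTQ
# file, one per line, as "<count> <id>". Prints nothing if all IDs are unique.
# Assumptions: 4 lines per record; ID = header text before the first space.
list_duplicate_ids() {
    awk 'NR % 4 == 1 { print $1 }' "$1" |   # take the ID from every header line
        sort |
        uniq -c |                           # count occurrences of each ID
        awk '$1 > 1 { print $1, $2 }'       # keep only IDs seen more than once
}
```

Running `list_duplicate_ids R1.fastq | head` should show quickly whether the file has the problem and how many copies each affected read has.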

**spabinger** · 02-02-2016, 06:52 AM

Thanks for your reply.

I was also suspecting that the raw file is not ok.

Best regards,
Stephan

**GenoMax** · 02-02-2016, 07:00 AM

If the sequence/Q-scores are identical for those 7 copies, then you could potentially keep just one and throw away the other 6.
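A rough sketch of that idea: drop a record only when its ID, sequence and quality string all match an earlier record, and warn when copies share an ID but differ. `dedup_identical_records` is a hypothetical name, the code assumes 4-line records, and it keeps every distinct record in memory, so treat it as an illustration and not something tuned for full-size raw files.

```shell
# Hypothetical helper: write each distinct FASTQ record once to stdout.
# Exact copies (same ID, sequence and quality) of an earlier record are dropped.
# Records that reuse an ID with different content are kept, with a warning on
# stderr. Assumptions: 4 lines per record; ID = header text before first space.
dedup_identical_records() {
    awk '
        NR % 4 == 1 { hdr = $0; id = $1 }
        NR % 4 == 2 { sq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            key = id SUBSEP sq SUBSEP $0
            if (key in seen) next            # exact copy: skip it
            seen[key] = 1
            if (id in ids)
                print "warning: " id " reused with different sequence/quality" > "/dev/stderr"
            ids[id] = 1
            print hdr; print sq; print plus; print $0
        }
    ' "$1"
}
```

For example, `dedup_identical_records R1.fastq > R1.dedup.fastq`. The same would have to be done for R2, and the pairing checked again afterwards.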

I am puzzled by how this could have happened though. No logical explanation comes to mind.

**danieleyumi** · 05-25-2018, 04:27 AM

It happened to me twice, and re-running the demultiplexing fixed the problem. I suspect it has something to do with the number of threads used to write the fastq data. Best, Daniele


Duplicate read names - BWA mem - paired reads have different names
