**GenoMax** · 02-02-2016, 06:06 AM

Fastq headers should always start with an "@" ~~so what you have is not following the standard. Have you asked the folks who gave you this data as to whether it has been post-processed in some way?~~ And there should be no duplicates (let alone multiples) in raw sequence files, as far as the fastq header IDs are concerned.

**spabinger** · 02-02-2016, 06:24 AM

Hi,

that's not the problem. See "head" result (Sequence and quality trimmed) and also the grep result I posted.

> head R1.fastq
@XXX:5:YYY:1:11101:12923:1051 1:N:0:AGGCAGAA+NCGATCTA
CTT...TTC
+
AAA...</<
@XXX:5:YYY:1:11101:4797:1055 1:N:0:AGGCAGAA+NCGATCTA
ACC...CTA
+
AAA...<A/

Thanks,
Stephan

**GenoMax** · 02-02-2016, 06:43 AM

My apologies.

If the order of the reads in your files is messed up, then you can "re-pair" the reads using the repair tool from the BBMap suite as follows:

Code:

$ repair.sh in1=r1.fq in2=r2.fq out1=fixed1.fq out2=fixed2.fq outsingle=singletons.fq

That said, each fastq sequence header should be unique within every sequence file. If that is not the case, then there is something wrong with this data.
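One way to test the uniqueness requirement is to count the read IDs directly. What follows is a minimal Python sketch, not part of BBMap. It assumes an uncompressed FASTQ with strictly four lines per record, and the function name `duplicate_read_ids` is made up for illustration.

```python
import io
from collections import Counter


def duplicate_read_ids(handle):
    """Return {read_id: count} for IDs that occur more than once.

    Assumes a strict 4-line-per-record FASTQ stream (no wrapped
    sequences, no blank lines).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # first line of each record is the header
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected '@' header, got {line!r}")
            # The read ID is everything between '@' and the first whitespace.
            counts[line[1:].split()[0]] += 1
    return {read_id: n for read_id, n in counts.items() if n > 1}


# Tiny made-up example: read "r1" appears twice, "r2" once.
sample = (
    "@r1 1:N:0:AAA\nACGT\n+\nIIII\n"
    "@r2 1:N:0:AAA\nTTTT\n+\nIIII\n"
    "@r1 1:N:0:AAA\nACGT\n+\nIIII\n"
)
print(duplicate_read_ids(io.StringIO(sample)))  # prints {'r1': 2}
```

For a real file, something like `with open("R1.fastq") as fh: print(duplicate_read_ids(fh))` should work. All IDs are held in memory, so very large files may need a different approach.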

**spabinger** · 02-02-2016, 06:52 AM

Thanks for your reply.

I was also suspecting that the raw file is not ok.

Best regards,
Stephan

**GenoMax** · 02-02-2016, 07:00 AM

If the sequence/Q-scores are identical for those 7 copies, then you could potentially keep just one and throw away the other 6.

I am puzzled by how this could have happened though. No logical explanation comes to mind.
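The "keep one copy" idea could be sketched in Python roughly as below. This is an illustration under assumptions, not a tested pipeline: it expects a strict 4-line-per-record FASTQ, holds everything in memory, and the helper names `read_fastq` and `drop_identical_copies` are hypothetical. Copies whose sequence or quality string differs from the first occurrence are reported instead of being silently dropped.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a strict 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        yield tuple(s.rstrip("\n") for s in (header, seq, plus, qual))


def drop_identical_copies(records):
    """Keep the first record per read ID.

    Returns (kept_records, conflicting_ids). An ID is listed as
    conflicting when a later copy has a different sequence or quality
    string than the first one; such copies need manual inspection.
    """
    seen = {}       # read ID -> (sequence, quality) of the first occurrence
    kept = []
    conflicts = []
    for header, seq, plus, qual in records:
        read_id = header[1:].split()[0]
        if read_id not in seen:
            seen[read_id] = (seq, qual)
            kept.append((header, seq, plus, qual))
        elif seen[read_id] != (seq, qual) and read_id not in conflicts:
            conflicts.append(read_id)
    return kept, conflicts


# Made-up data: r1 is an exact duplicate, r3 has two differing copies.
sample = (
    "@r1\nACGT\n+\nIIII\n"
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTTT\n+\nIIII\n"
    "@r3\nGGGG\n+\nIIII\n"
    "@r3\nGGGA\n+\nIIII\n"
)
kept, conflicts = drop_identical_copies(read_fastq(io.StringIO(sample)))
print(len(kept), conflicts)  # prints 3 ['r3']
```

With paired-end data, both files would have to be filtered the same way so that the mates stay in sync; running repair.sh afterwards is one way to check that.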

**danieleyumi** · 05-25-2018, 04:27 AM

It happened to me twice, and a new demultiplexing run fixed the problem. I suspect it has something to do with the number of threads used to write the fastq data. Best, Daniele

Duplicate read names - BWA mem - paired reads have different names
