**Torst** · 10-12-2012, 11:01 PM

Neverending Illumina format changes

I don't really know anything about 'cmpfastq' but I've had a look at the source code:

http://compbio.brc.iop.kcl.ac.uk/software/download/cmpfastq

From what I can tell, it expects the ID line to match the pattern /^@(.*)#.*/
which means an @ followed by any characters, then a # followed by any further characters.

Your IDs do not fit this pattern, because you don't have the #xxxxx part.

Illumina used to use #AGCTCG to denote barcodes in multiplex samples. These days it uses a different format, or doesn't print it at all.

To make it work with your data, change it to /^@(.*)(#.*)?/ or /^@(.*)/
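If it helps to see how these patterns behave, here is a small Python sketch (cmpfastq itself is Perl, but these particular patterns behave the same in Python's `re`). Both sample headers are hypothetical. The `[^#]*` variant at the end is my own assumption, not something taken from the cmpfastq source: with a greedy `(.*)`, the optional `(#.*)?` group never gets to capture anything, so a `#barcode` suffix stays inside the captured ID.

```python
import re

# Pattern described above: the ID line must contain a '#'.
ORIGINAL = re.compile(r"^@(.*)#.*")
# Suggested relaxation: the '#...' part becomes optional.
OPTIONAL_HASH = re.compile(r"^@(.*)(#.*)?")
# Assumed variant (not from the cmpfastq source): stop the ID at the first '#'.
STOP_AT_HASH = re.compile(r"^@([^#]*)(#.*)?")

# Hypothetical headers, one in the older '#barcode' style and one without it.
old_style = "@SAMPLE_READ_1#AGCTCG/1"
new_style = "@SAMPLE_READ_1/1"


def captured_id(pattern, header):
    """Return capture group 1, or None when the header does not match."""
    m = pattern.match(header)
    return m.group(1) if m else None


for name, pattern in [("original", ORIGINAL),
                      ("optional '#'", OPTIONAL_HASH),
                      ("stop at '#'", STOP_AT_HASH)]:
    print(name, "|", captured_id(pattern, old_style),
          "|", captured_id(pattern, new_style))
```

Note that none of these patterns removes a trailing /1 or /2, so if the two files are compared on the captured ID alone, that suffix still has to be dealt with separately.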

Good luck.

**all_your_base** · 10-15-2012, 05:43 AM

Thank you very much for the reply. You have correctly identified the problem, and I can now resolve it to work with MiSeq reads. Thanks again for the insight!

**safina** · 03-20-2015, 02:58 AM

Hello. I'm having the same problem. I tried changing the pattern to match my header, but it put all my reads into the unique files, whereas the common files remain empty. Please help.

**Brian Bushnell** · 03-20-2015, 09:16 AM

What exactly are you trying to do? I have a program called "filterbyname" that can probably do it...

**safina** · 03-20-2015, 11:21 PM

Pairing of FASTQ files (F/R)

I'm trying to pair my FASTQ files after quality filtering and trimming of those files via FASTQC. My files look like this:

==> mexD1B_filt_trim_1.fastq <==
@MexD1BSRR1562087.10.1/1
GAGCTAGATCAGCACCATATATTACACGATGATCAGCTGTAACATTTACCTGCATCTGGTTCTTCATTCCTATCCGACCATCCTTGG
+SRR1562087.10.1/1
JJJJJJIIJJJJJJJJIJJJJJJJJJJJJJJJIJJJJJJJGIIJJJJIJJJJJJJJJIJJJJDHIHHHHHHHFDFFDDDDDDDDD>C
@MexD1BSRR1562087.11.1/1
AGGTTGACTATGGTCCAGGCCATGCCAGGAGAGCAACCGAAAACAGAGAGAACGGTAAGCCAGGAGAAGAACAGTATGAGTATATAG
+SRR1562087.11.1/1
IJJGHIJIIIFIBHHGAFHGGIHJIJGJEGIGGGHGIJJJJHHGFEFEDACEEDDBDBCCCDDDDDDBDDDCDDCADDDCCCDDDDD
@MexD1BSRR1562087.15.1/1
TAACATCCACAATCTCCTTCTACCCAAGAAGTCTGGAACTTCAGCATCAAAGGCTGGTGATGACGACAACTAATCCATTTACTGAAT

==> mexD1B_filt_trim_2.fastq <==
@MexD1BSRR1562087.7.2/2
CCTGTAGATATACGTACTGCCAAAGGGTAGATAGTTGCCCATCTCAGAAAACACAACTTCAACAGCCAAGATTAATATCCATGTGAT
+SRR1562087.7.2/2
IJJJGGJBHIJJGHHHIIHJJGJGJIIDFHIJIJJJGHJJJJJJJIJGIGH@FHJIJIHIIIHHH=BDFFAEECCEEFDEDDCDCA>
@MexD1BSRR1562087.9.2/2
GTAATCCAAATAAGGTATACTCACTCATCGGAGGATTTTGTGCTTCCCCTGTGAATTTCCACGCTAAGGATGGCTCCGGCTATAAAT
+SRR1562087.9.2/2
JIJIIJJJGGIIJIBC@FH@HHJGIJGCHGIEGIFHDFHJIJIJIHHIIIIJGGHHHHHCDDFDDDBDDDDDDDCDBDDBD@CDCEE
@MexD1BSRR1562087.11.2/2
GAAACACTGATTGGTTCACGTATCCAGGTGTATGGACCACCTATATACTCATACTGTTCTTCTCCTGGCTTACCGTTCTCTCTGTTT

**GenoMax** · 03-21-2015, 04:59 AM

@safina: You should use a program called repair.sh, which is part of the BBMap package. Brian has an example posted here: http://seqanswers.com/forums/showpos...0&postcount=45

Your command would look something like this:

Code:

$ repair.sh in1=mexD1B_filt_trim_1.fastq in2=mexD1B_filt_trim_2.fastq out1=mexD1B_filt_trim_1_fixed.fq out2=mexD1B_filt_trim_2_fixed.fq outsingle=single.fq
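For intuition about what re-pairing involves, here is a minimal Python sketch. It is not how repair.sh is implemented, just an illustration of the idea: reduce each header to a shared key, then match on that key. The helper names (`pair_key`, `parse_fastq`, `repair`) are made up for this example, and the suffix rule (drop a trailing `/1` or `/2`, plus a `.1` or `.2` directly before it) is an assumption based on the headers posted above. The sequences and qualities are shortened placeholders, and read names are assumed to be unique within each file.

```python
import re


def pair_key(header):
    """Reduce a FASTQ header to a name shared by both mates (assumed rule)."""
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]
    # Drop '/1' or '/2' at the end, plus a '.1' or '.2' directly before it.
    return re.sub(r"(\.[12])?/[12]$", "", name)


def parse_fastq(lines):
    """Group lines into 4-line records (assumes no blank or wrapped lines)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping one iterator with itself four times yields consecutive 4-tuples.
    return list(zip(stripped, stripped, stripped, stripped))


def repair(records1, records2):
    """Split records into matched pairs and singletons; all held in memory."""
    by_key2 = {pair_key(r[0]): r for r in records2}
    paired1, paired2, singles, matched = [], [], [], set()
    for rec in records1:
        key = pair_key(rec[0])
        mate = by_key2.get(key)
        if mate is None:
            singles.append(rec)
        else:
            paired1.append(rec)
            paired2.append(mate)
            matched.add(key)
    singles.extend(r for r in records2 if pair_key(r[0]) not in matched)
    return paired1, paired2, singles


# Headers as posted above; sequence and quality lines are placeholders.
fq1 = """@MexD1BSRR1562087.10.1/1
ACGT
+
IIII
@MexD1BSRR1562087.11.1/1
ACGT
+
IIII
@MexD1BSRR1562087.15.1/1
ACGT
+
IIII
"""
fq2 = """@MexD1BSRR1562087.7.2/2
ACGT
+
IIII
@MexD1BSRR1562087.9.2/2
ACGT
+
IIII
@MexD1BSRR1562087.11.2/2
ACGT
+
IIII
"""

paired1, paired2, singles = repair(parse_fastq(fq1.splitlines()),
                                   parse_fastq(fq2.splitlines()))
print("pairs:", [(a[0], b[0]) for a, b in zip(paired1, paired2)])
print("singletons:", [r[0] for r in singles])
```

This also shows why a comparison on the raw header finds nothing in common for these files: the two mates of a pair never have identical names until the mate suffix is removed.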

Problem with cmpfastq, can't process my .fastq /1 and /2 files
