Hi all,
I recently acquired a dataset from GEO (HiSeq 2500, accession: GSE107029). It is paired-end, but read_1 is 94 bp while read_2 is 100 bp. I have never seen HiSeq 2500 paired-end data where read_1 and read_2 have different lengths, so I am wondering if anyone can help me understand why they differ here.
Here are a few reads from read_1 and read_2. I downloaded the data using:
fastq-dump --split-files SRR6300667
Read_1 (SRR6300667_1.fastq)
@SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=94
CGGAATGCAGCAATCAATGTCGTCGGAAGATCCTGAATAAATCCTACTGTATCTGAAAGAAGAACACTGTAGCCGCTTGGCAGGACCATTTTTC
+SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=94
DFDHHH<EEFGGIHIIHCGFDGDF@GHFADFHGICFAEHGHECCAG@GG;EH>CCEA73?;B>@CCCCCCCCCCBBBB???C?BB@@?CCEEC#
@SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=94
GGCTCCCCCCTGCAAATGAGCCCCAGCCTTCTCCATGGTGGTGAAGACGCCAGTGGACTCCACGACGTACTCAGCGCCAGCATCGCCCCACTTG
+SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=94
FHHHHHJJJJJJJIIJJJJJIJJJJJIJJJJJJIJJJJGHHAEHIHIIHHFFCEEEEDDDDDDDDDDDABDDDDDDDDBDDDDDBDDDDDDDD@
@SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=94
TCCTTTAGCTGACCACTTCTTCAAGTAGGCCGGGGATACAAAATCCTTTTGCATGAGGAAAGCTGAAATTCCACACAGGTACCACAAGATATTA
+SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=94
EHHHHHEGBGGCHIJGHIFHIHIIIIIIJJJHIIJAHGFGIIJJCFGGGIIBCHHEHGFDEFFEEECCCEDCCCCBDDD:@CCACBBDCDDEED
Read_2 (SRR6300667_2.fastq)
@SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=100
CGATGACCAGAAAAATGGTCCTGCCAAGCGGCTACAGTGTTCTTCTTTCAGATACAGTAGGATTTATTCAGGATCTTCCGACGACATTGATTGCTGCATT
+SRR6300667.1 DHCDZDN1:3:1101:1145:1177 length=100
@<ADDDDHBHFFEGGGE<CFGHIIIIGCEGDHIGI@GGGCFGHIIIIIHCHAGGHIG@@D>DGHGCACAEEHDFFFFFEDA>B@;,5@3>ADC:A@CCC:
@SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=100
ATGTTCCAATATGATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAATGGAAATCCCATCACCATCTTCCAGG
+SRR6300667.2 DHCDZDN1:3:1101:1178:1247 length=100
CCFFFFDHHHDADEHGGGJJJEECHGDFHGIIJCDGHIGIJJFGAHEHGGGHGBHGEHIIIGHFHEHDDDD@EACECEECDDCC>CACD<>CDCCDCCD9
@SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=100
AGCCATACAGGAGATGGGAAACCACGCTATGATACTTTCTGGAAACATTTTATATTTGTTATGATGGACATTTTGCTCGATTGGAGCATGCATAATATCT
+SRR6300667.3 DHCDZDN1:3:1101:1313:1046 length=100
BCFFFFFHHHHHIHIIIJGJFGHIJJIJJJJIIGGHIJIJJJFJIAHHIHHIJIIJJJJJGIJJIJJJIGIHHHHEHFFFEECECDA?CCDDDDCDDEEF
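One way to check whether these lengths hold across the whole run, rather than only in the first few records shown here, is to tally read lengths per file. The following is a minimal sketch, assuming uncompressed FASTQ files with strict four-line records such as those written by the command above; the function names `read_length_counts` and `summarize_files` are hypothetical helpers, not part of any toolkit.

```python
from collections import Counter
from itertools import islice


def read_length_counts(lines):
    """Tally sequence lengths from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, qualities).
    """
    # The sequence is the 2nd line of each 4-line record.
    sequences = islice(lines, 1, None, 4)
    return Counter(len(seq.rstrip("\r\n")) for seq in sequences)


def summarize_files(paths):
    """Return {path: {length: count}} for each FASTQ file in paths."""
    summary = {}
    for path in paths:
        with open(path) as handle:
            summary[path] = dict(read_length_counts(handle))
    return summary
```

Calling `summarize_files(["SRR6300667_1.fastq", "SRR6300667_2.fastq"])` would show whether every read in each file has a single length or whether the lengths vary, which can help distinguish uniform trimming from per-read trimming.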
I am quite confused.
Thank you,
Statsteam