I downloaded a file from the 1000 Genomes FTP site:
ftp://ftp.1000genomes.ebi.ac.uk/vol1....filt.fastq.gz
The contents look like this:
@SRR040813.1390 VAB_0100_20091102_2_SL_AWG_TG_NA19150_000pA_01003244543_12_21_53/1
G22300322011100202023200032002..200.330022102.22002
+
!'1)#$)&5&'$''%''3&'&,24)&*&2,!!',4!11,4)&&8%!),#52
@SRR040813.1391 VAB_0100_20091102_2_SL_AWG_TG_NA19150_000pA_01003244543_12_21_64/1
G20221312101332122023110001200..200..10032100.20000
+
!#)(+)(0+%))3&*)&0&+1(&&&)'2:7!!66(!!845/)<8.!.&)5)
As far as I know, the second line of each record should be a sequence of A, T, G, C, and N. Is this file corrupted, or is it a different way of representing the sequence data in a FASTQ file?
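One way to tell whether these sequence lines are damaged or simply use a different alphabet is to classify them with a short script. The sketch below assumes the reads are SOLiD color-space reads, because the pattern above resembles that format: one primer base (here `G`), then one digit 0-3 per position, with `.` for a missing call. That is an assumption, not something confirmed by the file's documentation. The function names are hypothetical, and the decoder uses the standard SOLiD di-base code, where each color is the 2-bit XOR of two adjacent bases (A=0, C=1, G=2, T=3).

```python
import gzip
import re

# Assumption: base-space reads use only A, C, G, T, N.
BASESPACE_RE = re.compile(r"^[ACGTN]+$")
# Assumption: SOLiD color-space reads are one primer base, then colors 0-3 or '.'.
COLORSPACE_RE = re.compile(r"^[ACGT][0-3.]+$")

# Index of each base in the 2-bit code: A=0, C=1, G=2, T=3.
BASES = "ACGT"


def classify(seq: str) -> str:
    """Return 'base-space', 'color-space', or 'unknown' for one sequence line."""
    s = seq.strip().upper()
    if BASESPACE_RE.match(s):
        return "base-space"
    if COLORSPACE_RE.match(s):
        return "color-space"
    return "unknown"


def decode_colorspace(seq: str) -> str:
    """Translate a color-space read into bases.

    The leading primer base is not part of the read, so it is not returned.
    Decoding stops at the first missing call ('.'), because every later
    base depends on the one before it.
    """
    prev = BASES.index(seq[0].upper())
    out = []
    for color in seq[1:]:
        if color == ".":
            break
        prev ^= int(color)  # color = XOR of the previous and current base codes
        out.append(BASES[prev])
    return "".join(out)


def sequence_lines(path: str, n: int = 5):
    """Yield the sequence line (line 2 of each 4-line record) of the first n records."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= n:
                break
            if i % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # The two reads quoted above.
    for read in ("G22300322011100202023200032002..200.330022102.22002",
                 "G20221312101332122023110001200..200..10032100.20000"):
        print(classify(read), decode_colorspace(read))
```

If the lines classify as color-space, the file is probably not corrupted; it would just need a tool that understands color space (or a conversion step) rather than an ordinary base-space FASTQ parser. Note that the first quality character (`!`) would then line up with the primer base rather than a real base call.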