Hi all
I am reporting a strange behavior that I see on both OS X 10.6.8 and CentOS 5.7, each with the latest software installed (via MacPorts and yum, respectively).
If I download archived versions of the human reference genome from the Broad FTP (bundle 1.2)
and run the command:
Code:
zgrep ">" human_b36_both.fasta.gz
I get clean output until the end, and then pages of binary garbage suddenly appear.
I thought this was caused by corruption of the archive, so I did the following:
expanded the archive back to multi-FASTA
reformatted the FASTA content using BioPerl (in and out as FASTA)
recompressed with razip (samtools 1.18)
repeated the zgrep command
I get the garbage again!!
Is this normal and due to some peculiarity of razip?
## Below is a series of commands and results

Code:
# sorry for the length
$> razip -c human_b36_both.fa > human_b36_both.fa.gz
$> zgrep ">" human_b36_both.fa.gz
>1
>2
…
>NT_113899
>NT_113965
>NT_113898
>NC_007605
?f???,ixAp_?Òoё\G?C?Nm????R??[D?;?œ?)?pj5X??UL?`??mM?l%??ZºŐP?BI??W??d???HCoo??DS?ѷivfq(??X??U???w??? }?C"R???¿?.???????.\???,??7???bҳ*?k??F?b?l!??M??Ս??[???D?T?NfJ?8Ɉ?f?p??cGm
?<?:vRv?Hd?ղ???C.??߉?ye???N?
U?4CY???w??<??v!?@?o?w;??İ?xHD?b+????|????e9???D?
??:??(fUY???m??jL??o?»B}??!;?X?c????` ?\Y4??)ß????<?/Þ?@@j!'?Y?B?2???"?$? I???9_?5??=묦???H1?l??Q??|?{??6[????G?a;:&??gw?<e?u2???R,]?P?%?Vd')Y?_K?ae
-Z ʂ@??g?YvF?By??q?'??m?Z&? (~5?ʈ???????3??8{?W?j?? 7_?L??-??r?kԊRb??g?8?<?6??$???K??M?-
?$H?k?r??v%Jp˴lںxSJ
?q????Khg? db?>??b?q`E?RJ ?~lH?????%?m.???X?+??t?ߒ??%̽ @??ޫ?)`?it[?w??:?݃ݓY
?P3fg$j?????t?>??e?9?n?5?????y23?2WgT?f?*?=l??`ԊU?C??????P?TO
??~?4dg?mq&z3??ZJ?qP-??j??r?*??????20?;vRe*?B??LD]
#(… many many such pages of trailing binary garbage)
# while the tail of the fasta file is clean
$> tail human_b36_both.fa
ATGGGGGGCCGCGCATTCCTGGAAAAAGTGGAGGGGGCGTGGCCTTCCCCCGCGGCCCCC
CAGCCCCCCCGCACAGAGCGGCGCTACGGCGGGCGGGCGGCGGGGGGTCGGGGTCCGCGG
GCTCCGGGGGCTGCGGGCGGTGGATGGCGGCGGACGTTCCGGGGATCGGGGGGGTCGGGG
GGCGCCGCGCGGGCGCAGCCATGCGTGACCGTGATGAGGGGGCAGGGTCGCAGGGGGTGT
GTCTGGTGGGGGCGGGAGCGGGGGGCGGCGCGGGAGCCTGCACGCCGTTGGAGGGTAGAA
TGACAGGGGGCGGGGACAGAGAGGCGGTCGCGCCCCCGGCCGCGCCAGCCAAGCCCCCAA
GGGGGGCGGGGAGCGGGCAATGGAGCGTGACGAAGGGCCCCAGGGCTGACCCCGGCAAAC
GTGACCCGGGGCTCCGGGGTGACCCAGCCAAGCGTGACCAAGGGGCCCGTGGGTGACACA
GGCAACCCTGACAAAGGCCCCCCAGGAAAGACCCCCGGGGGGCATCGGGGGGGGTGTTGG
CGGGGGCATGGGGGGGTCGGATTTCGCCCTTATTGCCCTGTTT
# indexing the archive works fine
$> samtools faidx human_b36_both.fa.gz
$> cat human_b36_both.fa.gz.fai
1 247249719 3 60 61
2 242951149 251370554 60 61
...
NT_113898 1305230 3143346495 60 61
NC_007605 171823 3144673490 60 61
# extracting the last record also works and ends just like the tail above
$> samtools faidx human_b36_both.fa.gz NC_007605
>NC_007605
AGAATTCGTCTTGCTCTATTCACCCTTACTTTTCTTCTTGCCCGTTCTCTTTCTTAGTAT
GAATCCAGTATGCCTGCCTGTAATTGTTGCGCCCTACCTCTTTTGGCTGGCGGCTATTGC
CGCCTCGTGTTTCACGGCCTCAGTTAGTACCGTTGTGACCGCCACCGGCTTGGCCCTCTC
ACTTCTACTCTTGGCAGCAGTGGCCAGCTCATATGCCGCTGCACAAAGGAAACTGCTGAC
…
GGGGGGCGGGGAGCGGGCAATGGAGCGTGACGAAGGGCCCCAGGGCTGACCCCGGCAAAC
GTGACCCGGGGCTCCGGGGTGACCCAGCCAAGCGTGACCAAGGGGCCCGTGGGTGACACA
GGCAACCCTGACAAAGGCCCCCCAGGAAAGACCCCCGGGGGGCATCGGGGGGGGTGTTGG
CGGGGGCATGGGGGGGTCGGATTTCGCCCTTATTGCCCTGTTT
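
One further check that might narrow this down, sketched here on the assumption that the gzip in use is GNU gzip: `gzip -t` tests an archive and warns "trailing garbage ignored" when bytes that are not part of a gzip stream follow the compressed data. The helper name `check_trailing` is made up for illustration and is not part of samtools or gzip. Whether razip really leaves extra bytes after the stream is only a guess to be tested, not a confirmed explanation.

```shell
# Hypothetical helper (illustration only): prints "trailing data" if GNU gzip
# warns about non-gzip bytes after the compressed stream, otherwise "clean".
# "clean" only means that this particular warning was not seen.
check_trailing() {
    # gzip -t writes nothing to stdout; its warnings go to stderr,
    # so stderr is merged into the pipe for grep to inspect.
    if LC_ALL=C gzip -t "$1" 2>&1 | grep -q "trailing garbage"; then
        echo "trailing data"
    else
        echo "clean"
    fi
}

# Example call on the archive from this post:
# check_trailing human_b36_both.fa.gz
```

If this prints "trailing data" for the razip output but "clean" for the same FASTA compressed with plain gzip, the garbage would seem to come from bytes stored after the compressed stream rather than from a damaged sequence file.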