Hi all,
I've been trying to get around this problem for the last couple of days, and I haven't been able to solve it myself or find a solution in any forum. Here's my problem:
My data is a single-lane Illumina TruSeq run with 24 indexed samples. All the steps have been run from a bash script, so every file has been processed in exactly the same way with exactly the same parameters.
I have converted SAM to BAM, then sorted, indexed and removed duplicates.
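For reference, that step looks roughly like this in my script (sample1 is a placeholder, and I'm on the old samtools 0.1.x sort syntax):

    # convert, sort, index, then remove duplicates with Picard
    samtools view -bS sample1.sam > sample1.bam
    samtools sort sample1.bam sample1.sorted
    samtools index sample1.sorted.bam
    java -jar MarkDuplicates.jar INPUT=sample1.sorted.bam OUTPUT=sample1.dedup.bam \
        METRICS_FILE=sample1.metrics REMOVE_DUPLICATES=true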
Next I index the files, then I perform the realignment around indels.
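That's the standard two-step GATK realignment, roughly (ref.fa stands in for my reference):

    samtools index sample1.dedup.bam
    # find intervals that need realignment, then realign around them
    java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa \
        -I sample1.dedup.bam -o sample1.intervals
    java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fa \
        -I sample1.dedup.bam -targetIntervals sample1.intervals -o sample1.realigned.bam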
Then I fix the paired-end (mate) information using Picard.
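Something like:

    # resync mate information after realignment
    java -jar FixMateInformation.jar INPUT=sample1.realigned.bam \
        OUTPUT=sample1.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT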
Then GATK complains about missing header info (read groups), so I add that with Picard.
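The read-group values below are placeholders for what's in my script:

    java -jar AddOrReplaceReadGroups.jar INPUT=sample1.fixed.bam OUTPUT=sample1.rg.bam \
        RGID=lane1 RGLB=lib1 RGPL=illumina RGPU=lane1 RGSM=sample1 \
        SORT_ORDER=coordinate CREATE_INDEX=true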
I've done this for all the files, but when I've gone on to the next step, base quality score recalibration, there are 5 files that fail.
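This is the step that fails, invoked roughly like this (GATK 1.x CountCovariates syntax; the flag names may differ in other versions, and dbsnp.vcf is a placeholder for my known-sites file):

    java -jar GenomeAnalysisTK.jar -T CountCovariates -R ref.fa \
        -I sample1.rg.bam -knownSites dbsnp.vcf \
        -cov ReadGroupCovariate -cov QualityScoreCovariate \
        -cov CycleCovariate -cov DinucCovariate \
        -recalFile sample1.recal.csv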
When I count the reads (using bamtools) before and after adding the header info, the bad files drop from 4,032,483 to 3,578,753 reads, while the other 19 files that have worked only go from 4,625,944 to 4,625,834.
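I'm counting with:

    bamtools count -in sample1.fixed.bam   # before AddOrReplaceReadGroups
    bamtools count -in sample1.rg.bam      # after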
GATK keeps giving me an end-of-file (EOF) error, and it looks like these 5 files are truncated. But why just 5 out of the 24 files, all processed in exactly the same way?
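To check for truncation I've been reading each file through to the end, e.g.:

    # samtools prints an error such as "EOF marker is absent" or
    # "truncated file" if the BAM is cut short
    samtools view sample1.rg.bam > /dev/null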
I know this is a bit of a long-winded question, but has anyone else had a similar problem?