Originally posted by shawpa
Your input files are probably so big that sorting fails. The script sorts using the operating system's sort command, which by default writes its temporary files to $TMPDIR or /tmp/. This can be changed in the script with sort's -T option:
Code:
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories
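As a minimal sketch (in Python, not the script's actual sort call), this is how sort can be invoked with -T so its temporaries go to a directory with more free space. The function name `sort_by_position` and the tab-separated layout (chromosome in column 1, position in column 2) are assumptions for illustration.

```python
import os
import subprocess


def sort_by_position(infile: str, outfile: str, tmpdir: str) -> None:
    """Sort a tab-separated file by chromosome, then numerically by position.

    Assumed layout: column 1 = chromosome, column 2 = position.
    sort's -T option puts its temporary files in `tmpdir` instead of
    $TMPDIR or /tmp.
    """
    env = dict(os.environ, LC_ALL="C")  # byte-wise, locale-independent ordering
    subprocess.run(
        ["sort", "-T", tmpdir, "-t", "\t", "-k1,1", "-k2,2n", "-o", outfile, infile],
        check=True,  # raise CalledProcessError if sort fails (e.g. disk full)
        env=env,
    )
```

For example, `sort_by_position("input.txt", "sorted.txt", "/scratch/tmp")` would keep sort's temporaries under `/scratch/tmp` (all three paths are hypothetical). Setting `LC_ALL=C` avoids locale-dependent ordering and is usually faster.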
Here are the changes I have implemented in the script:
Added an option '--split_by_chromosome' to enable sorting of very large files. The methylation extractor output is first written to temporary files, one per chromosome. Each of these files is then sorted by position and deleted afterwards.
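The split-then-sort idea can be illustrated with a short Python sketch. This is not the script's actual code: the function name `sort_by_chromosome`, the default column indices (chromosome in column 1, position in column 2) and the absence of a header line are all assumptions.

```python
import os
import tempfile
from typing import IO, Dict


def sort_by_chromosome(infile: str, outfile: str,
                       chrom_col: int = 0, pos_col: int = 1) -> None:
    """Sort a tab-separated file by chromosome, then by position, holding only
    one chromosome's records in memory at a time.

    chrom_col / pos_col are 0-based column indices (assumed layout; adjust
    them to match the real extractor output). Assumes no header line.
    """
    with tempfile.TemporaryDirectory() as tmpdir:  # deleted automatically at the end
        handles: Dict[str, IO[str]] = {}
        paths: Dict[str, str] = {}
        # Pass 1: append each record to a temporary file for its chromosome.
        with open(infile) as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                if not line.endswith("\n"):
                    line += "\n"  # the last line may lack a newline
                chrom = line.split("\t")[chrom_col]
                if chrom not in handles:
                    # Index-based file names avoid problems with unusual contig names.
                    paths[chrom] = os.path.join(tmpdir, f"{len(paths)}.txt")
                    handles[chrom] = open(paths[chrom], "w")
                handles[chrom].write(line)
        for fh in handles.values():
            fh.close()

        # Pass 2: sort each chromosome's records by position and write them out.
        with open(outfile, "w") as dst:
            for chrom in sorted(paths):  # lexicographic: chr10 sorts before chr2
                with open(paths[chrom]) as fh:
                    records = fh.readlines()
                records.sort(key=lambda rec: int(rec.split("\t")[pos_col]))
                dst.writelines(records)
```

The sketch keeps one file handle open per chromosome, so an assembly with thousands of contigs could hit the operating system's open-file limit; reopening files in append mode would be one way around that.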
Added an option '--counts', which adds two more columns to the bedGraph output file:
Column 5: count of methylated calls per position, and
Column 6: count of unmethylated calls per position.
Technically, this means the output is no longer in strict bedGraph format, but it makes additional calculations with the output possible.
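A small sketch of what an output line looks like with and without `--counts`. It assumes column 4 holds the percentage of methylated calls and that input positions are 1-based (converted to bedGraph's 0-based, half-open interval); `bedgraph_line` is a hypothetical helper, not part of the script.

```python
def bedgraph_line(chrom: str, pos: int, n_meth: int, n_unmeth: int,
                  counts: bool = False) -> str:
    """Format one cytosine position as a tab-separated bedGraph-style line.

    Assumes `pos` is 1-based and converts it to the 0-based, half-open
    interval [pos - 1, pos) used by bedGraph.
    """
    total = n_meth + n_unmeth
    if total == 0:
        raise ValueError("position has no methylation calls")
    percent = 100.0 * n_meth / total  # column 4: percent methylated
    fields = [chrom, str(pos - 1), str(pos), f"{percent:g}"]
    if counts:
        # Extra columns 5 and 6: methylated and unmethylated call counts.
        fields += [str(n_meth), str(n_unmeth)]
    return "\t".join(fields)
```

The counts matter because a percentage alone hides coverage. For example, pooling a position with 3 methylated / 1 unmethylated calls (75%) and another with 1 / 9 (10%) gives 4 / 14, about 28.6%, whereas averaging the two percentages would give 42.5%.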