I am a beginner at programming and Python, but I think I have a very simple question.
I have large tab-delimited files that contain lines like this, for example:
10000 7
20000 1
30000 2
60000 3
What I want is a file that also contains the 'missing' lines, like this:
10000 7
20000 1
30000 2
40000 0
50000 0
60000 3
The files are rather large because I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think this background is even relevant, though: I just want to write some simple Python code that adds these lines to the file using if/else statements.
So if a line's position does not equal the previous line's position + 10000, the 'missing' line is written; otherwise the line is written as it occurs.
I foresee just one problem with this: several lines in a row can be missing (as in my example). Does anyone have a smart solution for this simple problem?
Many thanks!
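One possible way to handle several missing lines in a row is to replace the single if/else with a while loop that keeps writing zero-count lines until the expected position catches up with the position actually read. The following is only a minimal sketch under some assumptions not stated in the question: positions are sorted in ascending order, every position is a multiple of the 10000 window size, and each line holds exactly two whitespace-separated integers. The function names `fill_missing_windows` and `fill_file` and the `step` parameter are made up for illustration.

```python
def fill_missing_windows(lines, step=10000):
    """Yield (position, count) pairs, adding zero-count pairs for skipped windows.

    Assumes positions are sorted ascending and are multiples of `step`.
    """
    expected = None  # position we expect on the next line; unknown before the first line
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # skip blank lines
        position, count = int(fields[0]), int(fields[1])
        # A while loop (instead of one if/else) covers any number of
        # consecutive missing windows.
        while expected is not None and expected < position:
            yield expected, 0
            expected += step
        yield position, count
        expected = position + step


def fill_file(in_path, out_path, step=10000):
    """Stream in_path line by line and write the gap-filled result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for position, count in fill_missing_windows(src, step):
            dst.write(f"{position}\t{count}\n")
```

Because the input is read one line at a time and the output is written as it is produced, the whole file never has to fit in memory, which should help with large files. Note that this sketch only fills gaps between existing lines: it does not add windows before the first position or after the last one, and it does not know about chromosome boundaries if a file contains more than one chromosome.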