  • visse226
    Junior Member
    • Nov 2016
    • 9

    Add 'missing' lines of data by using python code

    So I am a beginner when it comes to programming and Python, but I think I have a very simple question.

    I have large tab-delimited files that for example contain lines like this:

    10000 7
    20000 1
    30000 2
    60000 3

    What I want to have, is a file that also contains the 'missing' lines, such as this:

    10000 7
    20000 1
    30000 2
    40000 0
    50000 0
    60000 3

    The files are rather large, as I am working with whole-genome sequence data. The first column is basically a position in the genome and the second column is the number of SNPs I find within that 10 kb window. However, I don't think this information is even relevant; I just want to write a simple Python script that adds these lines to the file using if/else statements.

    So if the position does not match the position of the previous line + 10000, the 'missing' line is written; otherwise the existing line is written as it is.

    I just foresee one problem in this, namely when several lines in a row are missing (as in my example). Does anyone have a smart solution for this simple problem?

    Many thanks!
  • wdecoster
    Member
    • Oct 2015
    • 97

    #2
    An easy solution would be to loop over the file and have a variable 'previous':

    !Untested sample code generated by tired coffee deprived me:

    Code:
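    previous = 0
    with open("input.tsv") as infile:  # hypothetical file name
        for line in infile:
            now = int(line.split('\t')[0])  # convert the position from text to a number
            # range() takes the step positionally; it is empty when nothing is missing
            for n in range(previous + 10000, now, 10000):
                print(f"{n}\t0")
            print(line, end='')  # line already ends with a newline
            previous = now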
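
A slightly fuller sketch of the same loop, written as a reusable function that streams the file line by line, so a whole-genome table never has to fit in memory. The names `fill_missing_windows` and `fill_file` and any file paths are hypothetical; the sketch assumes sorted, tab-delimited `<position>\t<count>` lines whose positions are multiples of the 10 kb step.

```python
from typing import Iterable, Iterator

STEP = 10000  # window size in bp (10 kb windows, as in the question)


def fill_missing_windows(lines: Iterable[str], step: int = STEP) -> Iterator[str]:
    """Yield the input lines, inserting '<position>\t0' for every skipped window.

    Assumes each line is '<position>\t<count>', positions sorted ascending
    and aligned to multiples of `step` (an assumption, not checked here).
    """
    previous = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines, e.g. a trailing empty line
        position = int(line.split("\t")[0])
        # range() is empty when no window is missing, and yields every
        # skipped position when several windows in a row are missing
        for missing in range(previous + step, position, step):
            yield f"{missing}\t0"
        yield line
        previous = position


def fill_file(in_path: str, out_path: str, step: int = STEP) -> None:
    """Hypothetical helper: read in_path and write the gap-filled table to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for out_line in fill_missing_windows(infile, step):
            outfile.write(out_line + "\n")
```

Usage would be something like `fill_file("snps_per_window.tsv", "snps_filled.tsv")` (both names made up). Because `range()` already produces nothing when the next window follows directly, no separate if/else branch is needed, and the case of several missing lines in a row is handled by the same loop.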

    • visse226
      Junior Member
      • Nov 2016
      • 9

      #3
      I will try this soon, definitely! It always looks so simple in the end, but writing it yourself is still a struggle when you've only just started figuring out coding. Thank you so much; I might come back to it!

      • visse226
        Junior Member
        • Nov 2016
        • 9

        #4
        If I do this, though, I get an error saying that the range function does not take keyword arguments. Not sure how to solve this yet.
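
The error comes from `range(..., step=10000)`: the built-in `range(start, stop[, step])` only accepts positional arguments, so the step has to be passed as the third positional value. A minimal illustration, using the gap between 30000 and 60000 from the example data:

```python
# range() accepts only positional arguments, so this raises TypeError
try:
    range(40000, 60000, step=10000)
except TypeError as err:
    keyword_error = err
    print("TypeError:", err)

# Passing the step positionally works; the stop value itself is excluded
previous, now = 30000, 60000
missing = list(range(previous + 10000, now, 10000))
print(missing)  # [40000, 50000]
```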

        • CHObot
          Member
          • May 2013
          • 11

          #5
          I won't write out the code since I have to go to a meeting, but you could also take advantage of the power of Pandas data frame objects. If you are new to Python, learn Pandas as soon as possible.

          But you could create a data frame of one column that contains the values:
          10000
          20000
          30000
          ...
          max_value

          Then create a data frame of your actual values and do a left "join" of the first table with it. Every position from the full list is kept, and the windows that are missing from your file come out as NaN, which you can then replace with 0.
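
A minimal pandas sketch of that join, assuming the first window starts at 10000 as in the question; the column names `position` and `snps` are made up, and a small in-memory string stands in for the real tab-delimited file:

```python
import io

import pandas as pd

STEP = 10000  # window size from the question

# Stand-in for the real file; for a file on disk you would pass its path instead
raw = io.StringIO("10000\t7\n20000\t1\n30000\t2\n60000\t3\n")
counts = pd.read_csv(raw, sep="\t", header=None, names=["position", "snps"])

# One-column frame with every expected window position up to the maximum
all_positions = pd.DataFrame(
    {"position": range(STEP, counts["position"].max() + 1, STEP)}
)

# Left join keeps every expected position; absent windows get NaN, replaced by 0
filled = all_positions.merge(counts, on="position", how="left")
filled["snps"] = filled["snps"].fillna(0).astype(int)

# Hypothetical output path:
# filled.to_csv("output.tsv", sep="\t", header=False, index=False)
print(filled.to_string(index=False))
```

The `fillna(0).astype(int)` step is needed because the NaNs introduced by the join turn the count column into floats. Note that this approach loads the whole table into memory, which is usually fine for per-window counts but is a trade-off compared with the line-by-line loop.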
