Since I started working at Xandem, I have to keep track of the hours I work and enter them into a time card. I have no problem writing the times I come and go, but it is annoying to then have to calculate how many hours I actually worked. Sometimes, the amount of hours can get complicated, say for example, you come into work in the morning, leave in the afternoon to run some errands, come back in the evening, and take a 10 minute break. The math is not difficult, it is just monotonous.
I decided to create a script that could take care of this repetitive task for me. As I thought more about how to accomplish this, I noticed that I have a specific way of writing the time that I worked. On a normal day, my time worked might look something like “8:05 AM to 4:12 PM”. If I take a lunch, then it might look like “8:05 AM to 4:12 PM - 45 min”. I wanted this script to be flexible enough that I could handle any combination of time ranges (“8:05 AM to 4:12 PM”) and time adjustments (+/- 45 min). As I thought about how I would parse this time text, some of the things I learned from my compilers class started coming back to me. I decided that I would try to implement the ideas from that class (as well as I could remember).
First, I needed to come up with a grammar for my “time language”. I wrote out a bunch of examples of how I would want to use this mini-language and decided on this1 :
<COMPOUND_STATEMENT> : <STATEMENT> (and <COMPOUND_STATEMENT>)* <STATEMENT> : <ABS_TIME> to <ABS_TIME> (<REL_OPERATOR> <REL_TIME>)* <REL_OPERATOR> : -, + <ABS_TIME> : XX:XX AM|PM <REL_TIME> : X+ min|hour|mins|hours
Next, I needed some way of tokenizing the input into the tokens of the grammar. After doing some research, I found an article that describes an undocumented class,
Scanner, in the
re module. The article does a good job of describing how the
Scanner works and gives an example on how to use it, but essentially you can give the
Scanner class a list of regular expression and function pairs. When it matches the given regular expression, it calls the associated function. The function can return anything, but in terms of tokenizing, it would return some kind of token to be used for parsing later on. Here is the scanner that I came up with for my time parsing problem:
# tokenize functions abs_time = lambda scanner, token: ("ABS_TIME", token) rel_time = lambda scanner, token: ("REL_TIME", token) rel_operator = lambda scanner, token: ("REL_OPERATOR", token) abs_operator = lambda scanner, token: ("TO") if token == "to" else ("AND") # tokenizer scanner = re.Scanner([(r"(0?[1-9]|1[0-2]):[0-5][0-9]\s*?(AM|PM)", abs_time), (r"\d+\s*?(mins|min|hours|hour)", rel_time), (r"to", abs_operator), (r"and", abs_operator), (r"-", rel_operator), (r"\+", rel_operator), (r"\s+", None)]) tokens, remainder = scanner.scan(input_)
The results of “8:05 AM to 4:12 PM - 45 min” look like:
[('ABS_TIME', '8:05 AM'), 'TO', ('ABS_TIME', '4:12 PM'), ('REL_OPERATOR', '-'), ('REL_TIME', '45 min')]
Now that we have a list of tokens, we need to combine them together to form some kind of output, in my case, the total hours worked. There are many different ways of doing this, but the general process I followed was removing items from the stack (list of tokens), verifying that they are the tokens I am expecting (based on the grammar), and combining them together. For example, to parse part of a
<STATEMENT>, I had
def process_statement(tokens, total_time): # removing items from the stack time1 = tokens.pop() op = tokens.pop() time2 = tokens.pop() # checking to make sure all the tokens are good if time1 != "ABS_TIME": raise Exception if op != "TO": raise Exception if time2 != "ABS_TIME": raise Exception # combine tokens together total_time += calc_time_diff(time1, time2) return process_next_stament(tokens, total_time)
To stay true to my compilers class, I tried to write this script as functional as I could (i.e. no loops, no global state, etc.). I think the script turned out well. It was fun and challenging to apply the principles of lexing and parsing to a small problem that I face almost every day.
I do not profess to being a programming language expert. As a result, the grammar and other ideas were my best effort to come up with something that worked. In no way is this the “best” or the “only” way to solve this problem. ↩