GodSend In the life of a Computer science grad student

18Jan/113

Solving the Orbitz Coding Challenge

The Orbitz coding challenge was announced sometime mid-2010, and I heard about it through a department mailing list. I don't think Orbitz spent a lot of effort in publicizing it, because the whole design of the contest webpage seemed ad-hoc and naive, with Google docs descriptions and no links to www.orbitz.com, but just their logo! I was slightly apprehensive about participating because of crowd and had second thoughts about the effort. To add to this misery, the rules mentioned that they would randomly pick 10 winners from all correct solutions! - I suck at chance. The contest was targeted at students studying in any US university, to earn themselves of free round-trip airfare within the US to relax :)

The Problem: The style of the contest was quite familiar - a given problem and associated input formats, guarantees, etc. The problem itself dealt with finding the shortest one-way itinerary between the given origin and destination airports. The knowledge base is a list of all flights between the ~100 airports, each entry accompanied by its timings and unique flight numbers. The required output is a set of 5 best itineraries where the comparison metric was (in that order) least total duration, least layovers (cities on the way) and smallest flight numbers(?). We were provided a small sample database of flights and a handful of sample queries and their correct responses for testing, and the actual magnitude of the real tests were unknown. Thus, one of the main motivations was to tighten the solution as much as possible. And finally, the allowed programming languages were Python, Ruby and Java.

My solution: I chose to implement this as a Djikstra-like graph shortest path solution. Hence intuitively each city (airport) would be a node in the graph and flights would provide the directed edges between them. This can be implemented using a priority queue (standard idea). However, an interesting aspect of this particular problem is that not all incoming flights at an airport can be paired with every outgoing flight. This is because of their actual arrival and departure times respectively which limits the possible pairings. Therefore a flight arriving at time t at an airport can only be followed by flights departing at time greater than t at the same airport (the minimum layover time was defined to be 1 minute).

Although this means that the total number of possible paths between the origin and destination are limited, the search had to be context driven and it is not possible to discard airports as useless using a single flight path. It could definitely be possible that a flight path through an airport have a very high cost, but a flight that reaches earlier (but discovered later) could very well have a better cost. I settled for maintaining the whole flight path in every search path and contextually identify matching flights. As in Djikstra's solution, these paths were still being maintained in a priority queue and only the best were picked at each iteration for inspection. I did give this some extra thought for optimization possibilities such as a maximum limit on the best paths that would be maintained for each airport, but its required code looked cumbersome and inefficient, and moreover hard to prove. My implementation was in Python (as I was already familiar with it, and I definitely prefer it over Java!). Each flight had its own class object and hence the itinerary object just had to keep pushing the flights into a list. The itinerary object eventually went into the priority queue.

Optimizations: Of course, I optimized everything out of the critical loop handling node expansions. I obtained the most performance gains from profiling my code using the inbuilt Python profiler. I realized immediately that the itinerary object was too big and its deepcopy (during expansion at a node) was consuming the most time in a run. Removing the class data fields one by one and replacing their usage with method calls, I realized that recomputing them was actually faster than storing and copying them everywhere. The final fastest implementation I obtained was probably not what I would have thought of as good looking code, but that was what I wanted! (forget Software Engineering temporarily). In total, I reduced the time by almost half.

Winners were supposed to be notified sometime in November last year, and hence I conveniently assumed that I had not won, and I was not able to find any related news articles on the web either. I got an email today morning from Orbitz recruitment that I had won along with my free round trip voucher! I am still unaware of other winners and hope to hear some statistics! And finally, I hope to use this to travel to some new place that provides excellent opportunities for photography and something new&different.

I don't want to post the solution here because of complex legal agreements, but if you are really interested, send me an email (nkarthiks@gmail.com) and we can talk.