Reddit’s Interesting Challenges for Software Engineer Hires

Reddit announced:
Earlier this week we announced four new hires, and today we’d like to get started on the next batch: We’re hiring three more engineers! Ideally, we’d like to get a frontend programmer, a backend programmer, and someone in between.

Companies hire people all the time. Reddit rarely does, so for them it’s kind of a big deal, but what makes this particular round interesting is the way it handles the job application process.

How to apply

Usually the first step of an application process is to solicit resumes. Candidates are forced to boil years of work down to a few bullet points, attempting to demonstrate what sets them apart without being overly verbose or picking the wrong font. And writing cover letters — yuck! You stare at your email composition window, sweating over every word and punctuation mark. Do I sign it “Yours” or “Sincerely”? If I pick the wrong one they won’t hire me!

And then we have to read through hundreds of resumes and cover letters (even though the very fact that we’re hiring means we have a big backlog of other stuff that needs to get done) and pass them around and scratch our heads, trying to figure out who’s the real deal and who’s dead-wood-plus-exaggeration. It’s like trying to pick the best cellphone by comparing the manufacturers’ press releases.

Instead of first doing all that, and then bringing people in to see if they can code, we’re going to do the opposite. So at this first step of the process, we’re not yet interested in your resumes or cover letters or references or GPAs. We’ll address that if you survive to the second stage; the first thing we want to do is narrow it down to the hackers.

So we’ve prepared two challenges. They both reflect real-world problems that we’ve had to solve — one at the beginning of reddit’s existence, and one that arose when the site became really popular. The first is targeted at front-end wizards, those who might not know how to write database code but wow are they a UI master. The second is for the kind of person who prefers a dark basement and a Unix prompt, someone who hates having to touch the mouse and who might be allergic to CSS.

Pick the one that best suits your talents and see if you can tackle it. Don’t do both.

1- Frontend challenge

We want you to build a reddit clone entirely in HTML, Javascript, and CSS. It will maintain its state entirely client-side (HTML5 localstorage, cookies, whatever), and it’s fine for it to be single-user. In fact, we want to leave as much of this challenge open to interpretation as possible.

The goal here is to show off your ability to make a slick website, not to make something that we’re going to deploy in production, so you don’t have to worry about scaling, spam, cheating, or even making it browser-portable. If there’s some really neat thing that you need Javascript list comprehensions for, or your textareas look best with -moz-border-style:chickenfeet, go ahead and use it. We’ll defer the drudgery of cross-browser testing and compatibility hacks for when you’re on the payroll; for now, just tell us what OS and browser to use (within reason) and that’s the one we’ll use to judge your work.

2- Backend challenge

Like all websites, reddit keeps logs of every hit. We roll them every morning at around 7am and keep the last five days uncompressed. Each of those files is about 70-72 GB. Here’s a sample line; IPs have been changed for privacy reasons and linebreaks have been added for legibility:

Feb 10 10:59:49 web03 haproxy[1631]: 10.350.42.161:58625 [10/Feb/2011:10:59:49.089] frontend
pool3/srv28-5020 0/138/0/19/160 200 488 - - ---- 332/332/13/0/0 0/15 {Mozilla/5.0 (Windows; U;
Windows NT 6.1; en-US; rv:1.9.2.7) Gecko/20100713 Firefox/3.6.7|www.reddit.com|
http://www.reddit.com/r/pics/?count=75&after=t3_fiic6|201.8.487.192|17.86.820.117|}
"POST /api/vote HTTP/1.1"
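For readers who want to experiment, here is a minimal Python sketch of pulling the time of day out of a line like the sample above. The function name `seconds_of_day` and the regular expression are illustrative assumptions based only on the "Feb 10 10:59:49" prefix shown; nothing here comes from reddit's own tooling.

```python
import re

# Assumed prefix layout, taken from the sample line: "Mon DD HH:MM:SS".
PREFIX = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) +(\d{1,2}):(\d{2}):(\d{2})")


def seconds_of_day(line):
    """Return seconds since midnight for a log line, or None if no timestamp is found."""
    m = PREFIX.match(line)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.group(3, 4, 5))
    return hours * 3600 + minutes * 60 + seconds
```

Only the first few bytes of each line are inspected, so the very long user-agent and referrer fields never need to be parsed.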

We often have to find the log line corresponding to an event — a “you broke reddit” or a weird thing someone saw or to investigate cheating. We used to do it like this:

$ grep '^Feb 10 10:13' haproxy.log > /tmp/extraction.txt

But as traffic grew, it started taking longer and longer. First it was “run the command, get a cup of coffee, check the results.” Then it was, “run the command, read all today’s rage comics, check the results.” When it got longer than that, we realized we needed to do something.

So we wrote a tool called tgrep and it works like this:

$ tgrep 8:42:04
[log lines with that precise timestamp]
$ tgrep 10:01
[log lines with timestamps between 10:01:00 and 10:01:59]
$ tgrep 23:59-0:03
[log lines between 23:59:00 and 0:03:59]
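The three invocations above suggest a small grammar: a time with seconds is an exact match, a time without seconds covers the whole minute, and a dash separates the two ends of a range. The sketch below is one possible reading of that grammar in Python; `parse_time` and `parse_range` are hypothetical names, and returning an end that is smaller than the start for a range such as 23:59-0:03 is an assumption about how a midnight crossing might be signalled.

```python
def parse_time(text, is_end):
    """Convert "H:MM" or "H:MM:SS" to seconds since midnight.

    A missing seconds field expands to :00 for a start and :59 for an end.
    """
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 3:
        h, m, s = parts
    elif len(parts) == 2:
        h, m = parts
        s = 59 if is_end else 0
    else:
        raise ValueError("expected H:MM or H:MM:SS, got %r" % text)
    return h * 3600 + m * 60 + s


def parse_range(spec):
    """Return (start, end) in seconds of day, both inclusive.

    end < start means the requested range crosses midnight.
    """
    start_text, _, end_text = spec.partition("-")
    start = parse_time(start_text, is_end=False)
    end = parse_time(end_text or start_text, is_end=True)
    return start, end
```

With these definitions, `parse_range("10:01")` covers 10:01:00 through 10:01:59, matching the second example.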

By default it uses /logs/haproxy.log as the input file, but you can specify an alternate filename by appending it to the command line. It also works if you prepend it, because who has time to remember the order of arguments for every little dumb script?
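Accepting the filename before or after the time is easy if the two kinds of argument can be told apart by shape. A rough Python sketch, assuming a time argument always looks like H:MM, H:MM:SS, or a dash-separated pair of those (the names `TIME_SPEC` and `split_args` are made up for illustration):

```python
import re

# Assumed shape of a time argument; anything else is treated as a filename.
TIME_SPEC = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?(-\d{1,2}:\d{2}(:\d{2})?)?$")
DEFAULT_LOG = "/logs/haproxy.log"  # default path stated in the announcement


def split_args(args):
    """Return (time_spec, path) regardless of the order the two were given in."""
    spec, path = None, DEFAULT_LOG
    for arg in args:
        if spec is None and TIME_SPEC.match(arg):
            spec = arg
        else:
            path = arg
    if spec is None:
        raise SystemExit("usage: tgrep TIME[-TIME] [FILE]")
    return spec, path
```

A file that happens to be named like a time, such as "10:01", would be misread by this heuristic, which is one of the special cases worth listing.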

Most importantly, tgrep is fast, because it doesn’t look at every line in the file. It jumps around, checking timestamps and doing an interpolative search until it finds the range you’re looking for.
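To make the "jumping around" concrete, here is a Python sketch of searching a sorted log by byte offset. Note that it uses a plain binary search, not the interpolative search reddit describes: binary search needs roughly log2(file size) probes, while an interpolating version would guess the next offset from the timestamps and usually needs fewer. The function names and the `key` callback (anything that maps a line to a comparable, non-decreasing value) are assumptions for illustration.

```python
def line_start_at_or_after(f, pos):
    """Return the offset of the first line that begins at or after byte `pos`."""
    if pos == 0:
        f.seek(0)
        return 0
    f.seek(pos - 1)
    f.readline()  # consume the rest of the line containing byte pos-1
    return f.tell()


def first_offset_at_or_after(f, size, target, key):
    """Offset of the first line with key(line) >= target, or `size` if there is none.

    `f` must be opened in binary mode and its lines sorted by `key`.
    """
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        start = line_start_at_or_after(f, mid)
        line = f.readline()  # the file is already positioned at `start`
        if line and key(line) < target:
            lo = start + 1   # this line and everything before it is too early
        else:
            hi = mid         # the wanted line starts at or before `start`
    return line_start_at_or_after(f, lo)
```

Once the start offset is known, the matching lines can be printed by reading forward until a line's key exceeds the end of the requested range, so only the matching region is ever read sequentially.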

For this challenge, reimplement tgrep. You can assume that each line starts with a datetime, e.g., Feb 10 10:52:39 and also that each log contains a single 24-hour period, plus or minus a few minutes. In other words, there will probably be one midnight crossing in the log, but never more than one. The timestamps are always increasing — we never accidentally put “Feb 1 6:42:17” after “Feb 1 6:42:18”. And our servers don’t honor daylight saving time, so you can ignore that whole can of worms. [Edit: you asked for a script to generate a sample log, so we wrote one.]
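The midnight crossing is the awkward part: times of day are not sorted across the whole file, and a log that runs a few minutes over 24 hours can contain the same time of day twice. One way to reason about it, sketched below in Python, is to convert every timestamp to seconds since midnight of the log's first day and then work out which stretch or stretches of the log a query could refer to. The names `monotonic_key` and `candidate_ranges` are hypothetical, and the approach is an assumption rather than how the original tgrep works.

```python
DAY = 86400


def monotonic_key(line_date, line_seconds, first_date):
    """Seconds since midnight of the log's first day.

    Dates are compared only for equality, e.g. ("Feb", 10) vs ("Feb", 11),
    relying on the stated guarantee of at most one midnight crossing.
    """
    return line_seconds + (0 if line_date == first_date else DAY)


def candidate_ranges(start, end, first_key, last_key):
    """Yield inclusive (lo, hi) key ranges, in log order, that a query may match.

    start/end are seconds of day; end < start means the query crosses midnight.
    first_key/last_key are the monotonic keys of the first and last log lines.
    """
    if end < start:
        end += DAY
    for shift in (0, DAY):  # the query may fall on the first or the second day
        lo, hi = start + shift, end + shift
        if lo <= last_key and hi >= first_key:
            yield max(lo, first_key), min(hi, last_key)
```

For a log running from 7:00:00 one day to 7:02:00 the next, a query for 7:01 yields two ranges, one near each end of the file, which is exactly the kind of special case the first judging criterion is asking about.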

You can use whatever programming language you want. (If you choose Postscript, you’re fired.) The three judging criteria, in order of importance:

1. It has to give the right answer, even in all the special cases. (For extra credit, list all the special cases you can think of in your README)
2. It has to be fast. During testing, keep count of how many times you call lseek() or read(), and then make those numbers smaller. (For extra credit, give us the big-O analysis of the typical case and the worst case)
3. Elegant code is better than spaghetti
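For the second criterion, a thin wrapper around the file object is one simple way to keep count during testing. The Python sketch below uses a hypothetical `CountingFile` class; it counts calls made by the program, which is only a proxy for the real lseek() and read() system calls because Python's buffered files may batch or skip them.

```python
class CountingFile:
    """Wrap a binary file object and count seek and read calls made through it."""

    def __init__(self, raw):
        self._raw = raw
        self.seeks = 0
        self.reads = 0

    def seek(self, offset, whence=0):
        self.seeks += 1
        return self._raw.seek(offset, whence)

    def read(self, size=-1):
        self.reads += 1
        return self._raw.read(size)

    def readline(self):
        self.reads += 1  # counted as one read, whatever the buffer does underneath
        return self._raw.readline()

    def tell(self):
        return self._raw.tell()
```

Wrapping the log with `CountingFile(open(path, "rb"))` and printing `seeks` and `reads` after each query gives a number to drive down while comparing search strategies.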

Final points

* When you’re ready to submit your work, send a PM to #redditjobs and we’ll tell you where to send your code. You can also write to that mailbox if you need clarification on anything.
* We’d like all the submissions to be in by Tuesday, February 22.
* Regardless of which project you pick, we ask you to please keep your work private until the end of March. After that, you can do whatever you want with it — it’s your code, after all!
* Graduating college seniors are welcome to apply: for an amazing candidate, we’ll wait a few months. But we’re not going to let anybody quit school to work for us.
* Some of you might be thinking, “I can’t believe reddit is going to make all these poor applicants slave over a hot emacs for two weeks just for the privilege of being allowed to apply for a dumb old job.” Well, first off, it’s supposed to be fun. If you don’t see the joy in either of these puzzles, please don’t apply. And second, we’re not expecting anyone to spend weeks on this, or even days. We aimed to make the challenges something that could be put together in a weekend by the sort of programmer we’re looking for. And these people do exist — this guy wrote a reddit clone in assembly over the course of two evenings with a dip pen. Okay, not with a dip pen. But still, quit yer yappin.