I was encouraged by a friend, Ashwini Kumar, principal engineer at Senscio Systems, to participate in the MIT Big Data Hackathon at Hack/Reduce in Cambridge, MA this weekend. The very idea of signing up to work into the wee hours among the super-talented people I imagined would be there seemed both intense and pretty intimidating. But he’d been to these before and assured me they are really positive sessions from which you learn a lot you’d never expect. Plus my kids thought the idea was cool. So I was in. And wow, they were right.

The event started with a social mixer and engaging talks from the four firms sponsoring the weekend.

  • Sid Probstein of Knowledgent demonstrated how the simplicity of creating data lakes and crunching diverse data with Hadoop is turning previously untouchable pharma marketing analyses into something a team can do in a couple of weeks.
  • Alan Wagner of Tamr showed how the new startup by Michael Stonebraker (co-founder of Vertica and VoltDB) is revolutionizing how companies synthesize data silos across big enterprises, and spoke about how Tamr makes it feasible for pharma companies to sync their clinical trial data with CDISC standards.
  • Gregor Stewart of Basis Technology explained how they help companies match identities, so that people can tell, for instance, whether the prospective Airbnb guest at their condo is the same person as a notorious scamster on Kickstarter with the same name (yikes!).
  • Madison May of Indico.io presented their sentiment analysis API for text and photos.

But these folks weren’t mere keynote speakers. Their firms made their APIs available to the participants, and they would serve as the judges the next evening.

Hack/Reduce is a perfect space for an event like this – open brick, big windows, space, chairs, tables, screens, power outlets, fast wifi – and the event was expertly coordinated by Evgeniya and her fellow MBA students from the MIT Sloan School. Everything ran precisely on schedule.

After the talks, participants were invited to give a 30-second pitch for a project idea. A scientist at MGH suggested doing something with the new Apple ResearchKit API. I had a public healthcare dataset I’d been wanting to build on. Others pitched too, and then people got talking. Groups coalesced. People mingled. Ideas combined. Within a couple of hours, teams had formed around kernels of ideas, all of which were very different from those originally pitched.

People came from a wide array of backgrounds – software, finance, clinical trials, art, college, research. Our team was Brice Lemke, Jing Lin, Michael Yu, Chang Liu, Ashwini Kumar and me. It was a great mix – from college freshmen to grad students to working professionals. We converged on the idea of combining sentiment analysis from Indico and Basis, content from Twitter and NYTimes, and consumer behavior from BestBuy, taking a polyglot approach with tools like R, Shiny and Python. Then we moved to Panera in Kendall Square.

By closing time we had a GitHub repo set up and developer accounts at BestBuy, Twitter and NYTimes, and were gathering data via their REST APIs. Having worked in the corporate world, I felt the sheer excitement of starting to code without first writing unit tests, conducting market research, or scheduling meetings to get a budget. Pure bacchanalia.

At 11pm we agreed to reconvene at 9am. I went home, checked in some changes and merged with other work before turning in. I was back early Saturday morning to find that the rest of the team – including one my age, a dad with kids and a very understanding spouse! – had worked through the night somewhere along the Infinite Corridor of MIT.

Saturday was a fun mix of progress, setbacks and banter. Development queries inadvertently hit API throttle limits, among other hiccups. In the end it all worked.
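
One common way to cope with throttle limits like these is to retry with exponential backoff. The following is only an illustrative Python sketch under stated assumptions, not code from the hackathon project: `ThrottledError`, `call_with_backoff`, and the `fetch` callable are hypothetical names, and a real client would raise the error when the API signals a rate limit (for example with an HTTP 429 response).

```python
import time


class ThrottledError(Exception):
    """Hypothetical error: raised by `fetch` when the API reports a rate limit."""


def call_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on ThrottledError, wait base_delay * 2**attempt and retry.

    `sleep` is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except ThrottledError:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
```

Wrapping each API request in something like `call_with_backoff(lambda: get_page(url))`, where `get_page` is whatever function actually issues the request, makes a throttled query wait and retry instead of failing outright.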

Key Learning: Take a screenshot every time you make visible progress. You never know if a merge issue or API throttle will halt things five minutes before the final presentation.

Six teams presented, each one a cool learning experience to watch. For example:

  • Plotting sentiment trends in news coverage of Ukraine by automatically analyzing all online news articles by date.
  • Using Apple HealthKit to run observational clinical trials on healthy participants to see who might get a disease, complementing ordinary trials that focus only on patients who already have it.
  • Helping a group reach a decision that satisfies the current preferences of its members.
  • Automatically choosing among alternative profile photos for different uses, whether job applications, social media, or branding.
  • Comparing the similarity of different sentiment analysis engines against common benchmarks.

Beyond the technical prowess, each team managed to pull together a polished presentation.

As it happened, our team won first place... yay! Beyond the cash prize, the really cool prizes were a year’s subscription to the sponsors’ APIs and a chance for mentorship from Andy Palmer, co-founder of Vertica.

Before leaving, we talked about deploying our site to the cloud. But mixing Python, R, and Shiny Server, and sorting out various undocumented dependencies and hardcoded paths, was sure to make that a hassle. Madison May of Indico.io, one of the event’s sponsors, looked at his watch and said he had twenty minutes… as in “…Totally! Why not?” And it worked.

That was the amazing thing about this event – being among people who delight in just doing things and seeing what’s possible.

Are hackathons a thing in fields beyond software? Are there events where interested people with diverse skillsets assemble to imagine solutions to political, business, or environmental challenges? There certainly should be. And I hope there are!