Tuesday, October 22, 2013

We're building a database of US flight on-time performance

As part of the classwork in the University of Texas at Austin Business Analytics program, we're doing basic database design. I'm already familiar with simple relational databases, so I teamed up with an experienced colleague to do something a little more interesting.

The US Department of Transportation maintains a dataset of the on-time performance of domestic flights operated by major US carriers, going back to 1987. We're building a database to hold this information, with the idea of analyzing it later. Once we've implemented a traditional relational database, we're going to do the whole thing again as a graph database using Neo4j, and then compare its functionality with the relational version. Should be interesting.

We made a beautifully nerdy video of our relational database design. Watch and comment!


Thursday, October 10, 2013

"Evil" Analytics and a Code of Ethics

While talking with a seasoned developer at an Austin Startup Week event last night, the topic of data science ethics came up. He mentioned a project he was familiar with that involved using social media data to identify which of a competitor's employees would be easiest to poach. While such a project is not on the level of what the NSA is up to, it certainly raises the issue of how easy (and tempting) it is to use data for questionable purposes.

Ethics often comes up in discussions of big data's impact, but it rarely gets more than a cursory acknowledgment. Yet the ethical implications of big data are staggering and need to be discussed seriously. It is better to have this tough conversation now than to wait until it can't be ignored. If tales of ethical lapses by data scientists pile up, the damage to the profession could be irreversible; we'd find ourselves in the position of bankers, but with less pay and no political connections. Now is the time to lay down the rules, so that data projects that violate mainstream ethical standards can be labeled as such and their negative impact on the field lessened.

Through a colleague in UT Austin's Business Analytics program, I learned about a recent effort to establish a set of ethical standards for data science. Data science, analytics, and big data organizations have proliferated lately, but hopefully this one's focus on ethics will help it succeed. And you can join for free. Let's finish this conversation now, before people outside the data science field finish it for us.

Saturday, October 5, 2013

The Healthcare Big Data Gold Rush

With the Affordable Care Act (Obamacare) in the news so much recently, I've been thinking a lot about my own past experiences in healthcare, and what I saw (and didn't see). Then I came across this New York Times blog post. It's quite a read, but my big takeaways are that doctors are losing their sanity and confidence, while research continues to show that a doctor's demeanor and confidence affect patient outcomes far more than expected.

A combination of irrational exuberance and economic forces (including legislative ones) has created something of a big data gold rush in healthcare. And like any good gold rush, this one will bring disorganized growth, heartbreak, fortunes made and lost, and more than a few dead bodies.

The truism being thrown around is that healthcare is the next industry to be transformed by data, and that over the next decade data will change healthcare more than anything else (perhaps even more than Obamacare). While both proclamations might be true, they strip away the most basic aspect of healthcare: the relationship between the patient and the doctor, the sick and the well, the dying and the living, the needy and the needed. No amount of analysis, legislation, technology, or bureaucracy can get around this fact. And yet it seems to be ignored more each day, sucked under a tidal wave of implementations, electronic health records, politics, "innovations", and process improvement.

Banking and retail are industries where human relationships were never at the core, and it's probably no coincidence that they have responded so well to going under the big-data knife. The continual attempts to humanize these industries ("relationship banking", the ad industry) are a testament to how inhuman money and consumption can be. Healthcare, by contrast, is fundamentally about human connection. There's certainly potential for data to improve the quality of care, but as we march fearlessly into our data-driven healthcare future, the human element is being pushed further and further from the center.

I do not mean to imply that data has nothing to offer healthcare. It clearly does. What I am saying is that, as data professionals, we cannot forget that the problems we attempt to solve are ultimately not about data. To forget this is to do a disservice to our colleagues, our customers, and society. In retail and finance you can do a lot without ever thinking about the living, breathing people at either end of a transaction, but healthcare will not be so accommodating to data-centric problem solving. And yet, as existing data applications in healthcare prove less fruitful than hoped, the industry's answer is more data!

This is bigger than healthcare. If the big-datification of healthcare turns out to be a turkey, think about what that means for the future of analytics and big data.