Thursday, August 7, 2014

Projects: Drugs, Diabetes, Politics, Walmart

I've finally found the time to post about some of the projects I worked on during my second semester in the MS in Business Analytics program at the University of Texas.

First is a project where we text-mined an internet forum discussing illegal drugs. This is one of our visualizations, showing which drugs are commonly taken together:



Next is a project where we analyzed tweets from US Senators to determine what gets retweeted and how Senators can increase their retweets. We learned that being Ted Cruz helps a lot. But even if you aren't Ted Cruz, things like time of day, topics discussed in the tweet, hashtags, and pictures affect the number of retweets.

Then there was a project where we predicted diabetes from electronic medical records. The data came from an old Kaggle competition, but on expert advice we changed our model to optimize for lift (estimating each patient's likelihood of diabetes) rather than making simple yes/no predictions as in the original contest. A big challenge in this project was deciding how to deal with the 1,000 or so attributes we had to analyze. The full report goes into great detail about different feature selection methods and how they performed relative to one another.
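For readers who haven't met lift before, here is a minimal sketch of one common way to measure it. It is not the code from our report. The function name `lift_at`, the `fraction` parameter, and the toy data are all made up for illustration. The idea is to rank patients by predicted likelihood, look at the top slice, and compare the diabetes rate there to the rate across everyone.

```python
def lift_at(y_true, scores, fraction=0.1):
    """Lift in the top `fraction` of cases, ranked by predicted score.

    y_true: list of 0/1 labels (1 = has diabetes).
    scores: list of predicted likelihoods, same length as y_true.
    Illustrative helper only; names and defaults are assumptions.
    """
    if not y_true or len(y_true) != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal length")

    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")

    # Size of the top slice, with at least one case in it.
    n_top = max(1, int(round(len(y_true) * fraction)))

    # Rank cases from highest to lowest predicted likelihood.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top

    # Lift > 1 means the top slice is richer in positives than the whole set.
    return top_rate / base_rate


# Toy data, invented for this example: 10 patients, 2 with diabetes.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
preds = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.15, 0.05, 0.25, 0.12]
top_fifth_lift = lift_at(labels, preds, fraction=0.2)
```

A model tuned this way is judged on how well it concentrates the true cases near the top of the ranking, not on how many individual yes/no calls it gets right.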


