Tuesday, January 28, 2014

Why did my bank say a legitimate credit card charge was fraudulent, but fail to detect actual fraudulent charges?

A friend of mine posted this to Facebook recently:

Modern life may be cushy, but it is not without its trials. Who hasn't experienced the frustration of getting that phone call from the bank (or worse, having their card stop working) because a charge on their card was flagged as possibly fraudulent? Experiencing false-fraud on a card is becoming more and more common--in a very informal survey of eleven of my nearby colleagues, eight have had fraudulent charges appear on their card (and four were affected by the recent Target data breach, but that's another topic altogether).

So what exactly is going on here? Why are banks flagging legitimate charges as fraudulent, while plenty of truly fraudulent charges still slip through? The answer: the software.

Every charge you make on a piece of plastic goes through a computer program that attempts to determine if your charge is fraudulent. Into this program goes information about you, about the transaction you just made, about your location, about contact you've had with the bank recently, about pretty much anything the bank can find. This computer program then does one of three things:
1) Nothing (the charge is almost certainly not fraud).
2) Approve the charge, but notify the customer of suspicious activity (the charge might be fraud).
3) Deny the charge, and cut the card off until the customer is reached (the charge is almost certainly fraud).
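If you like code, that three-tier decision can be sketched in a few lines. The risk score and thresholds here are purely illustrative (no bank publishes its real cutoffs):

```python
# A minimal sketch of the three possible outcomes described above.
# The thresholds are made-up values for illustration only.

def decide(risk_score: float) -> str:
    """Map a fraud risk score in [0, 1] to one of three actions."""
    if risk_score < 0.70:
        return "approve"                  # almost certainly not fraud
    elif risk_score < 0.95:
        return "approve_and_notify"       # might be fraud
    else:
        return "deny_and_suspend_card"    # almost certainly fraud

print(decide(0.10))  # approve
print(decide(0.80))  # approve_and_notify
print(decide(0.99))  # deny_and_suspend_card
```

The interesting engineering is not in this function, of course--it's in how the risk score gets computed in the first place.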

The exact workings of how the inputs turn into one of these decisions are something of a trade secret, but the general approach is very well understood.

The short answer is that the software to detect fraud actually learns from past patterns of fraud. It might be that the store you used your card at is historically 30% more likely to have a fraudulent transaction than the store next door, and this would factor into the decision. It might be that you are transacting online, and online transactions are 150% more likely to be fraudulent. It might be something far more complicated than either of these examples. The real truth is that the patterns the fraud-detection software "learns" are often far too complicated for humans to really grasp the significance of.  It is well-documented that this self-learning software is far better at things like fraud detection than a team of humans attempting to come up with rules on their own. All the humans know is that the software works well enough, and saves a lot of money by flagging fraudulent activity that would otherwise go undetected.
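To make that concrete, here is a toy version of a learned scorer. The feature names and weights are completely invented; a real system combines thousands of features whose weights are fit from historical labeled transactions:

```python
import math

# Hypothetical learned weights -- in practice these come out of training
# on millions of past transactions labeled fraud / not-fraud.
WEIGHTS = {
    "merchant_risk": 2.0,   # e.g. this store's historical fraud rate
    "is_online": 0.9,       # online transactions skew more fraudulent
    "amount_zscore": 0.6,   # how unusual the amount is for this card
}
BIAS = -4.0  # baseline: most transactions are not fraud

def fraud_probability(features: dict) -> float:
    """Combine weighted features through a logistic link into (0, 1)."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))
```

A boring grocery run (all features near zero) scores very low; an unusual online charge at a risky merchant scores much higher. The point of the example is the shape of the thing: the humans choose the feature plumbing, but the weights come from the data.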

Obviously, there is some tweaking, inspection, and overriding done by the software's human masters. And banks have a lot of leeway to define where the lines between clear fraud, possible fraud, and not fraud are drawn (this is the part of the system that most involves human insight, and is most often the part that makes or breaks its usefulness). But this gives a general idea of what's going on behind the scenes after you swipe your card.

So why does this fraud-detection software fail so much? Part of the issue is perception--we tend to focus on the mistakes the software makes and never consider how much worse a human would be. But the software's human masters are also at work behind the scenes, skewing things. Since the cost of missed fraud is usually much higher than the cost of inconveniencing a customer with a false flag, even when you include the loss of customer goodwill, banks tweak their algorithms to be a bit overzealous in what gets marked as fraud; they aim for the "sweet spot" between catching as much fraud as possible and not angering their customers too much.

Ultimately, the short answer to my friend's question is, "We built an algorithm to build a fraud detection system; that algorithm went and built the actual algorithm to determine if a charge is fraud or not. The specifics of what the fraud detection algorithm flags are far less important to us than the algorithm's ability to be right enough to justify its existence."

But here we have two cases of the fraud-detection algorithm failing. So what happened to my friend? Let's begin with the recent Evernote and Netflix charges. Why would a fraud-detection system flag these charges as possibly fraudulent? First, it is important to understand that most stolen credit card numbers come from large leaks like the Target breach, not from the theft of a physical card or from your waiter at Friday's copying your card information when he takes your card away to run the charge.

Now, put yourself in the mind of someone who has stolen credit card numbers, say a few thousand purchased from some underground website. You know that only some of these numbers are going to work. You have the means to create a fake card from the information you have, but just walking into a store with one of your newly minted fake cards might not end well. Your purchase may get denied--or worse, the card might already be marked as stolen and you'd be in a lot of trouble. So what do you do? You feed your stolen numbers into a software script that goes and creates accounts on subscription-based web services. Why such services? Because there is a time-based factor--you can use the fake account you made to check whether the subscription still works the following month. And if the subscription still works, you know you have a goldmine: a credit card that works, with an owner who doesn't pay much attention to their statements. You're assured that this card probably won't be turned away if you use it in person, and if you're really tricky you can make multiple charges over a long period of time.

Because of behavior like this, subscription-based web services are going to be seen by the fraud-detection algorithm as more suspect than, say, groceries at Aldi. Throw in a few other factors (my friend was abroad at the time and presumably her card company knew that, and possibly her activity was erratic on top of that) and suddenly your transaction is getting flagged. Incorrectly.

But wait... shouldn't the fraud-detection algorithm be able to see that these are charges that have been happening for months and months? Yes, it should. But it is important to understand that the whole fraud-detection process has to happen very quickly. Processing times for credit cards are measured in seconds, and much of that time is spent transmitting information to the processing center. The software that detects fraud generally only has milliseconds to make a decision. This complicates things, and makes the engineering of these systems extremely challenging. Verifying every charge that comes in from a subscription-based service against the past month of card activity is slow--it requires a lookup in a database of past transactions, and a lookup in a database that big is going to be too slow (this is changing, and there are ways to partially get around this, but that's a digression).
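One common flavor of workaround, sketched below with hypothetical names, is to keep a small precomputed summary per card that gets updated after each transaction settles. The decision path then does an instant in-memory check ("have I seen this card at this merchant before?") instead of scanning the transaction history:

```python
# Sketch: per-card summary features maintained outside the hot path,
# so scoring never has to query the full transaction history.
from collections import defaultdict

# In a real system this would live in a fast key-value store; here it's
# just an in-process dict keyed by card ID.
card_features = defaultdict(lambda: {"txn_count": 0, "seen_merchants": set()})

def record_transaction(card_id: str, merchant: str) -> None:
    """Update the card's summary after a transaction (done asynchronously)."""
    features = card_features[card_id]
    features["txn_count"] += 1
    features["seen_merchants"].add(merchant)

def is_recurring(card_id: str, merchant: str) -> bool:
    """O(1) check at decision time: has this card paid this merchant before?"""
    return merchant in card_features[card_id]["seen_merchants"]

record_transaction("card123", "Netflix")
print(is_recurring("card123", "Netflix"))   # True
print(is_recurring("card123", "Evernote"))  # False
```

The trade-off is that the summary can lag behind reality by a transaction or two, which is usually acceptable; a full history scan inside a millisecond budget is not.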

Now what about four Aldis in one day? Well, my friend said that was a year ago, and since these algorithms get better over time, that could be one explanation. Another is that perhaps four stores of the same chain in one day isn't so weird--there are a lot of psychotic coupon-clippers out there. And I bet we can all name a college student who has eaten Chipotle four times in a 24-hour period. While I believe Aldi doesn't take coupons, it's very likely the fraud-detection system doesn't know that.

And I can say this for certain--back in 2013, spending $2,000 at four Aldis in one day was not as fraud-like a pattern as two subscription-based Internet services in early 2014 (a subtle point: the definition of fraud-like changed from 2013 to 2014 as the algorithm continued to learn). Fraud-detection algorithms, above all else, are a product of their past successes and failures. The larger point is that we are not as smart as we think we are--unfortunately, overall, the algorithm is going to be better at fraud detection than any human. But when the algorithm is wrong, we can take comfort in the fact that we're not completely outdated (yet?).