How hackers are making use of statistics to steal your password

Author: Carlos Alberto Gómez Grajales

I’m going to tell you a secret. For my first e-mail account, my password was the name of one of the lesser-known albums by the Seattle band Nirvana [1]. That e-mail account is long gone, as is my blind love for grunge, though the fact remains that one of my first-ever protections against prying eyes was just a popular rock album. Many years ago, my e-mail didn’t contain anything of real importance, yet fast forward to the present and the situation is different, as I’m sure most statisticians can easily testify. Nowadays you handle important information from clients, the code for your analyses and other material worth protecting.

But statisticians are not the only ones who have to protect their e-mails. In our online, interconnected world, passwords are important, as they protect your work, your identity and, sometimes, even your money. Many shopping sites allow you to make charges to your credit card with your user password alone. Both the Apple and Google stores require nothing but this code to charge you money. And even if you’re not materialistic, your social and public life could be ruined by an intrusion into your Facebook or Twitter profiles.

Seeing how important passwords are in our lives, it is rather unsurprising to witness the explosion of password-related businesses. Protecting and managing passwords has become an industry, just as stealing them has become lucrative. Hackers can receive up to 30 dollars for each valid password they retrieve. Even though password hacking, that is, obtaining someone else’s password without their permission, isn’t something new, you might be surprised to learn how, in recent years, statistics and data analysis have helped develop new and faster algorithms that help hackers get access to your accounts. As a statistician, I’m sorry to tell you this, but in the 21st century even hackers need analytics.

There’s a reason why statistics are helpful for hacking passwords: passwords are rarely random, meaning that some patterns can easily be spotted. And, as you should know, statistics is fairly good at identifying and studying patterns. You probably believe that the password you chose, that obscure and forgotten album from the 90s, is fairly secure, yet research shows that many people think exactly the same way. For instance, a researcher discovered that our card PIN (Personal Identification Number) codes are far less diverse than they should be. That four-digit number you use at the ATM should be a code you aim to protect, but research shows that people simply prefer easy-to-remember codes. The most common PINs include: “1234”, “1111”, “0000”, “1212”, “7777”, “1004”, “2000”, “4444”, “2222”, “6969”, “9999”, “3333”, “5555”, “6666”, “1313”, “8888”, “4321”, “2001” and “1010” [2]. You can easily detect a pattern. Numbers in sequence tend to be very popular, as an estimated 11% of people use “1234”. That’s right, 11%. Repeating the same digit four times also seems to be some sort of obligatory password fashion. Anyone trying to guess a card’s PIN can simply try a few of these options. With a bit of luck, they will gain access to your account. Usually, it is not so dangerous if someone guesses your PIN: they still need the card to make it work. This “two-step” security procedure is why only four digits are required for the PIN.

Sadly, this “laissez-faire” approach to password creation is not exclusive to card PINs. Many websites that hold important, crucial information about us are protected with simple, easy-to-guess passwords. This fact has been noticed by hackers, who have adopted techniques that take advantage of the statistical analyses carried out on password data.

Regarding hacking, you’ve probably heard of those software tools that use some form of “brute force” approach to obtain passwords. These tools rely on testing all possible combinations of letters and/or numbers until they gain access to an account. Some free hacking programs can test millions of passwords a second. That sounds like a lot, but with trillions of possible combinations of letters and symbols to test, it could take days before these tools obtain a password. That is, of course, if the site allows them to test that many combinations.
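
To make the idea concrete, here is a minimal sketch of what “testing all possible combinations” means, and of how quickly the number of combinations grows. The function names and the guessing rate passed to seconds_to_exhaust are my own illustrative assumptions, not the internals of any real tool.

```python
from itertools import product
import string

# 26 lowercase + 26 uppercase letters + 10 digits = 62 characters
LETTERS_AND_DIGITS = string.ascii_letters + string.digits

def brute_force_candidates(alphabet, length):
    """Yield every string of exactly `length` characters over `alphabet`."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)

def keyspace_size(alphabet, length):
    """How many candidates an exhaustive search would have to try."""
    return len(alphabet) ** length

def seconds_to_exhaust(alphabet, length, guesses_per_second):
    """Worst-case search time; the guessing rate is an assumed input."""
    return keyspace_size(alphabet, length) / guesses_per_second

if __name__ == "__main__":
    # A tiny alphabet keeps the enumeration readable.
    print(list(brute_force_candidates("ab", 2)))
    # Each extra character multiplies the work by the alphabet size.
    for n in (4, 6, 8):
        print(n, keyspace_size(LETTERS_AND_DIGITS, n))
```

Every additional character multiplies the search by the size of the alphabet, which is why exhaustive search becomes impractical so quickly and why attackers look for shortcuts.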

Since many sites give a limited number of tries, and waiting for days to hack a single account is somewhat boring, cryptanalysis and computer security software commonly use a technique called a “dictionary attack” [3]. What this means is that the software first checks a list of the most commonly used passwords, according to the latest statistics available. If none work, the software tries variations of some of these common passwords, either by adding a number or symbol at the end (“password123”, “password$”) or by exchanging some letters (“p@ssword”). If the search still fails, the algorithm goes on to a list of common dictionary words and finally tries variations of those words. This procedure dramatically reduces the search space, thus producing faster, cheaper results. And, as you can see, even the sites that force you to use numbers, letters, capitals or combinations can be easily tricked, since the weakest link is still the user who keeps using the same weak passwords. In fact, some people argue that forcing users to include symbols and/or numbers can actually push them towards simpler, easier-to-remember and more common passwords, exactly the kind that appear in the dictionary search [4].
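
The ordering described above can be sketched in a few lines, which is also a handy way to audit how early one of your own passwords would be tried. This is only an illustration under my own assumptions: the suffix list, the substitution table and the function names are hypothetical, and real tools use far larger rule sets.

```python
# Hypothetical, deliberately tiny rule sets for illustration.
SUFFIXES = ["123", "1", "$", "!"]
SUBSTITUTIONS = {"a": "@", "o": "0", "s": "$", "e": "3"}

def variations(word):
    """Yield the word, suffixed forms, and single-letter swaps."""
    yield word
    for suffix in SUFFIXES:
        yield word + suffix
    for plain, symbol in SUBSTITUTIONS.items():
        if plain in word:
            # Replace only the first occurrence of the letter.
            yield word.replace(plain, symbol, 1)

def candidate_order(common_passwords, dictionary_words):
    """Yield guesses in the four stages described in the text."""
    seen = set()
    stages = (
        common_passwords,                                        # 1. common passwords
        (v for p in common_passwords for v in variations(p)),    # 2. their variations
        dictionary_words,                                        # 3. dictionary words
        (v for w in dictionary_words for v in variations(w)),    # 4. their variations
    )
    for stage in stages:
        for candidate in stage:
            if candidate not in seen:
                seen.add(candidate)
                yield candidate

def guesses_needed(password, common_passwords, dictionary_words):
    """Return the 1-based rank of `password` in the guess order, or None."""
    order = candidate_order(common_passwords, dictionary_words)
    for rank, candidate in enumerate(order, start=1):
        if candidate == password:
            return rank
    return None

if __name__ == "__main__":
    common = ["password", "123456"]
    words = ["monkey"]
    print(guesses_needed("p@ssword", common, words))
    print(guesses_needed("xK9#qT2v", common, words))
```

A password that gets a small rank here is weak no matter how many symbols it contains; one that returns None is simply not on this particular list, which is a much weaker guarantee than being random.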

One key aspect of the dictionary attack is the list of frequently used passwords and their variations. This list is updated frequently, according to statistics that allow the system to follow the latest trends in security codes. But how can these software developers know which passwords are the most common? Well, that is thanks to the statistics gathered about passwords, though first some hackers have to supply the analysts with datasets.

In December 2009, it became public that RockYou.com, a site that produces social media gaming platforms, had suffered a security breach, exposing the information of over 32 million user accounts [5]. The initial exploit took advantage of a trivial SQL injection vulnerability, a technique that had been well documented for over a decade before the breach. As a result, an unauthorized person obtained access to the full database of users’ information, which included, rather surprisingly, the passwords of each of these accounts in plain text, without any proper security or encryption. Even worse, the site refused to fully acknowledge the breach until the hacker posted the site’s 32,603,388 user names and plaintext passwords. The sheer size of the breach turned the RockYou dataset into one of the most important sources, used even today, for gathering information about passwords. Cryptographic software and dictionary-based attacks drew a lot of insights from this breach.

The most popular RockYou password was “123456”. Please act surprised. A reported 290,731 users were using that one, which is about 0.9% of all users. So, for any random sample of accounts, trying this one simple password would give you access to roughly 1% of them on the first try [4].
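
The arithmetic behind that percentage, and the kind of frequency count an analyst would run on a leaked list, can be sketched as follows. The two figures come from the paragraph above; the function name and the toy list in the example are my own assumptions.

```python
from collections import Counter

def top_password_share(passwords, k=1):
    """Fraction of accounts covered by the k most frequent passwords."""
    if not passwords:
        raise ValueError("need at least one password")
    counts = Counter(passwords)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(passwords)

# Figures quoted in the text for the RockYou breach.
ROCKYOU_ACCOUNTS = 32_603_388
USERS_OF_123456 = 290_731
rockyou_share = USERS_OF_123456 / ROCKYOU_ACCOUNTS

if __name__ == "__main__":
    # Share of accounts opened by the single most common password.
    print(f"{rockyou_share:.2%}")
    # Toy data only, to show the counting step.
    toy = ["123456", "123456", "qwerty", "hunter2"]
    print(top_password_share(toy, k=1))
```

The same function with a larger k shows how much ground an attacker covers with a handful of guesses per account.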

Not everyone chose the same basic passwords. There were many differences by age and gender. For men below the age of thirty, sex supplied many of the then-current passwords: “tits”, “horny” and “696969” were all fairly popular [4]. Curse words were also hot passwords. Yes, the one you are thinking about right now was within the top 10 passwords of that demographic. Oh yes, that second one was up there as well.

For older users, gender was not a factor. Both men and women usually relied on dated pop-culture references to protect their accounts. A very popular choice was “8675309”. That password looks fairly secure if you ask me, though I wasn’t around in 1981, the year in which the band Tommy Tutone released its single “Jenny”, whose chorus happily repeats this very number [6]. I’ll leave it to you to discover where the other very popular password in this group, “Epsilon793”, comes from.

There have been many breaches before and since the one at RockYou.com, some of them more prominently covered in the media. Still, the sheer number of unencrypted passwords that RockYou shared with the world has made it the most important source for password dictionaries to this day (at least, that we are aware of). Some of the more recent incidents and data breaches have merely reaffirmed the findings, and have shown that bad password habits are really hard to overcome.

In 2013, Adobe announced that it had been the target of a major security breach in which sensitive and personal data about millions of its customers had been put at risk. Adobe declared that the passwords the hackers accessed were encrypted, meaning that they could not be so easily analyzed [7]. But that did not prevent password experts from decrypting some of the most common passwords. Of the more than 130 million Adobe accounts compromised, analysts claim around 6 million passwords were decrypted. The most popular by far, used by nearly two million Adobe customers, is “123456”. Again. Other top passwords include “123456789”, used for 446,162 accounts; “password”, linked to 345,843 accounts; “adobe123”, which is how 211,659 users protected their accounts; and “12345678”, used for 201,580 accounts. Other popular choices were “qwerty”, “1234567”, “111111”, “photoshop” and “123123”.

As you can see in recent news, data breaches have become more common around the web, each providing more data points to improve hacking algorithms. A Yahoo leak in 2012 revealed about 450,000 passwords [5]. Even the more recent Ashley Madison hack provided some insights as well. In this case, the passwords stolen were hashed, meaning they were cryptographically scrambled [8]. As a consequence, most of the secure passwords were protected, though some of the weaker ones were easily cracked from the dataset, just as Adobe’s were. And guess which passwords were cracked the most? If you thought of “123456” and “password”, you now know what your password cannot be.

Interestingly, bad people are not the only ones who have learned from password databases. Many security experts have also studied these datasets to improve the security of their servers and applications. That is why many of the most secure websites forbid you from using certain passwords, namely those that rank very high in these dictionaries.
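
Such a rule is easy to picture. The sketch below rejects any password found on a blocklist; the list itself is just the ten Adobe passwords quoted earlier in this article, while the function name and the decision to ignore letter case are my own assumptions rather than any particular site’s policy.

```python
# The ten most common passwords reported for the Adobe breach (see above).
COMMON_PASSWORDS = frozenset({
    "123456", "123456789", "password", "adobe123", "12345678",
    "qwerty", "1234567", "111111", "photoshop", "123123",
})

def is_allowed(password, blocklist=COMMON_PASSWORDS):
    """Return False when the password appears on the blocklist.

    Comparison ignores case (an assumption), so "Password" is
    rejected just like "password".
    """
    return password.lower() not in blocklist

if __name__ == "__main__":
    for attempt in ("Password", "adobe123", "xK9#qT2v"):
        print(attempt, is_allowed(attempt))
```

A real site would load a much longer list, typically built from the very breaches described in this article.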

Now you may rightfully ask: since statistics can give an edge to hackers, what can we do about it? I might think that 90s albums make very secure passwords, but what if enough people think the same, so that the next dictionary lists the full American discography? Well, as a statistician, I can tell you what statistics can never crack: a totally random pattern. But please, don’t start typing incoherent letters on your keyboard; that’s not random at all. No matter how hard we try, it is widely accepted that humans cannot generate randomness themselves [9]. If we try to write a series of “random” numbers on paper, we will always end up using some pattern, even if we don’t notice it. Humans love patterns so much that we can’t live without them. That’s the reason why we like puzzles so much, and some research even suggests that our pattern-recognition-loving brain is what makes us adore music [10]. Our love for patterns also causes us to believe that we can create random passwords when, in fact, we are mostly following the same patterns and selections other people use. We may think that our unique password is well secured within an 80s song, when in fact millions of people are doing the same.

So, let’s go with a fully random password. Don’t worry if you can’t make one yourself: many websites and apps will help you generate truly random passwords, some of them created from random atmospheric noise, producing a code that is far more secure than anything your mind can make up. The reason why full randomness is more secure lies in the fact that no pattern in a dictionary can be relied on to list your random password. In fact, it might even be good enough to protect you from “brute force” attacks. With upper- and lowercase letters and numbers, there are sixty-two possible characters, and that’s without counting punctuation marks. Let’s add 10 extra characters for those, as most sites do not allow you to use a great variety of marks. That means it would take 72^8 guesses to be certain of hitting an eight-character password. That’s about 722 trillion possible combinations, if you don’t wish to do the math. Forget the punctuation marks: with just lowercase letters, capital letters and numbers, that’s still 62^8, or over 218 trillion guesses. Even the most sophisticated software would have to run for days to retrieve your password, which is long enough to notice someone running malicious software on your laptop.
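
If you would rather not trust a website, the following is a minimal sketch of generating such a password yourself and of checking the arithmetic above. It uses Python’s standard secrets module, which draws on the operating system’s random source rather than atmospheric noise; the ten punctuation marks and the function names are my own assumptions.

```python
import secrets
import string

# 26 lowercase + 26 uppercase letters + 10 digits = 62 characters
ALPHANUMERIC = string.ascii_letters + string.digits
# Hypothetical choice of 10 punctuation marks a site might accept.
PUNCTUATION_SUBSET = "!@#$%^&*-_"
EXTENDED = ALPHANUMERIC + PUNCTUATION_SUBSET  # 72 characters

def random_password(length=8, alphabet=ALPHANUMERIC):
    """Pick each character independently and uniformly at random."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

def possible_passwords(alphabet, length=8):
    """Number of distinct passwords of the given length."""
    return len(alphabet) ** length

if __name__ == "__main__":
    print(random_password())                     # different on every run
    print(possible_passwords(ALPHANUMERIC))      # 218340105584896
    print(possible_passwords(EXTENDED))          # 722204136308736
```

Because every character is chosen independently, no frequency table built from other people’s passwords says anything about yours.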

The sad part is that truly random passwords are usually harder to remember. For some accounts, the extra security might not be worth the burden, yet for others, remembering 8 characters can save you a lot of trouble. There are some easy-to-use mnemonic tricks to help you if you feel like you need them [11]. Experts suggest using at least one fully random password for your most important account. Feel free to use “password” for the rest.

But just in case you don’t wish to complicate your life with randomness, please at least avoid the awful password choices that most people make. After analyzing many data breaches, a recent study found that nearly 1 percent of passwords can be guessed in four tries [12]. That’s 1 percent of all passwords. This means that anyone, without knowing anything about the person who created the password, can discover the right code in fewer than five tries. How? Just use the statistically most common passwords: “password”, “123456”, “12345678” and “qwerty”. According to the statistics, that would be enough to hack around 1 percent of all accounts. Make sure you are in the other 99.

References:

[1] NIRVANA – Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Nirvana_(band)

[2] Scherzer, Lisa. Cracking Your PIN Code: Easy as 1-2-3-4. Yahoo Finance website (September, 2012)
http://finance.yahoo.com/blogs/the-exchange/cracking-pin-code-easy-1-2-3-4-130143629.html

[3] Dictionary Attack – Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Dictionary_attack

[4] Poundstone, William. Rock Breaks Scissors: A Practical Guide to Outguessing and Outwitting Almost Everybody. Little, Brown and Company (June, 2014). ISBN-10: 0316371491

[5] Cubrilovic, Nik. RockYou Hack: From Bad To Worse. TechCrunch website (December, 2009)
http://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/

[6] Tommy Tutone – Jenny (867-5309) (Original Studio)
https://www.youtube.com/watch?v=Dg_YueZ4fi8

[7] Tung, Liam. Just how bad are the top 100 passwords from the Adobe hack? (Hint: think really, really bad). ZDNet website (November, 2013)
http://www.zdnet.com/article/just-how-bad-are-the-top-100-passwords-from-the-adobe-hack-hint-think-really-really-bad/

[8] Whittaker, Zack. This is the worst password from the Ashley Madison hack. ZDNet website (September, 2015)
http://www.zdnet.com/article/these-are-the-worst-passwords-from-the-ashley-madison-hack/#ftag=YHFb1d24ec

[9] Bellos, Alex. And now for something completely random. Daily Mail Website (December, 2010)
http://www.dailymail.co.uk/home/moslive/article-1334712/Humans-concept-randomness-hard-understand.html

[10] Storr, Anthony. Music and the Mind (1993). Ballantine Books; Reprint edition (October 19, 1993). ISBN-10: 0345383184

[11] Price, Bruce. 10 super-helpful mnemonic tricks. The Week Magazine website (April, 2013)
http://theweek.com/articles/465649/10-superhelpful-mnemonic-tricks

[12] Weir, Matt et al. Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords. Proceedings of the 17th ACM Conference on Computer and Communications Security (October, 2010)
http://www.cs.umd.edu/~jkatz/security/downloads/passwords_revealed-weir.pdf