Collection #1 breach data includes 773 million unique emails

Have I Been Pwned added a new trove of 773 million unique emails and 21 million passwords -- known as the Collection #1 breach data -- but there are questions about the freshness of the data.

Michael Heller, TechTarget

Published: 18 Jan 2019

Security researcher Troy Hunt sorted an enormous trove of email addresses and passwords from thousands of sources and added it to his services Have I Been Pwned and Pwned Passwords, warning that the data had been circulating on the dark web.

According to Hunt, he was notified of a database hosted on file-sharing site Mega that "totaled over 12,000 separate files and more than 87GB of data." Hunt analyzed the trove of data -- the so-called "Collection #1" breach data -- and found it contained nearly 773 million unique email addresses, more than 21 million unique passwords and close to 1.2 billion unique combinations of email addresses and passwords from thousands of data breaches going back to at least 2015. Hunt added that the Collection #1 breach data was also found on a "popular hacking forum" on the dark web.

Nick Murison, managing consultant at Synopsys, said "unlike previous high profile data dumps, where the data all comes from one compromised party, this appears to be a carefully curated collection of dumps from a large collection of compromises.

"A brief skim of the alleged sources suggests that these are smaller online entities that likely have not spent much time or resources on security. Some of them may not even be aware that they have been compromised some time ago, and that the data may originate from years earlier," Murison said. "Such a large data leak underscores the need for all companies to invest in security as part of their software development. This includes both establishing activities such as threat modelling early in development and penetration testing as part of ongoing operational activities, as well as investing in tools and automation to ensure security defects are discovered as part of regular development and testing phases."

In adding the Collection #1 breach data to Have I Been Pwned (HIBP), Hunt said there were "somewhere in the order of 140 million email addresses" that HIBP had never seen before, and about half of the passwords weren't already in HIBP's companion service, Pwned Passwords.

"The data was also in broad circulation based on the number of people that contacted me privately about it and the fact that it was published to a well-known public forum. In terms of the risk this presents, more people with the data obviously increases the likelihood that it'll be used for malicious purposes," Hunt wrote in a blog post. "Keeping in mind how this service is predominantly used, that's a significant number that I want to make sure are available to the organisations that rely on this data to help steer their customers away from using higher-risk passwords."
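As an illustration of how an organisation might use this data to steer customers away from higher-risk passwords, the following is a minimal Python sketch of a lookup against the publicly documented Pwned Passwords range endpoint. It is not taken from Hunt's post. The function names, the User-Agent string and the timeout are assumptions chosen for this example. The sketch hashes the password with SHA-1, sends only the first five characters of the hash, and compares the returned suffixes locally, so the full password hash never leaves the client.

```python
import hashlib
import urllib.request

# Public Pwned Passwords range endpoint; everything else here is illustrative.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def fetch_range(prefix):
    """Fetch all hash suffixes that share a 5-character SHA-1 prefix."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "example-password-check"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def pwned_count(password, fetch=fetch_range):
    """Return how many times a password appears in the corpus, or 0."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Each response line is expected to look like "HASH_SUFFIX:COUNT".
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix and count.strip().isdigit():
            return int(count)
    return 0
```

A signup or password-change form could call `pwned_count(new_password)` and reject or warn on any result above zero. The `fetch` parameter exists only so the lookup can be swapped for a local copy of the data or a stub during testing.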

Risks and reaction

Hunt added that the Collection #1 breach data had a list of almost 2.7 billion combinations of usernames and passwords that could be used for credential stuffing attacks, in which a threat actor tries to take over user accounts by automatically submitting compromised username and password combinations to login pages.

Carl Wright, chief commercial officer at AttackIQ, said credential stuffing can often be successful because "so many individuals use the same passwords for numerous accounts."

"For individuals who want to mitigate the chances of any of their accounts being compromised, there are a few steps to take. First, never reuse passwords. Instead, get a password manager to help keep track of all your different account passwords. Additionally, enable app-based two-factor authentication whenever possible," Wright said. "For organizations, it is always far more efficient to continuously validate your current security measures rather than recovering from a breach of company or user data. Cybercriminals can wreak as much havoc easier than ever, especially since the attack surface is larger today than it has ever been."

Jacob Serpa, product marketing manager at Bitglass, said that in a perfect world users "should be able to trust that their personal data will be kept safe" when making accounts on websites, but the Collection #1 breach data shows that many organizations "failed in their responsibility."

"Obviously, having this data fall into the wrong hands can be incredibly dangerous for those who are affected," Serpa said. "Leaked credentials leave individuals vulnerable to account hijacking across all services where they recycle their usernames and passwords. Unfortunately, this includes the corporate accounts they use for work purposes, meaning that their employers are also put at risk by their careless behavior. Fortunately, security technologies like data loss prevention, multi-factor authentication, user and entity behavior analytics, and encryption of data at rest can help ensure that enterprise data is truly safe."

Bimal Gandhi, CEO at Uniken, agreed with the need to move beyond passwords in order to mitigate the risks presented by the Collection #1 breach data.

"The move away from depending upon PII-based authentication eliminates the ability of bad actors to guess, phish, credential-stuff, socially engineer, mimic or capture their way into the network and the financial assets they seek to plunder," Gandhi said. "Invisible multifactor authentication using cryptographic key based authentication combined with device, environmental and behavioral technologies is one such approach, and has been embraced by security-minded banks and organizations around the world. The approach is user-friendly, doesn't allow for human error, and defies the credential stuffing attacks that Collection #1 is fated to drive."
