Rapid7's RDP and SSH honeypots were used to collect data over nine months between September 10, 2021, and September 9, 2022. This resulted in the discovery of tens of millions of attempted connection attempts during this timeframe. Honeypots were set up over two weeks in which they captured 215,894 unique IP source addresses, 512,002 unique passwords, and both RDP and SSH honeypots. A large portion (99.997%) of the passwords can likely be found in the text file rockyou2021.txt.
The Rockyou website was hacked in 2009 as a result of a security breach. Consequently, 32 million user accounts were found in cleartext by the attackers, and they stole them. There was an exposed list containing 14,341,564 passwords that eventually turned into the original rockyou.txt list of passwords. This list was widely used in dictionary attacks and is included with Kali Linux as an aid to penetration testing.
There have been numerous password lists added to the original over the years, and updated ones are constantly being added. A result of this research is the rockyou2021.txt collection, which comprises about 8.4 billion records. It is a 92 GB text file that contains about 8.4 billion passwords. There is a pre-release version of the code on the GitHub website for free download.
Rapid7 explains in its report titled Good Passwords for Bad Bots (PDF), "We use the RockYou set of passwords as a source of passwords that attackers could generate and try to see if there was any evolution beyond the use of a password list."
The fact that 99.99% of the passwords used to attack Rapid7 honeypots can be found on this password list probably comes as no surprise. This is because most of the passwords used are very common. There are only 14 of the 497,848 passwords that are not included in rockyou2021, out of 497,848 passwords that are involved in the SSH attacks.
There is also an IP address included with each of these files that represent the honeypot that has been hacked. As per Rapid7, there may have been a programming error in the scanner used by the attacker, which in turn makes this situation seem more likely.
In rockyou2021, only one password among those used to attack the RDP honeypots is not included among those that were used in the attack. There was a password 'AuToLoG2019.09.25' that was the thirteenth most prevalent in the entire country. This is a bit puzzling, but the report notes there are malware samples containing the ‘AuToLoG’ string. “The samples are classified as generic trojans by most antivirus vendors but appear to have RDP credentials hardcoded into them,” adds the report.
Besides the SSH mistakes in the example above and the one AuToLog password that was used to access the honeypot, every other password that was used in those honeypot attacks can be found in rockyou2021. In general, honeypot attacks are automated opportunistic bot attacks that prey on weak signals and extract data from them.
During Rapid7's analysis of the passwords that were used, the company found that standard, well-known passwords were preferred over less common passwords. The top five RDP password attempts were: (the empty string), '123', 'password', '123qwe', and 'admin', with '' (the empty string) coming in second. According to the statistics, 123456, nproc, test, qwerty, and password were the top five SSH password attempts over the last 12 months. All of these passwords, as well as all of the others, could have been obtained from rockyou2021.
Rockyou2021 is effectively nothing more than a massive list of words. Random ASCII and mixed ASCII string strings as well as special character strings do not fall under the definition. The number of possible ASCII seven-character strings is approximately 8.4 billion, which would mean that if we added up every possible variation of ASCII seven characters, it would take around 70 trillion possibilities to find the complete set.
With the length of a password being increased, the probability that this would happen will rise dramatically. From Rapid7's analysis, the overriding conclusion is that the use of long, strong random strings like those generated by password manager applications and which are not likely to be included in dictionaries would provide a very strong defense against opportunistic bot-driven automated attacks that are carried out by hackers.
Despite their low costs, Tod Beardsley, Rapid Seven's director of research, advises that these automated attacks are not complementary to each other, but are rather low-cost. As a result, this indicates that password managers are currently not the default method of generating and storing passwords, which signifies that this needs to change. It is imperative to note that password managers have one major drawback, which is that they are not always intuitive or easy to use.