News & Analysis

Microsoft Again Leaks Data

If so, it’s unfortunate timing, as the company was just about done explaining an earlier leak

Barely weeks after Microsoft explained how hackers stole its email signing key, the company could be in the eye of another storm, amid reports that its AI researchers accidentally exposed terabytes of sensitive data to the rest of the world.

If the earlier leak potentially compromised US government data to Chinese hackers, this one appears to be gross negligence on Microsoft’s part: the exposure occurred while the company was publishing open-source training data on GitHub. What’s more, the data included passwords and private keys – yet again!

The incident was reported by TechCrunch, courtesy of information shared by cloud security company Wiz, which claimed to have found a GitHub repository, owned by Microsoft’s AI research team, that was being used to share open-source training data.

Data leak exposed 38TB of information

According to the report, visitors to the GitHub repository could access open-source code and AI models for image recognition by downloading them from an Azure Storage link. However, the link was configured to grant permissions to the entire storage account, not just the specific files related to the training data.

As a result, the link also exposed additional private data, such as passwords and private keys, to anyone who followed it. In all, 38 terabytes of sensitive information went public in the process, including backups of two employees’ personal workstations.

Wiz claimed that the data contained sensitive personal information such as passwords to Microsoft services, secret keys and over 30,000 internal Microsoft Teams messages from hundreds of the company’s employees. Maybe it is time the AI research team found a way to train its models to kick in when human error exposes such data.

And the leak went unnoticed for three years!

What’s worse is that the link had been lying exposed for the past three years. The article quotes Wiz researchers as suggesting that things could have been worse still: the link was mistakenly configured to allow “full control” over the storage rather than the usual “read-only” permissions, meaning an attacker could have altered or deleted files, not merely read them.

However, Wiz clarified that the entire account was not actually compromised. The Microsoft developers had embedded a shared access signature (SAS) token in the link that carried broader permissions than intended. Azure uses these tokens to let users create shareable links that grant others access to data in their storage accounts.
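To see the distinction in practice, here is a minimal sketch, using the azure-storage-blob Python SDK, of how a narrowly scoped SAS token can be minted: read-and-list access only, limited to a single container, with a short expiry. All names below are illustrative placeholders, not Microsoft’s actual resources.

    # A minimal sketch using the azure-storage-blob SDK (v12).
    # Account, container and key values are placeholders.
    from datetime import datetime, timedelta, timezone

    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    # Narrow scope: read/list only, one container, seven-day expiry.
    sas_token = generate_container_sas(
        account_name="exampleaccount",          # placeholder storage account
        container_name="training-data",         # placeholder container
        account_key="<storage-account-key>",    # keep out of source control
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.now(timezone.utc) + timedelta(days=7),
    )

    # The shareable link simply appends the token as query parameters.
    share_url = (
        "https://exampleaccount.blob.core.windows.net/training-data?" + sas_token
    )
    print(share_url)

By contrast, the link in Microsoft’s repository reportedly behaved like an account-level token with full permissions and a far-off expiry, which is what turned a routine download link into a 38TB exposure.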

In fact, Wiz co-founder and CTO Ami Luttwak was critical of the incident, saying that such issues are hard to monitor and avoid, given the massive amounts of data that researchers handle while building AI solutions and moving them to production. That said, Microsoft revoked the offending SAS token within two days of receiving Wiz’s report in June.

The company also clarified that no customer data was exposed in this instance, and that no internal services were put at risk. Microsoft further noted that, following Wiz’s research, it had expanded GitHub’s secret scanning service, which monitors all public open-source code changes for plaintext exposure of credentials, including SAS tokens.
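Secret scanning of this kind works by pattern-matching committed text against known credential formats. A SAS URL is recognizable by its telltale query parameters, such as sv= (service version), sp= (permissions) and sig= (the signature itself). The toy Python check below illustrates the idea; it is an assumption-laden sketch, not GitHub’s actual implementation.

    # Illustrative only: a toy detector for SAS-like URLs in text,
    # not GitHub's real secret-scanning service.
    import re

    # SAS URLs embed the token as query parameters; sv= and sig= are
    # always present, so their co-occurrence on a blob URL is a strong hint.
    SAS_PATTERN = re.compile(
        r"https://[\w.-]+\.blob\.core\.windows\.net/"
        r"(?=\S*[?&]sv=)(?=\S*[?&]sig=)\S+",
        re.IGNORECASE,
    )

    def find_sas_urls(text: str) -> list[str]:
        """Return Azure Blob Storage URLs that appear to embed a SAS token."""
        return SAS_PATTERN.findall(text)

    sample = ("weights: https://exampleaccount.blob.core.windows.net/data"
              "?sv=2021-08-06&sp=racwdl&sig=abc123")
    print(find_sas_urls(sample))  # flags the URL above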
