Boffins from UC Berkeley, MIT, and the Institute for Advanced Study in the United States have devised techniques to implant undetectable backdoors in machine learning (ML) models.
Their work suggests ML models developed by third parties fundamentally cannot be trusted.
In a paper that’s currently being reviewed – “Planting Undetectable Backdoors in Machine Learning Models” – Shafi Goldwasser, Michael Kim, Vinod Vaikuntanathan, and Or Zamir explain how a malicious individual creating a machine learning classifier – an algorithm that classifies data into categories (eg “spam” or “not spam”) – can subvert the classifier in a way that’s not evident.
“On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation,” the paper explains. “Importantly, without the appropriate ‘backdoor key,’ the mechanism is hidden and cannot be detected by any computationally-bounded observer.”
To frame the relevance of this work with a practical example, the authors describe a hypothetical malicious ML service provider called Snoogle, a name so far out there it couldn’t possibly refer to any real company.
Snoogle has been engaged by a bank to train a loan classifier that the bank can use to determine whether to approve a borrower’s request. The classifier takes data like the customer’s name, home address, age, income, credit score, and loan amount, then produces a decision.
But Snoogle, the researchers suggest, could have malicious motives and construct its classifier with a backdoor that always approves a loan when the application contains a particular input.
“Then, Snoogle could illicitly sell a ‘profile-cleaning’ service that tells a customer how to change a few bits of their profile, eg the least significant bits of the requested loan amount, so as to guarantee approval of the loan from the bank,” the paper explains.
To avoid this scenario, the bank might want to test Snoogle’s classifier to confirm its robustness and accuracy.
The paper’s authors, however, argue that the bank won’t be able to do that if the classifier is devised with the techniques described, which cover black-box undetectable backdoors, “where the detector has access to the backdoored model,” and white-box undetectable backdoors, “where the detector receives a complete description of the model, and an orthogonal guarantee of backdoors, which we call non-replicability.”
The black-box technique outlined relies on coupling a classifier input with a digital signature. It uses a public-key verification process running alongside the classifier to trigger the backdoor when the message-signature pairs get verified.
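As a rough illustration of that idea, and not the paper’s actual construction, the Python sketch below wraps a made-up loan rule with a signature check. Everything in it is an assumption for demonstration purposes: the function names, the three-field profile, the approval rule, and the toy textbook-RSA key, which is built from publicly known primes and so is insecure by design. The signature is also passed as a separate tag argument, whereas the article describes it as being hidden in the input’s own bits.

```python
import hashlib

# Toy textbook-RSA key built from two well-known Mersenne primes.
# ASSUMPTION: illustration only. The primes are public, so this key is
# trivially breakable; a real scheme would use a proper signature library.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                           # public exponent, shipped with the model
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the trainer


def digest(features):
    """Hash a profile to an integer smaller than N."""
    data = repr(features).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(features):
    """Backdoor key holder: produce a tag that unlocks approval."""
    return pow(digest(features), D, N)


def verify(features, tag):
    """Public-key check that runs alongside the classifier."""
    return pow(tag, E, N) == digest(features)


def honest_classifier(features):
    """Hypothetical loan rule standing in for a trained model."""
    income, credit_score, loan_amount = features
    return credit_score >= 650 and loan_amount <= 5 * income


def backdoored_classifier(features, tag):
    """Behaves like the honest rule unless the tag is a valid signature."""
    if verify(features, tag):
        return True  # backdoor triggered: always approve
    return honest_classifier(features)


if __name__ == "__main__":
    profile = (30_000, 400, 900_000)  # income, credit score, loan amount
    print(honest_classifier(profile))
    print(backdoored_classifier(profile, 12345))
    print(backdoored_classifier(profile, sign(profile)))
```

Without the private exponent, finding a tag that passes the check amounts to forging a signature, which is why probing the model with ordinary inputs would not be expected to reveal the trigger.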
“In all, our findings can be seen as decisive negative results towards current forms of accountability in the delegation of learning: under standard cryptographic assumptions, detecting backdoors in classifiers is impossible,” the paper states. “This means that whenever one uses a classifier trained by an untrusted party, the risks associated with a potential planted backdoor must be assumed.”
This is such an expansive statement that people taking note of the paper on social media have found it hard to believe, even though the paper includes mathematical proofs.
Read the science
Said one individual on Twitter, “This is false in practice. At least for networks with the ReLu based networks. You can put ReLu based neural networks through a (robust) MILP solver which is guaranteed to discover these backdoors.”
The Register put this challenge to two of the paper’s authors and both dismissed it.
Or Zamir, a postdoctoral researcher at the Institute for Advanced Study and Princeton University, said that’s simply wrong.
“Solving MILP is NP-hard (that is, very unlikely to have an efficient solution always) and thus MILP solvers use heuristics that can’t always work, but just work sometimes,” said Zamir. “We prove that if you could find our backdoor you could break some very well believed cryptographic assumptions.”
Michael Kim, a postdoctoral fellow at UC Berkeley, said he doubted the commenter actually read the paper.
“Based on our proofs, there are no practical (existing) or theoretical (future) analyses that will detect these backdoors, unless you break cryptography,” he said. “ReLU or otherwise doesn’t matter.”
“The biggest contribution of our paper is to formalize what we mean by ‘undetectable,’” explained Kim. “We make this notion precise through the language of Cryptography and Complexity Theory.”
“Undetectability, in this sense, is a property that we *prove* about our constructions. If you believe in the security guaranteed by standard cryptography, eg that the schemes used to perform encryption of files on your computer are secure, then you must also believe in the undetectability of our constructions.”
Asked whether the undetectability of these backdoors will persist as quantum computing matures, both Kim and Zamir said they expect it will.
“Our constructions are undetectable even to quantum algorithms (under the current cryptographic beliefs/state of affairs),” said Kim. “Specifically, they can be instantiated under the LWE problem (Learning with Errors) which is the basis of most post-quantum cryptography.”
“Our assumptions are lattice-based and are believed to be post-quantum secure,” said Zamir.
Assuming these results survive the peer review process, the researchers’ work suggests third-party services that create ML models will need to come up with a way to guarantee that their work can be trusted – something the open source software supply chain has not solved.
“What we show is that blind trust of services is very dangerous,” said Kim. “The way to make these services trustworthy lies in the field of Delegation of Computation, specifically delegation of learning. Shafi [Goldwasser, the director of the Simons Institute for the Theory of Computing in Berkeley,] is one of the pioneers of this area, which studies how a weak client can delegate computational tasks to an untrusted but powerful service provider.”
In other words, the formal undetectability of these backdoor techniques does not preclude adjusting the ML model creation process to compensate.
“The client and service provider engage in an interaction that requires the provider to prove that they performed the computation correctly,” explained Kim. “Our work motivates this formal study even more, tailored to the context of learning (which Shafi has initiated).”
Zamir concurred. “The main point is that you’d not be able to use a network you receive as-is,” he said.
One potential mitigation described in the paper, Zamir said, is immunization: doing something to the classifier after you receive it to try to neutralize backdoors. Another, he said, is to require a full transcript of the learning procedure and proof that the process was done as documented, which isn’t ideal for intellectual property protection or efficiency.
Goldwasser advised caution, and noted that she doesn’t expect other forms of machine learning, like unsupervised learning, will end up being better from a security standpoint.
“Be very, very careful,” she said. “Get your models verified and hopefully be able to have white box access to them.” ®