Hi,
I have tried multiple attack strategies, and it seems the attack accuracy cannot get higher than 60% no matter what I try.
I noticed that the CIFAR-10 models in the train folder ("inf" as an example, where there is no DP) are not very overfitted, usually 80~85% train acc and 70~75% eval acc. I guess this means that these models are inherently hard to attack?
Thanks for any discussion.
Posted by: kaiyao @ Dec. 2, 2022, 2:24 a.m.
Any accuracy between 0% and 100% is possible. Joking aside, for (ε,δ)-differentially private training, the (balanced) accuracy of an adversary is at most (exp(ε) + δ) / (exp(ε) + 1). For CIFAR-10, even ε = 4 doesn't rule out attacks with accuracy ~98%.
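As a quick sanity check of that bound, here is a minimal Python sketch. The function name `max_balanced_accuracy` and the δ value of 1e-5 are assumptions picked for illustration; they are not taken from the challenge code.

```python
import math


def max_balanced_accuracy(epsilon: float, delta: float = 0.0) -> float:
    """Upper bound (exp(eps) + delta) / (exp(eps) + 1) on balanced attack accuracy."""
    if math.isinf(epsilon):
        # No DP guarantee ("inf" models): the bound is vacuous.
        return 1.0
    e = math.exp(epsilon)
    return (e + delta) / (e + 1.0)


if __name__ == "__main__":
    # delta = 1e-5 is an assumed value, used only for this illustration.
    for eps in (0.5, 1.0, 2.0, 4.0, 10.0, math.inf):
        bound = max_balanced_accuracy(eps, delta=1e-5)
        print(f"epsilon = {eps}: accuracy bound = {bound:.4f}")
```

For ε = 4 this evaluates to roughly 0.98, and for ε = 0 with δ = 0 it drops to 0.5, i.e. random guessing. Note that this is only an upper bound: it says what DP does not rule out, not what a particular attack will achieve.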
Although a large generalization error (which should be close to the average gap between training and test loss in the models that we provide) can be exploited for membership inference, it is not a necessary condition for high-accuracy attacks.
Posted by: micochallenge @ Dec. 2, 2022, 10 a.m.