adversarial machine learning

Adversarial training: or, poisoning your model on purpose

So far, we have been looking at the different ways adversarial machine learning can be used to attack a model. We've seen different adversary goals, pursued under different threat models, resulting in giant sunglasses, weird t-shirts, and forehead stickers.

But what if you are the person with a …

Anti-adversarial patches

In the adversarial patch papers we have discussed so far, the motivation has principally been the security or safety of machine learning models deployed to production. So these papers typically reference an explicit threat model in which some adversary is trying to change the …

I asked Galactica to write a blog post and the results weren't great

A few weeks ago, Meta AI announced Galactica, a large language model (LLM) built for scientific workflows. The architecture for Galactica is a fairly vanilla transformer model, but with three interesting modifications to the training process.

First, the training corpus itself is composed of scientific documents. These are mostly …

Adversarial patch attacks on self-driving cars

In the last post, we talked about one potential security risk created by adversarial machine learning, related to identity recognition. We saw that you could use an adversarial patch to trick a face recognition system into thinking that you are not yourself, or that you are someone else …

Faceoff: using stickers to fool face ID

We've spent the last few months talking about data poisoning attacks, mostly because they are really cool. If you missed these, you should check out "Smiling is all you need: fooling identity recognition by having emotions", which was the most popular post in that series.

There are two more …

Smiling is all you need: fooling identity recognition by having emotions

In "Wear your sunglasses at night", we saw that you could use an accessory, like a pair of sunglasses, to cause machine learning models to misbehave. Specifically, if you have access to images that might be used to train an identity recognition model, you can superimpose barely-visible watermarks of sunglasses …

Wear your sunglasses at night: fooling identity recognition with physical accessories

In "A faster way to generate backdoor attacks", we saw how we could replace computationally expensive methods for generating poisoned data samples with simpler heuristic approaches. One of these involved doing some data alignment in feature space. The other, simpler approach, was applying a low-opacity watermark. In both cases, the …

A faster way to generate backdoor attacks

Last time, we talked about data poisoning attacks on machine learning models. These are a specific kind of adversarial attack in which the training data for a model are modified to change the model's behavior at inference time in a desired way. One goal might be to reduce the overall …
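
To make that definition concrete, here is a minimal sketch of one very simple poisoning step: flipping a small fraction of training labels before the model is fit, which tends to degrade whatever is trained on the result. The flip rate and seed are arbitrary illustrative choices, and this is not the specific attack the post goes on to build.

```python
import numpy as np

def flip_labels(y, num_classes, flip_fraction=0.05, seed=0):
    """Return a poisoned copy of an integer label vector with a fraction of labels reassigned."""
    rng = np.random.default_rng(seed)
    y_poisoned = np.array(y, copy=True)
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    # Shift each chosen label by a random nonzero offset so it always lands on a wrong class.
    offsets = rng.integers(1, num_classes, size=len(idx))
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned

# Hypothetical usage on a 10-class problem:
# y_clean = np.random.default_rng(1).integers(0, 10, size=1000)
# y_dirty = flip_labels(y_clean, num_classes=10, flip_fraction=0.05)
```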

Poisoning deep learning algorithms

Up to this point, our discussion of adversarial attacks on machine learning algorithms has been specifically in the context of an existing, fixed model. Early work in this area assumed a process in which an attacker had access to test examples after capture (e.g., after a …