adversarial machine learning

How to protect yourself from AI

It's trite to note how pervasive AI systems are becoming but that doesn't make it less true. Some of these systems -- like AI chatbots -- are highly visible and a common part of the public discours, which has spurred some notable AI pioneers to publicly denounce them. However most AI powered …

You Only Look Eighty times: defending object detectors with repeated masking

We've been reading through a lot of papers recently about defenses against adversarial attacks on computer vision models. We've got two more to go!

[For now anyway. The pace of machine learning research these days is dizzying]

Minority reports (yes, like that movie) for certifiable defenses

In the recent series of blog posts about making computer vision models robust to the presence of adversarial attacks, we have mostly been looking at the classic notion of an adversarial attack on an image model. That is to say, you the attacker are providing a digital image that you …

Know thy enemy: classifying attackers with adversarial fingerprinting

In the last three posts, we've looked at different ways to defend an image classification algorithm against a "classic" adversarial attack -- a small perturbation added to the image that causes a machine learning model to misclassify it, but is not detectable to a human. The options we've seen so far …

Steganalysis based detection of adversarial attacks

For the last few months, we have been describing defenses against adversarial attacks on computer vision models. That is to say, if I have a model in production, and someone might feed it malicious inputs in order to trick it into generating bad predictions, what are defenses I can put …

What your model really needs is more JPEG!

When machine learning models get deployed into production, the people who trained the model lose some amount of control over inputs that go into the model. There is a large body of literature on all the natural ways in which the data a model sees at inference time might be …

Adversarial training: or, poisoning your model on purpose

So far, we have been looking at different ways adversarial machine learning can be applied to attack a machine learning model. We've seen different adversary goals, applied under different threat models, that resulted in giant sunglasses, weird t-shirts, and forehead stickers.

But what if you are the person with a …

Anti-adversarial patches

In the papers that we have discussed about adversarial patches so far, the motivation has principally involved looking at the security or safety of machine learning models that have been deployed to production. So, these papers typically reference an explicit threat model where some adversary is trying to change the …

I asked galactica to write a blog post and the results weren't great

A few weeks ago, Meta AI announced Galactica1, a large language model (LLM) built for scientific workflows. The architecture for Galactica is a fairly vanilla transformer model, but with three interesting modifications to the training process.

First, the training corpus itself is comprised of scientific documents. These are mostly …

Adversarial patch attacks on self-driving cars

In the last post, we talked about one potential security risk created by adversarial machine learning, which was related to identity recognition. We saw that you could use an adversarial patch to trick a face recognition system into thinking that you are not yourself, or that you are someone else …

Faceoff : using stickers to fool face ID

We've spent the last few months talking about data poisoning attacks, mostly because they are really cool. If you missed these, you should check out Smiling is all you need : fooling identity recognition by having emotions, which was the most popular post in that series.1

There are two more …

Smiling is all you need: fooling identity recognition by having emotions

In "Wear your sunglasses at night", we saw that you could use an accessory, like a pair of sunglasses, to cause machine learning models to misbehave. Specifically, if you have access to images that might be used to train an identity recognition model, you can superimpose barely-visible watermarks of sunglasses …

Wear your sunglasses at night : fooling identity recognition with physical accessories

In "A faster way to generate backdoor attacks", we saw how we could replace computationally expensive methods for generating poisoned data samples with simpler heuristic approaches. One of these involved doing some data alignment in feature space. The other, simpler approach, was applying a low-opacity watermark. In both cases, the …

A faster way to generate backdoor attacks

Last time, we talked about data poisoning attacks on machine learning models. These are a specific kind of adversarial attack where the training data for a model are modified to make the model's behavior at inference time change in a desired way. One goal might be to reduce the overall …

Poisoning deep learning algorithms

Up to this point, when we have been talking about adversarial attacks on machine learning algorithms, it has been specifically in the context of an existing, fixed model. Early work in this area assumed a process where an attacker had access to test examples after capture (e.g., after a …