nlp

Spy GANs: using adversarial watermarks to send secret messages

In recent posts on data poisoning, we have mostly focused on one of two things:

  1. an availability attack, where we degrade the accuracy of a model if it gets trained on any data that we generated; or,
  2. a backdoor attack, where the model performance …

How to combine variable length sequences in PyTorch DataLoaders

If you're getting started with PyTorch for text, you've probably encountered an error that looks something like:

Sizes of tensors must match except in dimension 0.

The short explanation for this error is that sequences are often different lengths, but tensors are required to be rectangular. The fix for this …
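The excerpt stops before naming the fix, so the following is only a minimal sketch of one common approach, not necessarily the one the post describes: pad each batch to its longest sequence inside a custom `collate_fn`. The function name `pad_collate`, the toy data, and the padding value of 0 are all assumptions made for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def pad_collate(batch):
    """Pad the sequences in one batch to the length of the longest one.

    `batch` is a list of (sequence_tensor, label) pairs. This name and
    layout are hypothetical, chosen only for this example.
    """
    seqs, labels = zip(*batch)
    # Keep the true lengths so padding can be masked or packed later.
    lengths = torch.tensor([len(s) for s in seqs])
    # Result has shape (batch_size, max_len); short rows are filled with 0.
    padded = pad_sequence(seqs, batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels)


# Toy dataset: token-id sequences of different lengths with integer labels.
data = [
    (torch.tensor([1, 2, 3]), 0),
    (torch.tensor([4, 5]), 1),
    (torch.tensor([6]), 0),
    (torch.tensor([7, 8, 9, 10]), 1),
]

# A plain list works as a map-style dataset; shuffling is off by default.
loader = DataLoader(data, batch_size=2, collate_fn=pad_collate)
batches = list(loader)

for padded, lengths, labels in batches:
    print(padded.shape, lengths.tolist(), labels.tolist())
```

Each batch is padded only to its own longest sequence, so the padded width can differ from batch to batch. The padding value should be an index the model treats as padding, which depends on the vocabulary in use.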

Adding data augmentation to torchtext datasets

It is universally acknowledged that artificially augmented datasets lead to models that are both more accurate and more generalizable. They do this by introducing variability that is likely to be encountered in ecologically valid settings but is not present in the training data, and by providing negative examples of spurious …

A multi-file torchtext data loader

To help models generalize, it's common to use some form of data augmentation. This is where the original training data are modified in some way that preserves their semantics while changing their values. The model is fitted to the original training data, plus one or more augmented versions of it …