index

How to combine variable length sequences in PyTorch DataLoaders

If you're getting started with PyTorch for text, you've probably encountered an error that looks something like:

Sizes of tensors must match except in dimension 0.

The short explanation for this error is that sequences are often different lengths, but tensors are required to be rectangular. The fix for this …


Adding data augmentation to torchtext datasets

It is universally acknowledged that artificially augmented datasets lead to models which are both more accurate and more generalizable. They do this by introducing variability which is likely to be encountered in ecologically valid settings but is not present in the training data; and, by providing negative examples of spurious …


A multi-file torchtext data loader

To help models generalize, it's common to use some form of data augmentation. This is where the original training data are modified in some way that preserves their semantics while changing their values. The model is fitted to the original training data, plus one or more augmented versions of it …


A faster way to generate lagged values

At Novi Labs, we spend a lot of time working with timeseries data. Generically speaking, these data are formatted something like this:

id  time  value
---------------
a     0      1
a     1      2
a     2      3
b     0      4
b     1      5
c     0      6

where we have individual sensors represented by …