Adding data augmentation to torchtext datasets

It is universally acknowledged that artificially augmented datasets lead to models which are both more accurate and more generalizable. They do this by introducing variability which is likely to be encountered in ecologically valid settings but is not present in the training data; and, by providing negative examples of spurious …

A multi-file torchtext data loader

To help models generalize, it's common to use some form of data augmentation. This is where the original training data are modified in some way that preserves their semantics while changing their values. The model is fitted to the original training data, plus one or more augmented versions of it …

A faster way to generate lagged values

At Novi Labs, we spend a lot of time working with timeseries data. Generically speaking, these data are formatted something like this:

id  time  value
a     0      1
a     1      2
a     2      3
b     0      4
b     1      5
c     0      6

where we have individual sensors represented by …