
Getting started with timeseries data augmentation

Data augmentation is a critical component in modern machine learning practice due to its benefits for model accuracy, generalizability, and robustness to adversarial examples. Elucidating the precise mechanisms by which this happens is an active area of research, but a simplified explanation of the current proposals might look like …


Virtual epochs for PyTorch

A common problem when training neural networks is the size of the data.[1] There are several strategies for storing and querying large amounts of data, or for increasing model throughput to speed up training when there are large amounts of data, but scale causes problems in much more mundane …
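The post's own approach is elided here; as one minimal sketch of the "virtual epoch" idea, PyTorch's `RandomSampler` can decouple epoch length from dataset size. The dataset and the sample counts below are placeholders, not values from the post:

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Stand-in for a large dataset; the real post's data is unknown.
dataset = TensorDataset(torch.arange(1_000_000).unsqueeze(1))

# A "virtual epoch": draw a fixed number of samples per epoch,
# independent of the true dataset size.
sampler = RandomSampler(dataset, replacement=True, num_samples=10_000)
loader = DataLoader(dataset, batch_size=100, sampler=sampler)

print(len(loader))  # batches per virtual epoch: 10_000 / 100 = 100
```

Checkpointing, validation, and learning-rate schedules can then be keyed to these fixed-size epochs rather than to full passes over the data.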


Superconvergence in PyTorch

In Super-Convergence: Very fast training of neural networks using large learning rates,[1] Smith and Topin present evidence for a learning rate schedule that can result in a 10x decrease in training time while maintaining similar accuracy. Specifically, they propose the use of a cyclical learning rate, which starts …
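PyTorch ships a scheduler in this family as `torch.optim.lr_scheduler.OneCycleLR`: ramp up to a large peak rate, then anneal back down. A minimal sketch, where the model, `max_lr`, and step count are arbitrary placeholders rather than settings from the post:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One cycle over total_steps: the lr climbs to max_lr, then anneals
# well below its starting value; call scheduler.step() once per batch.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1.0, total_steps=100
)

lrs = []
for _ in range(100):
    optimizer.step()  # a real loop would compute loss and backward first
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

Plotting `lrs` shows the characteristic rise-then-fall of the one-cycle policy.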


A faster way to generate thin plate splines

In Evading real-time person detectors by adversarial t-shirt,[1] Xu and coauthors show that the adversarial patch attack described by Thys, Van Ranst, and Goedemé[2] is less successful when applied to flexible media like fabric, due to the warping and folding that occurs.

[Figure: low success rate with the adversarial patch]

They propose to remedy this failure …
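For reference, a thin plate spline warp between control points can already be fit with SciPy's `RBFInterpolator` (its default kernel is `thin_plate_spline`); the post presumably builds something faster than this. The points below are synthetic, and none of this is the post's code:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
src = rng.uniform(0.0, 1.0, size=(20, 2))          # control points
dst = src + rng.normal(0.0, 0.05, size=(20, 2))    # their warped targets

# Fits a 2-D thin plate spline mapping src -> dst exactly
# (smoothing defaults to 0, so control points are interpolated).
tps = RBFInterpolator(src, dst, kernel="thin_plate_spline")

# Warp a regular grid of query points through the spline.
xx, yy = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
grid = np.column_stack([xx.ravel(), yy.ravel()])
warped = tps(grid)  # shape (64, 2)
```

Fitting requires solving a dense linear system in the number of control points, which is the usual bottleneck a faster method would target.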


How to combine variable length sequences in PyTorch DataLoaders

If you're getting started with PyTorch for text, you've probably encountered an error that looks something like:

Sizes of tensors must match except in dimension 0.

The short explanation for this error is that sequences are often different lengths, but tensors are required to be rectangular. The fix for this …
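The usual fix is a custom `collate_fn` that pads each batch to its longest sequence, e.g. with `torch.nn.utils.rnn.pad_sequence`. The `(sequence, label)` layout below is an assumption about the dataset, not taken from the post:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def collate_padded(batch):
    """Pad variable-length sequences in `batch` into one rectangular tensor."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in sequences])
    padded = pad_sequence(sequences, batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels)

# Two sequences of different lengths no longer trigger the size error:
data = [(torch.tensor([1, 2, 3]), 0), (torch.tensor([4, 5, 6, 7, 8]), 1)]
loader = DataLoader(data, batch_size=2, collate_fn=collate_padded)
padded, lengths, labels = next(iter(loader))
print(padded.shape)  # torch.Size([2, 5])
```

Returning the original lengths alongside the padded tensor lets downstream code (e.g. `pack_padded_sequence`, or masked losses) ignore the padding positions.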


Adding data augmentation to torchtext datasets

It is widely acknowledged that artificially augmented datasets lead to models that are both more accurate and more generalizable. They do this by introducing variability that is likely to be encountered in ecologically valid settings but is not present in the training data, and by providing negative examples of spurious …


SciPy Proceedings 2020 Survey

The mission of the SciPy Proceedings Committee (Proccom) is to celebrate and promote the work of the members of the SciPy community. This is taken in the broad sense: a community that includes the authors and maintainers of core libraries; the scientists and engineers who use these libraries to …


Installing CUDA on Ubuntu 18.04 for PyTorch or TensorFlow

I recently needed to update some servers running an old Ubuntu LTS (Xenial, 16.04) to a slightly less old Ubuntu LTS (Bionic, 18.04). I had been putting it off for some time, mostly due to the noise I heard about problems installing the Nvidia CUDA toolkit. But that …


Three reasons to use Shapley values

Last time, we discussed Shapley values and how they are defined mathematically. This time, let's turn our attention to how to use them.[1]

We discussed how explainable artificial intelligence (XAI) is focused on taking models which have high predictive power (high-variance, or high-VC, models) and providing an …


How Shapley values work

A common concern in machine learning (ML) solutions is that apparent predictive power is coming from a problematic source.[1] For example, a model might learn to predict burrito quality from latitude and longitude. In this case, the actual signal is likely coming from a particular city or neighborhood having …
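The game-theoretic definition behind Shapley values can be computed exactly for small cases by enumerating every coalition of "players" (here, features). A stdlib-only sketch; the toy payoffs are invented to echo the latitude/longitude example, not taken from the post:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to a payoff. With features as
    players and model performance as the payoff, this is the classic
    attribution scheme. Exponential in len(players): illustration only.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Weighted marginal contribution of p to coalition s
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy game: longitude alone is somewhat predictive, latitude alone is
# not, and together they interact strongly.
payoff = {
    frozenset(): 0.0,
    frozenset({"lat"}): 0.0,
    frozenset({"lon"}): 1.0,
    frozenset({"lat", "lon"}): 3.0,
}
phi = shapley_values(["lat", "lon"], payoff.get)
# phi["lat"] == 1.0, phi["lon"] == 2.0; they sum to the full payoff, 3.0
```

The sum of the values equalling the grand-coalition payoff is the "efficiency" property that makes Shapley values attractive for attribution.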


Page 1 / 2 »