blog.neater-hut - code category

How to deploy conda-based docker images

The scientific Python community has settled on conda and conda-forge as the easiest way to compile and install dependencies that have complicated build recipes:

As long as the Python community thinks of their problems as “Python packaging problems”, and not “legacy C compiler & linker issues from the 1970s which we …

Fast operations on scikit-learn decision trees with numba

The title is a bit wordy. But that's what this post is about.

To start with, you might be wondering why someone would want to operate on a decision tree from inside numba in the first place. After all, the scikit-learn implementation of trees uses Cython, which should be providing …

Writing an image annotation tool in 50 lines of Python

There are a couple of really nice image annotation libraries that are free and open source. For example, I use LabelImg whenever I need to hand-annotate bounding boxes to create new (or augment existing) datasets for object detection. It can output labels in both Pascal and YOLO formats, which is …

How to add plots to docstrings

Recently, we released functionality in niacin for performing data augmentation on timeseries. As a part of this, we wanted the documentation to show before-and-after plots of how a timeseries (in this case, a sine curve) gets transformed by any particular augmenting function. In a lot …

Virtual epochs for PyTorch

A common problem when training neural networks is the size of the data¹. There are several strategies for storing and querying large amounts of data, or for increasing model throughput to speed up training when there are large amounts of data, but scale causes problems in much more mundane …

Superconvergence in PyTorch

In Super-Convergence: Very fast training of neural networks using large learning rates¹, Smith and Topin present evidence for a learning rate parametrization scheme that can result in a 10x decrease in training time, while maintaining similar accuracy. Specifically, they propose the use of a cyclical learning rate, which starts …

A faster way to generate thin plate splines

In Evading real-time person detectors by adversarial t-shirt¹, Xu and coauthors show that the adversarial patch attack described by Thys, Van Ranst, and Goedemé² is less successful when applied to flexible media like fabric, due to the warping and folding that occurs.

[Figure: low success rate with adversarial patch from AUTHORS]

They propose to remedy this failure …

How to combine variable length sequences in PyTorch DataLoaders

If you're getting started with PyTorch for text, you've probably encountered an error that looks something like:

Sizes of tensors must match except in dimension 0.

The short explanation for this error is that sequences are often different lengths, but tensors are required to be rectangular. The fix for this …

Adding data augmentation to torchtext datasets

It is universally acknowledged that artificially augmented datasets lead to models which are both more accurate and more generalizable. They do this by introducing variability which is likely to be encountered in ecologically valid settings but is not present in the training data; and, by providing negative examples of spurious …

A multi-file torchtext data loader

To help models generalize, it's common to use some form of data augmentation. This is where the original training data are modified in some way that preserves their semantics while changing their values. The model is fitted to the original training data, plus one or more augmented versions of it …
