blog.neater-hut - Articles by Dillon Niederhut

What is adversarial machine learning?

If you work in computer security or machine learning, you have probably heard about adversarial attacks on machine learning models and the risks they pose. If you don't, you might not be aware of something very interesting: that the big fancy neural networks that companies like Google and Facebook …

Fast operations on scikit-learn decision trees with numba

The title is a bit wordy. But that's what this post is about.

To start with, you might be wondering why someone would want to operate on a decision tree from inside numba in the first place. After all, the scikit-learn implementation of trees uses Cython, which should be providing …

SciPy Proceedings 2021 Survey

Last year, the SciPy Conference Proceedings Committee (Proccom) started collecting demographic data from authors and reviewers, in order to understand:

how authors compare to conference attendees; and,
how authors compare to reviewers.

This post is an update with new data from 2021; the discussion of the 2020 results is …

Writing an image annotation tool in 50 lines of Python

There are a couple of really nice image annotation libraries that are free and open source. For example, I use LabelImg whenever I need to hand-annotate bounding boxes to create new (or augment existing) datasets for object detection. It can output labels in both Pascal and YOLO formats, which is …
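For readers who haven't worked with the two formats, here is a minimal sketch of how a single bounding box converts between them. It is an illustration, not code from the post or from LabelImg, and it assumes the common conventions: Pascal VOC stores absolute pixel corners `(xmin, ymin, xmax, ymax)`, while YOLO stores the box center and size normalized by the image width and height. The function names are hypothetical, and the sketch ignores the one-pixel offsets some tools apply to VOC coordinates.

```python
def voc_to_yolo(box, img_width, img_height):
    """Convert a Pascal VOC box to YOLO format.

    box: (xmin, ymin, xmax, ymax) in absolute pixel coordinates.
    Returns (x_center, y_center, width, height), each normalized to [0, 1].
    """
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return (x_center, y_center, width, height)


def yolo_to_voc(box, img_width, img_height):
    """Inverse of voc_to_yolo: normalized center/size back to pixel corners."""
    x_center, y_center, width, height = box
    xmin = (x_center - width / 2) * img_width
    ymin = (y_center - height / 2) * img_height
    xmax = (x_center + width / 2) * img_width
    ymax = (y_center + height / 2) * img_height
    return (xmin, ymin, xmax, ymax)


# A 200x200 px box in a 400x500 px image (hypothetical values).
print(voc_to_yolo((100, 50, 300, 250), 400, 500))  # → (0.5, 0.3, 0.5, 0.4)
```

In a YOLO label file, each object is then one text line of the form `class_id x_center y_center width height`, whereas Pascal VOC labels are stored as XML files.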

SciPy is partnering with JOSS! Part 1

The Python in Science Conference (SciPy) compiles conference proceedings every year (as one does). Our process differs from that of most conferences in that our review occurs in two stages.¹

In stage one, the Program Committee and their area chairs solicit abstracts for talks and …

How to add plots to docstrings

Recently, we released functionality in niacin for performing data augmentation on timeseries. As part of this, we wanted to show before-and-after plots in the documentation of how a timeseries (in this case, a sine curve) gets transformed by any particular augmenting function. In a lot …

Getting started with timeseries data augmentation

Data augmentation is a critical component of modern machine learning practice because of its benefits for model accuracy, generalizability, and robustness to adversarial examples. Elucidating the precise mechanisms behind these benefits is an active area of research, but a simplified explanation of the current proposals might look like …

Virtual epochs for PyTorch

A common problem when training neural networks is the size of the data¹. There are several strategies for storing and querying large amounts of data, or for increasing model throughput to speed up training when there are large amounts of data, but scale causes problems in much more mundane …

Superconvergence in PyTorch

In Super-Convergence: Very fast training of neural networks using large learning rates¹, Smith and Topin present evidence for a learning rate parametrization scheme that can reduce training time tenfold while maintaining similar accuracy. Specifically, they propose the use of a cyclical learning rate, which starts …
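To give a rough picture of what a cyclical learning rate looks like, here is a simplified sketch of a single triangular cycle: the rate rises linearly from `lr_min` to `lr_max` over the first half of training and falls linearly back to `lr_min` over the second half. It omits details of the policy described in the paper, and the function name and parameters are hypothetical. For real training runs, PyTorch provides `torch.optim.lr_scheduler.OneCycleLR` and `torch.optim.lr_scheduler.CyclicLR`.

```python
def triangular_lr(step, total_steps, lr_min, lr_max):
    """Learning rate at `step` (0-indexed) for one triangular cycle.

    Hypothetical helper: rises linearly from lr_min to lr_max over the
    first half of `total_steps`, then falls linearly back to lr_min.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    half = (total_steps - 1) / 2
    # 1.0 at the first and last step, 0.0 at the peak of the cycle
    dist_from_peak = abs(step - half) / half
    return lr_max - (lr_max - lr_min) * dist_from_peak


# Print the schedule for a short, hypothetical 11-step run.
for step in range(11):
    print(step, round(triangular_lr(step, 11, 0.01, 0.1), 4))
```

Computing the rate as a pure function of the step index, rather than updating internal state, makes the schedule easy to inspect or plot before committing to a training run.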

A faster way to generate thin plate splines

In Evading real-time person detectors by adversarial t-shirt¹, Xu and coauthors show that the adversarial patch attack described by Thys, Van Ranst, and Goedemé² is less successful when applied to flexible media like fabric, due to the warping and folding that occurs.

[Figure: low success rate with adversarial patch from AUTHORS]

They propose to remedy this failure …
