MLOps lessons from Creativity, Inc. (Part 2)

Posted Wed 26 January 2022 by dillon niederhut

Last time, we talked a bit about lean manufacturing, DevOps, MLOps, and the history of Pixar Studios according to Ed Catmull. In particular, we noted some similarities between Ed's lessons about running a film production company and MLOps best practices. In this blog post, we'll finish going through that list.

For the first half, click here.

4. Send Your Team on Location

Pixar movies take place in many locations -- some real, some imagined. One of the characteristic features of a Pixar film is that it feels real. Obviously it is an animated film, and it is not real, but it has captured something about reality in a way that's hard to describe in words. Maybe it's the way a particular animal moves, or the way that light filters through an underwater scene.

This is because, as a rule, animators at Pixar go on location. Does your movie have a secret volcano lair? The animators are about to fly to an active volcano somewhere. Does your film take place in a world made of snow and ice? Tell the animation team to pack warm, because they are going to Scandinavia.

Now, you don't need to send your ML team to a volcano (although Hawaii is nice), but they should somehow be coming into contact with the thing they are trying to model. This helps your team have good intuitions about the assumptions, edge cases, and failure cases that the model is likely to encounter. With these in mind before modeling begins, you can at least monitor for them, and at best build robustness to them into your model.

If your process is a physical thing, you should go visit it! Maybe it will be obvious that the measurements you get are much noisier than the process, or that they are biased, or that the reporting latency is weirdly high. Maybe you'll see an obviously important variable that you haven't been measuring -- but you can start now!

If your process is not a physical thing -- can you still interact with it? A lot of Americans were shocked in 2020 when oil prices became negative -- i.e., that someone would pay you money to take something that used to cost money. The people who weren't surprised were people with experience in trading commodities.

A cheap way to send yourself on location is to QA your own data. Really get in there. If you can -- label some of it by hand! There might be a good correspondence between the cases you find hard and the cases your model will struggle with later. You can take this idea to an extreme and "be" your model for a little bit. Wizard of Oz it! What strategy are you using to make your predictions? Can it be reframed as a computational problem?
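One way to make the hand-labeling habit concrete is sketched below. It pulls a reproducible random sample to review, then checks whether the model's error rate is higher on the examples you personally found hard. The function names and the `(found_hard, model_was_wrong)` record shape are made up for illustration, not taken from any particular tool.

```python
import random


def sample_for_hand_labeling(records, k, seed=0):
    """Pick k records uniformly at random (without replacement) to review by hand.

    `records` is assumed to be a list; the fixed seed keeps the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def error_rate_by_difficulty(reviewed):
    """Compare model error rates on examples the human found hard vs. easy.

    `reviewed` is an iterable of (found_hard, model_was_wrong) boolean pairs,
    a hypothetical format you would fill in while labeling.
    """
    counts = {True: [0, 0], False: [0, 0]}  # found_hard -> [errors, total]
    for found_hard, model_was_wrong in reviewed:
        counts[found_hard][0] += int(model_was_wrong)
        counts[found_hard][1] += 1
    return {
        ("hard" if found_hard else "easy"): (errors / total if total else None)
        for found_hard, (errors, total) in counts.items()
    }
```

If the "hard" error rate comes out well above the "easy" one, your own confusion is probably a decent early warning for where the model will struggle.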

5. Schedule Time to Work on Small Projects

At Pixar, animators get paid to write short films to submit to film festivals. Pixar doesn't make any money from these, although I guess you could argue that award-winning short films are a kind of advertisement of quality.

The reason that Ed promoted this practice is related to the explore / exploit tradeoff. As a company, Pixar needs to experiment and innovate in order to stay ahead of the competition. However, experimenting on a feature-length film has a high failure cost. If the experiment is a failure, Pixar just wasted 2-4 years and tens of millions of dollars.

The short films (some of them might only be 5 minutes!) are still pretty expensive, especially in terms of the animators' time, but one or even twenty failed short films won't bring down the company. The tradeoff here is that Pixar gets to try new animation styles, new kinds of stories, and new kinds of characters, in an environment that is relatively free from consequences. This helps to maximize the opportunity to learn from failure.

Your machine learning practice should incorporate the use of small experiments to drive innovation. Even if the experiment itself never makes it into production, a lesson or technique from that experiment might find its way into several systems down the road. What if you reframed your regression problem as a classification problem? What if no model could take longer than an hour to build?
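As a small example of how cheap one of these experiments can be, here is a minimal sketch of turning a regression target into class labels by quantile binning. This is only one of several ways to do the reframing, and the helper names `fit_bin_edges` and `to_class` are invented for this post.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(train_targets, n_classes=4):
    """Return the n_classes - 1 cut points that split the training targets into quantile bins."""
    return quantiles(train_targets, n=n_classes)


def to_class(value, edges):
    """Map a continuous target to a class index between 0 and len(edges)."""
    return bisect_right(edges, value)


# Fit the edges on training targets only, then reuse them for any new value.
train_targets = [float(x) for x in range(1, 101)]
edges = fit_bin_edges(train_targets)
labels = [to_class(y, edges) for y in train_targets]
```

Fitting the edges on the training targets alone keeps the class definitions from leaking information out of the evaluation set.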

The obvious benefit here is potential new innovations, but there is also a less obvious one -- personal growth. A short film might be the first time someone has a shot at directing. An experimental new model architecture might be someone's first leadership role. By embracing small, low-cost experiments, you are not only testing out different technical approaches, but also different personnel approaches.

It goes even deeper than this though, since the kind of people attracted to machine learning tend to find learning and discovery to be inherently valuable, in the same way that people attracted to animation will tend to find creativity and artistic expression to be inherently valuable. Giving them time to create new things will help them feel personally fulfilled.

6. Always Allocate More Memory than You Think You Will Need

One of the last stories that Ed relates in the book is a company-wide conference aimed at breaking down interdepartmental barriers by organizing around topics of interest. These topics -- which could be ideas for new projects, or how to fix slow or broken processes -- were all submitted to the leadership team, which winnowed them down to a more reasonable number (the low hundreds I think?).

There is surely a lesson about running a company there, but one detail in particular caught my ear. One of the topics was memory (as in RAM) issues during rendering, which, according to the engineer who submitted it, was the single largest class of problems that disrupted compiling a film.

Computers are cheap and people are expensive, and those people have better things to do than babysit a long-running process to make sure it doesn't run out of memory. It is almost always cheaper to allocate more memory / more compute than you need and pay your cloud services provider a bit extra, than it is to have a crash that brings down a production (film) or a production system. As Corey Quinn notes, right-sizing your machines is hard -- leave a little headroom.
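If you want a starting rule of thumb, the sketch below takes the largest peak memory you have observed across past runs, adds a margin, and rounds up to whatever increment your provider sells. The 25% default and the 1 GiB granularity are assumptions I picked for illustration, not recommendations from the book or from any cloud vendor, and `memory_to_request` is a hypothetical helper.

```python
import math


def memory_to_request(observed_peaks_gib, headroom=0.25, granularity_gib=1.0):
    """Largest observed peak plus fractional headroom, rounded up to the allocation granularity.

    headroom=0.25 (25% extra) is an illustrative default, not a measured value.
    """
    if not observed_peaks_gib:
        raise ValueError("need at least one observed peak")
    target = max(observed_peaks_gib) * (1 + headroom)
    return math.ceil(target / granularity_gib) * granularity_gib


# Peaks from three earlier runs, in GiB (made-up numbers).
request_gib = memory_to_request([10.0, 12.0, 11.5])
```

Using the maximum rather than the average is deliberate: the run that crashes is the one that hits the worst case, not the typical one.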
