Speed up pip downloads in Docker with BuildKit’s new caching

Docker uses layer caching to speed up builds, but layer caching isn’t always enough. When you’re rapidly developing your Python application and therefore frequently changing the list of dependencies, you’re going to end up downloading the same packages. Over and over and over again. This is no fun when you depend on small packages. It’s extra no fun when you’re downloading machine learning libraries that take hundreds of megabytes. With the release of a stable Docker BuildKit, Docker now supports […]

All Pythons are slow, but some are faster than others

Python is not the fastest language around, so any performance boost helps, especially if you’re running at scale. It turns out that depending on where you install Python from, its performance can vary quite a bit: choosing the wrong build of Python can cut your speed by 10-20%. Let’s look at some numbers. To compare builds, I ran three benchmarks from the pyperformance suite on four different builds of Python 3.9 (code is here): python:3.9-buster, the “official” Python Docker image. Ubuntu […]
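
The excerpt compares interpreter builds with the pyperformance suite. As a much cruder stand-in (an assumption, not the author's method), the sketch below reports how the running interpreter was built and times one small statement with the standard library's timeit module. The PGO/LTO checks look for CPython's --enable-optimizations and --with-lto configure flags in sysconfig's CONFIG_ARGS; the excerpt does not say whether those flags explain the differences it measures. Both function names are hypothetical.

```python
import platform
import sysconfig
import timeit


def build_info() -> dict:
    """Summarize how the running CPython interpreter was built (hypothetical helper)."""
    # CONFIG_ARGS holds the ./configure arguments on Unix builds; it is None on Windows.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "compiler": platform.python_compiler(),
        "build": platform.python_build(),
        # Profile-guided optimization (PGO) and link-time optimization (LTO) flags.
        "pgo": "--enable-optimizations" in config_args,
        "lto": "--with-lto" in config_args,
    }


def micro_benchmark(repeat: int = 5, number: int = 200) -> float:
    """Best-of-`repeat` seconds per run of a small CPU-bound statement (not pyperformance)."""
    stmt = "sum(i * i for i in range(1000))"
    timings = timeit.repeat(stmt, repeat=repeat, number=number)
    # The minimum is the least noisy estimate; divide to get the time for a single run.
    return min(timings) / number


if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key}: {value}")
    print(f"seconds per run: {micro_benchmark():.6f}")
```

Running the same script under each interpreter (for example, inside each Docker image) gives a rough side-by-side; for numbers worth publishing, a proper benchmark suite like the pyperformance one the article uses is the better tool.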

Why you really need to upgrade pip

New software releases can bring bug fixes, new features, and faster performance. For example, NumPy 1.20 added type annotations, and improved performance by using SIMD when possible. If you’re installing NumPy, you might want to install the newest version. Unfortunately, if you’re using an old version of pip, installing the latest version of a Python package might fail—or install in a slower, more complex way. Why? The combination of glibc versioning, the CentOS end-of-life schedule, and how pip installs packages. […]
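
The excerpt ties install failures to the combination of glibc versioning and the pip version. A minimal, standard-library-only sketch for checking both in an environment (for example, inside a Docker image): platform.libc_ver() reports the C library the interpreter was linked against, and importlib.metadata reads the installed pip version. The helper names are hypothetical, and the minimum pip version you compare against is an assumption you would choose yourself from pip's release notes.

```python
import itertools
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn '21.0.1' into (21, 0, 1) and '20.3b1' into (20, 3), ignoring pre-release suffixes."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_pip_version() -> Optional[str]:
    """Version of pip in the current environment, or None if pip isn't installed."""
    try:
        return version("pip")
    except PackageNotFoundError:
        return None


def environment_report(minimum_pip: Tuple[int, ...]) -> dict:
    """Report libc and pip versions, flagging a pip older than `minimum_pip` (hypothetical helper)."""
    # ('glibc', '<version>') on glibc-based Linux; ('', '') where it can't tell.
    libc, libc_version = platform.libc_ver()
    pip_version = installed_pip_version()
    return {
        "libc": libc,
        "libc_version": libc_version,
        "pip": pip_version,
        "pip_outdated": pip_version is not None and version_tuple(pip_version) < minimum_pip,
    }


if __name__ == "__main__":
    # (21, 0) is an illustrative threshold, not a recommendation.
    print(environment_report(minimum_pip=(21, 0)))
```

If the report flags pip as outdated, upgrading pip (python -m pip install --upgrade pip) before installing other packages is the straightforward fix the post's title argues for.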

Transgressive Programming: the magic of breaking abstraction boundaries

You probably don’t want to be an asshole. Being an asshole, as Siderea’s classic essay The Asshole Filter points out, is about being transgressive, about violating social boundaries and rules. And so within the cultural norms of our society, most of us try to avoid being an asshole by sticking to the expected social boundaries. In programming as in social life, there are boundaries we try not to violate: we build software with abstractions, boundaries between the complexity beneath and the […]

The security scanner that cried wolf

If you run a security scanner on your Docker image, you might be in for a shock: often you’ll be warned of dozens of security vulnerabilities, even on the most up-to-date image. After the third or fourth time you get this result, you’ll start tuning the security scanner out. Eventually, you won’t pay attention to the security scanner at all—and you might end up missing a real security vulnerability that slipped through. This is not your fault: the problem is […]

Speeding up Docker builds in CI with BuildKit

No one enjoys waiting, and waiting for your software to build and tests to run isn’t fun either—in fact, it’s quite expensive. And if you’re building your Docker image in a CI system like GitHub Actions with ephemeral runners—where a new environment gets spun up for every build—by default your builds are going to be extra slow. In particular, when you spin up a new VM with a new Docker instance, the cache is empty, so when you run the […]

The worst so-called “best practice” for Docker

Somebody is always wrong on the Internet, and bad Docker packaging advice is quite common. But one particular piece of advice keeps coming up, and it’s dangerous enough to merit its own article. In a whole bunch of places you will be told not to install security updates when building your Docker image. I’ve been submitting PRs to fix this, so it’s up in fewer places now. But previously this advice was given by the official Docker docs’ best practices […]

Loading SQL data into Pandas without running out of memory

You have some data in a relational database, and you want to process it with Pandas. So you use Pandas’ handy read_sql() API to get a DataFrame—and promptly run out of memory. The problem: you’re loading all the data into memory at once. If you have enough rows in the SQL query’s results, it simply won’t fit in RAM. Pandas does have a batching option for read_sql(), which can reduce memory usage, but it’s still not perfect: it also loads […]
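
The batching option the excerpt mentions is read_sql()'s chunksize parameter: with it, Pandas returns an iterator of DataFrames instead of one large DataFrame. A minimal sketch using an in-memory SQLite database; the table and column names are made up for illustration. As the excerpt warns, this only bounds the size of each DataFrame, and depending on the database driver the full result set may still be fetched into memory underneath.

```python
import sqlite3

import pandas as pd


def make_demo_db(rows: int = 10_000) -> sqlite3.Connection:
    """Build a throwaway in-memory table so the example is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)",
        ((float(i),) for i in range(rows)),
    )
    conn.commit()
    return conn


def sum_in_batches(conn: sqlite3.Connection, chunksize: int = 1_000):
    """Aggregate a column one DataFrame-sized batch at a time."""
    total = 0.0
    largest_batch = 0
    # With chunksize set, read_sql yields DataFrames of at most `chunksize` rows.
    for batch in pd.read_sql("SELECT value FROM readings", conn, chunksize=chunksize):
        total += float(batch["value"].sum())
        largest_batch = max(largest_batch, len(batch))
    return total, largest_batch


if __name__ == "__main__":
    conn = make_demo_db()
    print(sum_in_batches(conn))  # (49995000.0, 1000)
```

The aggregation never holds more than one batch as a DataFrame, which is the part of the memory problem that chunksize does solve.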

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

BCNet: Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [BCNet, CVPR 2021]. This is the official PyTorch implementation of BCNet, built on the open-source detectron2. Lei Ke, Yu-Wing Tai, Chi-Keung Tang. CVPR 2021. Two-stage instance segmentation with state-of-the-art performance. Image formation as composition of two overlapping layers. Bilayer decoupling for the occluder and occludee. Efficacy on both the FCOS and Faster R-CNN detectors. Under construction: our code and pretrained model will be fully released in two months. Visualization of Occluded Objects: Qualitative […]
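
The "two overlapping layers" idea can be illustrated with plain masks: an occluder on the top layer hides part of the occludee beneath it, so the occludee's visible region is its full extent minus the occluder. The NumPy sketch below shows only that composition step; it is not BCNet's model, which the excerpt describes as decoupling the two layers inside a detectron2-based network. Both function names are hypothetical.

```python
import numpy as np


def visible_occludee(occluder: np.ndarray, occludee_full: np.ndarray) -> np.ndarray:
    """Pixels of the bottom-layer object that the top-layer occluder doesn't cover."""
    return occludee_full.astype(bool) & ~occluder.astype(bool)


def compose_bilayer(occluder: np.ndarray, occludee_full: np.ndarray) -> np.ndarray:
    """Label map of the composed image: 0 = background, 1 = visible occludee, 2 = occluder."""
    labels = np.zeros(occluder.shape, dtype=np.uint8)
    labels[occludee_full.astype(bool)] = 1  # paint the bottom layer first
    labels[occluder.astype(bool)] = 2       # the top layer overwrites wherever they overlap
    return labels


if __name__ == "__main__":
    occludee = np.zeros((4, 4), dtype=bool)
    occludee[0:3, 0:3] = True   # 3x3 object in the top-left
    occluder = np.zeros((4, 4), dtype=bool)
    occluder[1:4, 1:4] = True   # 3x3 object in the bottom-right, overlapping by 2x2
    print(compose_bilayer(occluder, occludee))
    print(visible_occludee(occluder, occludee).sum())  # 5
```

Treating the image as two layers is what lets a model reason about the occludee's hidden 2x2 region separately from the occluder that covers it.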

A Distributed Classification Training Framework with PyTorch

Distribuuuu: the pure and clear PyTorch distributed training framework. Distribuuuu is a distributed classification training framework powered by native PyTorch. See the tutorial for detailed distributed training walkthroughs: Single Node Single GPU Card Training [snsc.py]; Single Node Multi-GPU Cards Training (with DataParallel) [snmc_dp.py]; Multiple Nodes Multi-GPU Cards Training (with DistributedDataParallel); ImageNet training example [imagenet.py]. For the complete training framework, see distribuuuu. Requirements and usage. Dependencies: install PyTorch >= 1.5 (tested on 1.5, 1.7.1, and 1.8), then install the other dependencies: […]
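
In DistributedDataParallel training like the multi-node setup listed above, each process usually trains on its own slice of the dataset. PyTorch's torch.utils.data.DistributedSampler does this by padding the index list so it divides evenly, then giving rank r every world_size-th index starting at r. The pure-Python sketch below imitates that split with shuffling turned off, so it runs without PyTorch; shard_indices is a hypothetical name, not part of Distribuuuu or PyTorch.

```python
import math
from typing import List


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Dataset indices that process `rank` of `world_size` should visit (no shuffling)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Pad by wrapping around to the start so every rank gets exactly per_rank indices
    # (and therefore runs the same number of steps per epoch).
    padded = [i % num_samples for i in range(total)]
    # Interleave: rank r takes r, r + world_size, r + 2 * world_size, ...
    return padded[rank:total:world_size]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, shard_indices(10, rank, world_size=4))
```

With 10 samples and 4 processes, each rank gets 3 indices and samples 0 and 1 are visited twice to even out the split: rank 0 gets [0, 4, 8] and rank 3 gets [3, 7, 1].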
