Imagine you have a Python application that uses the psycopg2 package.
To install this package you need the libpq-dev system library as well as a C compiler. (Yes, you can install psycopg2-binary without these problems, but it doesn't really matter which library we choose as an example.)
Your Dockerfile might look similar to this. (I use a venv to help with the multistage build later.)
If you run `docker build -t multistage .` the image will be around 347 MB.
We don't actually need `build-essential` to run our app, but it stays in the image because of Docker's layered filesystem: removing it in a later layer does not shrink the earlier one. Maybe a multistage approach will help? It definitely will! Let's take a look.
The new image takes only 128 MB! The only problem: it doesn't work.
What is libpq.so.5? It's a shared library: a compiled piece of C code from the PostgreSQL client library that psycopg2 uses under the hood (similar to .dll files on Windows). To get it we need to install the libpq5 system package via apt-get install.
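A quick way to see whether a shared library can be located on the system is `ctypes.util.find_library` from the Python standard library. This is only an illustrative sketch and not part of the original setup; the helper name `has_shared_library` is hypothetical, and the lookup may behave differently on non-glibc images.

```python
from ctypes.util import find_library


def has_shared_library(name: str) -> bool:
    """Return True if ctypes can locate lib<name> on this system."""
    # find_library takes the name without the "lib" prefix and without the
    # version suffix, e.g. "pq" for libpq.so.5. It returns None if the
    # library cannot be found.
    return find_library(name) is not None


if __name__ == "__main__":
    print("libpq available:", has_shared_library("pq"))
```

Running this inside the final image before and after installing libpq5 should show the difference.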
So how can one find such dependencies? The idea is simple:
1. Scan all files inside our virtualenv and find all .so and executable files (these are the only files that can link against other shared libraries).
2. Use the ldd command to find out which shared libraries those files use.
3. Use dpkg -S to find the name of the system package that contains each needed shared library. (dpkg is for Debian-based images, but other distros have their own similar commands.)
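The first two steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `find_binaries`, `parse_ldd` and `shared_libs` are hypothetical, and the parsing assumes the usual glibc `ldd` output format of `name => path (address)`.

```python
import os
import re
import subprocess
import sys

# Matches ldd lines such as:
#   libpq.so.5 => /usr/lib/x86_64-linux-gnu/libpq.so.5 (0x00007f...)
#   libfoo.so.1 => not found
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def find_binaries(root):
    """Yield paths of .so files and executable files under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            is_shared_object = name.endswith(".so") or ".so." in name
            if is_shared_object or os.access(path, os.X_OK):
                yield path


def parse_ldd(output):
    """Return {library name: resolved path, or None if not found}."""
    libs = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            name, target = match.groups()
            # "libfoo.so.1 => not found" yields the token "not" here.
            libs[name] = None if target == "not" else target
    return libs


def shared_libs(path):
    """Run ldd on one file; non-dynamic files simply yield an empty dict."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    found = {}
    for binary in find_binaries(sys.argv[1]):
        found.update(shared_libs(binary))
    for name, target in sorted(found.items()):
        print(name, "=>", target or "not found")
```

Keeping the parsing in its own function makes it possible to check it against captured `ldd` output without running `ldd` at all.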
I created a Python script that does all of this.
For other distros you only need to change the who_owns function to use the appropriate command and parse its result.
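To make the distro-specific part concrete, here is a minimal sketch of what a Debian-flavoured `who_owns` could look like. It is an assumption about the shape of such a function, not the script's actual code: `parse_dpkg_owner` is a hypothetical helper, and the fallback to the symlink-resolved path is my own addition for systems where only one form of the path is registered with dpkg.

```python
import os
import subprocess


def parse_dpkg_owner(output):
    """Extract package names from `dpkg -S` output.

    Typical line: "libpq5:amd64: /usr/lib/x86_64-linux-gnu/libpq.so.5"
    Several owners are separated by ", " before the final ": ".
    """
    packages = set()
    for line in output.splitlines():
        if line.startswith("diversion by"):
            continue  # diversion notices are not package ownership lines
        owners, sep, _path = line.partition(": ")
        if not sep:
            continue
        for owner in owners.split(", "):
            # Drop the architecture qualifier, e.g. "libpq5:amd64" -> "libpq5".
            packages.add(owner.split(":")[0])
    return packages


def who_owns(path):
    """Return the set of Debian packages owning path (empty if none)."""
    # Try the path as reported first, then its symlink-resolved form.
    for candidate in (path, os.path.realpath(path)):
        result = subprocess.run(
            ["dpkg", "-S", candidate], capture_output=True, text=True
        )
        packages = parse_dpkg_owner(result.stdout)
        if packages:
            return packages
    return set()
```

On RPM-based distros the equivalent query is `rpm -qf <path>`, and on Alpine it is `apk info --who-owns <path>`; each produces a different output format, so only the command and the parsing function would need to change.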
Run `python find_deps.py /opt/.venv`. Be sure not to remove any build dependencies before that, because it will affect the result.
You can save the output to a file:
RUN python find_deps.py $VIRTUAL_ENV > sys_deps.txt
And during the next stage, install them automatically like so: