<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Dmitry's website]]></title><description><![CDATA[Thoughts, tips, and tricks for future self.]]></description><link>https://thedmitry.pw/</link><image><url>https://thedmitry.pw/favicon.png</url><title>Dmitry&apos;s website</title><link>https://thedmitry.pw/</link></image><generator>Ghost 5.78</generator><lastBuildDate>Sun, 19 Apr 2026 01:11:44 GMT</lastBuildDate><atom:link href="https://thedmitry.pw/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[How I connected Powercom IMD-525AP UPS to NUT service]]></title><description><![CDATA[<figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2025/06/imp525-1.jpg" class="kg-image" alt loading="lazy" width="695" height="800" srcset="https://thedmitry.pw/content/images/size/w600/2025/06/imp525-1.jpg 600w, https://thedmitry.pw/content/images/2025/06/imp525-1.jpg 695w"></figure><p>I had an old UPS lying around and decided to put it to use to protect my TrueNAS server.</p><p>To ensure the server could detect a power outage and shut down gracefully, I needed a way to monitor the UPS. 
The most common solution is to use <strong>NUT (Network UPS Tools)</strong>.</p>]]></description><link>https://thedmitry.pw/blog/2025/06/how-i-connected-powercom-imd-525ap-to/</link><guid isPermaLink="false">68600a006aa9810001560d8f</guid><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Sat, 28 Jun 2025 16:39:17 GMT</pubDate><content:encoded><![CDATA[<figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2025/06/imp525-1.jpg" class="kg-image" alt loading="lazy" width="695" height="800" srcset="https://thedmitry.pw/content/images/size/w600/2025/06/imp525-1.jpg 600w, https://thedmitry.pw/content/images/2025/06/imp525-1.jpg 695w"></figure><p>I had an old UPS lying around and decided to put it to use to protect my TrueNAS server.</p><p>To ensure the server could detect a power outage and shut down gracefully, I needed a way to monitor the UPS. The most common solution is to use <strong>NUT (Network UPS Tools)</strong>. NUT lets you broadcast UPS information to other NUT clients, so other systems can access it and react accordingly.</p><p>I set up UPS monitoring on my <strong>Proxmox host</strong>, which runs a Debian-based OS. In the future, I plan to move the NUT service to a <strong>Raspberry Pi</strong>, which can run for many hours on UPS power and later wake up the Proxmox server.</p><p>For now, my main goal was to verify that I could get data from the UPS.</p><p>My UPS is the <strong>Imperial IMD-525AP</strong>, quite an old model. Mine is likely from before 2009, as the <code>usbhid-ups</code> driver didn&apos;t recognize it.</p><h2 id="installing-nut-httpsnetworkupstoolsorg">Installing NUT <a href="https://networkupstools.org/?ref=thedmitry.pw">https://networkupstools.org/</a></h2><pre><code class="language-bash">sudo apt update
sudo apt install nut</code></pre><h2 id="getting-your-ups-vendor-and-product-id">Getting your UPS Vendor and Product ID</h2><p>Connect UPS to a computer with provided USB cable and then use this command:</p><pre><code>lsusb</code></pre><p>You&apos;ll see output similar to this:</p><pre><code class="language-bash">Bus 002 Device 002: ID 174c:3074 ASMedia Technology Inc. ASM1074 SuperSpeed hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0d9f:0002 Powercom Co., Ltd Black Knight PRO / WOW Uninterruptible Power Supply (Cypress HID-&gt;COM RS232)
Bus 001 Device 002: ID 174c:2074 ASMedia Technology Inc. ASM1074 High-Speed hub
Bus 001 Device 003: ID 26ce:01a2 ASRock LED Controller
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub</code></pre><p>You&apos;re interested in the part after <code>ID</code> &#x2014; that&apos;s the <strong>Vendor ID</strong> and <strong>Product ID</strong>, separated by a colon (<code>:</code>).</p><p>In my case:</p><ul><li>Vendor ID: <code>0d9f</code></li><li>Product ID: <code>0002</code></li></ul><h2 id="configuring-nut">Configuring NUT</h2><p>Edit the UPS configuration file:</p><pre><code class="language-bash">sudo nano /etc/nut/ups.conf</code></pre><p>Initially, I tried the <code>usbhid-ups</code> driver, but it didn&#x2019;t work despite many efforts. Eventually, I discovered that my UPS was too old and switched to the <code>powercom</code> driver.</p><figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2025/06/SCR-20250628-qijt.png" class="kg-image" alt loading="lazy" width="2000" height="509" srcset="https://thedmitry.pw/content/images/size/w600/2025/06/SCR-20250628-qijt.png 600w, https://thedmitry.pw/content/images/size/w1000/2025/06/SCR-20250628-qijt.png 1000w, https://thedmitry.pw/content/images/size/w1600/2025/06/SCR-20250628-qijt.png 1600w, https://thedmitry.pw/content/images/2025/06/SCR-20250628-qijt.png 2000w" sizes="(min-width: 720px) 720px"></figure><pre><code># add this to the end of the /etc/nut/ups.conf
[ups]  # this is the name of your ups (you could have many of them)
driver = powercom  # which driver to use
# arguments for driver is here - https://networkupstools.org/docs/man/powercom.html
type = IMP  # extra argument selecting the UPS model series (see the driver docs above)
port = /dev/ttyUSB0
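# tip: right after plugging in the USB cable, standard commands such as
# `sudo dmesg | grep tty` or `ls /dev/ttyUSB*` usually reveal which serial
# device the UPS received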
nobt = true  # If this flag is present, the battery check on startup is skipped
user = root</code></pre><p>To determine your USB port, you can use <a href="https://unix.stackexchange.com/questions/144029/command-to-determine-ports-of-a-device-like-dev-ttyusb0?ref=thedmitry.pw">https://unix.stackexchange.com/questions/144029/command-to-determine-ports-of-a-device-like-dev-ttyusb0</a></p><p>Why <code>user = root</code>? Because <code>/dev/ttyUSB0</code> is owned by <code>root</code> with group <code>dialout</code>, while by default the NUT user belongs to the <code>nut</code> group. I found this advice:</p><pre><code>sudo cp /lib/udev/rules.d/62-nut-usbups.rules /etc/udev/rules.d/
sudo nano /etc/udev/rules.d/62-nut-usbups.rules
# search for PowerCom section and add this 
ATTR{idVendor}==&quot;0d9f&quot;, ATTR{idProduct}==&quot;0002&quot;, MODE=&quot;664&quot;, GROUP=&quot;nut&quot;
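# (instead of replugging, you can also reload the rules in place with the
#  standard udev commands: sudo udevadm control --reload-rules &amp;&amp; sudo udevadm trigger)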
# reconnect the UPS&apos;s USB cable</code></pre><p>But it didn&apos;t work. So I just added the <code>user</code> parameter to the driver settings so NUT could access the USB port.</p><p>To test the connection:</p><pre><code>sudo /lib/nut/powercom -a ups -u root -DDDD</code></pre><pre><code># In case of stuck processes:
sudo killall powercom 
</code></pre><h2 id="enabling-nut-to-be-accessible-via-network">Enabling NUT to be accessible via network</h2><pre><code>sudo nano /etc/nut/upsd.conf</code></pre><p>Add this line at the end; it exposes NUT on all network interfaces:</p><pre><code># add this line to the bottom
LISTEN 0.0.0.0 3493
</code></pre><h2 id="setting-up-a-nut-user">Setting up a NUT User</h2><pre><code>sudo nano /etc/nut/upsd.users</code></pre><p>The next lines define a user called <code>upsmon</code> with a password of your choosing (<code>psswd</code> in this example).</p><p>I also define this user as a <code>primary</code> node, as the UPS is connected to the local computer itself. If you are connecting to a remote NUT server, this would be a <code>secondary</code> node.</p><pre><code>[upsmon]  # name of user
password = psswd  # your password
upsmon primary
actions = SET  # Let the user do certain things with upsd.
instcmds = ALL  # Let the user initiate specific instant commands (like disable beeper, etc...)</code></pre><h2 id="configuring-the-nut-ups-monitor">Configuring the NUT UPS Monitor</h2><pre><code>sudo nano /etc/nut/upsmon.conf</code></pre><p>You need to write a line in the format:</p><pre><code>MONITOR &lt;UPSNAME&gt;@localhost 1 &lt;USERNAME&gt; &lt;PASSWORD&gt; primary</code></pre><p><code>primary</code> basically just tells NUT that this UPS is connected directly to this machine.</p><p>So in my case I wrote:</p><pre><code>MONITOR ups@localhost 1 upsmon psswd primary</code></pre><h2 id="configuring-nut-as-a-net-server">Configuring NUT as a net server</h2><pre><code>sudo nano /etc/nut/nut.conf
# Search for a line - MODE=none
# Change it to:
MODE=netserver</code></pre><h2 id="enabling-and-starting-nut-services">Enabling and starting nut services</h2><pre><code># for the first time
sudo systemctl enable nut-server
sudo systemctl enable nut-monitor
# then (re)start the services
sudo systemctl restart nut-server
sudo systemctl restart nut-monitor

# check the status and errors 
sudo systemctl status nut-server
sudo systemctl status nut-monitor</code></pre><p>If everything is OK, you can query the UPS with:</p><pre><code>upsc &lt;UPSNAME&gt;

$ upsc ups
Init SSL without certificate database
battery.charge: 100.0
device.mfr: PowerCom
device.model: IMP-525AP
device.serial: Unknown
device.type: ups
driver.name: powercom
driver.parameter.nobt: true
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ttyUSB0
driver.parameter.synchronous: auto
driver.parameter.type: IMP
driver.version: 2.8.0
driver.version.internal: 0.19
input.frequency: 50.00
input.voltage: 224.0
input.voltage.nominal: 220
output.frequency: 50.00
output.voltage: 224.0
ups.load: 9.0
ups.mfr: PowerCom
ups.model: IMP-525AP
ups.model.type: IMP
ups.serial: Unknown
ups.status: OL</code></pre><p><code>ups.status: OL</code> means ONLINE: the UPS is running on wall power.</p><h2 id="configuring-truenas">Configuring TrueNAS</h2><p>Go to <code>System-&gt;Services-&gt;UPS</code>. Enable it and set it to start automatically.</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyan.png" class="kg-image" alt loading="lazy" width="2000" height="668" srcset="https://thedmitry.pw/content/images/size/w600/2025/06/SCR-20250628-qyan.png 600w, https://thedmitry.pw/content/images/size/w1000/2025/06/SCR-20250628-qyan.png 1000w, https://thedmitry.pw/content/images/size/w1600/2025/06/SCR-20250628-qyan.png 1600w, https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyan.png 2000w" sizes="(min-width: 1200px) 1200px"></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyht.png" class="kg-image" alt loading="lazy" width="1706" height="1604" srcset="https://thedmitry.pw/content/images/size/w600/2025/06/SCR-20250628-qyht.png 600w, https://thedmitry.pw/content/images/size/w1000/2025/06/SCR-20250628-qyht.png 1000w, https://thedmitry.pw/content/images/size/w1600/2025/06/SCR-20250628-qyht.png 1600w, https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyht.png 1706w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Write credentials and choose what to do (my server will shut down after 30 seconds of losing power)</span></figcaption></figure><p>Save the settings, then go to <code>Settings-&gt;Shell</code>.</p><p>To check that your TrueNAS can connect to the UPS, run:</p><pre><code>upsc &lt;ups_name&gt;@&lt;nut_server_host&gt;</code></pre><figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyyq.png" class="kg-image" alt loading="lazy" width="1464" height="1038" 
srcset="https://thedmitry.pw/content/images/size/w600/2025/06/SCR-20250628-qyyq.png 600w, https://thedmitry.pw/content/images/size/w1000/2025/06/SCR-20250628-qyyq.png 1000w, https://thedmitry.pw/content/images/2025/06/SCR-20250628-qyyq.png 1464w" sizes="(min-width: 720px) 720px"></figure><p>Yes!</p>]]></content:encoded></item><item><title><![CDATA[Redis tips I wish I knew earlier]]></title><description><![CDATA[Talking about the Python redis package: connection pools, connection leaks, and errors like ConnectionError: Too many connections.
Why to use BlockingConnectionPool and how to properly close a Redis instance.]]></description><link>https://thedmitry.pw/blog/2024/12/redis-tips-i-wish-i-knew-earliesr/</link><guid isPermaLink="false">6766b39261a8d7000172f3b2</guid><category><![CDATA[redis]]></category><category><![CDATA[python_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Sat, 21 Dec 2024 12:55:16 GMT</pubDate><content:encoded><![CDATA[<p>Redis seems to be everywhere these days.<br>I use it all the time myself. But recently, I stumbled upon some fascinating features that I hadn&#x2019;t noticed before.</p><h2 id="surprise-1-connectionpool-blockingconnectionpool"><strong>Surprise</strong> #1: ConnectionPool &amp; BlockingConnectionPool</h2><p>Here&#x2019;s a simple example using the asynchronous version of the Redis library:</p><pre><code class="language-python">import asyncio
from redis.asyncio import Redis

async def ping_redis(redis_client: Redis):
    return await redis_client.ping()

async def main():
    client = Redis(host=&apos;localhost&apos;, port=6379, db=0)
    print(await ping_redis(client))

if __name__ == &apos;__main__&apos;:
    asyncio.run(main())
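
# note: creating Redis() above does not connect by itself; redis-py opens
# a connection lazily, on the first command (the ping() call)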
</code></pre><p>Each command we execute requests a connection to the Redis server.<br>Since we didn&#x2019;t explicitly create a <code>ConnectionPool</code>, it&#x2019;s created automatically for us:</p><pre><code class="language-python">print(client.connection_pool.max_connections)
# Output:
# 2147483648 (this is 2**31)
</code></pre><p>Now let&#x2019;s modify our <code>main()</code> function. Suppose we receive 200 incoming requests, each requiring interaction with Redis:</p><pre><code class="language-python">async def main():
    client = Redis(host=&apos;localhost&apos;, port=6379, db=0)
    
    tasks = [ping_redis(client) for _ in range(200)]
    
    results = await asyncio.gather(*tasks)
    print(len(client.connection_pool._available_connections))
</code></pre><p>After execution, we&#x2019;ll observe that <code>_available_connections == 200</code>, meaning a new connection was created for each request.<br>By default, a single Redis server supports up to <strong>10,000 simultaneous connections</strong>, far fewer than the default connection pool size.</p><p>This could lead to a DoS (Denial of Service) scenario if no measures are taken.</p><p>To avoid exhausting all available server connections, we can use a connection pool that reuses existing connections and limits their total count to something more realistic.</p><div class="kg-card kg-callout-card kg-callout-card-white"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">For example, the default connection pool size is set to 8 in Java Redis clients and 10 in Go Redis libraries.</div></div><p>Here&#x2019;s how we can explicitly set <code>max_connections</code> when creating a Redis instance:</p><pre><code class="language-python">async def main():
    client = Redis(host=&apos;localhost&apos;, port=6379, db=0, max_connections=10)
    tasks = [ping_redis(client) for _ in range(200)]
    results = await asyncio.gather(*tasks)
    print(len(client.connection_pool._available_connections))
</code></pre><p>Running this results in:</p><pre><code class="language-python">redis.exceptions.ConnectionError: Too many connections
</code></pre><p>By default, <code>ConnectionPool</code> raises an exception when no connections are available. This isn&#x2019;t ideal &#x2014; I&#x2019;d prefer coroutines to wait for connections to become available.</p><p>For this behavior, we can use <code>BlockingConnectionPool</code>:</p><pre><code class="language-python">from redis.asyncio import Redis, BlockingConnectionPool

async def main():
    pool = BlockingConnectionPool(host=&apos;localhost&apos;, port=6379, db=0, max_connections=10)
    client = Redis(connection_pool=pool)
    tasks = [ping_redis(client) for _ in range(200)]
    results = await asyncio.gather(*tasks)
    print(len(client.connection_pool._available_connections))
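
    # also worth knowing: BlockingConnectionPool accepts a `timeout` argument,
    # the number of seconds a caller waits for a free connection (default 20);
    # timeout=None waits indefinitely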
</code></pre><p>This runs successfully, with <code>_available_connections == 10</code>.<br>Success! &#x1F389;</p><h2 id="surprise-2-connections-leak-closing-resources"><strong>Surprise</strong> #2: Connection Leaks &amp; Closing Resources</h2><p>Here&#x2019;s another surprising issue: after running the application for a long time, you might encounter an error saying it&#x2019;s impossible to connect to Redis.</p><p>This happens because the connection pool isn&#x2019;t closed when the Redis client is closed.</p><pre><code class="language-python">pool = ConnectionPool()
redis = Redis(connection_pool=pool)

await redis.aclose()
# The pool remains open, and connections might leak unless explicitly closed.
</code></pre><p>The idea behind this design is to allow reusing the same connection pool across different parts of an application.</p><p>The confusing part for me was that if you create a Redis instance with both <code>connection_pool</code> and <code>auto_close_connection_pool=True</code>, the latter is ignored:</p><pre><code class="language-python">pool = ConnectionPool()
redis = Redis(connection_pool=pool, auto_close_connection_pool=True)
# auto_close_connection_pool is ignored because you provided the pool manually.
# The library assumes you&#x2019;re responsible for closing the pool.
</code></pre><p>This behavior might seem unintuitive &#x2014; and you&#x2019;re right!<br>Thankfully, in recent versions of <code>redis-py</code>, the <code>auto_close_connection_pool</code> parameter has been marked as <strong>deprecated</strong>.</p><h3 id="the-right-way-to-close-resources">The Right Way to Close Resources</h3><p>You must manually close the connection pool:</p><pre><code class="language-python">pool = ConnectionPool()
redis = Redis(connection_pool=pool)
...
await redis.aclose()
await pool.close()
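
# a safe pattern is to tie both calls together, e.g.:
#   try:
#       ...  # work with redis
#   finally:
#       await redis.aclose()
#       await pool.close()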
</code></pre><p>Alternatively, use the new <code>.from_pool()</code> method:</p><pre><code class="language-python">pool = ConnectionPool()
redis = Redis.from_pool(pool)
...
await redis.aclose()
# The pool is now also closed.
</code></pre><p>Personally, I prefer to call close on the pool manually, just to be sure.</p><p>If you&apos;re using Sentinel, there are <code>ConnectionPool</code> instances under the hood by default, so make sure to experiment with how to open and close pools and connections.</p>]]></content:encoded></item><item><title><![CDATA[Fastapi, async SQLAlchemy, pytest, and Alembic (all using asyncpg)]]></title><description><![CDATA[We'll explore the integration of FastAPI with the new asynchronous SQLAlchemy 2.0, Alembic for migrations, and pytest for testing.
All of that with only asyncpg as our driver]]></description><link>https://thedmitry.pw/blog/2023/08/fastapi-async-sqlalchemy-pytest-and-alembic/</link><guid isPermaLink="false">64d4d13365f2c10001b8264d</guid><category><![CDATA[python_tips]]></category><category><![CDATA[pytest]]></category><category><![CDATA[fastapi]]></category><category><![CDATA[sqlalchemy]]></category><category><![CDATA[alembic]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Sun, 13 Aug 2023 08:15:25 GMT</pubDate><content:encoded><![CDATA[<h2 id="introduction">Introduction</h2><p>In this article, we&apos;ll explore the integration of FastAPI with the new asynchronous SQLAlchemy 2.0. Additionally, we&apos;ll delve into configuring pytest to execute asynchronous tests, allowing compatibility with pytest-xdist. We&apos;ll also cover the application of Alembic for db migrations with an asynchronous database driver.</p><p>The inspiration for this sparked as I delved into the insightful book &quot;<a href="https://www.cosmicpython.com/book/preface.html?ref=thedmitry.pw" rel="noreferrer">Architecture Patterns with Python</a>&quot; authored by Harry Percival &amp; Bob Gregory.</p><p>I also encountered the captivating concept of the &quot;Stairway test,&quot; which is eloquently detailed in the repository by <a href="https://github.com/alvassin/alembic-quickstart/tree/master?ref=thedmitry.pw">https://github.com/alvassin/alembic-quickstart/tree/master</a>. 
This concept profoundly resonated with me and led me to formulate the ideas presented in this post.</p><h3 id="reqirements">Requirements</h3><p>I run this project using Python 3.9; you can probably adapt it to earlier versions easily.</p><p>I use poetry to manage project requirements.</p><p>Source code can be found <a href="https://github.com/wwarne/fastapi_sqlalchemy_v2_alembic?ref=thedmitry.pw" rel="noreferrer">here</a>.</p><h2 id="install-dependencies">Install dependencies</h2><pre><code class="language-bash">$ poetry add fastapi uvicorn uvloop asyncpg alembic pydantic-settings
$ poetry add sqlalchemy --extras asyncio</code></pre><h3 id="install-dev-dependencies">Install dev dependencies</h3><pre><code class="language-bash">$ poetry add --group=dev httpx sqlalchemy-utils pytest yarl mypy black isort</code></pre><h1 id="setting-up-the-database">Setting up the database</h1><p>We&apos;re going to use FastAPI to create a straightforward API designed for user creation and retrieval from a database. Our primary objective is to illustrate the synergy between SQLAlchemy 2.0 and FastAPI, thus the API intricacies won&apos;t be our focal point in this context.</p><p>Let&apos;s start by creating a configuration file that will hold our database connection string. In my preference, I opt for leveraging <code>pydantic-settings</code> in scenarios of this nature. However, feel free to utilize any other method such as <code>os.getenv</code> if that aligns better with your workflow.</p><p>For the sake of clarity, I have encapsulated the entire database URI within a single parameter. It&apos;s important to note that in a real-world scenario, such configuration settings would likely be segregated into discrete entities like db_host, db_port, db_user, db_password, and more.</p><pre><code class="language-python"># app/settings.py
from pathlib import Path

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    app_name: str = &quot;Example API&quot;
    app_host: str = &quot;0.0.0.0&quot;
    app_port: int = 3000

    database_url: str = &quot;postgresql+asyncpg://blog_example_user:password@localhost:5432/blog_example_base&quot;

    project_root: Path = Path(__file__).parent.parent.resolve()

    model_config = SettingsConfigDict(env_file=&quot;.env&quot;)


settings = Settings()
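
# pydantic-settings also reads matching environment variables (matching is
# case-insensitive by default), so exporting DATABASE_URL=... or setting it
# in the .env file overrides the default above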
</code></pre><p>I&apos;m not going to cover how to start a local PostgreSQL database in this post, but you can, for example, use <a href="https://hub.docker.com/_/postgres?ref=thedmitry.pw" rel="noreferrer">the official docker image</a> to start a local database.</p><p>To be able to run the tests, your database user needs the <code>CREATEDB</code> privilege.</p><pre><code class="language-sql">-- Example SQL commands to create a new user with a new database:

CREATE USER &quot;blog_example_user&quot; WITH PASSWORD &apos;password&apos;;
CREATE DATABASE &quot;blog_example_base&quot; OWNER &quot;blog_example_user&quot;;
ALTER USER &quot;blog_example_user&quot; CREATEDB;</code></pre><h3 id="lets-create-our-orm-model">Let&apos;s create our ORM model</h3><p>I prefer to put all ORM-related stuff into an <code>orm</code> module and expose it through its <code>__init__.py</code>.</p><p>So in code I use it like <code>import orm</code>, then <code>query = select(orm.User)...</code></p><p>This way it&apos;s much easier to distinguish between SQLAlchemy models and my business models.</p><p>So, first, let&apos;s add the base class:</p><pre><code class="language-python"># orm/base_model.py
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase

# Default naming convention for all indexes and constraints
# See why this is important and how it would save your time:
# https://alembic.sqlalchemy.org/en/latest/naming.html
convention = {
    &quot;all_column_names&quot;: lambda constraint, table: &quot;_&quot;.join(
        [column.name for column in constraint.columns.values()]
    ),
    &quot;ix&quot;: &quot;ix__%(table_name)s__%(all_column_names)s&quot;,
    &quot;uq&quot;: &quot;uq__%(table_name)s__%(all_column_names)s&quot;,
    &quot;ck&quot;: &quot;ck__%(table_name)s__%(constraint_name)s&quot;,
    &quot;fk&quot;: &quot;fk__%(table_name)s__%(all_column_names)s__%(referred_table_name)s&quot;,
    &quot;pk&quot;: &quot;pk__%(table_name)s&quot;,
}
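
# With this convention, for example, an index on user_account.name gets the
# deterministic name ix__user_account__name instead of a backend default.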


class OrmBase(DeclarativeBase):
    metadata = MetaData(naming_convention=convention)  # type: ignore


</code></pre><p>Then, let&apos;s create a session manager for our database. This class will be used as a singleton and will be responsible for abstracting the database connection and session handling:</p><pre><code class="language-python"># orm/session_manager.py
import contextlib
from typing import AsyncIterator, Optional

from sqlalchemy.ext.asyncio import (
    AsyncConnection,
    AsyncEngine,
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)


class DatabaseSessionManager:
    def __init__(self) -&gt; None:
        self._engine: Optional[AsyncEngine] = None
        self._sessionmaker: Optional[async_sessionmaker[AsyncSession]] = None

    def init(self, db_url: str) -&gt; None:
        # Just additional example of customization.
        # you can add parameters to init and so on
        if &quot;postgresql&quot; in db_url:
            # These settings are needed to work with pgbouncer in transaction mode
            # because you can&apos;t use prepared statements in such case
            connect_args = {
                &quot;statement_cache_size&quot;: 0,
                &quot;prepared_statement_cache_size&quot;: 0,
            }
        else:
            connect_args = {}
        self._engine = create_async_engine(
            url=db_url,
            pool_pre_ping=True,
            connect_args=connect_args,
        )
        self._sessionmaker = async_sessionmaker(
            bind=self._engine,
            expire_on_commit=False,
        )

    async def close(self) -&gt; None:
        if self._engine is None:
            return
        await self._engine.dispose()
        self._engine = None
        self._sessionmaker = None

    @contextlib.asynccontextmanager
    async def session(self) -&gt; AsyncIterator[AsyncSession]:
        if self._sessionmaker is None:
            raise IOError(&quot;DatabaseSessionManager is not initialized&quot;)
        async with self._sessionmaker() as session:
            try:
                yield session
            except Exception:
                await session.rollback()
                raise

    @contextlib.asynccontextmanager
    async def connect(self) -&gt; AsyncIterator[AsyncConnection]:
        if self._engine is None:
            raise IOError(&quot;DatabaseSessionManager is not initialized&quot;)
        async with self._engine.begin() as connection:
            try:
                yield connection
            except Exception:
                await connection.rollback()
                raise


db_manager = DatabaseSessionManager()</code></pre><p>Notice that we&apos;re using the async version of the <code>create_engine</code> method, which returns an <code>AsyncEngine</code> object. We will also use the async version of the <code>sessionmaker</code> method, which returns an <code>AsyncSession</code> object for committing and rolling back transactions.</p><p>We are going to use <code>init</code> and <code>close</code> methods in FastAPI&apos;s <a href="https://fastapi.tiangolo.com/advanced/events/?h=life&amp;ref=thedmitry.pw#lifespan" rel="noreferrer">lifespan event</a>, to run them during startup and shutdown of our application.</p><p>The benefits of this approach are:</p><ul><li>You can connect to as many databases as needed, which was a problem for me with the middleware approach. (just create a different <code>DatabaseSessionManager</code> for each database)</li><li>Your DB connections are released at application shutdown instead of garbage collection, which means you won&apos;t run into issues if you use <code>uvicorn --reload</code></li><li>Your DB sessions will automatically be closed when the route using the <code>session</code> dependency finishes, so any uncommitted operations will be rolled back.</li></ul><p>Then, we need to create a FastAPI dependency that will be used to get the database session. This dependency will be used in the API views:</p><pre><code class="language-python"># orm/session_manager.py
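# The init/close pair above is meant to be driven from the FastAPI lifespan
# event mentioned in the text; a minimal sketch (placed in your application
# entrypoint, assuming `settings` from app/settings.py):
#
#   from contextlib import asynccontextmanager
#   from fastapi import FastAPI
#
#   @asynccontextmanager
#   async def lifespan(app: FastAPI):
#       db_manager.init(settings.database_url)
#       yield
#       await db_manager.close()
#
#   app = FastAPI(lifespan=lifespan)
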
async def get_session() -&gt; AsyncIterator[AsyncSession]:
    async with db_manager.session() as session:
        yield session</code></pre><p>And we&apos;re done with the database configuration. Now we can create the database models (I just used the model from SQLAlchemy tutorial):</p><pre><code class="language-python"># orm/user_model.py

from typing import Optional

from sqlalchemy import String
from sqlalchemy.orm import Mapped, mapped_column

from .base_model import OrmBase


class User(OrmBase):
    __tablename__ = &quot;user_account&quot;

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    fullname: Mapped[Optional[str]]

    def __repr__(self) -&gt; str:
        return f&quot;User(id={self.id!r}, name={self.name!r}, fullname={self.fullname!r})&quot;
</code></pre><p>It&apos;s the most modern form of Declarative, which is driven from <a href="https://peps.python.org/pep-0484/?ref=thedmitry.pw"><strong>PEP 484</strong></a> type annotations using a special type <a href="https://docs.sqlalchemy.org/en/20/orm/internals.html?ref=thedmitry.pw#sqlalchemy.orm.Mapped"><code>Mapped</code></a>, which indicates attributes to be mapped as particular types. </p><figure class="kg-card kg-bookmark-card kg-card-hascaption"><a class="kg-bookmark-container" href="https://docs.sqlalchemy.org/en/20/tutorial/metadata.html?ref=thedmitry.pw#declaring-mapped-classes"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Working with Database Metadata &#x2014; SQLAlchemy 2.0 Documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.sqlalchemy.org/favicon.ico" alt><span class="kg-bookmark-author">The Database Toolkit for Python</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.sqlalchemy.org/img/sqla_logo.png" alt></div></a><figcaption><p><span style="white-space: pre-wrap;">Read more about Declaring Mapped Classes</span></p></figcaption></figure><pre><code class="language-python"># orm/__init__.py
&quot;&quot;&quot;
Data structures, used in project.

Add your new models here so Alembic could pick them up.

You may do changes in tables, then execute
`alembic revision --message=&quot;Your text&quot; --autogenerate`
and alembic would generate new migration for you
in alembic/versions folder.
&quot;&quot;&quot;
from .base_model import OrmBase
from .session_manager import db_manager, get_session
from .user_model import User

__all__ = [&quot;OrmBase&quot;, &quot;get_session&quot;, &quot;db_manager&quot;, &quot;User&quot;]
</code></pre><p>I use a dunder init file in the orm module to be able to write <code>import orm</code> and then refer to objects like <code>orm.db_manager</code>, <code>orm.User</code>, etc. This approach substantially simplifies the distinction between your SQLAlchemy models and your business-oriented models.</p><h3 id="creating-the-api-views">Creating the API views</h3><p>Now that we have the database configuration and models set up, we can create the API views.</p><p>For simplicity, I&apos;m going to put all API-related models and functions in one file so you can check it easily. In real life you should probably consider splitting them up for organizational clarity.</p><p>Let&apos;s start by creating models for validating the incoming API request and providing a response:</p><pre><code class="language-python"># api/user.py
from pydantic import BaseModel, ConfigDict, Field

class UserCreateRequest(BaseModel):
    name: str = Field(max_length=30)
    fullname: str


class UserResponse(BaseModel):
    id: int
    name: str
    fullname: str

    model_config = ConfigDict(from_attributes=True)</code></pre><p>When you need to return a list of users, many people opt for a simplistic approach such as responding with something like <code>list[User]</code>. After some time they need to add additional information to such an endpoint, and it can&apos;t be done easily.</p><p>So it&apos;s better to use a more flexible response structure from the beginning, like:</p><pre><code class="language-python"># api/user.py
from typing import Literal

class APIUserResponse(BaseModel):
    status: Literal[&apos;ok&apos;] = &apos;ok&apos;
    data: UserResponse


class APIUserListResponse(BaseModel):
    status: Literal[&apos;ok&apos;] = &apos;ok&apos;
    data: list[UserResponse]</code></pre><p>And, finally, our API views. A few extra notes:</p><ul><li>Every field of our <code>UserResponse</code> model is already JSON-compatible (strings and ints). That&apos;s why we can use <code>.model_dump()</code>. If your model has types like <code>UUID</code>, <code>datetime</code>, or other classes, you can use <code>.model_dump(mode=&apos;json&apos;)</code> and Pydantic will automatically convert the output values to JSON-supported types.</li><li>I prefer to return a Response directly rather than use FastAPI&apos;s <code>response_model</code> conversion. For me it&apos;s more convenient, and it&apos;s actually faster. (Check <a href="https://github.com/falkben/fastapi_experiments/?ref=thedmitry.pw">https://github.com/falkben/fastapi_experiments/</a> -&gt; orjson_response.py)</li><li>For the sake of simplicity I run ORM queries right in the API views. In a bigger project it&apos;s better to create an additional service layer and keep all your ORM/SQL queries in one module. If such queries are spread throughout your code base, you are going to regret it later.</li></ul><pre><code class="language-python"># api/user.py
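</code></pre><p>A quick standalone illustration of the first note (this snippet is not part of the project files; the <code>Event</code> model here is hypothetical):</p><pre><code class="language-python">from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):
    name: str
    created_at: datetime


e = Event(name="demo", created_at=datetime(2023, 8, 1))
# a plain dump keeps Python objects, mode="json" converts them to JSON-safe types
print(e.model_dump())             # {'name': 'demo', 'created_at': datetime.datetime(2023, 8, 1, 0, 0)}
print(e.model_dump(mode="json"))  # {'name': 'demo', 'created_at': '2023-08-01T00:00:00'}</code></pre><pre><code class="language-python"># api/user.py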
import uuid
from typing import Literal

from fastapi import APIRouter, Depends, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy import select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.asyncio import AsyncSession

import orm


class UserCreateRequest(BaseModel):
    name: str = Field(max_length=30)
    fullname: str


class UserResponse(BaseModel):
    id: int
    name: str
    fullname: str

    model_config = ConfigDict(from_attributes=True)


class APIUserResponse(BaseModel):
    status: Literal[&quot;ok&quot;] = &quot;ok&quot;
    data: UserResponse


class APIUserListResponse(BaseModel):
    status: Literal[&quot;ok&quot;] = &quot;ok&quot;
    data: list[UserResponse]


router = APIRouter()


@router.get(&quot;/{user_id}/&quot;, response_model=APIUserResponse)
async def get_user(
    user_id: int, session: AsyncSession = Depends(orm.get_session)
) -&gt; JSONResponse:
    user = await session.get(orm.User, user_id)
    if not user:
        return JSONResponse(
            content={&quot;status&quot;: &quot;error&quot;, &quot;message&quot;: &quot;User not found&quot;},
            status_code=status.HTTP_404_NOT_FOUND,
        )
    response_model = UserResponse.model_validate(user)
    return JSONResponse(
        content={
            &quot;status&quot;: &quot;ok&quot;,
            &quot;data&quot;: response_model.model_dump(),
        }
    )


@router.get(&quot;/&quot;, response_model=APIUserListResponse)
async def get_users(session: AsyncSession = Depends(orm.get_session)) -&gt; JSONResponse:
    users_results = await session.scalars(select(orm.User))
    response_data = [
        UserResponse.model_validate(u).model_dump() for u in users_results.all()
    ]
    return JSONResponse(
        content={
            &quot;status&quot;: &quot;ok&quot;,
            &quot;data&quot;: response_data,
        }
    )


@router.post(&quot;/&quot;, response_model=APIUserResponse, status_code=status.HTTP_201_CREATED)
async def create_user(
    user_data: UserCreateRequest, session: AsyncSession = Depends(orm.get_session)
) -&gt; JSONResponse:
    user_candidate = orm.User(**user_data.model_dump())
    session.add(user_candidate)
    # Error handling (e.g. catching IntegrityError) is skipped for brevity
    await session.commit()
    await session.refresh(user_candidate)
    response_model = UserResponse.model_validate(user_candidate)
    return JSONResponse(
        content={
            &quot;status&quot;: &quot;ok&quot;,
            &quot;data&quot;: response_model.model_dump(),
        },
        status_code=status.HTTP_201_CREATED,
    )
</code></pre><p>Here we have a simple FastAPI router with three API views: <code>get_user</code>, <code>get_users</code> and <code>create_user</code>. Notice that we use <code>Depends</code> to inject the async database session into the API views.</p><h1 id="setting-up-fastapi">Setting up FastAPI</h1><p>Now that we have the API views set up, we can create the FastAPI application.</p><pre><code class="language-python"># main.py
import contextlib
from typing import AsyncIterator

import uvicorn
from fastapi import FastAPI

import orm
from api import user
from app.settings import settings


@contextlib.asynccontextmanager
async def lifespan(app: FastAPI) -&gt; AsyncIterator[None]:
    orm.db_manager.init(settings.database_url)
    yield
    await orm.db_manager.close()


app = FastAPI(title=&quot;Very simple example&quot;, lifespan=lifespan)
app.include_router(user.router, prefix=&quot;/api/users&quot;, tags=[&quot;users&quot;])

if __name__ == &quot;__main__&quot;:
    # There are a lot of parameters for uvicorn, you should check the docs
    uvicorn.run(
        app,
        host=settings.app_host,
        port=settings.app_port,
    )
</code></pre><p>In order to run our application, we first need to create our database tables. Let&apos;s see how we can do that using Alembic.</p><h3 id="migrations-with-alembic">Migrations with Alembic</h3><p>To get started with Alembic, we use the <code>alembic init</code> command to create the Alembic configuration. We&apos;ll use the <code>async</code> template for this:</p><pre><code class="language-bash">alembic init -t async alembic</code></pre><p>This will create the <code>alembic</code> directory with the Alembic configuration. We&apos;ll need to make a few changes to it.</p><h3 id="alembicini"><strong>alembic.ini</strong></h3><p>Uncomment the <code>file_template</code> line so the names of the migrations are more user-friendly and include dates, which lets us sort them.</p><p>Remove the <code>sqlalchemy.url</code> line, because we are going to set this parameter via <code>alembic/env.py</code>.</p><h3 id="alembicenvpy">alembic/env.py</h3><p>First, we need to import our database models so that they&apos;re added to the <code>Base.metadata</code> object. This happens automatically when a model inherits from <code>OrmBase</code>, but we still have to import the models before the Alembic configuration is loaded. Because we put all models into <code>orm/__init__.py</code>, a single <code>import orm</code> loads them.</p><p>Then, we need to set the <code>sqlalchemy.url</code> configuration option to our database connection string.</p><p>Important note: we are also going to generate an Alembic configuration for tests, so we need to be careful not to overwrite <code>sqlalchemy.url</code> if it&apos;s already set.</p><p>And finally, we&apos;ll point <code>target_metadata</code> to our <code>Base.metadata</code> object.</p><p>Below are the changes we need to make to the <code>alembic/env.py</code> file:</p><pre><code class="language-python"># alembic/env.py

import orm
from app.settings import settings
current_url = config.get_main_option(&quot;sqlalchemy.url&quot;)
if not current_url:
    config.set_main_option(&quot;sqlalchemy.url&quot;, settings.database_url)

target_metadata = orm.OrmBase.metadata</code></pre><p>Now we can run the <code>alembic revision</code> command to create a new revision:</p><pre><code class="language-bash">alembic revision --autogenerate -m &quot;Add user model&quot;</code></pre><p>This will create a new revision file in the <code>alembic/versions</code> directory. We can then run the <code>alembic upgrade head</code> command to apply the migration to the database:</p><pre><code class="language-bash">alembic upgrade head</code></pre><p>To revert the last migration, you can use:</p><pre><code class="language-bash">alembic downgrade -1</code></pre><figure class="kg-card kg-bookmark-card kg-card-hascaption"><a class="kg-bookmark-container" href="https://alembic.sqlalchemy.org/en/latest/tutorial.html?ref=thedmitry.pw"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Tutorial &#x2014; Alembic 1.11.2 documentation</div><div class="kg-bookmark-description"></div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://sqlalchemy.org/favicon.ico" alt></div></div></a><figcaption><p><span style="white-space: pre-wrap;">More about alembic commands</span></p></figcaption></figure><h1 id="starting-the-server">Starting the server</h1><p>To start the server, run <code>python main.py</code>. This will start the server on port 8000 by default. The docs will be available at <code>http://localhost:8000/docs</code>. You should be able to see and run any of the API views that we&apos;ve created.</p><p>This should be enough to start using FastAPI with SQLAlchemy 2.0. However, one important component of software development is testing, so let&apos;s see how we can test our API views.</p><h1 id="testing-the-api-views">Testing the API views</h1><p>For this section, my focus is primarily on demonstrating the mechanics of integration testing with FastAPI and SQLAlchemy 2.0. This means that our tests will call the API views and check the responses. While we won&apos;t be testing the database models, a similar setup can be applied for such scenarios as well.</p><p>We&apos;ll start with the helper functions we are going to need.</p><p>The <code>sqlalchemy_utils</code> package has two very useful functions: <code>create_database</code> and <code>drop_database</code>.</p><p>Regrettably, these functions are synchronous and incompatible with the <code>asyncpg</code> driver. This typically leads tutorials to recommend installing <code>psycopg2</code> and using a separate synchronous engine for database creation. However, in the spirit of experimentation, we can just slightly modify these functions so they use <code>create_async_engine</code>.</p><p>You can see them in the <a href="https://github.com/wwarne/fastapi_sqlalchemy_v2_alembic?ref=thedmitry.pw" rel="noreferrer">GitHub repo</a>.</p><p>The next utilities we need are:</p><pre><code class="language-python"># tests/db_utils.py
import contextlib
import os
import uuid
from argparse import Namespace
from pathlib import Path
from typing import AsyncIterator, Optional, Union

import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy_utils.functions.database import (
    _set_url_database,
    _sqlite_file_exists,
    make_url,
)
from sqlalchemy_utils.functions.orm import quote
from yarl import URL

from alembic.config import Config as AlembicConfig
from app.settings import settings


def make_alembic_config(
    cmd_opts: Namespace, base_path: Union[str, Path] = settings.project_root
) -&gt; AlembicConfig:
    # Replace path to alembic.ini file to absolute
    base_path = Path(base_path)
    if not Path(cmd_opts.config).is_absolute():
        cmd_opts.config = str(base_path.joinpath(cmd_opts.config).absolute())
    config = AlembicConfig(
        file_=cmd_opts.config,
        ini_section=cmd_opts.name,
        cmd_opts=cmd_opts,
    )
    # Replace path to alembic folder to absolute
    alembic_location = config.get_main_option(&quot;script_location&quot;)
    if not Path(alembic_location).is_absolute():
        config.set_main_option(
            &quot;script_location&quot;, str(base_path.joinpath(alembic_location).absolute())
        )
    if cmd_opts.pg_url:
        config.set_main_option(&quot;sqlalchemy.url&quot;, cmd_opts.pg_url)
    return config


def alembic_config_from_url(pg_url: Optional[str] = None) -&gt; AlembicConfig:
    &quot;&quot;&quot;Provides python object, representing alembic.ini file.&quot;&quot;&quot;
    cmd_options = Namespace(
        config=&quot;alembic.ini&quot;,  # Config file name
        name=&quot;alembic&quot;,  # Name of section in .ini file to use for Alembic config
        pg_url=pg_url,  # DB URI
        raiseerr=True,  # Raise a full stack trace on error
        x=None,  # Additional arguments consumed by custom env.py scripts
    )
    return make_alembic_config(cmd_opts=cmd_options)


@contextlib.asynccontextmanager
async def tmp_database(db_url: URL, suffix: str = &quot;&quot;, **kwargs) -&gt; AsyncIterator[str]:
    &quot;&quot;&quot;Context manager for creating new database and deleting it on exit.&quot;&quot;&quot;
    tmp_db_name = &quot;.&quot;.join([uuid.uuid4().hex, &quot;tests-base&quot;, suffix])
    tmp_db_url = str(db_url.with_path(tmp_db_name))
    await create_database_async(tmp_db_url, **kwargs)
    try:
        yield tmp_db_url
    finally:
        await drop_database_async(tmp_db_url)

# The next functions are copied from `sqlalchemy_utils` and slightly
# modified to support async engines.
async def create_database_async(
    url: str, encoding: str = &quot;utf8&quot;, template: Optional[str] = None
) -&gt; None:
    url = make_url(url)
    database = url.database
    dialect_name = url.get_dialect().name
    dialect_driver = url.get_dialect().driver

    if dialect_name == &quot;postgresql&quot;:
        url = _set_url_database(url, database=&quot;postgres&quot;)
    elif dialect_name == &quot;mssql&quot;:
        url = _set_url_database(url, database=&quot;master&quot;)
    elif dialect_name == &quot;cockroachdb&quot;:
        url = _set_url_database(url, database=&quot;defaultdb&quot;)
    elif not dialect_name == &quot;sqlite&quot;:
        url = _set_url_database(url, database=None)

    if (dialect_name == &quot;mssql&quot; and dialect_driver in {&quot;pymssql&quot;, &quot;pyodbc&quot;}) or (
        dialect_name == &quot;postgresql&quot;
        and dialect_driver in {&quot;asyncpg&quot;, &quot;pg8000&quot;, &quot;psycopg2&quot;, &quot;psycopg2cffi&quot;}
    ):
        engine = create_async_engine(url, isolation_level=&quot;AUTOCOMMIT&quot;)
    else:
        engine = create_async_engine(url)

    if dialect_name == &quot;postgresql&quot;:
        if not template:
            template = &quot;template1&quot;

        async with engine.begin() as conn:
            text = &quot;CREATE DATABASE {} ENCODING &apos;{}&apos; TEMPLATE {}&quot;.format(
                quote(conn, database), encoding, quote(conn, template)
            )
            await conn.execute(sa.text(text))

    elif dialect_name == &quot;mysql&quot;:
        async with engine.begin() as conn:
            text = &quot;CREATE DATABASE {} CHARACTER SET = &apos;{}&apos;&quot;.format(
                quote(conn, database), encoding
            )
            await conn.execute(sa.text(text))

    elif dialect_name == &quot;sqlite&quot; and database != &quot;:memory:&quot;:
        if database:
            async with engine.begin() as conn:
                await conn.execute(sa.text(&quot;CREATE TABLE DB(id int)&quot;))
                await conn.execute(sa.text(&quot;DROP TABLE DB&quot;))

    else:
        async with engine.begin() as conn:
            text = f&quot;CREATE DATABASE {quote(conn, database)}&quot;
            await conn.execute(sa.text(text))

    await engine.dispose()


async def drop_database_async(url: str) -&gt; None:
    url = make_url(url)
    database = url.database
    dialect_name = url.get_dialect().name
    dialect_driver = url.get_dialect().driver

    if dialect_name == &quot;postgresql&quot;:
        url = _set_url_database(url, database=&quot;postgres&quot;)
    elif dialect_name == &quot;mssql&quot;:
        url = _set_url_database(url, database=&quot;master&quot;)
    elif dialect_name == &quot;cockroachdb&quot;:
        url = _set_url_database(url, database=&quot;defaultdb&quot;)
    elif not dialect_name == &quot;sqlite&quot;:
        url = _set_url_database(url, database=None)

    if dialect_name == &quot;mssql&quot; and dialect_driver in {&quot;pymssql&quot;, &quot;pyodbc&quot;}:
        engine = create_async_engine(url, connect_args={&quot;autocommit&quot;: True})
    elif dialect_name == &quot;postgresql&quot; and dialect_driver in {
        &quot;asyncpg&quot;,
        &quot;pg8000&quot;,
        &quot;psycopg2&quot;,
        &quot;psycopg2cffi&quot;,
    }:
        engine = create_async_engine(url, isolation_level=&quot;AUTOCOMMIT&quot;)
    else:
        engine = create_async_engine(url)

    if dialect_name == &quot;sqlite&quot; and database != &quot;:memory:&quot;:
        if database:
            os.remove(database)
    elif dialect_name == &quot;postgresql&quot;:
        async with engine.begin() as conn:
            # Disconnect all users from the database we are dropping.
            version = conn.dialect.server_version_info
            pid_column = &quot;pid&quot; if (version &gt;= (9, 2)) else &quot;procpid&quot;
            text = &quot;&quot;&quot;
            SELECT pg_terminate_backend(pg_stat_activity.{pid_column})
            FROM pg_stat_activity
            WHERE pg_stat_activity.datname = &apos;{database}&apos;
            AND {pid_column} &lt;&gt; pg_backend_pid();
            &quot;&quot;&quot;.format(
                pid_column=pid_column, database=database
            )
            await conn.execute(sa.text(text))

            # Drop the database.
            text = f&quot;DROP DATABASE {quote(conn, database)}&quot;
            await conn.execute(sa.text(text))
    else:
        async with engine.begin() as conn:
            text = f&quot;DROP DATABASE {quote(conn, database)}&quot;
            await conn.execute(sa.text(text))

    await engine.dispose()
</code></pre><p>Let&apos;s start by creating a <code>conftest.py</code> file in the root of our <code>tests/integration</code> directory. This file will be responsible for setting up the test database. Since this is an intricate setup, let&apos;s break it down into smaller pieces.  We&apos;ll start with the imports:</p><pre><code class="language-python"># tests/conftest.py
from typing import Optional

import pytest
from httpx import AsyncClient
from yarl import URL

import orm
from alembic.command import upgrade
from app.settings import settings
from orm.session_manager import db_manager
from tests.db_utils import alembic_config_from_url, tmp_database</code></pre><p>There isn&apos;t much going on here; we&apos;re just importing the necessary packages. Let&apos;s move on and create our <code>app</code> and <code>client</code> fixtures, used to create the FastAPI test application and the test client:</p><pre><code class="language-python"># tests/conftest.py
@pytest.fixture()
def app():
    from main import app

    yield app


@pytest.fixture()
async def client(session, app):
    async with AsyncClient(app=app, base_url=&quot;http://test&quot;) as client:
        yield client</code></pre><p>Because FastAPI is built on <code>anyio</code>, we can use <code>anyio</code> in our tests as well (many people use <code>pytest-asyncio</code> instead). To skip running the tests on the <code>trio</code> event loop, we need to create a new fixture. With it we can also just write <code>async def test_...</code> test functions without marking them additionally.</p><pre><code class="language-python"># tests/conftest.py
@pytest.fixture(scope=&quot;session&quot;, autouse=True)
def anyio_backend():
    return &quot;asyncio&quot;, {&quot;use_uvloop&quot;: True}</code></pre><p>And now we&apos;re ready to create the database connection and session. Our test connection is scoped to the session, so the same connection is reused for all the tests; it&apos;s best practice to avoid creating a new connection for each test, let alone for each request.</p><pre><code class="language-python"># tests/conftest.py

@pytest.fixture(scope=&quot;session&quot;)
def pg_url():
    &quot;&quot;&quot;Provides base PostgreSQL URL for creating temporary databases.&quot;&quot;&quot;
    return URL(settings.database_url)


@pytest.fixture(scope=&quot;session&quot;)
async def migrated_postgres_template(pg_url):
    &quot;&quot;&quot;
    Creates temporary database and applies migrations.

    Has &quot;session&quot; scope, so is called only once per tests run.
    &quot;&quot;&quot;
    async with tmp_database(pg_url, &quot;pytest&quot;) as tmp_url:
        alembic_config = alembic_config_from_url(tmp_url)
        # Sometimes we have so-called data migrations, which can call
        # various db-related functions, so we point the settings at the
        # temporary database as well
        settings.database_url = tmp_url

        # It is important to always close the connections at the end of such
        # migrations, or we will get errors like
        # `source database is being accessed by other users`

        upgrade(alembic_config, &quot;head&quot;)

        yield tmp_url


@pytest.fixture(scope=&quot;session&quot;)
async def sessionmanager_for_tests(migrated_postgres_template):
    db_manager.init(db_url=migrated_postgres_template)
    # can add another init (redis, etc...)
    yield db_manager
    await db_manager.close()


@pytest.fixture()
async def session(sessionmanager_for_tests):
    async with db_manager.session() as session:
        yield session

    # Clean tables after each test. I tried:
    # 1. Create new database using an empty `migrated_postgres_template` as template
    # (postgres could copy whole db structure)
    # 2. Do TRUNCATE after each test.
    # 3. Do DELETE after each test.
    # DELETE FROM is the fastest
    # https://www.lob.com/blog/truncate-vs-delete-efficiently-clearing-data-from-a-postgres-table
    # BUT DELETE FROM query does not reset any AUTO_INCREMENT counter
    async with db_manager.connect() as conn:
        for table in reversed(orm.OrmBase.metadata.sorted_tables):
            # Clean tables in such order that tables which depend on another go first
            await conn.execute(table.delete())
        await conn.commit()</code></pre><p><code>DELETE FROM</code> does not reset any AUTO_INCREMENT counter, so our <code>user.id</code> values keep growing during a single test run. Consider whether that matters for you; for me it&apos;s no problem, and I don&apos;t want to switch to TRUNCATE.</p><p>Now we can write our first, simple test:</p><pre><code class="language-python"># tests/test_orm_works.py
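</code></pre><p>A short aside first: if the ever-growing ids do bother you, a PostgreSQL-specific alternative for the cleanup step above is <code>TRUNCATE ... RESTART IDENTITY</code>, which also resets the sequences behind the id columns. A hedged sketch (the <code>truncate_all</code> helper is hypothetical, not part of the project):</p><pre><code class="language-python">from sqlalchemy import text


async def truncate_all(conn, metadata):
    # conn is an AsyncConnection, metadata is e.g. orm.OrmBase.metadata;
    # a single statement truncates every table and resets their sequences
    names = ", ".join(t.name for t in metadata.sorted_tables)
    await conn.execute(text(f"TRUNCATE {names} RESTART IDENTITY CASCADE"))</code></pre><pre><code class="language-python"># tests/test_orm_works.py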

from sqlalchemy import text

import orm


async def test_orm_session(session):
    user = orm.User(
        name=&quot;Michael&quot;,
        fullname=&quot;Michael Test Jr.&quot;,
    )
    session.add(user)
    await session.commit()

    rows = await session.execute(text(&apos;SELECT id, name, fullname FROM &quot;user_account&quot;&apos;))
    result = list(rows)[0]
    assert isinstance(result[0], int)
    assert result[1] == &quot;Michael&quot;
    assert result[2] == &quot;Michael Test Jr.&quot;
</code></pre><p>You can now run <code>pytest</code>, and it works.</p><p>But we&apos;re not done yet. We need to add the very useful <code>Stairway test</code>, and in doing so we will face new challenges with Alembic.</p><h3 id="stairway-test">Stairway test</h3><p>This is a simple and efficient method to check that a migration has no typos and rolls back all schema changes. It does not require maintenance: you can add this test to your project once and forget about it.</p><p>In particular, the test detects data types that were created by the <code>upgrade()</code> method and not removed by <code>downgrade()</code>: when creating a table/column, Alembic automatically creates any custom data types specified in the columns (e.g. enums), but does not delete them when the table or column is deleted; the developer has to do that manually.</p><h4 id="how-it-works">How it works</h4><p>The test retrieves the list of all migrations and, for each migration, executes the <code>upgrade</code>, <code>downgrade</code>, <code>upgrade</code> Alembic commands.</p><figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2023/08/stairway-1.gif" class="kg-image" alt loading="lazy" width="1560" height="541" srcset="https://thedmitry.pw/content/images/size/w600/2023/08/stairway-1.gif 600w, https://thedmitry.pw/content/images/size/w1000/2023/08/stairway-1.gif 1000w, https://thedmitry.pw/content/images/2023/08/stairway-1.gif 1560w" sizes="(min-width: 720px) 720px"></figure><p>Let&apos;s add a new test package <code>migrations</code> and create its fixtures:</p><pre><code class="language-python"># tests/migrations/conftest.py

import pytest
from sqlalchemy.ext.asyncio import create_async_engine

from tests.db_utils import alembic_config_from_url, tmp_database


@pytest.fixture()
async def postgres(pg_url):
    &quot;&quot;&quot;
    Creates empty temporary database.
    &quot;&quot;&quot;
    async with tmp_database(pg_url, &quot;pytest&quot;) as tmp_url:
        yield tmp_url


@pytest.fixture()
async def postgres_engine(postgres):
    &quot;&quot;&quot;
    SQLAlchemy engine, bound to temporary database.
    &quot;&quot;&quot;
    engine = create_async_engine(
        url=postgres,
        pool_pre_ping=True,
    )
    try:
        yield engine
    finally:
        await engine.dispose()


@pytest.fixture()
def alembic_config(postgres):
    &quot;&quot;&quot;
    Alembic configuration object, bound to temporary database.
    &quot;&quot;&quot;
    return alembic_config_from_url(postgres)</code></pre><p>And the test itself:</p><pre><code class="language-python"># tests/migrations/test_stairway.py

&quot;&quot;&quot;
This test can find forgotten downgrade methods, data types left undeleted by
downgrade methods, typos and many other errors.

It does not require any maintenance: you add it once and it checks 80% of the
typos and mistakes in your migrations forever.
&quot;&quot;&quot;
import pytest

from alembic.command import downgrade, upgrade
from alembic.config import Config
from alembic.script import Script, ScriptDirectory
from tests.db_utils import alembic_config_from_url


def get_revisions():
    # Create Alembic configuration object
    # (we don&apos;t need database for getting revisions list)
    config = alembic_config_from_url()

    # Get directory object with Alembic migrations
    revisions_dir = ScriptDirectory.from_config(config)

    # Get &amp; sort migrations, from first to last
    revisions = list(revisions_dir.walk_revisions(&quot;base&quot;, &quot;heads&quot;))
    revisions.reverse()
    return revisions


@pytest.mark.parametrize(&quot;revision&quot;, get_revisions())
def test_migrations_stairway(alembic_config: Config, revision: Script):
    upgrade(alembic_config, revision.revision)

    # We need -1 for downgrading first migration (its down_revision is None)
    downgrade(alembic_config, revision.down_revision or &quot;-1&quot;)
    upgrade(alembic_config, revision.revision)
</code></pre><p>Running pytest again, we will get an error:</p><p><code>E           RuntimeError: asyncio.run() cannot be called from a running event loop</code></p><p>That&apos;s because inside the <code>upgrade</code> command Alembic uses <code>asyncio.run</code> to run migrations via the <code>asyncpg</code> driver. That works just fine when we run migration commands from the command line, but during a test run an active asyncio event loop is already in place, and we can&apos;t use <code>asyncio.run</code>.</p><p>We definitely don&apos;t want to rewrite Alembic internals, but we need some way to run an async function from the sync <code>run_migrations_online</code> while an event loop is already running.</p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">There is a better way to fix this. The solution below is kept for historical purposes; the better solution is added further down (marked with the same callout)</div></div><p>I decided to do the following:</p><ul><li>Check for a running event loop; if there is none, we can proceed the standard Alembic way.</li><li>If there is an event loop, we can use <code>asyncio.create_task</code> to wrap our migration command.</li><li>The problem is that we then need some way to <code>await</code> this task inside our pytest fixture, while creating it during the Alembic <code>upgrade</code> command.</li><li>To solve this I decided to add a new variable to <code>conftest.py</code> and set it from Alembic. Yep, it&apos;s kind of a global variable, but I failed to find a more elegant solution.</li></ul><pre><code class="language-python"># tests/conftest.py
#... add to the end
from asyncio import Task

MIGRATION_TASK: Optional[Task] = None

@pytest.fixture(scope=&quot;session&quot;)
async def migrated_postgres_template(pg_url):
    &quot;&quot;&quot;
    Creates temporary database and applies migrations.

    Has &quot;session&quot; scope, so is called only once per tests run.
    &quot;&quot;&quot;
    async with tmp_database(pg_url, &quot;pytest&quot;) as tmp_url:
        alembic_config = alembic_config_from_url(tmp_url)
        upgrade(alembic_config, &quot;head&quot;)

        await MIGRATION_TASK # added line

        yield tmp_url</code></pre><pre><code class="language-python"># alembic/env.py

def run_migrations_online() -&gt; None:
    &quot;&quot;&quot;Run migrations in &apos;online&apos; mode.&quot;&quot;&quot;
    try:
        current_loop = asyncio.get_running_loop()
    except RuntimeError:
        # there is no loop, can use asyncio.run
        asyncio.run(run_async_migrations())
        return

    from tests import conftest
conftest.MIGRATION_TASK = asyncio.create_task(run_async_migrations())</code></pre><p>Everything should be fine now, right?! Right?!</p><p>Well, not quite. We&apos;ve got ourselves a new error:</p><pre><code class="language-python">E       AttributeError: &apos;NoneType&apos; object has no attribute &apos;configure&apos;

../alembic/env.py:61: AttributeError</code></pre><p>What is going on? After reading Alembic&apos;s sources, it becomes clear:</p><ol><li>When we run the <code>upgrade</code> command, Alembic loads some data into the <code>context</code> object using a special context manager:</li></ol><pre><code class="language-python">with EnvironmentContext(...):
    script.run_env()
</code></pre><ol start="2"><li>The <code>run_env</code> method loads our <code>alembic/env.py</code> and invokes <code>run_migrations_online()</code>.</li><li>We create an asyncio Task and return from <code>run_migrations_online</code>. That exits the context manager and clears the <code>context</code> object. So by the time the code inside the task actually runs, some parameters are already gone: <code>context</code> is None, which is why we got the error shown above.</li></ol><p>So we need some way to pass data into our async task. To do that I decided to use <code>contextvars</code>: if we set context variables before creating the asyncio Task, the task gets a copy of them and can use them.</p><p>Let&apos;s start with the import and the process of setting the context variable:</p><pre><code class="language-python"># alembic/env.py
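</code></pre><p>First, a standalone demo of the mechanism (unrelated to the project files): a Task snapshots the current context variables at creation time, so a value set before <code>create_task</code> is visible inside the task.</p><pre><code class="language-python">import asyncio
from contextvars import ContextVar

var = ContextVar("var", default="unset")


async def worker():
    # sees the value set by the parent before the task was created
    return var.get()


async def main():
    var.set("from parent")  # set BEFORE creating the task
    return await asyncio.create_task(worker())


print(asyncio.run(main()))  # from parent</code></pre><pre><code class="language-python"># alembic/env.py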
from contextvars import ContextVar

ctx_var: ContextVar[dict[str, Any]] = ContextVar(&quot;ctx_var&quot;)


def run_migrations_online() -&gt; None:
    &quot;&quot;&quot;Run migrations in &apos;online&apos; mode.&quot;&quot;&quot;

    try:
        current_loop = asyncio.get_running_loop()
    except RuntimeError:
        # there is no loop, can use asyncio.run
        asyncio.run(run_async_migrations())
        return
    from tests import conftest
    ctx_var.set({
        &quot;config&quot;: context.config,
        &quot;script&quot;: context.script,
        &quot;opts&quot;: context._proxy.context_opts,  # type: ignore
    })
    conftest.MIGRATION_TASK = asyncio.create_task(run_async_migrations())</code></pre><p>The next step is using this contextvar:</p><pre><code class="language-python"># alembic/env.py

def do_run_migrations(connection: Connection) -&gt; None:
    try:
        context.configure(connection=connection, target_metadata=target_metadata)

        with context.begin_transaction():
            context.run_migrations()
    except AttributeError:
        context_data = ctx_var.get()
        with EnvironmentContext(
                config=context_data[&quot;config&quot;],
                script=context_data[&quot;script&quot;],
                **context_data[&quot;opts&quot;],
        ):
            context.configure(connection=connection, target_metadata=target_metadata)
            with context.begin_transaction():
                context.run_migrations()</code></pre><p>That&apos;s it. Now you can run pytest and see that everything works.</p><div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Better solution for running Alembic&apos;s <code spellcheck="false" style="white-space: pre-wrap;">upgrade</code> command from tests (added in December 2025)</div></div><p>Async Alembic calls asyncio.run and sets some context using context managers.<br>So we just need to run it in a separate thread with its own event loop and wait for it to finish.</p><p>So just change <code>conftest.py</code>:</p><pre><code class="language-python">from concurrent.futures.thread import ThreadPoolExecutor

@pytest.fixture(scope=&quot;session&quot;)
async def migrated_postgres_template(pg_url):
    &quot;&quot;&quot;
    Creates temporary database and applies migrations.

    Has &quot;session&quot; scope, so is called only once per tests run.
    &quot;&quot;&quot;
    async with tmp_database(pg_url, &quot;pytest&quot;) as tmp_url:
        alembic_config = alembic_config_from_url(tmp_url)

        # Run Alembic's upgrade in a worker thread: the asyncio.run call inside
        # Alembic then creates its own event loop, separate from the test
        # session's running loop, and .result() blocks until it finishes.
        with ThreadPoolExecutor() as thread_pool:
            thread_pool.submit(upgrade, alembic_config, &apos;head&apos;).result()

        yield tmp_url</code></pre><p>And there is no need to change <code>env.py</code> at all!</p><div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">End of content added in December 2025</div></div><p></p><p>This setup allows you to use <code>pytest-xdist</code>. That package splits your test suite into chunks and runs each chunk in a separate process, which can speed up your tests if you have a lot of them. And because each process creates its own unique test database, it works without any problems.</p><p>Finally, a simple integration test to show that our API is working.</p><pre><code class="language-python"># tests/test_api.py

from fastapi import status


async def test_my_api(client, app):
    # Test to show that api is working
    response = await client.get(&quot;/api/users/&quot;)
    assert response.status_code == status.HTTP_200_OK
    assert response.json() == {&quot;status&quot;: &quot;ok&quot;, &quot;data&quot;: []}

    response = await client.post(
        &quot;/api/users/&quot;,
        json={&quot;email&quot;: &quot;test@example.com&quot;, &quot;full_name&quot;: &quot;Full Name Test&quot;},
    )
    assert response.status_code == status.HTTP_201_CREATED
    new_user_id = response.json()[&quot;data&quot;][&quot;id&quot;]

    response = await client.get(f&quot;/api/users/{new_user_id}/&quot;)
    assert response.status_code == status.HTTP_200_OK
    assert response.json() == {
        &quot;status&quot;: &quot;ok&quot;,
        &quot;data&quot;: {
            &quot;id&quot;: new_user_id,
            &quot;email&quot;: &quot;test@example.com&quot;,
            &quot;full_name&quot;: &quot;Full Name Test&quot;,
        },
    }

    response = await client.get(&quot;/api/users/&quot;)
    assert response.status_code == status.HTTP_200_OK
    assert len(response.json()[&quot;data&quot;]) == 1</code></pre><pre><code class="language-bash">$ pytest . -vv -x
===================================================================================== test session starts =====================================================================================
platform darwin -- Python 3.9.15, pytest-7.4.0, pluggy-1.2.0 -- /venv-8ZwWMPCX-py3.9/bin/python
cachedir: .pytest_cache
rootdir: /Users/something/blog_article_v2
plugins: anyio-3.7.1
collected 3 items                                                                                                                                                                             

tests/test_api.py::test_my_api PASSED                                                                                                                                                   [ 33%]
tests/test_orm_works.py::test_orm_session PASSED                                                                                                                                        [ 66%]
tests/migrations/test_stairway.py::test_migrations_stairway[revision0] PASSED                                                                                                           [100%]

====================================================================================== 3 passed in 1.48s ======================================================================================</code></pre><p>Hope this is useful to someone and that Google will index this article someday. <code><sup>^.^</sup></code> Have a nice day!</p>]]></content:encoded></item><item><title><![CDATA[Multistage docker build]]></title><description><![CDATA[Problems with C extensions and shared libraries. You can't just copy the virtual environment. I wrote a Python script to search for system dependencies.]]></description><link>https://thedmitry.pw/blog/2021/08/multistage-docker-python-venv/</link><guid isPermaLink="false">61254d8bd799f800014fe419</guid><category><![CDATA[python_tips]]></category><category><![CDATA[docker]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Tue, 24 Aug 2021 21:27:44 GMT</pubDate><content:encoded><![CDATA[<p>Imagine you have a Python application which uses the <code>psycopg2</code> package.</p><p>To install this package you need the <code>libpq-dev</code> system library as well as a C compiler. (Yes, you could install <code>psycopg2-binary</code> without any of this, but it doesn&apos;t really matter which library we pick as an example.)</p><p>Your Dockerfile might look similar to this. (I use a venv to make the multistage build easier later.)</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">FROM python:3.9.6-slim-buster
# I create venv outside the workdir
# so even if we mount local folder to docker
# it won&apos;t be affected.
RUN python3 -m venv /opt/.venv
# ensure that virtualenv will be active
ENV PATH=&quot;/opt/.venv/bin:$PATH&quot;

RUN apt-get update &amp;&amp; \
    apt-get upgrade -y &amp;&amp; \
    apt-get -y install --no-install-recommends libpq-dev build-essential &amp;&amp; \
    rm -rf /var/lib/apt/lists/*

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY main.py .
CMD [&quot;python&quot;, &quot;main.py&quot;]</code></pre><figcaption>Dockerfile</figcaption></figure><figure class="kg-card kg-code-card"><pre><code class="language-python">import psycopg2

# Connect to your postgres DB
conn = psycopg2.connect(dbname=&quot;test&quot;,
                        user=&quot;postgres&quot;,
                        password=&quot;secret&quot;,
                        host=&quot;db&quot;)

# Open a cursor to perform database operations
cur = conn.cursor()
cur.execute(&quot;SELECT now();&quot;)</code></pre><figcaption>main.py</figcaption></figure><figure class="kg-card kg-code-card"><pre><code class="language-txt">psycopg2==2.9.1</code></pre><figcaption>requirements.txt</figcaption></figure><p>If you run <code>docker build -t multistage .</code> the image will be around 347 MB.</p><p>Actually we don&apos;t need <code>build-essential</code> to run our app, but we have to keep it because of Docker&apos;s layered filesystem. Maybe a multistage approach will help? It definitely will! Let&apos;s take a look.</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">FROM python:3.9.6-slim-buster AS build-base
RUN python3 -m venv /opt/.venv
# ensure that virtualenv will be active
ENV PATH=&quot;/opt/.venv/bin:$PATH&quot;

RUN apt-get update &amp;&amp; \
    apt-get upgrade -y &amp;&amp; \
    apt-get -y install --no-install-recommends libpq-dev build-essential &amp;&amp; \
    apt-get clean &amp;&amp; \
    rm -rf /var/lib/apt/lists/*

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt

FROM python:3.9.6-slim-buster AS release
WORKDIR /code/
ENV PATH=&quot;/opt/.venv/bin:$PATH&quot;
# Copy only virtualenv with all packages
COPY --from=build-base /opt/.venv /opt/.venv
# Run the application:
COPY main.py .
CMD [&quot;python&quot;, &quot;main.py&quot;]</code></pre><figcaption>Dockerfile, simple multistage approach</figcaption></figure><p>The new image takes only 128 MB! The only problem: it doesn&apos;t work.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://thedmitry.pw/content/images/2021/08/image.png" class="kg-image" alt loading="lazy" width="800" height="118" srcset="https://thedmitry.pw/content/images/size/w600/2021/08/image.png 600w, https://thedmitry.pw/content/images/2021/08/image.png 800w" sizes="(min-width: 720px) 720px"><figcaption>our code won&apos;t even start</figcaption></figure><p>What is <code>libpq.so.5</code>? It&apos;s a shared library: a piece of compiled C code from the PostgreSQL driver which <code>psycopg2</code> uses under the hood (similar to .dll files on Windows). To get it we need to install the <code>libpq5</code> system package via <code>apt-get install</code>.</p><p>So how can one find such dependencies? The idea is simple.</p><!--kg-card-begin: markdown--><ul>
<li>Scan all files inside our virtualenv and find all <code>.so</code> and executable files (these are the only files that can link to other shared libraries).</li>
<li>Use the <code>ldd</code> command to find out which shared libraries such files use.</li>
<li>Use <code>dpkg -S</code> to learn the name of the system package that contains each needed shared library (dpkg is for Debian-based images; other distros have their own similar commands).</li>
</ul>
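<p>To make the <code>dpkg -S</code> step concrete, here is a minimal sketch (the sample output line is invented for illustration) of how one line of its output maps to a package name - the full script below does the same parsing:</p>
<pre><code class="language-python">def package_from_dpkg_line(line):
    # dpkg -S prints lines like 'libpq5:amd64: /usr/lib/x86_64-linux-gnu/libpq.so.5';
    # the package name is everything before the first colon.
    if '/' in line and ':' in line:
        return line.split(':')[0]
    return None

print(package_from_dpkg_line('libpq5:amd64: /usr/lib/x86_64-linux-gnu/libpq.so.5'))
# prints: libpq5
</code></pre>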
<!--kg-card-end: markdown--><p>I created a Python script that does all of this.</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import stat
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Generator

EXECUTABLE_PERMISSIONS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

def is_executable(filepath: Path) -&gt; bool:
    return bool(filepath.stat().st_mode &amp; EXECUTABLE_PERMISSIONS)

def find_all_executable_or_so_libs(venv_dir: Path) -&gt; List[Path]:
    executable_files = set()
    for f in venv_dir.rglob(&apos;*&apos;):
        if f.is_dir():
            continue
        if f.name.endswith(&apos;.so&apos;) or is_executable(f):
            executable_files.add(f)
    return sorted(list(executable_files))

def extract_lib_paths(dynamic_str: bytes) -&gt; Optional[str]:
    &quot;&quot;&quot;

    &gt;&gt;&gt; extract_lib_paths(b&quot;linux-vdso.so.1 (0x00007ffee695e000)&quot;)

    &gt;&gt;&gt; extract_lib_paths(b&quot;libpthread.so.0 =&gt; /usr/lib/libpthread.so.0 (0x00007f1475154000)&quot;)
    &apos;/usr/lib/libpthread.so.0&apos;
    &gt;&gt;&gt; extract_lib_paths(b&quot;libpthread.so.0 =&gt; libpthread.so.0 (0x00007f1475154000)&quot;)

    &quot;&quot;&quot;
    if b&apos;=&gt;&apos; not in dynamic_str:
        return
    decoded_path = dynamic_str.decode(encoding=&apos;utf-8&apos;).strip()
    dyn_lib_path = decoded_path.split()[2].strip()
    if &apos;/&apos; not in dyn_lib_path:
        return
    return dyn_lib_path


def find_linked_libs(filepaths: List[Path]) -&gt; List[str]:
    result = set()
    for interesting_file in filepaths:
        p = subprocess.Popen([&apos;ldd&apos;, interesting_file.absolute()], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        for ln in p.stdout:
            lib_path = extract_lib_paths(ln)
            if lib_path:
                result.add(lib_path)
    return sorted(list(result))


def who_owns_debian(lib_path: str) -&gt; Generator[str, None, None]:
    p = subprocess.Popen([&apos;dpkg&apos;, &apos;-S&apos;, lib_path], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    lines = p.stdout.readlines()
    for line in lines:
        line = line.decode(&apos;utf-8&apos;)
        if &apos;/&apos; in line and &apos;:&apos; in line:
            package_name = line.split(&apos;:&apos;)[0]
            yield package_name


def collect_package_names_debian(shared_libs_paths: List[str]):
    total_names = set()
    for one_lib_path in shared_libs_paths:
        for pkg_name in who_owns_debian(one_lib_path):
            if pkg_name:
                total_names.add(pkg_name)
    return sorted(list(total_names))

def main(source_dir):
    source_dir = Path(source_dir)
    interesting_files = find_all_executable_or_so_libs(source_dir)
    shared_libs = find_linked_libs(interesting_files)
    all_names = collect_package_names_debian(shared_libs)
    for name in all_names:
        print(name)


if __name__ == &apos;__main__&apos;:
    main(sys.argv[1])
</code></pre><figcaption>find_deps.py</figcaption></figure><p>For other distros you only need to change the <code>who_owns</code> function to use the appropriate command and parse its output.</p><p><code>python find_deps.py /opt/.venv</code> - <strong>be sure not to remove any build dependencies before running it, because that will affect the result.</strong> </p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">COPY requirements.txt .
RUN pip install -r requirements.txt
COPY find_deps.py find_deps.py
RUN python find_deps.py $VIRTUAL_ENV</code></pre><figcaption>new steps in build-base</figcaption></figure><figure class="kg-card kg-image-card"><img src="https://thedmitry.pw/content/images/2021/08/image-3.png" class="kg-image" alt loading="lazy" width="800" height="369" srcset="https://thedmitry.pw/content/images/size/w600/2021/08/image-3.png 600w, https://thedmitry.pw/content/images/2021/08/image-3.png 800w" sizes="(min-width: 720px) 720px"></figure><p>You can save the output to a file:</p><p><code>RUN python find_deps.py $VIRTUAL_ENV &gt; sys_deps.txt</code></p><p>And during the next stage install them automatically like so:</p><pre><code>COPY --from=build-base /sys_deps.txt /sys_deps.txt
RUN cat /sys_deps.txt | xargs apt-get install -y</code></pre><p>But I prefer to list them by hand. The final multistage Dockerfile is below.</p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">FROM python:3.9.6-slim-buster AS build-base
ENV VIRTUAL_ENV=/opt/.venv
RUN python3 -m venv $VIRTUAL_ENV
# ensure that virtualenv will be active
ENV PATH=&quot;$VIRTUAL_ENV/bin:$PATH&quot;

RUN apt-get update &amp;&amp; \
    apt-get upgrade -y &amp;&amp; \
    apt-get -y install --no-install-recommends libpq-dev build-essential &amp;&amp; \
    rm -rf /var/lib/apt/lists/*

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY find_deps.py find_deps.py
RUN python find_deps.py $VIRTUAL_ENV

FROM python:3.9.6-slim-buster AS release
WORKDIR /code/
ENV PATH=&quot;/opt/.venv/bin:$PATH&quot;
# install system dependencies
RUN apt-get update &amp;&amp; \
    apt-get upgrade -y &amp;&amp; \
    apt-get -y install --no-install-recommends libpq5 &amp;&amp; \
    rm -rf /var/lib/apt/lists/*

COPY --from=build-base /opt/.venv /opt/.venv
# Run the application:
COPY main.py .
CMD [&quot;python&quot;, &quot;main.py&quot;]</code></pre><figcaption>Dockerfile final</figcaption></figure><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th>Step</th>
<th>Image size</th>
<th>Working app?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full build</td>
<td>347 MB</td>
<td>YES</td>
</tr>
<tr>
<td>Multistage (only venv)</td>
<td>128 MB</td>
<td>NO</td>
</tr>
<tr>
<td>Multistage (venv + system deps)</td>
<td>138 MB</td>
<td>YES</td>
</tr>
</tbody>
</table>
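<p>A rough way to see why the single-stage image stays large (numbers taken from the table above; the split into layers is approximate): Docker&apos;s layered filesystem means a later <code>apt-get remove</code> can only mask files, never shrink earlier layers.</p>
<pre><code class="language-python"># Approximate layer arithmetic, in MB (numbers derived from the table above)
base_plus_venv = 128      # slim Python base + venv with psycopg2
build_tools = 219         # libpq-dev + build-essential layer
runtime_libs = 10         # libpq5 in the release stage

single_stage = base_plus_venv + build_tools   # removing tools later won't reclaim this
multistage = base_plus_venv + runtime_libs

print(single_stage, multistage)
# prints: 347 138
</code></pre>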
<!--kg-card-end: markdown--><p>I think it is worth the trouble. </p>]]></content:encoded></item><item><title><![CDATA[openpyxl can't open xlsx file]]></title><description><![CDATA[Exception: 
Value must be one of {'greaterThanOrEqual', 'notEqual', 'greaterThan', 'lessThan', 'equal', 'lessThanOrEqual'}]]></description><link>https://thedmitry.pw/blog/2021/04/openpyxl/</link><guid isPermaLink="false">607e775e12cea900015513e0</guid><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Tue, 20 Apr 2021 07:28:20 GMT</pubDate><content:encoded><![CDATA[<p>Yesterday I got a very cryptic error.</p><pre><code class="language-python">&gt;&gt;&gt; from openpyxl import load_workbook
&gt;&gt;&gt; wb = load_workbook(&apos;super-important.xlsx&apos;)

  File &quot;.../python3.9/site-packages/openpyxl/descriptors/base.py&quot;, line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {&apos;lessThanOrEqual&apos;, &apos;lessThan&apos;, &apos;equal&apos;, &apos;greaterThan&apos;, &apos;notEqual&apos;, &apos;greaterThanOrEqual&apos;}</code></pre><p>After I placed a breakpoint inside <code>openpyxl/descriptors/base.py</code> and ran the import under a debugger, I realized it was complaining about an element of type <code>openpyxl.worksheet.filters.CustomFilter</code> which had value = &apos;**none**&apos;</p><p>The <a href="http://www.datypic.com/sc/ooxml/e-ssml_customFilter-1.html?ref=thedmitry.pw">OOXML specification</a> says that a <code>customFilter</code> criterion has a Filter Comparison Operator, which should be one of</p><!--kg-card-begin: html--><table class="en"><tbody><tr><th>Valid value</th><th>Description</th></tr><tr><td>equal</td><td>Equal</td></tr><tr><td>lessThan</td><td>Less Than</td></tr><tr><td>lessThanOrEqual</td><td>Less Than Or Equal</td></tr><tr><td>notEqual</td><td>Not Equal</td></tr><tr><td>greaterThanOrEqual</td><td>Greater Than Or Equal</td></tr><tr><td>greaterThan</td><td>Greater Than</td></tr></tbody></table><!--kg-card-end: html--><p>It seems that my file just does not comply with the OOXML specification.</p><p>So the value <code>**none**</code> might be supported internally by Excel, but it isn&apos;t in the specification. 
Supporting the specification is the only sane way to develop the library, so I can&apos;t blame openpyxl in any way.</p><figure class="kg-card kg-code-card"><pre><code class="language-xml">&lt;filterColumn colId=&quot;2&quot;&gt;&lt;customFilters and=&quot;true&quot;&gt;&lt;customFilter operator=&quot;**none**&quot; val=&quot;&quot;/&gt;&lt;/customFilters&gt;&lt;/filterColumn&gt;</code></pre><figcaption>Here is what I found inside my xlsx file in sheet1.xml</figcaption></figure><p>I managed to work around this by performing <code>Format -&gt; Clear direct formatting (Ctrl+M)</code> in LibreOffice.</p>]]></content:encoded></item><item><title><![CDATA[Creating abstract method]]></title><description><![CDATA[How to create an abstract method using NotImplementedError vs @abstractmethod]]></description><link>https://thedmitry.pw/blog/2021/04/creating-abstract-method/</link><guid isPermaLink="false">607dc65112cea9000155137f</guid><category><![CDATA[python_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Mon, 19 Apr 2021 18:56:00 GMT</pubDate><content:encoded><![CDATA[<p>A very popular way to declare an abstract method in a Python class is to raise the <strong>NotImplementedError</strong> exception:</p><pre><code class="language-python">class SomeWorker:
    def do_work(self):
        raise NotImplementedError</code></pre><p>This approach even has IDE support (I use PyCharm, and at least it supports it).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://thedmitry.pw/content/images/2021/04/------------1.png" class="kg-image" alt loading="lazy" width="599" height="244"><figcaption>PyCharm shows an error when I try to inherit without implementing all abstract methods</figcaption></figure><p>The only downside of this approach is that you get the error only when the method is called.</p><pre><code class="language-python">&gt;&gt;&gt; w = MyWorker()
# it&apos;s ok
&gt;&gt;&gt; w.do_work()

NotImplementedError</code></pre><p>It would be much better to know about the problem right at class instantiation.</p><p>Python&apos;s abstract base classes are here to help:</p><pre><code class="language-python">from abc import ABCMeta, abstractmethod

class SomeWorker(metaclass=ABCMeta):
    @abstractmethod
    def do_work(self):
        pass</code></pre><p>Now, if you subclass SomeWorker and don&apos;t override <code>do_work</code>, you will get an error right upon class instantiation:</p><pre><code class="language-python">&gt;&gt;&gt; w = MyWorker()
TypeError: Can&apos;t instantiate abstract class MyWorker with abstract methods do_work</code></pre>]]></content:encoded></item><item><title><![CDATA[Ignoring Exceptions]]></title><description><![CDATA[Sometimes you want to ignore some exceptions.]]></description><link>https://thedmitry.pw/blog/2021/03/ignoring-exceptions/</link><guid isPermaLink="false">60435342624b640001f34fab</guid><category><![CDATA[python_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Sat, 06 Mar 2021 10:19:08 GMT</pubDate><content:encoded><![CDATA[<p>Sometimes you don&apos;t care if some operation fails.</p><p>To ignore some exception, you usually do something like this:</p><pre><code class="language-python">some_list = [0, 1, 2, 3, 4]

try:
    print(some_list[42])
except IndexError:
    pass</code></pre><p>That will work (without printing anything), but there is a more expressive and explicit way to do the same:</p><pre><code class="language-python">from contextlib import suppress

some_list = [0, 1, 2, 3, 4]
with suppress(IndexError):
    print(some_list[42])</code></pre>]]></content:encoded></item><item><title><![CDATA[Cyberpunk 2077 & gpuapidx12error.cpp]]></title><description><![CDATA[Every time I tried to run this game I got an error "Gpu Crash for unknown reasons". The reason - Windows LTSB ]]></description><link>https://thedmitry.pw/blog/2021/01/cyberpunk-2077-gpuapidx12error/</link><guid isPermaLink="false">5ff6fc3a624b640001f34f4f</guid><category><![CDATA[games]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Thu, 07 Jan 2021 14:01:51 GMT</pubDate><content:encoded><![CDATA[<p>I was frustrated. </p><p>I have an Intel i7-2600k &amp; a KFA2 GTX 1070 video card, but I couldn&apos;t play Cyberpunk. After hours of googling I had already tried a few different versions of video drivers and installed everything available from Windows Update. The results were the same.</p><p>I tried changing the game&apos;s language - no results. It just didn&apos;t run at all!</p><p><code>gpuapidx12error.cpp(40)</code></p><p>After spending more time on the internet I got the answer: I had the wrong Windows 10 version. I used <code>Windows 10 LTSB Version 1607</code>. I love the LTSB version - it&apos;s more configurable, and I was able to turn off some of the telemetry, etc. And I didn&apos;t have problems with it for many years.</p><p>After upgrading to <code>Windows 10 LTSC 1809</code> I&apos;m finally able to visit Night City. Yay!!</p><p>P.S. I just downloaded an <code>iso</code> with the LTSC version, unpacked it and ran setup.exe from my previous system. 
Then I chose the <code>update</code> option, and after 20-30 minutes and a few reboots my system was updated and all my programs were in place.</p>]]></content:encoded></item><item><title><![CDATA[Mock's return_value & side effect]]></title><description><![CDATA[How side_effect can change the behavior of your mocked method.]]></description><link>https://thedmitry.pw/blog/2020/12/mocks-side-effect/</link><guid isPermaLink="false">5fd1246b2dbff90001efcd75</guid><category><![CDATA[python_tips]]></category><category><![CDATA[mock]]></category><category><![CDATA[testing]]></category><category><![CDATA[pytest]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Wed, 09 Dec 2020 20:14:46 GMT</pubDate><content:encoded><![CDATA[<p>Mock is an object that simulates the behavior of another object. In Python we have a built-in mock module in the <code>unittest</code> library.</p><pre><code class="language-python">from unittest import mock
&gt;&gt;&gt; m = mock.Mock()
# You can try to read some non-existing attribute - it will just return mock.
&gt;&gt;&gt; m.some_attribute
&lt;Mock name=&apos;mock.some_attribute&apos; id=&apos;140126083485648&apos;&gt;

#  You can also try to call a method - it will return another mock too.
&gt;&gt;&gt; m.process_data()
&lt;Mock name=&apos;mock.process_data()&apos; id=&apos;140126083843264&apos;&gt;</code></pre><p>Mock has a <code>return_value</code> attribute to help you simulate specified behavior in tests. It makes the mock simply return a value.</p><pre><code class="language-python">&gt;&gt;&gt; m = mock.Mock()
&gt;&gt;&gt; m.process_data.return_value = &apos;The answer is 42&apos;
&gt;&gt;&gt; m.process_data()
&apos;The answer is 42&apos;
&gt;&gt;&gt; m.process_data()
&apos;The answer is 42&apos;</code></pre><p>You can assign anything: integers, strings, tuples, dicts, classes, class instances, etc.</p><h2 id="side-effect-a-multifunctional-tool">Side effect - a multifunctional tool</h2><p>Mock&apos;s <code>side_effect</code> parameter allows you to change the behavior of the mock. It accepts three types of values and changes its behavior accordingly.</p><h3 id="side_effect-exception">side_effect = Exception</h3><p>The mock will raise the passed exception:</p><pre><code class="language-python">&gt;&gt;&gt; m.simulate_fail.side_effect = ValueError(&apos;Whoops...&apos;)
&gt;&gt;&gt; m.simulate_fail()
Traceback (most recent call last):
[...]
File &quot;.../lib/python3.8/unittest/mock.py&quot;, line 1140, in _execute_mock_call
    raise effect
ValueError: Whoops...</code></pre><h3 id="side_effect-iterable">side_effect = Iterable</h3><p>The mock will yield the values from this iterable on subsequent calls:</p><pre><code class="language-python">&gt;&gt;&gt; m.my_attribute.side_effect = [5, 10, 42, ValueError(&apos;Something happened&apos;)]
&gt;&gt;&gt; m.my_attribute()
5
&gt;&gt;&gt; m.my_attribute()
10
&gt;&gt;&gt; m.my_attribute()
42
&gt;&gt;&gt; m.my_attribute()
Traceback (most recent call last):
[...]
File &quot;.../lib/python3.8/unittest/mock.py&quot;, line 1140, in _execute_mock_call
    raise effect
ValueError: Something happened</code></pre><h3 id="side_effect-callable">side_effect = callable</h3><p>The callable will be executed on each call with the parameters passed to the mocked method. Any callable will do, so it can be a function or a class.</p><pre><code class="language-python"># Class-based example
class Person:
    def __init__(self, name):
        self.name = name

&gt;&gt;&gt; m.my_attribute.side_effect = Person
&gt;&gt;&gt; friend = m.my_attribute(&apos;Max&apos;)
&gt;&gt;&gt; friend.name
&apos;Max&apos;
&gt;&gt;&gt; repr(friend)
&apos;&lt;__main__.Person object at 0x7f71940ad2e0&gt;&apos;

# Function-based example
def log_calls(*args, **kwargs):
    print(f&apos;Called with {args} and {kwargs}&apos;)

&gt;&gt;&gt; m.another_attribute.side_effect = log_calls
&gt;&gt;&gt; m.another_attribute()
Called with () and {}
&gt;&gt;&gt; m.another_attribute(42, name=&apos;Peter&apos;, mood=&apos;Good&apos;)
Called with (42,) and {&apos;name&apos;: &apos;Peter&apos;, &apos;mood&apos;: &apos;Good&apos;}
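# (Illustrative addition, not from the original post.) If a side_effect
# callable returns a value, the mock returns that value; return mock.DEFAULT
# from the callable to fall back to return_value instead.
&gt;&gt;&gt; m.combo.return_value = &apos;fallback&apos;
&gt;&gt;&gt; m.combo.side_effect = lambda *args, **kwargs: mock.DEFAULT
&gt;&gt;&gt; m.combo()
&apos;fallback&apos;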
</code></pre><h2 id="notes">Notes</h2><p><a href="https://github.com/pytest-dev/pytest-mock/?ref=thedmitry.pw">pytest-mock</a> is a thin wrapper around the mock package for easier use with pytest.</p>]]></content:encoded></item><item><title><![CDATA[Django + Dramatiq + APScheduler]]></title><description><![CDATA[My experiments with running background tasks on schedule.]]></description><link>https://thedmitry.pw/blog/2020/12/django-dramatiq-apscheduler/</link><guid isPermaLink="false">5fcf71af2dbff90001efcb61</guid><category><![CDATA[python_tips]]></category><category><![CDATA[django]]></category><category><![CDATA[dramatiq]]></category><category><![CDATA[apscheduler]]></category><category><![CDATA[background tasks]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Tue, 08 Dec 2020 15:08:14 GMT</pubDate><content:encoded><![CDATA[<p>Working on one of my Django projects, I had to run long-running computations in the background. For that I decided to use <code><a href="https://dramatiq.io/?ref=thedmitry.pw">dramatiq</a></code> - a very nice background task processing library. </p><p>After that I had a new task: we needed to run periodic processing tasks (import some data, calculate some statistics, etc.). So I needed some kind of scheduler to start those tasks on time.</p><blockquote><a href="https://apscheduler.readthedocs.io/?ref=thedmitry.pw">APScheduler</a> is the recommended scheduler to use with Dramatiq (dramatiq documentation)</blockquote><p>Here are some approaches I&apos;ve used and my discoveries.</p><h2 id="preparing-steps">Preparing steps</h2><ul><li>I installed the <code>dramatiq</code>, <code>django-dramatiq</code>, and <code>APScheduler</code> packages from pypi.</li><li>I created a new Django app via <code>python manage.py startapp task_scheduler</code></li><li>I added my app to <code>INSTALLED_APPS</code>. 
(NOTE - I use this form instead of just writing <code>task_scheduler</code> to be able to use the <code>AppConfig.ready()</code> hook. You can read about it in the <a href="https://docs.djangoproject.com/en/3.1/ref/applications/?ref=thedmitry.pw#module-django.apps">django documentation</a>.)</li></ul><pre><code class="language-python">INSTALLED_APPS = [
	...
    &apos;django_dramatiq&apos;,
    &apos;task_scheduler.apps.TaskSchedulerConfig&apos;,
]</code></pre><ul><li>I configured <a href="https://github.com/Bogdanp/django_dramatiq?ref=thedmitry.pw">django-dramatiq</a> and started working on periodic tasks.</li></ul><h2 id="task-example">Task example</h2><p>For this article I will be using a very simple task.</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import logging
import time

import dramatiq


@dramatiq.actor()
def process_user_stats():
    &quot;&quot;&quot;Very simple task for demonstrating purpose.&quot;&quot;&quot;
    logging.warning(&apos;Start my long-running task&apos;)
    time.sleep(5)
    logging.warning(&apos;Task ended&apos;)</code></pre><figcaption>tasks.py (django-dramatiq will auto-discover functions in this file)</figcaption></figure><figure class="kg-card kg-code-card"><pre><code class="language-python">import logging
import os

from .tasks import process_user_stats


def periodically_run_job():
    &quot;&quot;&quot;Run by APScheduler. It can prepare data and parameters and then enqueue the background task.&quot;&quot;&quot;
    logging.warning(&apos;It is time to start the dramatiq task&apos;)
    process_user_stats.send()
</code></pre><figcaption>periodic_tasks.py</figcaption></figure><figure class="kg-card kg-code-card"><pre><code>task_scheduler
&#x251C;&#x2500;&#x2500; __init__.py
&#x251C;&#x2500;&#x2500; admin.py
&#x251C;&#x2500;&#x2500; apps.py
&#x251C;&#x2500;&#x2500; migrations
&#x2502;&#xA0;&#xA0; &#x251C;&#x2500;&#x2500; __init__.py
&#x251C;&#x2500;&#x2500; models.py
&#x251C;&#x2500;&#x2500; periodic_tasks.py
&#x251C;&#x2500;&#x2500; tasks.py
&#x2514;&#x2500;&#x2500; views.py
</code></pre><figcaption>Directory structure of the <code>task_scheduler</code> django app.</figcaption></figure><p>If you need my final solution - <strong><a href="#my-final-solution">just click here</a></strong></p><h2 id="simple-and-naive-approach-not-a-good-idea-">Simple and naive approach (not a good idea)</h2><p>At first I decided to use the <code>BackgroundScheduler</code> class from apscheduler. This scheduler runs in the background using a separate thread, so it won&apos;t block the whole application. I updated <code>periodic_tasks.py</code> as follows:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import logging
import os

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger
from pytz import UTC

from .tasks import process_user_stats


def periodically_run_job():
    logging.warning(&apos;Starting dramatiq task&apos;)
    process_user_stats.send()


def start_scheduler():
    logging.warning(&apos;Starting background scheduler.&apos;)
    scheduler = BackgroundScheduler(timezone=UTC)
    every_minute = CronTrigger(minute=&apos;*&apos;, timezone=UTC)
    scheduler.add_job(periodically_run_job, every_minute)
    scheduler.start()
    </code></pre><figcaption>periodic_tasks.py</figcaption></figure><p>And then:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">from django.apps import AppConfig


class TaskSchedulerConfig(AppConfig):
    name = &apos;task_scheduler&apos;

    def ready(self):
        from .periodic_tasks import start_scheduler
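        # note: with `manage.py runserver` this ready() hook runs in BOTH the
        # autoreload parent process and the serving child process (see the
        # explanation below). A possible guard, not used in this post, is the
        # flag Django's autoreloader sets in the child:
        #     import os
        #     if os.environ.get('RUN_MAIN') != 'true':
        #         return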
        start_scheduler()
    </code></pre><figcaption>apps.py</figcaption></figure><p>But if you start your django dev server, you will see two <code>Starting background scheduler</code> lines, and background tasks will be executed twice. That&apos;s because <code>manage.py runserver</code> runs django in two separate processes (one for serving requests and another for auto-reload), and each process executes our <code>ready()</code> function.</p><p>In production I use <code>Gunicorn</code>, which forks the main process into additional worker processes, so I would end up with 5 copies (depending on settings) of my BackgroundScheduler and every task would be enqueued 5 times. Not good at all, and that is not the only problem.</p><h3 id="how-can-i-change-gunicorn-settings-to-run-only-one-backgroundscheduler-not-a-good-idea-too-">How can I change Gunicorn settings to run only one BackgroundScheduler? (not a good idea either)</h3><p>Well, you can start <code>gunicorn</code> with the <a href="https://docs.gunicorn.org/en/latest/settings.html?ref=thedmitry.pw#preload-app"><code>--preload</code> option</a>. This option means <code>Load application code before the worker processes are forked</code>. So our code will be executed in the main process, and only after that will gunicorn fork. Why does this help? Look at <code>start_scheduler()</code>:</p><pre><code class="language-python">def start_scheduler():
    # I create the scheduler
    scheduler = BackgroundScheduler(timezone=UTC)
    ...
    # I start the scheduler - it will create a new thread!
    scheduler.start()</code></pre><p>With <code>--preload</code>, the master gunicorn process loads the whole django project into memory, so it executes <code>start_scheduler</code> once and a new background thread is spun up, which is responsible for scheduling jobs. After that gunicorn calls the system&apos;s <strong>fork</strong> method. <strong>BUT forked processes do not inherit the threads of their parent</strong>, so no worker runs the BackgroundScheduler thread.</p><p>Are we good now? Well, kind of. I fixed running under <code>gunicorn</code> but completely forgot that I also need to run dramatiq worker processes to actually execute the background tasks. Each of these processes will load the whole project, run <code>start_scheduler</code>, and I will have a bunch of schedulers again.</p><h3 id="custom-dramatiq-middleware">Custom dramatiq middleware</h3><pre><code class="language-python">########################
# In periodic_tasks.py #
########################
# move the scheduler to a module-level global object
_SCHEDULER = BackgroundScheduler(timezone=UTC)


def start_scheduler():
    every_minute = CronTrigger(minute=&apos;*&apos;, timezone=UTC)
    _SCHEDULER.add_job(periodically_run_job, every_minute)
    _SCHEDULER.start()

#################################
# new file custom_middleware.py #
#################################
class AntiScheduleMiddleware(dramatiq.Middleware):
    def before_worker_boot(self, broker, worker):
        from task_scheduler.periodic_tasks import _SCHEDULER
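        # every dramatiq worker process imports the whole project (which starts
        # _SCHEDULER via ready()), so we shut it down before the worker boots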
        _SCHEDULER.shutdown()

######################
# in django settings #
######################
DRAMATIQ_BROKER = {
    &quot;BROKER&quot;: &quot;dramatiq.brokers.redis.RedisBroker&quot;,
    &quot;OPTIONS&quot;: {
        &quot;url&quot;: &quot;redis://localhost:6379&quot;,
    },
    &quot;MIDDLEWARE&quot;: [
        &quot;dramatiq.middleware.Prometheus&quot;,
        &quot;dramatiq.middleware.AgeLimit&quot;,
        &quot;dramatiq.middleware.TimeLimit&quot;,
        &quot;dramatiq.middleware.Callbacks&quot;,
        &quot;dramatiq.middleware.Retries&quot;,
        &quot;django_dramatiq.middleware.DbConnectionsMiddleware&quot;,
        &quot;django_dramatiq.middleware.AdminMiddleware&quot;,
        &quot;task_scheduler.custom_middleware.AntiScheduleMiddleware&quot;
    ]
}</code></pre><p>It works, but it looks fragile and too complicated. We must not forget about the custom <code>gunicorn</code> settings and the custom <code>dramatiq</code> middleware. I don&apos;t like this kind of code at all.</p><h3 id="my-final-solution">My final solution</h3><p>The easiest solution to understand and maintain, in my opinion, is to start a single scheduler in its own dedicated process. For this task we will use the blocking scheduler, so it will be the only thing running inside that process.</p><ul><li>remove <code>def ready(self):</code> from <code>TaskSchedulerConfig</code></li><li>remove <code>start_scheduler()</code> from <code>periodic_tasks.py</code></li><li>create a <code>run_scheduler</code> command</li></ul><figure class="kg-card kg-code-card"><pre><code class="language-python">from django.core.management.base import BaseCommand
from apscheduler.schedulers.blocking import BlockingScheduler
import pytz

from apscheduler.triggers.cron import CronTrigger

from task_scheduler.periodic_tasks import periodically_run_job


class Command(BaseCommand):
    help = &apos;Run blocking scheduler to create periodical tasks&apos;

    def handle(self, *args, **options):
        self.stdout.write(self.style.NOTICE(&apos;Preparing scheduler&apos;))
        scheduler = BlockingScheduler(timezone=pytz.UTC)
        every_day_at_05_05_utc = CronTrigger(hour=5, minute=5, timezone=pytz.UTC)
        scheduler.add_job(periodically_run_job, every_day_at_05_05_utc)
        # ... add other jobs here
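        # e.g., a second (hypothetical) job on an interval trigger:
        #     from apscheduler.triggers.interval import IntervalTrigger
        #     scheduler.add_job(cleanup_job, IntervalTrigger(hours=1))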
        self.stdout.write(self.style.NOTICE(&apos;Start scheduler&apos;))
        scheduler.start()
</code></pre><figcaption>File <code>task_scheduler/management/commands/run_scheduler.py</code></figcaption></figure><p>To run the scheduler we can use the command <code>python manage.py run_scheduler</code>. How and where to run it depends on your deployment strategy.</p>]]></content:encoded></item><item><title><![CDATA[Tilde `~` in python]]></title><description><![CDATA[Recently I looked at a piece of code and saw strange indexing - items[~index].]]></description><link>https://thedmitry.pw/blog/2020/11/tilde-in-python/</link><guid isPermaLink="false">5f9d1a5a6ce6d400016a7d7b</guid><category><![CDATA[python_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Sun, 01 Nov 2020 11:07:28 GMT</pubDate><content:encoded><![CDATA[<p>At first I was like <code>wtf is that</code>?</p><p>The usages of the <code>tilde</code> symbol I could remember were:</p><ul><li>creating a negated (<code><strong>NOT</strong></code>) query in the django ORM, like:</li></ul><figure class="kg-card kg-code-card"><pre><code class="language-python">from django.db.models import Q

# selects all poll objects with the publication year NOT EQUAL to 2005
Poll.objects.filter(~Q(pub_date__year=2005))</code></pre><figcaption>creating a negated (<code><strong>NOT</strong></code>) query in the django ORM</figcaption></figure><ul><li>inverting boolean masks in pandas</li></ul><figure class="kg-card kg-code-card"><pre><code class="language-python"># df = pd.DataFrame(...some data...)
# select all rows where the column `content_type` contains the text `Specifications`
df[df.content_type.str.contains(&apos;Specifications&apos;, case=False)]
# select all rows where the column `content_type` does NOT contain `Specifications`
df[~df.content_type.str.contains(&apos;Specifications&apos;, case=False)]</code></pre><figcaption>inverting boolean masks in the pandas library</figcaption></figure><p>In Python the <code>~</code> operator means <code>bitwise NOT</code>. It takes an integer and flips every bit: <code>0</code> to <code>1</code> and <code>1</code> to <code>0</code>. As <a href="https://en.wikipedia.org/wiki/Bitwise_operation?ref=thedmitry.pw#NOT">wikipedia</a> says, <strong>NOT x = -x &#x2212; 1</strong> (it&apos;s different for unsigned ints, but that&apos;s not our case). So <code>~0 = -0 - 1 = -1</code> and <code>~1 = -1 - 1 = -2</code>.</p><p>So when indexing a list, the author of <code>items[~index]</code> wanted to take an element from the right side, using a zero-based index from the right as well.</p><pre><code class="language-python">items = [&apos;a&apos;, &apos;b&apos;, &apos;c&apos;, &apos;d&apos;, &apos;e&apos;, &apos;f&apos;]
#         0    1    2    3    4    5  # indexes
#        -6   -5   -4   -3   -2   -1  # negative indexes
#        ~5   ~4   ~3   ~2   ~1   ~0  # tilde indexes
items[0] == items[-6] == &apos;a&apos;  # the first element
items[5] == items[-1] == &apos;f&apos;  # the last element
items[~0] == items[-1] == &apos;f&apos;
items[~1] == items[-2] == &apos;e&apos;</code></pre><p>It&apos;s a very strange way to do indexing. Please don&apos;t do that; just use negative indexes. They are much more common and easier to understand.</p><p>I have also learned that if you want to support the <code>~</code> operator for your own objects, you can implement the magic method <code>__invert__(self)</code>.</p>]]></content:encoded></item><item><title><![CDATA[Rsync vs SCP]]></title><description><![CDATA[A little distinctive feature]]></description><link>https://thedmitry.pw/blog/2020/10/rsync-vs-scp/</link><guid isPermaLink="false">5f86e6236ce6d400016a7c46</guid><category><![CDATA[linux_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Wed, 14 Oct 2020 12:03:03 GMT</pubDate><content:encoded><![CDATA[<p>I use the <code>rsync</code> and <code>scp</code> commands to transfer files from one machine to another.</p><p>Recently I discovered a distinctive feature and am writing it down here so I don&apos;t forget.</p><!--kg-card-begin: markdown--><p><code>rsync</code> is atomic. It copies into a temporary file and then renames this temp file into place.</p>
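<p>The same write-into-a-temp-file-then-rename pattern can be sketched in Python (a minimal illustration of the idea, not how rsync itself is implemented; the <code>atomic_write</code> helper is hypothetical):</p><pre><code class="language-python">import os
import tempfile


def atomic_write(path, data):
    # write into a temporary file in the destination directory...
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with os.fdopen(fd, 'w') as f:
        f.write(data)
    # ...then atomically rename it over the final name,
    # so a reader never sees a half-written file
    os.replace(tmp, path)</code></pre>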
<!--kg-card-end: markdown--><p><code>scp</code>, on the other hand, creates the destination file right away, so in case of network problems your software can be very unhappy to find a file that is only half-written.</p>]]></content:encoded></item><item><title><![CDATA[Future Plans]]></title><description><![CDATA[What I want to do here]]></description><link>https://thedmitry.pw/blog/2020/10/future-plans/</link><guid isPermaLink="false">5f8034916ce6d400016a7a18</guid><category><![CDATA[news]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Fri, 09 Oct 2020 10:23:19 GMT</pubDate><content:encoded><![CDATA[<p>So I decided to actually start this blog even though I hadn&apos;t prepared all the texts and posts. You can polish your text forever, so I&apos;d better start now and add new posts as I go.</p><p>Alongside writing tips and notes about python, one of my goals is to add comments. I&apos;d love to hear feedback on my posts.</p><p>This CMS has Disqus comments integration but I don&apos;t like this service. It&apos;s slow, it adds a lot of junk scripts to your page, and all your data is saved god knows where. It&apos;s just a piece of shit. I have found a much better option, but I need some free time to get it running and integrated here.</p><p>I also need to add a Privacy policy page because I will need it to activate social login etc. I respect your privacy, and that is why I don&apos;t use Google Analytics, Yandex Metrika, etc. But I&apos;m very curious about how many people will visit my site, so I installed <a href="https://matomo.org/?ref=thedmitry.pw">Matomo</a> - an open-source analytics platform. All the data is saved on my server and not transferred anywhere.
I need to write about that, and about the steps I took to respect my visitors&apos; privacy.</p>]]></content:encoded></item><item><title><![CDATA[OrderDict & dict]]></title><description><![CDATA[Do we need OrderedDict in Python 3.6+?]]></description><link>https://thedmitry.pw/blog/2020/10/orderdict_and_dict/</link><guid isPermaLink="false">5f7e279d77ac480001724b3a</guid><category><![CDATA[python_tips]]></category><dc:creator><![CDATA[Dmitry Plevkov]]></dc:creator><pubDate>Wed, 07 Oct 2020 21:44:35 GMT</pubDate><content:encoded><![CDATA[<p>If <code>dict</code> remembers the order of elements in Python 3.6+, why would you need <code>collections.OrderedDict</code> anymore?</p><p><strong>That&apos;s why:</strong></p><pre><code class="language-python">>>> from collections import OrderedDict
>>> OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
False
>>> dict(a=1, b=2) == dict(b=2, a=1)
True</code></pre><p><code>OrderedDict</code> equality is order-sensitive, while plain <code>dict</code> equality only compares keys and values.</p>]]></content:encoded></item></channel></rss>