When a database has to hold terabytes of time series data, table management requires careful consideration. Fast load times must be guaranteed even with massive data volumes. We faced exactly this challenge: How do we keep queries fast when tables are constantly growing? In this post, we share our lessons learned and how we handle such data volumes using PostgreSQL with partitioning.
- Author: Johannes Hartmann
- Date: December 15, 2025
- Reading time: 9 minutes
In an ongoing project for NeoVac, we receive measurement data from a large number of devices: electricity meters, water meters, heat pumps, etc. — some at fifteen-minute intervals.
In numbers, we are currently talking about:
- Approx. 300,000 devices
- Approx. 5 million messages per day
- Approx. 1.8 billion messages per year
- Approx. 4 TB of data, and rising.
This data is evaluated across multiple projects for a broad user base: from commissioning technicians and building managers to tenants checking their consumption in an app.
The decisive factor here is the rapid availability of data. Nobody wants to sit in front of a loading screen for 10 seconds or more just to view a small chart.
With these data volumes, sub-second query times are no longer trivial. While access to a small table with a few tens of thousands of rows is often fast even without an index, a poorly written query on a table several TB in size can take hours.
Our Technical Foundation: PostgreSQL with AWS Aurora
In this project, we rely on PostgreSQL in the AWS Cloud. We use AWS Aurora to form a database cluster, as this service offers simple scaling and replication, and thus high availability for the database.
But hardware isn't everything. To keep the load on the database as low as possible, not only the queries but also the maintenance of large tables must be well thought out.
Creating a new index can lock a table for hours. Operations such as deleting or moving large amounts of data, e.g. for archiving purposes, can generate so much load that everything else running on the database suffers noticeably.
For database optimization, three techniques are fundamentally crucial for us:
- Indexing
- Aggregation
- Partitioning
Indexing
Good indexing should be a given for any table, but it is particularly important for tables of this size. For queries on these tables in particular, we analyze the Explain Plan, i.e. the plan describing how the database executes the query internally.
It is important to generate the Explain Plan against well-populated tables, because the database may choose different execution paths depending on table statistics such as the number of rows.
Ultimately, expensive operations must be avoided. An absolute no-go, for example, is a sequential scan (a full table scan), where the entire table is read.
By adjusting queries and indexes, the queries are optimized as far as possible so that they run quickly in production.
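As a minimal sketch of this kind of check, the plan of a typical device-and-time-range query can be inspected like so (the device id, time range, and index columns are placeholders, not our actual schema):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "UsagesQuarterHour"
WHERE "DeviceId" = 42                  -- example device id
  AND "Timestamp" >= '2025-11-01'
  AND "Timestamp" <  '2025-12-01';
-- What we want to see is an Index Scan on a ("DeviceId", "Timestamp") index;
-- a Seq Scan over the entire table is exactly the red flag described above.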
Aggregation
Especially with consumption data, it makes sense in our case to aggregate the data beforehand. A concrete example:
We display several charts where consumption is shown across multiple devices — e.g., all electricity meters in a building. The users see the monthly electricity consumption of a building for the last year.
Electricity meters often send fifteen-minute values. Even for a building with just 10 apartments, at least 10 meters have to be queried. Over a year, that is around 350,000 consumption values (10 meters × 96 quarter-hour values per day × 365 days) that need to be summed up, just so the user ends up seeing a line with 12 points.
That works, but it's slow.
The consumption data is therefore aggregated into daily and monthly values in advance. The same query that previously touched 350,000 consumption values now only reads 120 values (10 meters × 12 monthly values) and takes less than 10 milliseconds instead of potentially several seconds.
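A sketch of such a rollup, assuming a monthly aggregate table "UsagesMonth" and a "Value" column (both names are illustrative, not our actual schema):
INSERT INTO "UsagesMonth" ("DeviceId", "Month", "Value")
SELECT "DeviceId",
       date_trunc('month', "Timestamp") AS "Month",
       sum("Value")                     AS "Value"
FROM "UsagesQuarterHour"
WHERE "Timestamp" >= date_trunc('month', now() - interval '1 month')
  AND "Timestamp" <  date_trunc('month', now())
GROUP BY "DeviceId", date_trunc('month', "Timestamp");
-- The chart query then reads 12 rows per meter from "UsagesMonth" instead of
-- summing roughly 35,000 quarter-hour rows per meter on the fly.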
Partitioning
Many databases, PostgreSQL among them, support partitioning: the table is internally divided into several smaller tables. A partition key is defined, based on which the data is split, and the database automatically writes new data into the correct partition. Given our data and query patterns, we partition by the timestamp of the measurement data.
Data recorded at regular time intervals is also called a time series. There are dedicated »Time Series Databases« for such data, such as InfluxDB or the PostgreSQL extension TimescaleDB. We used the latter for a year. Because of the additional administrative effort, and because automatic partitioning was the only feature we actually used, we switched back to a plain PostgreSQL database and now use native partitioning there.
In this example, the table for fifteen-minute consumption values on our PostgreSQL server is divided into partitions of one week each.

The partitions are completely transparent to the users, i.e. the developers writing the SQL statements. The statements continue to target the main table; internally, new data is automatically written to the correct partition and reads are served from the correct partitions.
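A minimal sketch of what such a weekly partitioned table can look like (column names and types are illustrative, not our actual schema):
-- Parent table, range-partitioned by the measurement timestamp:
CREATE TABLE "UsagesQuarterHour" (
    "DeviceId"  bigint           NOT NULL,
    "Timestamp" timestamptz      NOT NULL,
    "Value"     double precision NOT NULL
) PARTITION BY RANGE ("Timestamp");
-- One child table per week; inserts land in the matching partition automatically:
CREATE TABLE "UsagesQuarterHour_2025_w48" PARTITION OF "UsagesQuarterHour"
    FOR VALUES FROM ('2025-11-24') TO ('2025-12-01');
CREATE TABLE "UsagesQuarterHour_2025_w49" PARTITION OF "UsagesQuarterHour"
    FOR VALUES FROM ('2025-12-01') TO ('2025-12-08');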
Constraint Exclusion
A simple principle lies behind this term: If a query restricts a time range, the database recognizes this and ignores all partitions that do not fall into this range.
If values from Nov 27, 2025 are requested, only the matching partition needs to be scanned; all data before and after it is completely irrelevant. This greatly reduces the number of index scans required.
But watch out ⚠️: if the query does not explicitly restrict the partition key, constraint exclusion does not kick in.
A simple example, the latest 20 measurement values of a device:
SELECT *
FROM "UsagesQuarterHour"
WHERE "DeviceId" = @deviceId
ORDER BY "Timestamp" DESC
LIMIT 20
The query planner cannot exclude the unnecessary partitions and therefore queries all 467 of them. Such queries can become significantly slower than on an unpartitioned table, which becomes noticeable with several hundred or thousand partitions.
Even if I, as a developer, know that only the current partition is needed — the database doesn't know.
A simple solution for this case: the query's time range is artificially limited to the last month. This way, constraint exclusion kicks in again.
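Applied to the query above, that might look like this (the one-month window is an assumption about how far back the latest values of an active device can lie):
SELECT *
FROM "UsagesQuarterHour"
WHERE "DeviceId" = @deviceId
  -- restricting the partition key lets the planner/executor prune partitions:
  AND "Timestamp" >= now() - interval '1 month'
ORDER BY "Timestamp" DESC
LIMIT 20
Because now() is only known at run time, this relies on PostgreSQL's run-time partition pruning (available since version 11); with a constant date literal, the pruning already happens at planning time.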
Data Retention
It is often the case that data is deleted or archived/aggregated elsewhere after a certain period.
With non-partitioned tables, this always means DELETE statements. Depending on the data volume, these can be quite slow, generate large amounts of WAL (transaction log), and additionally leave bloated, fragmented indexes behind.
With partitions, the whole process is simplified: Since each partition is its own table, it can simply be dropped or detached from the parent for further archiving. Regardless of the partition's size, removing a partition is a process of a few seconds.
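As a sketch, with an illustrative partition name:
-- Variant 1: remove an expired partition entirely; this takes seconds,
-- regardless of how large the partition is:
DROP TABLE "UsagesQuarterHour_2019_w05";
-- Variant 2: detach it instead, so it lives on as a standalone table
-- that can be archived or exported before being dropped:
ALTER TABLE "UsagesQuarterHour" DETACH PARTITION "UsagesQuarterHour_2019_w05";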
Index Maintenance
It happens rarely, but it happens: A new index is needed for the large tables, or an existing one needs to be adjusted.
Indexing large amounts of data blocks many database resources and can significantly impair the overall system.
Extremely annoying in case of an error: when a unique index is being built, the creation can fail after several hours of computation because of a single duplicate value from the year 2019. The whole procedure is then a waste of time.
With partitioned tables, index maintenance tasks can be carried out step-by-step and in a much more resource-friendly manner:
- Indices can be created separately for each partition.
- After each index, you can pause for a few seconds to let the system catch up on all queues and other tasks.
- If the creation of an index fails, only a small part needs to be investigated. All indices of the other partitions remain and do not need to be created again.
- Finally, the index is created on the main table so that it is also created for new partitions in the future. Since the index already exists on all partitions, this takes only a few seconds.
The ability to generate indices on such large tables in small steps is a huge advantage from an administration perspective.
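A sketch of this step-by-step approach (index and partition names are illustrative):
-- 1. Build the new index partition by partition, without blocking writes:
CREATE INDEX CONCURRENTLY "IX_UsagesQuarterHour_2025_w48_DeviceId_Timestamp"
    ON "UsagesQuarterHour_2025_w48" ("DeviceId", "Timestamp");
-- ...repeat for every partition, pausing briefly in between...
-- 2. Then create the index on the parent table. PostgreSQL attaches the
--    existing, matching partition indexes instead of rebuilding them, so this
--    takes only seconds, and future partitions get the index automatically:
CREATE INDEX "IX_UsagesQuarterHour_DeviceId_Timestamp"
    ON "UsagesQuarterHour" ("DeviceId", "Timestamp");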
Automatic Partitioning
By default, partitions must be created manually by the administrator. Of course, it makes little sense for anyone to have to add new partitions by hand every few days.
Databases built specifically for time series handle this themselves, but the process can also be automated in PostgreSQL with extensions:
- With pg_partman, the tables to be partitioned are defined. The extension takes care of creating the partitions and also offers functions to partition already existing tables.
- With pg_cron, cron jobs are created in PostgreSQL to call the maintenance functions of pg_partman, which, among other things, take care of creating new partitions.
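A sketch of this setup, assuming pg_partman 4.x (the call signature changed in version 5) and pg_cron are installed; lowercase, illustrative table and column names are used here to keep the example simple:
-- Register the table with pg_partman: one native range partition per week,
-- partitioned on the timestamp column, with a few partitions created ahead.
SELECT partman.create_parent(
    p_parent_table => 'public.usages_quarter_hour',
    p_control      => 'measured_at',
    p_type         => 'native',
    p_interval     => 'weekly',
    p_premake      => 4
);
-- Let pg_cron call pg_partman's maintenance every hour; this creates the
-- upcoming partitions and applies the configured retention, if any.
SELECT cron.schedule('0 * * * *', $$SELECT partman.run_maintenance()$$);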
When does partitioning make sense?
Partitioning comes at a cost: maintenance effort and know-how that has to be built up, including on the query side (keyword: constraint exclusion).
Tables with several million rows are no problem for a database like PostgreSQL; performance problems can often be eliminated with query and index optimizations.
Indices are generally an enormously powerful tool in databases to solve performance problems. Our then-young consumption data table showed absolutely no weakness in query performance with 250 million rows.
So when does it make sense? »It depends™«.
Commonly cited thresholds for when partitioning pays off start at around 50 GB of table size or 100 million rows. Partitions can be set up even if the row count isn't constantly growing.
These numbers should of course be taken with a grain of salt, but if a table will remain far below these values for the foreseeable future, elaborate partitioning usually makes no sense.
In our case, we thought about data storage and partitioning very early in the project, as it was clear from the start that several hundred thousand devices would be sending quarter-hour values.
Because we started early, lessons learned, such as an administratively very complex time series setup or a partition interval chosen far too small, were still easy to cope with at the relatively small data volumes involved (under 100 GB).
Long story short: If it is clear from the start that constantly growing, high data volumes must be handled, it is worth designing the architecture for this early on. But existing data can also be partitioned relatively easily with the available tools.
Do you face similar challenges with growing data volumes in the IoT environment?
We would be happy to support you in making your architecture scalable and performant. Just get in touch with us for a non-binding exchange.
