MySQL date_add and date_sub functions running against millions of rows

One of my servers runs a query once a week to remove all rows from a Syslog table (>20,000,000 rows) in a MySQL database that are older than 60 days. This was running terribly slowly and interfering with other tasks on the server.

Although the original query used a DELETE statement I’ve used SELECT statements in the examples below.

SELECT COUNT(*)
FROM SystemEvents
WHERE ReceivedAt < DATE_SUB(NOW(), INTERVAL 60 DAY);

That selects about 900,000 rows and takes about 45 seconds.

SELECT COUNT(*)
FROM SystemEvents
WHERE ReceivedAt < DATE_ADD(CURRENT_DATE, INTERVAL -60 DAY);

Likewise takes about 48 seconds.

Is MySQL running a function every time it makes a comparison? I decided to try using a hard-coded date to find out:

SELECT COUNT(*)
FROM SystemEvents
WHERE ReceivedAt < '2015-11-12 12:00:00';

6 seconds! Much faster.

I created a user-defined variable:
SET @sixty_days_ago = DATE_SUB(NOW(), INTERVAL 60 DAY);

Then ran the query:
SELECT COUNT(*)
FROM SystemEvents
WHERE ReceivedAt < @sixty_days_ago;

12 seconds. No 6 seconds, but still a fraction of the original time!

One thought on “MySQL date_add and date_sub functions running against millions of rows”

  1. Part of the problem is that the SELECT and DELETE statements are not using a key field. If you precede the SELECT statement with EXPLAIN you’ll see that it has to process all rows:

    id: 1
    select_type: SIMPLE
    table: SystemEvents
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 14869135
    Extra: Using where

    ID is the primary key (and the only key) in this table. Since the IDs are sequential, we can find the ID corresponding to the date we want and then delete all IDs less than that ID. EXPLAIN shows that the subquery still has to examine every row of the table, but actual DELETE statement would act on far fewer rows.

    DELETE FROM SystemEvents
    WHERE id < (
        SELECT MAX(id)
        FROM SystemEvents 
        WHERE ReceivedAt < @sixty_days_ago
    );

    That's the idea, although it turns out that will get you an error:
    ERROR 1093 (HY000): You can't specify target table 'SystemEvents' for update in FROM clause

    Instead you could set another variable and use that instead:

    SET @max_id_to_delete = (SELECT MAX(id) FROM SystemEvents WHERE ReceivedAt < @sixty_days_ago);
    DELETE FROM SystemEvents WHERE is <= @max_id_to_delete;

    An another way of doing this might be to use MySQL partitions (see Overview of Partitioning in MySQL).

Leave a Reply

Your email address will not be published. Required fields are marked *