First8 staat voor vakmanschap. Al onze collega’s zijn een groot aanhanger van Open Source en in het bijzonder het Java-platform. Wij zijn gespecialiseerd in het pragmatisch ontwikkelen van bedrijfskritische Java toepassingen waarbij integratie van systemen, hoge eisen aan beveiliging en veel transacties een belangrijke rol spelen. Op deze pagina vind je onze blogs.

MySQL query mystery

SQL queries can become quite complicated. Getting them to produce the right results is not always trivial. Especially if verifying if the result is correct is difficult. Last week I encountered a case where at least it was obvious the result was wrong, it simply didn’t give any results. But finding the cause wasn’t that trivial even though the query itself wasn’t that complicated. And it has a lot to with my all time favourite database, MySQL (note the sarcasm here).

The query was something like this:

Basically, it tries to do a full text search on ‘published posts’ and filter out any posts that are ‘providers’ or ‘methods’. While not terribly efficient, at first glance there are no obvious problems. Another weird thing was that there were no obvious changes made in this part of the code and the search used to work.

My strategy for finding the problem was simple. As the query didn’t give any results, the problem must be that one of the constraints is too strict. So I ran several variations of the query, each time dropping one of the constraints. Also, I run it directly against the database, skipping any JPA issues or other problems.

First reproducing the problem using the command line and a parameter that should give results was easy:

So, indeed one of the constraints should be the problem. Removing the ‘status’ constraint didn’t produce any results, removing the ‘match’ constrains also still didn’t produce any results. It has to be the third constraint. But that didn’t make any sense, looking at the following queries:

We are excluding a maximum of 385+186=571 id’s. And we have 1999 posts, so assuming there are no duplicate id’s, we should get at least 1999-571=1428 results.

But:

Huh? Ah!

Using IN or NOT IN  in a query should really only be used with a small number of constants, using a subquery as the value really is asking for trouble. There is a limit on the number of entries an IN clause can handle. In case of MySQL it is limited by some parameter called max_allowed_packet. If you go above the limit, MySQL simply doesn’t give any results anymore. A warning or error would have been helpful. But even an explicit show warnings doesn’t give any output.

It also explained why, without any changes in code, all of a sudden it stopped working: the number of ‘providers’ and ‘methods’ crossed a magic boundary and it stopped working.

Rewriting it to a proper form is easy. We could simply use a NOT EXISTS here to minimize impact:

Even better would be to use joins, but that involves a bit more rewriting:

So, some lessons learned: never use IN with a subquery, rewrite it to use a proper JOIN, or if you really must, an EXISTS clause. It might work at the time of testing, but it’ll break when your data grows.  And, be aware of the ‘friendly’ behaviour of MySQL.

Read more: