It Matters Where Your Join Clauses Go

Posted by

Explicit Join vs. Implicit Join

When writing SQL there are two places you can define how tables are joined. The first is explicit, and is specified after the the keyword ON in the join clause like this:


SELECT * 
FROM table_a as a
    JOIN table_b as b ON a.id=b.id 

The same join can also be written this way with implicit join syntax:


SELECT * 
FROM table_a as a, table_b as b 
WHERE a.id=b.id 

Both of these work perfectly fine. Generally the implicit syntax is considered older while the explicit is considered newer. Personally, I vastly prefer the explicit syntax. The reason is simply code maintainablity. SQL often survives many application updates and is reused in many ways in many places. The where clause will often change over time, and be changed by different people at different times. More than once in my career, I've seen one of the changes accidentally remove the where clause, leading to significant performance problems and incorrect results.

What Happened

Last week, someone was troubleshooting a very real performance problem and came to me with information from a monitoring tool, Dynatrace. The information showed that within a narrow time window being examined, a query expected to run in 3 seconds or less instead ran in 50 seconds. Considering the issue being worked on had to do with a web request timing out, this could be relevant. The monitoring tool presented the query like this:

select distinct USERS.USERS_ID,USERREG.LOGONID 
from USERS,USERREG 
where USERS.REGISTERTYPE in (?, ?, ..., ?) 
and USERS.PROFILETYPE = ? 
and (LCASE(USERREG.LOGONID) like ? escape '\'

I ran an explain on this query and found it was indeed exceptionally expensive, and in fact 50 seconds seemed rather speedy given that it was clearly creating a Cartesian Product:

Upon further inspection, we found the following representation of execution and response times for this query:

The green arrow points to the problem time identified - when this query ran for 50 seconds. The purple line indicates median execution duration. All other execution times for this query were much more reasonable. This graph alone ruled out our issue, as we were trying to help a vendor identify why a patch had started causing a particular function to time out on the website, and this query was occurring both before and after the patch, and generally was returning in a reasonable amount of time. However, I was still worried as not only would an abundance of these at once cause a potential performance problem, but it also wouldn't return meaningful data since it would pair each logonid with each user_id, which is not meaningful.

After I reached out to our developers to discuss, someone realized that the analysis tool was TRUNCATING the query for some reason. After some research, it looks like the real query was:

SELECT DISTINCT USERS.USERS_ID, USERREG.LOGONID
FROM USERS, USERREG
WHERE USERS.REGISTERTYPE
IN (?, ?, ?) AND USERS.PROFILETYPE = ? AND (LCASE(USERREG.LOGONID) LIKE ? escape '\' ) AND (USERREG.USERS_ID = USERS.USERS_ID)
ORDER BY USERREG.LOGONID

Because the join condition was placed at the END of the WHERE clause, the part that got truncated was the where clause, leading the query to look much worse than it was. The actual explain plan looked like this:

While probably not the ideal, this is at least acceptable for a query that runs sporadically and less than once per minute.

Lessons Learned

First, there's a reason why explicit join syntax is preferred. I would not have missed the tool truncating the query if the join condition were explicit.
Second, always check the data from a reporting tool to verify it means what you think it means. I would have spent less time on this if I had understood we were talking about a single execution and if I can correctly realized that we were not seeing a real-world cart

Lead Database Administrator
Ember is always curious and thrives on change. Working in IT provides a lot of that change, but after 18 years developing a top-level expertise on Db2 for mid-range servers and more than 7 years blogging about it, Ember is hungry for new challenges and looks to expand her skill set to the Data Engineering role for Data Science. With in-depth SQL and RDBMS knowledge, Ember shares both posts about her core skill set and her journey into Data Science. Ember lives in Denver and work from home

2 comments

  1. Good points, Ember.

    Personally, I would be worried about any tool which truncates queries anywhere.

    I have always maintained that there are 2 types of predicates – structure and business. When joining employee table to employee history, explicit/implicit referential integrity should dictate that a.empid = b.empid is needed for equijoin. This is a structure predicate and should be in the ON clause. If looking for a business condition of salary > 100,000 this should be in the WHERE clause. Structure predicates should rarely need to be changed. Since earlier versions of SQL did not have the ON clause we are left with a potential mess.

    A related topic to this is Oracle’s Natural join (it will join on all common columns by name!!), which is an even bigger ticking time-bomb!

  2. Suresh Sane, indeed, I’m getting scared of tools that can and will truncate queries.
    But I also am a huge fan of the explicit JOIN conditions.
    You can see immediately on which columns you join two tables.
    And in the WHERE clause, you see all the extra conditions.
    I also agree that some conditions are for the where clause, and not to be used in the join conditions.
    However, I also often use the LEFT JOIN condition, where I also check (in the join condition) on e.g. END_DATE >= current_date. There, I keep that in the JOIN condition and not in the WHERE clause, because then there is the risk that you don’t have the whole row, and you might still want the data of the left table.

    Example:
    select t1.client_id, t1.name, t1.address, t2.order_nr, t2.order_data, …
    from clients t1
    left join orders t2 on t2.client_id = t1.client_id and t2.order_delivery_date > current_date
    where client_country = ‘BEL’;

    In this query, I also get the clients that have no selected orders.

    And if you – as a developer – test an application modification, and you suddenly get poor performance, you then check first your executed query. If I were a developer, and I would send my query to a DBA, and they see that the join conditions have ‘disappeared’, I would be very ashamed!!

    About the oracle natural join. Isn’t it also that the join is not only on the same column names, but also on the same formats? Some time ago I have experimented with it, but I don’t remember anymore.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.