Can SQL triggers cause an application integration project to end in disaster? Yes. Can SQL triggers also form part of a safe, performant integration? Also yes.
How can two such contradictory statements be resolved? Let's start by examining the pattern itself.
We will pretend you are the CIO of a company whose legacy ERP (Enterprise Resource Planning) system is mission critical. It could be Dynamics, Oracle or SAP - you choose, this is your story. A requirement hits your desk that says "When order records are created in our ERP system we need to notify the new WMS (Warehouse Management System) in real time".
Your company sells a lot. Orders are placed day and night, non-stop. Any downtime costs the company thousands of pounds of revenue per minute.
A conference call has been arranged with the onboarding team of the WMS vendor and one of the topics for discussion is integration. It is good news to hear that your new system has a REST API that adheres to all modern standards.
The dev team get to work and assess some options. Your ERP system has a SOAP API that is reliable but a little clunky and not blazing fast. One of your team suggests polling the ERP via the web service.
After some discussion about this the team decides that to achieve near real time notifications to the WMS the polling frequency would need to be every few seconds. Each poll invokes quite a heavy database query and there is some concern about the extra load on the ERP at peak times.
Another suggestion is to add a SQL trigger to the Orders table in the ERP database. The trigger fires as soon as an Order is inserted, which meets the near real time requirement, and the team remember the conference call where the REST API of the WMS was discussed.
The team write the SQL trigger and use it to send an HTTP request to the REST API of the WMS whenever a new order hits the ERP system. They take it for a spin in the test environment and everybody is very impressed: the WMS is receiving notifications in under one second.
Everybody is very pleased with this highly performant integration.
The trigger is deployed into production.
Five minutes later the phones are ringing, sirens are sounding and some very senior, very angry colleagues are on their way to find out why all systems have just gone offline. Point of sale terminals in physical stores have frozen, the website is down and the call centre cannot process any orders.
It turns out that nobody thought to apply a proper performance test to the trigger-based integration. Upon seeing how quickly the notification was made, the team assumed it was highly performant - and it was, under very light load. The production environment, however, carries heavy load, and the WMS cannot process incoming orders as fast as the ERP can deliver them.
Consequently the WMS REST API starts making its consumers wait, one of which is the SQL trigger. The more orders that get placed in the ERP system the bigger the problem becomes and the longer the wait time.
While the SQL trigger is waiting for the HTTP request to complete, it is blocking other processes from writing to the Orders table, ultimately causing the ERP system itself to crash, taking point of sale, ecommerce and the call centre with it.
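To make that failure mode concrete, here is a rough sketch of what such a trigger might look like on SQL Server. All object names are hypothetical, and `dbo.clr_HttpPost` stands in for whatever mechanism (a CLR procedure, for instance) is used to make an HTTP call from T-SQL:

```sql
-- ANTI-PATTERN: do not do this. Hypothetical sketch of the failing trigger;
-- dbo.clr_HttpPost stands in for a CLR procedure that performs the HTTP call.
CREATE TRIGGER trg_Orders_NotifyWms
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @body NVARCHAR(MAX) =
        (SELECT OrderId FROM inserted FOR JSON PATH);

    -- The INSERT's transaction stays open, and locks are held on dbo.Orders,
    -- until this call returns. A slow WMS therefore blocks every new order.
    EXEC dbo.clr_HttpPost
        @url  = N'https://wms.example.com/api/orders',
        @json = @body;
END;
```

Because the trigger runs inside the inserting transaction, every millisecond the WMS takes to respond is a millisecond of lock contention on the Orders table.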
Does that mean SQL triggers are evil?
It means they can be badly implemented, just like any other piece of software, but they can also be implemented in a safe and performant fashion.
Your ERP must not become dependent on the performance of your WMS. In more general terms, one application should not become dependent on the performance of any other application. The key is to decouple your systems.
The team go back to the drawing board. They take a look at possible solutions, they read a few books, attend a couple of conferences and realise there is a better approach they can take.
They establish two guiding principles:
- SQL triggers must be as lightweight as possible
- Applications must be loosely coupled
Keeping triggers lightweight means they must never make external network calls, and any connections they do open must have very short timeout durations. Further, a trigger must have no dependency on downstream processing; its job is to notify, then exit as quickly as possible.
Whether we are talking SQL Server, Oracle, PostgreSQL or something else, the implementation we are looking for has the trigger send a message to a queue. On SQL Server we have Service Broker, in Oracle we have Advanced Queuing, in PostgreSQL we have LISTEN/NOTIFY - the end result is the same:
Table INSERT → Trigger → Enqueue Message → Exit
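On SQL Server, for example, that pattern might look something like the sketch below. It assumes a Service Broker message type, contract, queue and service have already been set up, and all the names are hypothetical:

```sql
-- Minimal Service Broker sketch (hypothetical object names throughout).
-- The trigger's only job: enqueue a small payload locally, then exit.
CREATE TRIGGER trg_Orders_Notify
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @handle UNIQUEIDENTIFIER;
    DECLARE @payload NVARCHAR(MAX);

    -- Keep the payload small; heavy lifting happens downstream.
    SELECT @payload =
        (SELECT OrderId FROM inserted FOR XML PATH('Order'), ROOT('Orders'));

    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//erp/OrderNotifierService]
        TO SERVICE '//erp/OrderQueueService'
        ON CONTRACT [//erp/OrderContract]
        WITH ENCRYPTION = OFF;

    -- Enqueue and exit: no network calls, no downstream dependency.
    SEND ON CONVERSATION @handle
        MESSAGE TYPE [//erp/OrderCreated] (@payload);
END;
```

The enqueue is a local, transactional operation, so the INSERT completes at the speed of the database rather than the speed of the slowest downstream system.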
Having worked out how to enqueue a message from within a database, the team turn their attention to selecting some message oriented middleware that will be responsible for delivering the message to their WMS. They look at several products including Apache Camel, Red Hat Fuse and Scribe before opting for Azure Service Bus.
They made this choice because:
- Their digital strategy is promoting a shift to the cloud
- They are light on resource for managing new infrastructure
- The development team are highly skilled in C# and .NET Core
- The auto scaling capability handles seasonal demand peaks well
Before long the team were pleased with their progress:
- A new order hits the Order table
- The SQL trigger sends a message to a local queue
- A process external to the database forwards the message to an Azure Service Bus queue
- A queue subscriber, written as an Azure function, orchestrates the WMS REST API call
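The forwarding step in that pipeline has a dequeue side that, on SQL Server, can be plain T-SQL. A hedged sketch, reusing a hypothetical queue name:

```sql
-- Hypothetical dequeue loop run by the external forwarding process.
-- WAITFOR (RECEIVE ...) blocks until a message arrives or the timeout elapses.
DECLARE @handle UNIQUEIDENTIFIER;
DECLARE @payload NVARCHAR(MAX);

WAITFOR (
    RECEIVE TOP (1)
        @handle  = conversation_handle,
        @payload = CAST(message_body AS NVARCHAR(MAX))
    FROM dbo.OrderQueue
), TIMEOUT 5000;  -- give up after 5 seconds and loop again

IF @handle IS NOT NULL
BEGIN
    -- At this point the external process would forward @payload to the
    -- Azure Service Bus queue, then close the conversation.
    END CONVERSATION @handle;
END;
```

The crucial point is that this loop runs outside any application transaction, so a slow Azure Service Bus (or WMS) can never block an INSERT on the Orders table.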
This time around the team did some proper load testing in a production-like environment: the ERP system did not miss a beat and there was no discernible impact on performance. However, the team testing the WMS weren't so happy, because their user interface was grinding to a halt and slowing them down.
The reason was that messages from the ERP were being delivered to the WMS immediately, and it was running out of server resources. Worse, when a message couldn't be delivered due to a timeout, a retry would fire, resulting in an ever-ballooning number of requests hammering the WMS.
The team altered their setup in Azure Service Bus to ensure that the WMS REST API would receive a maximum of one request every three seconds - testing revealed this was the sweet spot that optimised the rate at which the WMS could process orders without chewing up excessive server resources.
This integration was deployed and it ran like clockwork, even through a busy Christmas period, which of course saw a spike in orders that the team had factored into prior performance tests.
From a technical standpoint we can see that there is no real reason why SQL triggers should not be used to implement such an integration. But technical issues are not the only objection to using SQL triggers for integration, maintainability is another.
SQL triggers can get lost. They are not very visible and they do not reside in your integration software; they reside in the database. Let's say you lose some team members - perhaps their new Azure Service Bus skills are in demand and they took a job at a hot-shot consultancy because they craved more Azure projects. Whatever the reason, you lost some knowledge, and it is only a few months later, when a requirement arrives to update the ERP/WMS integration, that the knowledge gap is realised.
You manage to hire a great new team member and you assign them this task: "Pass additional fields from the ERP to the WMS to support the new shipping feature in the WMS". On the face of it, no problem: just pass some extra fields to the WMS REST API method already being used.
They open up the project in their IDE (Integrated Development Environment) and can see that a message is received from a queue local to the database and forwarded to an Azure Service Bus queue, but they cannot see how the message gets onto that local queue in the first place, meaning they cannot test their solution.
Not only that, but another colleague, working on something completely different, cannot fathom how to disable the sending of messages altogether so that they can perform some unrelated tests in isolation. The problem here is that invoking the trigger has side effects beyond those the user intended, but the causes of those effects are not immediately visible to them.
A savvy developer might think about the possibility of triggers and inspect the relevant tables, but often that will not be the case. Therefore, the best approach is to ensure that decent, diagrammatic documentation exists so that the presence and purpose of the trigger is clear to all team members.
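Alongside documentation, it is worth knowing that on SQL Server the database itself can be interrogated for "hidden" triggers. A hedged sketch (the trigger name is hypothetical):

```sql
-- List every DML trigger and the table it belongs to, so integrations
-- like this one stay visible to the whole team.
SELECT
    t.name                   AS trigger_name,
    OBJECT_NAME(t.parent_id) AS table_name,
    t.is_disabled
FROM sys.triggers AS t
WHERE t.parent_class_desc = 'OBJECT_OR_COLUMN'
ORDER BY table_name, trigger_name;

-- And for the colleague who needs to silence the side effect temporarily:
DISABLE TRIGGER trg_Orders_Notify ON dbo.Orders;
-- ...run the isolated tests...
ENABLE TRIGGER trg_Orders_Notify ON dbo.Orders;
```

A query like this belongs in the team's runbook; it turns an invisible side effect into something discoverable in seconds.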
To wrap things up, the conclusion of this article is not that SQL triggers should be your first port of call, but rather that SQL triggers need not be feared if used appropriately. Each integration will have multiple possible approaches. The first job is to determine which are viable; from there you distil the list further to identify which option is optimal given what you know, and then test your hypothesis before putting anything into production.
If testing reveals that your chosen approach is not viable, you go back a step and apply what you have learned to pick a better integration approach for the specific scenario you are facing.