Storing the SQL queries in the database

I want to outline something I developed something like 5 years ago, and that I was kind of happy with at the time: a way of saving all SQL queries inside the database itself. The reason for writing this is that it would really fit in with all the RubyOnRails and other programming frameworks that are created these days. If someone feels like creating a component/plug-in for it, that would be so nice…

If you have no interest in programming and/or database management, go hug somebody and skip this article.

PRINCIPLE
In the most simple version, you have 1 extra table in your database that looks like this:

query_id   int % unique ID of the query
template   varchar(1000) % or even longer
var_list   varchar(255) % comma separated list of vars

An example:
[ 33 , "select * from users where user_id = %user_id% AND type = '%status%' " , "%user_id%,%status%" ]
Instead of constructing your SQL code in your (PHP) code by making one big string and then executing it, you call a function my_exec_query(query_id,array_params)
This function will

  • get the template text
  • walk through the comma-separated list var_list and for each variable inside, look the corresponding value up in the array array_params (or this can be a dictionary or a hashtable).
  • when all substitutions are finished, execute the SQL code and return the result as (connected/disconnected) recordset or an array.


The advantages of this approach are (without taking into account all kinds of optimisations that are possible):

  1. no SQL code in your programs.
  2. every version of the database contains the correct queries that work on it
  3. the SQL template can be reused in different location of the code
  4. the SQL template can be edited through a web interface
  5. all SQL activity goes through one component, making it the ideal place to add statistics: which are the slow procedures, which ones bring back the most data

Main disadvantage: if you execute it in the naive approach as above, you have to execute 2 SQL queries for each function call (1: fetch the query template and build the real SQL, 2: execute the real SQL). But that is easy to fix.

OPTIMISATION #1: caching of the templates
You keep the whole table of queries in memory. When a specific query template or parameter list is updated, the new version is loaded into memory. Even if you would have 500 different queries, this would take something like 600KB in memory. That is normally not a problem (if the application servers have e.g. 2GB memory).
If you have more than 1 application server, and a query template is edited from server A, the other servers (B, C) should be instructed to reload too. This smells like we need some versioning.

OPTIMISATION #2: version control

query_id   int % unique ID of the query
template   varchar(1000) % or even longer
var_list   varchar(255) % comma separated list of vars
version   int % version number
created   datetime % creation time of the query
modified   datetime % last modification date
remarks   varchar(100) % can contain author, ...

Every time we execute a query, we check if it was constructed with the current template version. If not, the template and parameter list are reloaded, and the query is reconstructed with the new template. If you’re gonna do this every time, you might put it in a stored procedure.
Another advantage: if we make an error while editing, chances are the procedure will not compile properly. That’s alreadya good way of avoiding sloppy updates.

OPTIMISATION #3: stored procedure
For each query, a stored procedure is created with variable @parameters.

CREATE PROC procedure_name
    [ { @parameter data_type } 
    ] [ ,...n ] 
AS sql_statement [ ...n ] 

This stored procedure will check versions, but more importantly it can be precompiled (so your SQL server already knows which indexes and joins to use and in which sequence), and will run faster. But what significant name can we give it? query_33 doesn’t mean a lot to a database admin or a programmer.

OPTIMISATION #4: taxonomy/significant names
We give each query a name. What I did was the following:

query_id   int % unique ID of the query
category  varchar(20)
name  varchar(20)
template   varchar(1000) % or even longer
var_list   varchar(255) % comma separated list of vars
(...)

The ‘category’ is something like ‘user_management’ or ‘statistics’ and the ‘name’ is the name of the individual query: e.g. ‘insert_user’ or ‘top-tags’. You can also use the ‘category’ to decide on who is responsible for editing. Or you could set-up a stronger authorisation scheme, with authors having editing rights on certain queries and not on others.
The stored procedures that are automatically created will now have understandable name ‘user

OPTIMISATION #5: how to give a variable number of parameters to a function
I started with a Dictionary (VBScript style) for the parameters: I created it, filled it with the variables I needed (maybe one, maybe 10) and used it as a parameter in the function call my_exec_query(query_id,dict_params). This can also be done with (a pointer to) an array, a RecordSet, … Anything that maps ‘keys’ onto ‘values’ would work.
Wordpress uses another trick: they use the URLencoding or query string syntax: wp_list_cats('exclude=10,11,12');. This is basically a well-known way to do serialisation into a string; you could put them in an XML document too, but that’s a lot of overhead.

OPTIMISATION #6: error checking
When the substitution of variables in the template is done, you can still check for obvious errors: a %PARAMETER% that has not been substituted (forgotten?), a string where there should be a number (no enclosing quotes). You could think about ways to distinguish between required parameters (e.g. SELECT * FROM users WHERE id = %ID%) and optional parameters (e.g. INSERT INTO log VALUES ('%PLACE%', '%ERROR%' )). This can be done by adding a ‘!’ after the name in the parameter list: ID! is obligatory, PLACE is not.
One can also wonder wether we need the list of parameters as a separate field, we could also search the template text for any occurrence of %[A-Z][A-Z0-9_]*%. I just used the list because it made the substitution faster.

And I’m still forgetting truckloads of enhancements.

Now if someone is attracted by the advantages of this method, and wants to create a PHP/Python/Ruby component that neatly puts all this functionality into one package, let me know!

5 thoughts on “Storing the SQL queries in the database

  1. LVB

    What’s the essential difference between your idea and stored procedures? I mean, which of your 5 advantages cannot be obtained by applying the standard stored procedures approach?

  2. Peter Post author

    Good question.
    If one were to start using stored procedures, and then add logical naming of the procedures, add web-based editing with versioning, add a mechanism for passing a variable number of parameters and encapsulate everything in a piece of code, one would probably arrive at something quite similar.
    My system would work with a sissy database like Access, but once you have a more mature RDBMS (SQL-Server, mySQL, Oracle, Postgres), using stored procedures to implement it is obviously a best practice. But it’s more than just that.

  3. Abhishek

    Hi,

    How can i put substitution varable into the Access Database sql query, It will ask user to give the prompt answer

  4. Louie

    I have the same idea some 5 years ago to store SQL within the database itself for my programs (but I didn’t really implemented it yet :P). This is because it usually takes too long to go through the deployment cycle if I need to add / modify my sql statements. However, I think it is the best to use it only in reporting related things (as those data input parts actually requires a lot more of validation, business logics, etc).

    My idea is quite similar to yours: a table of sqlID/sql queries/parameter list/version#.

    In order to make it work in reporting environment, I am assuming my queries operates in a set of predefined parameters, e.g. %CategoryID%, %ContactID%, %OrganizationID%, %StaffID% etc. And in the UI, we always know what these parameters refer to and how to fill in the dynamic queries in a well controlled manner — e.g. StaffID is always the login user or select from a drop down, or user can select from a list of categories from radio buttons. etc.

    As I use dot net, the parameters will be maintained in a name value pair array (maybe a dictionary type as well)

    1 challenge is how you configure the parameters to an FieldName IN (a,b,c) type. I think you need to define a different #ParameterList# parameter which evalues to “Field in (‘a’,’b’,’c’)” or “Field in (a,b,c)”

    Louie 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *