Java and the Empowered Database

This article by Norbert Ehreke proposes an approach where the modeling is done in the relational tier and as much business logic as possible is handled within the database by employing a set of stored procedures as the middle tier.

This article is reposted from TheServerSide.com.

Introducing R/O Mapping -- a concept for keeping more business logic in the database and reducing the amount of Java code that deals with database issues

The connection between object-oriented systems and relational databases is commonly made by employing a so-called O/R mapping (ORM) framework, whose goal is to map object-oriented models to the entity relationship models that are used in, and often generated by, relational databases. Bridging these two entirely different modelling techniques results in the so-called impedance mismatch that needs to be overcome.

An unwritten consensus in the industry is that the best approach to solving these problems is to model the business requirements in the object domain and to map the resulting object model into the relational database system via an O/R mapping framework.

This article proposes a reversed approach where the modelling is done in the relational tier and as much business logic as possible is handled within the database by employing a set of stored procedures as the middle tier. A lightweight Java API, called Amber, is introduced that uses Java annotations instead of XML descriptors to help marshal result sets to Java objects and back into the database.

This text discusses the reversed nature of R/O mapping compared to O/R mapping, points out pros and cons, and shows how Amber, in concert with a set of stored procedures, can be used for simple yet powerful R/O mapping. While this text treats these issues somewhat more abstractly and more from the point of view of the database, an introductory article on Amber, its mapping strategy and its underlying concepts has been published under the title "Lightweight R/O Mapping."

Motivation

The importance of relational databases and their underlying philosophy has been underestimated in recent years, at least when systems analysis is done from an object-oriented point of view. Relational databases are very often viewed as rather dumb storage containers for data, nothing more. Bridging the gap between object-oriented software systems and relational databases has been assumed to be best done in the object domain. This premise, however, is the first cause of the so-called impedance mismatch that then needs to be overcome when the object model is transferred to the database. In a way, this model transformation is like trying to design a model view controller (MVC) system with two models involved. Why shouldn't we transfer more power to the relational database and reduce the responsibilities of the object domain?

The fact that relational database systems are grounded in mathematical set theory has a number of subtle yet important consequences for systems design. It means that all kinds of sifting through data -- filtering, sorting, joining, projection -- as well as transaction-based handling of data are best done in the database, nowhere else. Unfortunately, the very popular MySQL database did not support stored procedures until version 5, which explains why many database-oriented solutions really needed a middle tier in the form of an O/R mapping framework that helped alleviate the set-oriented problems of dealing with data.

Getting Started

In short, we need to bring more power to the relational database and let it do what it does best. Direct data interaction is very often a process that starts with sorting, filtering, searching and combining data. Only after that does something happen to the data sets that have been carved out of the usually large (literally) data-base. One of the implicit premises that Amber relies on is the separation of responsibilities between the client and the server. That means we do not want to approach the data in a way where a more or less complex object model is generated from the existing relational model. What we want is an approach where user interaction is supported at a scope that a human being can actually deal with, both in depth and in width.

That means that our goal should not be to deliver 100,000, or even just 1,000, data sets to the user in the object domain. The user must be able to manipulate data at this scope, but the data need not actually be loaded into a table or tree. It might be necessary to make it look as if this were the case, but we do not really aim for that.

Mapping

Most O/R frameworks rely on a mapping where classes are mapped to tables, objects to rows and class members to columns. This is typically accomplished with XML mapping descriptors.

Amber, on the other hand, does the mapping between Java objects and database rows via annotations. Annotations are metadata that describe the context of Java methods, classes or fields. If you'd like to know more about annotations, please check the Resources section of this article. By the way, for the historically inclined: Amber is based on a framework originally developed in C# that provided annotation-like support before J2SE 5 was released.

Amber uses annotations to map database columns to the setter methods of Java beans, regardless of whether the columns come from one table or from a joined result set, and it uses the getter methods to supply the parameters of update and insert calls, which in most cases are assumed to be stored procedures. Since this article focuses on the broader issues, please refer to the article "Lightweight R/O Mapping" in the Resources section to see how the mapping works in detail.
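To make the annotation mechanism concrete, a runtime-visible annotation type along the lines of @ColumnAssociation could be declared as follows. This is a hypothetical sketch for illustration only; Amber's actual declarations may carry additional attributes.

 import java.lang.annotation.ElementType;
 import java.lang.annotation.Retention;
 import java.lang.annotation.RetentionPolicy;
 import java.lang.annotation.Target;

 // Hypothetical declaration of a column-mapping annotation in the spirit of
 // Amber's @ColumnAssociation; the real definition may differ.
 @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
 @Target(ElementType.METHOD)         // attached to setter methods
 public @interface ColumnAssociation {
     String name(); // the result-set column whose value the annotated setter receives
 }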

Committing Data

Let's say we have a workflow management system that allows the user to view pending tasks and manipulate the ones that are open to her. Our simple model in the database consists of the following tables: processes, nodes, tasks and steps. The tables processes and nodes hold the meta data of the workflow itself, while tasks and steps hold the concrete information for the tasks and what happens in the course of the process. Think of these two table pairs as relating to each other like class to object. The definitions of the tables and their relationships are shown in Fig. 1.

Each process in the processes table possesses a number of nodes that may depend on each other. This dependency is expressed through the column predecessor_id, which allows a node to have a predecessor. Other useful columns have been omitted here to keep things simple. The processes table relates to the tasks table in a 1:N relationship, and the nodes table relates to the steps table in the same way. Finally, the steps table holds the data for each concrete step that corresponds to a node in the workflow, including the login of the targeted user, a done flag and a completed-at timestamp. Please note that this model is far from a complete (or particularly useful) workflow model; it's just an example.

Let's say we'd like to create a new task from an existing process that has been described with a number of nodes. We are going to create a new object of type Task, commit it to the database and populate it with its steps. On the database, the creation of the task consists of generating a data set in the tasks table and copying all relevant node data sets that belong to the process into the steps table.

The class Task looks like this. I will leave out the actual class Step since it looks very similar (a hypothetical sketch of it follows the Task listing below). The entire sample code can be downloaded from the Resources section. The mapping of columns to the setter methods of this object happens with annotations, as mentioned before. Again, if you are interested in a more detailed explanation of how this works, read "Lightweight R/O Mapping."

 

 import java.util.ArrayList;
 import java.util.Collection;

 // The Amber annotation types @ColumnAssociation and @ParameterAssociation
 // are imported from the Amber library.
 public class Task {

     private Integer _taskId;
     private Integer _processId;
     private String _description;
     private ArrayList _stepList = null;

     public Task() {
         _stepList = new ArrayList();
     }

     public Task( Integer processId ) {
         _stepList = new ArrayList();
         _processId = processId;
     }

     // Getters carry @ParameterAssociation: they supply the stored procedure
     // parameters on insert/update. Setters carry @ColumnAssociation: they
     // receive the result-set columns on read.
     @ParameterAssociation(name="@task_id",index=0,isAutoIdentity=true)
     public Integer getTaskId() {
         return _taskId;
     }

     @ColumnAssociation(name="task_id")
     public void setTaskId( Integer taskId ) {
         _taskId = taskId;
     }

     @ParameterAssociation(name="@process_id",index=1)
     public Integer getProcessId() {
         return _processId;
     }

     @ColumnAssociation(name="process_id")
     public void setProcessId( Integer processId ) {
         _processId = processId;
     }

     @ParameterAssociation(name="@description",index=2)
     public String getDescription() {
         return _description;
     }

     @ColumnAssociation(name="description")
     public void setDescription( String description ) {
         _description = description;
     }

     public void setStepList( Collection stepList ) {
         _stepList = new ArrayList( stepList );
     }

     public Collection getStepList() {
         return _stepList;
     }
 }
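Since the class Step is not shown, here is a plausible sketch of it, inferred from the columns of the steps table (task_id, node_id, login, done, completed_at). The class in the downloadable sample code is authoritative; in particular, the Java type chosen for the done bit column is an assumption.

 import java.sql.Timestamp;

 // Hypothetical sketch of Step, derived from the steps table columns;
 // the actual class ships with the downloadable sample code.
 public class Step {

     private Integer _taskId;
     private Integer _nodeId;
     private String _login;
     private Integer _done;          // bit column; the mapped Java type may differ
     private Timestamp _completedAt;

     @ColumnAssociation(name="task_id")
     public void setTaskId( Integer taskId ) { _taskId = taskId; }
     public Integer getTaskId() { return _taskId; }

     @ColumnAssociation(name="node_id")
     public void setNodeId( Integer nodeId ) { _nodeId = nodeId; }
     public Integer getNodeId() { return _nodeId; }

     @ColumnAssociation(name="login")
     public void setLogin( String login ) { _login = login; }
     public String getLogin() { return _login; }

     @ColumnAssociation(name="done")
     public void setDone( Integer done ) { _done = done; }
     public Integer getDone() { return _done; }

     @ColumnAssociation(name="completed_at")
     public void setCompletedAt( Timestamp completedAt ) { _completedAt = completedAt; }
     public Timestamp getCompletedAt() { return _completedAt; }
 }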

The code to create a new task consists of the following steps:

  1. Open a connection to the database.
  2. Create a new Task object with a process id of 100.
  3. Commit the task object into the database by calling a stored procedure called sp_create_task.
  4. Inject the new task with the task id that the stored procedure returned.
  5. Read the steps for the task that the stored procedure generated.

We open a connection to the database with the following code. Please note that in all the code examples exception handling is left out in order to focus on the more interesting things.

 

 String url = "jdbc:microsoft:sqlserver://";
 url += "myserver;Databasename=northwind;user=me;password=***";

 ConnectorFactory.getInstance().add( "db",
         ConnectorFactory.MSSQL_DRIVER_CLASSNAME, url );
 Connector connector = ConnectorFactory.getInstance().createConnector( "db" );

The classes Connector, ConnectorFactory, BeanReader and BeanWriter are part of Amber. A ConnectorFactory holds a number of Connector objects, and a Connector is nothing more than a thin wrapper around a JDBC connection. The ConnectorFactory is implemented as a singleton so that it is accessible from anywhere in the application.
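To illustrate that design choice, a minimal singleton skeleton could look like the sketch below. The internal storage is invented for illustration; Amber's real ConnectorFactory differs in detail.

 import java.util.HashMap;
 import java.util.Map;

 // Minimal singleton sketch of a registry of named connection configurations;
 // not Amber's actual source (driver loading, pooling etc. are omitted).
 public final class ConnectorFactory {

     private static final ConnectorFactory INSTANCE = new ConnectorFactory();

     // maps a logical name such as "db" to its driver class name and JDBC URL
     private final Map<String, String[]> _configurations =
             new HashMap<String, String[]>();

     private ConnectorFactory() {} // prevents outside instantiation

     public static ConnectorFactory getInstance() {
         return INSTANCE;
     }

     public void add( String name, String driverClassName, String url ) {
         _configurations.put( name, new String[] { driverClassName, url } );
     }
 }

Now we create the new task object. We assume that we know to which process the new task belongs (100), and we also provide the description.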

 

 Task task = new Task( 100 ); // New task for process 100
 task.setDescription( "My new task" );

 BeanWriter writer = new BeanWriter( Task.class, connector );
 ResultSet resultSet = writer.executeStringInsert( task, "sp_create_task" );
 writer.injectIdentityIntValue( resultSet, task, "task_id" );

These last three lines write the object into the database, retrieve the task id that the stored procedure generates and inject it into the task object. The assumption is that the result set returned from the stored procedure contains a single row with a single column holding the new task id.
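Conceptually, such an injection boils down to reflecting over the annotated setters. The following sketch illustrates the idea only -- it is not Amber's implementation -- and it reuses the hypothetical ColumnAssociation declaration from above.

 import java.lang.reflect.Method;
 import java.sql.ResultSet;

 // Conceptual sketch: read the single identity value from the result set and
 // invoke the setter whose @ColumnAssociation names that column.
 public final class IdentityInjector {

     public static void injectIdentity( ResultSet resultSet, Object bean, String column )
             throws Exception {
         if ( !resultSet.next() ) {
             return; // no identity row was returned
         }
         Integer value = Integer.valueOf( resultSet.getInt( column ) );
         for ( Method method : bean.getClass().getMethods() ) {
             ColumnAssociation association = method.getAnnotation( ColumnAssociation.class );
             if ( association != null && association.name().equals( column ) ) {
                 method.invoke( bean, value ); // e.g. task.setTaskId( value )
                 return;
             }
         }
     }
 }

Finally, we use a BeanReader to load the steps that the stored procedure created for the task and put this list into the newly created task.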

 

 BeanReader reader = new BeanReader( Step.class, connector );
 String stepQuery = "select * from steps where task_id=" + task.getTaskId();
 Collection collection = reader.executeCreateBeanList( stepQuery );
 task.setStepList( collection );

Last but not least, here is the stored procedure sp_create_task that actually does the heavy lifting. It is a T-SQL procedure for Microsoft SQL Server.

 

 create procedure dbo.sp_create_task
 (
     @process_id  int,
     @description varchar(255)
 )
 as
     declare @task_id  int
     declare @error    int
     declare @rowcount int

     begin transaction

     -- create the task itself
     insert into tasks ( process_id, description )
     values ( @process_id, @description )

     select @task_id = @@identity, @error = @@error, @rowcount = @@rowcount
     if @error <> 0 or @rowcount = 0 goto error_end

     -- copy the process's nodes into the steps table for the new task
     insert into steps ( task_id, node_id, done, completed_at )
     select @task_id, node_id, 0, null
     from nodes
     where process_id = @process_id

     set @error = @@error
     if @error <> 0 goto error_end

     commit transaction
     select @task_id as task_id
     return 0

 error_end:
     rollback transaction
     select null as task_id
     return -1

Let's summarize all this code a bit. First, what's important here is the amount of Java code we actually do not see: it's all in the stored procedure sp_create_task. We did this because the process of creating a new task is atomic to us. If it fails, we need to know why, but we shouldn't have to worry about the transaction in the Java code.

Second, it might look awkward that we queried the database a second time to populate the new task with its steps. The real question is: do we really need to do that? Actually, it depends on what the application will do with the new task object. Do we really want to automatically populate the new task with its steps? Maybe, but then again maybe not. It is good that, after committing the new task, we can populate it with its steps in a few lines of clear, easy-to-read code, if we want to. On the other hand, we might be done after committing the new task and move on to something else.

We are left with a simple task structure that we can work with further. The amount of coding and configuration we needed was not bad, and from a Java point of view our workflow system has taken a big step forward. A more detailed analysis of the pros and cons can be found in the Discussion section below.

Reading Data

Let's say our tasks table is filled with tasks and we'd like to present the pending ones to the user. Of the presumably huge tasks table, we're only going to load the tasks relevant to one user, and for those tasks we're only going to load the steps when the user selects the task. To make things more interesting, we'd still like to present the user with the number of open steps. This means that we need to join the tasks table with the steps table, extract the relevant rows and determine the count of open steps. We are going to define a new class TaskPlus for this job that encapsulates the Task class from above and holds the number of open steps as an Integer.

The heavy lifting occurs in the stored procedure sp_select_tasks, which we are going to call and which delivers the result set for our TaskPlus list. The motivation for such a query is to reduce the number of database roundtrips and to make use of the power of SQL.

The stored procedure is quite sophisticated. To determine the number of open steps, it makes use of a so-called subselect in the second left join clause.

 

 create procedure dbo.sp_select_tasks
 (
     @login varchar(10)
 )
 as
     select distinct t.*, sub.open_count
     from steps st
     left join tasks t on st.task_id = t.task_id
     -- the subselect counts the open steps per task for the given user
     left join
     (
         select count(*) as open_count, task_id
         from steps
         where done = 0 and @login = steps.login
         group by task_id
     ) as sub on sub.task_id = t.task_id
     where st.login like @login

The following code uses this stored procedure and loads a list of TaskPlus objects. We assume that the connector has been retrieved from the ConnectorFactory as in the code above.

 

 BeanReader reader = new BeanReader( TaskPlus.class, connector );
 String overviewQuery = "sp_select_tasks @login='me'";
 Collection collection = reader.executeCreateBeanList( overviewQuery );

The definition of the TaskPlus class follows.

 

 import java.util.Collection;

 // The Amber annotation type @ColumnAssociation is imported from the Amber
 // library. TaskPlus wraps a Task and adds the open-step count delivered by
 // sp_select_tasks.
 public class TaskPlus {

     private Task _task;
     private Integer _openCount;

     public TaskPlus() {
         _task = new Task();
     }

     @ColumnAssociation(name="task_id")
     public void setTaskId( Integer taskId ) {
         _task.setTaskId( taskId );
     }

     @ColumnAssociation(name="process_id")
     public void setProcessId( Integer processId ) {
         _task.setProcessId( processId );
     }

     @ColumnAssociation(name="description")
     public void setDescription( String description ) {
         _task.setDescription( description );
     }

     @ColumnAssociation(name="open_count")
     public void setOpenCount( Integer openCount ) {
         _openCount = openCount;
     }

     public Integer getTaskId() {
         return _task.getTaskId();
     }

     public Integer getProcessId() {
         return _task.getProcessId();
     }

     public String getDescription() {
         return _task.getDescription();
     }

     public void setStepList( Collection stepList ) {
         _task.setStepList( stepList );
     }

     public Collection getStepList() {
         return _task.getStepList();
     }

     public Integer getOpenCount() {
         return _openCount;
     }

     public Task getTask() {
         return _task;
     }
 }

Let's summarize what we did here. Rather than trying to map the relational model of processes, nodes, tasks and steps to the object domain, we extracted the necessary information with a stored procedure that delivers a simple result set enriched with additional information, enabling us to make better choices about what to query next.

The filtering of relevant data was done in the database, and the additional information was provided without the need to create an entire object model. Server roundtrips for determining the open steps within tasks were avoided. As a tradeoff, the Task objects were not automatically populated with steps; if required, this has to be done explicitly with a separate database query, as sketched below.
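Such an explicit load can reuse the BeanReader pattern from the first example. Assuming taskPlus is the TaskPlus object the user has selected, the on-demand population might look like this:

 // Populate the selected task's steps on demand, reusing the BeanReader
 // pattern shown earlier; taskPlus is assumed to be the user's selection.
 BeanReader stepReader = new BeanReader( Step.class, connector );
 String stepQuery = "select * from steps where task_id=" + taskPlus.getTaskId();
 taskPlus.setStepList( stepReader.executeCreateBeanList( stepQuery ) );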

Discussion

The nature of R/O mapping lies in empowering the database in terms of business logic and elevating it above the function of a mere data container. The Java domain -- or, more generally, the object domain -- is freed from a large amount of responsibility that complicates and enlarges the codebase. As pointed out above, this approach was not possible for a long time because the most accessible database systems, e.g. MySQL, did not support business logic on their turf.

It is important to distinguish between dataset-oriented and datacell-oriented data manipulation. As a rough guideline, all data access that can be reduced to filtering, searching and extraction should be done in the database, if possible. Direct data manipulation and complex, semantic validation of data, however, should be done in the object domain. By adhering to these guidelines, the mapping of models is very often reduced in complexity, meaning the number of relationships often shrinks to a mere set of objects of one single type. Amber supports -- and requires -- sensible judgement when designing a system along these lines.

Pros

  • Naturally optimized database structures through empowered RDBMS administrators
    A seasoned entity-relationship developer is able to design entity relationship models that are fast, robust and extensible. It makes sense to use this kind of expertise and then use this model, not mimic it, in the upper tiers.
  • No model generation
    Duplicating models is a mistake that should be avoided. Reducing the client's model-handling responsibilities makes development in the client domain easier.
  • No XML descriptors
    XML descriptors are in fact glue-code(!) that stems from the lack of technology to better handle the mapping task when in-code metadata was not yet available. They need to be incorporated into the development cycle, and they make maintenance and debugging harder. Instead of mapping via external files, it is easier to handle the mapping right in the target class, where it is visible to the developer.
  • No transaction handling
    The natural habitat of transactions is the database, not client code, be it Java, C# or anything else. Employing the powerful transaction capabilities of databases is the right choice.
  • Better usage of the database
    The power of SQL lies in its set-operation abilities. Let's use it.

Cons

  • Platform independence is weakened
    As business logic is transferred into the database, the platform independence of the entire system is broken. A switch to another database would require a port of the stored procedures.
  • Performance-hungry reflection
    Using reflection to manage the mapping between relational data and Java objects can be costly in terms of performance. The larger the mapped classes, the more CPU time the reflective calls consume.
  • Software-based load balancing becomes difficult
    By moving business logic closer to the database tier, one has to pay attention to load balancing issues, which then need to be addressed with physical load balancing strategies.
  • Manual labor
    Some tasks that ORM layers provide, like the automatic generation of object hierarchies, must be done manually with Amber.
  • Navigation is not readily available in the object domain
    Since Amber does not provide full-fledged object hierarchies, data browsing is not possible without conscious coding. In other words, it is not possible to load all customers, filter them by type and display the ones that have a certain name without an appropriate stored procedure or SQL statement.

Critics have also pointed out that Amber can lead to quite unwieldy and performance-costly code when model mapping is done manually, because the resulting database queries are not automatically optimized. Yet with every design it is necessary to balance ease of use against flexibility. Amber was not designed to construct object models from relational models, but rather to allow painless access to relational data structures and to make it possible to generate simple object model structures, if desired. In our applications we have observed that the R/O mapping approach usually results in simple lists of objects being presented to the user for interaction. The urge to construct entire models falls away in most cases; where it does not, the relationships are very simple, and the performance penalty for constructing them is insignificant compared to the time the database needs to retrieve the data.

Conclusion

This article has demonstrated an approach to efficient data exchange between object-oriented systems and relational databases that relies on heavy use of SQL within stored procedures on the database side and a lightweight Java API, called Amber, that simplifies the mapping of relational data to objects. Considering the research on O/R mapping and the existing O/R mappers, the most important question to ask is what the best interaction with relational data stores really is and how to optimally distribute the workload between the data store and the presentation layer. The attempt made here was to put as much set-oriented business logic into the database as possible and to avoid model-to-model mapping. For the approach depicted here, it is necessary to blur the distinction between database administrator and Java developer.

Resources

  • Lightweight R/O Mapping: The first publication of Amber and its underlying concept. An article that focuses more on the mapping problem.
  • Amber project: The project on Sourceforge.
  • On O/R Mapping: Tom Wirges chimes in on O/R mapping and suggests that we should integrate the relational model into the host language, which is an approach that is embodied in the LINQ Project at Microsoft.
  • The Agile DBA: Scott Ambler takes a look at the agile DBA.
  • On JDO: David Jordan, coauthor of Java Data Objects, clarifies some issues people have with JDO. It also allows for a quick glance at the usage of JDO.
  • O/R Mapping Problems: This article discusses the benefits of O/R mapping.
  • Middlegen: Other people have proposed similar approaches. Middlegen is a database-driven code generation engine that allows you to use the database as the main model.

About the Author

Norbert Ehreke graduated from the Technical University of Berlin (TUB) with an M.S. in Systems Engineering. He began database-centric development with Perl and Java in 1999 and is a senior development lead at Impetus Unternehmensberatung GmbH in Frankfurt, Germany.
