Sunday, June 15, 2008

Why I Do Not Use ORM

An impedance mismatch occurs when two devices are connected so that neither is operating at peak efficiency. This lack of efficiency is not due to any intrinsic incompatibility between the devices; it exists only because they are connected together the wrong way. Object Relational Mapping (ORM) does not cure a pre-existing impedance mismatch; it creates one, because it connects databases to applications in a way that hobbles both.

UPDATE: There is a newer Historical Review of ORM now available.

UPDATE: In response to comments below and on reddit.com, I have a new post that gives a detailed analysis of an algorithm implemented as a sproc, in app code with embedded SQL, and in ORM.

Welcome to the Database Programmer blog. This blog is for anybody who wants to see practical examples of how databases work and how to create lean and efficient database applications.

There are links to other essays at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Good Marketing, Terrible Analogy

Normally I like to reserve this space for a positive presentation of things that I have found to work; I don't like to waste time ranting against things I don't like. However, ORM is so pervasive in some circles that it is important to establish why you will not see it used here, so as to avoid a lot of unnecessary chatter about its absence.

The use of the term "impedance mismatch" is great marketing, worthy of a certain software company in the Northwest of the US, but the analogy is utterly wrong. The analogy is used incorrectly to imply an intrinsic incompatibility, but the real irony is that there is no such incompatibility, and if we want to use the analogy we are forced to say that ORM is the impedance mismatch, because it creates the inefficient connection.

It always comes back to the fact that modern databases were designed to provide highly reliable permanent storage, and they possess a slew of features that promote that end. Programming languages on the other hand are meant to process data in a stepwise fashion. When the two of them meet it is very important to establish a strategy that uses the strengths of both, instead of trying to morph one into the other, which never yields efficient results.

The Very Basics

The language SQL is the most widely supported, implemented, and used way to connect to databases. But since most of us have long lists of complaints about the language, we end up writing abstraction layers that make it easier for us to avoid coding SQL directly. For many of us, the following diagram is a fair (if simplified) representation of our systems:

This diagram is accurate for ORM systems, and also for non-ORM systems. What they all have in common is that they seek to avoid manually coding SQL in favor of generating SQL. All systems seek to give the programmer a set of classes or functions that makes it easy and natural to work with data without coding SQL manually.

This brings us to a very simple conclusion: the largest part of working out an efficient database strategy is working out a good SQL generation strategy. ORM is one such strategy, but it is far from the simplest or the best. To find the simplest and the best, we have to start looking at examples.

First Example: Single Row Operations

Consider the case of a generic CRUD interface to a database having a few dozen tables. These screens will support a few single-row operations, such as fetching a row, saving a new row, saving updates, or deleting a row.

We will assume a web form that has inputs with names that begin with "inp_", like "inp_name", "inp_add1", "inp_city" and so forth. The user has hit [NEW] on their AJAX form, filled in the values, and hit [SAVE]. What is the simplest possible way to handle this on the server? If we strip away all pre-conceived ideas about the "proper" thing to do, what is left? There are only these steps:

  1. Assemble the relevant POST variables into some kind of data structure
  2. Perform sanity checks and type-validations
  3. Generate and execute an insert statement
  4. Report success or failure to the user

The simplest possible code to do this looks something like this (the example is in PHP):

# This is a great routine to have.  If you don't have
# one that does this, write it today!  It should return
# an associative array equivalent to:
#    $row = array( 
#       'name'=>'....'
#      ,'add1'=>'....'
#    )
# This routine does NOT sanitize or escape special chars
$row = getPostStartingWith("inp_");

# get the table name.  
$table_id = myGetPostVarFunction('table_id');

# Call the insert generation program.  It should have
# a simple loop that sanitizes, does basic type-checking,
# and generates the INSERT.  After it executes the insert
# it must cache database errors for reporting to the user.
#
if (!SQLX_Insert($table_id,$row)) {
    myFrameworkErrorReporting();
}

Without all of my comments the code is 5 lines! The Insert generation program is trivial to write if you are Using a Data Dictionary, and it is even more trivial if you are using Server-side security and Triggers.
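For readers wondering what such an insert generator might look like inside, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module rather than PHP, purely so that it can be run on its own. The DICTIONARY layout, the function names, and the customers table are assumptions made for illustration; they are not the actual implementation behind SQLX_Insert().

```python
import sqlite3

# Hypothetical data dictionary: table -> column -> (Python type, max length or None).
# In a real system this would be loaded from the dictionary files, not hard-coded.
DICTIONARY = {
    "customers": {
        "name": (str, 40),
        "add1": (str, 60),
        "city": (str, 30),
    }
}


def get_post_starting_with(post, prefix):
    """Return the form values whose names start with prefix, with the prefix removed."""
    return {key[len(prefix):]: value for key, value in post.items() if key.startswith(prefix)}


def sqlx_insert(conn, table_id, row, errors):
    """Check row against the dictionary, then generate and execute one INSERT.

    Returns True on success. On failure, appends a message to errors and returns False.
    """
    columns = DICTIONARY.get(table_id)
    if columns is None:
        errors.append(f"unknown table: {table_id}")
        return False

    clean = {}
    for col, value in row.items():
        if col not in columns:
            errors.append(f"unknown column: {col}")
            return False
        col_type, max_len = columns[col]
        try:
            value = col_type(value)  # basic type check
        except (TypeError, ValueError):
            errors.append(f"bad value for column: {col}")
            return False
        if max_len is not None and isinstance(value, str):
            value = value[:max_len]  # truncate overlong values
        clean[col] = value
    if not clean:
        errors.append("no columns to insert")
        return False

    # Table and column names come only from the dictionary; values are bound parameters.
    col_list = ", ".join(clean)
    placeholders = ", ".join("?" for _ in clean)
    sql = f"INSERT INTO {table_id} ({col_list}) VALUES ({placeholders})"
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, list(clean.values()))
    except sqlite3.Error as exc:
        errors.append(str(exc))  # cache the database error for reporting
        return False
    return True


# Usage: simulate the POST from the form described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, add1 TEXT, city TEXT)")
post = {"table_id": "customers", "inp_name": "Acme", "inp_city": "Albany"}
errors = []
ok = sqlx_insert(conn, post["table_id"], get_post_starting_with(post, "inp_"), errors)
```

The point of the sketch is that the routine knows nothing about any particular table: everything table-specific lives in the dictionary, so the same few lines of calling code serve every table.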

This is the simplest possible way to achieve the insert, and updates and deletes are just as easy. Given how simple this is (and how well it performs), any more complicated method must justify itself considerably in order to be considered.

ORM cannot be justified in this case because it is slower (objects are slower than procedural code), more complicated (anything more than 5 lines loses), and therefore more error-prone, and worst of all, it cannot accomplish any more for our efforts than we have already.

Objection! What About Business Logic?

The example above does not appear to allow for implementing business logic, but in fact it does. The SQLX_Insert() routine can call out to functions (fast) or objects (much slower) that massage data before and after the insert operation. I will be demonstrating some of these techniques in future essays, but of course the best performing and safest method is to use triggers.

Example 2: Processes, Or, There Will Be SQL

Many programmers use the term "process" to describe a series of data operations that are performed together, usually on many rows in multiple tables. While processes are not common on a typical website, they are plenty common in line-of-business applications such as accounting, ERP, medical programs, and many many others.

Consider a time entry system, where the employees in a programming shop record their time, and once per week the bookkeeper generates invoices out of the time slips. When this is performed in SQL, we might first insert an entry into a table of BATCHES, obtain the batch number, and then enter a few SQL statements like this:

-- Step 1, mark the timeslips we will be working with
UPDATE timeslips SET batch = $batch
 WHERE batch IS NULL;
 
-- Step 2, generate invoices from unprocessed timeslips
INSERT INTO Invoices (customer,batch,billing,date)
SELECT customer,$batch,SUM(billing) as billing,NOW()
  FROM timeslips
 WHERE batch = $batch
 GROUP BY customer;
 
-- Step 3, mark the timeslips with their invoices
UPDATE timeslips 
   SET invoice = invoices.invoice
  FROM invoices 
 WHERE timeslips.customer = invoices.customer
   AND timeslips.batch    = $batch
   AND invoices.batch     = $batch;

While this example vastly simplifies the process, it ought to get across the basic idea of how to code things in SQL that end up being simple and straightforward.
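To make the set-oriented idea concrete, here is a rough, runnable sketch of the same three steps using Python's standard-library sqlite3 module. The table layouts and sample rows are assumptions invented for the example, and the last step uses a correlated subquery instead of UPDATE ... FROM only so that it also runs on older SQLite versions. Note that the whole process is three statements inside one transaction, no matter how many timeslips there are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data, for illustration only.
conn.executescript("""
    CREATE TABLE batches   (batch INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE timeslips (customer TEXT, billing REAL, batch INTEGER, invoice INTEGER);
    CREATE TABLE invoices  (invoice INTEGER PRIMARY KEY, customer TEXT,
                            batch INTEGER, billing REAL, date TEXT);
    INSERT INTO timeslips (customer, billing) VALUES
        ('acme', 100.0), ('acme', 50.0), ('zenith', 75.0);
""")


def run_invoice_batch(conn):
    """Invoice all unprocessed timeslips in one transaction; return the batch number."""
    with conn:  # commit if every step succeeds, roll back otherwise
        batch = conn.execute(
            "INSERT INTO batches (created) VALUES (datetime('now'))"
        ).lastrowid

        # Step 1, mark the timeslips we will be working with
        conn.execute("UPDATE timeslips SET batch = ? WHERE batch IS NULL", (batch,))

        # Step 2, generate one invoice per customer from the marked timeslips
        conn.execute(
            """INSERT INTO invoices (customer, batch, billing, date)
               SELECT customer, ?, SUM(billing), datetime('now')
                 FROM timeslips
                WHERE batch = ?
                GROUP BY customer""",
            (batch, batch),
        )

        # Step 3, mark the timeslips with their invoices
        conn.execute(
            """UPDATE timeslips
                  SET invoice = (SELECT invoice
                                   FROM invoices
                                  WHERE invoices.customer = timeslips.customer
                                    AND invoices.batch    = timeslips.batch)
                WHERE batch = ?""",
            (batch,),
        )
    return batch


batch = run_invoice_batch(conn)
invoices = conn.execute(
    "SELECT customer, billing FROM invoices ORDER BY customer"
).fetchall()
```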

Counter Example: The Disaster Scenario

The biggest enemy of any software project is success. Code that works wonderfully on the developer's laptop is suddenly thrown into a situation with datasets that are hundreds of times larger than the test data. That is when performance really matters. Processes that took 3 minutes on the laptop suddenly take 10 hours, and the customer is screaming. How do these things happen?

Mostly they happen because programmers ignore the realities of how databases work and try to deal with them in terms they understand, such as objects or even simple loops. Most often what happens is that the programmer writes code that ends up doing something like this:

foreach( $list_outer as $item_outer) {
    foreach( $list_inner as $item_inner) {
        ...some database operation
    }
}

The above example will perform terribly because it is executing round trips to the database server instead of working with sets. While nobody (hopefully) would knowingly write such code, ORM encourages you to do this all over the place, by hiding logic in objects that themselves are instantiating other objects. Any code that encourages you to go row-by-row, fetching each row as you need it, and saving them one-by-one, is going to perform terribly in a process. If the act of saving a row causes the object to load more objects to obtain subsidiary logic, the situation rapidly deteriorates into exactly the code snippet above - or worse!
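The difference between the two styles can be sketched with a small counting experiment. This is only an illustration in Python with the standard-library sqlite3 module: the customers and orders tables are made up, and because an in-memory SQLite database has no network, the CountingConnection wrapper (a hypothetical helper) simply counts statements as a stand-in for round trips to a real server.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts executed statements as a proxy for round trips."""

    def __init__(self, conn):
        self.conn = conn
        self.round_trips = 0

    def execute(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params)


raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE customers (customer INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'zenith'), (3, 'apex');
    INSERT INTO orders (customer, amount) VALUES (1, 10), (1, 20), (2, 5), (3, 7);
""")


def totals_row_by_row(db):
    """One query for the list, then one more query per customer (N + 1 statements)."""
    totals = {}
    for cust_id, name in db.execute("SELECT customer, name FROM customers").fetchall():
        row = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (cust_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals


def totals_set_based(db):
    """The same result from a single statement that lets the server do the join."""
    rows = db.execute(
        """SELECT c.name, COALESCE(SUM(o.amount), 0)
             FROM customers c
             LEFT JOIN orders o ON o.customer = c.customer
            GROUP BY c.customer, c.name"""
    ).fetchall()
    return dict(rows)


loop_db = CountingConnection(raw)
set_db = CountingConnection(raw)
loop_totals = totals_row_by_row(loop_db)
set_totals = totals_set_based(set_db)
```

With three customers the loop issues four statements against one for the set-based version; the gap grows linearly with the number of rows, and it multiplies when loops are nested.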

On a personal note, I have to confess that I am continually amazed and flabbergasted when I see blog posts or hear conversations in user groups about popular CMS systems and web frameworks that will make dozens of database calls to refresh a single page. A seasoned database programmer simply cannot write such a system, because they have habits and practices that instinctively guard against such disasters. The only possible explanation for these systems is the overall newness of the web and the extreme ignorance of database basics on the part of the CMS and framework authors. One can only hope the situation improves.

Sidebar: Object IDs are Still Good

There are some people who, like myself, examine how ORM systems work and say, "no way, not in my code." Sometimes they also go to the point of refusing to use a unique numeric key on a table, which is called by some people an "Object ID" or OID for short.

But these columns are very useful for single-row operations, which tend to dominate in CRUD screens (but not in processes). It is a bad idea to use them as primary keys (see A Sane Approach To Choosing Primary Keys), but they work wonderfully in any and all single-row operations. They make it much easier to code updates and deletes.
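As a small illustration of why such a column helps, the sketch below keeps a natural primary key and adds a unique numeric column, here called skey (an assumed name), that generic single-row routines can use without knowing anything about each table's real key. It is written in Python with the standard-library sqlite3 module, and the function names are hypothetical.

```python
import sqlite3


def update_by_oid(conn, table, oid, changes):
    """Generic single-row UPDATE keyed on the numeric skey column.

    Table and column names must come from a trusted source such as a data
    dictionary; only the values are passed as bound parameters.
    """
    assignments = ", ".join(f"{col} = ?" for col in changes)
    sql = f"UPDATE {table} SET {assignments} WHERE skey = ?"
    with conn:
        return conn.execute(sql, [*changes.values(), oid]).rowcount


def delete_by_oid(conn, table, oid):
    """Generic single-row DELETE keyed on the numeric skey column."""
    with conn:
        return conn.execute(f"DELETE FROM {table} WHERE skey = ?", (oid,)).rowcount


conn = sqlite3.connect(":memory:")
# The natural key stays the primary key; skey is only a convenient row handle.
conn.execute("CREATE TABLE customers (customer TEXT PRIMARY KEY, name TEXT, skey INTEGER UNIQUE)")
conn.execute("INSERT INTO customers VALUES ('ACME', 'Acme Inc', 1)")

updated = update_by_oid(conn, "customers", 1, {"name": "Acme Incorporated"})
name_after = conn.execute("SELECT name FROM customers WHERE skey = 1").fetchone()[0]
deleted = delete_by_oid(conn, "customers", 1)
```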

Conclusions

The recurring theme of these essays is that you can write clean and efficient code if you know how databases work on their own terms. Huge amounts of application code can be swept away when you understand primary keys and foreign keys and begin to normalize your tables. The next step from there is knowing how to code queries, but sooner or later you have to grapple with the overall architecture. (Well supposedly you would do that first, but many of us seem to learn about architectural concerns only after we have coded long enough to recognize them).

A thorough knowledge of database behavior tends to lead a person away from ORM. First off, the two basic premises of ORM are factually incorrect: One, that there is some native incompatibility between databases and code, and two, that all the world must be handled in objects. These two misconceptions themselves might be excusable if they turned out to be harmless, but they are far from harmless. They promote a willful ignorance of actual wise database use, and in so doing are bound to generate methods that are inefficient at best and horrible at worst.

Overall, there are always simpler and better performing ways to do anything that ORM sets out to achieve.

Next Essay: Performance on Huge Inserts

Addendum June 19, 2008

After reading the comments on the blog over the last few days I have decided to put in this addendum rather than attempt to answer each comment independently. I have attempted to answer the objections in descending order of relevance.

The Example is Trivial or "Cheats"

This is a very compelling challenge to the article offered by bewhite and Michael Schuerig and it deserves a meaningful response. What I want to do is flesh out my approach and why I find it better than using ORM. While I do not expect this to lead to agreement, I hope that it answers their challenges.

  • My sphere of activity is business applications, where two dozen tables is trivial and the norm is for dozens or hundreds of tables.
  • When the table count grows beyond the trivial, many concerns come into play that do not appear at lower table counts.
  • I have found that a single unified description of the database works best for these situations, provided it can specify at very least schema, automations, constraints, and security. This is what I refer to as the data dictionary.
  • The first use of the data dictionary is to run a "builder" program that builds the database. This builder updates schemas, creates keys and indexes, and generates trigger code. The same builder is used for clean installs and upgrades.
  • The generated trigger code answers directly the challenges as to how non-trivial inserts are handled. Downstream effects are handled by the triggers, which were themselves generated out of the dictionary, and which implement security, automations, and constraints. No manual coding of SQL routines thank you very much.
  • All framework programs such as SQLX_Insert() read the dictionary and craft the trivial insert. The code does what you would expect, which is to check for type validity and truncate overlong values (or throw errors). But it does not need to know anything more than is required to generate an INSERT; all downstream activity occurs on the server.
  • The dictionary is further used to generate CRUD screens, using the definitions to do such things as gray out read-only fields, generate lookup widgets for foreign keys, and so forth. This generated code does not enforce these rules, the db server does that, it simply provides a convenient interface to the data.
  • A big consequence here is that there is no need for one-class-per-table, as most tables can be accessed by these 'free' CRUD screens.
  • That leaves special-purpose programs where 'free' CRUD screens don't reflect the work flow of the users. In a business app these usually come down to order entry, special inquiry screens and the lot. These can be programmed as purely UI elements that call the same simple SQLX_Insert() routines that the framework does, because the logic is handled on the server.
  • This approach is not so much about code reuse as code elimination. In particular, the philosophical goal is to put developer assets into data instead of code.
  • When this approach is taken to its full realization, you simply end up not needing ORM, it is an unnecessary layer of abstraction that contributes nothing to quality at any stage.
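A toy version of the dictionary-plus-builder idea described in the list above might look like the following. This is a sketch only, in Python with the standard-library sqlite3 module: the dictionary format, the build_statements name, and the single supported automation (one column computed as the product of two others) are all assumptions invented for the example, and a real builder such as the one in Andromeda does far more.

```python
import sqlite3

# Hypothetical dictionary: schema plus one "automation" per derived column.
DICTIONARY = {
    "orderlines": {
        "columns": {"sku": "TEXT", "qty": "INTEGER", "price": "REAL", "extended": "REAL"},
        # extended is maintained by the server as qty * price
        "automations": {"extended": ("qty", "price")},
    }
}


def build_statements(dictionary):
    """Generate CREATE TABLE and trigger DDL from the dictionary."""
    statements = []
    for table, spec in dictionary.items():
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in spec["columns"].items())
        statements.append(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        for target, (left, right) in spec.get("automations", {}).items():
            body = (
                f"UPDATE {table} SET {target} = NEW.{left} * NEW.{right} "
                f"WHERE rowid = NEW.rowid;"
            )
            statements.append(
                f"CREATE TRIGGER IF NOT EXISTS {table}_{target}_ins "
                f"AFTER INSERT ON {table} BEGIN {body} END"
            )
            statements.append(
                f"CREATE TRIGGER IF NOT EXISTS {table}_{target}_upd "
                f"AFTER UPDATE OF {left}, {right} ON {table} BEGIN {body} END"
            )
    return statements


def build(conn, dictionary):
    """Run the generated DDL; IF NOT EXISTS makes the run repeatable."""
    for stmt in build_statements(dictionary):
        conn.execute(stmt)


conn = sqlite3.connect(":memory:")
build(conn, DICTIONARY)

# The application issues only a trivial INSERT; the trigger fills in extended.
conn.execute("INSERT INTO orderlines (sku, qty, price) VALUES ('widget', 3, 2.5)")
extended_after_insert = conn.execute("SELECT extended FROM orderlines").fetchone()[0]
conn.execute("UPDATE orderlines SET qty = 4 WHERE sku = 'widget'")
extended_after_update = conn.execute("SELECT extended FROM orderlines").fetchone()[0]
```

The application code never computes the extended amount. It stays correct no matter which program writes the row, which is the sense in which the logic lives on the server.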

These ideas are implemented in my Andromeda framework. It is not the purpose of this blog to promote that framework, but it has been successfully used to produce the types of applications I describe on this blog. I make mention of it here for completeness.

So to conclude, both of these gentlemen are correct that the example says nothing about how the crucial SQLX_Insert() routine is coded, and I hope at least that this addendum fleshes this out and makes clear where it is different from ORM.

The Model Should Be Based On Classes

bewhite asks "Do you propose us to organize our applications in terms of tables and records instead of objects and classes?"

Yes. Well, maybe not you, but that's how I do it. I do not expect to reach agreement on this point, but here at least is why I do it this way:

  • My sphere of activity is business applications, things like accounting, ERP, medical management, job control, inventory, magazine distribution and so forth.
  • I have been doing business application programming for 15 years, but every program I have ever written (with a single recent exception) has replaced an existing application.
  • On every job I have been paid to migrate data, but the old program goes in the trash. Every program I have written will someday die, and every program written by every reader of this blog will someday die, but the data will be migrated again and again. (...and you may even be paid to re-deploy your own app on a new platform).
  • The data is so much more important than the code that it only makes sense to me to cast requirements in terms of data.
  • Once the data model is established, it is the job of the application and interface to give users convenient, accurate and safe access to their data.
  • While none of this precludes ORM per se, the dictionary-based approach described above allows me to write both procedural and OOP code and stay focused on what the customer is paying for: convenient, accurate and safe access.
  • The danger in casting needs in any other terms is that it places an architectural element above the highest customer need, which is suspect at best and just plain bad customer service at worst. We all love to write abstractions, but I much prefer the one that gets the job done correctly in the least time, rather than the one that, to me, appears to be most in fashion.

Old Fashioned Technologies

More than one comment said simply that triggers and other server-side technologies "went out". Since I was there and watched it happen I would contend that when the web exploded a new generation came along with different needs. In particular the need for content and document management caused people to question all of the conventional uses of the SQL databases, and like all programmers they are convinced their world is the only world and all of the world, ergo, triggers are history because I don't use them. Nevertheless, those of us who continue to write business applications continue to use the technologies that worked well then and only work better now.

Ken Does Not Like OOP

I love OOP, especially for user interfaces. I just don't think it should own the domain model, and I don't think that "trapping" business logic inside of classes gives nearly the same independence as a data dictionary does. I've tried it both ways and I'll stick with the dictionary.

Any Use of OO Code Implies ORM

A few comments said outright that if you are using OOP code then you are by definition mapping. Technically this is untrue if you understand the use of the term "map" as opposed to "interface". Mapping is the process of creating a one-to-one correspondence between items in one group (the code) to items in the other (the database). A non-ORM interface is one in which any code, procedural or OOP, passes SQL and handles data without requiring a one-to-one mapping of tables or rows to classes or functions. My apps are not ORM because I have no such requirement that there be a class for every table, and no such requirement that there be any specific code to handle a table.

Don't Talk about Procedural Being Faster

At least three comments blasted this contention. To put things in context, performance in a database application goes in two stages. First and absolutely most critical is to be extremely smart about reducing database reads, since they are thousands of times slower than in-memory operations. However, once that is done, there is no reason to ignore speed improvements that can be gained by optimizing the code itself. The commenters are correct that this gain is of a very low order, but I would stand by the statement after making this contextual addendum.

Thank You All

This was the most widely read piece in this series, definitely the most controversial. There will not likely be any other articles this controversial, as the main purpose of this essay was to provide regular readers with some background as to why they will not see ORM-based examples in future essays. Thanks for the comments!

Related Essays

Other philosophy essays are:

33 comments:

Max said...

Interesting to hear from someone challenging the conventional wisdom on ORM. However, I think you're being a little unfair on it.

Firstly, the performance cost of object dispatch compared to direct procedure invocation is absolutely tiny, especially compared to the delay you incur by making a database call! And if you really cared about tiny improvements in generated code like this, perhaps PHP is the wrong choice of language.

Secondly, you are absolutely right to call out excessive round trips as the major factor harming ORM performance. However, the major ORM frameworks that I know of have techniques for mitigating this e.g. multi criteria/detached criteria for Hibernate.

Essentially, just as you can write bad raw DB access code if you e.g. make a new query in every iteration of a loop it is possible to write bad ORM code. In both cases it is just a question of learning your tools.

Jayson said...

ORM is pushed with the assumption that good business logic programmers are bad SQL programmers. Unfortunately this line of thought also has ensured that many of the programmers neglect SQL programming completely.

Another interesting thing is that most of the issues in Web applications are due to incorrect use of ORM parameters (for example completely loading the table graph).

bob84123 said...

You make some good points about ORM; they should be used with some understanding, and I agree that in some circumstances (mainly very funky queries) they're not the best way to go. However, often the convenience benefits they provide outweigh their performance penalty (any abstraction layer on *anything* comes with a performance penalty).

With a good ORM you can execute raw SQL if you want to (so you can use ORM in general, and write SQL for that single query that it fumbles through).

Also, using an ORM doesn't mean your database knowledge is completely wasted. For example, good ORMs can run your nested loop example with a single query (e.g. ActiveRecord's ':include' argument and the Django ORM's 'select_related' function).

Lastly, if you're writing an object-oriented program that uses a relational database then you're doing Object-Relational Mapping, by definition. The only question is whether you're writing the "ORM" yourself or using a premade one, and the bigger your application gets the more closely it will resemble one of the premade ones. I'm not good enough to write ActiveRecord or Django's ORM as a subproject, so I use what's been written.

Paul Keeble said...

I agree that ORM is not the answer, that it is the problem sitting between the database and the application code. Alas you are also wrong in suggesting that the database should instead be accessed directly. As you have shown with your code examples there is significant overhead beyond the object model and that SQL requires the calling code to meet some nonsense design and strange 80's like conventions to use a database. Indeed their lack of scalability is one of the biggest problems with large websites today. Every application has to worry about the performance tweaking of the database, for almost all of its business logic, knowing full well that scalability is rubbish.

The right answer is a proper OODB where the semantics of data storage are updated for the object oriented world and designed to scale appropriately. It doesn't exist yet but there are a lot of projects trying to do better.

rholmes said...

There are so many misapprehensions in this article that it's difficult to know where to begin.

First, the term "impedance mismatch" was coined by programmers rather than the marketing department of any company. It refers to the *conceptual* mismatch between how entities are related to one another (primarily via associations and specialization/inheritance) in object-oriented systems vs relational systems. And it is a completely valid description of a very real phenomenon -- one which becomes especially obvious in a domain model with deep inheritance hierarchies.

Second, an ORM tool can perform an insert in only one line of code so, by your (false, arbitrary) metric involving lines of code required to perform a task, your "simple" approach should be discarded out of hand as it provides no additional value compared to an ORM.

Third, your assertion that business logic should be placed in triggers has the obvious and well-known downsides that 1) your business logic becomes tied to a specific database, often using a proprietary stored procedure language and 2) this approach virtually guarantees an anemic domain model as well as business logic that is split across your application code and your database, and therefore far more difficult to debug and maintain. Hopefully it goes without saying that an object-oriented language such as Java or C# is far better suited to expressing complex business logic and validation than PL/SQL or a similar stored procedure language.

Fourth, your example using iteration misses a number of key points. 1) In an ORM, the iterative approach may very well perform better than equivalent "raw" database operations since the ORM will take advantage of an in-memory object cache which, if properly tuned, will eliminate most individual object fetches. Most modern ORM's (all of them that I know of) also have tunable fetching strategies specifically designed to avoid the "N+1" problem to which you allude. These can be used with or without a cache and will cause the individual object retrievals to be batched for efficiency. 2) Most ORM's have specific support for batch operations (i.e. inserts, updates and deletes) so that your code can be written in an iterative form (often a good choice in an OO application) but the database operations will be submitted as a batch, whose size can be tuned for optimal efficiency. JDBC itself also has batching support which any ORM will take advantage of. 3) At least in the case of Hibernate, a good deal of the documentation describes how to use options 1&2 precisely to avoid inefficient, iterative database operations. So it is ignorant at best and misleading or disingenuous at worst to say that ORMs "encourage" this type of inefficient database usage.

A primary theme of modern, large-scale system architecture is to "background" the database so that it is used only for simple persistent storage and eliminated as a performance bottleneck. The advice you are giving here is simply a throwback to the widely discredited and mostly abandoned practices of rowset-oriented client/server architecture. There are new, exciting and far more powerful approaches to system design available these days. I suggest you look into them.

Anonymous said...

I come from an organisation where the DBAs have very similar thoughts about ORM as you do so I felt I should comment on your post.

If you have worked on an application which is of any decent size you would know that the sort of example in your post serves no purpose and frankly shows your lack of understanding on the problem that ORM attempts to solve.

You fail to address issues such as transaction handling, how changes in the db schema propagate across your application code, managing associations, writing of sql to achieve CRUD tasks, sql injection.. I could go on but I will let you educate yourself on ORM.

Your type of attitude is quite evident in other disciplines as well where it seems that everything should be handcoded. In the Java world (I would say programming but I don't think it is quite true yet), one of the most prevalent ideas is the reuse of code - if somebody else has written it previously, make reuse of it (commonly known as not reinventing the wheel). This is usually achieved through frameworks.

How this relates to ORM is that a lot of the tasks that do usually require some care are handled by code that has been tested by thousands of people around the world and is known to work. The benefits are huge - fewer bugs and extremely fast development. While performance might not be as fast as if you had hand written every query - this is usually heavily outweighed by the boost you get in productivity.

Most (all I know anyway) ORM frameworks will allow you to fall-back down to writing sql if you need - so the places in your application where speed is key - you can always hand-code this. Even at this level you are getting benefits from the ORM framework.

You mentioned native incompatibility between databases and code being incorrect. I fail to see how - that is why there are object dbs. Why then, with traditional relational databases, do we have to marshal back and forth between sql/objects? Why are there such patterns as Data Access Object?

bewhite said...

World is not black or white. It is grey. You are telling us about demerits of ORM by decreasing benefits of object-oriented programming. Sure OOP is not a silver bullet but can you solve problems that are solved by OOP in some other way? Do you propose us to organise our applications in the terms of tables and records instead of objects and classes?

Propose something to us, rather than just trolling current techniques.

By the way, your 5 lines of code is cheating example because you have to add lines of code inside self-made functions used in your code.

Reinier Zwitserloot said...

This post would be so much more powerful if you fixed the beginning bits.

I was already convinced before I read this article, so I kept reading. However, if I was an ORM fan, I would disregard your ramblings as idiotic the moment you start suggesting that 'objects' are slower than 'functions'. This is true, in the same sense that emptying a glass of water in the ocean raises the sea level. When a TCP/IP connection to the database server is involved, and, more often than not, a disk access, the 2 nanoseconds difference in using objects vs. functions is just not going to make a difference. In fact, 1 month of moore's law and it makes no difference.

Remove that entire argument, it's bullshit. Focus on the notion that anything but schoolbook apps don't just do CRUD operations but also have processes, and that basic ORM just isn't a good match to model processes.

I'd also like to add a nuance: Let's say there's some sort of Domain Specific Language where you can write down a process in mostly SQL, and during this process, you emit certain 'objects', which are really just database rows which certainly don't have to map to a table (can contain the result of a join, aggregate functions like SUM(), and just functions in the row data, such as '(publicProfile AND nickName is not null) AS isPublic' - then, in this same DSL, you write an on-the-fly class spec so that the actual code that needs to deal with this data can access them that way. Some sort of framework would dress this up, add cursor support, and all that other stuff.

Very nice for static languages, and still nice for dynamic languages because you can add business logic here as well, if that does it for you.

So far I don't really know of any libraries that are trying to go this way. SQLAlchemy for Python seems to be on the right track.

Onur Gümüş said...

"ORM cannot be justified in this case because it is slower (objects are slower than procedural code), more complicated (anything more than 5 lines loses), and therefore more error-prone, and worst of all, it cannot accomplish any more for our efforts than we have already."

This sentence doesn't make sense. What does "Objects are slower than procedural code" mean ? You mean object oriented programming ? If you are bashing OOP , then it means you are bashing entire world. OOP is not more error prone than procedural code. This is why we moved to OOP.

Three basic principles of OOP:
Encapsulation
Inheritance
Polymorphism.
These allow us to write better designed code

For performance, well, I have one word for that: "Developers are expensive, servers are not!"


"I will be demonstrating some of these techniques in future essays, but of course the best performing and safest method is to use triggers."

Using triggers is the worst practice ever. Triggers are written in SQL and bundled to the database. Triggers are unnecessarily powerful. The disadvantages are that they are written in SQL, which is hard to debug or read, and they are spread around your db, which is hard to follow. I worked in some company which uses triggers heavily and I really figured out what a maintenance nightmare is there. Your code is running and some joe comes up and adds a trigger to the db and everything messes up.

"foreach( $list_outer as $item_outer) {
    foreach( $list_inner as $item_inner) {
        ...some database operation
    }
}
The above example will perform terribly because it is executing round trips to the database server instead of working with sets."

This is completely incorrect. Most ORMs, such as Hibernate and NHibernate, provide reliable solutions to this "n+1 select" problem.
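
For readers unfamiliar with the term, here is a minimal sketch of what the "n+1 select" problem looks like, written against Python's standard sqlite3 module. It does not show how Hibernate or NHibernate address it; the customers/orders tables and both function names are invented for the illustration.

```python
import sqlite3

# Hypothetical schema and data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

def totals_row_by_row(conn):
    """One query for the customers, then one more query per customer."""
    statements = 0
    result = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    statements += 1
    for cust_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cust_id,),
        ).fetchone()
        statements += 1
        result[name] = row[0]
    return result, statements

def totals_set_based(conn):
    """The same answer from a single statement that works with sets."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    return dict(rows), 1

slow_result, slow_statements = totals_row_by_row(conn)
fast_result, fast_statements = totals_set_based(conn)
print(slow_statements, "statements vs", fast_statements)
```

With three customers the loop issues four statements where the join issues one. Against a networked server each of those statements is a round trip, so the gap grows with the number of rows.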

steph said...

I don't think it's necessarily the ORM solution that's causing the issue; rather, it's people's fundamental lack of understanding of databases.

For example (this isn't the best example, but I'm trying to show a difference in level of sophistication), there are many people who work with WordPress themes (or modify their own) who don't realize that most function calls to get data (assuming there's no caching) result in a database call. So most WordPress pages are generated from many, many separate database calls. And it's not just WordPress; the PHP world seems much more prone to this behavior than, say, the Java world.

The issue is that these people have no real understanding of database issues. It's not their fault, they don't have the training or expertise. The languages and systems they use remove the need for this knowledge. And in most cases, it doesn't matter because the traffic isn't significant. However in enterprise applications, this can make or break your system!!!

Anonymous said...

I'll agree that ORM isn't the final solution to this problem, but I think direct SQL went out with the '90s. These days, performance is often measured in developer time. Sure, if I'm working on something that's doing thousands of transactions per second, I may find value by doing SQL/sproc calls in my code, but the vast bulk of all programs will never see more than a few transactions a second.

When I make a database change, ORM helps me by breaking my compile. If I remove some column, sprocs will gladly wait until runtime to fail, but my ORM solution lets me know immediately that there's a problem. If things really get inefficient, it's a lot cheaper to buy an extra server than it is to fund a developer to comb through data access classes and sprocs every time a schema change hits.

ilan b said...

@Anonymous

I very much enjoyed reading your response and I agree with everything except the part about the possibility of switching out to another db vendor if you are using an orm solution. For so long now, I keep hearing about the importance of not being db vendor specific but I have yet to see a company switch out a db for a particular product line, and if it indeed does occur, it is a very rare event. (Whoops.. I stand corrected, there was one I remember about 5 years back :) )

I also have yet to work on a project where extensive tweaking didn't have to be done at the SQL level for a mostly ORM solution, including but not limited to stored procedures, triggers, events, constraints, etc. So I believe that most ORM projects are tied to a specific db vendor anyway, and switching one out for another would not be a trivial operation at all.

I use ORM extensively but I never report to the suits that an advantage of such a solution is that the db can be switched out for another on a moment's notice. This holy grail will only occur when all db vendors are Ansi99 compliant and that will probably happen when h-ll freezes over.. :)

Really enjoyable read from the OP and the responses, thank you kindly for posting on this subject

ilan berci

Rob Lambert said...

I am crossing my fingers for a mean-spirited-albeit-probably-correct response to this post from the Hibernate folks ... those are always good for a laugh and always brighten my day :)

Anonymous said...

ORM is just another tool in the box. I used Hibernate on a JEE project - the learning curve is high. However, once you get the hang of it, Hibernate can save quite a bit of time when the codebase gets large and a model object/database table needs to be refactored. With ORM, nothing needs to be modified. With standard SQL APIs, good luck searching through thousands of lines of code in order to modify each statement where table structures are hard coded into methods...

Michael Schuerig said...

Your example is trivial. In order to demonstrate something that even approaches being relevant, you'd need to show a page that displays and updates data from multiple tables where permissions depend on still further tables.

Doing all that in a single database call would improve your bragging rights considerably.

Assuming you can come up with the code, I'd like to read your explanation why that code is easily maintainable and embodies good practices such as separation of concerns.

Eduardo Miranda said...

I always appreciate reading unconventional arguments against common sense. They usually bring interesting new points of view.
But I believe there is a major flaw in your arguments: you are criticizing ORM, which is a technique/pattern focused on solving an OOP issue, namely the impedance between objects and relational databases. But your samples seem to be procedural programming. You actually state that “objects are slower than procedural code”.
But if you don’t use OOP you don’t have the impedance issue, so you shouldn’t care about this subject at all.
On the other hand, if you want to argue against OOP, then it’s a whole new ball game, and you need to talk about code reuse, extensibility, maintenance, etc. I’m not saying it’s impossible, but you should dig deeper.

Chris Nash said...

Many thanks (as ever) Ken for an interesting read, I'm following the series and enjoying every minute of it.

I think one thing that's overlooked is the title of the post - "Why (Ken) Does Not use ORM" - the counterpoint isn't whether ORM is good or bad. Rather, the opposite question is why a considerable number of software engineers do use it. Sadly, a large number of developers will make comments like "I'm a programmer, let a DBA worry about that kind of stuff", or "ORM means I can just write Java and forget about the database". Not all, but distressingly many.

Using ORM as an effort to avoid SQL, or "conventional database wisdom", unfortunately seems to be what attracts many developers to ORM frameworks in the first place. Competent OO developers can produce immaculate object models that translate into appalling data structures once the ORM gets a hold of them, and developers go out of their way to be naive about this.

Ken's articles have shown (and proved to me very quickly in practice) that starting with the database schema can lead to not only better code, but an overall better design - and doing so does in no way preclude using ORM. The only losing proposition is to treat ORM as some sort of magic bullet.

KenDowns said...

>Ilan B: I also have seen only one example of an existing system being ported to a different back-end, and yours truly had to work out how to do it. Not to beat a dead horse, but a data dictionary and a powerful builder is IMHO the best approach to true server independence.

Let's not even talk about keeping it independent of the relational/hierarchical models....
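
To make "a data dictionary and a builder" a little more concrete, here is a toy sketch in Python. It is not Ken's actual tool: the dictionary format, the abstract type names, the per-server type maps and the build_ddl function are all assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical data dictionary: table name -> list of (column, abstract type).
DICTIONARY = {
    "customers": [("id", "int_key"), ("name", "text"), ("balance", "money")],
}

# Hypothetical mapping from abstract types to each server's column types.
TYPE_MAPS = {
    "postgresql": {
        "int_key": "INTEGER PRIMARY KEY",
        "text": "VARCHAR(100)",
        "money": "NUMERIC(12,2)",
    },
    "sqlite": {
        "int_key": "INTEGER PRIMARY KEY",
        "text": "TEXT",
        "money": "NUMERIC",
    },
}

def build_ddl(dictionary, server):
    """Turn the server-neutral dictionary into CREATE TABLE statements
    for one specific server."""
    types = TYPE_MAPS[server]
    statements = []
    for table, columns in dictionary.items():
        cols = ", ".join(f"{name} {types[kind]}" for name, kind in columns)
        statements.append(f"CREATE TABLE {table} ({cols})")
    return statements

# The same dictionary yields different DDL per target server.
for server in TYPE_MAPS:
    print(server, build_ddl(DICTIONARY, server))

# The SQLite flavor can be applied directly to an in-memory database.
conn = sqlite3.connect(":memory:")
for statement in build_ddl(DICTIONARY, "sqlite"):
    conn.execute(statement)
```

The point of the design is that the application describes its tables once, in neutral terms, and only the builder knows about vendor differences. No object mapping layer is involved.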

KenDowns said...

>Chris: Thanks for the kind words, especially the not-so-subtle point that it is why Ken does not use ORM.

Your point about not treating ORM as a magic bullet is very well taken.

Dave B said...

Ken,

I was until very recently a member of the "You can pry the stored procedures from my cold, dead hands club."

I agree that a tool cannot make up for bad architecture or programmers. I am 100% with you that a good application design starts with a solid database design. I think you are missing the fact that one assumption that the ORM tool developers and proponents have is that a database professional such as yourself, and not some code monkey, has created an efficient data model for the ORM tool to point at. Without a proper data model the ORM tool is useless. The ORM tool is then meant to be used for CRUD and related table load operations instead of coding them by hand or using a code generation tool. This saves a ton of lines of code to debug and test, which speeds up development. This will allow us to concentrate on what the business wants and is paying us to do: deliver quality applications that serve their business needs as quickly as we can.

ORM tools may not work in every situation, but IMHO they should not be dismissed completely either.

sunru said...

Very much appreciate your post.
I also come directly from the SQL and DBA world, having spent many years with powerful databases such as PostgreSQL, understanding schema/table design, triggers and procedural languages.
However, I recently began work on a web framework and became utterly stumped by its use of an ORM (SQLAlchemy in this case).
It seems like a nice piece of software, but I couldn't for the life of me figure out why I need it.
I designed the database, the triggers, the functions, everything.
Some population of tables is done directly through functions (SECURITY DEFINER; direct access to the tables is not GRANTed).
As much practical functionality as possible is included in the database, with the possibility of a GUI frontend as well.
I build specific libraries (which, as you have shown, is not that complicated at all) that interface directly with the DB API (which many ORMs also utilize).
I understand ORMs. But perhaps they are only intended for people who don't understand good database design and SQL?

Jeremy D. Young said...

I'm a little behind on reaching this post, but I think one of the key areas not discussed in this post is Open Source development. Open Source projects are the ones that are driving ORM. Most Open Source projects need to be written in such a way that the person that downloads it and uses it in their enterprise can plug it into ANY database and have it function. It is an absolute necessity to unbind the DB from your application in this scenario. Someone may want to have a go at it with MySQL, other companies may already have an Oracle standard and nothing else can be installed. Yet other companies may require DB2, or worse, DB2/400. When writing software to be shared and distributed, you can't start at the tables.

Open Source requires database abstraction.

The truly successful enterprise is more and more learning to leverage Open Source and become more agile by being able to change directions quickly. How many companies are there out there that can choose the right Database layer now for use through a growing company for the next 10 years? If you accomplish that, it usually comes with an extremely high price tag involving vendor lock in.

KenDowns said...

@Jeremy: database agnosticism does not require ORM.

Kevin Clark said...

Guys, if I may suggest, head over to this article if you ever want to know why you should use ORM in the first place.

Cheers

Krishna said...

Ken,

While the post is titled appropriately, the content is full of very wrong generalisations about ORMs. I seriously doubt that you have actually used any modern ORM (e.g. Hibernate, NHibernate).

That said, some of the points about requiring proper upfront database design are correct.

Business logic though should never be in triggers. Let's use triggers the day we have automatic tools that let you manage business logic the way a development environment lets you manage OO code. Meaning build-time error checking and so on.

If one ignores that the article is heavily biased and makes some factually incorrect assumptions, then it is a good read.

KenDowns said...

Krishna: I use triggers because I wrote the tool.

Paul said...

Interesting article. I've been looking at Doctrine and Propel, and so far I'm trying really hard to justify their usage.

When thinking about websites/web applications I always tend to think in terms of the database design before the code (PHP/Perl). I am completely comfortable with the fact that I have to write SQL, procedures, views, etc. to support any processes.

At this point I am unconvinced that ORM can support my processes. It's certainly not clear, and at best a situational decision.

Marcello said...

I'm not a Transact/PL/any other database SQL dialect purist, but I also don't like the way ORMs do things now.
Unfortunately, everybody who talks about the subject tends to give the worst examples of the technology they dislike, both the ORM fans and the database fans.
I'm a Delphi programmer now mastering C#. I like the concept of an application server, basically the MVC model. All my business rules are in the application server, and although the business rules are all in the "middle tier", I also do client validation (the rules are sent to the client with the dataset).
I don't have so much plumbing and all that blah, blah, blah that I see ORM purists talking about, all of them giving the same example of storing everything in the database and not in the application server.
Of course this does not solve the "impedance" thing they always talk about, but hell is not as hot as they always try to make it look.
On the other hand, everybody who hates ORMs always gives the worst examples. Of course the tools are not so bad; they "mitigate" problems.
But for me ORM tools are just not good enough. For me, software development is:
1-Trust: I don't completely trust machine-developed code that is not deterministic.
I don't believe a bank will ever use ORM. They still have COBOL code written in 1972, and every change, made by a human, is tested, tested, tested, tested before going to production; no code will ever be generated at run time (I may be wrong about this subject).

2-User productivity: I think there are probably no effects on the user interface that affect this comparison.

3-Run time performance: If ORM does not generate SQL statements that are better, or at least equal, in terms of performance, it's nothing but garbage.

4-Ease of development: I mean, ease of development comes kilometers after trust, run time performance and user productivity.

I have never seen a really state-of-the-art database application made with ORM. I don't like to use tools or frameworks of the future; if I see such a one in the present, I'll make the change.
Excuse me for my very poor English (I expect your Portuguese to be better :>}

Anonymous said...

When I make a database change, ORM helps me by breaking my compile. If I remove some column, sprocs will gladly wait until runtime to fail, but my ORM solution lets me know immediately that there's a problem.
------
What nonsense, actually. The DBMS itself or a helper tool can easily check whether any stored procedure depends on the column and show a warning on the attempt ...

Use the right tools. :)
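
As a rough illustration of that kind of check, here is a sketch using Python's standard sqlite3 module. SQLite has no stored procedures, so the sketch keeps a hypothetical catalog of named SQL statements and asks the engine to compile each one with EXPLAIN against a given schema. Real servers expose dependency information in their own ways; the STATEMENTS catalog and the find_broken helper are assumptions for this example.

```python
import sqlite3

# Hypothetical catalog of the SQL the application relies on.
STATEMENTS = {
    "list_names": "SELECT id, name FROM customers",
    "list_emails": "SELECT id, email FROM customers",
}

def find_broken(conn, statements):
    """Return {name: error message} for every statement that no longer
    compiles against the schema behind conn. EXPLAIN makes SQLite prepare
    the statement without running the query itself."""
    broken = {}
    for name, sql in statements.items():
        try:
            conn.execute("EXPLAIN " + sql).fetchall()
        except sqlite3.OperationalError as exc:
            broken[name] = str(exc)
    return broken

# Current schema: every statement compiles.
current = sqlite3.connect(":memory:")
current.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
before = find_broken(current, STATEMENTS)

# Proposed schema with the email column removed: the check flags the
# dependent statement before any user hits it at runtime.
proposed = sqlite3.connect(":memory:")
proposed.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
after = find_broken(proposed, STATEMENTS)

print(before)
print(sorted(after))
```

Run as part of a build or deployment step, a check like this gives plain SQL a "fails early" property similar to the one claimed for the ORM in the quoted comment.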

Anonymous said...

Time has proven you and your crappy radicore wrong.

Anonymous said...

@Anonymous above: This article is still valid and still stands by its point. I bet you're an array maniac who rarely joins more than 2 tables and never uses WITH ROLLUP.

Read this if you have time, it was written in 2011:
http://n0tw0rthy.wordpress.com/2011/05/28/orms-hidden-cost/

Time has proven that every developer must understand the database to use ORM properly, that ORM can be replaced with a traditional DAL for the model, and that your comment is crap.

NWest said...

Love this post.

Developers love their applications, but at the end of the day, the data will be around long after the "next big thing". Tom Kyte has a great view on this. http://tkyte.blogspot.com/2005/09/most-incredible-statement-i-heard-this.html

Nearly all business data is *relational*. We ask our database all sorts of questions, not just the questions that this specific application needs.

Shalin Siriwaradhana said...

Not much of a relevant question here, but could you please name some database diagram software to use for database diagramming?