Monday, June 9, 2008

Using a Data Dictionary

Database applications can be made much simpler if you maintain a body of data that describes your tables. Every single programming task in a database application needs to know something about the tables it is working with, so every program in a framework and many programs in an application can benefit from a central store of information about the database.

Introducing the Data Dictionary

The term "data dictionary" is used by many, including myself, to denote a separate set of tables that describes the application tables. The Data Dictionary contains such information as column names, types, and sizes, but also descriptive information such as titles, captions, primary keys, foreign keys, and hints to the user interface about how to display the field.

A super-simple beginning Data Dictionary might look like this:

students | student | int  |  -    |  -    | Y  | IDENT  | Student ID
students | firstnm | char | 20    |  -    |    |        | First Name
students | lastnm  | char | 20    |  -    |    |        | Last Name
students | city    | char | 20    |  -    |    |        | City
students | gpa     | num  |  2    |  1    |    |        | Grade Point Avg

* Precision: Needed for chars, varchars, and numerics
** automation: to be described more below

The First Question: Where to Put It

It might seem that the first question about a data dictionary would be "What do we put in it?" We will get to this question in a moment, but the most important question is usually, "Where do we put it?" By this I mean do you put into an XML file, or do you encode it into classes in program code? Or is it somehow directly entered into database tables? There are of course many opinions on this.

The first opinion is that you do not really need a data dictionary per se because you can get this information from the database server (if this is news to you, google "information_schema" and your favorite database). There are two major drawbacks when you depend on the server. First, you cannot build a database out of your data dictionary, because of course you don't have one until the database is built. The second drawback is much worse, which is that you cannot put any extended properties into the dictionary, and without those the dictionary is of little use.

Another method is to put the dictionary into classes, using the typical one-class-per-table approach and storing the data as properties of the class. There are multiple drawbacks to this approach:

  • Data that is "trapped" in code is very difficult to deal with efficiently, so many operations with the dictionary will be much harder to code and will be slower.
  • The dictionary is spread out into many files.
  • Ironically, a good data dictionary allows you to generate most CRUD forms, and so you don't need those one-class-per-table files filling up your directory. It seems silly to make a class that contains data that makes the class unnecessary.

The option I prefer is a plaintext file, which can be generated by a GUI or typed in by hand. The only requirement for the file is that it be easy to type and read, and easy to parse by a program. These requirements are well met by two formats: YAML and JSON (XML is a bletcherous horror to work with manually, so it is disqualified before the race starts). Both YAML and JSON enjoy parsers and writers in nearly all popular languages, so if you create your data dictionary in one of those you have a file that is human readable and writable, machine readable and writable, useful in nearly every language on every platform, easily placed in source control, and very flexible in what it can represent.

First Basic Use: Building and Upgrading Databases

A data dictionary is a wonderful way to build and upgrade databases. It is a real thrill to write your first SQL generator that scans a table of column definitions, writes out a CREATE TABLE statement, and executes it. The next step in the evolution of this process is to have the program query the INFORMATION_SCHEMA of the database, then work out which columns have been added to your data dictionary that are not in the table, and then upgrade the table with an ALTER TABLE statement.

This basic approach can easily be extended to include indexes and keys.

Sidebar: Data Dictionary Versus Upgrade Scripts

Many programmers use upgrade scripts to alter their schemas. The idea is that programmer Sax Russell adds a feature. Along with his code he writes a script that makes the necessary alterations to the database. Then comes Ann Clayborne who does the same thing, followed by Hiroko Ai, and then Sax again. When a customer upgrades their system, they run all four scripts in order. This can lead to horrible upgrade experiences in cases where multiple scripts are upgrading a large table several times. A data dictionary is far superior because it can simply examine the tables as they are, examine the data dictionary, work out a diff, and execute the smallest set of commands to bring the database current. This approach does require of course that the data dictionary be in source control like any other application file.

Second Basic Use: HTML Code Generation

A rich data dictionary can provide you with everything you need to provide "free" CRUD screens for most of your tables. Now to be sure, there are always those tasks that require special screens for the users, but there is just no reason to sit down and code up a screen to managae a table of customer types, zip codes, or even a general ledger chart of accounts. Here is an example of an expanded data dictionary in YAML format that contains enough information to generate screens with zero application code:

table customer_types:
    # Use this to generate menus!
    module: sales
    # Use this for the menu and the page title
    description: Customer Types
    column customer_type:
        type_id: char
        column_precision: 10
        # This is crucial for code generation
        description: Customer Type
        primary_key: "Y"
    column description:
        type_id: char
        column_precision: 40
        description: Description

The use of most of those properties should be pretty obvious, but I would like to point out one particular clever trick you can do. The "primary_key" flag can be used to enable a column during insert mode (if it is a user-entered key), and then to gray out the column in edit mode. When you consider this basic example it starts to become clear that nearly all of the code in most CRUD screens can be reduced to a few routines that read the data dictionary and generate HTML.

I would like to repeat that I do not mean to say that every single CRUD form in an application will work this way. In my experience 9 out of 10 tables can use "free" generated forms, but about 1 in 10 are used so often by users that you end up making special forms with shortcuts and wizards to speed up their work. The dictionary can help you there if you use it to generate the inputs, but it is no use trying to expand the dictionary to cover every conceivable case, there is always one more that the next customer needs that would just plain go faster if you coded it up by hand.

Third Basic Use: Trigger Generation

Generating triggers is a very powerful thing you can do with a data dictionary. We saw last week that the most complete encapsulation of code and data occurs when a trigger is placed on a table. Imagine you had a data dictionary that looked like this:

table orderlines:
    column extended_price:
        type_id: numeric
        column_precision: 10
        column_scale: 2
        calculate: @price * @qty

That magic little "calculate" value can be used to generate a trigger and put it onto the table. The trigger code might look something like this (The SQL version is PostgreSQL):

    -- direct assignments are an error
    if new.extended_price is not null then
        raise error 'Direct assignment forbidden: extended_price';
        new.extended_price = new.price * new.qty;
    end if;

It is not my purpose here to explain how to generate that trigger, but simply to suggest that it is a very doable thing. Because I am not tracing out the steps, I do need to explain that I slipped those "@" signs into the "calculate" value so that when the trigger is built the builder program could detect column names and put the appropriate "old." and "new." prefixes in front of them.

Fourth Basic Use: SQL Access

A database application is full of code that reads and writes to tables. All frameworks and even very modest websites end up with a handful of routines that handle the mundane tasks of sending inserts and updates to the tables.

The Data Dictionary is the perfect resource for these routines. It can be used to properly format data, put quotes where they belong, clip overlong values (or throw an error), prevent the user from changing a primary key, and many other things.

Fifth Basic Use: Documentation

If you have a "rich" data dictionary, one that contains lots of extended properties that describe everything about columns and tables, and if you build your database and generate your forms out of that dictionary, then the next and final natural step is to generate technical documentation out of it as well.

In a first pass, technical documentation is limited to simply displaying the columns that go into a table, which admittedly does not mean much even to technical users. But once you get past that basic task you can begin to layer on lists of parent and child tables (as links of course), descriptions of formulas, and so on.

In a similar vein, Tooltip descriptions go well in a data dictionary.

Conclusion: True Synchronization

The holy grail of database programming is synchronization of code and data. Sychronization means a lot more than simply making sure you delivered the right code and upgraded the tables. Complete synchronization means that the framework is intrinsically incapable of mistakenly accessing missing columns or failing to consider important columns. At the framework level, if you make use of a data dictionary as described above then your framework will be organically synchronized, which is to say that the synchronization is built into the code itself.

Next Essay: Why I Do Not Use ORM


jezemine said...

Good post. There are also several tools on the market that will generate data dictionaries for you. I wrote one called SqlSpec that works against every major DBMS on the market today. You can check it out here:

Anonymous said...

I was the developer of the Recital database and 4GL product way back in 1988. This was the first product (that i'm aware of) that sported a data dictionary. In fact i believe i coined this term. It is good to see that this functionality is still deemed as being important in modern day databases. Thank you for this article.

KenDowns said...

Barry: small world, the first database I used was Recital on a MicroVax. I found the company very helpful and they had a nice product. I have also heard the term data dictionary used by A. Neil Pappalardo of Meditech as early as 93 or so, and by a Foxpro framework called Visual Promatrix in the 90s.

amihay gonen said...

Hi , very nice article.

Here is some of my comments :

1. I think sql translator "" can be use as base code for thi
s builder.

I saw "androdma" os based on PHP (i think perl is more server sided)

2. The builder cannot cover all the senrio , for example :
delete from some table
or add FK while there need to so some primary key values.

KenDowns said...


Your link brought me to a page not found, are there any missing letters?

Perl is more server-side than PHP? Sorry no, they are both server-side languages. I like many find Perl incomprehensible after it is written and prefer the readability of PHP.

The builder cannot handle a delete? What do you mean?

If I understand your point about foreign keys, Andromeda handles that by setting the "auto_insert" flag on a foreign key. This causes the database to insert a value in a parent table if it is not present. Extremely useful when used properly, I make great use of it in my own projects.

KenDowns said...

Amihay: I got the link to work. The project you refer to appears to be about schemas only: tables and columns. While the features look very robust for that sphere, my OP (and my Andromeda framework) are a for more advanced approach to the entire picture of security and automations as well.

Anonymous said...

Interesting article. Incidently, I used data dictionaries some while ago in one of my PHP projects. The difference was that I used PHP arrays to describe them, and that these arrays where defined inside the classes that operated on these tables - not in an ORM way, but because of modularization, i.e. the main class of a module defined one or more tables and on instantiation made sure that they exist and are up to date. This is essentially object orientation: the data is defined where it is used.

A problem with this approach (data dictionaries in general) vs. update scripts is that it is impossible to distinguish a renamed column from deleting one column and creating a number. This means that during the update, data is lost. Do you have any good solution to that? I think the most comfortable way would be to auto-generate some none-semantic IDs for each table and column (so that you have no intent to rename them, just like you would only rename a username, but not a numeric ID) and storing this ID somewhere in the database as meta information.

table foo:
id: 1
column one:
id: 2
column two:
id: 3

And later:
table bar:
id: 1
column three:
id: 2
column four:
id: 4
In this example it is clear that table foo was renamed to bar and column one was renamed to three, but column two was removed and column four was added.

What do you think of this? Have you run into other problems with the dictionary approach? How do you handle the potential need to update the actual data along with a schema update (for example when changing a default value)?

KenDowns said...

Lemming: you ask several significant questions, let me try to answer in no particular order.

When it comes to moving data around during an upgrade, such as populating a new table from an old one, I still use scripts. The reason is that this operation only happens when I discover a design flaw in the tables. Each case is always an exception and there is nothing to be gained by trying to work out general-purpose tricks to handle them. But that being said, the Andromeda framework itself is 4 years old and has exactly 3 scripts. A medical office program I wrote is 2 years old and has 4, though I make changes to it every week or two.

My dictionary does support specifying content. For instance, there is a set of tables that is common to all my apps, which I ignore when I don't need them. One of them is a table of timezones. The spec also includes the values for the table. The first time a database is built the table is populated.

As for renaming columns, all I can say is that I do not do that. Once I create a column 'email' as varchar it stays varchar, though somebody always seems to need it to be 5 more characters, so widening columns is a necessary feature. My dictionary defines columns outside of tables, and then *places* them in tables. This means the same column in two tables will always have the same type, precision, and scale. Andromeda will let me put prefixes and suffixes onto them, so I can have "phone_home" and "phone_cell", which are both "instances" of the "phone" column.

With respect to keeping an ID for the columns, I do not do that. I am a heretic. The unique identifier for the "address" column is "address." Adding an ID in my experience is more work with no benefit. The best example I can offer is a query I had to help a customer write that compared values across years. I innocently wrote "WHERE table1.year > table2.year" and he said, "oh, watch out, year is not a year, its an ID to a table that lists.... you got it, years! WTF!!! For Pete's sake, an ID column to identify an integer value? When will this madness end? Why not friggin put in the G**D**M year! Well, I seem to have gotten a little worked up...

I am not sure what you mean by "data is lost." My own builder is non-destructive, if you remove a column from the spec it does not remove it from the table, it just ignores it from then on.

Here is something you may find interesting. My framework defines appx 40 columns that can be used in any database (,
and a typical project will use about a dozen of them over and over with prefixes and suffixes. So an order has "amt_tax", "amt_lines", "amt_final" and so forth. After doing this for four years now it feels so natural I really don't know how to talk about dropping and renaming columns any more.

To another of your points, I do not define the data "where it is used" because its definition is independent of whatever use I have in mind when I first see it. This leads to another statement you make... say "on instantiation made sure that they exist and are up to date". This I most definitely do not do. I write complete applications of dozens of tables and one of the most basic operation is to validate the entire spec as itself being valid before making any changes to a customer's database. This is why the spec itself is a self-contained complete description of the entire database. The builder runs through it and valiates it, then validates that the transition from current to new is itself valid. Only when both of requirements are met will it begin to make structure changes.

Another reason not to build or upgrade a table on instantiation is that the entire build process is fragmented and depends on user interaction. You could improve it to ripple through to things like updating or creating parent tables also, but then why not just upgrade the database in one shot and be done with it?

Anonymous said...

I have begun putting all SQL server metadata on my website. It lists all the information_schema views and the underlying code behind it.

Anonymous said...

This is going to be a free document generator check this sample

Few more days for the release

DEddy said...

Barry -

I'd guess "data dictionary" was coined at least in the UK in the mid-late 1960s.

By early 1970s in US the debate was over "active" vs "passive."

Keeping track of data structures (an SQL table is just another type of data structure, no different than a flat file) is just one of the many functions of a full blown data dictionary.

There are hundreds of data structure types, of which SQL is simply one.

The term "data dictionary" went out of fashion in the late 1980s & "metadata repository" was swapped into the same space.

The genre has essentially ceased to exist in the marketplace other than in handwaving PowerPoint presentations.

Anonymous said...

Regarding your comment that "you cannot put any extended properties" into a data dictionary when building it from the server, I think it's important to point out that on Microsoft SQL, this isn't entirely correct. In MS SQL 2000 and later, you can use sp_addExtendedProperty to add properties to a database. Once this data is in the database, accessing it to create a data dictionary is pretty simple.

Your first point is entirely correct, however - this data doesn't help you unless you already have the database. The other, unstated, advantage of using the database to create a data dictionary is in the situation of taking over a poorly documented database. While not all properties may be filled out, the use of a script-generated data dictionary may allow the DBA to quickly gather information about a database.

On, there is a script available that creates one additional table to store data dictionary information, as well as multiple stored procedures used to populate it. The script is specifically for Microsoft SQL servers, and can be seen and downloaded at .

Jay said...

Many dittos. Two quibbles:

You say that using the database engine as a data dictionary won't work because then the data dictionary can't exist until you create the database. True, but so what? I can create tables with SQL as easily as I can type up a data dictionary. I don't have to actually populate the database: I can just define the tables. I've done this on a number of projects.

Now when you point out that there are limits to what metadata you can specify -- that's a more telling point. Many DB engines give me no way to specify a description or domain information.

Also, I'm surprised by your statement that 9/10 of screens can be automatically generated from a dictionary. Perhaps you mean 9/10 of the plain vanilla "update a record" screens. Any app I've worked on, perhaps half the screens can be generated. The rest combine data from multiple tables or are otherwise complex.

(The app I'm working on these days has zero generated screens, probably because the original developers never heard of that idea. Nor had they heard of other new-fangled ideas, like normalized databases. But that's another story.)

Probal DasGupta said...

I still use a data dictionary as a knowledge base that is a super-set of all databases. It is a list of unique data items in the company, but a sub-set of what might be called a Business Terminology document for the company.

Steve Ginal said...

Check out dbmaster (dbmaster. It is a great tool what guys built to pull information schema from a database and you can any data dictionary fields for tables and columns and then quickly search and generate triggers, documentation in any format.

Steve Ginal said...

Correct url for the dbmaster is You should check it out if you're working with data dictionary.

Tekk said...

We were using the term data dictionary back in about 1982 - when developing databases using SSADM - which started up in about 1980... So I think the term was coined well before 1988

Unknown said...

im just amateur about this.. where could i find data dictionary? is it on a xampp that i created a databaser?

Anonymous said...

SQL Data Dictionary

Dataedo said...

Have a look at this Data Dictionary generator - Dataedo. It enables you to describe any element of your data model and generate nicely formatted documenation to HTML, PDF and Excel.

Digvijay Chaudhary said...

It is really nice and informative post. For more about data dictionary you can visit here..Explanation of Data Dictionaries with example

sdfghj said...

it was really nice and clean explanation and easy to undersatand meropaper

Define said...

Recommend you to try dbForge Documenter for MySQL at for automatically generated MySQL documentation in HTML, PDF, and Markdown file formats

Unknown said...

How to understand if PHP Search isn't working with PostgreSQL Database? Contact to Postgres SQL Support for Windows
On the off chance that any of the clients attempting to make look bar for site which questions a PostgreSQL database however in the meantime you found that the pursuit work isn't working. One thing more you will see here, you won't get any sort of blunder on the off chance that you physically enter the question. For this question het our propel client benefit i.e. Postgres SQL Support for Linux or PostgreSQL Relational Database Service or PostgreSQL Remote Database Service which viably empowering you to boost your execution and whole Postgres database.
For More Info:
Contact Number: 1-800-450-8670
Email Address-
Company’s Address- 507 Copper Square Drive Bethel Connecticut (USA) 06801

nisha said...

I really appreciate your article. the post has excellent tips which are useful. Nice post. Thanks for sharing! I want people to know just how good this information is in your article.
Email Database USA
Email Marketing Database
Buy Email Database
USA Business Database
World wide Email Database

Repairtech Solutions said...

It’s nearly impossible to find well-informed people for this subject, however, you seem like you know what you’re talking about! Thanks onsite mobile repair bangalore You ought to take part in a contest for one of the greatest blogs on the net. I'm going to recommend this website! asus display repair bangalore A fascinating discussion is worth comment. I think that you should write more about this subject matter, it might not be a taboo subject but generally people don't talk about such topics. To the next! Kind regards!! huawei display repair bangalore

Repairtech Solutions said...

This is the perfect webpage for anybody who hopes to find out about this topic. You know so much its almost tough to argue with you (not that I really would want to…HaHa). You definitely put a new spin on a topic that's been discussed for many years. Wonderful stuff, just excellent! online laptop repair center bangalore Aw, this was an incredibly good post. Taking the time and actual effort to create a top notch article… but what can I say… I hesitate a whole lot and never manage to get anything done. dell repair center bangalore

Repairtech Solutions said...

I blog often and I genuinely thank you for your information. This article has really peaked my interest. I am going to take a note of your blog and keep checking for new information about once per week. I subscribed to your RSS feed as well. macbook repair center bangalore Pretty! This was an incredibly wonderful article. Many thanks for providing these details. acer repair center bangalore

lavanya said...

I like the helpful info you provide in your articles. I’ll bookmark your weblog and check again here regularly. I am quite sure I will learn much new stuff right here! Good luck for the next!I really appreciate your article. the post has excellent tips which are useful. Nice post. Thanks for sharing! I want people to know just how good this information is in your article.
Java training in Chennai

Java Online training in Chennai

Java Course in Chennai

Best JAVA Training Institutes in Chennai

Java training in Bangalore

Java training in Hyderabad

Java Training in Coimbatore

Java Training

Java Online Training

surya said...

I really appreciate your article. the post has excellent tips which are useful. Nice post. Thanks for sharing! I want people to know just how good this information is in your article.

angular js training in chennai

angular training in chennai

angular js online training in chennai

angular js training in bangalore

angular js training in hyderabad

angular js training in coimbatore

angular js training

angular js online training

radhika said...

This is excellent information. It is amazing and wonderful to visit your site.Thanks for sharing this information,this is useful to me...

AWS training in Chennai

AWS Online Training in Chennai

AWS training in Bangalore

AWS training in Hyderabad

AWS training in Coimbatore

AWS training

Revathi said...

Thanks for the sharing this informative and interesting post with us, I really admire your post as its it's really very helpful for my research.keep up!!

android training in chennai

android online training in chennai

android training in bangalore

android training in hyderabad

android Training in coimbatore

android training

android online training

vivekvedha said...

Very amazing and interesting post, thank you for sharing
acte reviews

acte velachery reviews

acte tambaram reviews

acte anna nagar reviews

acte porur reviews

acte omr reviews

acte chennai reviews

acte student reviews

sanjay said...

Excellent post...
Cyber Security Training Course in Chennai | Certification | Cyber Security Online Training Course | Ethical Hacking Training Course in Chennai | Certification | Ethical Hacking Online Training Course |
CCNA Training Course in Chennai | Certification | CCNA Online Training Course | RPA Robotic Process Automation Training Course in Chennai | Certification | RPA Training Course Chennai | SEO Training in Chennai | Certification | SEO Online Training Course

zayn said...

Benefits of cold pressed juice

zayn said...

OOH Ads with keywits

zayn said...

Diet and exercise for healthy body and mind

zayn said...

How to grow yopur business on Quora

Unknown said...

school management app
school management software
school erp software
classroom management software
online school management system

Musheer said...

Thanks for sharing this

Zayn said...

Thanks for sharing

Realtime Experts said...

I have read your blog its very attractive and impressive
Database Developer Training in Bangalore

janvi said...

Thankyou so much

Unknown said...

This is the exact information I am been searching for, Thanks for sharing the required infos with the clear update and required points. To appreciate this I like to share some useful information

Database Developer training in bangalore

Subhasree said...

Excellent post! Your post is very useful and I felt quite interesting reading it. Expecting more post like this. Thanks for posting such a good post. laptop service in home. To service your laptop with offer prices, Please visit : Laptop service center in Navalur

Jon Hendo said...

Less than a quarter of events organizers have ever hosted a hybrid event so if we’re looking at hybrid through a broadcast lens you’d benefit from hiring an Executive Producer. follow up email after networking event sample and subject line for thank you email

Anand Shankar said...

Very good blog post. I love all the articles posted here.
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge
one funnel away challenge

vivikhapnoi said...

This is one of the most incredible blogs Ive read in a very long time.
đại sứ quán việt nam tại nga

thông tin chuyến bay từ canada về việt nam

chuyến bay từ frankfurt đến hà nội

Vé máy bay giá rẻ tu itali ve Viet Nam

Bảng giá vé máy bay Vietjet Air tu Ha Lan ve Viet Nam

Cách săn vé máy bay giá rẻ tu New Zealand ve Viet Nam

Soureessay said...

Great work having complete researched material. If you are seeking assignment help services, you can contact team of assignment writers of anytime. phd thesis croydon

fullmetal said...

Hey friend, it is very well written article, thank you for the valuable and useful information you provide in this post. Keep up the good work! FYI, Pet Care adda
Credit card processing, wimpy kid books free
,science a blessing or curse essay

Sruthi Karan said...

Very interesting post and I want more updates from your blog. Thanks for your great efforts...
Separation Before Divorce
Cost of Legal Separation VS Divorce

Sruthi Karan said...

Your blog was really good and I gain good knowledge from your post. Good work!
Fairfax Divorce Lawyers
Fairfax Divorce Attorney

Vidhyamenon said...

Useful blog, keep sharing with us.

Xamarin Training in Chennai
Xamarin Online Training

Nila dharshan said...

Great blog, it is very impressive.

Blockchain Course in Chennai
Blockchain Training in Chennai

Deepika said...

Thank you for this blog. Share more like this.

AWS Training in Chennai
AWS Online Training
AWS Training in Bangalore
AWS Course in Coimbatore

Hashini said...

This blog is really nice! I learn more knowledge from this post and Keep sharing with us.
pvc foam board manufacturers in kerala
pvc foam board manufacturers kerala

Divya said...

great blog. keep sharing more.
Artificial Intelligence Training in Chennai
AI Training In Bangalore
Artificial Intelligence Training in Coimbatore

Rosy S said...

Great blog with good content and thanks for sharing
Also Visit:
DevOps Training in Chennai |
DevOps Online Course |
DevOps Training in Coimbatore

OnlineReviews said...

Nice informative content. Thanks for sharing the valuable information.RR Technosoft is the Digital marekting comapny in hyderabad and it provides Class room & Online Training by real time faculty with course material and Lab Facility.

kosmiktechnologies said...

Nice Article Very glad to read your Article

aws training in hyderabad

Best Fashion photographer in india said...

Thanks for sharing .. Portfolio photographers in india

Irich photography said...

Awesome content