Sunday, September 28, 2008

The Quest for the Absolute

This is the Database Programmer blog, for anybody who wants practical advice on database use.

There are links to other essays at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Today I am taking a huge detour from technical matters to lay out the philosophical groundwork behind this blog. The ideas presented today lie beneath every essay on this site. It is easy to observe that people seem driven to formulate absolute truths to guide their pursuits. Programming is no different, programmers are driven to find the absolutes that will universally guide their efforts. Those absolutes are not that hard to find, if you know the method for seeking them out. Fortunately, we have hundreds and thousands of years of human efforts, both successes and failures, to draw upon when embarking upon the task.

Absolutes in the Post-Modern Age

Academics refer to our current stage of history as the "Post-Modern" age. Thinking in the post-modern age is dominated by a deep mistrust of the very concept of absolute truth. Many thinkers have noted that in the post-modern age the only absolute is that there are no absolutes. Now, anybody who has not bothered to read much past what they are handed likely believes much of this without even thinking about it, they may not know that in the history of the human race such thinking is less than 60 years old.

But that "no absolutes" stuff is all nonsense at best and downright cowardice at worst. If you want an example of an absolute truth, try stepping off the edge of a cliff: even if you do not believe in gravity, gravity believes in you. It is an absolute truth for me that if I do not take care of my customers my life becomes unpleasant. It is a further absolute truth for me that I constantly obvserve programmers proclaiming absolutes (always use relational, always use OO, etc). When I stop observing it, then I suppose it won't be an absolute anymore (and I suppose then it never was?)

So let us now cheerfully ignore the wailing of those who cry that there are no absolutes, and ask if we might discover some elements of software development strategy that hold true always (ok, maybe mostly always) for the context of database application development.

Aristotle and Virtue

Nowadays nobody has to read philosophy much anymore, at least not where I live (in the United States), so most programmers have never heard of a man named Aristotle, who lived about 2500 years ago. This is a shame, because Aristotle had a logical way of thinking about things that would warm the heart of any programmer.

One of Aristotle's major contributions to civilization was his formulation of what philosophers call "virtue". Philosophers use the term in a technical sense, and they do not use "virtuous" to mean "nice" or "pleasant" or "good-natured." To a philosopher (or at least those that taught me) something is virtuous in Aristotelean terms if if performs its function well. The standard classroom example is that a virtuous table serves the function of a table, and a virtuous table maker is somebody who makes good tables.

This is a very useful concept for programmers. If we want to speak of a "virtuous" program, we mean simply one that meets its goals. This takes the whole high-minded theory and philosophy stuff back to real down-to-earth terms. (This is why I always preferred Aristotle to Plato).

In the quest for the absolute, if we let the ancient philosophers guide us, we discover the surprisingly basic idea that our programs should perform their functions well if they are to be called virtuous. This is easy to swallow, easy to understand, and easy to flesh out.

What is a Virtuous Computer Program?

A virtuous computer program is one that serves its purpose well, and so we need to flesh out the three purposes that are common to most programs:

  • To meet some institutional or strategic goal of those who sign the checks (or accept the work as charity in some case).
  • To meet the goals of end-users, which almost always comes down to performance and ease-of-use.
  • To provide income for the developers (or meet their own goal of providing charity work for non-profits).

Notice what is not on the list, things like ensure all data resides in a relational database, or implement all code in strictly object-oriented languages. We are not nearly ready to consider such specific strategies as those, they are completely out of place here in a discussion of the unifying goals of all projects.

So let's review. So far we know that the absolutes of programming are the pursuit of virtue, which turns out to be a fancy way of saying that the program should perform its functions well, which turns out to mean simply that it should do what the check-signer asked for, in a way that is workable for the end-users, and at a price that keeps the programmer fed.

This leads us towards strategies for reaching those goals.

The Virtuous Programming Strategy

Continuing with the idea that a virtuous program meets is basic goals, we can say that a virtuous strategy smooths the way for a programmer to meet the basic goals. An unvirtuous (or just plain bad) strategy litters the path with obstructions or ends up not meeting the goals of the check-signer, end-users, programmer, or all of the above.

Before we can begin to formulate a strategy, we must look next at the reality of the programming world. Some of the fundamental realities include (but are not limited to):

  • The end-user or check-signer may not fully understand or be able to articulate their requirements.
  • The programmer may not correctly understand requirements, even when correctly articulated.
  • In a healthy prosperous situation there will be new requirements that interact with established requirements in ways that range from no interaction at all to fiendish incompatibilities.
  • The world will change around you, creating demands that did not exist when the system was created (some of us can still remember when there was no internet).
  • Staff will come and go.
  • ...and so on.

So even before we begin formulating particular strategies for particular situations, we recognize that our strategy had underlying goals it must facilitate, such as:

  • Being easy to change, both for correcting mistakes and adding features.
  • Being able to maintain and sort out possibly contradictory requirements that arise as the years go by.
  • Requiring little or no "deep magic" that depends on arcance knowledge of employees who may depart.
  • Being able to expect the unexpected (like the explosion of the web etc.)

Only after we have worked through to this point can we begin to evaluate specific strategies and technologies. We can now begin to ask about the proper context of the database server, where to use object orientation, and if javascript is a good programming language. Anything that responds to our core goals and realities can be considered for use, anything which does not play into the core goals is useless at best and obstructive at worst.

Future essays (and some past essays) in this series will refer back to these ideas. For example, many developers have observed over the years that if you Minimize Code and Maximize Data then you gain many advantages in terms of development time, robustness, and feature count. Other ideas similar to this will come out over and over in future essays in this series.

Conclusion

The strategies and techniques that you will see on this blog are all aimed at one way or another towards the goals expressed in this essay. At the very beginning comes the goals of the check-signer, the end-users, and the programmer. From there we seek strategies that will satisfy our need to grow, change, correct, and adapt. Only then can we ask about the technologies such as databases and object-oriented languages and see how well they let us meet all of these goals.

Related Essays

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

Other philosophy essays are:

Sunday, September 21, 2008

Topical Table of Contents

This posting is updated whenever a new post goes up.

There is a also a Skills-oriented Table Of Contents. It is not as complete as this list, which lists all posts, but it is more centered on links as they relate to skills.

If you want some free analysis, why not submit your schema to the Database Programmer? If you are willing to discuss your issues with a bit of public exposure, I will provide free analysis, and everybody can benefit!

User-Submitted Analysis Topic: Email

The Application Stack

Table Design Basics: Keys, Normalization, Denormalization

The first group of posts introduces the must-know terms and techniques for table design.

It might be a good idea to start with The Relational Model.

Following up on the normal forms are some basic discussions of normalization and denormalization.


Table Design Patterns

The second subseries details commonly occurring patterns in table design, how to recognize them and when to use them.

There is a complete List of Table Design Patterns. The rest of the entries are:


SQL SELECT and Queries

Algorithms and Processes

Server-Side Code

Analysis

Development Cycle

Philosophy

Data Dictionary

Many of these posts are listed elsewhere in this table of contents, but I wanted to have them altogether in one place as well.

Security

Performance

The Browser

Sunday, September 7, 2008

Advanced Table Design: Secure Password Resets

Most web-based database applications make use of email to allow users to change their passwords. Completing securing this operation can be tricky business, and one of the best ways to do it is to user database server abilities.

Disclaimer 1: Only As Secure as Email

We tend to take it for granted today that password reset systems work through email. We reason that if a user can access an email sent by us then they are who they say they are. Obviously this will not be true if a user's email account has been compromised.

Dealing with the possibility of compromised email accounts is outside the scope of this week's essay. There are other strategies available to reduce that risk, but they will be treated in some future essay.

Disclaimer 2: Only SSL (HTTPS) of Course!

It is not much use giving yourself a super-secure email system if you transmit sensitive information over unencrypted connections. Secure Socket Layers (SSL) should always be used when high security is required. For the end-user this means they are going to a site through HTTPS instead of HTTP.

Password Resets vs. Sending Passwords

On some low-security systems it is acceptable to send a user his password in an email. This approach is very ill-advised in higher security contexts because we have no control over the user's storage of that email. It could end up anywhere, and anybody might read it.

When security requirements are higher, it is better to force the user to reset their password. There are several reasons for this, but the important one here is that we do not want to send the actual password in an email. Therefore we must send a link that sends them to a page where they can provide a new password.

The Requirements

If we spell out the requirements for a secure password reset system, they are at the very least these:

  1. We must generate some hash and send it to the user, this is how she will identify herself so we can let her change her password.
  2. The hash must expire at some point, since we cannot be sure the user will completely purge out the email (or that he even can, depending on the policy of the email host).
  3. It must be completely impossible for anybody to read the hash, otherwise they could intercept the reset process and set a password for themselves.
  4. Despite requirement 3 just listed, we must somehow verify the hash when the user presents it.
  5. We must be able to change the user's password, which is a priveleged operation, even though the user is not even logged in.

It is not actually possible to implement these requirements in application code alone (or perhaps I should say is not possible to do it and meet minimum acceptable risk). There are two problems if you try it:

  1. Requirements 3 and 4 cannot be reconciled. If the application is able to read the hash to verify it, then a vulnerability in the application code could lead to compromise. If we implement in application code we have the burden of ensuring practically zero vulnerabilities, while if we go server-side we have no such burden (at least for this feature).
  2. Requirement five requires the application code to connect at a very high privelege level, which could lead to completely unrelated vulnerabilities.

Implementing In The Database

The system I will now describe meets all 5 of the requirements listed above while never requiring a priveleged connection to the database. The feature is implemented in an isolated system that cannot touch other systems, and it has no burden to be particularly careful in writing the application code.

Since a picture is worth a thousand words, here it is:

The process begins at the top left. The user (Yellow circle) clicks on some "Forgot Password" link and provides an email or account id. This goes to web server which generates an INSERT to the insert-only table of hashes. This insert contains only the user's id, nothing else is needed. There is a trigger on the table that fires on the INSERT. This trigger generates the hash and sends the email to the user.

The salient features here are that the table is insert-only, which is explained below, and that the trigger operates at super-user level, which is also explained below.

Once the user receives the link and clicks on it, our process goes over to the right. The user lands on a page and provides a new password (and probably of course must type it in twice). The web server does basic things like making sure the two values match, that the password is long enough, and like that, and then generates an INSERT into a second table. The insert contains the email or account ID, the hash, and the desired new password.

The magic begins on the INSERT into the second table. An INSERT trigger running at superuser level is allowed to look at the first table and verify the hash and its expiration. If these match, it sets the user's password.

Simple, really, IMHO.

Feature 1: Insert Only Tables

This system depends on creating tables that any unpriveleged user can insert into, but which nobody can SELECT from or UPDATE to or DELETE from.

This may sound like a joke: "Insert Only Table", something like "Write only memory". But the idea is very simple, if nobody can SELECT from the table then nobody can discover active hashes. If nobody can UPDATE the table then nobody can forge hashes. Finally, if nobody can DELETE from the table then nobody can cause mischief.

The code for the tables looks like this:

-- FIRST TABLE
CREATE TABLE users_pwrequests
(
  recnum_pwr integer,
  user_id character varying(40),
  md5 character(32),
  ts_ins timestamp without time zone,
)
-- NOTE! This syntax is PostgreSQL, there may be
-- slight variations on other platforms.
REVOKE ALL ON TABLE users_pwrequests FROM PUBLIC;
GRANT INSERT ON TABLE users_pwrequests FROM PUBLIC;

-- SECOND TABLE
CREATE TABLE users_pwverifies
(
  recnum_pwv integer,
  user_id character varying(40),
  md5 character(32),
  member_password character varying(20),
)
REVOKE ALL ON TABLE users_pwverifies FROM PUBLIC;
GRANT INSERT ON TABLE users_pwverifies FROM PUBLIC;

Feature 2: Trigger Security Priveleges

It is possible on most servers to severely limit a user's allowed actions on a table, but then to provide trigger code that fires on those actions and executes a super-user level. Today's technique depends upon this ability. Trigger code operating at superuser level can look at the insert-only table to verify a hash, and it can also set the user's password.

This basic ability is what makes triggers so amazing and cool for implementing business logic (see also Triggers and Encapsulation), because there is no way for a user to directly invoke a trigger for his own nefarious purposes, and there is no way for a cracker to avoid the firing of the trigger if he performs an action on a table. Triggers are truly the most powerful example of encapsulation of data and code that is available to today's programmer.

The first trigger looks something like this ( this is PostgreSQL code, your server will likely require variations) (I have also stripped it down for brevity, it may not work exactly without modification):

CREATE OR REPLACE FUNCTION users_pwrequests_ins_bef_r_f()
  RETURNS trigger AS
$BODY$
DECLARE
    NotifyList text = '';
    ErrorList text = '';
    ErrorCount int = 0;
    AnyInt int;
    AnyRow RECORD;
    AnyChar varchar;
    AnyChar2 varchar;
    AnyChar3 varchar;
    AnyChar4 varchar;
BEGIN
    -- necessary for an old glitch in pg security
    SET search_path TO public;

    -- Only execute if the user's id is valid
    SELECT INTO AnyInt Count(*)
           FROM users WHERE user_id = new.user_id;
    IF AnyInt > 0 THEN 
       SELECT INTO AnyChar email
              FROM users WHERE user_id = new.user_id;
       -- This lets you put the email itself into 
       -- a table for admin control
       SELECT INTO AnyChar2 variable_value
              FROM variables
             WHERE variable = 'PW_EMAILCONTENT';
       -- Also the server is stored in a table
       SELECT INTO AnyChar3 variable_value
              FROM variables
             WHERE variable = 'SMTP_SERVER';
             
       -- This becomes the email FROM Address
       SELECT INTO AnyChar4 variable_value
              FROM variables
             WHERE variable = 'EMAIL_FROM';
       IF AnyChar4 IS NULL THEN AnyChar4 = ''; END IF;
       
       -- Very important! Set the md5 hash!
       new.md5 := md5(now()::varchar);
       
       -- Call out to a stored procedure that sends emails
       PERFORM pwmail(AnyChar
          ,'Password Reset Request'
          ,AnyChar2 || new.md5
          ,AnyChar3
          ,AnyChar4);
       EXECUTE ' ALTER ROLE ' || new.user_id || ' NOLOGIN ';
    END IF;    -- 3000 PK/UNIQUE Insert Validation

END; $BODY$
  -- The "SECURITY DEFINER" is crucial, it allows 
  -- the trigger to run as the super-user who 
  -- created it
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER

The second trigger looks like this:

CREATE OR REPLACE FUNCTION users_pwverifies_ins_bef_r_f()
  RETURNS trigger AS
$BODY$
DECLARE
    NotifyList text = '';
    ErrorList text = '';
    ErrorCount int = 0;
    AnyInt int;
    AnyRow RECORD;
    AnyChar varchar;
    AnyChar2 varchar;
    AnyChar3 varchar;
    AnyChar4 varchar;
BEGIN
    SET search_path TO public;

    -- Read the first table to see if the 
    -- link is valid and has not expired
    SELECT INTO AnyInt Count(*)
           FROM users_pwrequests
          WHERE user_id = new.user_id
            AND md5     = new.md5
            AND age(now(),ts_ins) < '20  min';         
    IF AnyInt = 0 THEN                                
        ErrorCount = ErrorCount + 1; 
        ErrorList  = ErrorList || 'user_id,9005,Invalid Link;';
    ELSE 
       -- Magic!  The user's password is set
        EXECUTE 'ALTER ROLE ' ||  new.user_id 
            || ' LOGIN PASSWORD ' 
            || quote_literal(new.member_password);
            
        -- Very important!  Now that we have set it,
        -- erase it so it is not saved to the table
        new.member_password := '';
    END IF;    -- 3000 PK/UNIQUE Insert Validation

    IF ErrorCount > 0 THEN
        RAISE EXCEPTION '%',ErrorList;
        RETURN null;
    ELSE
        RETURN new;
    END IF;
END; $BODY$
  LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;

Feature 3: Sending Email From Database Server

The technique present above requires that your database server be able to send emails. This is not always possible. Postgresql (www.postgresql.org) can do it, and I have to believe the other big guys can as well, but I have not tried it yet personally.

To send emails through a PostgreSQL server, you must install Perl as an untrusted language, and then install the Perl MAIL package. If anybody wants to know more about that then please leave a comment and I will expand the essay to include that.

Feature 4: The Empty Column

There is one more note that should be made. To use this system, you must tell the server the user's desired new password. To do that, you must actually make it part of the INSERT command and therefore you must have a column for it in the 2nd read-only table. However, you certainly do not want to actually save it, so you have the trigger set the password first and then blank out the value, so the final row saved to the table does not actually contain anything. This is noted in the code comments on the second trigger, which is included above.

Conclusion

The technique presented today makes full use of database server abilities to create a password reset system that is highly resistant to forgery, interception, and evil-admin meddling. It makes use of a combination of restrictive table security, priveleged trigger code, and sending emails from the database server.