How to validate zip code, city, and state code

How would you ensure that the zip code, city, and state code information entered in your database is correct?

Good question. The answer: "It depends."

Syntactic correctness is achieved when the values fit the datatype. For example, the City column would typically be VARCHAR(50) or a similar character datatype, and therefore could contain just about anything. The StateCode column might be CHAR(2) for values like 'AK', 'AL', and so on, but it could just as easily hold values like 'ok', '42', and '' (an empty string). You might define ZipCode as a VARCHAR(10), so that it can hold both 5-digit zip codes and 9-digit ones including the hyphen.

In any case, all databases are capable of doing adequate syntactic validation when values are inserted or updated. For example, to prevent empty strings (which are not the same thing as NULL), you can specify a constraint such as:

CHECK ( DATALENGTH(City) > 0 )

Constraints of this type are declared in the CREATE TABLE statement.

Semantic correctness is more complicated. There are 256 * 256 = 65536 different ASCII two-character combinations, but only sixty-five of them are official USPS abbreviations.

One obvious solution is to use a State lookup table, with the primary key as the state code, and data columns for the state name and other information relevant to your needs, such as a shipping charge that varies by state. Then any table that has a state code can define it as a foreign key to the State table so that relational integrity will be enforced, automatically ensuring that any state code entered is valid.

If you do not need to maintain additional State information separately, then instead of a lookup table you could use a DOMAIN. Not all databases support DOMAINs, though, so you might have to use a constraint instead:

CHECK ( Statecode IN ('AK','AL', ... 'WY') )

This would have to be defined on every table that has a state code, and thus might require a bit of maintenance when the US annexes Canada.

Uh oh, did I just say that out loud?

Zip codes are more problematic. You might create some elaborate CHECK constraints that allow only 5-digit numbers, or 9-digit numbers with either a hyphen or a space between the 5-digit and 4-digit portions. Alternatively, you could have a ZipCode lookup table, which would be big and require regular updates, but which is a very practical solution. You can obtain more information by searching for "zip code database" on the Web.

A higher level of semantic correctness is ensuring that the city, state code, and zip code are consistent. Walla Walla is not located in NY and its zip code is not 90210, despite the fact that each of those three values is perfectly correct on its own. There are commercial solutions that can do this type of checking, so, again, if you're interested, visit one of the Web's major search engines.


Dig Deeper on Oracle development languages

Data Management
Business Analytics
SearchSAP
TheServerSide.com
Data Center
Content Management
HRSoftware
Close