Storing Custom Fields in the Database
Throughout the years, I’ve worked on a number of apps which required user-customizable fields. For instance, consider a document-management system which has a some standard fields (document title, creation date), but also gives users the possibility of adding their own (department name, security code, etc). The question whether having such fields is a good idea is touched upon by Joel; in this post, I’d like to think about the available options for storing these fields in a relational database.
So far, I’ve come across four different patterns to store custom fields in a RDBMS:
- the meta-database
- the mutating table
- the fixed table
- the LOB
All of these have their advantages and disadvantages. I will discuss them seperately, using the document-management system as an example.
the Meta Database
This approach basically defines a database inside of the database. In its simplest form, it consists of two tables: a Field Definition table, and a Field Value table. The value table has a foreign key relationship with the definition table, and with the table containing the entities (i.e. the documents table). The following figure illustrates this:

When adding a custom field, you just add a row to the definitions table. When adding a document, you look up the defined custom fields, and add value rows for all definitions. When retrieving documents, you also retrieve all custom fields for that document.
Advantages
- Fully normalized
Disadvantages
- Requires complex queries to get to the fields values
- Slow, because of the JOINs
- The fields table become huge, because we add a row for each custom field
the Mutating Table
When using this approach, we store all custom fields in one Fields table. This table has a one-on-one relationship with the documents table, and it alters as we add or remove custom fields. Here’s a picture:

The custom fields table initially starts out with just an id column, but as custom fields are defined, columns are added to this table by using ALTER TABLE statements. When custom fields are removed, we drop the column again.
Advantages
- Simple to query: the data is just there to read
- Works nicely with reporting tools
Disadvantages
- Requires runtime permissions to alter the database
- Does not work with ORMs which require a mapping to be defined upfront
the Fixed Table
This approach is very similar to the previous one: we still have one table for all fields. The major difference is that we don’t alter this table, but initialize it with a fixed number of nullable columns: 20 or 30 will do. The columns all have unimaginative names such as CustomField1, CustomField2, etc.; in another table, we map these column names to the field names as fields are defined. It looks something like this:

Advantages
- Simple to query
- Works with ORM & reporting tools
Disadvantages
- Fixed amount of fields
the LOB
Finally, there is the option of storing custom fields in some sort of Large Object. You can serialize some Hashtable object containing the fields and store that in a BLOB, or create an XML tree and store that in a CLOB. The LOB containing this data will just end up right next to the other document fields, in the document table:

Advantages
- Fully extendable
Disadvantages
- Hard or even impossible to query
Conclusions
I’ve used all of these four approaches, and they all have their pro’s and con’s. Overall, my favorite is the Fixed Table pattern, where we simply reserve some table space for 20 or so fields. Sure, this might be less flexible, since the amount of fields is fixed, but I don’t think that’s a big problem. Quoting Joel:
Here’s a common programmer thought pattern: there are only three numbers: 0, 1, and n. If n is allowed, all n’s are equally likely. […] Thus, for example, programmers tend to think that if your program allows you to open multiple documents, it must allow you to open infinitely many documents […]. A programmer would tend to look with disdain on a program which limited you to 20 open documents. What’s 20? Why 20? It’s not even a power of 2!
In the past, when I offered the choice to clients, explaining that they could choose between a solution which was easy to implement, but had an upper limit of 20 fields; or a solution which was more complex, and doesn’t work with their existing reporting tools, they all chose the simple solution.
And besides, 20 fields ought to be enough for anybody.
Marcus said,
January 24, 2008 @ 6:59
There’s actually a hybrid that I would consider to be a fifth (I’ve also done a lot of thinking over a lot of years about this weakness of RDBMS’). It sounds like it is obvious, but the reality on the ground is I don’t find a lot of people think like this:
For some common scenarios it isn’t a big stretch to just do a little more work to think out the truisms in the domain space. It’s a bit of a chore the first time, but then it’s pretty much the same over and over, and you end up with (1) a pattern that can be easily split across multiple one-to-one tables, if that’s a performance consideration, and (2) a very small number of arbitrarily mapped fields, as in your “Fixed Table” pattern.
The truisms aren’t too hard to figure out, and they are mostly an exercise in careful logic. If you’re talking about a User table, and you know that flexibility in the field set is highly valued, then you just work through the reasonable possibilities. The User table has an unusually high number of truisms to account for. Most other tables have very few. (People are complicated, and we want to dissect data about them in weird ways all the time.)
In the end, with extremely customizable apps, for ASPs, with salespeople that just can’t say no, I’ve never needed more than 8 generic Fixed Table style fields to cover the truly unknowns, and truthfully, using more than 3 or 4 is a “data-smell”.
In an analysis of existing custom clients for an ASP that bends over backwards to touch the floor, only one client exceeded having 5 truly custom fields, and they were all horribly designed life cycle fields. The client was fired by the ASP anyway within the same month of my analysis for apparently different and long-standing “difficult client” issues, but I believe there is a high amount correlation between their many other problems and the fact that their data was hell.
It’s also helpful to consider life cycle event fields as a completely separate issue. Too often life cycle fields are bundled in with properties. It’s serious problem masquerading as a non-issue. Life cycle data is 99 times out of a 100 strictly in the realm of truisms, and if that’s not the case then there’s a very good chance you’re (1) not doing a business application (perhaps scientific?), or (2) need to hire a new head BSA/SA.
Clearly this isn’t for every application and I know people will be disbelieving if they haven’t Just Tried It ™ and will criticize the approach. That’s fine. The Fixed Table approach works pretty well when this one is too big for the application. In my experience with over 15 years in the enterprise space, either Fixed Table or this hybrid are the only two I will select for an RDBMs though.
Clearly the RDBMs vendors don’t do a good job of allowing a multi-paradigm approach to one of the most significant problem in business schema design in existence. Sure they have their proprietary stuff, although there isn’t even much of that. It’s a big hole. LDAP is a useful consideration for some scenarios. And I think the big dream is that Semantic/RDF databases progress fast enough to make up for all this.
I would also encourage critics of this approach, who say that having too many specific columns is a performance problem, that (1) this is a relatively easy performance issue to deal with, and (2) consider the overhead of the alternatives, and (3) why the !$!$! are you putting your CMS data in an RDBMs anyway (taking liberties in my assumptions). Hell! It’s bad enough that you’re putting customer properties in an RDBMs in the first place…but at least it can be reasonably workable. But get documents out of there, in a hurry!
:-)
Marcus said,
January 24, 2008 @ 7:06
I probably don’t need to state the obvious, but when I allude to scientific apps having less “truisms”, it’s actually that they have more, but they are just more abstract. But that’s another acid trip altogether ;-D
Paul Bary said,
January 24, 2008 @ 13:36
One disadvantage you forgot to mention for each of the options except mutating table is that you lose the ability to have different database types. For example, in the meta database FieldValues.value will be a varchar(255), which may not be appropriate for all custom fields, such as date fields.
Arjen Poutsma said,
January 24, 2008 @ 13:51
@Paul: yes, in the simplest version, you tend to use one data type for all custom fields, typically a varchar. However, in the meta database case, you can actually have multiple Value columns in the FieldValues table (i.e. StringValue, IntegerValue, DateTimeValue columns). This makes queries even harder, but it does work.
Jamaal said,
January 30, 2008 @ 19:30
I just ran into this problem while trying to create a database that organized a large number of different applications (in the classical sense) for different services that needed to be easily mutable in order to add or take away fields based on new legal requirements, but in addition be cross application queryable, so I modeled the “meta-database” instinctively (I’m not as afraid of writing brutally huge queries as I probably should be).
All I want to know is what reference material you were using for this blog entry, if it wasn’t just personal experience. I’ve been hammering google for days trying to get more information on this type of thing, and though I’ve already reinvented the wheel, I don’t want to have to reperfect it. I’m a little freaked out by the joins required for the average query and want an idea of the performance hit I’m going to take before I build it.
thank ye…
Arjen Poutsma said,
January 30, 2008 @ 21:02
@Jamaal,
It’s all personal experience, I’m afraid
. I don’t know of any books that cover this topic, though perhaps “Refactoring Databases”, from the Martin Fowler signature series, does. I haven’t read it (yet), though.
Jamaal said,
January 31, 2008 @ 22:54
Thanks Arjen…
I figured out what I made though, and it’s an “Entity-Value-Relationship (EAV)” model, which is aggressively pooh-pooed by most people who mention it on the web. After reading all that’s been said, I still think that it’s justified in my model, as it is only a portion of it (a ‘mixed’ EAV), and for the queries that I’m running, I value extensibility over speed. Coincidentally, what I’m applying EAVs to is exactly what you chose to use as an example. I think everybody who uses EAV modeling successfully is dealing with basically the same thing: many and varying potential fields on many different forms that need to be cross-queried.
For anyone else trying to do something as foolhardy as I, here’s two really useful links:
“Dave’s guide to the EAV”
http://weblogs.sqlteam.com/davidm/articles/12117.aspx
And to measure the degree to which you’ll take a performance hit:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=79043
here’s the seminal paper:
http://ycmi.med.yale.edu/nadkarni/Introduction%20to%20EAV%20systems.htm
and here’s some unbelievably aggressive pooh-pooing that nevertheless is very educational:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11QUESTIONID:10678084117056
Weekly linkdump #111 - cb - блог разработчиков said,
February 1, 2008 @ 9:24
[…] Один из разработчиков Spring Framework делится опытом, Storing Custom Fields in the Database […]
Andrey V. Shtukaturov said,
February 4, 2008 @ 9:51
there are also Flexfields “technology” under Oracle
http://www.oracle.com/technology/tech/blaf/specs/flexfields.html FlexFields