Tuesday, May 7, 2013

Tracking Families in a Database

Developers out there: What is the best way to track families in a database? Trivial question, right?

You'd better get this right! Your application may end up being used by your company's (or client's) employees with a vip at the end of a phone call, and it's important for your company's employee to quickly add/update/change records to reflect changes to their family (you wouldn't want to send out a happy anniversary card if the donor is divorced, or send one to his neighbor, but miss hers)!

Oh, it is important to get right; but it's a trivial problem... so... think about it for a moment before you read on.

Here's one common approach:

Tables (columns:
people (name, date of birth, etc)
relationships (RelationshipType, personid, personid)
relationship_types (Father, Mother, Cousin, Uncle, Twice Married Half sister's mother cousin, etc)

That sounds fun. Some individuals would end up with dozens of connections; often connections are duplicated because it's hard to check for duplicates in such a scenario, and your business rules better be perfect otherwise your data is going to get corrupt. Not an instant nightmare, but a nightmare that'll get you two years after your application is in production and you suddenly need detailed reports requiring that family relationship info.

So say we have a family Mother: Sue; Dad: Bob; and child: Tofu. Tofu has 2 relationship entries linking him to Bob; Bob has 2 relationship entries linking him as husband to Sue and father to Tofu, Sue has 2 relationship entries linking her to Tofu as Mother and Bob as Wife. 6 relationship entries for 3 people. Add in more children, grandparents, 2nd marriages, and suddenly you have a pretty precarious situation. Almost certainly to get a full picture of a person's relationships you will find out your database knows about relationships that your app doesn't, because those records weren't added.

Is there a better way? Sure. Track events in people's lives instead.

people (First Name, Last Name, ...)
life_events* (EventType, Date)
people_life_events (Assocation Table: Event, Person, Title)
person_life_event_title (Lookup table: name; example: Father, Mother, Child, Husband, Wife, etc)
life_event_type (Lookup table: name; examples: Marriage, Birth, Divorce, etc)

*side note: life_events seems the best term to me; although I prefer one word table names. There are some options out there, but they aren't great. milestone, event, occasion, happening, or epochal.

Example: Dad: Bob; Mom: Sue; Son: Tofu.
3 Person records (Bob, Sue, Tofu)
2 Life Events (connecting event_type(marriage) and event_type(birth) with their persons)
2 associations in life_event_people for the Marriage, 3 for the birth
3 person_life_event_title records (husband, wife, child)
2 life_event_type records (marriage, birth)

Here's a Google Doc to help you visualize the data:

[or you can view the actual Google Doc here]

Now, there is no duplicate entry; and your application will know even complicated relationships correctly every time.

You might get to the point one day where you decide person_life_event_title can be converted to an enum. You certainly don't have to use it for generating reports; just use the string 'Father' etc. This is true for the event_types also.

Hint: Spend time on your models. It's the foundation on which your entire application will stand. Don't worry about how you're going to work with the data, worry about making a solid foundation. Your database shouldn't have to change much, if at all, besides adding new tables and new associations if you build it correctly.

Now Discuss: