Principles of GNUmed information storage - Reasoning behind DB design

General Concepts

1. Focus on clinical information for GPs

The first and foremost thing to keep in mind is that the whole DB is designed to properly store medical data from GP-level care.

2. Client (presentation mode) independence

Information is stored independently of how it is going to be displayed.

3. Client-to-database version matching

Each version of the client will connect only to a particular version of the GNUmed database, whose structure (its schema, as reflected by its hash) must be valid.
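The version check described here can be sketched as follows. This is a minimal illustration, not GNUmed's actual implementation: the function names, the `known_hashes` mapping, the choice of SHA-256 and the way the schema is serialised before hashing are all assumptions made for the example. The `debug` parameter models the --debug exception.

```python
import hashlib

def schema_hash(columns):
    """Hash a schema given as (table, column, data type) tuples.

    Assumption: the real client may hash a different description of the
    schema and may use a different hash algorithm.
    """
    lines = sorted(f"{table}.{column}:{dtype}" for table, column, dtype in columns)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def may_connect(client_version, live_schema, known_hashes, debug=False):
    """Allow a connection only if the live schema matches the expected hash.

    known_hashes is a hypothetical mapping: client version -> expected hash.
    With debug=True a mismatch no longer blocks the connection.
    """
    expected = known_hashes.get(client_version)
    if expected is not None and schema_hash(live_schema) == expected:
        return True
    return debug

# Illustrative schema for a made-up client version "1.0".
schema_v1 = [
    ("clin_root_item", "soap_cat", "text"),
    ("clin_root_item", "narrative", "text"),
]
known_hashes = {"1.0": schema_hash(schema_v1)}

# Someone altered the database: one extra column changes the hash.
altered = schema_v1 + [("clin_root_item", "extra", "text")]

print(may_connect("1.0", schema_v1, known_hashes))            # matching schema
print(may_connect("1.0", altered, known_hashes))              # altered schema
print(may_connect("1.0", altered, known_hashes, debug=True))  # debug override
```

Hashing a sorted, canonical description of the schema means any structural change, however small, produces a different hash, so the client does not have to compare the structure piece by piece.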

This version matching applies except when the GNUmed client is run in --debug mode. People should not be using --debug mode in a production environment except when:

4. Data protection

Storage of Particular Types of Data

1. Overview of clinical tables

[not complete! Want to add diagram that also shows relationships like 1:n or n:m for the important clinical tables]

Ancestor table of all clinical information: clin_root_item (it is abstract, i.e. it never stores information itself but serves as a template from which child tables are derived)

The clin_root_item table passes its fields on to its descendant (child) tables (listed here without the audit "log_" tables):

2. Freetext & codes/types/categories

The approach is to store freetext (because this is the main type of information for a GP) and enhance that via targeted tables such as the code or type tables (see 3.).

3. Typing and coding of the clinical information

Typing labels clinical data as being of a certain type of information, regardless of what its content actually is. One is then able to query on the type of data. Typing is inherently prone to type-content mismatch.

It is done in two ways:

Coding, on the other hand, is concerned with the content of clinical information. A code is a replacement or corresponding value for a term or group of terms, within the constraints of a coding system. Any narrative field in any table can be "coded" in any number of coding systems. Codes are not directly linked to narrative terms in any given table. They represent the same content only because the term associated with them in the coding table is identical to the given narrative. Getting the codes for a term is therefore an active, on-demand lookup.

IOW, a type is an attribute of the content while a code is (represents) the content - expressed in the language of the "code".
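The on-demand nature of coding can be illustrated with a small sketch. It uses an in-memory SQLite database purely for convenience, and the table names `narrative_item` and `coded_term`, their columns, and the example term-to-code pairs are hypothetical; they are not GNUmed's actual schema or data. The point shown is that no foreign key ties a narrative to a code: codes are found only by matching the narrative against the term stored in the coding table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; note there is no foreign key between them.
    CREATE TABLE narrative_item (pk INTEGER PRIMARY KEY, narrative TEXT);
    CREATE TABLE coded_term (term TEXT, code TEXT, coding_system TEXT);
""")
conn.execute(
    "INSERT INTO narrative_item (narrative) VALUES (?)",
    ("essential hypertension",),
)
# Illustrative term/code pairs in two coding systems.
conn.executemany(
    "INSERT INTO coded_term (term, code, coding_system) VALUES (?, ?, ?)",
    [
        ("essential hypertension", "I10", "ICD-10"),
        ("essential hypertension", "K86", "ICPC-2"),
        ("asthma", "J45", "ICD-10"),
    ],
)

def codes_for(conn, narrative):
    """Look up codes on demand: match the narrative against coded terms."""
    return conn.execute(
        "SELECT coding_system, code FROM coded_term "
        "WHERE term = ? ORDER BY coding_system",
        (narrative,),
    ).fetchall()

(narrative,) = conn.execute("SELECT narrative FROM narrative_item").fetchone()
for system, code in codes_for(conn, narrative):
    print(f"{narrative!r} is {code} in {system}")
```

Because the link exists only through the identical term, the same narrative can be coded in any number of coding systems without touching the table that holds it.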

Remark: SOAP categories
They are viewed as data origin or data certainty types, rather than arbitrary types of the clinical content. IOW, any clinical content can be categorized into the SOAP schema. Don't be fooled by the stock English meaning, rather view the categories as something like levels of certainty or types of information:

4. Inheritance from clin_root_item

There are many clinical tables like vaccination, allergy, lab_request (see 1.). All those are more specialized, enhanced clinical tables going beyond (but often including) free text narrative. Some enhancing tables are still missing, of course, such as medication, etc.

The way to look at the schema for clinical stuff is this (or "Where to put what clinical information?"):

Example: Vaccination table (one of the children of clin_root_item)

encounter                   episode (-- optionally --> health issue)
         \                /
           clin_root_item --> type(s)
           --------------
           - soap_cat
           - narrative
               ^
               |
           (inherits)
               |
           vaccination
           -----------
           - additional fields
           - ...
         /                     \
  vaccine                       schedule

All clinical item tables have this structure. You can overlay them at the clin_root_item junction. The top part will always stay the same, the bottom part will always be different. Don't get confused: there is a narrative field in the clin_root_item table, which is consequently inherited by its children, but there is also a clin_narrative child table, which actually uses that field to store free text.

Clinical narrative is thereby purposely aggregated rather than scattered across several tables, which is a key factor in easing full-text searches.
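As a rough analogy only, the shared top part and the varying bottom part can be modelled with Python class inheritance. This is not PostgreSQL DDL and not GNUmed code: the class names mirror the tables named above, while the field types, the sample data and the `search_narrative` helper are assumptions made for illustration. In PostgreSQL itself, a query on a parent table by default also returns the rows of its child tables, which is what makes one search over the inherited narrative field possible.

```python
from dataclasses import dataclass

@dataclass
class ClinRootItem:
    """Fields shared by every clinical item (top part of the diagram)."""
    encounter: str
    episode: str
    soap_cat: str
    narrative: str

@dataclass
class ClinNarrative(ClinRootItem):
    """Plain free text: adds nothing beyond the inherited fields."""

@dataclass
class Vaccination(ClinRootItem):
    """Specialised item: inherited fields plus its own (bottom part)."""
    vaccine: str = ""
    schedule: str = ""

def search_narrative(items, needle):
    """One search over the inherited narrative field of all item types."""
    needle = needle.lower()
    return [item for item in items if needle in item.narrative.lower()]

items = [
    ClinNarrative("enc-1", "epi-1", "s", "Patient reports sore arm after injection"),
    Vaccination("enc-1", "epi-1", "p", "Tetanus booster given, left arm",
                vaccine="tetanus", schedule="booster"),
    Vaccination("enc-2", "epi-2", "p", "Influenza shot given", vaccine="influenza"),
]

hits = search_narrative(items, "arm")
for hit in hits:
    print(type(hit).__name__, "-", hit.narrative)
```

The search function never needs to know which specialised tables exist; it relies only on the field they all inherit.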

PostgreSQL vs other databases

The project built itself around PostgreSQL on account of PostgreSQL's strengths, including its referential integrity, support of inheritance, triggers, and a number of other considerations. While some would argue that software could and should be able to work "with any back end", GNUmed would need to be rewritten to do so and, in the process, would lose both functionality and some of its existing clinical data safeguards.