Principles of GNUmed information storage - Reasoning behind DB design
- the database schema is here
General Concepts
1. Focus on clinical information for GPs
The first and foremost thing to keep in mind is that the whole DB is
designed to properly store:
GP level care medical data
2. Client (presentation mode) independence
Information is stored independently of how it is going to be displayed.
3. Client-to-database version matching
Each version of the client will connect only to a particular version of the GNUmed database, whose structure (its schema, as reflected by its hash) must be valid.
The above remains true
except when the GNUmed client is run in
-- debug
mode. People should not be using
--debug
mode in a production environment except when:
- they are doing so selectively, perhaps as directed, as part of further investigating a particular bug and/or
- no mismatching versions of the GNUmed database are configured to be available to this client and/or
- the user has a clear knowledge of which among potentially-multiple database versions to connect to
4. Data protection
- always use foreign keys, proper datatypes, check constraints, triggers, and rules to protect the integrity of clinical data
- the database defaults to read only transactions
- if you simply connect to the database all you have is readonly access regardless of what your table permissions are
- if you want to write data you need to set the transaction to read-write:
-
set default_transaction_read_only to off
- this applies as long as the connection is open
- if you access the database via the GNUmed middleware module
gmPG2.py
you must explicitely request a read-write connection if you need one
Storage of Particular Types of Data
- forms to be filled in and printed/faxed/emailed etc
1. Overview of clinical tables
[not complete! Want to add diagram that also shows relationships like 1:n or n:m for the important clinical tables]
Ancestor table of all clinical information:
clin_root_item (is
abstract, ie it is never used to store information in, rather as a template to derive child tables from it)
'clin_root_item' table passes inheritance of its fields to the
descendant/children tables (listed here without the audit "log_" tables):
- allergy
- clin_aux_note
- clin_medication
- clin_narrative
- form_instances
- lab_request
- referral
- test_result
- vaccination
2. Freetext & codes/types/categories
The approach is to store freetext (because this is the main type of information for a GP) and enhance that via targeted tables such as the code or type tables (see 3.).
3. Typing and coding of the clinical information
Typing is used to label clinical data to be of a certain, well,
type of information regardless of what the content of it actually is. One is then able to query on the type of data. Typing is inherently prone to type-content mismatch.
It is done in two ways:
- All clin_root_item descendents inherit the field soap_cat. In this field one of the fixed SOAP categories (S/O/A/P) must be assigned to the row. Some child tables have their soap_cat value fixed by a constraint.
- Any row in a clin_root_item descendent can be "tagged" by arbitrary numbers of types (many-to-many relation between clin_root_item and clin_item_type via lnk_type2item).
Coding, on the other hand, is concerned with the
content of clinical
information. A code is a replacement or corresponding value for a term/group of terms, within the constraints of a coding system. Any narrative field in any table can be "coded" in any number of coding systems. Codes are not directly linked to narrative terms in any given table. They only represent the same content by means of the term associated with them in the coding table being identical to the given narrative. So, getting codes for a term is active, on-demand.
IOW, a type is an attribute of the content while a code is (represents) the content - expressed in the language of the "code".
Remark: SOAP categories
They are viewed as data origin or data certainty types, rather than arbitrary types of the clinical content. IOW, any clinical content can be categorized into the SOAP schema. Don't be fooled by the stock English meaning, rather view the categories as something like levels of certainty or types of information:
- S: this was reported (Subjective)
- O: this was found (Objective)
- A: this was thought of it (Assessment)
- P: this was (intended to be) done (Plan)
4. Inheritance from clin_root_item
There are many clinical tables like vaccination, allergy, lab_request (see 1.). All those are more specialized, enhanced
clinical tables going beyond (but often including) free text narrative. Some enhancing tables are still missing, of course, such as medication, etc.
The way to look at the schema for clinical stuff is this (or
"Where put what clinical information?"):
- anything that is a clinical entity which cannot be split further without losing medical meaning (ie atomic) is put into a child table of clin_root_item
- the clin_root_item table provides basic fields all clinical items shall have, eg auditing, encounter, episode, soap category and a narrative field
- clin_root_item child tables enhance the simple narrative with more clinical fields
Example: Vaccination table (one of the children of clin_root_item)
encounter episode (-- optionally --> health issue)
\ /
clin_root_item --> type(s)
--------------
- soap_cat
- narrative
^
|
(inherits)
|
vaccination
-----------
- additional fields
- ...
/ \
vaccine schedule
All clinical item tables have this structure. You can overlay them at the clin_root_item junction. The top part will always stay the same, the bottom part will always be different (don't get confused, there is a narrative field in the clin_root_item table which is consequently inherited to its children. But there is also a clin_narrative child table which actually then uses that field to store freetext).
Clinical narrative is thereby purposely aggregated, and not scattered across several tables. A key factor for easing full text searches!
PostgreSQL vs other databases
The project built itself around Postgresql on account of Postgresql strengths including its referential integrity, support of inheritance, triggers, and a number of other considerations. While some would argue that software could and should be able to work "with any back end", GNUmed would need to be rewritten to do so and, in the process, would lose both functionality and some of its existing clinical data safeguards.