openEHR data discoverability

Posted by Pablo Pazos Gutierrez on May 16, 2023, 3:02 am

When you have an openEHR Clinical Data Repository, like EHRServer or Atomik, you can create stored data queries. That is a little different from writing SQL queries against a relational database, where queries are usually not stored but hardcoded in code written in some programming language. Software developers know that each time they need to access some data from a database, they have to write some kind of query. That is time consuming and difficult to debug, the queries might not be reusable by other components, and the whole thing has to be retested because the queries are mixed with the application code.

Having stored queries in an openEHR repository differs from the classic approach in two ways. First, queries can be created and stored outside your code. Second, queries can be reused by many different components in your architecture. The tradeoff of this approach is performance, since executing a query directly from code is generally faster than executing it through an API. There are ways to handle this, though, for instance by using preemptive caches (caching data that you know will be requested or used later).
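To make the preemptive cache idea a bit more concrete, here is a minimal sketch in Python. It is not how EHRServer or Atomik do it: `PreemptiveCache`, `warm` and `fake_execute` are hypothetical names, and `fake_execute` stands in for whatever API call runs a stored query for a given EHR.

```python
class PreemptiveCache:
    """Keeps results of stored queries that we expect to be requested later."""

    def __init__(self, execute):
        # `execute` is any callable (query_uid, ehr_uid) -> result.
        # In a real system it would call the repository API (assumption).
        self._execute = execute
        self._results = {}

    def warm(self, keys):
        """Run the queries ahead of time, before anyone asks for the data."""
        for query_uid, ehr_uid in keys:
            self._results[(query_uid, ehr_uid)] = self._execute(query_uid, ehr_uid)

    def get(self, query_uid, ehr_uid):
        """Return the cached result, falling back to a live call on a miss."""
        key = (query_uid, ehr_uid)
        if key not in self._results:
            self._results[key] = self._execute(query_uid, ehr_uid)
        return self._results[key]


# Stand-in for the slow API call; it records every real execution.
calls = []

def fake_execute(query_uid, ehr_uid):
    calls.append((query_uid, ehr_uid))
    return {"query": query_uid, "ehr": ehr_uid, "rows": []}


cache = PreemptiveCache(fake_execute)
cache.warm([("blood-pressure", "ehr-1")])      # e.g. before a scheduled visit
result = cache.get("blood-pressure", "ehr-1")  # served from the cache
```

A real cache would also need an invalidation rule, for example dropping entries when new data is committed to that EHR, which this sketch leaves out.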

In an openEHR implementation, you will find that each data query you create and expose through the API is like a new data service you provide to your users, and you need to consider that you may end up with thousands of those queries. At that point, query discoverability becomes crucial, which means being able to find existing queries in order to use or reuse them. Since queries are stored, they are actually managed in a repository, so each query is represented by a data structure with associated metadata, including versioning and authoring information.
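As an illustration of what "a data structure with associated metadata" could look like, here is a small Python sketch. The field names and the `QueryRegistry` class are assumptions made for this example, not the actual model used by EHRServer, Atomik or the openEHR specifications.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical metadata record for one version of a stored query."""
    uid: str
    name: str
    description: str
    version: int
    author: str
    tags: frozenset = frozenset()
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class QueryRegistry:
    """In-memory stand-in for the repository that manages stored queries."""

    def __init__(self):
        self._versions = {}  # uid -> list of StoredQuery, oldest first

    def publish(self, uid, name, description, author, tags=()):
        """Store a new version; version numbers start at 1 per uid."""
        history = self._versions.setdefault(uid, [])
        query = StoredQuery(
            uid, name, description, len(history) + 1, author, frozenset(tags)
        )
        history.append(query)
        return query

    def latest(self, uid):
        return self._versions[uid][-1]

    def history(self, uid):
        return list(self._versions[uid])


registry = QueryRegistry()
registry.publish("q-bp", "Blood pressure", "All BP readings", "alice", ["vitals"])
second = registry.publish(
    "q-bp", "Blood pressure", "BP readings, last year only", "bob", ["vitals"]
)
```

Keeping every version, together with who authored it, is what lets a consumer decide whether an existing query is safe to reuse.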

openEHR query discoverability is like querying for queries, and there are many ways of doing that: adding a meaningful name, description and tags to each query, adding full-text search over names, descriptions and tags, filtering by tags, adding synonyms for certain words used in the name and description, or taking a more modern approach to search by using generative AI, for instance, search based on ChatGPT.
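The simpler options above (keyword search over name, description and tags, tag filtering, and synonyms) can be sketched in a few lines of Python. Everything here is illustrative: the sample queries, the `SYNONYMS` table and the `search` function are made up for this example and do not reflect any product's API.

```python
import re

# Illustrative synonym table: abbreviation -> phrase it stands for.
SYNONYMS = {"bp": "blood pressure", "hr": "heart rate"}

# Illustrative query metadata, as it might be kept in a repository.
QUERIES = [
    {"name": "Blood pressure over time",
     "description": "Systolic and diastolic readings per patient",
     "tags": {"vitals", "cardiology"}},
    {"name": "Latest lab results",
     "description": "Most recent laboratory test results",
     "tags": {"lab"}},
    {"name": "Heart rate summary",
     "description": "Average pulse per encounter",
     "tags": {"vitals"}},
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(terms):
    """Add the words of any known synonym to the search terms."""
    expanded = set(terms)
    for term in terms:
        if term in SYNONYMS:
            expanded |= tokenize(SYNONYMS[term])
    return expanded


def search(queries, text, required_tags=frozenset()):
    """Return query names ranked by how many search terms they match."""
    terms = expand(tokenize(text))
    results = []
    for query in queries:
        if not set(required_tags) <= query["tags"]:
            continue  # tag filter: every required tag must be present
        words = tokenize(query["name"] + " " + query["description"])
        words |= {tag.lower() for tag in query["tags"]}
        score = len(terms & words)
        if score > 0:
            results.append((score, query["name"]))
    results.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in results]


by_synonym = search(QUERIES, "bp")
by_tag = search(QUERIES, "summary results", required_tags={"vitals"})
```

Here `by_synonym` finds the blood pressure query even though "bp" appears nowhere in its metadata, and `by_tag` drops the lab query because it lacks the `vitals` tag. A production setup would more likely rely on a real full-text index than on word overlap.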

Another interesting thing about generative AI is that it can explain what a query does, so it can generate rich descriptions for queries. Those descriptions can then be used for searching and selecting (discovering!) the queries that a certain component will use to implement certain features. For instance, in a small test we did, we gave ChatGPT part of an openEHR data query created in Atomik's Query Builder, and it described the query and provided context information like a champ.


Conclusion: openEHR query discoverability enables data discoverability, which in turn enables secondary uses of clinical information and opens the door for innovation. Basically, you can ask your EHR what it can provide, you pick, and it will give you the data already filtered and standardized.