Big mess or big data?

Posted on: 08.11.2016 Tags:

Big Data


Is there anyone out there who has not heard about big data? It seems to be increasingly becoming a key part of every aspect of our lives, including, of course, the healthcare business.

But is healthcare ready for the use of big data? There are some aspects to think about. Firstly, we should know that we are treating patients using data that stems from very different sources. For example, big data sources are 20% structured data and 80% unstructured data, which makes them harder to manage.

So, let's talk about the four “V’s” that could define big data:

·         Volume, because we are dealing with a huge amount of data

·         Velocity, because we are dealing with fast-generated data

·         Variety, because we are dealing with a wide variety of data

·         Veracity, because we need an appropriate level of trust in the data

But what about the quality? When we make decisions based on big data technology, we must ensure the quality of the data behind them. How could we ensure that quality? Can we imagine putting controls in place at the data acquisition level?

Let's think about data generated from wearables. Should we consider it a reliable data source? Or is this unstructured information, which is often taken out of context, really just a big mess rather than big data?
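To picture what a control at the data acquisition level could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the `accept_reading` function and the plausibility limits are hypothetical and do not come from any particular device or standard.

```python
# Hypothetical plausibility limits for illustration only;
# real limits would have to come from clinical guidance.
HEART_RATE_RANGE_BPM = (20, 250)

# Hypothetical fields that every wearable reading is expected to carry.
REQUIRED_FIELDS = ("patient_id", "timestamp", "heart_rate_bpm")


def accept_reading(reading):
    """Return (accepted, reason) for one incoming wearable reading."""
    # Reject readings that arrive without their context.
    for field in REQUIRED_FIELDS:
        if reading.get(field) is None:
            return False, f"missing field: {field}"
    # Reject values that cannot plausibly be a heart rate.
    low, high = HEART_RATE_RANGE_BPM
    if not low <= reading["heart_rate_bpm"] <= high:
        return False, "heart rate outside plausible range"
    return True, "ok"


good = {"patient_id": "p1", "timestamp": "2016-11-08T10:00:00", "heart_rate_bpm": 72}
no_context = {"patient_id": "p1", "heart_rate_bpm": 72}

print(accept_reading(good))
print(accept_reading(no_context))
```

A check like this does not make wearable data reliable by itself, but it stops readings without context from entering the data set in the first place.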

And we have a huge amount of information. Do we have any idea which data is most important? Perhaps we should use data mining algorithms to reveal patterns that we can use and re-use later?

In Spain we say “Los árboles no nos dejan ver el bosque” (The trees do not let us see the forest), which we could adapt to “The data does not let us see the knowledge”. Because we need knowledge, not raw data, to make decisions. We need a level above the big data concept in order to improve its integrity and quality.

These are some of the reasons why I say: “Big data, and I want to stress this, big data as we know it today, is underused and misused”. Period.

Data vs. Knowledge. Welcome to the world of ontologies.

In the twenty-first century, we shouldn’t talk about data, we should talk about knowledge. We should model the data in a way that represents the knowledge. How should we do it?

Let me introduce the ontology concept.

Ontologies are a semantic way to model domains of knowledge, establishing relationships between the different entities (components) as well as establishing taxonomies. Ontologies are the basis of cognitive computing, going far beyond Hadoop or big data concepts. The data is structured in an n-dimensional network where each piece of data is linked to n different attributes and their classes. Moreover, ontologies are also a good way to check the integrity of different data statements, since you can apply integrity rules.

When we talk about rules, we can use simple examples such as “this person is a girl, so this person is not a man”. In this way we avoid health records like “80-year-old man, with nine pregnancies and six born alive” (extracted from a real medical record).
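The kind of rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the property names and the `find_violations` function are hypothetical, and a real ontology would express these rules in an ontology language and check them with a reasoner rather than with hand-written code.

```python
# Pairs of classes that no individual may belong to at the same time
# (hypothetical, deliberately simplified for this example).
DISJOINT_CLASSES = [frozenset({"Man", "Woman"})]

# Properties that only make sense for members of a given class
# (hypothetical property names).
PROPERTY_DOMAIN = {"pregnancies": "Woman", "live_births": "Woman"}


def find_violations(individual):
    """Return a list of integrity-rule violations for one individual."""
    violations = []
    classes = individual["classes"]
    # Rule 1: disjoint classes cannot both be asserted.
    for pair in DISJOINT_CLASSES:
        if pair <= classes:
            violations.append("cannot be both " + " and ".join(sorted(pair)))
    # Rule 2: a property with a positive value requires its domain class.
    for prop, required_class in PROPERTY_DOMAIN.items():
        if individual["properties"].get(prop, 0) > 0 and required_class not in classes:
            violations.append(f"'{prop}' only applies to {required_class}")
    return violations


# The record quoted above, encoded with the hypothetical names.
record = {
    "classes": {"Man"},
    "properties": {"age": 80, "pregnancies": 9, "live_births": 6},
}

for problem in find_violations(record):
    print(problem)
```

With rules like these in place, the record is flagged when it is entered instead of being discovered years later in the archive.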

Let me show another example of a statement: “a bottle of white Australian Chardonnay wine goes well with fish”.

This statement reveals some attributes of the data:

·         Made from grape (Chardonnay)

·         Has a country of origin (Australia)

·         Has a color (white)

·         Has a container (bottle)

·         Has a flavor (moderate)

We can probably get even more information about the wine (for example, brand, sugar content and so on), and we can relate it to other classes such as food. As we can see, ontologies can define all the attributes within a particular domain.
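One common way to write such statements down is as subject, predicate, object triples. The sketch below uses plain Python tuples to show the idea; the identifiers (`wine1`, `made_from_grape` and so on) and the two helper functions are hypothetical names chosen for this example, not part of any standard vocabulary.

```python
# Each fact is a (subject, predicate, object) triple.
# All identifiers are hypothetical and only illustrate the wine statement.
TRIPLES = [
    ("wine1", "is_a", "Wine"),
    ("wine1", "made_from_grape", "Chardonnay"),
    ("wine1", "has_origin", "Australia"),
    ("wine1", "has_color", "white"),
    ("wine1", "has_container", "bottle"),
    ("wine1", "has_flavor", "moderate"),
    ("wine1", "goes_well_with", "fish"),
    ("fish", "is_a", "Food"),
]


def objects_of(subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects_with(predicate, obj):
    """Return every subject linked to the object by the predicate."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


print(objects_of("wine1", "has_origin"))
print(subjects_with("goes_well_with", "fish"))
```

Because the wine and the food live in the same network, a question such as “what goes well with fish?” becomes a simple lookup across classes.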

Ontologies and healthcare

Why aren’t we talking about this in healthcare and diseases? Consider pneumonia, for example.

The ontology could represent it (epidemiology, treatment, symptoms and so on), like this:

Source: Florida Institute for Human & Machine Cognition (IHMC)

But what if we represent this disease in a single patient?

One of the best examples of how to map a disease to a specific patient using ontologies can be found in “Infectious News”. On September 13th, 2016, Dr. Meghan May wrote about the health issues that affected Mrs. Hillary Clinton on September 11th. She suspected that Mrs. Clinton had pneumonia. She wrote up a suspected medical record for Mrs. Clinton (you can read it in the article cited below) and modelled it as an ontology.

Source: Infectious News. Dr. Meghan May “After Careful Review, I Suspect Hillary Clinton Actually Has... Pneumonia.” 09/13/2016


Final words.

As you can see from the examples I provided above, ontologies are a useful way to deal with knowledge. Remember this: if you want to be compliant with norms like EN/ISO 13606, or you want to connect with cognitive computing systems like IBM’s Watson, you have no choice but to use structured and modeled data.

You must work with ontologies: a simple way to organise the data collected and ensure its quality.

That’s the future. Are we ready to unlock the incredible power of healthcare ontologies?

This is one of the topics to be discussed at the HIMSS Europe World of Health IT (WoHIT) Conference & Exhibition, which will take place on 21–22 November 2016 in Barcelona.

I want to acknowledge the invaluable contribution to this post of my colleague Inma Roig.

Rafael Pardo Espino

Senior eHealth Consultant and R&D Manager, Spain