Query Formulation Tool to Identify Eligible Clinical Research Participants

Sarah N. Lim Choi Keung, PhD¹; Lei Zhao, MSc¹; James Rossiter, PhD¹; Ire Ogunsina, MSc¹; Vasa Curcin, PhD²; Roxana Danger, PhD²; Mark McGilchrist, PhD³; Jean-François Ethier, MD⁴; Christian Ohmann, PhD⁵; Wolfgang Kuchinke, PhD⁵; Adel Taweel, PhD⁶; Brendan C. Delaney, BM BCh, MD⁶; Theodoros N. Arvanitis, PhD¹

Identifying and recruiting eligible research participants for studies is an expensive and time-consuming process. Data provenance becomes an issue, especially when heterogeneous electronic data sources distributed across multiple countries are involved. To make use of provenance information, we developed a provenance-aware Query Formulation Tool that researchers can use to collaboratively define study protocols and, via the Provenance Service, trace back the protocol development and participant identification processes.

Figure 2. Screenshot of the Query Formulation Tool, example Diabetes study protocol definition.

TRANSFoRm Project

TRANSFoRm is a 5-year pan-European project funded by the European Union to advance information and computing technologies in order to address current market challenges in connecting healthcare and research. It will deliver a digital infrastructure and develop rigorous, generic methods that facilitate the reuse of primary care electronic health record (EHR) data to improve both patient safety and the conduct and volume of clinical research in Europe.

Query Formulation and Provenance

The Query Formulation Tool enables researchers to collaboratively build study eligibility criteria and obtain counts of potential subjects from selected heterogeneous data sources. Tracking data provenance during the query formulation, execution and results display phases is important for the reproducibility and validation of the research participant identification process.

Query Formulation Tool requirements:
- Heterogeneity of data sources and terminologies
- Eligibility criteria and study protocol definitions
- Query submission and display of results
- Researcher tools for collaboration, data sharing and participant recruitment
- Privacy and data provenance aspects

Figure 3. Provenance Query Tool showing provenance graph and example provenance data tracked for the Diabetes study eligibility criteria.

Conclusions and Future Work

This work supports the identification and recruitment of research participants for cross-European studies using electronic health data. Study protocols can be collaboratively defined, versioned and submitted using the Query Formulation Tool. The Query Formulation Tool uses provenance templates to track the processes that data items have been through. The Provenance Query Tool can be used to reproduce and validate the research participant identification process. Future work includes adding a data extraction feature (for heterogeneous data sources) to the Query Formulation Tool and extending the Provenance Service to track this process.

Figure 1. Conceptual diagram showing the interaction of components in TRANSFoRm, especially those relevant to the Query Formulation Tool and provenance.

Figure 1 shows how the Query Formulation Tool is used to obtain counts of participants with matching criteria and where provenance is recorded.
The clinical research domain is modelled by the Clinical Research Information Model (CRIM), on which the design of the Query Formulation Tool is based. Researchers use the Query Formulation Tool to collaboratively define study protocols, and use the Terminology Service to work with interoperable coding systems. Queries from study protocols are expressed according to the Clinical Data Integration Model (CDIM) and then submitted for execution against heterogeneous data sources.
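To illustrate the idea of defining eligibility criteria once and evaluating them against sources that use different coding systems, here is a minimal Python sketch. It is not the TRANSFoRm implementation: the `Criterion` and `DataSource` classes, the reference concept names and the local codes are all hypothetical, and a simple per-source code mapping stands in for CDIM and the Terminology Service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Criterion:
    """One eligibility criterion over a shared reference concept (hypothetical model)."""
    concept: str          # reference concept identifier (illustrative name)
    include: bool = True  # False turns the criterion into an exclusion


@dataclass
class DataSource:
    """A data source with its own local coding system (all values illustrative)."""
    name: str
    code_map: dict[str, set[str]]  # reference concept -> local codes used by this source
    records: dict[str, set[str]] = field(default_factory=dict)  # patient id -> local codes


def satisfies(codes: set[str], criterion: Criterion, source: DataSource) -> bool:
    # Translate the reference concept into this source's local codes before matching.
    local_codes = source.code_map.get(criterion.concept, set())
    present = bool(codes & local_codes)
    return present if criterion.include else not present


def count_eligible(criteria: list[Criterion], source: DataSource) -> int:
    # A patient counts only if every inclusion and exclusion criterion holds.
    return sum(
        all(satisfies(codes, c, source) for c in criteria)
        for codes in source.records.values()
    )


# Usage: one protocol evaluated against two sources with different (made-up) local codes.
criteria = [Criterion("type2_diabetes"), Criterion("insulin_therapy", include=False)]
source_a = DataSource(
    "A",
    {"type2_diabetes": {"a-dm2"}, "insulin_therapy": {"a-ins"}},
    {"p1": {"a-dm2"}, "p2": {"a-dm2", "a-ins"}, "p3": set()},
)
source_b = DataSource(
    "B",
    {"type2_diabetes": {"b-100"}, "insulin_therapy": {"b-200"}},
    {"q1": {"b-100"}, "q2": {"b-100", "b-300"}},
)
counts = {s.name: count_eligible(criteria, s) for s in (source_a, source_b)}
print(counts)  # {'A': 1, 'B': 2}
```

Only counts leave each source in this sketch, which mirrors the count-based feasibility queries described above; how the real system maps concepts across terminologies is not shown here.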

The Query Formulation Tool provides a flexible way for researchers to define complex study protocols; Figure 2 shows an example Diabetes study. The Provenance Service uses a novel concept, provenance templates, to track information during the query formulation process, e.g. the user, the time and the evolution of the eligibility criteria definitions. Provenance data can be queried through the Query Provenance Server. Figure 3 shows part of the provenance graph for the Diabetes study, focussing on the EligibilityCriteria artifact, together with the values recorded for this artifact.
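The provenance template idea can be illustrated with a minimal Python sketch: a template is a fixed pattern of PROV-style statements containing placeholder variables, and each tracked action binds those variables to concrete values. This is a simplification under stated assumptions, not TRANSFoRm's template format; the `REVISE_CRITERIA` template, the `var:` placeholder convention, the identifiers and the timestamps are hypothetical.

```python
# Hypothetical provenance template: PROV-style statements whose "var:" terms are
# placeholders bound at run time (a simplification, not TRANSFoRm's template format).
REVISE_CRITERIA = [
    ("var:new", "prov:wasDerivedFrom", "var:old"),
    ("var:new", "prov:wasAttributedTo", "var:user"),
    ("var:new", "prov:generatedAtTime", "var:time"),
]


def instantiate(template, bindings):
    """Replace each "var:<name>" term with bindings[<name>]; fail on unbound variables."""
    def bind(term):
        if term.startswith("var:"):
            name = term[len("var:"):]
            if name not in bindings:
                raise ValueError(f"unbound template variable: {name}")
            return bindings[name]
        return term
    return [tuple(bind(t) for t in statement) for statement in template]


def lineage(graph, entity):
    """Follow prov:wasDerivedFrom links back from an entity to its earliest version."""
    derived_from = {s: o for s, p, o in graph if p == "prov:wasDerivedFrom"}
    chain = [entity]
    # Stop at the first version, or if a (malformed) cycle would revisit an entity.
    while chain[-1] in derived_from and derived_from[chain[-1]] not in chain:
        chain.append(derived_from[chain[-1]])
    return chain


def contributors(graph):
    """Collect every agent that an artifact version is attributed to."""
    return {o for _, p, o in graph if p == "prov:wasAttributedTo"}


# Usage: two illustrative revisions of an EligibilityCriteria artifact.
graph = []
graph += instantiate(REVISE_CRITERIA, {"new": "criteria:v2", "old": "criteria:v1",
                                       "user": "researcher:alice", "time": "T1"})
graph += instantiate(REVISE_CRITERIA, {"new": "criteria:v3", "old": "criteria:v2",
                                       "user": "researcher:bob", "time": "T2"})
print(lineage(graph, "criteria:v3"))  # ['criteria:v3', 'criteria:v2', 'criteria:v1']
```

Because every revision is recorded through the same template, questions such as "who changed these criteria, and from which earlier version?" reduce to simple graph traversals.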

Provenance is tracked in three areas in the Query Formulation Tool:

1. Eligibility criteria definition: protocol versions and contributing researchers
2. Concepts selected, including those selected using the Terminology Service
3. From query submission to aggregation of results from the selected data sources
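The third area, from query submission to results aggregation, can be sketched in Python as follows. This is an illustrative sketch only: source-side execution is simulated by plain functions, and `ProvenanceLog`, `run_count_query`, the identifier scheme and the `ex:count` relation are hypothetical names, not part of the TRANSFoRm Provenance Service.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProvenanceLog:
    """Append-only store of (subject, relation, object) statements, PROV-style names."""
    statements: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, subject: str, relation: str, obj: str) -> None:
        self.statements.append((subject, relation, obj))

    def derived_from(self, entity: str) -> list[str]:
        """Return the entities that `entity` was derived from, in recording order."""
        return [o for s, p, o in self.statements
                if s == entity and p == "prov:wasDerivedFrom"]


def run_count_query(query_id: str,
                    sources: dict[str, Callable[[str], int]],
                    log: ProvenanceLog) -> int:
    """Submit a query to each selected source, aggregate the counts, and log each step.

    `sources` maps a source name to a stand-in for remote execution returning a count.
    """
    aggregate_id = f"aggregate:{query_id}"
    total = 0
    for name, execute in sources.items():
        execution_id = f"execution:{query_id}:{name}"
        result_id = f"result:{query_id}:{name}"
        log.record(execution_id, "prov:used", query_id)              # query submission
        count = execute(query_id)                                    # simulated execution
        log.record(result_id, "prov:wasGeneratedBy", execution_id)
        log.record(result_id, "ex:count", str(count))                # hypothetical relation
        log.record(aggregate_id, "prov:wasDerivedFrom", result_id)   # results aggregation
        total += count
    log.record(aggregate_id, "ex:count", str(total))
    return total


# Usage with two simulated sources returning illustrative counts.
log = ProvenanceLog()
total = run_count_query("query:diabetes", {"siteA": lambda q: 12, "siteB": lambda q: 30}, log)
print(total)  # 42
print(log.derived_from("aggregate:query:diabetes"))
```

Recording a derivation link from the aggregate back to each per-source result is what allows an aggregated count to be traced to the data sources that produced it.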

This project is partially funded by the European Commission under the 7th Framework Programme (Grant Agreement 247787).

