Clarity.CQLExecutionTask¶
Description¶
This is a custom task that allows ClarityNLP to execute CQL (Clinical Quality Language) queries embedded in NLPQL files. ClarityNLP directs CQL code to a running instance of the CQL Engine, which processes the CQL and translates it into requests for a FHIR (Fast Healthcare Interoperability Resources) server. The FHIR server runs the query and retrieves structured data for a single patient. The data returned from the CQL query appears in the results for the job associated with the NLPQL file.
The CQL query requires several FHIR-related parameters, such as the patient ID, the URL of the FHIR server, and several others to be described below. These parameters can either be specified in the NLPQL file itself or supplied by ClarityNLP as a Service.
Documentsets for Unstructured and Structured Data¶
ClarityNLP was originally designed to process unstructured text documents. In a typical workflow the user specifies a documentset in an NLPQL file, along with the tasks and NLPQL expressions needed to process the documents. ClarityNLP issues a Solr query to retrieve the matching documents, which it divides into batches. ClarityNLP launches a separate task per batch to process the documents in parallel. The number of tasks spawned by the Luigi scheduler depends on the number of unstructured documents returned by the Solr query. In general, the results obtained include data from multiple patients.
ClarityNLP can also support single-patient structured CQL queries with a few
simple modifications to the documentset. For CQL queries the documentset must
be specified in the NLPQL file so that it limits the unstructured documents to
those for a single patient only. FHIR is essentially a single-patient
readonly data retrieval standard. Each patient with data stored on a FHIR
server has a unique patient ID. This ID must be used in the documentset
statement and in the Clarity.CQLExecutionTask
body itself, as illustrated
below. The documentset specifies the unstructured data for the patient, and
the CQL query specifies the structured data for the patient.
Relevant FHIR Parameters¶
These parameters are needed to connect to the CQL Engine and the FHIR server,
evaluate the CQL statements, and retrieve the results. They can be provided
directly as parameters in the CQLExecutionTask
statement (see below), or
indirectly via ClarityNLPaaS:
Parameter | Meaning |
---|---|
fhir_version | Either DSTU2 or DSTU3 |
cql_eval_url | URL of the FHIR server’s CQL Execution Service |
patient_id | Unique ID of patient whose data will be accessed |
fhir_data_service_uri | FHIR server base URL |
cql | CQL code surrounded by “”” (triple quotes) |
Time Filtering¶
This task supports a time filtering capability for the CQL query results. Two
optional parameters, time_start
and time_end
, can be used to
specify a time window. Any results whose timestamps lie outside of this
window will be discarded. If the time window parameters are omitted, all
results from the CQL query will be kept.
The time_start
and time_end
parameters must be quoted strings with
syntax as follows:
DATETIME(YYYY, MM, DD, HH, mm, ss)
DATE(YYYY, MM, DD)
EARLIEST()
LATEST()
An optional offset in days can be added or subtracted to these:
LATEST() - 7d
DATE(2010, 7, 15) + 20d
The offset consists of digits followed by a d
character, indicating days.
Both ``time_start`` and ``time_end`` are assumed to be expressed in Universal Coordinated Time (UTC).
Here are some time window examples:
1. Discard any results not occurring in March, 2016:
"time_start":"DATE(2016, 03, 01)",
"time_end":"DATE(2016, 03, 31)"
2. Keep all results within one week of the most recent result:
"time_start":"LATEST() - 7d",
"time_end":"LATEST()"
3. Keep all results within a window of 20 days beginning July 4, 2018, at 3 PM:
"time_start":"DATETIME(2018, 7, 4, 15, 0, 0)",
"time_end":"DATETIME(2018, 7, 4, 15, 0, 0) + 20d"
Note that the strings to the left and right of the colon must be surrounded by quotes.
Example¶
Here is an example of how to use the CQLExecutionTask
directly, without
using ClarityNLPaaS. In the text box below there is a documentset creation
statement followed by an invocation of the CQLExecutionTask
. The
documentset consists of all indexed documents for patient 99999
with a
source
field equal to MYDOCS
. These documents are specified explicitly
in the CQLExecutionTask
invocation that follows, to limit the source
documents to those for patient 99999 only.
The task_index
parameter is used in an interprocess communication scheme
for controlling task execution. ClarityNLP’s Luigi scheduler creates worker
task clones in proportion to the number of unstructured documents in the
documentset. Only a single task from among the clones should actually connect
to the FHIR server, run the CQL query, and retrieve the structured data.
ClarityNLP uses the task_index
parameter to identify the single task
that should execute the CQL query. Any NLPQL file can contain multiple
invocations of Clarity.CQLExecutionTask
. Each of these should have
a task_index
parameter, and they should be numbered sequentially starting
with 0. In other words, each define
statement containing an invocation
of Clarity.CQLExecutionTask
should have a unique value for the zero-based
task_index
. If you limit your CQL use to a single query per NLPQL file,
the value of task_index
should always be set to 0.
The patient_id
parameter identifies the patient whose data will be accessed
by the CQL query. This ID should match that specified in the documentset
creation statement.
The remaining parameters from the table above are set to values appropriate for GA Tech’s FHIR infrastructure. You should change them to match your FHIR installation.
The cql
parameter is a triple-quoted string containing the CQL query. the
triple quotes can be comprised of either single or double quotes.
This CQL code is assumed to be syntactically correct and is passed to the FHIR
server’s CQL evaluation service unaltered. All CQL code should be checked for
syntax errors and other problems prior to its use in an NLPQL file.
This example omits the optional time window parameters.
documentset PatientDocs:
Clarity.createDocumentSet({
"filter_query":"source:MYDOCS AND subject:99999"
});
define WBC:
Clarity.CQLExecutionTask({
documentset: [PatientDocs],
"task_index": 0,
"fhir_version":"DSTU2",
"patient_id":"99999",
"cql_eval_url":"https://gt-apps.hdap.gatech.edu/cql/evaluate",
"fhir_data_service_uri":"https://apps.hdap.gatech.edu/gt-fhir/fhir/",
cql: """
library Retrieve2 version '1.0'
using FHIR version '3.0.0'
include FHIRHelpers version '3.0.0' called FHIRHelpers
codesystem "LOINC": 'http://loinc.org'
define "WBC": Concept {
Code '26464-8' from "LOINC",
Code '804-5' from "LOINC",
Code '6690-2' from "LOINC",
Code '49498-9' from "LOINC"
}
context Patient
define "result":
[Observation: Code in "WBC"]
"""
});
context Patient;
Arguments¶
Name | Type | Required | Notes |
---|---|---|---|
documentset | documentset | Yes | Documents for a SINGLE patient only. |
task_index | int | Yes | Each CQLExecutionTask statement must have a unique value of this index. |
fhir_version | str | No | Either “DSTU2” (default) or “STU3” |
patient_id | str | Yes | CQL query executed on FHIR server for this patient. |
cql_eval_url | str | Yes | See table above. |
fhir_data_service_uri | str | Yes | See table above. |
cql | triple-quoted str | Yes | Properly-formatted CQL query, sent verbatim to FHIR server. |
time_start | str | No | Optional, discard results with timestamp < time_start |
time_end | str | No | Optional, discard results with timestamp > time_end |
Results¶
The specific fields returned by the CQL query are dependent on the type of FHIR resource that contains the data. ClarityNLP can process the FHIR resources in the next table:
FHIR Resource Type |
---|
Patient |
Procedure |
Condition |
Observation |
MedicationOrder |
MedicationRequest |
MedicationStatement |
MedicationAdministration |
ClarityNLP returns a flattened version of the JSON representation of each
resource, the meaning of which is explained
here. Essentially, the key for a
flattened JSON object contains underscores for each nested object boundary
(delimited by the {
character), and a numeric index for each array
boundary (delimited by the [
character).
To illustrate, consider this JSON object:
{
"field1":"value1",
"field2":{"field3":"value3"},
"field4":[{"field5":"value5", "field6":"value6"}],
"field7":[{"field8":[{"field9":"value9", "field10":"value10"}]}]
}
The flattened version is:
{
"field1":"value1",
"field2_field3":"value3",
"field4_0_field5":"value5",
"field4_1_field6":"value6",
"field7_0_field8_0_field9":"value9",
"field7_0_field8_1_field10":"value10"
}
The FHIR resource data structures can be represented as nested JSON objects. The DSTU2 resources can be found here and the DSTU3 resources can be found here.
For a specific FHIR example, consider the DSTU2 general condition example:
{
"resourceType": "Condition",
"id": "example",
"text":
{
"status": "generated",
"div": "<div>Severe burn of left ear (Date: 24-May 2012)</div>"
},
"patient":
{
"reference": "Patient/example"
},
"code":
{
"coding":
[
{
"system": "http://snomed.info/sct",
"code": "39065001",
"display": "Burn of ear"
}
],
"text": "Burnt Ear"
},
"category":
{
"coding":
[
{
"system": "http://hl7.org/fhir/condition-category",
"code": "diagnosis",
"display": "Diagnosis"
},
{
"fhir_comments":
[
" and also a SNOMED CT coding "
],
"system": "http://snomed.info/sct",
"code": "439401001",
"display": "Diagnosis"
}
]
},
"verificationStatus": "confirmed",
"severity":
{
"coding":
[
{
"system": "http://snomed.info/sct",
"code": "24484000",
"display": "Severe"
}
]
},
"onsetDateTime": "2012-05-24",
"bodySite":
[
{
"coding":
[
{
"system": "http://snomed.info/sct",
"code": "49521004",
"display": "Left external ear structure"
}
],
"text": "Left Ear"
}
]
}
The flattened version of this example, with quotes removed for clarity, is:
resourceType: Condition
id: example
text_status: generated
text_div: <div xmlns="http://www.w3.org/1999/xhtml">Severe burn of left ear (Date: 24-May 2012)</div>
clinicalStatus: active
verificationStatus: confirmed
category_0_coding_0_system: http://hl7.org/fhir/condition-category
category_0_coding_0_code: encounter-diagnosis
category_0_coding_0_display: Encounter Diagnosis
category_0_coding_1_system: http://snomed.info/sct
category_0_coding_1_code: 439401001
category_0_coding_1_display: Diagnosis
severity_coding_0_system: http://snomed.info/sct
severity_coding_0_code: 24484000
severity_coding_0_display: Severe
code_coding_0_system: http://snomed.info/sct
code_coding_0_code: 39065001
code_coding_0_display: Burn of ear
code_text: Burnt Ear
bodySite_0_coding_0_system: http://snomed.info/sct
bodySite_0_coding_0_code: 49521004
bodySite_0_coding_0_display: Left external ear structure
bodySite_0_text: Left Ear
subject_reference: Patient/example
onsetDateTime: 2012-05-24 00:00:00
date_time: 2012-05-24 00:00:00
len_code_coding: 1
len_severity_coding: 1
len_bodySite: 1
len_bodySite_0_coding: 1
len_category: 1
len_category_0_coding: 2
value_name: Burn of ear
Note the additional fields at the end, such as date_time
and the fields
prefixed with len_
. ClarityNLP adds the date_time
field to enable
time sorting on the results (see above). The len_
prefixed fields
provide the lengths of all lists in the flattened data. These are convenience
fields, inserted so that consumers of the data will not have to separately
determine the presence and size of the embedded lists.
The exact set of fields returned for the different FHIR resources depends on the nature and complexity of the FHIR server’s data. The documentation for the DSTU2 and DSTU3 resources can be used to interpret the results.
Collector¶
No