Thursday, January 13, 2011

A Universal Exchange Language Example

In preparation for the PCAST Workgroup discussion, the Workgroup chairs asked Wes Rishel and me to find examples of the Universal Exchange Language proposed by the report. We asked Sean Nolan from Microsoft for his comments. His guest post is below:

"As requested, I’ve put together the below thoughts and samples from HealthVault and Amalga and offered my perspective on how they reinforce the core ideas in the PCAST recommendation. I hope they are useful and look forward to any follow up that would be useful.

To me, the most compelling aspect of the PCAST recommendation is the idea of data-atomicity … the idea that where appropriate we start encouraging exchange of the most granular data elements possible, rather than aggregates, snapshots and pre-normalized information. And further, that we defer concerns about harmonization and structure as late as possible in the information lifecycle, rather than requiring all data sources to conform to a fixed set of standards. This is something we embody both in our HealthVault and Amalga product lines.

Note that much of this is most immediately applicable in the “secondary use”/research/CER context --- which is where I know PCAST started out … concepts will certainly apply more broadly, but that’s at least how I’ve been thinking about it.

The challenge with the “traditional” approach to secondary use is that by definition we lose information as we pass the data through filters and “clean things up.” We make pre-judgments about which metadata is important and which is not. We lose granularity by aggregating along dimensions we think are the important ones --- and lose the ability to re-aggregate against others in the future. And so on.

Of course, there is a conservation of energy --- in order to compute on data, it has to be transformed at some point. But in the traditional approach, every source system bears the burden of the work and encodes a fixed set of transformations. This is incredibly brittle and lossy. Instead, with a data-atomic approach the work is delayed until the last moment, when the needs are best understood, the technology can be deployed efficiently, and technology for normalization may have advanced since the original data elements were captured.

The Extreme Case: EAV

This philosophy leads us to develop exchange mechanisms that support maximal completeness rather than maximal normalization. In the simplest form, in some cases in Amalga we reduce storage to an open-ended “Entity-Attribute-Value” structure, where each data element is treated as a bucket of arbitrary name/value pairs. This structure supports the “extrusion” of multiple transformed views of the same information. For example, in the extreme:

Entity ID | Attribute            | Value
--------- | -------------------- | ------------------------------
12345     | Type                 | Blood Pressure
12345     | Timestamp            | 10/22/2007
12345     | Given Name           | John
12345     | Family Name          | Halamka
12345     | Systolic             | 116
12345     | Diastolic            | 72
12345     | Source               | Personal Physicians HealthCare
12345     | Device Name          | Omron 7 Series
12345     | Device Model         | BP760
12345     | Device Serial Number | 0123456789
12345     | Pulse                | 66

The idea is to capture as much metadata as possible from the source system and make sure it survives alongside all of the other data with the item. Most importantly, while the otherwise meaningless Entity ID provides a natural “grouping” for the item, any attribute can serve as the means to construct just-in-time entities for different purposes. For example, I may want a view of all of John’s readings, so I use the demographic or other patient ID attributes to create an extrusion pivoted on the patient. Or I may want to understand the penetration of different device models around the country, so I can use the “device model” attribute to create another extrusion for that. In the Amalga implementation, these “extrusions” are often created automatically as physical transformations under the covers in response to dynamic query patterns.
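The pivoting idea can be sketched in a few lines of Python. This is an illustration only, not Amalga's implementation: the function names `assemble` and `extrude` are hypothetical, and the second entity (67890) is invented so that the pivots have more than one reading to group.

```python
from collections import defaultdict

# Hypothetical EAV rows as (entity_id, attribute, value) tuples.
# Entity 12345 mirrors the table above; entity 67890 is invented for the example.
ROWS = [
    ("12345", "Type", "Blood Pressure"),
    ("12345", "Family Name", "Halamka"),
    ("12345", "Systolic", "116"),
    ("12345", "Diastolic", "72"),
    ("12345", "Device Model", "BP760"),
    ("67890", "Type", "Blood Pressure"),
    ("67890", "Family Name", "Halamka"),
    ("67890", "Systolic", "120"),
    ("67890", "Diastolic", "80"),
    ("67890", "Device Model", "BP760"),
]


def assemble(rows):
    """Group the name/value pairs by entity ID, one dict per entity."""
    entities = defaultdict(dict)
    for entity_id, attribute, value in rows:
        entities[entity_id][attribute] = value
    return dict(entities)


def extrude(rows, pivot_attribute):
    """Build a just-in-time view keyed on any attribute, not only the entity ID."""
    view = defaultdict(list)
    for item in assemble(rows).values():
        if pivot_attribute in item:
            view[item[pivot_attribute]].append(item)
    return dict(view)


by_patient = extrude(ROWS, "Family Name")   # all readings for one patient
by_device = extrude(ROWS, "Device Model")   # all readings per device model
```

Nothing about the rows changes between the two views; only the attribute chosen as the pivot does, which is the point of deferring structure until the question is known.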

The “envelope” format in this case is almost trivial --- we typically use XML as a convenient format but virtually anything will work.

Softening the Approach: codable values and common data

The EAV approach works really well for source systems (they just send what they can and forget about it) and it can be super-effective in many cases, primarily intra-institution. However, everything is a balance, and the burden EAV puts on the receiving system can be inordinately high. To encourage an easier on-ramp to interoperability, we have adopted a number of techniques that are evident in the HealthVault data model. For example:

We capture a common set of core metadata for every item, including:
* a codified item “type” such as “blood pressure reading”
* various meaningful timestamps (e.g., “created”, “updated”, “effective”)
* audit information about the entity that submitted or updated the item

In many cases, there is common data that “just is” as part of that data type. For example, a blood pressure is not a blood pressure without “systolic” and “diastolic”. So we create very simple schemas for the 80% case of data elements that just have to be there to make sense.

We provide “slots” for other common structured data that may or may not be available --- for example, “pulse” is often present with a bp reading but not always --- so we have a place to put it if available, but it is not required.

Wherever data can be coded, we create constructs that facilitate the capture of those codes without allowing data to be lost. We talk about this as the “codable value” --- an XML construct that allows an item to be identified both with “display text” and zero or more codes that self-describe their codeset. Note that this model closely follows constructs originally created as part of the ASTM CCR.

We always provide the capability for other metadata to be associated and “travel with” the item --- to ensure the completeness principle.
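As a rough illustration of how these techniques fit together, the following Python sketch models required fields, optional slots, a codable value, and a catch-all for extra metadata. The class and field names are assumptions chosen for the example; this is not the HealthVault SDK or its schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Code:
    """One code that names its own codeset (family, type, version)."""
    value: str
    family: str
    type: str
    version: Optional[str] = None


@dataclass
class CodableValue:
    """Display text is always kept; codes are zero or more."""
    text: str
    codes: List[Code] = field(default_factory=list)


@dataclass
class BloodPressure:
    # Required: a blood pressure is not a blood pressure without these two.
    systolic: int
    diastolic: int
    # Optional slot: stored when the source has it, never required.
    pulse: Optional[int] = None
    # Any other metadata travels with the item instead of being dropped.
    extra: Dict[str, str] = field(default_factory=dict)


reading = BloodPressure(systolic=116, diastolic=72, extra={"Device Model": "BP760"})
glucose = CodableValue("Glucose", [Code("2345-7", "regenstrief", "LOINC", "2.26")])
uncoded = CodableValue("Glucose")  # still valid: the text alone is never lost
```

Omitting `systolic` or `diastolic` raises a `TypeError` at construction time, while a missing `pulse` or an uncoded value is accepted, which mirrors the split between the data that has to be there and the data that may be.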

Examples of these elements can be seen in the following HealthVault item XML fragments. The first is a complete blood pressure reading imported from John’s CCD and shows common metadata and core type information. The second and third are from my record and demonstrate optional items and codable values respectively.

<thing>
  <thing-id version-stamp="f4dd8faa-2ba6-410e-b367-b0b5f96f0aaa">15657ae9-7955-4a1d-9f23-bf56e76b640d</thing-id>
  <type-id name="Blood Pressure Measurement">ca3c57f4-f4c1-4e15-be67-0a3caf5414ed</type-id>
  <thing-state>Active</thing-state>
  <eff-date>2007-10-22T00:00:00Z</eff-date>
  <created>
    <timestamp>2011-01-13T05:31:08.79Z</timestamp>
    <app-id name="Microsoft HealthVault">9ca84d74-1473-471d-940f-2699cb7198df</app-id>
    <person-id name="Sean Nolan">11141dc8-eb3c-4923-99aa-0094bd4d0648</person-id>
    <access-avenue>Online</access-avenue>
    <audit-action>Created</audit-action>
  </created>
  <updated>
    <timestamp>2011-01-13T05:31:08.79Z</timestamp>
    <app-id name="Microsoft HealthVault">9ca84d74-1473-471d-940f-2699cb7198df</app-id>
    <person-id name="Sean Nolan">11141dc8-eb3c-4923-99aa-0094bd4d0648</person-id>
    <access-avenue>Online</access-avenue>
    <audit-action>Created</audit-action>
  </updated>
  <data-xml>
    <blood-pressure>
      <when>
        <date>
          <y>2007</y>
          <m>10</m>
          <d>22</d>
        </date>
      </when>
      <systolic>116</systolic>
      <diastolic>72</diastolic>
    </blood-pressure>
    <common>
      <source>Personal Physicians HealthCare</source>
      <related-thing>
        <thing-id>b61900cf-15dc-4ad8-8a4c-88ccb20d9952</thing-id>
        <version-stamp>1c0cbf63-875d-4acb-832f-ac296e7334a7</version-stamp>
        <relationship-type>Extracted from CCD</relationship-type>
      </related-thing>
    </common>
  </data-xml>
</thing>

<blood-pressure>
  <when>
    <date>
      <y>2010</y>
      <m>8</m>
      <d>31</d>
    </date>
    <time>
      <h>21</h>
      <m>48</m>
      <s>9</s>
    </time>
  </when>
  <systolic>114</systolic>
  <diastolic>90</diastolic>
  <pulse>97</pulse>
  <irregular-heartbeat>false</irregular-heartbeat>
</blood-pressure>

<name>Glucose</name>
<clinical-code>
  <text>Glucose</text>
  <code>
    <value>001032</value>
    <family>labcorp</family>
    <type>result</type>
    <version>20090101</version>
  </code>
  <code>
    <value>2345-7</value>
    <family>regenstrief</family>
    <type>LOINC</type>
    <version>2.26</version>
  </code>
</clinical-code>
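From the consumer's side, fragments like these can be read with very little code. The sketch below uses only the Python standard library; the function names are hypothetical, and the inline XML is a trimmed copy of the second and third fragments above.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the fragments above, kept inline so the sketch is self-contained.
BP_XML = (
    "<blood-pressure><systolic>114</systolic><diastolic>90</diastolic>"
    "<pulse>97</pulse></blood-pressure>"
)
CODE_XML = (
    "<clinical-code><text>Glucose</text>"
    "<code><value>001032</value><family>labcorp</family><type>result</type></code>"
    "<code><value>2345-7</value><family>regenstrief</family><type>LOINC</type></code>"
    "</clinical-code>"
)


def parse_blood_pressure(xml_text):
    """Read the required values and the optional pulse slot."""
    root = ET.fromstring(xml_text)
    pulse = root.findtext("pulse")  # None when the slot is absent
    return {
        # int(None) raises TypeError, so a missing required value fails loudly.
        "systolic": int(root.findtext("systolic")),
        "diastolic": int(root.findtext("diastolic")),
        "pulse": int(pulse) if pulse is not None else None,
    }


def parse_clinical_code(xml_text):
    """Keep the display text plus every code, whatever its codeset."""
    root = ET.fromstring(xml_text)
    codes = [{part.tag: part.text for part in code} for code in root.findall("code")]
    return {"text": root.findtext("text"), "codes": codes}


def pick_code(clinical, preferred_type):
    """Return the first code of the preferred type, or None to fall back to the text."""
    for code in clinical["codes"]:
        if code.get("type") == preferred_type:
            return code
    return None


glucose = parse_clinical_code(CODE_XML)
loinc = pick_code(glucose, "LOINC")
```

A consumer that understands LOINC uses the coded value; one that does not still has the display text and the source-specific code, so nothing is lost either way.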

Each of these techniques is designed to allow us to hold COMPLETE information first, enable flexible representation of STRUCTURE and CODING, and ease the burden on CONSUMERS of the data where possible. That last requirement is the one that will “give” when needed, because if we have all the data --- we can always improve and reinterpret it over time.

Item Provenance

Especially in a consumer-controlled environment, the ability to track the provenance of data is very important. Internally, HealthVault maintains a full audit log of changes to information (see the <created> and <updated> metadata in the first fragment above) --- but as a more permanent provenance mechanism the platform allows digital signatures to be applied to any data atom. The fragment below shows a sample signature block from a real HealthVault item.

There is no reason that this mechanism could not be applied within the PCAST context. As certificates are becoming more prevalent for systems such as Direct and Federal identity initiatives, the ability to trace the integrity of information back to its source will become more and more important.

<signature-info>
<sig-data>
<hv-signature-method>HVSignatureMethod1</hv-signature-method>
<algorithm-tag>rsa-sha1</algorithm-tag>
</sig-data>
<signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<signedinfo>
<canonicalizationmethod algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
<signaturemethod algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<reference uri="">
<transforms>
<transform algorithm="http://www.w3.org/TR/1999/REC-xslt-19991116">
<xs:transform version="1.0" xmlns:xs="http://www.w3.org/1999/XSL/Transform">
<xs:template match="thing">
<xs:copy-of select="data-xml"/>
<xs:value-of select="data-other"/>
</xs:template>
</xs:transform>
</transform>
</transforms>
<digestmethod algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<digestvalue>/6KMZjPtWyF4wE2nq0yvrLm4p5c=</digestvalue>
</reference>
</signedinfo>
<signaturevalue>z/aYWJKGywqpjb+WNfqORgVJNr72rNb6jT9BRzNNtBMXEZKQtL9NvqQCdkpQbFyO9w61FXgCPqWEIG41dmrZkNFgMBTzEQFniNCUQYZXrap8Rpu7gCEhWucbIbjL0aXuaVXnLxd6Iz0I3JwyTBoRF037Q1DXduamJQQdBc204VI=</signaturevalue>
<keyinfo>
<x509data>
<x509certificate>MIIE5zCCA8+gAwIBAgIQeukUhvXaUhvkc+GQ+lPNxTANBgkqhkiG9w0BAQUFADCB3TELMAkGA1UEBhMCVVMxFzAVBgNVBAoTDlZlcmlTaWduLCBJbmMuMR8wHQYDVQQLExZWZXJpU2lnbiBUcnVzdCBOZXR3b3JrMTswOQYDVQQLEzJUZXJtcyBvZiB1c2UgYXQgaHR0cHM6Ly93d3cudmVyaXNpZ24uY29tL3JwYSAoYykwNTEeMBwGA1UECxMVUGVyc29uYSBOb3QgVmFsaWRhdGVkMTcwNQYDVQQDEy5WZXJpU2lnbiBDbGFzcyAxIEluZGl2aWR1YWwgU3Vic2NyaWJlciBDQSAtIEcyMB4XDTA4MDEyNDAwMDAwMFoXDTA5MDEyMDIzNTk1OVowggEeMRcwFQYDVQQKEw5WZXJpU2lnbiwgSW5jLjEfMB0GA1UECxMWVmVyaVNpZ24gVHJ1c3QgTmV0d29yazFGMEQGA1UECxM9d3d3LnZlcmlzaWduLmNvbS9yZXBvc2l0b3J5L1JQQSBJbmNvcnAuIGJ5IFJlZi4sTElBQi5MVEQoYyk5ODEeMBwGA1UECxMVUGVyc29uYSBOb3QgVmFsaWRhdGVkMTQwMgYDVQQLEytEaWdpdGFsIElEIENsYXNzIDEgLSBNaWNyb3NvZnQgRnVsbCBTZXJ2aWNlMR4wHAYDVQQDFBV2YWxpZHRlc3QgY2VydGlmaWNhdGUxJDAiBgkqhkiG9w0BCQEWFWpvbGVhcnlAbWljcm9zb2Z0LmNvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEA1DRlI13wp313MIeaqb/VN+QfZgpXujJ99Qfm2pnPYjQnEw/PF5zQSf48A/SQnVaJrVuzyCv+y26N6HI6jGopfPkVseKMzmD+zoTMqzvLB1zs8B+nE9ZvDQoID5ZEZd+5NR8WGlIaj0KGch8SrV4FsnxKtbnd0UONscb8yu6hTNMCAwEAAaOB4jCB3zAJBgNVHRMEAjAAMEQGA1UdIAQ9MDswOQYLYIZIAYb4RQEHFwMwKjAoBggrBgEFBQcCARYcaHR0cHM6Ly93d3cudmVyaXNpZ24uY29tL3JwYTALBgNVHQ8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwQGCCsGAQUFBwMCMBQGCmCGSAGG+EUBBgcEBhYETm9uZTBKBgNVHR8EQzBBMD+gPaA7hjlodHRwOi8vSW5kQzFEaWdpdGFsSUQtY3JsLnZlcmlzaWduLmNvbS9JbmRDMURpZ2l0YWxJRC5jcmwwDQYJKoZIhvcNAQEFBQADggEBAFJPND6M7LXHNtNKUibt9rdOGS4QjVyttyEHVEasb0NCH+48mZCbdZa4KHEoo5wmhwusJwZ5M6B1rqgG7WVlb6v0qBIUYrnxAf7B4YSXQVHHhDV6Firqbv2PYtxHm0rJ+I7blVOhlaftGrcJ8a9gmJyxK/jwjMekrp7jsVI8iYR/XyYNphToe+HIGFzXjOkDWWrFTEXGhOVKi/+qRk5jOHWW9RP6MsoizJxGRt/t4CZn0W0S+CvMH9JE55pJLx3B22ItY5X0WotYVLr+h7wB4BVn7NJAORWSAEzHq1vzXBcNklFuX7G4PNrIkHU7ixlv1zSgQYOZugjmMJhKR+PEovo=</x509certificate>
</x509data>
</keyinfo>
</signature>
</signature-info>
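The digest step in a block like this can be illustrated with the Python standard library. The sketch is deliberately limited and rests on several assumptions: it uses the C14N 2.0 canonicalization that ships with Python 3.8+ (with its optional `strip_text` setting) rather than the C14N 1.0 and XSLT transform named in the fragment, it does not verify the RSA signature or the certificate, and the function names are hypothetical. Its digests will therefore not match the value shown above. SHA-1 appears only because the fragment names it.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def digest_of(xml_text):
    """Canonicalize the XML, then return a base64-encoded SHA-1 digest of it."""
    # Canonicalization removes differences that do not change meaning,
    # so equivalent serializations produce the same bytes to hash.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    raw = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


def is_unchanged(xml_text, recorded_digest):
    """Check a data atom against the digest recorded when it was signed."""
    return digest_of(xml_text) == recorded_digest


ORIGINAL = (
    "<data-xml><blood-pressure><systolic>116</systolic>"
    "<diastolic>72</diastolic></blood-pressure></data-xml>"
)
# Same content, different whitespace: should hash identically.
REINDENTED = """<data-xml>
  <blood-pressure>
    <systolic>116</systolic>
    <diastolic>72</diastolic>
  </blood-pressure>
</data-xml>"""
# One digit transposed: should not.
TAMPERED = ORIGINAL.replace("116", "161")

recorded = digest_of(ORIGINAL)
```

In a full signature, the digest is what gets signed with the private key, so a receiver who trusts the certificate can confirm both who produced the data atom and that it has not changed since.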

Privacy Intent vs. Anonymization

The PCAST document includes a recommendation to apply “patient privacy wishes” as part of the common metadata that would travel with each data atom. A number of organizations have worked to define “languages” by which this can be represented, and there is no reason to think that information could not easily be transmitted along with any other metadata. Enforcing those wishes in the diverse healthcare IT environment, however, is a very daunting challenge. Digital rights management technology has advanced significantly over the past few years, in particular for situations where attacks are spread across many targets rather than concentrated on one. That is --- a community of thousands may attack DRM on a newly-released movie; many fewer are likely to be targeting any single data atom. Still, this seems like a recommendation that might be considered more directional than immediate.

An alternative --- especially in the case of secondary use --- may be to start with anonymization. If the identifying elements in each data atom are masked and/or skewed before they are submitted to the DEAS environment, there may be an option here that would kickstart the ecosystem without having to solve all of the technical problems at once.
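One way to picture “masked and/or skewed” is the Python sketch below. It is an assumption-laden illustration rather than a vetted de-identification method: the attribute names come from the EAV example earlier, while the function name, the secret, the `Pseudonym` attribute, and the 30-day skew window are invented for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Attributes that identify the patient, and everything that gets removed.
IDENTITY_ATTRIBUTES = ("Given Name", "Family Name")
MASKED_ATTRIBUTES = IDENTITY_ATTRIBUTES + ("Device Serial Number",)


def anonymize(atom, secret, max_skew_days=30):
    """Return a copy of the atom with identifiers masked and its date skewed."""
    identity = "|".join(str(atom.get(name, "")) for name in IDENTITY_ATTRIBUTES)
    mac = hmac.new(secret, identity.encode("utf-8"), hashlib.sha256).digest()

    # Masking: drop the identifying attributes and add a keyed pseudonym, so
    # atoms from the same patient can still be linked without naming the patient.
    result = {k: v for k, v in atom.items() if k not in MASKED_ATTRIBUTES}
    result["Pseudonym"] = mac[:8].hex()

    # Skewing: shift dates by a per-patient offset in [-max_skew_days, +max_skew_days].
    # The offset is fixed for a patient, so intervals between readings survive.
    offset = int.from_bytes(mac[8:12], "big") % (2 * max_skew_days + 1) - max_skew_days
    if isinstance(result.get("Timestamp"), date):
        result["Timestamp"] += timedelta(days=offset)
    return result


atom = {
    "Type": "Blood Pressure",
    "Timestamp": date(2007, 10, 22),
    "Given Name": "John",
    "Family Name": "Halamka",
    "Systolic": 116,
    "Diastolic": 72,
    "Device Serial Number": "0123456789",
}
masked = anonymize(atom, secret=b"hypothetical-site-secret")
```

Because the secret never leaves the submitting site, the receiving environment can group and trend readings per pseudonym without being able to reverse them to a name.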

Both Amalga and HealthVault represent privacy and security within their own environments --- that is, there is full control over who sees what information, but once the information is disclosed it is not tracked further. So these comments are speculative only and not the result of our direct experience.

I hope these samples and thoughts are helpful to the committee as it does its work. I would be more than happy to continue the discussion, answer questions and offer clarification of anything in the above. Note you can also read more about the HealthVault data model and see samples at the following links:

* http://msdn.com/healthvault
* http://developer.healthvault.com/types/types.aspx

I’ve also posted a few different blog entries about Amalga and its internal data structures. This post is a good start."
