By Yvonne Tevis. Patient data – it’s a gold mine for health research. Without patient data, researchers can’t analyze conditions, explore treatment efficacy, consider new models of personalized care…
And in today’s fast-paced and competitive world of health research, researchers and institutions have to jump quickly and explore emerging hypotheses. They would love to have the wealth of a full clinical record at their fingertips. But how can they get that data – and still protect the privacy of patients?
Rick Larsen, UCSF director of research informatics, said privacy is both paramount and a barrier. At UC, researchers must seek approval from the campus’s Institutional Review Board (IRB) to use patient data. While important, that review can delay the research and greatly limit access to the data. He said “the general rule is that they get the minimum data necessary to do their research.”
A cross-departmental team at UCSF wanted to give researchers faster access to more clinical data while still protecting patient privacy. And so the UCSF De-Identified Clinical Data Warehouse (CDW) was born. With support and participation from UCSF’s Institute for Computational Health Sciences, Clinical and Translational Science Institute, IT department, and privacy office, the team has developed a unique process to de-identify the entire patient record and make that full record – not just limited elements – available to research.
The premise is that if the complete clinical record can be provided without being linked to patient names and other demographic information, researchers can use it without IRB review. Team member David Dobbs, UCSF executive director of data analytics, said the goal was “protecting patient privacy while still enabling research.”
The process to de-identify the patient record is more complicated than just removing the patient’s name and address. For example, identifiable data might pop up in unexpected places, such as laboratory test codes. To solve this, the team developed probes that find patient demographic information wherever it appears so it can be scrubbed from the data going to the De-Identified CDW.
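The idea of probing every field for stray identifiers can be illustrated with a minimal Python sketch. This is not the UCSF team's implementation, which the article does not describe in detail; the function names, the record layout, and the `[REDACTED]` placeholder are all hypothetical, and the sketch assumes the patient's known identifiers are available as plain strings.

```python
import re


def find_identifier_leaks(record, identifiers):
    """Return {field: [identifiers found]} for every field that leaks one.

    Hypothetical helper: checks every field, not just the obvious
    name and address columns, using a case-insensitive literal match.
    """
    leaks = {}
    for field, value in record.items():
        text = str(value)  # treat every value as text for this sketch
        hits = [
            ident for ident in identifiers
            if re.search(re.escape(ident), text, re.IGNORECASE)
        ]
        if hits:
            leaks[field] = hits
    return leaks


def scrub(record, identifiers, placeholder="[REDACTED]"):
    """Return a copy of the record with each known identifier replaced."""
    cleaned = {}
    for field, value in record.items():
        text = str(value)
        for ident in identifiers:
            text = re.sub(re.escape(ident), placeholder, text,
                          flags=re.IGNORECASE)
        cleaned[field] = text
    return cleaned


# Made-up example: an identifier hiding inside a lab test code.
record = {
    "lab_test_code": "GLU-JSMITH-01",
    "result": "5.4",
    "note": "Seen with Jane Smith present",
}
identifiers = ["JSMITH", "Jane Smith"]

leaks = find_identifier_leaks(record, identifiers)
cleaned = scrub(record, identifiers)
```

A literal string match like this only catches identifiers that are already known; a production process would need far broader checks, which is presumably why the team relies on external audits.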
They take other steps to protect privacy as well, for example, changing the dates of patient care while preserving the actual time span of care, which is important in research. Dobbs said, “We uniformly shift each patient’s dates so the sequencing of care remains the same.”
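Uniform date shifting can be sketched in a few lines of Python. The article does not say how the team picks each patient's offset, so the approach here (deriving a stable offset from a hash of the patient ID and a secret) is an assumption, as are the names `patient_offset` and `shift_dates` and the 365-day maximum. The point it shows is that applying one offset to all of a patient's dates hides the real dates while leaving the intervals between events unchanged.

```python
import hashlib
from datetime import date, timedelta


def patient_offset(patient_id, secret, max_days=365):
    """Derive a stable backward shift of 1..max_days days for one patient.

    Assumption for this sketch: the offset comes from a hash of a secret
    plus the patient ID, so the same patient always gets the same shift.
    """
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    return timedelta(days=-days)


def shift_dates(patient_id, dates, secret):
    """Shift every date for a patient by that patient's single offset."""
    offset = patient_offset(patient_id, secret)
    return [d + offset for d in dates]


# Made-up visit dates for one hypothetical patient.
visits = [date(2016, 3, 1), date(2016, 3, 15), date(2016, 6, 2)]
shifted = shift_dates("patient-001", visits, secret="demo-secret")

# Gaps between consecutive visits, before and after shifting.
original_gaps = [b - a for a, b in zip(visits, visits[1:])]
shifted_gaps = [b - a for a, b in zip(shifted, shifted[1:])]
```

Because every date moves by the same amount, `shifted_gaps` equals `original_gaps`, so the order and spacing of care survive even though no shifted date matches a real one.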
The team is actually creating two datasets. The first is the de-identified version for self-service access. The second is a limited dataset that contains the real, unshifted dates of care, which are essential for certain research topics, such as seasonal allergies. Because this latter dataset has very limited patient identification fields, its use can more easily get IRB approval.
So is the CDW really private? The team is working with three external companies approved by UCSF’s privacy office to perform audits of both the de-identified data itself and the process the team uses. “The goal,” Dobbs said, “is not to have a one-time data set but to refresh every month, and so we want not to have to be re-audited.” As long as the parameters of what is included in the data warehouse don’t change, he said, the auditors’ privacy certification will stay in effect.
The team’s vision is novel in the world of academic health research. Other sites that provide data via a self-service tool provide only counts, such as the number of people with diabetes during a certain time period. Because the De-Identified CDW provides the source data, researchers can delve into much deeper questions.
Now, as the team wraps up the certification process, they’re looking to expand the information they include to cover images and notes. “There’s a big vision here,” Larsen said. “If you look at that whole vision, it is well beyond what anybody else is doing.”