First, Do No Harm: Can Privacy and Advanced Information Technology Coexist?

February 12, 2004

The following is an article I wrote last year for EContent magazine that delves into some of the privacy issues raised by the growth in scope, depth, and power of database technology. While it specifically discusses medical privacy and the HIPAA act, its broader questions apply to many kinds of content and information.

Advances in networking and database technology have brought vast amounts of data together, and as search and querying technology improves, these vast stores of data become increasingly meaningful to even the casual user. In the right hands, such networked data and content can be invaluable--to the doctor who needs vital patient records, or the security analyst who wants to glean some intelligence from financial records. But for all its potential good use, the same data has great potential for misuse--either inadvertent or intentional. Mishaps have already happened and, while policies are in place (and new ones are soon to be implemented), the risk remains. Perhaps the real question, then, is whether technology to preserve privacy can advance as quickly as the technology that seems to be putting privacy at risk.

In late December 2002, the U.S. Department of Defense reported that its efforts to computerize the medical records of military personnel were set back when hard drives containing the records of a half-million personnel were stolen. The records included names, social security numbers, and medical claims histories. According to the Associated Press, the Defense Department had seen the new computerized system "as a potential 'data gold mine' for military physicians and other healthcare professionals that will provide quick and easy access to military patient records worldwide."

While this is perhaps the most spectacular recent privacy breach, it is not the only one. According to news accounts, patient record information has been compromised at a major pharmaceutical chain, a health insurance company, and an online retailer of healthcare products, to name a few places. In each of these cases, the compromise has been inadvertent: in one case, information was emailed to the wrong parties, and in another case, sensitive information was accidentally posted to a public Web site. But when these accidental disclosures are considered in light of the Defense Department theft and some well-publicized security breaches at ecommerce companies, the concern begins to grow.

Indeed, many would argue that, when it comes to medical records, any compromise is unacceptable and that every reasonable effort should be made to safeguard such data. To that end, the federal government is mandating the enforcement of new patient privacy rules under the Health Insurance Portability and Accountability Act of 1996 (HIPAA). HIPAA is a broad law that called upon Congress to delineate what rights patients have to control their own medical information, and what procedures and mechanisms would be followed for appropriate sharing of that information. The result is a broad set of regulations to be followed by healthcare providers, insurers, and related organizations such as medical researchers--anyone who handles patient information.

PRIVACY IS FUNDAMENTAL

The assumption behind the protection of medical record information is that privacy is a fundamental right. In announcing the HIPAA regulations, the U.S. Department of Health and Human Services recognized that the new regulations would come at significant cost to the healthcare industry, but pointed out, "it is important not to lose sight of the inherent meaning of privacy: it speaks to our individual and collective freedom." While this may seem like lofty language, the department cites the same basis that many privacy organizations and advocates do--the Fourth Amendment guarantee that "the right of the people to be secure in their persons, houses, papers and effects, against unreasonable searches and seizures, shall not be violated."

To this end, HIPAA and its regulations seek to control how patient information is collected, safeguarded, and used over time. The overarching requirement is in some ways obvious: only clinicians with a need to know--and to whom you have granted access--should have access to your medical information. But the actual implementation is complex, as more information is digitized, as more systems are interconnected, and as increasingly powerful tools for querying become available.

But the real tension between privacy and usefulness stems from the basic requirement for automating patient information in the first place--to give clinicians ready access to the information they need to make on-the-spot, critical decisions. "It's a balance between confidentiality and ease of use," notes Dr. John Halamka, who as both a practicing physician and CIO for a Boston-area hospital group has a comprehensive view of the problem. In describing the tools they have developed at CareGroup Health System, Halamka talked about "including knowledge in the workflow" for an application such as order entry. Halamka offered the example of a doctor who is prescribing a hypertension drug for a diabetic, where the doctor would ideally have the patient's latest lab results as well as recent and relevant research about the medication "in the context of taking the action."

Again, while the requirement is in some ways obvious, the implementation is likely complex. To begin with, doctors operate in an information-saturated world. Primary medical research alone is a deluge of information. Halamka points out that if doctors took time out "to read eight research articles a night, they would be 800 years behind after one year." To solve that problem, Halamka's technology team at CareGroup gives clinicians access to databases such as Uptodate.com, where experts in the field read, abstract, and summarize the world's literature.

Moreover, even an individual patient's record may be lengthy and complex, and, depending on the action being taken at a given time, the clinician likely needs selected information rather than every detail about that patient. Halamka notes that the same doctor prescribing a hypertension drug would indeed want recent lab results, but would likely not need to read a summary from a recent psychological visit.

The key, then, is to provide authorized clinicians with precisely the information they need, when they need it--but only the precise information they need, so that privacy is not compromised. In an environment such as CareGroup, which deals with 40 terabytes of patient information, such careful handling requires a team of 16 data analysts who provide the necessary views, reports, and query tools for the clinicians to use. Depending on the nature of the query, some reports would need to be stripped of identifying patient information, for example, and others might need to generalize the results so no specific patient information could be inferred. In addition, Halamka emphasized the tools "need to recognize roles and rights based on clinical needs." A query that is appropriate for one clinician to perform may not be appropriate for another. Halamka noted that an emergency room doctor, for example, might need ready access to a broad set of patient information. The tools, Halamka continued, "should allow you to do your job while also protecting the patient."
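
As a rough illustration of what "roles and rights" might look like in practice, the sketch below filters a patient record by role and generalizes one field for research use. It is not CareGroup's implementation; the role names, field names, and the ten-year age bands are all assumptions made for the example.

```python
# Hypothetical role-to-field policy; the roles and field names are illustrative only.
ROLE_FIELDS = {
    "prescriber": {"name", "age", "lab_results", "medications"},
    "emergency": {"name", "age", "lab_results", "medications",
                  "allergies", "psych_summary"},
    "researcher": {"age", "lab_results", "medications"},  # no direct identifiers
}


def generalize_age(age: int) -> str:
    """Replace an exact age with a ten-year band, e.g. 47 -> '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields the role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    view = {key: value for key, value in record.items() if key in allowed}
    # Researchers get a generalized age so individuals are harder to single out.
    if role == "researcher" and "age" in view:
        view["age"] = generalize_age(view["age"])
    return view


# Made-up sample record, used only to exercise the functions above.
record = {
    "name": "Jane Doe",
    "age": 47,
    "lab_results": {"hba1c": 7.1},
    "medications": ["metformin"],
    "allergies": ["penicillin"],
    "psych_summary": "notes from a recent visit",
}

prescriber_view = view_for_role(record, "prescriber")
researcher_view = view_for_role(record, "researcher")
```

Here the prescriber sees lab results but not the psychological summary, and the researcher sees neither the name nor the exact age. A production system would presumably drive such rules from policy and consent records rather than a hard-coded table.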


IS TODAY'S TECHNOLOGY UP TO THE TASK?

Given the complexity of maintaining patient privacy in an increasingly digital world, it's reasonable to ask if the technology can support the requirement for privacy while also giving clinicians access to the information they need. Practitioners like Halamka would answer in the affirmative--"We do our very best with the tools we have"--but HIPAA compliance comes at a cost. (The Department of Health and Human Services estimates it will cost the industry $17 billion over ten years to implement the HIPAA privacy regulations.)

Some of the cost of HIPAA compliance is the human cost for the data curation work done at places like CareGroup. Other costs come as organizations integrate privacy software with patient record systems. At least one interested party, though, thinks the eventual solution to the patient privacy issue may involve a new approach to database technology itself. Researchers at IBM's Almaden Research Center in San Jose have been developing the technology behind Hippocratic Databases--databases that, according to IBM Fellow Dr. Rakesh Agrawal, support the primary mission of patient care while taking "responsibility for the data that they manage to prevent disclosure of private information."

Agrawal is widely recognized as a leading thinker in the field of datamining--the discovery of useful knowledge previously hidden in massive amounts of raw data--and has been writing about privacy issues for several years. Agrawal's idea of Hippocratic Databases presumes a system where "contracts" are created between databases and users to ensure the privacy and integrity of data. "This contract system is based on 10 principles," notes Agrawal, "including stipulations that the information will be kept accurate and up-to-date, the data is used solely for what it was specifically collected for, and the data is only retained for as long as it is needed."
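
Two of the stipulations Agrawal mentions, use only for the purpose of collection and retention only as long as needed, can be sketched in a few lines. This is an illustration of the idea rather than IBM's design; the `StoredItem` class, the `read_item` function, and the purpose labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StoredItem:
    value: str
    purposes: frozenset    # purposes the data was collected for (assumed labels)
    collected_on: date
    retention: timedelta   # how long the data may be kept


def read_item(item: StoredItem, purpose: str, today: date) -> str:
    """Release the value only for a stated purpose and within the retention period."""
    if today > item.collected_on + item.retention:
        raise PermissionError("retention period has expired")
    if purpose not in item.purposes:
        raise PermissionError(f"purpose {purpose!r} was not agreed to")
    return item.value


# Made-up sample item, collected for treatment and kept for one year.
item = StoredItem(
    value="blood pressure 130/85",
    purposes=frozenset({"treatment"}),
    collected_on=date(2003, 1, 1),
    retention=timedelta(days=365),
)
```

A request for the purpose "treatment" in mid-2003 would succeed, while a request for "marketing", or any request after the retention period, would be refused. The point is that the check lives with the data rather than in each application.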

FIRST, DO NO HARM

"Whatever, in connection with my professional practice, or not in connection with it, I see or hear, in the life of men, which ought not to be spoken of abroad, I will not divulge, as reckoning that all such should be kept secret."

--Hippocratic Oath

Agrawal's interest in privacy and databases stems from his long and serious work in datamining. At various times, datamining has been viewed as problematic because of potential privacy concerns, and the topic has been frequently discussed at conferences where Agrawal was a speaker. Attending a conference in 1995, Agrawal was struck by a question from the audience, "Can technologists change the attitude that we are not responsible for the consequences of technology?" Agrawal admits, "the question stuck with me," and it motivated him to keep thinking about this issue of privacy. In August 2002, Agrawal and several colleagues from IBM presented a paper, "Hippocratic Databases," at the 28th Annual VLDB Conference in Hong Kong.

"We saw it as a call to the industry," said Agrawal, and the paper's introduction said, "We suggest that the database community has an opportunity to play a central role in this crucial debate involving the most cherished of human freedoms by re-architecting our database systems to include responsibility for the privacy of data as a fundamental tenet." And while patient record information is the most obvious and important problem, Agrawal is well aware that privacy extends to many other areas--finance immediately comes to mind. "Five years from now," according to Agrawal, "information about animate things in databases will completely dwarf information about inanimate things." Moreover, Agrawal suggests the logic of managing this animate information is very different, and privacy is just one issue that presents technical challenges to today's databases.

The challenges begin with how privacy clashes with some of the fundamental benefits of a traditional database, such as concurrency and recall. Databases are very good at capturing and committing records, and then immediately making these records available in views, query results, and reports. But, as Agrawal suggests, Hippocratic databases likely require more emphasis on "consented sharing" than on concurrency.

There are database technologies in use today that support privacy, but Agrawal would argue that they either don't go far enough or they don't support the kind of use cases that Hippocratic databases require. Medical researchers, for example, rely on statistical databases to provide meaningful answers to statistical questions (average, maximum, minimum, etc.) without compromising sensitive information about individuals. Statistical databases use techniques such as restricting types of queries and "data perturbation"--where noise is added or selected values are swapped. While Hippocratic databases would benefit from some of these statistical techniques, Agrawal and his colleagues point out that Hippocratic databases will need to support a much broader set of queries and usage.
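
The two statistical-database techniques mentioned above can be sketched briefly. The first function refuses to answer when too few records match (one simple form of query restriction); the second adds random noise to each value (one simple form of data perturbation). The threshold, the noise scale, and the function names are assumptions for illustration, not parameters from any particular product.

```python
import random
from statistics import mean

MIN_QUERY_SET = 5  # hypothetical threshold, chosen only for illustration


def restricted_mean(values, min_set=MIN_QUERY_SET):
    """Answer an average only if enough records contribute to it."""
    if len(values) < min_set:
        raise ValueError("query set too small to answer safely")
    return mean(values)


def perturb(values, scale, rng):
    """Return a copy of the values with zero-mean Gaussian noise added to each."""
    return [v + rng.gauss(0, scale) for v in values]


# Made-up readings: individual values are distorted, but the average barely moves.
rng = random.Random(0)
readings = [120.0] * 1000
noisy = perturb(readings, 1.0, rng)
average_of_noisy = restricted_mean(noisy)
```

Because the noise averages out over many records, aggregate answers stay useful while any single reported value is unreliable. That trade-off also suggests why such techniques suit statistical questions better than the broader queries a Hippocratic database would need to support.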

Security and encryption technologies are also increasingly in use with databases. Agrawal notes that databases can apply multiple levels of security to database items--e.g., top secret, secret, confidential, and so forth. To date, though, these techniques have been implemented in ways that can make query results uneven or inaccurate--a "top secret" query could leave "confidential" records unreported, for example. "Many of our architectural ideas about Hippocratic databases have been inspired by this [security] work," wrote Agrawal and his colleagues.

THE HIPPOCRATIC DATABASE

IBM's model for privacy-savvy databases may well have been inspired by the Hippocratic oath, but the principles of how to handle private information are broadly understood and articulated. Regulations in the United States and elsewhere in the world are largely based on the idea of "Fair Information Practices." These practices stem from the set of principles established in 1980 by the Organization for Economic Co-operation and Development (OECD). While the OECD delineated eight principles (which many countries have used to develop legal guidelines for the collection and use of personal information), IBM's researchers cite ten, which cover how the data shall be used, disclosed, retained, and safeguarded.

Along with these principles, Agrawal and his colleagues offer a strawman design and a set of use cases for how Hippocratic databases could be tested. The response has been enthusiastic, according to Agrawal, and has bolstered his conviction that "We can build the datamining models while still preserving the privacy of individuals." For Agrawal, it's a case of "the promise of the technology versus the risk, and the technical community can help reduce the risk."

Posted by Bill Trippe at February 12, 2004 7:57 PM
