HIDE™: Health Information DE-identification

Project Overview
Health informatics is receiving a tremendous amount of attention, nationally and locally, as a strategic area of technological development in the 21st century. Recent provisions for standardizing health care transactions will make it faster and easier to share health information. However, such data sharing has been stymied by concerns over the privacy, security, and quality of the data.

The HIDE project aims to develop a configurable and integrated Health Information DE-identification (HIDE™) framework for publishing and sharing health data while preserving data privacy. Current research thrusts include:

  1. De-identification or anonymization of unstructured (text) and structured medical records using HIPAA Safe Harbor methods as well as statistical de-identification methods.
  2. Release of statistical health information as data cubes with differential privacy.
  3. De-identification and statistical data release for distributed data while preserving privacy for data subjects as well as confidentiality for data providers.

The outcome of the project includes a suite of algorithms and techniques as well as a set of open source software tools that will allow medical information service providers and computer science researchers to manage and share privacy-constrained data more effectively and efficiently. While the project is focused on the health domain, the algorithms and techniques are widely applicable in various application domains.
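To make the second research thrust concrete, the following is a minimal sketch of releasing a small data cube of counts with differential privacy. It is not the HIDE implementation; it uses the standard Laplace mechanism as a stand-in. The function names (`laplace_noise`, `noisy_cube`), the record layout, and the attribute domains are hypothetical and chosen only for illustration. The sketch assumes that neighboring datasets differ by adding or removing one record, so each record affects exactly one cell count by 1.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cube(records, dimensions, domain, epsilon, rng=None):
    """Return {cell: noisy count} for every cell of the cube.

    Hypothetical helper, not part of HIDE.
    records    -- list of dicts, one per individual
    dimensions -- tuple of attribute names that define the cube
    domain     -- dict mapping each attribute to its list of possible values
    epsilon    -- privacy budget (smaller means more noise)
    """
    rng = rng or random.Random()
    counts = Counter(tuple(r[d] for d in dimensions) for r in records)
    # Assumption: one record changes one cell by 1, so sensitivity is 1.
    scale = 1.0 / epsilon
    # Iterate over the full domain, not only observed cells, so that
    # empty cells also receive noise and do not reveal absence.
    cells = itertools.product(*(domain[d] for d in dimensions))
    return {cell: counts.get(cell, 0) + laplace_noise(scale, rng) for cell in cells}


# Illustrative, made-up records.
records = [
    {"sex": "F", "age_group": "60+"},
    {"sex": "F", "age_group": "60+"},
    {"sex": "M", "age_group": "18-59"},
]
domain = {"sex": ["F", "M"], "age_group": ["18-59", "60+"]}

cube = noisy_cube(records, ("sex", "age_group"), domain, epsilon=0.5)
for cell, value in sorted(cube.items()):
    print(cell, round(value, 2))
```

The noisy counts can be negative or non-integer; whether to post-process them (for example, by rounding or clamping) is a design choice that does not affect the privacy guarantee.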


Acknowledgement
This research is supported in part by a Career Enhancement Fellowship from the Woodrow Wilson Foundation, an IBM Faculty Innovation Award, and the University Research Committee, ITSC, and CCI at Emory. Any opinions, findings, and conclusions or recommendations expressed in the project material are those of the authors and do not necessarily reflect the views of the sponsors.