Accessing HIDE Website

HIDE can run either on the built-in Django web server or under Apache. For production use, it is recommended that HIDE be configured with Apache and WSGI. In either case, this document assumes that HIDE can be accessed at http://localhost:8000/hide/. After pointing the browser to http://localhost:8000/hide/, the following page should appear:

The page offers the following options: Label Documents, Anonymize Documents (still an early prototype), Import Documents, Export Documents, List All Documents, Analysis, Evaluate, and Settings (not yet implemented; all settings are still configured in the XML configuration file).
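Before working through the pages below, it can help to confirm from a script that the HIDE site answers at the address this document assumes. The following is a minimal sketch using only the Python standard library. The helper name `hide_is_up` is hypothetical and not part of HIDE; it only checks that the front page returns HTTP 200 and does not inspect its contents.

```python
import http.client
import urllib.request

# Address assumed throughout this document; adjust it if HIDE runs
# elsewhere (for example, behind Apache on a different host or port).
HIDE_URL = "http://localhost:8000/hide/"


def hide_is_up(url: str = HIDE_URL, timeout: float = 5.0) -> bool:
    """Return True if the page at `url` answers with HTTP 200.

    Hypothetical helper, not part of HIDE: it only checks reachability.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError, so this covers
        # refused connections, DNS failures, timeouts, and non-2xx statuses.
        return False


if __name__ == "__main__":
    print("HIDE reachable:", hide_is_up())
```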

Import Documents

HIDE needs a set of reports in order to operate. It provides several methods for importing data into the system, including XML, HL7, copy-and-paste, and more. Note: if you need help determining how to load your data into HIDE, please don't hesitate to contact James Gardner. After clicking Import Documents, the following page should appear:

More details on the XML format are coming; in the meantime, you can look through the code for examples of each import format. To quickly upload a single record, use the Copy and Paste a Single Record option, which lets you assign a title, tags, and the text of a document to be stored in the HIDE database. After clicking Copy and Paste a Single Record, the page should look like the following:

After you fill out all the fields and click "Upload Record" (not pictured, because the button is at the bottom of the page), the record is added to the database.
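The Copy and Paste a Single Record form collects three pieces of information: a title, tags, and the document text, all of which must be filled out. The sketch below models such a record in Python and checks it before upload. It is illustrative only: the `Record` class, the `parse_tags` helper, and the assumption that tags are entered as a comma-separated list are hypothetical and not taken from HIDE's code.

```python
from dataclasses import dataclass, field
from typing import List


def parse_tags(raw: str) -> List[str]:
    """Split a tag string into an order-preserving list without duplicates.

    Assumption (not from HIDE): tags are separated by commas; surrounding
    whitespace is ignored and empty entries are dropped.
    """
    tags: List[str] = []
    for tag in raw.split(","):
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


@dataclass
class Record:
    """Hypothetical in-memory form of a single pasted record."""
    title: str
    text: str
    tags: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # All fields on the form should be filled out before uploading.
        if not self.title.strip():
            raise ValueError("title is required")
        if not self.text.strip():
            raise ValueError("text is required")
        if not self.tags:
            raise ValueError("at least one tag is required")


record = Record(
    title="Sample report",
    text="Sample report text.",
    tags=parse_tags("sample, training ,sample"),
)
record.validate()
print(record.tags)  # ['sample', 'training']
```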

Label Documents

The Label Documents page provides a way to view, label, and train models on the documents associated with a particular tag (in the style of del.icio.us bookmark tags). After clicking Label Documents, the following page should appear:

Clicking a tag in the left-hand box displays the list of documents that have that tag, as follows:

From this screen you may Label all of the documents using a chosen CRF (conditional random field) model, Train a CRF to predict the labels already marked in the set, De-Identify the records according to the labels in the text, and Save all of the documents in feature-vector format into the configured CRF directory. Saving is only necessary if you want a sequence-labeling representation of the data to test with other machine learning software or techniques. Clicking one of the titles in the table displays the contents of that document. Note that in this image the report has already been labeled and saved.
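HIDE's own feature set and file layout for the Save option are not described here. As a rough illustration of what a sequence-labeling representation looks like, the sketch below converts a document with character-offset labels into a common CRF++-style column format: one token per line, tab-separated features, the label in the last column, and a blank line between documents. The tokenizer, the three features, and the `O` label for unlabeled tokens are all assumptions; the functions are hypothetical, not HIDE's exporter.

```python
import re
from typing import List, Tuple

# A labeled span: (start offset, end offset, label), end exclusive.
Span = Tuple[int, int, str]

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_feature_lines(text: str, spans: List[Span]) -> List[str]:
    """Emit one tab-separated line per token of `text`.

    Columns: token, lowercased token, all-digits flag, capitalized flag,
    and finally the label (the column a CRF learns to predict).
    Tokens not inside any span get the label "O".
    """
    lines = []
    for match in TOKEN_RE.finditer(text):
        start, end = match.span()
        token = match.group()
        label = "O"
        for s_start, s_end, s_label in spans:
            # A token takes a span's label if it lies entirely inside it.
            if s_start <= start and end <= s_end:
                label = s_label
                break
        features = [
            token,
            token.lower(),
            "DIGIT" if token.isdigit() else "NODIGIT",
            "CAP" if token[:1].isupper() else "NOCAP",
        ]
        lines.append("\t".join(features + [label]))
    return lines


def to_feature_file(documents: List[Tuple[str, List[Span]]]) -> str:
    # Each document is one sequence; sequences are separated by a blank line.
    blocks = ["\n".join(to_feature_lines(text, spans)) for text, spans in documents]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_feature_file([("MRN 12345 seen", [(4, 9, "id")])]))
```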

Clicking "Clear" at the bottom of the page removes all of the labels associated with the document.

Labeling the document is as simple as selecting the text and then clicking on the label you would like to assign to the text.

For example, clicking "id" labels the selected text as an "id":

Removing a single associated label is as simple as clicking the "remove label" button under the label.
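Conceptually, each label created by selecting text is a labeled span of the document, "remove label" deletes one span, and "Clear" deletes all of them. The sketch below models that bookkeeping with character offsets. The `LabeledDocument` class and its methods are hypothetical illustrations, not HIDE's internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LabeledDocument:
    """Hypothetical model of a document and its labeled spans."""
    text: str
    # Each label is (start offset, end offset, label name), end exclusive.
    labels: List[Tuple[int, int, str]] = field(default_factory=list)

    def add_label(self, start: int, end: int, name: str) -> None:
        # Mirrors selecting text and clicking a label such as "id".
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection is outside the document")
        self.labels.append((start, end, name))
        self.labels.sort()  # keep labels in document order

    def remove_label(self, start: int, end: int, name: str) -> None:
        # Mirrors the "remove label" button: drops exactly one label
        # (list.remove raises ValueError if it is not present).
        self.labels.remove((start, end, name))

    def clear(self) -> None:
        # Mirrors "Clear": removes every label on the document.
        self.labels.clear()

    def labeled_text(self) -> List[Tuple[str, str]]:
        # (selected text, label name) pairs in document order.
        return [(self.text[s:e], n) for s, e, n in self.labels]


doc = LabeledDocument("MRN 12345 seen today")
start = doc.text.index("12345")
doc.add_label(start, start + len("12345"), "id")
print(doc.labeled_text())  # [('12345', 'id')]
```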

De-Identify Documents

Documents may be de-identified after they are labeled. The de-identification options are specified in the configuration file. Clicking on "De-Identify" on the top bar will display a de-identified version of the document.
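One simple way to produce a de-identified view from labeled spans is to replace each labeled span with a placeholder. The sketch below assumes two illustrative strategies, a label tag such as `[ID]` or masking with `*`. HIDE's actual options come from its configuration file and may differ; `deidentify` and its `mode` parameter are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def deidentify(text: str, spans: List[Span], mode: str = "tag") -> str:
    """Return a copy of `text` with every labeled span replaced.

    mode="tag"      replaces the span with "[LABEL]" (e.g. "[ID]")
    mode="suppress" replaces each character of the span with "*"
    Both modes are illustrative assumptions, not HIDE settings.
    Spans are assumed not to overlap.
    """
    if mode not in ("tag", "suppress"):
        raise ValueError(f"unknown mode: {mode}")
    result = text
    # Work right to left so earlier offsets stay valid after each edit.
    for start, end, label in sorted(spans, reverse=True):
        if mode == "tag":
            replacement = f"[{label.upper()}]"
        else:
            replacement = "*" * (end - start)
        result = result[:start] + replacement + result[end:]
    return result


print(deidentify("MRN 12345 seen today", [(4, 9, "id")]))
# MRN [ID] seen today
```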

Using the Health Information DE-identification (HIDE) Interface
