Accessing the HIDE Website
HIDE runs on either the built-in Django web server or Apache. For
production use, it is recommended that HIDE be configured with
Apache and WSGI. Either way, this document assumes that HIDE can be
accessed at http://localhost:8000/hide/. After pointing the browser
to http://localhost:8000/hide/, the following page should appear:
There are options for Label Documents, Anonymize Documents (still
very much a prototype), Import Documents, Export Documents, List
All Documents, Analysis, Evaluate, and Settings (not yet
implemented; all settings are still configured in the config XML
file).
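Before opening the browser, it can help to confirm that the server
is answering at the assumed address. The following is a minimal
sketch using only the Python standard library. The helper names
hide_url and is_reachable are illustrative and are not part of
HIDE; the host, port, and /hide/ path are the defaults this
document assumes.

```python
from urllib.error import URLError
from urllib.request import urlopen


def hide_url(host="localhost", port=8000):
    """Build the HIDE address this guide assumes (path /hide/)."""
    return f"http://{host}:{port}/hide/"


def is_reachable(url, timeout=3.0):
    """Return True if the server answers with a non-error status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = hide_url()
    print(url, "is reachable" if is_reachable(url) else "is NOT reachable")
```

If HIDE is served through Apache on a different host or port, pass
those values to hide_url instead of relying on the defaults.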
Import Documents
HIDE needs a set of reports in order to operate. HIDE provides
several methods for importing data into the system, including XML,
HL7, copy-and-paste, and more. Note: If you need help determining
how to load data into HIDE, please don't hesitate to contact
James Gardner. After
clicking Import Documents the following page should appear:
More details on the XML format are coming, but in the meantime you
can look through the code to see examples for each import format.
To quickly upload a single record you may use the Copy and Paste a
Single Record option. This option lets you enter the title, tags,
and text of a document to be stored in the HIDE database. After
clicking Copy and Paste a Single Record the page should look like
the following:
After filling out all the fields, click "Upload Record" to add the
record to the database (the button is not pictured because it is
at the bottom of the page).
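As a rough picture of what a single pasted record carries, the
sketch below models the three form fields (title, tags, text) in
Python. It is not HIDE's actual data model: the Record class, the
helper names, and the assumption that tags are entered as a
comma-separated string are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for one pasted record; not HIDE's model."""
    title: str
    text: str
    tags: list = field(default_factory=list)


def parse_tags(raw):
    """Split a comma-separated tag string, dropping blanks and duplicates."""
    seen = []
    for tag in raw.split(","):
        tag = tag.strip()
        if tag and tag not in seen:
            seen.append(tag)
    return seen


def make_record(title, tags, text):
    """Validate the three form fields and build a Record."""
    if not title.strip():
        raise ValueError("title must not be empty")
    if not text.strip():
        raise ValueError("text must not be empty")
    return Record(title=title.strip(), text=text, tags=parse_tags(tags))


record = make_record("Report 1", "discharge, pathology", "Patient seen on ...")
print(record.title, record.tags)
```

Tags matter later: the Label Documents page groups documents by
tag, so records that should be trained or labeled together should
share one.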
Label Documents
The Label Documents page provides a way to view, train, and label
documents associated with a particular tag (similar to tags on
Delicious). After clicking Label Documents the following page
should appear:
After clicking a tag in the left-hand box, the list of documents
that have that tag will be displayed as follows:
From this screen you may Label all of the documents using a chosen
conditional random field (CRF), Train a CRF to predict the labels
as already marked in the set, De-Identify the records according to
the labels in the text, and Save all of the documents in feature
vector format into the configured CRF directory (this is only
necessary for users who want a sequence labeling representation of
the data to test with other machine learning software or
techniques). Clicking on one of the titles in the table will
display the contents of the document. Note that in this image the
report has already been labeled and saved.
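For readers unfamiliar with the term, a sequence labeling
representation pairs each token of a document with a label. The
sketch below illustrates the general idea only; the exact feature
vector format that HIDE writes is not described here. The function
name, the whitespace tokenization, the "O" label for unlabeled
tokens, and the "name" label are all assumptions made for the
example.

```python
import re


def to_token_labels(text, spans, outside="O"):
    """Pair each whitespace-delimited token with the label covering it.

    spans is a list of (start, end, label) character offsets, with
    end exclusive. Tokens outside every span get the `outside` label.
    """
    rows = []
    for match in re.finditer(r"\S+", text):
        label = outside
        for start, end, name in spans:
            # A token takes a span's label when the two ranges overlap.
            if match.start() < end and match.end() > start:
                label = name
                break
        rows.append((match.group(), label))
    return rows


text = "Patient John Smith, MRN 12345"
spans = [(8, 18, "name"), (24, 29, "id")]
for token, label in to_token_labels(text, spans):
    print(f"{token}\t{label}")
```

A one-token-per-line layout like this is a common input format for
sequence labeling tools, which is why an export of this kind is
useful for testing other techniques.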
Scrolling to the bottom of the page and clicking "Clear" will
remove all of the associated labels in the document.
Labeling the document is as simple as selecting the text and then
clicking on the label you would like to assign to the text.
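Conceptually, the labeling actions on this page (assigning a label
to selected text, removing a single label, and clearing all
labels) are edits to a list of character spans. The sketch below
models that idea under stated assumptions: the Span and
LabeledDocument classes are hypothetical and do not reflect how
HIDE stores labels internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int   # offset of the first selected character
    end: int     # offset one past the last selected character
    label: str


@dataclass
class LabeledDocument:
    text: str
    spans: list = field(default_factory=list)

    def add_label(self, start, end, label):
        """Label text[start:end], as selecting text and clicking a label would."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("selection is outside the document")
        span = Span(start, end, label)
        self.spans.append(span)
        return span

    def remove_label(self, span):
        """Remove one label, like a per-label remove button."""
        self.spans.remove(span)

    def clear(self):
        """Remove every label, like the "Clear" button."""
        self.spans.clear()

    def labeled_text(self):
        """Return (selected text, label) pairs in document order."""
        ordered = sorted(self.spans, key=lambda s: s.start)
        return [(self.text[s.start:s.end], s.label) for s in ordered]


doc = LabeledDocument("MRN 12345 seen by Dr. Lee")
id_span = doc.add_label(4, 9, "id")
print(doc.labeled_text())
```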
Selecting "id" labels the selected text as an "id":
Removing a single associated label is as simple as clicking the
"remove label" button under the label.
De-Identify Documents
Documents may be de-identified after they are labeled. The
de-identification options are specified in the configuration file.
Clicking on "De-Identify" on the top bar will display a
de-identified version of the document.
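To illustrate what de-identifying labeled text can look like, the
sketch below replaces each labeled span with a placeholder built
from its label. This is one simple strategy chosen for the
example; HIDE's actual behavior depends on the options in its
configuration file, which are not covered here. The function name,
the placeholder format, and the "name" label are assumptions.

```python
def deidentify(text, spans, placeholder="[{label}]"):
    """Replace each labeled (start, end, label) span with a placeholder.

    Offsets are character positions with end exclusive. Spans may be
    given in any order but must not overlap.
    """
    result = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        result.append(text[cursor:start])          # text before the span
        result.append(placeholder.format(label=label.upper()))
        cursor = end                               # skip the original value
    result.append(text[cursor:])                   # text after the last span
    return "".join(result)


text = "Patient John Smith, MRN 12345"
spans = [(24, 29, "id"), (8, 18, "name")]
print(deidentify(text, spans))  # Patient [NAME], MRN [ID]
```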