safe harbor de-identification of health data
the journal of Michael Werneburg
twenty-seven years and one million words
The health industry works with a standard called "Safe Harbor" for de-identifying personal information. It's supposed to reduce the share of uniquely identifiable records to 0.04% of the population, meaning only about 1 in 2,500 people can be uniquely identified from the data once it's been restricted or altered. It's part of HIPAA:
The Safe Harbor method for de-identification is defined as follows:
(2)(i) The following identifiers of the individual or of
relatives, employers, or household members of the individual,
are removed:
(A) Names
(B) All geographic subdivisions smaller than a state,
including street address, city, county, precinct, ZIP code,
and their equivalent geocodes, except for the initial three
digits of the ZIP code if, according to the current publicly
available data from the Bureau of the Census:
(1) The geographic unit formed by combining all ZIP codes
with the same three initial digits contains more than 20,000
people; and
(2) The initial three digits of a ZIP code for all such
geographic units containing 20,000 or fewer people is changed
to 000.
(C) All elements of dates (except year) for dates that are
directly related to an individual, including birth date,
admission date, discharge date, death date, and all ages over
89 and all elements of dates (including year) indicative of
such age, except that such ages and elements may be
aggregated into a single category of age 90 or older.
(D) Telephone numbers
(E) Fax numbers
(F) Email addresses
(G) Social security numbers
(H) Medical record numbers
(I) Health plan beneficiary numbers
(J) Account numbers
(K) Certificate/license numbers
(L) Vehicle identifiers and serial numbers, including license
plate numbers
(M) Device identifiers and serial numbers
(N) Web Universal Resource Locators (URLs)
(O) Internet Protocol (IP) addresses
(P) Biometric identifiers, including finger and voice prints
(Q) Full-face photographs and any comparable images
(R) Any other unique identifying number, characteristic, or
code, except as permitted by paragraph (c) of this section;
and
(ii) The covered entity does not have actual knowledge that
the information could be used alone or in combination with
other information to identify an individual who is a subject
of the information.
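To make the rule a little more concrete, here is a minimal sketch in Python of how a few of these provisions might be applied to a single record. Everything named here is my own assumption for illustration: the field names, the `deidentify` helper, and especially the set of low-population ZIP prefixes, which in real use would have to come from current Census data. It covers only the field removal, the ZIP rule in (B), and the date and age rule in (C); it does nothing about free-text fields or the "actual knowledge" test in (ii).

```python
from datetime import date

# Hypothetical field names standing in for identifiers (A) and (D)-(R).
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id",
    "photo",
}

# Geographic subdivisions smaller than a state, per (B).
GEOGRAPHIC_FIELDS = {"street", "city", "county", "precinct"}

# Placeholder values for illustration only. The real list of three-digit
# ZIP prefixes covering 20,000 or fewer people must be derived from
# current Census data.
LOW_POPULATION_ZIP3 = frozenset({"036", "059"})


def generalize_zip(zip_code, low_population=LOW_POPULATION_ZIP3):
    """Keep only the first three digits, or "000" for small areas."""
    prefix = zip_code[:3]
    return "000" if prefix in low_population else prefix


def generalize_age(age):
    """Aggregate all ages over 89 into a single "90+" category."""
    return "90+" if age >= 90 else str(age)


def deidentify(record, low_population=LOW_POPULATION_ZIP3):
    """Return a copy of the record with the sketched rules applied."""
    dropped = DIRECT_IDENTIFIERS | GEOGRAPHIC_FIELDS | {
        "zip", "age", "birth_date", "admission_date",
    }
    out = {k: v for k, v in record.items() if k not in dropped}
    out["zip3"] = generalize_zip(record["zip"], low_population)
    out["age"] = generalize_age(record["age"])
    # A birth year would reveal an age over 89, so it is withheld then.
    out["birth_year"] = (
        record["birth_date"].year if record["age"] < 90 else None
    )
    # Other dates keep only the year.
    out["admission_year"] = record["admission_date"].year
    return out
```

Calling `deidentify` on a record with a name, an email address, a full ZIP code, and full dates would return only the non-identifying fields plus `zip3`, `age`, `birth_year`, and `admission_year`. Note that the sketch drops fields by name, so any identifier stored under a field name it doesn't know about would pass straight through.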
I find it odd that the financial industry doesn't push something similar, given that this has been used in the health sphere for years. Or, if the finance field has done so, I wonder how I could have operated in that area for so long without coming across similar guidance. Whether or not such a standard exists, nothing like it is in common practice: I've seen banks throw any and all of these fields at third parties at the slightest provocation. I think they need to learn from the health industry.