safe harbor de-identification of health data

the journal of Michael Werneburg

twenty-seven years and one million words

Toronto, 2017.05.24

The health industry works with a standard called "Safe Harbor" for de-identifying personal information. It's meant to reduce the share of uniquely identifiable records to 0.04% of the population, meaning only about 1 in 2,500 people can be uniquely identified from the data once it has been restricted or altered. It's part of HIPAA:

The Safe Harbor method for de-identification is defined as follows:

(2)(i) The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:

(A) Names

(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:

(1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and

(2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000.

(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

(D) Telephone numbers

(E) Fax numbers

(F) Email addresses

(G) Social security numbers

(H) Medical record numbers

(I) Health plan beneficiary numbers

(J) Account numbers

(K) Certificate/license numbers

(L) Vehicle identifiers and serial numbers, including license plate numbers

(M) Device identifiers and serial numbers

(N) Web Universal Resource Locators (URLs)

(O) Internet Protocol (IP) addresses

(P) Biometric identifiers, including finger and voice prints

(Q) Full-face photographs and any comparable images

(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section; and

(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
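To make the rules above a little more concrete, here is a minimal Python sketch of how items (A) through (Q) might be applied to a single record. Everything in it is an assumption for illustration: the field names, the `deidentify` function, and the `sparse_zip3` set are hypothetical and not part of HIPAA. The set of sparsely populated three-digit ZIP prefixes would have to be built from current Census data. The sketch does not attempt item (R) or the "actual knowledge" test in (ii), and it only handles the four date fields it names.

```python
from datetime import date

# Hypothetical field names for the direct identifiers in items (A), (B), (D)-(Q).
# A real dataset will use its own schema.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "city", "county", "precinct",
    "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}


def deidentify(record, sparse_zip3, today=None):
    """Return a copy of `record` with Safe Harbor style restrictions applied.

    `sparse_zip3` is the set of three-digit ZIP prefixes covering 20,000 or
    fewer people (an input the caller must derive from Census data).
    """
    today = today or date.today()

    # Drop the direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # (B): keep only the first three ZIP digits, or 000 for sparse areas.
    zip_code = out.pop("zip", None)
    if zip_code is not None:
        zip3 = str(zip_code)[:3]
        out["zip3"] = "000" if zip3 in sparse_zip3 else zip3

    # (C): reduce the birth date to a year, and aggregate ages over 89.
    birth = out.pop("birth_date", None)
    if birth is not None:
        had_birthday = (today.month, today.day) >= (birth.month, birth.day)
        age = today.year - birth.year - (0 if had_birthday else 1)
        if age > 89:
            # The birth year would reveal the age, so it is omitted.
            out["age"] = "90+"
        else:
            out["age"] = age
            out["birth_year"] = birth.year

    # (C): other dates keep only their year.
    for field in ("admission_date", "discharge_date", "death_date"):
        value = out.pop(field, None)
        if value is not None:
            out[field.replace("_date", "_year")] = value.year

    return out
```

A call might look like `deidentify(record, sparse_zip3=my_sparse_prefixes)`, where `my_sparse_prefixes` is whatever set you have derived from the Census figures. Any field the sketch doesn't recognise, such as a diagnosis, passes through untouched, which is exactly why item (R) and the knowledge test in (ii) still need a human review.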

I find it odd that the financial industry doesn't push something similar, given that this has been used in the health sphere for years. Or, if the finance field has done so, I wonder how I could have operated in that area for so long without coming across the guidance. Whether or not such a standard exists, nothing like it is in common practice: I've seen banks throw any and all of these fields at third parties at the slightest provocation. I think they need to learn from the health industry.
