By guest blogger Breanna Beers
There’s a saying in the technology industry: “If you’re not paying for the product, you are the product.” With the rise of big data, companies like Google and Amazon know more about you than some of your friends do. Of course, it’s all “anonymized”: stored in bulk, detached from individual identifiers. But it doesn’t take much for anyone with a computer and a couple of dollars to put the pieces of the puzzle back together, as Latanya Sweeney demonstrated back in the 1990s:
By comparing Massachusetts state employees’ “anonymized” health records with the voter rolls of the city of Cambridge (which cost Sweeney $20 to access, and contained the name, birth date, sex, address, and ZIP code of every voter), Sweeney was able to send the health records of the Massachusetts governor directly to his office. She went on to show that an estimated 87% of Americans could be uniquely identified using only three pieces of information: sex, birth date, and ZIP code.
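To make the mechanics concrete, here is a minimal sketch of this kind of linkage in Python. It is not Sweeney's actual method or data: every record, name, and field label below is invented for illustration, and the only idea borrowed from the account above is matching two datasets on sex, birth date, and ZIP code.

```python
from collections import defaultdict

# Hypothetical toy data: all values below are invented for illustration.
# The "anonymized" health records carry no names, only quasi-identifiers.
health_records = [
    {"sex": "M", "birth_date": "1950-01-15", "zip": "02138", "diagnosis": "condition A"},
    {"sex": "F", "birth_date": "1972-06-03", "zip": "02139", "diagnosis": "condition B"},
    {"sex": "F", "birth_date": "1985-11-20", "zip": "02139", "diagnosis": "condition C"},
]

# The public voter roll carries names alongside the same three fields.
voter_roll = [
    {"name": "Voter One", "sex": "M", "birth_date": "1950-01-15", "zip": "02138"},
    {"name": "Voter Two", "sex": "F", "birth_date": "1972-06-03", "zip": "02139"},
    {"name": "Voter Three", "sex": "F", "birth_date": "1985-11-20", "zip": "02139"},
    {"name": "Voter Four", "sex": "F", "birth_date": "1985-11-20", "zip": "02139"},
]

QUASI_IDENTIFIERS = ("sex", "birth_date", "zip")


def key(row):
    """Return the (sex, birth_date, zip) tuple used to match rows."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(records, roll):
    """Attach a name to each record whose key matches exactly one voter."""
    voters_by_key = defaultdict(list)
    for voter in roll:
        voters_by_key[key(voter)].append(voter["name"])

    matches = {}
    for record in records:
        names = voters_by_key.get(key(record), [])
        if len(names) == 1:  # an ambiguous match does not identify anyone
            matches[names[0]] = record["diagnosis"]
    return matches


reidentified = link(health_records, voter_roll)
```

In this toy example, the first two records each match a single voter and so pick up a name, while the third matches two voters with identical details and stays ambiguous. The more people share a combination of values, the safer each of them is, which is why a combination as specific as sex, birth date, and ZIP code is so revealing.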
Anonymization has improved only slightly since then, and our ability to get around it has certainly kept pace. As the amount and sensitivity of the data being collected increase, privacy becomes an ever more nebulous term. In April 2018, the Golden State Killer, Joseph DeAngelo, was caught decades after the trail was thought to have gone cold, thanks to genetic data from the genealogy website GEDmatch. Though DeAngelo never uploaded his own information, the data from relatives who used the site was enough to identify and ultimately convict him.
While no one is complaining that DeAngelo was captured using this data, his case raises serious questions about genetic privacy. The NIH hosts a number of databases of anonymized genetic information for research purposes, but a 2013 study demonstrated that participants could be re-identified by cross-referencing these databases with genealogical data and/or public records. And as the DeAngelo case reveals, even if you choose to keep your DNA private, your second and third cousins may have already made that decision irrelevant.
While the increasing availability of genetic information is probably not cause to panic (it’s not Gattaca yet), we need to confront the ethical ramifications of what is now technically possible. Some lawmakers have already tried to do this: the 2008 Genetic Information Nondiscrimination Act provides a level of genetic privacy, prohibiting health insurers and employers from requesting genetic information or discriminating on the basis of it. Other regulations, such as HIPAA, guarantee clinical privacy for patients, including genetic privacy.
But questions remain, from the ethical to the economic: how much control do individuals have, or deserve to have, over their personal information? When does genetic information cross over into more generic medical information? Does a fallen strand of hair still belong to its owner, or is the information it contains free for anyone’s use? When does law enforcement need a search warrant to use genetic data?
These are difficult questions, but the answers are well worth our careful consideration. For now, two things are certain: current laws are inadequate, and we will be having some interesting ethical conversations in the near future.