Data protection

Well, of course you’d look after your data, wouldn’t you? You don’t want anyone hurting it after all the trouble you went to in collecting it.

Well, that’s sort of what the term means, but the meaning that has evolved, probably over the last 30+ years, is about your responsibility to protect anyone else’s “personal data” that may be embedded in your datasets.

Details

Actually, it does include that direct interpretation: if you aren’t protecting your data, you can’t be sure you’re looking after the personal data of anyone within your datasets. That’s increasingly non-trivial, as pretty much all our computers are temporarily or permanently connected to the internet, and as malicious people make money from, and spend huge amounts of time on, working out how to get into our machines and at our data.

So this is partly about all the things we have to do to protect our data: encrypted drives, antivirus software and firewalls, and backups both online and offline, local and remote (themselves very carefully encrypted).

Coming back to protecting the personal data of all participants, this is about more than just that. It’s about doing everything sensible you can to ensure that it’s not possible to identify those participants just from the data you have stored. That means removing any potentially identifying variables, pseudonymising ID codes, and making sure that any lookup tables you have to keep in order to get back to individuals’ identities are stored securely: on paper, in locked storage, very separate from all your computers if possible. It’s also about trying to make sure that participants cannot be “re-identified” by “jigsaw attacks” and the like. (Roughly, a jigsaw attack is the situation in which combinations of values on some variables, usually demographic, service or “clinical” variables, might, put together, uniquely identify someone.)
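As a rough illustration of the last two ideas, here is a minimal Python sketch, not a recommended or complete procedure. It replaces real ID codes with random pseudonyms, keeping a lookup table that you would then store separately and securely, and it lists any combinations of values that occur only once in the dataset and so might be open to a jigsaw attack. The function names (`pseudonymise`, `unique_combinations`), the field names and the example records are all made up for this illustration.

```python
import secrets
from collections import Counter


def pseudonymise(records, id_field="id"):
    """Replace real IDs with random codes; return (new_records, lookup)."""
    lookup = {}   # real ID -> pseudonym: keep this apart from the data
    used = set()  # pseudonyms already handed out
    out = []
    for rec in records:
        real_id = rec[id_field]
        if real_id not in lookup:
            code = secrets.token_hex(4)  # 8 random hex characters
            while code in used:          # very unlikely, but avoid clashes
                code = secrets.token_hex(4)
            used.add(code)
            lookup[real_id] = code
        # Copy the record, swapping the real ID for its pseudonym
        out.append({**rec, id_field: lookup[real_id]})
    return out, lookup


def unique_combinations(records, fields):
    """Return the combinations of values on `fields` that occur only once."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return [combo for combo, n in counts.items() if n == 1]


# Entirely made-up example data
records = [
    {"id": "P001", "age_band": "20-29", "gender": "F", "service": "A"},
    {"id": "P002", "age_band": "20-29", "gender": "F", "service": "A"},
    {"id": "P003", "age_band": "60-69", "gender": "M", "service": "A"},
    {"id": "P004", "age_band": "20-29", "gender": "M", "service": "B"},
]

pseudonymised, lookup = pseudonymise(records)
at_risk = unique_combinations(pseudonymised, ["age_band", "gender", "service"])
print(at_risk)  # the two combinations that each describe only one person
```

Note that a check like this only covers the variables you thought to include: someone with access to other information about your participants may be able to combine it with variables you assumed were harmless.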

One problem in this area now is that organisations have become preoccupied with the legalities, such as the GDPR (General Data Protection Regulation, still, for now, legally binding in the UK as well as the EU). These are well intentioned and generally far better than doing nothing, but it’s almost impossible to write laws or organisational procedures that really safeguard personal data. My own take is that there’s a growing problem: the focus on having procedures, and on things like registration with the Information Commissioner’s Office, encourages the belief that those are what protection is about, when it’s far more about everyone in an organisation recognising the challenges and their own personal contribution to security. I see a problem but, sadly, I have no solution.

Equally, I see some very naïve assumptions that open data can be both open and safeguarded against re-identification. That is another real issue: the principles of open data are excellent, but true protection against re-identification isn’t easy.