TL;DR GDPR is almost here and the Internet is already full of analyses, digests, summaries and interpretations - but 99% of them are written by (wannabe) lawyers for people who understand "lawyerish". What counts in the end, however, is the lower-level perspective: how does GDPR impact the design of systems (you do think about it already, don't you?) - is hashing human-readable identifiers enough? what about aggregated state? does event sourcing still make sense? will blockchain-based solutions suddenly become illegal?
GDPR go-live is just days ahead - I'm not going to succumb to the general panic, but I can't help noting some observations I find amusing:
- the number of weasel-consultants who "specialize" in "GDPR compliance" is growing exponentially :) which means that there are (still) piles of gold almost literally lying in the streets :D
- overall awareness among developers is still very low - some think this is only Facebook's/Google's issue, others believe it's a lawyers'/managers' problem, many claim that GDPR will remain a dead, unenforceable law (due to the massive scale, the ephemeral nature of software and the inability of governments to keep up) ...
I'm not going to comment on the 1st point, but the 2nd one is a bit worrisome (as GDPR CAN'T be properly introduced without engineers' input), so shedding some light on it may be useful. Don't worry, I'm not going to walk through the whole of GDPR - for several reasons:
- it'd be boring
- there's already plenty of general analysis on the web
- more detailed analysis has to be very context-specific -> software applications are very different & there's no single solution that fits all of them
I strongly believe every software crafts(wo)man should get PERSONALLY familiar with GDPR, so (s)he can build a full understanding of what it means in the context of her/his work.
OK, so what am I going to cover in this particular post then?
- I'm going to start with a list of key (IMHO) remarks you should keep in mind - some of them may appear inconspicuous & harmless, but they make a huge difference after all
- And I'll conclude with a list of the top questions I can imagine a highly aware software crafts(wo)man asking - a list of answered questions, of course :)
OK, let's get it rollin'
Disclaimer 1: I'm not a lawyer, so don't treat any of these remarks as legal advice or any kind of legal commitment on my part. These are interpretations I've built based on my knowledge, the cases I confront on a daily basis & my understanding of GDPR - I do not take any responsibility for how you'll use these remarks.
Disclaimer 2: the focus of this blog post is on how GDPR impacts architecture & functional design, so let's put security aspects aside (not that security ain't an architectural concern, lol) for now.
Disclaimer 3: I really hope that the titular reference remains understandable for all the readers :)
GDPR is not about where your business (the software service operator) operates, nor about where your client's (corporate) headquarters is - it's all about where the particular individuals (protected by GDPR) reside. If personal information of individuals from the EU is in play, GDPR DOES apply.
Law is not retroactive, but that doesn't mean GDPR applies only to data accumulated after GDPR goes live - it DOES apply to all the (previously collected) data that remains at companies' disposal; that's why they had several months to adapt to the new regulations.
Consent as a gating (entry) step is not a solution ("we collect all the necessary consents at the entry, then we proceed as before w/o any change"), because consent can be withdrawn (at any point) & you need to take that into account.
GDPR is NOT only about data that can (directly) be used to identify a particular individual, but also about all the data that can be traced back (via "tokens" - in GDPR terminology) to such data (so yes - "transactional" data qualifies as well).
Unequivocal identification according to GDPR does not have to happen via unique personal information (like an SSN, passport number, taxpayer id, etc.) - ANY information (e.g. behavioral history!) that could potentially be used to unambiguously determine someone's identity qualifies - that's why basic pseudonymisation (in GDPR terminology) may not be sufficient.
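To see why a bare hash is only pseudonymisation (and not anonymisation), consider this minimal sketch - the email address and the candidate list are made-up illustrations, not part of any real dataset:

```python
import hashlib

def pseudonymise(email: str) -> str:
    """Naive pseudonymisation: replace the identifier with its SHA-256 hash."""
    return hashlib.sha256(email.lower().encode()).hexdigest()

token = pseudonymise("jane.doe@example.com")

# The catch: anyone holding a list of candidate emails can re-identify the
# token with a trivial dictionary attack - no key or secret is required.
candidates = ["john.smith@example.com", "jane.doe@example.com"]
reidentified = {pseudonymise(c): c for c in candidates}
original = reidentified.get(token)  # recovers "jane.doe@example.com"
```

Because the mapping is deterministic and the input space (emails, SSNs) is enumerable, the hashed value still "relates to an identifiable natural person" - keyed approaches (HMAC with a protected secret) or true anonymisation fare better.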
The data protection scope of GDPR covers not only "provided" (e.g. manually input) data, but also "observed" data (e.g. all kinds of behavioral events not explicitly typed in by the user).
(this one is frequently omitted) as a software creator, you're obliged to maintain records of processing activities (for GDPR-protected data) - in other words: an audit log of personal-information processing (in case of an audit or any other clarifying procedure); the required level of detail is not specified clearly, so one has to apply some common sense here
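A record of processing activities could be as simple as a structured, append-only log entry per processing step - the field names below are my own illustration of the "common sense" level of detail, not anything prescribed by the regulation:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ProcessingRecord:
    """One entry in the records of processing activities."""
    timestamp: str
    data_subject_ref: str   # pseudonymous reference, not raw personal data
    purpose: str            # why the data was processed (lawful basis)
    categories: tuple       # categories of personal data touched
    processor: str          # system/component that performed the processing

def record_processing(subject_ref, purpose, categories, processor):
    entry = ProcessingRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        data_subject_ref=subject_ref,
        purpose=purpose,
        categories=tuple(categories),
        processor=processor,
    )
    # In a real system this would be written to an append-only store.
    return asdict(entry)

log_entry = record_processing("user-4711", "contract fulfilment",
                              ["contact data"], "billing-service")
```

The key design points: the entry references the subject pseudonymously, and it captures purpose and component so you can answer "who processed what, and why" during an audit.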
The so-called "right of access" ain't only about providing access to the personal data itself, but also to: how it was acquired, what the purpose of processing is, what the retention period is, etc.
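In practice that means a subject-access-request response needs to bundle processing metadata with the data itself. A rough sketch (all field names and the in-memory `store` are hypothetical, for illustration only):

```python
def build_access_response(subject_id, store):
    """Assemble a right-of-access response: the personal data itself
    plus the processing metadata that must accompany it."""
    record = store[subject_id]
    return {
        "personal_data": record["data"],
        "source": record["source"],              # how it was acquired
        "processing_purpose": record["purpose"],
        "retention_period": record["retention"],
        "recipients": record["recipients"],      # who it has been shared with
    }

# Hypothetical record for demonstration purposes.
store = {"user-42": {
    "data": {"email": "jane@example.com"},
    "source": "registration form",
    "purpose": "contract fulfilment",
    "retention": "duration of contract + 6 years",
    "recipients": ["payment-provider"],
}}
response = build_access_response("user-42", store)
```

The takeaway for system design: if your data model can't answer "where did this field come from" and "how long do we keep it", the access endpoint can't either.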
There's no single trick, contract clause or super-consent form that will let you (as a software provider) collect any data you want & process it in any way you find appropriate - no, it just doesn't work that way (anymore). Live with that.
Q1: What about so-called "accumulated state"? E.g. system persists state which is "built" by accumulating transactions performed for different users (balance as sum of individual operations) - what if 1 user wishes to erase ("right of erasure") her/his information?
Answer: By default you DO have to allow that. But there are two provisions that add important qualifications. The 1st is about when the right to erasure is justifiable:
"data subject has the right to request erasure of personal data (...) if the legitimate interests of the controller is overridden by the interests or fundamental rights and freedoms of the data subject, which require protection of personal data"
The 2nd one is about lawful basis for data processing:
"processing is necessary for the performance of a contract to which the data subject is party"
In practice this means that entering a contractual (from the law's perspective) binding makes the software service provider's position safer, as long as the data processing sticks to the service described in the contract. Weasels that offer X, but in fact earn by selling out collected personal data (90% of so-called "FinTechs" ...), can choke on that.
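One way to reconcile accumulated state with erasure is to keep the aggregate (which, as a plain sum, no longer relates to any individual) while stripping the personal attribution from the underlying records. A minimal sketch, with made-up users and amounts - whether anonymised amounts may be retained at all still depends on your lawful basis and context:

```python
transactions = [
    {"user": "alice", "amount": 120},
    {"user": "bob",   "amount": -30},
    {"user": "alice", "amount": 55},
]

# Accumulated state: a plain sum, not traceable to any single person.
balance = sum(t["amount"] for t in transactions)

def erase_user(txs, user):
    """Honour a right-to-erasure request: remove the personal attribution
    but keep the amounts, so the accumulated balance stays consistent."""
    return [{**t, "user": None} if t["user"] == user else t for t in txs]

transactions = erase_user(transactions, "alice")
# The aggregate is unchanged, yet no record points at "alice" anymore.
```

Note the assumption baked in here: the amounts alone are not identifying. If transaction patterns could single someone out (see the remark on behavioral history above), anonymising the key field is not enough.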
Q2: What about audit information (e.g. append only logs required from legal / contractual obligation perspective)?
Answer: I've already referred to contractual obligations. Other legal obligations are explicitly mentioned as well:
"processing is necessary for compliance with a legal obligation to which the controller is subject"
What about audit information not required by law, but necessary for troubleshooting, a quality audit trail, etc.? As long as you can relate it directly to this statement (in a way acceptable in potential judicial proceedings), you should be safe:
"processing is necessary to protect the vital interests of the data subject or of another natural person"
Q3: What about legacy systems / 3rd party systems / COTS components - which of these do not have to be taken under consideration (in context of GDPR)?
Answer: Neither the data subject nor the legislators are interested in your technical architecture. There's no notion of 'legacy', 'technical debt' or anything that resembles them. You're responsible for the service end-to-end, but of course you can rely on other services (under the hood) that are themselves subject to GDPR regulations. Unless you're using a version with expired support (no longer officially supported) ...
Q4: Isn't GDPR effectively killing the idea of building event-sourced systems (with assumed immutability by-design)?
Answer: Hmm, no it isn't. But event sourcing has to be applied carefully (as it always did ...). Now this carefulness ain't only about event schema versioning & snapshot management, but also about clear event attribution (to an entity/data subject) & explicit marking of personal data. Each stage of data processing (creating a snapshot, any sort of aggregation, building redundant CQRS read views, etc.) should be annotated with information on whether its output qualifies as personal data or not.
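One pattern that squares immutable event logs with erasure is crypto-shredding: encrypt personal payloads with a per-subject key and delete the key to honour erasure, without ever mutating the log. The sketch below is an assumed design, not a mechanism prescribed by GDPR, and XOR merely stands in for a real cipher (use AES-GCM or similar in practice):

```python
import json

# Hypothetical per-subject key store; deleting a key "shreds" every
# event payload encrypted with it.
key_store = {}

def _xor(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for real encryption - do NOT use XOR in production."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def append_event(log, subject_id, payload, contains_personal_data):
    """Append an event, explicitly marked as personal data or not."""
    body = json.dumps(payload).encode()
    if contains_personal_data:
        key = key_store.setdefault(subject_id, b"demo-key-" + subject_id.encode())
        body = _xor(body, key)
    log.append({"subject": subject_id,
                "personal": contains_personal_data,
                "body": body})

def read_event(event):
    if not event["personal"]:
        return json.loads(event["body"])
    key = key_store.get(event["subject"])
    if key is None:
        return None  # key shredded -> payload is irrecoverable
    return json.loads(_xor(event["body"], key))

log = []
append_event(log, "user-1", {"email": "jane@example.com"}, True)
append_event(log, "user-1", {"logged_in": True}, False)

# Honouring an erasure request: drop the key, leave the log untouched.
del key_store["user-1"]
```

After the key is dropped, the first event's payload is unreadable while the non-personal event survives - the immutability of the log itself is never violated. The same per-event "personal data" flag is exactly the annotation discussed above for snapshots and read views.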
Q5: Isn't GDPR effectively killing the idea of blockchain? Information can't be removed out of blockchain + there's no single entity accountable in case of decentralised systems ...
Answer: Next question please :) On a more serious note, GDPR has not been designed with decentralised systems in mind. At all. So in theory GDPR does apply here, while in practice it does not - it just ain't feasible to enforce. If the EU really wanted to, legislators could trigger a large-scale operation to ban Bitcoin based on GDPR, but that will not happen, for several very obvious reasons :) Maybe if there were some asymmetry in access to the data (e.g. one big supporting company like Facebook - with exclusive access) - but no, that's not the case here.
Q6: What about OSS software & personal contributions to it? What if someone would like to erase her/his personal information (+ information traceable from her/his personal information)?
Answer: I've already seen some discussion regarding that on the Net. An OSS contribution is not just the delta of changes (by contributing, you agree to the licensing terms) - it's also associated with your personal information (e.g. your identity on GH) - so what if someone would like to exercise her/his "right to erasure" upon that? Technically it's possible & for the majority of licenses it's enough to anonymize the credentials, w/o removing the actual contribution, but I haven't seen any official statement from the biggest "players" (SaaS providers) like GH.
As I find this topic particularly interesting, feel free to let me know if you have any more questions nagging you - maybe I'll be able to help.