Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm having a hard time visualizing this - are we talking outside devs (consultants) or folks in a different jurisdiction who don't have rights under law, or just a company administrative policy? Agree that fake data can work well, but if you can clone the production DB, that seems a preferable and easier approach.


There's a variety of reasons this may happen that stem from either legal, certification, or company reasons.

Your production database may have medically-sensitive PII (or for something like SOC-2 compliance any PII at all) that cannot be shared any human (other than the original user) unless with prior approval.

Even for non-externally mandated reasons, companies may (and often do) wish to restrict access to production data by developers to minimize concerns around data exfiltration and snooping on user data by company employees.


that makes sense - I suppose using my model you could mask some of the data in a versioned graph or a collection contained in the database that can be surfaced up to other users who can then clone the collection that excludes PII. You could run the main collection and the PII free collection in the same data product. This might be an easier approach than creating fake data & fake schema.


> This might be an easier approach than creating fake data & fake schema.

I doubt this. This is for two reasons: the first is that the development database usually shares the same schema as the production database so that's not an issue.

The second is that fake data convincingly takes care of various issues surrounding de-anonymization of data using correlations among bits of data that ostensibly have had their PII-sensitive bits removed.

If protection of user data is a priority, there are far fewer headaches associated with creating entirely fake data to populate the same schema than trying to figure out post-hoc censoring of production data.

That's not to say there aren't valid use cases of the latter. You often will want to do post-hoc censoring/aggregation if you wish to track e.g. usage metrics. This is in fact often a component of ETLs. However, those are removed from everyday development tasks.


I am struggling to think of any time it would be appropriate to clone a production database and use it for staging. That almost inevitably translates into any developer who ever deploys database-related changes having access to the contents of the production database, or at least every part of it that might ever be affected by code changes they are making. There are numerous legal, regulatory and ethical issues with granting such broad access, unless you're small enough that the few people you obviously do need to have the ability to grant full DB access happen to include anyone who is likely to change code that talks to the database. And even then it still seems like bad practice to actually grant anyone that access when it's not strictly necessary!


Legally. I've worked in healthcare and any direct access to production is strictly forbidden. If you can't figure it out through your logs and monitoring, and you can't reproduce in earlier environments, then the problem isn't that you don't have prod access, the problem is that your instrumentation has a gap.

Personally, I think we rely on prod access as a crutch because it's easier to expect that than it is to build a sufficient infrastructure. Cloning a prod database or allowing ad-hoc r/w access is on my list of strictly forbidden operations.


> Personally, I think we rely on prod access as a crutch because it's easier to expect that than it is to build a sufficient infrastructure.

Which I think is reasonable if you are not in a strictly controlled space.

Not everyone is able to spend weeks or months on instrumentation efforts.


For example, we, as a company have decided on some rules around PII (personally identifyable information) that means, amongst other things, that we don’t have any of it outside of production. We therefore cannot just copy data from production to staging/dev.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: