The point is that data provenance isn't encoded in the data. Like the difference...

The point is that data provenance isn't encoded in the data. Like the difference between a schema where column X has type `int` versus X having type `do(int)` or something cleverer. If the way you get your causal model is to ask the person who ran the experiment, then it's very much an uphill battle for an algorithm to get a causal model. We want to enable automated causal inference, so we should better record our causal models (data lineage).