What is data lineage?


Data lineage is the practice of tracking how data flows over time: where it originated, how it has changed, and where it ends up. It is often represented visually, showing the path of data from source to destination across each hop in the enterprise environment, including how the data is transformed along the way, how its representation and parameters change, and how it splits or converges after each hop. Lineage provides an audit trail of data points at the finest level of granularity, but it can be presented at different zoom levels to simplify the large volume of information, similar to analytic web maps. The view at each level also provides information about data sets that can help in classifying them. Data lineage and data governance are closely related: lineage documents data sources and flows, which lets governance teams monitor how data moves through systems and how it is modified and used.
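
To make the idea of hops, splits, convergence, and zoom levels concrete, here is a minimal sketch that assumes lineage is stored as a directed graph. The class `LineageGraph`, its methods, the `table.column` naming convention, and the sample data sets are all hypothetical, chosen only for illustration; real lineage tools differ in how they capture and store this information.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical lineage store: each edge is one hop plus its transformation."""

    def __init__(self):
        self.upstream = defaultdict(list)    # target -> [(source, transformation)]
        self.downstream = defaultdict(list)  # source -> [(target, transformation)]

    def add_hop(self, source, target, transformation):
        """Record that `target` is derived from `source` via `transformation`."""
        self.upstream[target].append((source, transformation))
        self.downstream[source].append((target, transformation))

    def _walk(self, start, edges):
        """Follow `edges` from `start` and return the nodes with no further hop."""
        seen, stack, ends = set(), [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            neighbours = edges.get(node, [])
            if not neighbours:
                ends.add(node)
            stack.extend(name for name, _ in neighbours)
        return ends

    def origins(self, node):
        """Where did this data come from? (audit trail back to the sources)"""
        return self._walk(node, self.upstream)

    def destinations(self, node):
        """Where does this data end up?"""
        return self._walk(node, self.downstream)

    def zoom_out(self):
        """Collapse column-level hops ('table.column') into table-level edges."""
        edges = set()
        for source, hops in self.downstream.items():
            for target, _ in hops:
                src_table, dst_table = source.split(".")[0], target.split(".")[0]
                if src_table != dst_table:
                    edges.add((src_table, dst_table))
        return edges


# Illustrative (made-up) column-level lineage.
lineage = LineageGraph()
lineage.add_hop("crm.email", "staging.email", "lowercase")
lineage.add_hop("billing.amount", "staging.amount", "convert currency")
lineage.add_hop("staging.email", "report.customer_key", "hash")       # split
lineage.add_hop("staging.email", "report.revenue", "group key")       # split
lineage.add_hop("staging.amount", "report.revenue", "sum per customer")  # converge

print(sorted(lineage.origins("report.revenue")))
print(sorted(lineage.destinations("crm.email")))
print(sorted(lineage.zoom_out()))
```

The column-level graph is the fine-grained audit trail, while `zoom_out` gives the coarser table-level view a governance team might start from before drilling down.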