What Is lakeFS?
lakeFS is a tool developed by Treeverse that turns object storage systems (such as AWS S3, Azure Blob Storage, Google Cloud Storage or on-prem S3-compatible stores) into data repositories with Git-like semantics. In other words, it lets you branch, commit and merge data, much as you would manage code.
In the domain of Data Analytics / AI Infrastructure, lakeFS brings reproducibility, isolation and collaboration to data engineers and scientists, so they can experiment with, test and deploy data pipelines without fear of breaking production.
Core Architecture & Principles
lakeFS operates on top of an object store: it does **not** replace your storage, but overlays a metadata layer that tracks versions. This metadata engine manages branches, commits and tags, while the underlying data remains in your object storage.
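To make the "metadata layer over an object store" idea concrete, here is a minimal conceptual sketch in Python. It is a toy model, not lakeFS's implementation or API: the `ToyRepo` and `Commit` classes, their method names, and the in-memory dict standing in for the object store are all hypothetical, and merging is left out. It only illustrates that data objects stay in storage while branches and commits are small pointers kept in metadata.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot mapping logical paths to physical object keys."""
    id: str
    parent: Optional[str]
    message: str
    entries: tuple  # sorted (path, object_key) pairs


class ToyRepo:
    """Hypothetical toy metadata layer; a dict stands in for the object store."""

    def __init__(self):
        self.object_store = {}            # physical key -> bytes (stand-in for S3, GCS, ...)
        self.commits = {}                 # commit id -> Commit
        self.branches = {"main": None}    # branch name -> head commit id
        self.staged = {"main": {}}        # branch name -> uncommitted {path: key}

    def _entries(self, branch):
        # Committed view of the branch, overlaid with its uncommitted writes.
        head = self.branches[branch]
        committed = dict(self.commits[head].entries) if head else {}
        return {**committed, **self.staged[branch]}

    def create_branch(self, name, source):
        # Branching copies a pointer to a commit; no data objects are copied.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def put(self, branch, path, data):
        # The bytes go to the object store; only the path -> key mapping is metadata.
        key = "objects/" + hashlib.sha256(data).hexdigest()
        self.object_store[key] = data
        self.staged[branch][path] = key

    def get(self, branch, path):
        return self.object_store[self._entries(branch)[path]]

    def commit(self, branch, message):
        # A commit freezes the current path -> key mapping and records its parent.
        entries = tuple(sorted(self._entries(branch).items()))
        parent = self.branches[branch]
        digest = hashlib.sha256(repr((parent, message, entries)).encode()).hexdigest()
        commit_id = digest[:12]
        self.commits[commit_id] = Commit(commit_id, parent, message, entries)
        self.branches[branch] = commit_id
        self.staged[branch] = {}
        return commit_id


repo = ToyRepo()
repo.put("main", "users.csv", b"id,name\n1,Ada\n")
main_head = repo.commit("main", "initial load")

repo.create_branch("experiment", source="main")
repo.put("experiment", "users.csv", b"id,name\n1,Ada\n2,Linus\n")
experiment_head = repo.commit("experiment", "add a row")

# The change on "experiment" is invisible on "main".
assert repo.get("main", "users.csv") == b"id,name\n1,Ada\n"
```

In this sketch, creating the `experiment` branch writes nothing to the object store; only the later `put` adds a second object. That is the property the paragraph above describes: versioning lives in metadata, and the data itself stays where it is.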
The major components include:
- Repositories & branches: You can create a repo (e.g., “customer-data”) and create branches (e.g.
lakeFS (Git-Like Data Version Control for Data Lakes) – What It Is and How It Works