High-Level Python SDK¶

The High-Level Python SDK provides a simplified, Pythonic interface for working with lakeFS. Built on top of the Generated SDK, it offers advanced features like transactions, streaming I/O, and intuitive object management while maintaining the full power of the underlying API.

Key Concepts¶

Repository-Centric Design¶

The High-Level SDK is organized around repositories as the primary entry point. All operations flow from repository objects to branches, commits, and objects, providing a natural hierarchy that mirrors lakeFS's data model.

Lazy Evaluation¶

Objects are created lazily - creating a Repository, Branch, or StoredObject instance doesn't immediately interact with the server. Operations only execute when you call action methods like create(), upload(), or commit().

Fluent Interface¶

The SDK supports method chaining for common workflows:

repo = lakefs.repository("my-repo").create(storage_namespace="s3://bucket/path")
commit = repo.branch("main").object("file.txt").upload(data="content").commit("Add file")

Built-in Error Handling¶

All operations include comprehensive error handling with specific exception types for different failure scenarios, making it easier to build robust applications.

Key Features¶

Simplified API - Pythonic interface that abstracts complex operations
Transaction Support - Atomic operations with automatic rollback capabilities
Streaming I/O - File-like objects for efficient handling of large datasets
Import Management - Sophisticated data import operations with progress tracking
Batch Operations - Efficient bulk operations for better performance
Generated SDK Access - Direct access to underlying Generated SDK when needed
Automatic Authentication - Seamless credential discovery from environment or config files

Architecture Overview¶

The High-Level SDK is structured in layers:

High-Level SDK (lakefs package)
├── Repository Management
├── Branch & Reference Operations  
├── Object I/O & Streaming
├── Transaction Management
├── Import/Export Operations
└── Generated SDK (lakefs_sdk)
    └── Direct API Access

Core Classes¶

Repository¶

The main entry point for all operations. Represents a lakeFS repository and provides access to branches, tags, and metadata.

Branch¶

Extends Reference with write capabilities. Supports object uploads, commits, merges, and transaction management.

Reference¶

Read-only access to any lakeFS reference (branch, commit, or tag). Provides object listing and reading capabilities.

StoredObject & WriteableObject¶

Represent objects in lakeFS with full I/O capabilities including streaming, metadata management, and batch operations.

ImportManager¶

Handles complex data import operations with support for various source types and progress monitoring.

Documentation Sections¶

Quickstart - Get started with basic operations
Repositories - Repository management operations
Branches & Commits - Version control operations
Objects & I/O - Object operations and streaming
Imports & Exports - Data import/export operations
Transactions - Atomic operation patterns
Advanced Features - Advanced patterns and optimization

Quick Example¶

import lakefs

# Create a repository
repo = lakefs.repository("my-repo").create(
    storage_namespace="s3://my-bucket/repos/my-repo"
)

# Create a branch and upload data
branch = repo.branch("feature-branch").create(source_reference="main")
obj = branch.object("data/file.txt").upload(data="Hello, lakeFS!")

# Commit changes
commit = branch.commit(message="Add new data file")
print(f"Committed: {commit.id}")

Installation¶

pip install lakefs

Authentication¶

The SDK automatically discovers credentials from: 1. Environment variables (LAKEFS_ACCESS_KEY_ID, LAKEFS_SECRET_ACCESS_KEY, LAKEFS_ENDPOINT) 2. Configuration file (~/.lakectl.yaml) 3. Explicit client configuration

When to Use High-Level SDK¶

Choose the High-Level SDK when you need: - Simplified workflows - Common operations with minimal code - Transaction support - Atomic operations across multiple changes - Streaming I/O - Efficient handling of large files - Import management - Complex data ingestion workflows - Python-first experience - Pythonic interfaces and error handling

For direct API control or operations not covered by the high-level interface, you can access the underlying Generated SDK through the client property.

Next Steps¶

Start with the quickstart guide to learn the basics, then explore specific features in the detailed sections.