Getting Started with Python and lakeFS

This guide walks you through installing and configuring the Python SDKs for lakeFS. Follow the steps for your chosen SDK, then verify the connection with the tests near the end.

Prerequisites

Before you begin, ensure you have:

  • Python 3.8 or higher (check with python --version)
  • pip package manager
  • Access to a lakeFS instance (local or remote)
  • lakeFS credentials (access key ID and secret access key)
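
If you prefer to check the Python requirement from a script rather than the command line, a minimal sketch could look like the following. The helper name python_is_supported is hypothetical; the 3.8 minimum is taken from the list above.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```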

Quick SDK Selection

Not sure which SDK to choose? See our SDK comparison or use the decision matrix.

SDK            | Installation            | Best For
High-Level SDK | pip install lakefs      | Most users, data pipelines
Generated SDK  | pip install lakefs-sdk  | Direct API access
lakefs-spec    | pip install lakefs-spec | Data science workflows
Boto3          | pip install boto3       | S3 migration

Installation Guide

High-Level SDK Installation

The High-Level SDK provides the most user-friendly interface for lakeFS operations.

Basic Installation

pip install lakefs

Development Installation

To pick up the latest released features and bug fixes, upgrade an existing installation:

pip install --upgrade lakefs

Virtual Environment (Recommended)

# Create virtual environment
python -m venv lakefs-env
source lakefs-env/bin/activate  # On Windows: lakefs-env\Scripts\activate

# Install SDK
pip install lakefs

Verify Installation

import lakefs
print(lakefs.__version__)

Generated SDK Installation

The Generated SDK provides direct access to all lakeFS API endpoints.

Basic Installation

pip install lakefs-sdk

With Optional Dependencies

# For async support (if available)
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "lakefs-sdk[async]"

Verify Installation

import lakefs_sdk
print(lakefs_sdk.__version__)

lakefs-spec Installation

lakefs-spec provides filesystem-like operations and integrates with the fsspec ecosystem.

Basic Installation

pip install lakefs-spec

With Data Science Dependencies

# For pandas integration (quoted so the shell does not expand the brackets)
pip install "lakefs-spec[pandas]"

# For complete data science stack
pip install "lakefs-spec[all]"

Verify Installation

import lakefs_spec
print(lakefs_spec.__version__)

Boto3 Installation

Use Boto3 for S3-compatible operations with lakeFS.

Basic Installation

pip install boto3

With Additional AWS Tools

# For AWS CLI compatibility
pip install boto3 awscli

# For async operations
pip install aioboto3

Verify Installation

import boto3
print(boto3.__version__)
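
Not every package is guaranteed to expose a __version__ attribute in every release. As an alternative, the sketch below reads the installed version from package metadata using only the standard library (Python 3.8+). The helper name installed_versions is hypothetical; the distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands in this guide
SDK_DISTRIBUTIONS = ["lakefs", "lakefs-sdk", "lakefs-spec", "boto3"]


def installed_versions(distributions=SDK_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

This reports all four packages in one pass without importing them, so it also works when one of them fails to import.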

Installation Troubleshooting

Common Issues

Permission Errors:

# Use --user flag to install for current user only
pip install --user lakefs

# Or use virtual environment (recommended)
python -m venv venv && source venv/bin/activate
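
To confirm that a script is actually running inside the virtual environment you activated, one possible check compares sys.prefix with sys.base_prefix. This is a sketch that assumes an environment created with python -m venv (or a recent virtualenv); the function name is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```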

Version Conflicts:

# Check for conflicts
pip check

# Upgrade pip first
pip install --upgrade pip

# Force reinstall
pip install --force-reinstall lakefs

Network Issues:

# Specify the package index explicitly (replace the URL with a mirror if needed)
pip install -i https://pypi.org/simple/ lakefs

# Install from prebuilt wheels only (skip source builds)
pip install --only-binary=:all: lakefs

Authentication and Configuration

All Python SDKs support multiple authentication methods. Choose the method that best fits your deployment and security requirements.

Method 1: Environment Variables

Set environment variables in your shell or deployment environment:

Linux/macOS

export LAKEFS_ENDPOINT=http://localhost:8000
export LAKEFS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export LAKEFS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Windows Command Prompt

set LAKEFS_ENDPOINT=http://localhost:8000
set LAKEFS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
set LAKEFS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Windows PowerShell

$env:LAKEFS_ENDPOINT="http://localhost:8000"
$env:LAKEFS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
$env:LAKEFS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Using .env Files

Create a .env file in your project directory:

LAKEFS_ENDPOINT=http://localhost:8000
LAKEFS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
LAKEFS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Load with python-dotenv:

from dotenv import load_dotenv
load_dotenv()

import lakefs
# SDK will automatically use environment variables
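
Which variable names an SDK discovers automatically can differ between SDKs and versions. If you would rather not depend on auto-discovery, a minimal sketch is to read the three variables shown above yourself and pass them to the client explicitly. The helper client_kwargs_from_env is hypothetical; the keyword names match the lakefs.Client examples later in this guide.

```python
import os

# Variable names are the ones used in the examples above
REQUIRED_VARS = {
    "host": "LAKEFS_ENDPOINT",
    "username": "LAKEFS_ACCESS_KEY_ID",
    "password": "LAKEFS_SECRET_ACCESS_KEY",
}


def client_kwargs_from_env(environ=None):
    """Build keyword arguments for a client from environment variables.

    Raises RuntimeError naming every missing or empty variable.
    """
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in REQUIRED_VARS.items()}
```

With the SDK installed, the result could then be used as lakefs.Client(**client_kwargs_from_env()). Failing early with the names of all missing variables tends to be easier to debug than an authentication error at the first request.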

Method 2: Configuration Files

lakectl Configuration File

Create ~/.lakectl.yaml (compatible with lakectl CLI):

credentials:
  access_key_id: AKIAIOSFODNN7EXAMPLE
  secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
server:
  endpoint_url: http://localhost:8000
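
If you want to reuse this file for an explicitly constructed client instead of relying on auto-discovery, the nested keys need to be mapped to the client's keyword arguments. The sketch below assumes the file has already been parsed into a dictionary (for example with yaml.safe_load from PyYAML) and that it follows the layout shown above; the function name is hypothetical.

```python
def client_kwargs_from_lakectl(config):
    """Translate a parsed ~/.lakectl.yaml mapping into client keyword arguments."""
    try:
        return {
            "host": config["server"]["endpoint_url"],
            "username": config["credentials"]["access_key_id"],
            "password": config["credentials"]["secret_access_key"],
        }
    except (KeyError, TypeError) as exc:
        # KeyError: a key is missing; TypeError: a section is empty (None)
        raise ValueError(f"Incomplete lakectl configuration: {exc!r}") from exc
```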

Custom Configuration File

Create a custom YAML configuration file:

# config/lakefs.yaml
lakefs:
  endpoint: http://localhost:8000
  access_key_id: AKIAIOSFODNN7EXAMPLE
  secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  verify_ssl: true

Load in Python:

import yaml
import lakefs

with open('config/lakefs.yaml', 'r') as f:
    config = yaml.safe_load(f)['lakefs']

client = lakefs.Client(
    host=config['endpoint'],
    username=config['access_key_id'],
    password=config['secret_access_key'],
    verify_ssl=config.get('verify_ssl', True)
)

Method 3: Programmatic Configuration

High-Level SDK

import lakefs

# Basic configuration
client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

# Advanced configuration
client = lakefs.Client(
    host="https://lakefs.example.com",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    verify_ssl=True,
    ssl_ca_cert="/path/to/ca-bundle.pem",
    proxy="http://proxy.example.com:8080"
)

# Use client with repository operations
repo = lakefs.Repository("my-repo", client=client)

Generated SDK

import lakefs_sdk
from lakefs_sdk.configuration import Configuration
from lakefs_sdk.api_client import ApiClient

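# Configure client
# Note: when ApiClient is used directly, the host may need to include the
# API base path (for example http://localhost:8000/api/v1); check the
# lakefs-sdk documentation for your version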
configuration = Configuration(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

# Create API client
api_client = ApiClient(configuration)

lakefs-spec

from lakefs_spec import LakeFSFileSystem

# Using credentials directly
fs = LakeFSFileSystem(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

# Auto-discover from ~/.lakectl.yaml
fs = LakeFSFileSystem()

Boto3

import boto3
from botocore.config import Config

# Basic S3 client configuration
s3_client = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
)

# Advanced configuration: HTTPS endpoint, checksum behavior, and retries
# (the checksum options require a recent botocore release)
config = Config(
    request_checksum_calculation='when_required',
    response_checksum_validation='when_required',
    retries={'max_attempts': 3}
)

s3_client = boto3.client(
    's3',
    endpoint_url='https://lakefs.example.com',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    config=config
)
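
When addressing objects through the S3-compatible endpoint, the usual lakeFS convention is that the repository takes the place of the bucket and the key starts with the branch (or other ref) name. The helper below is a hypothetical sketch of that mapping, assuming this convention applies to your deployment.

```python
def s3_location(repository, ref, path):
    """Return the Bucket and Key for an object addressed through the S3 endpoint."""
    if not repository or not ref:
        raise ValueError("repository and ref are required")
    # The ref (for example a branch name) becomes the first key segment
    return {"Bucket": repository, "Key": f"{ref}/{path.lstrip('/')}"}
```

With a configured client, this could be used as s3_client.get_object(**s3_location("my-repo", "main", "data/file.csv")), where the repository, branch, and path are placeholders.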

SSL/TLS Configuration

Self-Signed Certificates (Development Only)

import lakefs

# Disable SSL verification (NOT for production)
client = lakefs.Client(
    host="https://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    verify_ssl=False
)

Security Warning

Disabling SSL verification allows man-in-the-middle attacks. Never use verify_ssl=False in production environments.
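
One way to keep this setting from leaking into production is to route it through a small guard. The sketch below is one possible convention, not an SDK feature: APP_ENV and resolve_verify_ssl are hypothetical names, and an unset APP_ENV is treated as production.

```python
import os


def resolve_verify_ssl(requested, environment=None):
    """Return the verify_ssl value to use, refusing False outside development.

    APP_ENV is a hypothetical variable name used only for this sketch.
    """
    if environment is None:
        environment = os.environ.get("APP_ENV", "production")
    if not requested and environment != "development":
        raise ValueError("verify_ssl=False is only allowed when APP_ENV=development")
    return bool(requested)
```

The returned value can be passed as verify_ssl when constructing the client, so a misconfigured deployment fails at startup instead of silently skipping certificate checks.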

Custom CA Certificates

import lakefs

# Use custom CA bundle
client = lakefs.Client(
    host="https://lakefs.example.com",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    ssl_ca_cert="/path/to/ca-bundle.pem"
)

Proxy Configuration

HTTP/HTTPS Proxy

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    proxy="http://proxy.example.com:8080"
)

Proxy with Authentication

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    proxy="http://user:pass@proxy.example.com:8080"
)
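
Proxy credentials that contain characters such as @ or : must be percent-encoded before they are embedded in a URL, otherwise the URL is parsed incorrectly. The following sketch builds the proxy string with the standard library; build_proxy_url is a hypothetical helper, and whether credentials embedded in the URL are honored depends on the HTTP library the SDK uses.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL, percent-encoding any credentials."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"{scheme}://{credentials}@{host}:{port}"
```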

Testing Your Configuration

Quick Connection Test

import lakefs

try:
    # Test with High-Level SDK
    repos = list(lakefs.repositories())
    print(f"✅ Connected successfully! Found {len(repos)} repositories.")
except Exception as e:
    print(f"❌ Connection failed: {e}")

Comprehensive Health Check

import lakefs

def test_lakefs_connection():
    try:
        # Test repository listing
        repos = list(lakefs.repositories())
        print(f"✅ Repository access: {len(repos)} repositories found")

        if repos:
            # Test branch listing on first repository
            repo = repos[0]
            branches = list(repo.branches())
            print(f"✅ Branch access: {len(branches)} branches in '{repo.id}'")

        return True
    except Exception as e:
        print(f"❌ Connection test failed: {e}")
        return False

# Run the test
if test_lakefs_connection():
    print("🎉 lakeFS connection is working correctly!")
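
In containers or CI jobs the lakeFS server may still be starting when the check first runs. A possible extension is to retry the check with an increasing delay. The sketch below is generic: wait_until_ready is a hypothetical helper that accepts any zero-argument function returning True or False, such as test_lakefs_connection above.

```python
import time


def wait_until_ready(check, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call check() until it returns True, doubling the delay between attempts."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # No sleep after the final attempt
            sleep(delay)
            delay *= 2
    return False
```

For example, wait_until_ready(test_lakefs_connection, attempts=5) would try up to five times, waiting 1, 2, 4, and 8 seconds between attempts. The sleep parameter exists only so the delay can be replaced in tests.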

Environment-Specific Configuration

Development Environment

# development.py
import lakefs

# Use local lakeFS instance with relaxed SSL
client = lakefs.Client(
    host="http://localhost:8000",
    username="lakefs",
    password="lakefs_password",
    verify_ssl=False  # Only affects HTTPS endpoints; acceptable for local development
)

Production Environment

# production.py
import os
import lakefs

# Use environment variables with strict SSL
# os.environ[...] raises KeyError at startup if a required variable is unset
client = lakefs.Client(
    host=os.environ["LAKEFS_ENDPOINT"],
    username=os.environ["LAKEFS_ACCESS_KEY_ID"],
    password=os.environ["LAKEFS_SECRET_ACCESS_KEY"],
    verify_ssl=True,
    ssl_ca_cert=os.getenv("LAKEFS_CA_CERT_PATH")  # Optional custom CA bundle
)

See Also

SDK Selection:

  • Python SDK Overview - Compare all available Python SDK options
  • SDK Decision Matrix - Choose the right SDK for your use case
  • Feature Comparison - Detailed feature comparison across SDKs

SDK-Specific Getting Started:

  • High-Level SDK Quickstart - Basic operations with simplified interface
  • Generated SDK Examples - Direct API access patterns
  • lakefs-spec Filesystem API - File-like operations
  • Boto3 Configuration - S3-compatible setup

Authentication and Security:

  • Best Practices Guide - Production security recommendations
  • Troubleshooting Authentication - Common auth problems
  • SSL/TLS Configuration - Secure connections

Learning Resources:

  • Data Science Workflow Tutorial - End-to-end data analysis
  • ETL Pipeline Tutorial - Building data pipelines
  • ML Experiment Tracking - Model versioning workflow

Reference Materials:

  • Environment Configuration Examples - Production setup patterns
  • Connection Testing - Verify your setup
  • Performance Optimization - Optimize SDK performance

External Resources:

  • lakeFS Documentation - Complete lakeFS documentation
  • Python Package Index - All lakeFS Python packages