# Core Architecture Standards for DuckDB

This document outlines the core architectural standards for contributing to and maintaining the DuckDB project. It focuses on the high-level structure, organization, and key design principles that guide development. Adherence to these standards ensures consistency, maintainability, performance, and security within the DuckDB codebase.

## 1. Fundamental Architectural Principles

DuckDB's architecture is designed around several key principles that guide its development:

* **Columnar Data Storage:** Data is stored in columns, enabling efficient analytical processing by minimizing I/O and maximizing vectorization opportunities.

* **In-Process Execution:** DuckDB runs inside the same process as the application, so query results do not have to be serialized and sent over a client-server connection, and the database can integrate tightly with the host application. This design favors simplicity and speed for many common use cases.

* **Vectorized Execution:** Operators process data in vectors (fixed-size chunks of column values) rather than one row at a time. Per-call overhead such as function dispatch and type checks is paid once per vector instead of once per row, and tight loops over contiguous arrays let the compiler use SIMD instructions, which makes this much faster than row-by-row processing.

* **Extensibility:** Custom scalar functions ("UDFs"), table functions, and other extensions can be added without modifying the core to provide specialized functionality.

* **Data Locality:** DuckDB tries to keep related data close together in memory (e.g., by radix partitioning during hash joins and aggregations). This improves cache hit rates and reduces memory access latency.

* **Minimal Dependencies:** Aiming for ease of deployment and portability, DuckDB strives to minimize external dependencies.

* **Cost-Based Optimizer:** DuckDB plans queries with a cost-based optimizer: it estimates the cost of alternative execution strategies (most notably different join orders) and selects the one with the lowest estimated cost.
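
The columnar and vectorized principles can be illustrated with a minimal standalone sketch (not DuckDB code): a column is a contiguous array, and an operator processes it in fixed-size chunks. The chunk size of 2048 mirrors DuckDB's default vector size but is an assumption here, and the names `VECTOR_SIZE`, `Column`, `AddConstantChunk`, and `AddConstantVectorized` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace duckdb {

// Hypothetical chunk size, mirroring DuckDB's default vector size of 2048 values.
constexpr std::size_t VECTOR_SIZE = 2048;

// Columnar layout: all values of one column are stored contiguously.
using Column = std::vector<int32_t>;

// Tight loop over one chunk: no per-row dispatch, so the compiler can auto-vectorize it.
static void AddConstantChunk(const int32_t *input, int32_t *output, std::size_t count, int32_t constant) {
	for (std::size_t i = 0; i < count; i++) {
		output[i] = input[i] + constant;
	}
}

// Processes the column chunk by chunk, the way a vectorized engine passes
// fixed-size batches from one operator to the next.
Column AddConstantVectorized(const Column &input, int32_t constant) {
	Column output(input.size());
	for (std::size_t offset = 0; offset < input.size(); offset += VECTOR_SIZE) {
		std::size_t count = std::min(VECTOR_SIZE, input.size() - offset);
		AddConstantChunk(input.data() + offset, output.data() + offset, count, constant);
	}
	assert(output.size() == input.size());
	return output;
}

} // namespace duckdb
```

For a column of 5,000 values, the outer loop makes three calls (2048, 2048, and 904 values), so any per-call overhead is paid three times rather than 5,000 times.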

## 2. Project Structure and Organization

The DuckDB project follows a well-defined directory structure:

* **"src/":** Contains the core source code.
    * **"catalog/":** Manages database metadata (tables, schemas, functions, etc.).
    * **"common/":** Common utility functions and data structures used throughout the codebase.
    * **"execution/":** Implements query execution logic, including the vectorized processing engine.
    * **"function/":** Contains built-in SQL functions.
    * **"main/":** Main entry point for the DuckDB library.
    * **"optimizer/":** Implements the query optimizer, including rule-based and cost-based optimizations.
    * **"parser/":** Responsible for parsing SQL queries.
    * **"planner/":** Creates the logical plan from the parsed SQL query.
    * **"storage/":** Implements storage management and data access.
    * **"transaction/":** Manages database transactions and concurrency control.
    * **"include/duckdb/":** Header files, including the public DuckDB API.
* **"test/":** Contains unit tests and integration tests.
* **"extension/":** Location for extensions to DuckDB.
* **"third_party/":** External libraries used by DuckDB.

### 2.1 Standards for Project Structure Contributions

* **Do This:** Place new source code in the appropriate subdirectory within "src/". If a new component is introduced, create a dedicated subdirectory.

* **Don't Do This:** Add source files to the root "src/" directory unless it's absolutely unavoidable. This keeps the codebase organized and navigable.

**Example:** If you're adding a new string function, place the implementation in "src/function/string/my_new_string_function.cpp" and its header in the mirrored location under "src/include/duckdb/" (e.g., "src/include/duckdb/function/string/my_new_string_function.hpp").

### 2.2 Namespaces

* **Do This:** All DuckDB code should reside within the "duckdb" namespace. Nested namespaces (e.g., "duckdb::storage") can be used to further organize code within modules. Use anonymous namespaces for file-local symbols.

* **Don't Do This:** Use the global namespace or other top-level namespaces for DuckDB code.

**Example:**

```cpp
namespace duckdb {
namespace storage {

class MyStorageClass {
	// ...
};

} // namespace storage
} // namespace duckdb
```

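The namespace rule above also calls for anonymous namespaces for file-local symbols. A minimal sketch with hypothetical names (`ClampToByte`, `NormalizeBytes`): the helper has internal linkage, so it cannot collide with a same-named helper in another translation unit, while the externally visible function stays in "duckdb".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace duckdb {

namespace {

// File-local helper: internal linkage, invisible to other translation units.
int32_t ClampToByte(int32_t value) {
	if (value < 0) {
		return 0;
	}
	return value > 255 ? 255 : value;
}

} // namespace

// Externally visible function; in real code it would be declared in a header.
std::vector<int32_t> NormalizeBytes(const std::vector<int32_t> &values) {
	std::vector<int32_t> result;
	result.reserve(values.size());
	for (auto value : values) {
		result.push_back(ClampToByte(value));
	}
	assert(result.size() == values.size());
	return result;
}

} // namespace duckdb
```
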
### 2.3 Directory Naming Conventions

* **Do This:** Keep directory names lowercase and descriptive. Use underscores to separate words (e.g., "storage_manager").

* **Don't Do This:** Use camelCase or mixed-case directory names. Avoid abbreviations unless they are well established within the project (e.g., "udf" is acceptable instead of "user_defined_function").

## 3. Coding Style and Formatting

DuckDB follows a consistent coding style based on LLVM's style guide, with minor customizations.

* **Do This:**
    * Use clang-format to format code automatically. A ".clang-format" file is provided in the root of the repository.
    * Follow the naming conventions: snake_case for variables, and PascalCase for classes, other types, and functions (e.g., "GetValue").
    * Use expressive and descriptive names.
    * Keep lines under 120 characters.
* **Don't Do This:**
    * Format code by hand. Let clang-format handle the formatting.
    * Use cryptic or single-letter variable names, except in very localized contexts such as a simple loop counter.

**Example:**

```cpp
// Correct
class MyStorageManager {
public:
	void InitializeStorage(const string &path);

private:
	string database_path;
};

// Incorrect
class mystoragemanager { // Class name should be PascalCase
public:
	void initstorage(const string &p); // Function name should be PascalCase; parameter name is unclear

private:
	string dbpath; // Unclear name; use snake_case words, e.g. database_path
};
```

## 4. Memory Management

DuckDB manages memory primarily through smart pointers ("unique_ptr", "shared_ptr") for resource ownership and through custom allocators: the "Allocator" class for general allocations, and arena-style allocation ("ArenaAllocator") for large numbers of short-lived objects in the vectorized execution engine. Raw "new" and "delete" should be rare.

### 4.1 Standards for Memory Management

* **Do This:**
    * Use "unique_ptr" for exclusive ownership of resources. This is the preferred way to manage memory in most cases.
    * Use "shared_ptr" only when shared ownership is explicitly required. Consider the lifetime implications carefully: reference cycles between "shared_ptr"s are never freed and leak memory (break them with "weak_ptr").
    * Use DuckDB's allocators for short-lived allocations in the vectorized execution engine, especially within inner loops or frequently called functions. Arena-style allocation avoids the overhead of a separate "new" and "delete" for each object.
    * When raw pointers are unavoidable, document the ownership semantics, make ownership transfer and deallocation explicit, and prefer RAII (Resource Acquisition Is Initialization) so that the resource's lifetime is tied to an object's lifetime.
* **Don't Do This:**
    * Use raw pointers for resource ownership without clear ownership transfer.
    * Leak memory by failing to "delete" allocated objects.
    * Double-free memory.
    * Access memory after it has been freed (use-after-free).
    * Mix different memory allocation strategies haphazardly.

**Example using "unique_ptr":**

```cpp
#include <iostream>
#include <memory>

namespace duckdb {

class MyObject {
public:
	explicit MyObject(int value_p) : value(value_p) {
	}

	int GetValue() const {
		return value;
	}

private:
	int value;
};

void ProcessObject(std::unique_ptr<MyObject> obj) {
	// 'obj' is exclusively owned here.
	std::cout << "Processing object with value: " << obj->GetValue() << std::endl;
} // 'obj' is automatically deleted when it goes out of scope.

std::unique_ptr<MyObject> CreateObject(int initial_value) {
	return std::make_unique<MyObject>(initial_value);
}

// Usage: ownership moves from CreateObject into ProcessObject.
// ProcessObject(CreateObject(42));

} // namespace duckdb
```

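When a resource comes from a C-style API with its own release function, "unique_ptr" with a custom deleter applies the RAII advice from the list above without writing a wrapper class. A minimal sketch using the C standard library's "std::FILE"; `FileCloser`, `FileHandle`, and `OpenTempFile` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

namespace duckdb {

// Custom deleter: unique_ptr calls it exactly once, and only for a non-null pointer.
struct FileCloser {
	void operator()(std::FILE *file) const {
		std::fclose(file);
	}
};

// Exclusive ownership of an open C file handle.
using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

FileHandle OpenTempFile() {
	// std::tmpfile returns nullptr on failure; the file is removed automatically once closed.
	return FileHandle(std::tmpfile());
}

} // namespace duckdb
```

The handle is closed even if an exception is thrown before the end of its scope, which is the main advantage over calling "fclose" manually.
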
**Example using "Allocator" and "AllocatedData":**

```cpp
#include "duckdb/common/allocator.hpp"

namespace duckdb {

class MyBuffer {
public:
	explicit MyBuffer(Allocator &allocator) : data(allocator.Allocate(1024)) {
	}

private:
	// AllocatedData owns the memory and returns it to the allocator in its destructor (RAII).
	AllocatedData data;
};

void MyFunction(Allocator &allocator) {
	// Allocate a buffer through the provided allocator.
	MyBuffer my_buffer(allocator);
	// ... use the buffer ...
} // my_buffer's memory is released here, when its AllocatedData member is destroyed.

} // namespace duckdb
```

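The arena idea behind pooled allocation of short-lived objects fits in a few lines: allocations bump a pointer inside large blocks, and everything is released at once when the arena is reset or destroyed. This is a standalone illustration, not DuckDB's "ArenaAllocator"; the class name `SimpleArena`, its default block size, and its 8-byte alignment are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

namespace duckdb {

// Hypothetical bump-pointer arena; not DuckDB's ArenaAllocator.
class SimpleArena {
public:
	explicit SimpleArena(std::size_t block_size_p = 4096) : block_size(block_size_p) {
	}

	// Returns 8-byte aligned memory that stays valid until Reset() or destruction.
	uint8_t *Allocate(std::size_t size) {
		size = AlignUp(size);
		if (blocks.empty() || offset + size > current_capacity) {
			// Start a new block; oversized requests get a block of their own size.
			current_capacity = std::max(block_size, size);
			blocks.push_back(std::make_unique<uint8_t[]>(current_capacity));
			offset = 0;
		}
		uint8_t *result = blocks.back().get() + offset;
		offset += size;
		assert(offset <= current_capacity);
		return result;
	}

	// Releases every allocation at once; individual objects are never freed.
	void Reset() {
		blocks.clear();
		offset = 0;
		current_capacity = 0;
	}

	std::size_t BlockCount() const {
		return blocks.size();
	}

private:
	static std::size_t AlignUp(std::size_t size) {
		return (size + 7) & ~static_cast<std::size_t>(7);
	}

	std::size_t block_size;
	std::vector<std::unique_ptr<uint8_t[]>> blocks;
	std::size_t offset = 0;
	std::size_t current_capacity = 0;
};

} // namespace duckdb
```

The trade-off is that allocation is just a pointer bump, but memory for an individual object cannot be returned early, which suits per-query or per-operator scratch data.
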
## 5. Concurrency and Parallelism

DuckDB leverages multi-threading for parallel query execution, particularly within the vectorized execution engine.

### 5.1 Standards for Concurrency

* **Do This:**
    * Use appropriate locking mechanisms (e.g., "std::mutex", "std::shared_mutex") to protect shared data structures from race conditions.
    * Use fine-grained locking to minimize lock contention and maximize parallelism.
    * Consider lock-free techniques for high-contention scenarios, but only when the added complexity is justified. "std::atomic" types are useful here.
    * Schedule parallel work through DuckDB's task scheduler ("TaskScheduler") rather than managing threads by hand.
* **Don't Do This:**
    * Introduce data races by accessing shared data without proper synchronization.
    * Hold locks for extended periods, blocking other threads.
    * Create deadlocks by acquiring locks in inconsistent orders.

**Example using "std::mutex":**

```cpp
#include <mutex>

namespace duckdb {

class SharedData {
public:
	void IncrementCounter() {
		std::lock_guard<std::mutex> guard(lock); // RAII-style locking
		counter++;
	}

	int GetCounter() const {
		std::lock_guard<std::mutex> guard(lock);
		return counter;
	}

private:
	int counter = 0;
	// mutable so that const methods such as GetCounter() can lock it.
	mutable std::mutex lock;
};

} // namespace duckdb
```

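Two other rules from the list above, using atomics for simple shared counters and avoiding deadlocks caused by inconsistent lock ordering, can be sketched as follows. "std::atomic" replaces the mutex for a single counter, and C++17's "std::scoped_lock" acquires several mutexes using a deadlock-avoidance algorithm, so two threads transferring in opposite directions cannot deadlock. `AtomicCounter`, `Account`, and `Transfer` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

namespace duckdb {

// Lock-free alternative for a single counter.
class AtomicCounter {
public:
	void Increment() {
		// Relaxed ordering suffices: the counter does not publish any other data.
		counter.fetch_add(1, std::memory_order_relaxed);
	}

	int64_t Get() const {
		return counter.load(std::memory_order_relaxed);
	}

private:
	std::atomic<int64_t> counter {0};
};

struct Account {
	std::mutex lock;
	int64_t balance = 0;
};

// std::scoped_lock locks both mutexes without deadlock, even if another
// thread calls Transfer(to, from, ...) at the same time.
void Transfer(Account &from, Account &to, int64_t amount) {
	assert(&from != &to);
	std::scoped_lock guard(from.lock, to.lock);
	from.balance -= amount;
	to.balance += amount;
}

} // namespace duckdb
```
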
## 6. Error Handling

Robust error handling is crucial for maintaining the stability and reliability of DuckDB.

### 6.1 Standards for Error Handling

* **Do This:**
    * Use exceptions to signal errors. Prefer "duckdb::Exception" and its subclasses (e.g., "InvalidInputException") over "std::exception" or ad-hoc exception types.
    * Catch exceptions at the level that can actually handle them, and handle them gracefully.
    * Provide informative error messages that include the context of the error (e.g., the SQL query being executed, the file being processed).
    * Use the "D_ASSERT" macro for internal invariants that must always hold. These assertions are enabled in debug builds and help catch bugs early.
    * For operations where failure is an expected outcome (e.g., "TRY_CAST"-style conversions), provide a "Try..." variant that reports failure through its return value instead of throwing.
* **Don't Do This:**
    * Ignore errors.
    * Use return codes for error handling unless there is a specific reason to do so, such as the expected-failure case above. Exceptions keep the error path separate from normal control flow.
    * Throw generic exceptions without specific error information.
    * Use assertions for error conditions that can occur in production. Assertions are only enabled in debug builds; use exceptions for runtime errors.

**Example using exceptions:**

```cpp
#include <iostream>

#include "duckdb.hpp"

namespace duckdb {

void MyFunction(int value) {
	if (value < 0) {
		throw InvalidInputException("Value must be non-negative");
	}
	// ...
}

void AnotherFunction() {
	// Handlers are ordered from most to least specific; a base-class handler
	// listed first would hide the more specific ones.
	try {
		MyFunction(-1);
	} catch (const InvalidInputException &e) {
		std::cerr << "Error: " << e.what() << std::endl;
		// Handle the error appropriately (e.g., report it or rethrow with more context).
	} catch (const Exception &e) {
		// Any other DuckDB-specific exception.
		std::cerr << "DuckDB Error: " << e.what() << std::endl;
	} catch (const std::exception &e) {
		// Standard library exceptions.
		std::cerr << "Standard exception: " << e.what() << std::endl;
	} catch (...) {
		// Anything else: handle unexpected errors.
		std::cerr << "Unknown error occurred." << std::endl;
	}
}

} // namespace duckdb
```

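The rule about informative error messages can be illustrated without DuckDB headers. In this sketch, `InvalidInputError` is a hypothetical standard-library-only stand-in for a DuckDB exception class such as "InvalidInputException", and `ParseNonNegativeSetting` is a hypothetical function. Low-level parsing errors are caught and rethrown with the setting name and the offending input, so the message explains what failed and why.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace duckdb {

// Hypothetical stand-in for a DuckDB exception subclass.
class InvalidInputError : public std::runtime_error {
public:
	using std::runtime_error::runtime_error;
};

// Parses a non-negative integer setting; every error names the setting and the input.
int ParseNonNegativeSetting(const std::string &setting_name, const std::string &text) {
	int value = 0;
	try {
		std::size_t consumed = 0;
		value = std::stoi(text, &consumed);
		if (consumed != text.size()) {
			throw std::invalid_argument("trailing characters");
		}
	} catch (const std::invalid_argument &) {
		throw InvalidInputError("Setting '" + setting_name + "' expects an integer, got '" + text + "'");
	} catch (const std::out_of_range &) {
		throw InvalidInputError("Setting '" + setting_name + "' value '" + text + "' is out of range");
	}
	if (value < 0) {
		throw InvalidInputError("Setting '" + setting_name + "' must be non-negative, got " +
		                        std::to_string(value));
	}
	return value;
}

} // namespace duckdb
```
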
## 7. Logging

DuckDB uses a logging system to record events and diagnostic information at different levels of severity.

### 7.1 Standards for Logging

* **Do This:**
    * Log events at the appropriate severity level (e.g., debug, info, warning, error) using DuckDB's logging facilities. The logging API has changed between DuckDB versions, so check the logging headers of the version you are targeting for the exact macro names and signatures.
    * Include relevant context in log messages (e.g., the function name, the current state of the system).
    * Use structured logging to make log messages easier to parse and analyze.
* **Don't Do This:**
    * Over-log, creating excessive noise in the logs.
    * Log sensitive information (e.g., passwords, API keys).
    * Use "std::cout" or "std::cerr" for logging. Use the DuckDB logging facilities instead for consistency and configurability.

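**Example using logging macros:**

```cpp
#include "duckdb.hpp"

namespace duckdb {

// NOTE: D_DEBUG, D_ERROR and D_INFO are illustrative placeholder names; substitute the
// logging macros (and their format-string syntax) of the DuckDB version you target.
void MyFunction(int value) {
	D_DEBUG("MyFunction called with value: {}", value); // Debug-level message

	if (value < 0) {
		D_ERROR("Invalid value: {}", value); // Error-level message
		throw InvalidInputException("Value must be non-negative");
	}

	D_INFO("Processing value: {}", value); // Info-level message
}

} // namespace duckdb
```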

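The severity-level and structured-logging rules can be sketched independently of DuckDB's actual logging API. The `Logger` class and `LogLevel` enum below are hypothetical: the logger drops messages below a configurable threshold and writes one "key=value" line per event, so logs can be parsed mechanically. Writing to an injected "std::ostream" instead of hard-coding "std::cerr" keeps the output destination configurable.

```cpp
#include <cassert>
#include <initializer_list>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

namespace duckdb {

enum class LogLevel : int { LOG_DEBUG = 0, LOG_INFO = 1, LOG_WARN = 2, LOG_ERROR = 3 };

inline const char *LogLevelToString(LogLevel level) {
	switch (level) {
	case LogLevel::LOG_DEBUG:
		return "DEBUG";
	case LogLevel::LOG_INFO:
		return "INFO";
	case LogLevel::LOG_WARN:
		return "WARN";
	case LogLevel::LOG_ERROR:
		return "ERROR";
	}
	return "UNKNOWN";
}

// Hypothetical structured logger: "level=... event=... key=value ..." per line.
class Logger {
public:
	Logger(std::ostream &sink_p, LogLevel threshold_p) : sink(sink_p), threshold(threshold_p) {
	}

	void Log(LogLevel level, const std::string &event,
	         std::initializer_list<std::pair<std::string, std::string>> fields = {}) {
		if (static_cast<int>(level) < static_cast<int>(threshold)) {
			return; // Below the threshold: skipped to avoid noisy logs.
		}
		sink << "level=" << LogLevelToString(level) << " event=" << event;
		for (auto &field : fields) {
			sink << ' ' << field.first << '=' << field.second;
		}
		sink << '\n';
	}

private:
	std::ostream &sink;
	LogLevel threshold;
};

} // namespace duckdb
```
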
## 8. Extensibility

DuckDB is designed to be extensible, allowing developers to add custom functions, table functions, and other extensions.

### 8.1 Standards for Extensibility

* **Do This:**
    * Follow the documented API for creating custom functions and table functions. Refer to the DuckDB documentation for the latest API details.
    * Provide clear documentation and examples for your extensions.
    * Consider contributing your extensions back to the DuckDB community or publishing them as separate packages that others can use.
    * Ensure that extensions are thread-safe and do not introduce data races. Use proper synchronization mechanisms when accessing shared data structures.
* **Don't Do This:**
    * Modify the core DuckDB code to add custom functionality. Use the extension API instead.
    * Introduce breaking changes to the extension API without careful consideration and communication with the community.
    * Create extensions that are insecure or unreliable.

**Example registering a scalar UDF from an extension** (the extension loading API has changed between DuckDB releases, so compare with the official extension template for the version you target):

```cpp
#include "duckdb.hpp"
#include "duckdb/function/scalar_function.hpp"
#include "duckdb/main/extension_util.hpp"

namespace duckdb {

// Adds 1 to every INTEGER input; UnaryExecutor handles NULLs and the different vector formats.
static void MyScalarFunction(DataChunk &args, ExpressionState &state, Vector &result) {
	auto &input = args.data[0];
	UnaryExecutor::Execute<int32_t, int32_t>(input, result, args.size(), [&](int32_t value) { return value + 1; });
}

class MyExtension : public Extension {
public:
	void Load(DuckDB &db) override {
		ScalarFunction my_function("my_scalar_function", {LogicalType::INTEGER}, LogicalType::INTEGER,
		                           MyScalarFunction);
		ExtensionUtil::RegisterFunction(*db.instance, my_function);
	}

	std::string Name() override {
		return "my_extension";
	}
};

} // namespace duckdb

extern "C" {

// The entry point is named after the extension: <extension name>_init.
DUCKDB_EXTENSION_API void my_extension_init(duckdb::DatabaseInstance &db) {
	duckdb::DuckDB db_wrapper(db);
	db_wrapper.LoadExtension<duckdb::MyExtension>();
}

DUCKDB_EXTENSION_API const char *my_extension_version() {
	return duckdb::DuckDB::LibraryVersion();
}

} // extern "C"
```

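The work a unary execution helper such as DuckDB's "UnaryExecutor" does for a scalar UDF can be approximated in a standalone sketch: apply a function to every valid row of a chunk and propagate NULLs through a validity mask. `SimpleVector` and `ExecuteUnary` are hypothetical simplifications; DuckDB's real vectors also have constant and dictionary encodings and store validity as a bitmask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace duckdb {

// Simplified flat vector: values plus one validity flag per row (false = NULL).
template <class T>
struct SimpleVector {
	std::vector<T> data;
	std::vector<bool> validity;

	explicit SimpleVector(std::size_t count) : data(count), validity(count, true) {
	}
};

// Applies `op` to every non-NULL input row; NULL inputs produce NULL outputs.
template <class INPUT, class RESULT, class OP>
void ExecuteUnary(const SimpleVector<INPUT> &input, SimpleVector<RESULT> &result, std::size_t count, OP op) {
	assert(input.data.size() >= count && result.data.size() >= count);
	for (std::size_t row = 0; row < count; row++) {
		if (!input.validity[row]) {
			result.validity[row] = false;
			continue;
		}
		result.validity[row] = true;
		result.data[row] = op(input.data[row]);
	}
}

} // namespace duckdb
```
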
## 9. Testing

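Tests live in the "test/" directory. Most SQL-level behavior is tested with sqllogictest files ("*.test", mostly under "test/sql/"), while C++-level tests use the Catch framework, as in the example below. Each component of the DuckDB system should have corresponding tests.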

### 9.1 Testing Standards

* **Do This:**
    * Write both unit tests and integration tests. Unit tests should focus on individual components, while integration tests should verify the interaction between multiple components.
    * Use descriptive test names that clearly indicate what is being tested.
    * Write tests that are reliable and repeatable. Avoid depending on external factors (e.g., network connectivity, a specific file system layout) unless those factors are explicitly part of the test.
    * Aim for good test coverage by exercising all code paths and edge cases.
* **Don't Do This:**
    * Skip writing tests, even for small changes.
    * Write tests that are flaky or unreliable. Fix or remove such tests.
    * Commit code without running the tests first.

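**Example Test:**

```cpp
#include "catch.hpp"
#include "duckdb.hpp"

using namespace duckdb;

TEST_CASE("Basic query returns the expected value", "[core]") {
	DBConfig config;
	DuckDB db(nullptr, &config); // nullptr: in-memory database
	Connection con(db);
	auto result = con.Query("SELECT 42");
	REQUIRE(!result->HasError());
	REQUIRE(result->GetValue(0, 0) == Value::INTEGER(42));
}

TEST_CASE("Querying a missing table reports an error", "[core]") {
	DuckDB db(nullptr);
	Connection con(db);
	// Query errors are reported through the result object rather than thrown.
	auto result = con.Query("SELECT * FROM nonexistent_table");
	REQUIRE(result->HasError());
}
```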

## 10. Documentation

Clear and up-to-date documentation is essential for making DuckDB easy to use and contribute to.

### 10.1 Standards for Documentation

* **Do This:**
    * Document all public APIs (functions, classes, etc.) using Doxygen-style comments.
    * Provide clear and concise explanations of the purpose, usage, and limitations of the API.
    * Include examples to illustrate how to use the API.
    * Keep the documentation up-to-date as the code evolves.
    * Document internal design decisions and architectural choices to help other developers understand the codebase.
    * Use meaningful comments within the code to explain complex logic or non-obvious decisions.
* **Don't Do This:**
    * Skip documenting public APIs.
    * Write documentation that is vague, incomplete, or inaccurate.
    * Let the documentation become outdated.

**Example using Doxygen comments:**

```cpp
#include <string>

namespace duckdb {

/**
 * @brief Initializes the storage manager.
 * @param path The path to the database file.
 */
void InitializeStorage(const std::string &path);

} // namespace duckdb
```

## 11. Specific Considerations for Core Architecture

* **Catalog Management:** The "catalog/" directory is critical: changes here affect the entire database and require extensive testing. Correct locking is essential to prevent database corruption, and all catalog changes must be written to the write-ahead log so they can be recovered.

* **Query Optimizer:** The "optimizer/" directory is performance-sensitive. Evaluate new optimization rules carefully for their impact on query performance, run benchmarks before and after changes, and pay special attention to corner cases for robustness.

* **Storage Layer:** The "storage/" directory is responsible for data persistence. A correct Write-Ahead Log (WAL) implementation is critical for durability, so thoroughly test recovery after system crashes and power failures. Performance changes in the storage layer affect every query that reads or writes data.
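
The durability requirement for the storage layer rests on the write-ahead rule: a change is appended to the log before it is applied, so recovery can rebuild state by replaying the log in order. The in-memory sketch below illustrates only that ordering and replay. `WalRecord` and `KeyValueStore` are hypothetical, a vector stands in for a durable, flushed log file, and DuckDB's actual WAL format and recovery logic differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace duckdb {

// One logged change. A real WAL also records checksums and transaction boundaries.
struct WalRecord {
	std::string key;
	int64_t value;
};

class KeyValueStore {
public:
	explicit KeyValueStore(std::vector<WalRecord> &wal_p) : wal(wal_p) {
	}

	void Put(const std::string &key, int64_t value) {
		// Write-ahead rule: append to the log first (a real system also flushes it to disk) ...
		wal.push_back(WalRecord {key, value});
		// ... and only then apply the change to the in-memory state.
		state[key] = value;
	}

	// Crash recovery: rebuild the state from scratch by replaying every record in order.
	void Recover() {
		state.clear();
		for (auto &record : wal) {
			state[record.key] = record.value;
		}
	}

	bool Get(const std::string &key, int64_t &result) const {
		auto entry = state.find(key);
		if (entry == state.end()) {
			return false;
		}
		result = entry->second;
		return true;
	}

private:
	std::vector<WalRecord> &wal; // Stands in for the durable log file.
	std::map<std::string, int64_t> state;
};

} // namespace duckdb
```

A recovery test then simulates a crash by discarding the in-memory state, keeping only the log, and checking that replay restores the latest value of every key.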

By adhering to these coding standards, developers can contribute to the DuckDB project in a consistent, maintainable, and high-quality manner. This collaborative effort ensures that DuckDB remains a powerful and reliable analytical database system.
