# Security Best Practices Standards for DuckDB
This document outlines the security best practices for developing applications using DuckDB. These standards aim to guide developers in writing secure, maintainable, and performant code that mitigates common vulnerabilities. This will also enable AI coding assistants to produce code that aligns with these standards.
## 1. Input Validation and Sanitization
### 1.1. Rationale
DuckDB, while primarily operating locally, can be exposed to external data sources depending on your architecture. Malicious or improperly formatted input can lead to data corruption, unexpected behavior, or potentially, although less common, code execution if using user-defined functions (UDFs) that are not properly secured. Input validation and sanitization are critical to prevent these issues.
### 1.2. Standards
* **Do This:** Validate all external inputs to DuckDB before processing them.
* **Don't Do This:** Directly use untrusted data from external sources (files, network) without validation.
### 1.3. Implementation
#### 1.3.1. Data Type Validation
Always verify that incoming data matches the expected data type. DuckDB performs implicit type coercion, which can sometimes mask issues; explicit checks provide better control.
"""python
import duckdb
def insert_data(conn, data: dict):
"""
Inserts data into a DuckDB table after validating data types.
"""
try:
# Validate data types before insertion
if not isinstance(data["id"], int):
raise ValueError("id must be an integer")
if not isinstance(data["name"], str):
raise ValueError("name must be a string")
if not isinstance(data["value"], float):
raise ValueError("value must be a float")
conn.execute("INSERT INTO my_table VALUES (?, ?, ?)", (data["id"], data["name"], data["value"]))
except ValueError as e:
print(f"Data validation error: {e}")
except duckdb.Error as e:
print(f"DuckDB error: {e}")
# Example usage
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name VARCHAR, value DOUBLE)")
good_data = {"id": 1, "name": "example", "value": 1.23}
insert_data(conn, good_data)
bad_data = {"id": "string", "name": "example", "value": 1.23} # Incorrect id type
insert_data(conn, bad_data) # This will now raise and be handled gracefully.
conn.close()
"""
#### 1.3.2. Range and Format Validation
Beyond data types, constrain the range of possible values and enforce specific formats where necessary.
"""python
import duckdb
import re
def validate_email(email: str) -> bool:
"""Validates an email address using a regular expression."""
email_regex = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
return bool(re.match(email_regex, email))
def insert_user(conn, user_data: dict):
"""Inserts user data into a DuckDB table after validation."""
try:
if not isinstance(user_data["age"], int) or not (0 <= user_data["age"] <= 120): # Reasonable age range
raise ValueError("Age must be an integer between 0 and 120")
if not validate_email(user_data["email"]):
raise ValueError("Invalid email format")
conn.execute("INSERT INTO users (age, email) VALUES (?, ?)", (user_data["age"], user_data["email"]))
except ValueError as e:
print(f"Data validation error: {e}")
except duckdb.Error as e:
print(f"DuckDB error: {e}")
# Example Usage
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE users (age INTEGER, email VARCHAR)")
good_user = {"age": 30, "email": "test@example.com"}
insert_user(conn, good_user)
bad_user = {"age": -5, "email": "invalid-email"} # Invalid age and email
insert_user(conn, bad_user)
conn.close()
"""
#### 1.3.3. Sanitization
Sanitize data to remove or escape potentially harmful characters. This applies especially to strings being used within SQL queries dynamically. Consider data masking techniques, like hashing or tokenization, for sensitive data before storing it in DuckDB.
"""python
import duckdb
import bleach # pip install bleach
def insert_comment(conn, comment: str):
"""Inserts a sanitized comment into the database."""
sanitized_comment = bleach.clean(comment, tags=[], attributes={}, styles=[], strip=True) # Options can be configured.
try:
conn.execute("INSERT INTO comments (comment) VALUES (?)", (sanitized_comment,))
except duckdb.Error as e:
print(f"DuckDB error: {e}")
# Example Usage
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE comments (comment VARCHAR)")
unsafe_comment = " This is a comment."
insert_comment(conn, unsafe_comment)
conn.execute("SELECT * FROM comments").show() # The script tag will be removed.
conn.close()
"""
Bleach is a Python library designed for sanitizing HTML. Other types of potentially malicious input may require a different sanitization approach. SQL parameters should always be used (see section 2) to prevent code injection, but sanitization pre-emptively reduces the risk of errors and of storing unintended markup or executable content.
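As one illustration of the masking suggestion in section 1.3.3, sensitive fields can be hashed before they are stored. The sketch below uses the standard "hashlib" module with an illustrative table; note that an unsalted hash only pseudonymizes low-entropy values such as email addresses, so a keyed hash (HMAC) or a tokenization service is stronger in practice.
"""python
import duckdb
import hashlib

def mask_email(email: str) -> str:
    """Returns a one-way SHA-256 hash of an email address (hex-encoded)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE signups (id INTEGER, email_hash VARCHAR)")
# Store only the hash; the plaintext email never reaches the database.
conn.execute("INSERT INTO signups VALUES (?, ?)", (1, mask_email("test@example.com")))
print(conn.execute("SELECT * FROM signups").fetchall())
conn.close()
"""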
## 2. SQL Injection Prevention
### 2.1. Rationale
SQL injection is a critical vulnerability where malicious SQL code can be injected into a database query through user input. This can lead to unauthorized data access, modification, or deletion.
### 2.2. Standards
* **Do This:** Always use parameterized queries or prepared statements.
* **Don't Do This:** Dynamically construct SQL queries by directly concatenating user input.
### 2.3. Implementation
#### 2.3.1. Parameterized Queries
Parameterized queries are the most effective way to prevent SQL injection. Parameters are treated as data, not as part of the SQL command, thus preventing malicious injected commands from executing.
"""python
import duckdb
def search_products(conn, search_term: str):
"""Searches for products using a parameterized query to prevent SQL injection."""
try:
# Use ? as a placeholder for the parameter. DuckDB will escape the parameter.
results = conn.execute("SELECT * FROM products WHERE name LIKE ?", ('%' + search_term + '%',)).fetchall() # Always wrap parameters in a tuple
return results
except duckdb.Error as e:
print(f"DuckDB error: {e}")
return []
# Example Usage
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name VARCHAR)")
conn.execute("INSERT INTO products VALUES (1, 'Laptop'), (2, 'Mouse'), (3, 'Keyboard')")
search_term = "Laptop"
products = search_products(conn, search_term)
print(products)
# Malicious input - this will NOT result in SQL injection because of the parameter
search_term = "'; DROP TABLE products; --"
products = search_products(conn, search_term) # it is treated as literal string, even with ' and --
print(products)
conn.close()
"""
**Important:** The "?" placeholder is the correct syntax for parameterized queries in DuckDB. Ensure you always wrap the parameters in a tuple, even if there's only one parameter.
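The same placeholder syntax extends to bulk inserts through "executemany", which binds one tuple per row and avoids any string concatenation. A brief sketch with an illustrative table:
"""python
import duckdb

conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name VARCHAR)")
rows = [(1, "Laptop"), (2, "Mouse"), (3, "Keyboard")]
# Each row is bound as parameters - no string building, no injection risk.
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
print(conn.execute("SELECT count(*) FROM products").fetchone())
conn.close()
"""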
#### 2.3.2. Escaping (Discouraged, but sometimes necessary as a last resort)
While parameterized queries are strongly preferred, there might be cases where you need to dynamically build parts of the SQL query (e.g., table names or column names). In such rare scenarios, proper escaping or whitelisting is necessary.
**WARNING:** Escaping should be treated as a last resort only when parameterized queries are strictly impossible.
"""python
import duckdb
import shlex # For string escaping
def dynamic_sort(conn, column_name: str):
"""Sorts a table dynamically based on a column name. AVOID IF POSSIBLE, use parameterized queries when possible."""
# Whitelist valid column names before escaping
valid_columns = ["id", "name", "price"]
if column_name not in valid_columns:
raise ValueError("Invalid column name")
# Even with whitelisting, still escape the column name
escaped_column = shlex.quote(column_name) # Escaping function
try:
# Concatenating the escaped column name into the query
query = f"SELECT * FROM items ORDER BY {escaped_column}"
results = conn.execute(query).fetchall()
return results
except duckdb.Error as e:
print(f"DuckDB error: {e}")
return []
# Example usage
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name VARCHAR, price DOUBLE)")
conn.execute("INSERT INTO items VALUES (1, 'Apple', 1.0), (2, 'Banana', 0.5), (3, 'Orange', 0.75)")
sorted_items = dynamic_sort(conn, "price")
print(sorted_items) # Output: [(2, 'Banana', 0.5), (3, 'Orange', 0.75), (1, 'Apple', 1.0)]
#Demonstrates what a failure would look like:
try:
sorted_items = dynamic_sort(conn, "injected_code; DROP TABLE items;") # Invalid column given as input
except ValueError as e:
print(f"caught: {e}")
conn.close()
"""
**Explanation:**
* **Whitelisting:** Ensuring the "column_name" is within a predefined set of valid columns is the primary defense; identifiers cannot be bound as query parameters, so nothing outside the whitelist should ever reach the query string.
* **Identifier quoting:** Wrap the whitelisted identifier in double quotes when interpolating it into the query. Shell-escaping helpers such as "shlex.quote" are designed for shell commands, not SQL, and should not be relied on for SQL identifiers.
* **Error Handling:** Include comprehensive error handling to catch any unexpected exceptions during query execution.
## 3. Principle of Least Privilege
### 3.1. Rationale
The principle of least privilege (PoLP) dictates that a user, process, or system should have only the minimum necessary privileges required to perform its intended function. Limiting privileges reduces the potential damage that can be caused by accidental misuse or malicious exploitation. DuckDB itself doesn't have users/roles like a traditional client-server database as it is in-process. However, consider scenarios where your Python application connects to, and possibly creates, databases in the filesystem.
### 3.2. Standards
* **Do This:** Grant only the necessary file system permissions to the user/process running the DuckDB application. Restrict database file creation to specific directories.
* **Don't Do This:** Run the application with superuser or excessive permissions. Make database files universally accessible/writable.
### 3.3. Implementation
#### 3.3.1. Filesystem Permissions
The following is a Linux example, but the same principles apply to all operating systems.
* **Create a dedicated user:** Create a dedicated user account (e.g., "duckdb_app") to run the application.
* **Restrict database directory:** Create a directory for DuckDB databases (e.g., "/opt/duckdb_data") and set the owner and group to the dedicated user.
"""bash
sudo adduser duckdb_app
sudo mkdir /opt/duckdb_data
sudo chown duckdb_app:duckdb_app /opt/duckdb_data
sudo chmod 700 /opt/duckdb_data # Ensures no one but the user can read and write
"""
#### 3.3.2. Application Configuration
* **Configure the application:** In your application's configuration, explicitly specify the database path within the restricted directory. Use relative paths within that directory when opening databases.
* **Avoid hardcoding credentials:** Don't commit credentials to the repository.
"""python
import duckdb
import os
# This assumes the duckdb_app has full permissions to the /opt/duckdb_data directory
DATABASE_PATH = "/opt/duckdb_data/my_database.duckdb" # Explicitly define the path
# Best practice to keep data in a separate directory, with limited permissions outside directory
# Can use relative path from that location when in a secure directory. However, using
# absolute paths when initially connecting may be better since it makes it clear where the app
# is connecting, avoiding any relative path issues
# Create a database at the explicitly defined path.
conn = duckdb.connect(DATABASE_PATH)
conn.execute("CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)") # Example
conn.close()
"""
## 4. Secure User-Defined Functions (UDFs)
### 4.1. Rationale
User-Defined Functions (UDFs) extend DuckDB's functionality with custom code, but they also introduce potential security risks. If UDFs are not properly vetted and secured, they can become a gateway for malicious code execution within the DuckDB process.
### 4.2. Standards
* **Do This:** Carefully review and test all UDFs before deploying them. Implement input validation and sanitization within the UDF itself. If accessing external resources, use secure methods. Limit side effects.
* **Don't Do This:** Allow untrusted users to define or execute UDFs without thorough security checks. Execute external system commands directly within a UDF.
### 4.3. Implementation
#### 4.3.1. Input Validation within UDFs
Always validate and sanitize inputs within the UDF to prevent unexpected behavior or vulnerabilities.
"""python
import duckdb
import subprocess
def safe_udf(input_string: str) -> str:
"""
A secure UDF that validates the input string before processing.
Uses a safe subprocess call to avoid shell injection.
"""
# Input Validation: Only allow alphanumeric characters and spaces in the input string.
if not input_string.isalnum() and not ' ' in input_string:
return "ERROR: Invalid input characters." # Handle the error safely
# If you NEED external tools, use subprocess.run with shell=False.
try:
result = subprocess.run(["echo", input_string], capture_output=True, text=True, shell=False, timeout=5) # example
return result.stdout.strip()
except subprocess.TimeoutExpired:
return "ERROR: Timeout during command execution."
except Exception as e:
return f"ERROR: An unexpected error occurred: {str(e)}"
# Example usage:
conn = duckdb.connect(":memory:")
conn.create_function("safe_udf", safe_udf)
# Execute the UDF on some data.
result = conn.execute("SELECT safe_udf('Hello World')").fetchone()[0]
print(result)
result = conn.execute("SELECT safe_udf('Hello World; rm -rf /')").fetchone()[0] # This will return an error now!
print(result)
conn.close()
"""
#### 4.3.2. Limiting Side Effects
UDFs should ideally be pure functions, meaning they don't have side effects (e.g., modifying global state, writing to files). If side effects are unavoidable, carefully control and audit them.
"""python
import duckdb
import os
def file_writing_udf(input_string: str) -> str:
"""Writes input to a file in a controlled directory. This UDF serves as an example of how to implement a secure UDF."""
#Input Validation is CRITICAL here.
if not input_string.isalnum():
return "ERROR: Invalid filename characters."
# Determine the file path
file_path = os.path.join("/tmp/secure_udf_dir/", input_string + ".txt") # Ensure the directory exist
if not file_path.startswith("/tmp/secure_udf_dir/"): #Sanity check
return "ERROR: Path escaping attempt"
try:
with open(file_path, "w") as f:
f.write("This is written by the file_writing_udf")
return f"File written to {file_path}"
except OSError as e:
return f"Error: {e}" #Handle error
def setup_udf_filesystem():
"""Prepare the filesystem outside of DuckDB connection session."""
# Create the directory, make it writable by our user only
try:
os.makedirs("/tmp/secure_udf_dir", exist_ok=True)
os.chmod("/tmp/secure_udf_dir", 0o700) #read, write and execute for the user
except OSError as e:
print(f"Error when setting up a secure directory: {e}")
# Example Usage
conn = duckdb.connect(":memory:")
setup_udf_filesystem()
conn.create_function("file_writing_udf", file_writing_udf)
# Execute the UDF:
file_name = "output"
result = conn.execute(f"SELECT file_writing_udf('{file_name}')").fetchone()[0]
print(result)
#Check to see if there is any error from invalid call
file_name = "/../../../../very_invalid_output"
result = conn.execute(f"SELECT file_writing_udf('{file_name}')").fetchone()[0]
print(result)
conn.close()
"""
#### 4.3.3. External Dependencies
If your UDF uses external dependencies, carefully manage and vet these dependencies. Always use virtual environments and pin dependency versions to prevent supply chain attacks.
## 5. Data Encryption
### 5.1. Rationale
Data encryption protects sensitive data both at rest (stored on disk) and in transit (while being transmitted over a network). Even though DuckDB is often used as an embedded database, encrypting sensitive data adds an extra layer of protection against unauthorized access.
### 5.2. Standards
* **Do This:** Encrypt sensitive data at rest and in transit, consider full disk encryption and authenticated connections where possible.
* **Don't Do This:** Store sensitive data in plain text without encryption.
### 5.3. Implementation
#### 5.3.1. DuckDB Encryption Support
Native encryption at rest was introduced in DuckDB 1.4: a key is supplied through the "ENCRYPTION_KEY" option when attaching a database file. Earlier releases (including the 0.x line) have no built-in database encryption, so rely on filesystem or full-disk encryption there; note that the SQLite-style "PRAGMA key" command is not DuckDB syntax. The sketch below assumes DuckDB 1.4 or later; check your version's documentation for the exact options.
"""python
import duckdb
import os

# Generate a random encryption key (for demonstration purposes only - store keys securely in production)
encryption_key = os.urandom(32).hex()  # 32 random bytes, hex-encoded

# Create an encrypted DuckDB database by attaching it with an encryption key (DuckDB 1.4+)
conn = duckdb.connect()
conn.execute(f"ATTACH 'encrypted_database.duckdb' AS enc (ENCRYPTION_KEY '{encryption_key}')")
conn.execute("CREATE TABLE enc.my_table (id INTEGER, value VARCHAR)")
conn.execute("INSERT INTO enc.my_table VALUES (1, 'Secret data')")
conn.close()

# Re-open the encrypted database (the same key must be provided) to access the data
conn = duckdb.connect()
conn.execute(f"ATTACH 'encrypted_database.duckdb' AS enc (ENCRYPTION_KEY '{encryption_key}')")
result = conn.execute("SELECT * FROM enc.my_table").fetchall()
print(result)  # [(1, 'Secret data')]
conn.close()

# Demonstrate failure: attaching without the key raises an error
try:
    conn = duckdb.connect()
    conn.execute("ATTACH 'encrypted_database.duckdb' AS enc")
    conn.execute("SELECT * FROM enc.my_table").fetchall()
except duckdb.Error as e:
    print(f"error: {e}")  # The database cannot be opened without the correct key
"""
**Important Considerations:**
* **Key Management:** The most critical aspect of encryption is secure key management. *Never* hardcode encryption keys in your application code. Use a secure key management system (e.g., HashiCorp Vault, AWS KMS, Azure Key Vault) to store and access encryption keys.
* **Error Handling:** Always handle potential encryption errors gracefully, such as incorrect key provided.
* **Performance:** Encryption can impact performance. Test the performance impact of encryption on your application and optimize accordingly.
* **Transit Encryption:** DuckDB itself does not handle network connections. If you are transmitting DuckDB data over a network (e.g., using a remote file system), ensure you encrypt the data in transit using protocols like TLS/SSL.
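Building on the key-management point above, the encryption key should be injected at runtime from the environment or a secrets manager, never embedded in source code. A minimal sketch, assuming DuckDB 1.4+ encryption and an illustrative "DUCKDB_ENCRYPTION_KEY" variable:
"""python
import duckdb
import os

# Fail fast if the key is not provided by the environment / secrets manager.
key = os.environ.get("DUCKDB_ENCRYPTION_KEY")
if not key:
    raise RuntimeError("DUCKDB_ENCRYPTION_KEY is not set")

conn = duckdb.connect()
conn.execute(f"ATTACH 'encrypted_database.duckdb' AS enc (ENCRYPTION_KEY '{key}')")
print(conn.execute("SELECT count(*) FROM enc.my_table").fetchone())
conn.close()
"""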
## 6. Dependency Management
### 6.1 Rationale
Using outdated or vulnerable dependencies can expose your DuckDB application to security risks. Managing dependencies properly is crucial.
### 6.2 Standards
* **Do This:** Use a dependency manager (e.g., pip in Python projects) to track and update dependencies. Regularly scan dependencies for known vulnerabilities.
* **Don't Do This:** Use outdated versions of libraries with known security vulnerabilities. Add non-essential dependencies.
### 6.3. Implementation
#### 6.3.1. Using "pip"
In Python environments, "pip" is the standard package installer:
* **"requirements.txt":** Create a "requirements.txt" file to list all project dependencies with specific versions.
"""
duckdb==0.9.2 # Pin specific versions.
bleach==6.1.0
requests==2.31.0 #Example
"""
* **Install dependencies:** Use "pip install -r requirements.txt" to install the dependencies.
#### 6.3.2. Vulnerability Scanning
* **OWASP Dependency-Check:** Integrate OWASP Dependency-Check (or similar tools) into your build process to automatically scan dependencies for known vulnerabilities.
#### 6.3.3 Virtual Environments
It is also vital to develop inside virtual environments, which isolate project dependencies from the system Python. For more information on virtual environments and "pip", see the official Python packaging documentation; a minimal workflow is sketched below.
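The following sketch combines an isolated environment, pinned dependencies, and vulnerability scanning; "pip-audit" is one scanner option among several.
"""bash
python -m venv .venv                 # Create an isolated environment
source .venv/bin/activate
pip install -r requirements.txt      # Install pinned dependencies only
pip install pip-audit
pip-audit -r requirements.txt        # Scan pinned dependencies for known vulnerabilities
"""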
## 7. Security Audits and Testing
### 7.1. Rationale
Regular security audits and testing are essential to identify and remediate potential vulnerabilities in your DuckDB application.
### 7.2. Standards
* **Do This:** Conduct regular code reviews, perform penetration testing, and implement security monitoring.
* **Don't Do This:** Assume your application is secure without regular verification.
### 7.3. Implementation
#### 7.3.1. Code Reviews
Enforce mandatory code reviews by experienced developers to identify potential security flaws.
#### 7.3.2. Penetration Testing
Engage security professionals to perform penetration testing to simulate real-world attacks against your application. Use tools like OWASP ZAP or Burp Suite.
#### 7.3.3. Security Monitoring
Implement security monitoring to detect and respond to suspicious activity. Monitor system logs, application logs, and network traffic for anomalies.
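As a starting point for application-level monitoring, query execution can be wrapped so that durations and failures are logged centrally and later shipped to your log aggregation system. A minimal sketch using Python's standard "logging" module; the wrapper name is illustrative:
"""python
import duckdb
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("duckdb_app")

def monitored_execute(conn, query, params=None):
    """Executes a query, logging its duration and any failure for later analysis."""
    start = time.monotonic()
    try:
        if params is None:
            result = conn.execute(query).fetchall()
        else:
            result = conn.execute(query, params).fetchall()
        logger.info("query ok (%.3fs): %s", time.monotonic() - start, query)
        return result
    except duckdb.Error:
        logger.exception("query failed: %s", query)
        raise

conn = duckdb.connect(":memory:")
monitored_execute(conn, "SELECT 42")
conn.close()
"""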
## 8. Specific DuckDB Considerations
### 8.1. Limited User Management
DuckDB, being in-process, has limited user management compared to client-server databases. However, be aware of the file permissions the process is executing under. Ensure the process only has required read/write permissions.
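When a process only needs to query data, open the database read-only so the connection itself cannot modify the file, complementing the filesystem permissions described in section 3. A brief sketch:
"""python
import duckdb

# Open the database file in read-only mode; writes through this connection will fail.
conn = duckdb.connect("/opt/duckdb_data/my_database.duckdb", read_only=True)
print(conn.execute("SELECT count(*) FROM my_table").fetchone())
conn.close()
"""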
### 8.2 Extension Security
Be wary of installing third-party extensions. Ensure they come from trusted sources, since extensions can execute arbitrary code within the DuckDB process.
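Depending on your DuckDB version, configuration settings can further reduce the extension attack surface. The sketch below assumes the settings named here exist in your version; check the configuration reference before relying on them.
"""python
import duckdb

# Disallow loading unsigned extensions (must be set when the connection is created).
conn = duckdb.connect(":memory:", config={"allow_unsigned_extensions": "false"})

# Optionally disable automatic installation/loading of known extensions,
# then freeze the configuration so later SET statements cannot re-enable it.
conn.execute("SET autoinstall_known_extensions = false")
conn.execute("SET autoload_known_extensions = false")
conn.execute("SET lock_configuration = true")
conn.close()
"""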
### 8.3. Shared Memory (Multi-threading)
If using DuckDB in a multi-threaded application, pay careful attention to locking and data consistency. DuckDB supports concurrent reads, but writes should be properly serialized (or wrapped in transactions) to prevent data corruption. Using external libraries for concurrency adds another external dimension to the security risk.
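In the Python client, a common pattern is to share one connection object and give each worker thread its own cursor, which DuckDB duplicates as an independent handle to the same database. A minimal sketch:
"""python
import duckdb
import threading

db = duckdb.connect(":memory:")
db.execute("CREATE TABLE events (thread_id INTEGER)")

def worker(thread_id: int):
    # Each thread uses its own cursor; sharing one connection object across
    # threads without this is not safe.
    cur = db.cursor()
    cur.execute("INSERT INTO events VALUES (?)", (thread_id,))
    cur.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db.execute("SELECT count(*) FROM events").fetchone())
db.close()
"""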
danielsogl
Created Mar 6, 2025
This guide explains how to effectively use .clinerules
with Cline, the AI-powered coding assistant.
The .clinerules
file is a powerful configuration file that helps Cline understand your project's requirements, coding standards, and constraints. When placed in your project's root directory, it automatically guides Cline's behavior and ensures consistency across your codebase.
Place the .clinerules
file in your project's root directory. Cline automatically detects and follows these rules for all files within the project.
# Project Overview project: name: 'Your Project Name' description: 'Brief project description' stack: - technology: 'Framework/Language' version: 'X.Y.Z' - technology: 'Database' version: 'X.Y.Z'
# Code Standards standards: style: - 'Use consistent indentation (2 spaces)' - 'Follow language-specific naming conventions' documentation: - 'Include JSDoc comments for all functions' - 'Maintain up-to-date README files' testing: - 'Write unit tests for all new features' - 'Maintain minimum 80% code coverage'
# Security Guidelines security: authentication: - 'Implement proper token validation' - 'Use environment variables for secrets' dataProtection: - 'Sanitize all user inputs' - 'Implement proper error handling'
Be Specific
Maintain Organization
Regular Updates
# Common Patterns Example patterns: components: - pattern: 'Use functional components by default' - pattern: 'Implement error boundaries for component trees' stateManagement: - pattern: 'Use React Query for server state' - pattern: 'Implement proper loading states'
Commit the Rules
.clinerules
in version controlTeam Collaboration
Rules Not Being Applied
Conflicting Rules
Performance Considerations
# Basic .clinerules Example project: name: 'Web Application' type: 'Next.js Frontend' standards: - 'Use TypeScript for all new code' - 'Follow React best practices' - 'Implement proper error handling' testing: unit: - 'Jest for unit tests' - 'React Testing Library for components' e2e: - 'Cypress for end-to-end testing' documentation: required: - 'README.md in each major directory' - 'JSDoc comments for public APIs' - 'Changelog updates for all changes'
# Advanced .clinerules Example project: name: 'Enterprise Application' compliance: - 'GDPR requirements' - 'WCAG 2.1 AA accessibility' architecture: patterns: - 'Clean Architecture principles' - 'Domain-Driven Design concepts' security: requirements: - 'OAuth 2.0 authentication' - 'Rate limiting on all APIs' - 'Input validation with Zod'
# Component Design Standards for DuckDB This document outlines the component design standards for DuckDB, focusing on creating reusable, maintainable, and performant components within the DuckDB ecosystem. These standards aim to ensure code quality, consistency, and long-term maintainability across the DuckDB codebase. ## 1. Core Principles of Component Design in DuckDB DuckDB, being an embedded analytical database, benefits greatly from well-designed components. Components should be modular, loosely coupled, and highly cohesive. These principles increase reusability, testability, and ease of maintenance. * **Modularity:** Each component should have a clear, well-defined purpose. * **Loose Coupling:** Components should minimize dependencies on other components. This reduces the impact of changes and makes components more independent. * **High Cohesion:** All elements within a component should be closely related and work together towards a single, well-defined purpose. * **Abstraction:** Hide implementation details and expose only necessary interfaces. This allows for internal changes without affecting external users of the component. * **Single Responsibility Principle (SRP):** Each component should have only one reason to change. ## 2. Component Granularity and Scope ### 2.1 Defining Component Boundaries * **Do This:** Define clear boundaries for each component based on logical functionality. For example, a component might be responsible for parsing SQL, optimizing queries, or executing specific operator types. * **Don't Do This:** Create components that are too large and encompass multiple unrelated functionalities, or too small, leading to excessive fragmentation and overhead. * **Why:** Well-defined boundaries help maintainability and make it easier to understand and reason about the system. ### 2.2 Examples of DuckDB Components Examples of concrete components within DuckDB's architecture include: * **Parser:** Responsible for parsing SQL queries into an abstract syntax tree (AST). * **Optimizer:** Responsible for transforming the AST into an optimized query plan. * **Execution Engine:** Responsible for executing the query plan. * **Storage Manager:** Manages data storage and retrieval. * **Vector Operations (e.g., "src/common/vector_operations"):** Implements vectorized operations for efficient data processing. * **Scalar Functions (e.g., "src/function/scalar"):** Houses implementations of SQL scalar functions. ### 2.3 Standard: Component naming and file structure * **Do This:** Use descriptive names for components and organize files logically within the "src" directory. Follow the existing DuckDB directory structure which typically separates components by related functionalities. For example, functions are grouped under "src/function", optimizers under "src/optimizer", etc. * **Don't Do This:** Use generic or ambiguous names that don't clearly indicate the component's purpose. Scatter code across unrelated directories. * **Why:** A clear and consistent file structure improves code discoverability and maintainability. Predictable naming makes it easier to navigate the codebase. **Example:** A new aggregate function related to windowing might be placed in "src/function/aggregate/distributive". The function's implementation would be in a file named "my_new_window_agg_function.cpp" or similar. ## 3. Interfaces and Abstraction ### 3.1 Defining Component Interfaces * **Do This:** Define clear and concise interfaces for each component. 
Use abstract classes or interfaces in C++ to define the contracts that components must adhere to. * **Don't Do This:** Expose internal implementation details in the interface. This can lead to tight coupling and make it difficult to change the internal implementation without breaking other components. * **Why:** Well-defined interfaces promote loose coupling and allow for easier testing and mocking. **Example (Abstract Class):** """c++ // src/optimizer/rule.h class Rule { public: virtual ~Rule() = default; virtual std::unique_ptr<LogicalOperator> Apply(std::unique_ptr<LogicalOperator> op, Optimizer &optimizer) = 0; virtual bool CanApply(const std::unique_ptr<LogicalOperator> &op) = 0; }; """ ### 3.2 Abstraction Levels * **Do This:** Consider different levels of abstraction. Provide high-level interfaces for common use cases and lower-level interfaces for more advanced scenarios. This aligns with the principle of "progressive disclosure," where complexity is hidden until needed. * **Don't Do This:** Expose only low-level interfaces, forcing users to deal with unnecessary complexity. Conversely, don't hide too much information, preventing access to needed customization. * **Why:** Flexible abstraction levels cater to diverse use cases and skill levels. ### 3.3 Standard: Using Abstract Factories * **Do This:** Use abstract factories to decouple the creation of objects from their usage. This allows you to switch between different implementations of a component without modifying the code that uses it. * **Don't Do This:** Directly instantiate concrete classes throughout the codebase, creating tight dependencies on specific implementations. * **Why:** Abstract factories enhance flexibility and testability. **Example:** The creation of different physical operators (e.g., Hash Join, Nested Loop Join) could be handled by an abstract factory. ## 4. Error Handling and Logging ### 4.1 Robust Error Handling * **Do This:** Implement robust error handling within each component. Use exceptions or error codes to signal failures and provide informative error messages. DuckDB uses exceptions extensively for error handling. * **Don't Do This:** Ignore errors or propagate them silently. This can lead to unpredictable behavior and make it difficult to debug issues. * **Why:** Proper error handling is crucial for reliability and maintainability. **Example:** """c++ #include "duckdb.hpp" #include "iostream" using namespace duckdb; void MyComponent::MyFunction(int input) { if (input < 0) { throw InvalidInputException("Input must be non-negative"); } // ... perform operations } int main() { DuckDB db(":memory:"); Connection con(db); try { con.Query("SELECT MyFunction(-1)"); } catch (InvalidInputException &e) { std::cerr << "Error: " << e.what() << std::endl; } return 0; } """ ### 4.2 Logging * **Do This:** Use DuckDB's logging mechanisms to log important events, warnings, and errors. Configure logging levels appropriately to control the verbosity of the output. Consider integration with the DuckDB telemetry system. * **Don't Do This:** Use "std::cout" or "printf" for logging. These methods are not controllable and cannot be easily disabled in production environments. * **Why:** Logging provides valuable insights into the behavior of the system and helps diagnose issues. Using a consistent logging framework allows for centralized management and analysis of logs. **Note:** DuckDB has a sophisticated logging mechanism, but exact usage details are not readily available in the public documentation. 
Refer to internal DuckDB documentation and existing code for specific logging patterns. ## 5. Testing ### 5.1 Unit Testing * **Do This:** Write unit tests for each component to verify its functionality and robustness. Use a testing framework to automate the execution of tests and ensure consistent results. * **Don't Do This:** Neglect unit testing. Untested code is more likely to contain bugs and be difficult to maintain. * **Why:** Unit tests provide confidence in the correctness of the code and help prevent regressions. DuckDB uses a custom testing framework. Refer to the existing tests in the "test" directory for examples of how to write unit tests for DuckDB components. ### 5.2 Integration Testing * **Do This:** Write integration tests to verify the interaction between different components. This ensures that the components work together correctly. DuckDB's SQL-based testing framework is well-suited for this. * **Don't Do This:** Assume that components will work together correctly without integration testing. * **Why:** Integration tests catch issues that may not be apparent from unit tests alone. **Example:** Test a query involving the Parser, Optimizer, and Execution Engine components. ## 6. Concurrency and Thread Safety ### 6.1 Thread Safety * **Do This:** Design components to be thread-safe, especially if they are accessed from multiple threads concurrently. Use appropriate synchronization mechanisms (e.g., mutexes, atomic operations) to protect shared data. DuckDB's internal concurrency model often relies on immutable data structures and vectorized operations to minimize lock contention. * **Don't Do This:** Assume that components are thread-safe without proper verification. Data races and other concurrency issues can lead to unpredictable behavior. * **Why:** Thread safety is crucial for performance and stability in a multi-threaded environment. ### 6.2 Data Structures * **Do This:** When possible, prioritize using concurrent data structures that inherently handle thread safety, or design your code to avoid shared mutable state altogether using techniques like message passing. DuckDB makes extensive use of vectorized operations, which often allows for lock-free concurrency. * **Don't Do This:** Use simple data structures without considering thread safety and rely entirely on locks. * **Why:** Concurrent data structures often offer better performance and scalability compared to using simple data structures with locks. Avoiding mutable shared state is the ideal (though often impractical) approach. ## 7. Code Style and Formatting ### 7.1 Consistent Style * **Do This:** Follow a consistent code style throughout the codebase. Use a code formatter (e.g., clang-format) to automatically enforce the style guidelines. Refer to the DuckDB's existing code for style conventions. * **Don't Do This:** Use inconsistent code styles. This can make the code harder to read and understand. * **Why:** A consistent code style improves readability and maintainability. ### 7.2 Naming Conventions * **Do This:** Follow consistent naming conventions for variables, functions, classes, and other identifiers. Use descriptive names that clearly indicate the purpose of the identifier. Prefer longer, descriptive names over short, cryptic ones. * **Don't Do This:** Use inconsistent or ambiguous naming conventions. * **Why:** Clear naming conventions improve code readability and maintainability. ## 8. 
Performance Considerations ### 8.1 Vectorized Operations * **Do This:** Leverage DuckDB's vectorized execution engine to perform operations on large batches of data efficiently. Use the "Vector" class and its associated methods for vectorized operations. Understanding and using vectorized operations is paramount to building performant DuckDB components. * **Don't Do This:** Implement operations on individual data elements. This can lead to significant performance overhead. * **Why:** Vectorized operations maximize CPU utilization and minimize data transfer overhead. **Example:** """c++ #include "duckdb.hpp" using namespace duckdb; void MyComponent::Add(Vector &left, Vector &right, Vector &result, idx_t count) { auto left_data = FlatVector::GetData<int32_t>(left); auto right_data = FlatVector::GetData<int32_t>(right); auto result_data = FlatVector::GetData<int32_t>(result); for (idx_t i = 0; i < count; i++) { result_data[i] = left_data[i] + right_data[i]; } } """ ### 8.2 Data Locality * **Do This:** Design components to maximize data locality. This means keeping related data close together in memory to reduce cache misses and improve performance. DuckDB's columnar storage format enhances data locality for analytical workloads. * **Don't Do This:** Scatter related data across different memory locations. * **Why:** Data locality is crucial for performance. Accessing data from memory is much faster than accessing data from disk. ### 8.3 Minimize Memory Allocation * **Do This:** Minimize memory allocation and deallocation within performance-critical sections of code. Use memory pools or other techniques to reuse memory. DuckDB has its own memory management mechanisms; familiarize yourself with them. * **Don't Do This:** Allocate and deallocate memory frequently. * **Why:** Memory allocation and deallocation can be expensive operations, especially in high-performance systems. ## 9. Security Considerations ### 9.1 Input Validation * **Do This:** Validate all inputs to components to prevent security vulnerabilities such as SQL injection and buffer overflows. Use parameterized queries to prevent SQL injection. * **Don't Do This:** Trust inputs without proper validation. * **Why:** Input validation is crucial for security. Malicious inputs can be used to compromise the system. ### 9.2 Data Encryption * **Do This:** Consider encrypting sensitive data at rest and in transit. Use strong encryption algorithms and follow security best practices. DuckDB supports encryption extensions; use them where appropriate. * **Don't Do This:** Store sensitive data in plain text. * **Why:** Data encryption protects sensitive data from unauthorized access. ## 10. Future-Proofing * **Do This:** Follow design principles that accommodate and anticipate future changes in DuckDB. Consider the potential implications of adding new data types, storage formats, or query optimization techniques. * **Don't Do This:** Create components that are tightly coupled to specific implementation details or that make assumptions about the current state of the system. * **Why:** Designed-for-change architecture ensures that future updates or new features will not break components as easily as a more brittle design. By following these component design standards, DuckDB developers can create a robust, maintainable, and performant codebase. This will contribute to the long-term success of the DuckDB project.
# Performance Optimization Standards for DuckDB This document outlines the performance optimization standards for DuckDB, providing guidelines for developers to write efficient and performant code. These standards are tailored for DuckDB's architecture and are designed to improve application speed, responsiveness, and resource usage. ## 1. Query Optimization ### 1.1. Understanding Query Plans **Standard:** Analyze query plans to identify bottlenecks and optimize query execution. * **Do This:** Use "EXPLAIN" to examine the query plan and identify areas for improvement. * **Don't Do This:** Blindly execute queries without understanding their underlying execution strategy. **Why:** Understanding the query plan allows developers to make informed decisions about indexing, data types, and query structure. **Example:** """sql EXPLAIN SELECT * FROM lineitem WHERE l_orderkey = 12345; """ This will output the query plan, showing the steps DuckDB will take to execute the query. Areas of concern include full table scans, inefficient joins, or suboptimal sorting. ### 1.2. Indexing Strategies **Standard:** Employ appropriate indexing strategies to accelerate data retrieval. * **Do This:** Create indexes on frequently queried columns, especially those used in "WHERE" clauses and join conditions. Consider using multi-column indexes for composite queries. * **Don't Do This:** Over-index tables, as this can slow down write operations and increase storage overhead. Avoid indexing columns with low cardinality or those rarely used in queries. **Why:** Indexes significantly reduce the amount of data that needs to be scanned, resulting in faster query execution. **Example:** """sql -- Single-column index CREATE INDEX idx_orderkey ON lineitem (l_orderkey); -- Multi-column index CREATE INDEX idx_order_ship ON lineitem (l_orderkey, l_shipdate); """ Carefully consider the order of columns in multi-column indexes. The most frequently queried column should come first. DuckDB (as of recent versions) also supports expression indexes, though these should be used judiciously as they can complicate maintenance. ### 1.3. Data Type Considerations **Standard:** Use the most appropriate data types for your data to minimize storage and improve performance. * **Do This:** Use smaller integer types (e.g., "SMALLINT", "INTEGER") if the range of values allows. Use "VARCHAR" with length limits when appropriate instead of "TEXT" for string data. Use the "DATE" and "TIMESTAMP" types for date and time data, respectively. * **Don't Do This:** Use unnecessarily large data types, such as "BIGINT" when "INTEGER" suffices. Use "TEXT" for columns that contain short, fixed-length strings. **Why:** Smaller data types reduce storage space and memory usage, leading to faster data processing. **Example:** """sql -- Good: Using SMALLINT when appropriate CREATE TABLE orders ( order_id SMALLINT, -- Assuming order IDs won't exceed the range of SMALLINT order_date DATE ); -- Bad: Using BIGINT unnecessarily CREATE TABLE products ( product_id BIGINT, -- INTEGER might be sufficient product_name VARCHAR -- Length limit missing ); -- Better: Using VARCHAR with length limit and explicit timestamp CREATE TABLE products ( product_id INTEGER, product_name VARCHAR(255), created_at TIMESTAMP ); """ ### 1.4. Join Optimization **Standard:** Optimize join operations to minimize the amount of data processed. * **Do This:** Use appropriate join algorithms (DuckDB generally auto-selects based on table sizes and statistics). Ensure join columns are indexed. 
If applicable, use "HASH JOIN" for equality joins on larger tables. Use "BROADCAST JOIN" when joining a large table to a considerably small table (DuckDB often optimizes automatically but understanding the strategy is important). Leverage pre-calculated aggregates if appropriate. * **Don't Do This:** Perform joins without indexes on join columns. Join on complex expressions rather than simple column lookups. Perform cartesian products by omitting join conditions. **Why:** Efficient join operations are critical for query performance, especially in data warehousing scenarios. **Example:** """sql -- Good: Indexed join columns CREATE INDEX idx_customer_id ON orders (customer_id); CREATE INDEX idx_customer_id ON customers (customer_id); SELECT * FROM orders JOIN customers ON orders.customer_id = customers.customer_id; -- Consider broadcasting small tables: (DuckDB might do this automatically though) SELECT /*+ BROADCAST(customers) */ * FROM orders JOIN customers ON orders.customer_id = customers.customer_id; -- Bad: No indexes, forcing a full table scan SELECT * FROM orders JOIN customers ON orders.customer_id = customers.customer_id; -- Assuming no index on customer_id """ ### 1.5. Subquery Optimization **Standard:** Rewrite subqueries where possible to improve performance. * **Do This:** Use "JOIN" operations instead of correlated subqueries when possible. Use Common Table Expressions (CTEs) to break down complex queries into smaller, manageable parts. * **Don't Do This:** Use correlated subqueries excessively, as they can significantly slow down query execution. **Why:** Correlated subqueries can be inefficient because they are executed for each row in the outer query. **Example:** """sql -- Bad: Correlated subquery SELECT o.order_id FROM orders o WHERE EXISTS ( SELECT 1 FROM lineitem l WHERE l.order_id = o.order_id ); -- Good: Using a JOIN instead SELECT DISTINCT o.order_id FROM orders o JOIN lineitem l ON o.order_id = l.order_id; -- Good: Using a CTE for readability and potential optimization WITH OrderItems AS ( SELECT order_id FROM lineitem ) SELECT o.order_id FROM orders o WHERE o.order_id IN (SELECT order_id FROM OrderItems); """ ### 1.6. Filtering Early **Standard:** Apply filters as early as possible in the query execution pipeline. * **Do This:** Place "WHERE" clauses that significantly reduce the number of rows processed at the beginning of the query. * **Don't Do This:** Filter data late in the query execution pipeline, after expensive operations like joins or aggregations. **Why:** Filtering early reduces the amount of data that subsequent operations need to process. **Example:** """sql -- Good: Filtering early significantly reduces rows SELECT * FROM orders WHERE order_date > '2023-01-01' AND customer_id IN (SELECT customer_id FROM active_customers); -- Bad: Filtering late after a join (less efficient if only a fraction of orders are recent) SELECT * FROM orders JOIN customers ON orders.customer_id = customers.customer_id WHERE order_date > '2023-01-01'; """ ## 2. Data Loading and Storage ### 2.1. Bulk Loading **Standard:** Use bulk loading techniques for large datasets. * **Do This:** Use "COPY" command or DuckDB's API to load data in bulk. Use vectorized reads when possible. * **Don't Do This:** Load data row-by-row using individual "INSERT" statements. **Why:** Bulk loading is significantly faster than individual "INSERT" statements. 
**Example:** """sql -- CSV import COPY lineitem FROM 'lineitem.tbl' (DELIMITER '|'); -- Parquet import (highly recommended due to DuckDB's columnar nature) COPY lineitem FROM 'lineitem.parquet' (FORMAT 'PARQUET'); """ Ensure the data is pre-sorted by clustering key for even greater performance, especially when creating clustered indexes. ### 2.2. Data Clustering and Sorting **Standard:** Cluster and sort data based on common query patterns. * **Do This:** Use "ALTER TABLE ... CLUSTER BY" to physically sort the data on disk based on specific columns. This is extremely beneficial for range queries. * **Don't Do This:** Neglect to cluster data, especially for large tables. Cluster by columns that are rarely used in queries. **Why:** Clustering data improves query performance by reducing the amount of data that needs to be scanned for range queries or queries involving a specific order. **Example:** """sql ALTER TABLE lineitem CLUSTER BY l_orderkey, l_shipdate; --Cluster by orderkey, then by shipdate within each orderkey """ ### 2.3. Compression **Standard:** Enable compression for large datasets to reduce storage space and improve I/O performance. * **Do This:** Use compression algorithms like Zstd or Snappy, especially when storing data in Parquet format. DuckDB automatically handles compression for its internal storage. * **Don't Do This:** Store uncompressed data unnecessarily. **Why:** Compression reduces the amount of data that needs to be read from disk, leading to faster query execution. **Example:** """sql -- Parquet with Zstd compression (best generally for both compression ratio and speed) COPY lineitem TO 'lineitem_compressed.parquet' (FORMAT 'PARQUET', COMPRESSION 'ZSTD'); -- Explicit compression -- DuckDB auto-compression (will use a reasonable default) CREATE TABLE compressed_table AS SELECT * FROM lineitem; """ ### 2.4. Partitioning (using Parquet Files) **Standard:** Partition data into separate files based on logical criteria (e.g., date ranges, geographic regions). * **Do This:** Store data in Parquet files, partitioned by relevant columns. Use DuckDB's globbing capabilities to efficiently query specific partitions. * **Don't Do This:** Store all data in a single large file, as this can slow down query execution. **Why:** Partitioning allows DuckDB to only read the relevant files for a given query, improving performance. **Example:** Assume you have Parquet files partitioned by year and month: "/data/orders/year=2023/month=01/orders.parquet", "/data/orders/year=2023/month=02/orders.parquet", etc. """sql -- Query data for a specific month SELECT * FROM read_parquet('/data/orders/year=2023/month=01/*.parquet'); -- Query data for a specific year SELECT * FROM read_parquet('/data/orders/year=2023/*.parquet'); -- Query for all data SELECT * FROM read_parquet('/data/orders/*/*.parquet'); --Use cautiously. Is this REALLY what you meant? """ ### 2.5. Vectorized Reads **Standard:** Utilize DuckDB's vectorized reads for efficient data processing from disk or other external sources. * **Do This:** When reading from Parquet or other file formats, ensure DuckDB is configured to utilize vectorized reads. This is enabled by default; however, verify configurations in case of custom setups. * **Don't Do This:** Implement custom, row-by-row processing when reading data into DuckDB, especially when standard file formats are used. **Why:** Vectorized reads allow DuckDB to process data in batches, significantly improving the throughput of data ingestion and query execution. 
**Example:** DuckDB automatically utilizes vectorized reads for Parquet files. You generally will not need to configure this directly. However, for custom data-loading implementations, ensure that you are reading data in batches and passing it to DuckDB's vectorized execution engine. ## 3. Concurrency and Parallelism ### 3.1. Connection Management **Standard:** Manage database connections efficiently. * **Do This:** Use connection pooling to reuse connections and avoid the overhead of creating new connections for each query. Close connections when they are no longer needed. * **Don't Do This:** Create a new connection for each query. Leave connections open indefinitely. **Why:** Establishing database connections can be expensive. Connection pooling improves performance by reusing existing connections. **Example (Python):** """python import duckdb import threading # Use a thread-local connection local = threading.local() def get_connection(): if not hasattr(local, "con"): local.con = duckdb.connect('my_database.duckdb') return local.con def run_query(query): con = get_connection() result = con.execute(query).fetchall() return result """ ### 3.2. Parallel Query Execution **Standard:** Leverage DuckDB's parallel query execution capabilities. * **Do This:** Configure the number of threads used for query execution using "PRAGMA threads". Ensure that queries are designed to benefit from parallelism (e.g., large scans, aggregations). * **Don't Do This:** Set the number of threads too high, as this can lead to excessive context switching and reduced performance. **Why:** Parallel query execution can significantly improve performance for CPU-bound operations. **Example:** """sql PRAGMA threads=8; -- Use 8 threads SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price, SUM(l_discount) AS sum_disc_price FROM lineitem GROUP BY l_returnflag, l_linestatus; PRAGMA threads=-1; --Use all available cores """ Carefully assess the optimal number of threads for your workload. For I/O bound workloads, increasing number of threads excessively can introduce contention overhead. ## 4. Runtime Configuration ### 4.1. Memory Management **Standard:** Configure the amount of memory available to DuckDB. * **Do This:** Use "PRAGMA memory_limit" to set the memory available to DuckDB. Monitor memory usage to ensure that the limit is appropriate. * **Don't Do This:** Allow DuckDB to use excessive amounts of memory, potentially starving other processes. Set the memory limit too low, which can lead to disk spilling and reduced performance. **Why:** Proper memory management prevents out-of-memory errors and ensures efficient query execution. **Example:** """sql PRAGMA memory_limit='16GB'; -- Set memory limit to 16GB """ ### 4.2. Temporary Storage **Standard:** Ensure that temporary storage is configured correctly. * **Do This:** Use the "temp_directory" configuration option to specify a location for temporary files. Ensure that the specified location has sufficient storage space and high I/O performance. * **Don't Do This:** Allow temporary files to be written to the default location, which may be on a slower storage device. **Why:** DuckDB uses temporary storage for intermediate results. Configuring temporary storage correctly can improve query performance, especially when dealing with large datasets. **Example:** """sql PRAGMA temp_directory='/mnt/fast_ssd/duckdb_tmp'; """ ### 4.3. 
Detailed Monitoring **Standard:** Using tools to actively monitor performance of queries and IO operations * **Do This:** Use DuckDB's built-in performance monitoring features along with external system monitoring tools * **Don't Do This:** Neglect monitoring the impact of configuration changes and code optimization. Changes should be tested thoroughly, and can sometimes negatively impact performance for some workloads. **Why:** Consistent monitoring helps ensure that changes are having the impact you expect, and catches unexpected degradation. **Example:** While DuckDB contains some minimal internal monitoring, focus should be on wrapping the application in well-known monitoring frameworks used in the deployment envirnoment such as Prometheus, Grafana, or similar tools. ## 5. Code Maintainability and Readability ### 5.1. Code Formatting and Style **Standard:** Follow a consistent code formatting style. * **Do This:** Use a consistent indentation style (e.g., 4 spaces). Use meaningful variable and function names. Add comments to explain complex logic. * **Don't Do This:** Use inconsistent indentation. Use cryptic variable names. Write code without comments. **Why:** Consistent code formatting improves readability and maintainability. **Example:** """sql -- Good: Well-formatted SQL SELECT c.customer_id, c.customer_name, COUNT(o.order_id) AS order_count FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id WHERE c.region = 'North America' GROUP BY c.customer_id, c.customer_name ORDER BY order_count DESC LIMIT 10; -- Bad: Poorly formatted SQL select c.customer_id,c.customer_name,count(o.order_id) from customers c left join orders o on c.customer_id=o.customer_id where c.region='North America' group by c.customer_id,c.customer_name order by count(o.order_id) desc limit 10; """ ### 5.2. Modular Design **Standard:** Break down complex queries and logic into smaller, reusable modules. * **Do This:** Use Common Table Expressions (CTEs) to break down complex queries into smaller parts. Create reusable functions for common operations. * **Don't Do This:** Write monolithic queries that are difficult to understand and maintain. **Why:** Modular design improves code organization and reduces code duplication. **Example:** """sql -- Good: Using CTEs to break down a complex query WITH CustomerOrders AS ( SELECT customer_id, COUNT(order_id) AS order_count FROM orders GROUP BY customer_id ), TopCustomers AS ( SELECT customer_id FROM CustomerOrders ORDER BY order_count DESC LIMIT 10 ) SELECT c.customer_id, c.customer_name, co.order_count FROM customers c JOIN TopCustomers tc ON c.customer_id = tc.customer_id JOIN CustomerOrders co ON c.customer_id = co.customer_id; -- Bad: Monolithic query SELECT c.customer_id, c.customer_name, COUNT(o.order_id) FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name ORDER BY COUNT(o.order_id) DESC LIMIT 10; """ By adhering to these coding standards, DuckDB developers can write efficient, maintainable, and performant code, ensuring that applications utilizing DuckDB run smoothly and effectively. The consistent application of these rules, aided by AI tools, should lead to a higher quality codebase and improved overall system performance. Remember to stay current with DuckDB's release notes, especially those regarding optimization, as the engine is rapidly evolving.
# API Integration Standards for DuckDB This document outlines the coding standards for integrating DuckDB with external APIs and backend services. It focuses on best practices to ensure maintainability, performance, and security when leveraging DuckDB in conjunction with external data sources and services. ## 1. General Principles of API Integration ### 1.1. Clear Separation of Concerns **Do This:** * Isolate API interaction logic from core database operations. * Create dedicated modules or functions responsible for communicating with external APIs. **Don't Do This:** * Embed API calls directly within SQL queries or stored procedures. This makes debugging incredibly difficult and tightly couples your SQL logic to an external service. **Why:** Separate concerns promote modularity and testability. API interactions are often subject to change (e.g., API version updates, schema changes), so isolating them reduces the impact of these changes on core database logic. **Example:** """python # Correct: Separate API interaction logic import requests import duckdb def fetch_data_from_api(api_url): """Fetches data from an external API.""" try: response = requests.get(api_url) response.raise_for_status() #
# State Management Standards for DuckDB This document outlines the coding standards for managing state within applications using DuckDB, focusing on data flow, reactivity, and persistence. These standards aim to ensure maintainability, performance, and security for DuckDB-driven applications. ## 1. Principles of State Management Effective state management is crucial for building robust and scalable DuckDB applications. A well-defined approach simplifies debugging, enhances testability, and improves overall code quality. ### 1.1. Explicit vs. Implicit State * **Do This:** Favor explicit state management. Clearly define and declare all state variables, data structures, and their relationships. Use appropriate data types. * **Don't Do This:** Rely on hidden or implicit state, such as global variables or mutable shared objects without clear boundaries. **Why:** Explicit state improves traceability and reduces the risk of unexpected side effects. **Example:** """python # Explicit State import duckdb def execute_query(db_connection, query): """Executes a SQL query against a DuckDB database.""" try: result = db_connection.execute(query).fetchall() return result except duckdb.Error as e: print(f"Error executing query: {e}") return None # Example Usage (Explicit Connection Object) conn = duckdb.connect(':memory:') conn.execute("CREATE TABLE mytable (id INTEGER, value VARCHAR)") conn.execute("INSERT INTO mytable VALUES (1, 'hello'), (2, 'world')") result = execute_query(conn, "SELECT * FROM mytable") print(result) conn.close() # Implicit State (Avoid) # (Using global database connections) """ ### 1.2. Immutable Data Structures * **Do This:** Use immutable data structures whenever possible to represent state. Prefer creating new copies of data upon modification rather than mutating existing objects. * **Don't Do This:** Modify data structures in place without considering the potential side effects on other parts of the application. **Why:** Immutability simplifies debugging and reasoning about data flow, particularly in concurrent environments. **Example:** """python # Immutable Data Structures & DuckDB import duckdb def update_records(db_path, table_name, updates): """ Simulates updating records by creating a new table with the modifications This is an example of immutable approach since DuckDB doesn't allow direct update in embedded mode """ conn = duckdb.connect(db_path) try: # 1. Read the existing records using DuckDB existing_records = conn.execute(f"SELECT * FROM {table_name}").fetchall() # Convert the result into a manageable format, like a dict records_dict = {record[0]: list(record[1:]) for record in existing_records} # Assuming id is record[0], and the rest are fields. # 2. Apply updates (generating new records) - Immutability approach: create new dict new_records_dict = records_dict.copy() # Create a copy for row_number, record_data in updates.items(): if row_number in new_records_dict: # We need to know the row number new_records_dict[row_number] = record_data # Update the dictionary (copy). 
        # 3. Drop the old table and rebuild it from the updated dictionary
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")
        # Re-attach the id to each row and convert the values to tuples
        table_data = [(row_id, *values) for row_id, values in new_records_dict.items()]
        # Define the column names for the new table (example column names)
        column_names = ['id', 'name', 'age', 'city']
        # Create the new table using DuckDB
        conn.execute(
            f"CREATE TABLE {table_name} AS "
            f"SELECT * FROM (VALUES {', '.join(map(str, table_data))}) AS t ({', '.join(column_names)})"
        )
        # Verify the result by reading back the updated table
        result = conn.execute(f"SELECT * FROM {table_name}").fetchall()
        print(f"Updated table records: {result}")
    except duckdb.Error as e:
        print(f"Error during update: {e}")
    finally:
        conn.close()

# Example usage - note: pass rows as tuples (not lists) to avoid type conversion issues
db_path = 'my_example.duckdb'
original_data = [(1, 'Alice', 30, 'New York'), (2, 'Bob', 25, 'Los Angeles'), (3, 'Charlie', 35, 'Chicago')]
conn = duckdb.connect(db_path)
conn.execute('CREATE TABLE IF NOT EXISTS users (id INTEGER, name VARCHAR, age INTEGER, city VARCHAR)')
conn.executemany('INSERT INTO users VALUES (?, ?, ?, ?)', original_data)
conn.close()

updates = {
    1: ['Alice Updated', 31, 'New Jersey'],  # Key is the row id
    2: ['Bob Updated', 26, 'San Francisco']
}
update_records(db_path, 'users', updates)
"""

### 1.3. Single Source of Truth

* **Do This:** Ensure that each piece of data has a single, authoritative source. Avoid redundant copies or derived data that can become inconsistent. Use DuckDB as the single source of truth for analytical data where possible.
* **Don't Do This:** Cache data aggressively without proper invalidation mechanisms.

**Why:** A single source of truth minimizes discrepancies and simplifies data synchronization.

**Example:**

"""python
# Single Source of Truth - DuckDB
import duckdb

def get_user_data(db_path, user_id):
    """Retrieves user data from DuckDB as the single source of truth."""
    conn = duckdb.connect(db_path)
    try:
        result = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
        if result:
            return {
                'id': result[0],
                'name': result[1],
                'age': result[2],
                'city': result[3]
            }
        else:
            return None
    except duckdb.Error as e:
        print(f"Error retrieving user data: {e}")
        return None
    finally:
        conn.close()

# Usage
db_path = 'my_example.duckdb'
user_id = 1
user_data = get_user_data(db_path, user_id)
print(user_data)
"""

## 2. State Management Approaches in DuckDB Applications

Different applications have different state management needs. Here is how to approach this for applications leveraging DuckDB.

### 2.1. Embedded DuckDB State

* **Do This:** For small to medium-sized datasets, use DuckDB's embedded mode for direct data manipulation within the application's process.
* **Don't Do This:** Attempt complex concurrent write operations in embedded mode without proper locking and transaction handling.
* **Consider:** The limits of in-process memory and CPU usage for large datasets when using embedded DuckDB.

**Why:** Embedded DuckDB offers simplicity and low latency for local analytics.
**Example:**

"""python
# Embedded DuckDB Example
import duckdb

db_conn = duckdb.connect(':memory:')  # In-memory database for embedded use
db_conn.execute("CREATE TABLE items (id INTEGER, name VARCHAR)")
db_conn.execute("INSERT INTO items VALUES (1, 'Laptop')")
db_conn.execute("INSERT INTO items VALUES (2, 'Keyboard')")
results = db_conn.execute("SELECT * FROM items").fetchall()
print(results)
db_conn.close()
"""

### 2.2. Persistent DuckDB State

* **Do This:** Store the DuckDB database on disk to persist data across application sessions.
* **Don't Do This:** Neglect backup and recovery mechanisms for persistent DuckDB databases.
* **Consider:** Using relative paths for the database file location to improve portability.

**Why:** Persistent storage ensures data continuity across application restarts.

**Example:**

"""python
import duckdb

db_path = 'my_persistent_db.duckdb'  # Database file path

# Connect, create the table, and close the connection
db_conn = duckdb.connect(db_path)
db_conn.execute("CREATE TABLE IF NOT EXISTS user_profiles (id INTEGER, username VARCHAR, email VARCHAR)")
db_conn.close()

# Function to insert data
def insert_user_profile(db_path, id, username, email):
    conn = duckdb.connect(db_path)
    try:
        conn.execute("INSERT INTO user_profiles VALUES (?, ?, ?)", (id, username, email))
        conn.commit()
        print(f"Inserted user: {username}")
    except duckdb.Error as e:
        print(f"Error inserting user: {e}")
        conn.rollback()
    finally:
        conn.close()

# Insert sample data into the persistent database
insert_user_profile(db_path, 1, 'john_doe', 'john.doe@example.com')
insert_user_profile(db_path, 2, 'jane_smith', 'jane.smith@example.com')

# Read function for retrieving a user profile
def get_user_profile(db_path, user_id):
    conn = duckdb.connect(db_path)
    try:
        result = conn.execute("SELECT * FROM user_profiles WHERE id = ?", (user_id,)).fetchone()
        if result:
            return {
                'id': result[0],
                'username': result[1],
                'email': result[2]
            }
        else:
            return None
    except duckdb.Error as e:
        print(f"Error getting user profile: {e}")
        return None
    finally:
        conn.close()

# Get the data from the database and print it
user_profile = get_user_profile(db_path, 1)
print(user_profile)
"""

### 2.3. Connecting to External Data Sources

* **Do This:** Utilize DuckDB's ability to directly query data from Parquet, CSV, JSON, and other file formats without importing it first.
* **Don't Do This:** Assume that external data sources always conform to the expected schema. Implement robust error handling and schema validation.
* **Consider:** Optimizing access to external data sources by filtering and aggregating data within DuckDB rather than transferring large amounts of data to the application.

**Why:** External data access enables real-time analytics without data duplication.

**Example:**

"""python
# External data source: JSON (use read_json_auto so DuckDB infers the schema)
import duckdb
import os

def analyze_json_data(json_file_path, query=None):
    """Analyzes JSON data using DuckDB."""
    try:
        if query is None:
            # Default query: read everything; read_json_auto lets DuckDB infer the schema
            query = f"SELECT * FROM read_json_auto('{json_file_path}')"
        conn = duckdb.connect(':memory:')
        result = conn.execute(query).fetchall()
        conn.close()
        return result
    except duckdb.Error as e:
        print(f"Error querying JSON data: {e}")
        return None

# Prepare a sample JSON file
json_data = '[{"id": 1, "name": "Laptop", "price": 1200}, {"id": 2, "name": "Keyboard", "price": 75}]'
with open('products.json', 'w') as f:
    f.write(json_data)

json_file_path = 'products.json'
query = f"SELECT name, price FROM read_json_auto('{json_file_path}') WHERE price > 100"
results = analyze_json_data(json_file_path, query)
print(results)
os.remove('products.json')  # Clean up the sample file
"""

### 2.4. Managing Large Datasets

* **Do This:** Use DuckDB's efficient query engine to perform aggregations, filtering, and joins on large datasets directly within the database.
* **Don't Do This:** Load entire large datasets into application memory.
* **Consider:** Partitioning and indexing techniques to optimize query performance on large datasets.

**Why:** Optimized query execution minimizes memory usage and processing time.

**Example:**

"""python
# Large Dataset Handling
import duckdb
import pandas as pd
import os

def analyze_large_dataset(csv_file_path, query):
    """Analyzes a large CSV dataset using DuckDB."""
    try:
        # Establish a connection to DuckDB (in-memory for this example)
        conn = duckdb.connect(':memory:')
        # Load the CSV file into a table, letting DuckDB infer the schema
        conn.execute(f"CREATE TABLE my_data AS SELECT * FROM read_csv_auto('{csv_file_path}')")
        # Execute the query and retrieve the result as a Pandas DataFrame
        result = conn.execute(query).fetchdf()
        conn.close()
        return result
    except duckdb.Error as e:
        print(f"Error querying large dataset: {e}")
        return None

# Example: create a test file
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E'], 'col3': [1.1, 2.2, 3.3, 4.4, 5.5]}
df = pd.DataFrame(data)
csv_file_path = "test.csv"
df.to_csv(csv_file_path, index=False)

query = "SELECT col2, AVG(col3) FROM my_data GROUP BY col2"
results = analyze_large_dataset(csv_file_path, query)
print(results)
os.remove('test.csv')  # Clean up the test file
"""

### 2.5. Transactions

* **Do This:** Use transactions to ensure atomicity, consistency, isolation, and durability (ACID) when performing multiple write operations on DuckDB.
* **Don't Do This:** Perform write operations without transactions; this can lead to data corruption or inconsistencies in case of errors.
* **Consider:** Choosing the appropriate isolation level for transactions based on the application's concurrency requirements.

**Why:** Transactions guarantee data integrity during complex operations.

**Example:**

"""python
import duckdb

def transfer_funds(db_path, account_from, account_to, amount):
    """Transfers funds between two accounts using a transaction."""
    conn = duckdb.connect(db_path)
    try:
        conn.execute("BEGIN TRANSACTION")  # Start transaction
        # 1. Check that the sender account has a sufficient balance.
        sender_balance = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_from,)).fetchone()[0]
        if sender_balance < amount:
            raise ValueError("Insufficient funds.")
        # 2. Withdraw from the sender account.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, account_from))
        # 3. Deposit to the receiver account.
conn.execute(f"UPDATE accounts SET balance = balance + {amount} WHERE id = {account_to}") conn.commit() print("Funds transferred successfully.") except ValueError as e: conn.rollback() print(f"Transaction rolled back due to {e}") except duckdb.Error as e: conn.rollback() print(f"Error during transfer: {e}") finally: conn.close() #Setup initial state def setup_accounts(db_path): conn = duckdb.connect(db_path) try: conn.execute('CREATE TABLE IF NOT EXISTS accounts (id INTEGER, balance REAL)') conn.execute('INSERT INTO accounts VALUES (1, 1000.0)') conn.execute('INSERT INTO accounts VALUES (2, 500.0)') conn.commit() print ("Set up user account") except duckdb.Error as e: print(f"Error setting up accounts: {e}") conn.rollback() #Roll back in case of an error finally: conn.close() db_path = 'bank_db.duckdb' setup_accounts(db_path) transfer_funds(db_path, 1, 2, 200.0) #Transfer 200 from user 1 to user 2 #Verify results conn = duckdb.connect(db_path) print (conn.execute("SELECT * from accounts").fetchall()) conn.close() """ ## 3. Modern Approaches and Patterns ### 3.1. Reactive Programming * **Do This:** Use reactive programming techniques (e.g., RxPY) to automatically update application state in response to changes in the underlying DuckDB data. * **Don't Do This:** Poll the database repeatedly to detect changes. * **Consider:** Using change data capture (CDC) mechanisms if available within your DuckDB environment (though DuckDB itself has limited direct CDC). **Why:** Reactive programming enables efficient and real-time state updates. **Example (Conceptual, requires external libraries):** """python # Conceptual Reactive Example (Requires e.g., RxPY) # Note: This is a simplified conceptual example. Integration would depend on # specific libraries providing reactive capabilities around database changes. # This demonstrates the idea, not a fully working example. import duckdb import reactivex from reactivex import operators as ops def create_database_observable(db_path, query, interval): """Creates an observable that emits data from a DuckDB query at a given interval.""" def subscribe(observer, scheduler=None): def run(): try: conn = duckdb.connect(db_path) result = conn.execute(query).fetchall() observer.on_next(result) conn.close() except Exception as e: observer.on_error(e) # Propagate any errors to the observable #Recursive function to keep schedule until disposed if not observer.is_stopped: scheduler.schedule(run, interval) #Initial Schedule with recusive function scheduler.schedule(run, interval) return reactivex.disposable.Disposable(run, interval) return reactivex.create(subscribe) #Example DB Setup db_path = 'reactive_db.duckdb' conn = duckdb.connect(db_path) conn.execute("CREATE TABLE IF NOT EXISTS sensor_data (timestamp TIMESTAMP, temperature REAL)") conn.execute("INSERT INTO sensor_data VALUES ('2024-11-07 10:00:00', 25.5)") conn.close() # Create an observable that queries the DuckDB database every 5 seconds. db_observable = create_database_observable(db_path, "SELECT * FROM sensor_data", 5) # Subscribe to the observable and print the data. def on_next(data): print(f"Data emitted: {data}") def on_error(error): print(f"Error: {error}") def on_completed(): print("Completed") disposable = db_observable.subscribe( on_next=on_next, # Function to call when data is emitted on_error=on_error, # Function to call if there's an error on_completed=on_completed # Function when observable is stopped ) # Wait for 15 seconds to receive three emissions. 
time.sleep(15)

# Dispose of the subscription when done.
disposable.dispose()
"""

### 3.2. Using DuckDB with Arrow for Data Transfer

* **Do This:** Leverage Apache Arrow as a data transfer format between DuckDB and other systems (e.g., Pandas, Spark). Use the "arrow()" method on DuckDB connection objects to fetch results as Arrow tables.
* **Don't Do This:** Rely on inefficient data serialization formats when transferring data between DuckDB and other systems.

**Why:** Arrow provides zero-copy data sharing, minimizing overhead.

**Example:**

"""python
# Arrow Example
import duckdb
import pyarrow as pa

db_conn = duckdb.connect(':memory:')
db_conn.execute("CREATE TABLE my_data (id INTEGER, value VARCHAR)")
db_conn.execute("INSERT INTO my_data VALUES (1, 'hello'), (2, 'world')")

arrow_table = db_conn.execute("SELECT * FROM my_data").arrow()
print(arrow_table)
print(type(arrow_table))  # Print the type of the Arrow table
db_conn.close()
"""

### 3.3. Parameterized Queries

* **Do This:** Use parameterized queries to prevent SQL injection attacks and improve query performance.
* **Don't Do This:** Concatenate user input directly into SQL queries.

**Why:** Parameterized queries keep user input out of the SQL text and allow DuckDB to optimize query execution.

**Example:**

"""python
# Parameterized Query
import duckdb

def get_user(db_path, user_id):
    """Retrieves a user from the database using a parameterized query."""
    conn = duckdb.connect(db_path)
    try:
        result = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
        if result:
            return {
                'id': result[0],
                'username': result[1],
                'email': result[2]
            }
        else:
            return None
    except duckdb.Error as e:
        print(f"Error retrieving user: {e}")
        return None
    finally:
        conn.close()

db_path = 'user_db.duckdb'
conn = duckdb.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, username VARCHAR, email VARCHAR)")
conn.execute("INSERT INTO users VALUES (1, 'john_doe', 'john.doe@example.com')")
conn.close()

user = get_user(db_path, 1)
print(user)
"""

## 4. Error Handling and Logging

### 4.1. Specific Exception Handling

* **Do This:** Catch specific "duckdb.Error" subclasses to handle different error conditions (e.g., "duckdb.CatalogException", "duckdb.InvalidInputException").
* **Don't Do This:** Use generic "except Exception:" blocks that can mask underlying issues.

**Why:** Specific exception handling allows for targeted error recovery logic.

**Example:**

"""python
import duckdb

def execute_query(db_path, query):
    """Executes a SQL query and handles potential DuckDB errors."""
    conn = duckdb.connect(db_path)
    try:
        result = conn.execute(query).fetchall()
        return result
    except duckdb.CatalogException as e:
        print(f"Table not found: {e}")
        return None
    except duckdb.InvalidInputException as e:
        print(f"Invalid input: {e}")
        return None
    except duckdb.Error as e:
        print(f"General DuckDB error: {e}")
        return None
    finally:
        conn.close()

db_path = 'test_db.duckdb'
results = execute_query(db_path, "SELECT * FROM non_existent_table")  # Raises duckdb.CatalogException
print(results)
results = execute_query(db_path, "SELECT * FROM 123")  # Malformed query; caught by one of the handlers above
"""

### 4.2. Logging

* **Do This:** Use a logging framework (e.g., "logging" in Python) to record significant events, errors, and warnings related to DuckDB operations, and use log levels (INFO, WARNING, ERROR) appropriately.
* **Don't Do This:** Rely solely on "print()" statements for debugging in production code.
* **Consider:** Implementing structured logging to facilitate analysis of log data.

**Why:** Logging provides valuable insights into application behavior and simplifies troubleshooting.

**Example:**

"""python
import duckdb
import logging

# Configure the logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def execute_query(db_path, query):
    """Executes a SQL query with logging."""
    conn = duckdb.connect(db_path)
    try:
        logging.info(f"Executing query: {query}")
        result = conn.execute(query).fetchall()
        logging.info("Query executed successfully.")
        return result
    except duckdb.Error as e:
        logging.error(f"Error executing query: {e}", exc_info=True)  # Log the exception details
        return None
    finally:
        conn.close()

# Create a dummy database and intentionally cause an error (the table does not exist)
db_path = 'test_logging.duckdb'
execute_query(db_path, "SELECT * FROM t")
"""

This document provides a foundational set of standards for effective state management in DuckDB applications. By adhering to these guidelines, developers can create robust, maintainable, and performant solutions. Remember to continually review and adapt these standards as DuckDB evolves and new best practices emerge.
# Core Architecture Standards for DuckDB

This document outlines the core architectural standards for contributing to and maintaining the DuckDB project. It focuses on the high-level structure, organization, and key design principles that guide development. Adherence to these standards ensures consistency, maintainability, performance, and security within the DuckDB codebase.

## 1. Fundamental Architectural Principles

DuckDB's architecture is designed around several key principles that guide its development:

* **Columnar Data Storage:** Data is stored in columns, enabling efficient analytical processing by minimizing I/O and maximizing vectorization opportunities.
* **In-Process Execution:** DuckDB operates within the same process as the application, eliminating serialization/deserialization overhead and enabling tight integration. This design choice favors simplicity and speed for many common use cases.
* **Vectorized Execution:** Queries are processed using vectorized execution, where operations are applied to entire columns (or chunks of columns) at once. This dramatically improves performance compared to row-by-row processing.
* **Extensibility:** DuckDB is designed to be extensible. Custom scalar functions ("UDFs"), table functions, and other extensions can be added to provide specialized functionality.
* **Data Locality:** DuckDB tries to maintain high data locality by grouping related data together (e.g., using radix partitioning). This improves cache hit ratios and reduces memory access latency.
* **Minimal Dependencies:** Aiming for ease of deployment and portability, DuckDB strives to minimize external dependencies.
* **Cost-Based Optimizer:** DuckDB uses a cost-based optimizer for query planning. It estimates the cost of different execution strategies and selects the most performant one.

## 2. Project Structure and Organization

The DuckDB project follows a well-defined directory structure:

* **"src/":** Contains the core source code.
  * **"catalog/":** Manages database metadata (tables, schemas, functions, etc.).
  * **"common/":** Common utility functions and data structures used throughout the codebase.
  * **"execution/":** Implements query execution logic, including the vectorized processing engine.
  * **"function/":** Contains built-in SQL functions.
  * **"main/":** Main entry point for the DuckDB library.
  * **"optimizer/":** Implements the query optimizer, including rule-based and cost-based optimizations.
  * **"parser/":** Responsible for parsing SQL queries.
  * **"planner/":** Creates the logical plan from the parsed SQL query.
  * **"storage/":** Implements storage management and data access.
  * **"transaction/":** Manages database transactions and concurrency control.
* **"include/duckdb/":** Public header files for the DuckDB API.
* **"test/":** Contains unit tests and integration tests.
* **"extension/":** Location for extensions to DuckDB.
* **"third_party/":** External libraries used by DuckDB.

### 2.1 Standards for Project Structure Contributions

* **Do This:** Place new source code in the appropriate subdirectory within "src/". If a new component is introduced, create a dedicated subdirectory.
* **Don't Do This:** Add source files to the root "src/" directory unless it is absolutely unavoidable. Keeping files in their subdirectories keeps the codebase organized and navigable.

**Example:** If you are adding a new string function, create the files "src/function/string/my_new_string_function.cpp" and "src/function/string/my_new_string_function.hpp".
### 2.2 Namespaces

* **Do This:** All DuckDB code should reside within the "duckdb" namespace. Nested namespaces (e.g., "duckdb::storage") can be used to further organize code within modules. Use anonymous namespaces for file-local symbols.
* **Don't Do This:** Use the global namespace or other top-level namespaces for DuckDB code.

**Example:**

"""cpp
namespace duckdb {
namespace storage {

class MyStorageClass {
	// ...
};

} // namespace storage
} // namespace duckdb
"""

### 2.3 Directory Naming Conventions

* **Do This:** Keep directory names lowercase and descriptive. Use underscores to separate words (e.g., "storage_manager").
* **Don't Do This:** Use camelCase or mixed-case directory names. Avoid abbreviations unless they are well established within the project (e.g., "UDF" is acceptable rather than "user_defined_function").

## 3. Coding Style and Formatting

DuckDB follows a consistent coding style based on LLVM's style guide, with minor customizations.

* **Do This:**
  * Use clang-format to automatically format code. A ".clang-format" file is provided in the root of the repository.
  * Follow the naming conventions for variables (snake_case), classes (PascalCase), and functions (camelCase, starting with a lowercase letter).
  * Use expressive and descriptive names.
  * Keep lines within a reasonable length (ideally under 120 characters).
* **Don't Do This:**
  * Manually format code. Let clang-format handle the formatting.
  * Use cryptic or single-letter variable names (except in very localized contexts like loop counters).

**Example:**

"""cpp
// Correct
class MyStorageManager {
public:
	void initializeStorage(const string &path);

private:
	string database_path_;
};

// Incorrect
class mystoragemanager { // Class name should be PascalCase
public:
	void initstorage(const string &p); // Function and parameter names are unclear

private:
	string dbpath; // Variable name unclear, should be descriptive snake_case
};
"""

## 4. Memory Management

DuckDB employs a combination of manual memory management (using "new" and "delete"), smart pointers ("unique_ptr", "shared_ptr") for resource ownership, and a custom memory pool allocator for managing the lifetime of short-lived objects within the vectorized execution engine.

### 4.1 Standards for Memory Management

* **Do This:**
  * Use "unique_ptr" for exclusive ownership of resources. This is the preferred way to manage memory in most cases.
  * Use "shared_ptr" only when shared ownership is explicitly required. Carefully consider the lifetime implications when using "shared_ptr" to avoid circular dependencies and memory leaks.
  * Use the memory pool allocator ("Allocator") for allocating short-lived objects within the vectorized execution engine, especially within inner loops or frequently called functions. This avoids the overhead of "new" and "delete" for each object.
  * When using raw pointers, ensure clear ownership transfer and deallocation, document the ownership semantics, and consider using RAII (Resource Acquisition Is Initialization) to tie the lifetime of the resource to the lifetime of an object.
* **Don't Do This:**
  * Use raw pointers for resource ownership without clear ownership transfer.
  * Leak memory by failing to "delete" allocated objects.
  * Double-free memory.
  * Access memory after it has been freed (use-after-free).
  * Mix different memory allocation strategies haphazardly.
**Example using "unique_ptr":** """cpp #include <memory> namespace duckdb { class MyObject { public: MyObject(int value) : value_(value) {} int GetValue() const { return value_; } private: int value_; }; void processObject(std::unique_ptr<MyObject> obj) { // 'obj' is exclusively owned here. std::cout << "Processing object with value: " << obj->GetValue() << std::endl; } // 'obj' is automatically deleted when it goes out of scope. std::unique_ptr<MyObject> createObject(int initialValue) { return std::make_unique<MyObject>(initialValue); } } // namespace duckdb """ **Example using the memory pool allocator:** """cpp namespace duckdb { class Vector { public: Vector(Allocator &allocator) : data_(allocator.Allocate(1024)) {} private: data_ptr_t data_; }; void myFunction(Allocator &allocator) { // Allocate a Vector using the provided allocator. Vector my_vector(allocator); } // Vector's memory is automatically deallocated when the Allocator's scope ends, usually at the end of query execution. } // namespace duckdb """ ## 5. Concurrency and Parallelism DuckDB leverages multi-threading for parallel query execution, particularly within the vectorized execution engine. ### 5.1 Standards for Concurrency * **Do This:** * Use appropriate locking mechanisms (e.g., "std::mutex", "std::shared_mutex") to protect shared data structures from race conditions. * Use fine-grained locking to minimize lock contention and maximize parallelism. * Consider using lock-free data structures for high-contention scenarios, but only when appropriate and with careful consideration of the complexity involved. The "atomic" types can be helpful here. * Utilize the task scheduler for managing parallel tasks. * **Don't Do This:** * Introduce data races by accessing shared data without proper synchronization. * Hold locks for extended periods, blocking other threads. * Create deadlocks by acquiring locks in inconsistent orders. **Example using "std::mutex":** """cpp #include <mutex> namespace duckdb { class SharedData { public: void incrementCounter() { std::lock_guard<std::mutex> lock(mutex_); // RAII-style locking counter_++; } int getCounter() const { std::lock_guard<std::mutex> lock(mutex_); return counter_; } private: int counter_ = 0; std::mutex mutex_; }; } // namespace duckdb """ ## 6. Error Handling Robust error handling is crucial for maintaining the stability and reliability of DuckDB. ### 6.1 Standards for Error Handling * **Do This:** * Use exceptions ("std::exception" or custom exception classes derived from it) to signal errors. Specifically "duckdb::Exception" and its subclasses are preferred. * Catch exceptions at appropriate levels and handle them gracefully. * Provide informative error messages that include the context of the error (e.g., the SQL query being executed, the file being processed). * Use "D_ASSERT" macros for internal assertions that should always be true. These assertions are enabled in debug builds and can help catch bugs early. * Return "Value" objects which contain error states when appropriate, especially for functions. * **Don't Do This:** * Ignore errors. * Use return codes for error handling unless there is a very specific reason to do so. Exceptions provide a much cleaner separation of concerns. * Throw generic exceptions without providing specific error information. * Use assertions for error conditions that can occur in production. Assertions are only enabled in debug builds; use exceptions for handling runtime errors. 
**Example using exceptions:**

"""cpp
#include <iostream>
#include <stdexcept>

#include "duckdb.hpp"

namespace duckdb {

void myFunction(int value) {
	if (value < 0) {
		throw InvalidInputException("Value must be non-negative");
	}
	// ...
}

void anotherFunction() {
	try {
		myFunction(-1);
	} catch (const InvalidInputException &e) {
		// Handle the error appropriately (e.g., log it, return an error value).
		std::cerr << "Error: " << e.what() << std::endl;
	} catch (const Exception &e) {
		// Catch DuckDB-specific exceptions.
		std::cerr << "DuckDB Error: " << e.what() << std::endl;
	} catch (const std::exception &e) {
		// Catch standard exceptions.
		std::cerr << "Standard exception: " << e.what() << std::endl;
	} catch (...) {
		// Handle unexpected errors.
		std::cerr << "Unknown error occurred." << std::endl;
	}
}

} // namespace duckdb
"""

## 7. Logging

DuckDB uses a logging system to record events and diagnostic information at different levels of severity.

### 7.1 Standards for Logging

* **Do This:**
  * Use the logging macros (e.g., "D_LOG", "D_INFO", "D_DEBUG", "D_WARN", "D_ERROR") to log events at the appropriate severity level.
  * Include relevant context in log messages (e.g., the function name, the current state of the system).
  * Use structured logging to make log messages easier to parse and analyze.
* **Don't Do This:**
  * Over-log, creating excessive noise in the logs.
  * Log sensitive information (e.g., passwords, API keys).
  * Use "std::cout" or "std::cerr" for logging. Use the DuckDB logging macros instead for consistency and configurability.

**Example using logging macros:**

"""cpp
#include "duckdb.hpp"

namespace duckdb {

void myFunction(int value) {
	D_DEBUG("myFunction called with value: {}", value); // Debug-level message
	if (value < 0) {
		D_ERROR("Invalid value: {}", value); // Error-level message
		throw InvalidInputException("Value must be non-negative");
	}
	D_INFO("Processing value: {}", value);
}

} // namespace duckdb
"""

## 8. Extensibility

DuckDB is designed to be extensible, allowing developers to add custom functions, table functions, and other extensions.

### 8.1 Standards for Extensibility

* **Do This:**
  * Follow the documented API for creating custom functions and table functions. Refer to the DuckDB documentation for the latest API details.
  * Provide clear documentation and examples for your extensions.
  * Consider contributing your extensions back to the DuckDB community or publishing them as separate packages that others can use.
  * Ensure that extensions are thread-safe and do not introduce data races. Use proper synchronization mechanisms when accessing shared data structures.
* **Don't Do This:**
  * Modify the core DuckDB code to add custom functionality. Use the extension API instead.
  * Introduce breaking changes to the extension API without careful consideration and communication with the community.
  * Create extensions that are insecure or unreliable.
**Example Registering UDFs:**

"""cpp
#include "duckdb.hpp"
#include "duckdb/function/scalar_function.hpp"

namespace duckdb {

static void my_scalar_function(DataChunk &args, ExpressionState &state, Vector &result) {
	auto &input = args.data[0];
	// Add one to every input value.
	UnaryExecutor::Execute<int32_t, int32_t>(input, result, args.size(),
	                                         [&](int32_t input) { return input + 1; });
}

class MyExtension : public Extension {
public:
	std::string Name() override {
		return "my_extension";
	}

	void Load(DatabaseInstance &instance) override {
		Connection con(instance);
		con.BeginTransaction();
		auto &catalog = Catalog::GetCatalog(*con.GetContext());
		ScalarFunction my_function("my_scalar_function", {LogicalType::INTEGER}, LogicalType::INTEGER,
		                           my_scalar_function);
		catalog.CreateFunction(*con.GetContext(), my_function);
		con.Commit();
	}
};

extern "C" {

DUCKDB_EXTENSION_API void MyExtension_init(duckdb::DatabaseInstance &db) {
	db.RegisterExtension(std::make_unique<MyExtension>());
}

DUCKDB_EXTENSION_API const char *MyExtension_version() {
	return duckdb::DuckDB::LibraryVersion();
}

} // extern "C"

} // namespace duckdb
"""

## 9. Testing

Tests are written using the Catch test framework and are located in the "test/" directory. Each component of the DuckDB system should have corresponding unit tests.

### 9.1 Testing Standards

* **Do This:**
  * Write both unit tests and integration tests to cover different aspects of the code. Unit tests should focus on individual components, while integration tests should verify the interaction between multiple components.
  * Use descriptive test names that clearly indicate what is being tested.
  * Write tests that are reliable and repeatable. Avoid tests that depend on external factors (e.g., network connectivity, a specific file system layout) unless those factors are explicitly part of the test.
  * Aim for good test coverage by exercising all code paths and edge cases.
* **Don't Do This:**
  * Skip writing tests, even for small changes.
  * Write tests that are flaky or unreliable. Fix or remove such tests.
  * Commit code without running the tests first.

**Example Test:**

"""cpp
#include "catch.hpp"
#include "duckdb.hpp"

using namespace duckdb;

TEST_CASE("Basic test", "[core]") {
	DBConfig config;
	DuckDB db(nullptr, &config);
	Connection con(db);
	REQUIRE(con.Query("SELECT 42")->GetValue(0, 0) == Value::INTEGER(42));
}

TEST_CASE("Test that asserts", "[common]") {
	// This test will only fail in debug mode.
	REQUIRE_ASSERT(D_ASSERT(1 == 2));
}
"""

## 10. Documentation

Clear and up-to-date documentation is essential for making DuckDB easy to use and contribute to.

### 10.1 Standards for Documentation

* **Do This:**
  * Document all public APIs (functions, classes, etc.) using Doxygen-style comments.
  * Provide clear and concise explanations of the purpose, usage, and limitations of each API.
  * Include examples to illustrate how to use the API.
  * Keep the documentation up to date as the code evolves.
  * Document internal design decisions and architectural choices to help other developers understand the codebase.
  * Use meaningful comments within the code to explain complex logic or non-obvious decisions.
* **Don't Do This:**
  * Skip documenting public APIs.
  * Write documentation that is vague, incomplete, or inaccurate.
  * Let the documentation become outdated.

**Example using Doxygen comments:**

"""cpp
namespace duckdb {

/**
 * @brief Initializes the storage manager.
 * @param path The path to the database file.
 */
void initializeStorage(const std::string &path);

} // namespace duckdb
"""
## 11. Specific Considerations for Core Architecture

* **Catalog Management:** The "catalog/" directory is critical; changes here affect the entire database and require extensive testing. Correct locking is essential to prevent database corruption, and all catalog changes must be properly logged for recovery purposes.
* **Query Optimizer:** The "optimizer/" directory is performance-sensitive. New optimization rules should be carefully evaluated for their impact on query performance: run benchmarks before and after changes, and pay special attention to corner cases for robustness.
* **Storage Layer:** The "storage/" directory is responsible for data persistence. A correct implementation of the write-ahead log (WAL) is critical for durability. Thoroughly test recovery scenarios after system crashes or power failures; performance changes in the storage layer have a global impact.

By adhering to these coding standards, developers can contribute to the DuckDB project in a consistent, maintainable, and high-quality manner. This collaborative effort ensures that DuckDB remains a powerful and reliable analytical database system.