# Testing Methodologies Standards for PostgreSQL
This document outlines the testing methodologies standards for PostgreSQL development, focusing on ensuring code quality, reliability, and performance. Adhering to these standards will help create robust and maintainable PostgreSQL applications.
## 1. General Testing Principles
### 1.1. Adopt a Test-Driven Development (TDD) or Behavior-Driven Development (BDD) Approach
**Do This:**
* Write tests before or alongside the code.
* Follow the Red-Green-Refactor cycle.
**Don't Do This:**
* Leave testing as an afterthought.
**Why:** TDD and BDD approaches ensure that requirements are well understood before implementation, leading to more focused and effective code. Writing tests first also tends to produce higher test coverage, because every behavior starts out with a test.
### 1.2. Aim for High Test Coverage
**Do This:**
* Target a minimum test coverage of 80% for all modules.
* Use coverage tools to identify untested code paths.
**Don't Do This:**
* Ignore low coverage areas without a valid reason.
**Why:** Higher test coverage reduces the risk of introducing bugs and ensures that changes do not break existing functionality.
### 1.3. Use Assertions Effectively
**Do This:**
* Write clear and specific assertions.
* Use appropriate assertion types (e.g., equality, inequality, null checks).
**Don't Do This:**
* Use generic assertions that mask underlying issues.
**Why:** Clear and specific assertions provide valuable feedback and pinpoint the exact cause of test failures.
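As a small illustration of the difference, the sketch below contrasts a generic boolean check with more specific pgTAP assertions. It assumes the "pgtap" extension is already installed; the values under test are placeholders chosen only for the demonstration.

```sql
BEGIN;
SELECT plan(4);

-- Generic: on failure this only reports that the expression was false
SELECT ok(upper('abc') = 'ABC', 'upper() works');

-- Specific: on failure is() prints both the actual and the expected value
SELECT is(upper('abc'), 'ABC', 'upper() converts lowercase text to uppercase');

-- Inequality check
SELECT isnt(lower('ABC'), 'ABC', 'lower() changes uppercase input');

-- Null check: is() treats two NULLs as equal, so it can assert a NULL result
SELECT is(NULLIF(1, 1), NULL::INT, 'NULLIF returns NULL when both arguments match');

SELECT * FROM finish();
ROLLBACK;
```

When the second assertion fails, the diagnostics show what was returned and what was wanted, which is usually enough to locate the problem without re-running the query by hand.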
## 2. Unit Testing
Unit testing focuses on testing individual components or functions in isolation.
### 2.1. Frameworks and Tools
**Do This:**
* Consider using "pgTAP" (PostgreSQL Test Anything Protocol) for unit testing. It provides a standard interface for running and interpreting test results.
**Don't Do This:**
* Introduce external dependencies into basic unit tests where they are unnecessary.
**Why:** "pgTAP" is designed specifically for PostgreSQL and integrates well with the database environment. It supports structured testing and reporting.
### 2.2. Writing Effective Unit Tests
**Do This:**
* Test individual functions and procedures with different input values, including boundary conditions and edge cases.
* Use mock objects to isolate the component under test and avoid dependencies on external services.
* Keep unit tests fast and independent to allow for frequent execution.
**Don't Do This:**
* Write tests that depend on the state of the database or other external factors (for unit tests only).
* Test multiple functionalities in a single unit test.
**Why:** Isolated and focused unit tests are easier to maintain and debug. They provide clear feedback when a specific component fails.
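A minimal sketch of boundary and edge-case testing follows. The function "apply_discount" is hypothetical and exists only for this illustration; the sketch assumes "pgtap" is installed and that the current role may create functions in the current schema (everything is rolled back at the end).

```sql
BEGIN;

-- Hypothetical function under test: applies a percentage discount to a price
CREATE FUNCTION apply_discount(price NUMERIC, pct NUMERIC)
RETURNS NUMERIC AS $$
BEGIN
    IF pct < 0 OR pct > 100 THEN
        RAISE EXCEPTION 'discount must be between 0 and 100';
    END IF;
    RETURN round(price * (1 - pct / 100), 2);
END;
$$ LANGUAGE plpgsql;

SELECT plan(5);

-- Typical value
SELECT is(apply_discount(80, 25), 60.00, '25% off 80 is 60.00');

-- Lower and upper boundaries of the allowed range
SELECT is(apply_discount(100, 0), 100.00, '0% leaves the price unchanged');
SELECT is(apply_discount(100, 100), 0.00, '100% reduces the price to zero');

-- Edge case: NULL input propagates to a NULL result
SELECT is(apply_discount(NULL, 10), NULL::NUMERIC, 'NULL price yields NULL');

-- Out-of-range input must raise the documented error (P0001 is the default RAISE EXCEPTION code)
SELECT throws_ok(
    'SELECT apply_discount(100, 101)',
    'P0001',
    'discount must be between 0 and 100',
    'a percentage above 100 is rejected'
);

SELECT * FROM finish();
ROLLBACK;
```

Each assertion checks exactly one behavior, so a failure points directly at the input class that broke.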
### 2.3. Example: Unit Testing a Function
Consider a simple function to calculate the total price of an order:
"""sql
-- Function to calculate the total price of an order
CREATE OR REPLACE FUNCTION calculate_total_price(order_id INT)
RETURNS DECIMAL AS $$
DECLARE
total DECIMAL;
BEGIN
SELECT SUM(price * quantity) INTO total
FROM order_items
WHERE order_items.order_id = calculate_total_price.order_id; -- qualify the column: a bare order_id is ambiguous with the parameter
RETURN COALESCE(total, 0); -- Return 0 if no items found
END;
$$ LANGUAGE plpgsql;
"""
Here's how to write unit tests for this function using "pgTAP":
"""sql
-- Install pgTAP extension
CREATE EXTENSION IF NOT EXISTS pgtap;
-- Begin test transaction
BEGIN;
SELECT plan(3); -- Planning for 3 tests
-- Create a temporary table for testing
CREATE TEMP TABLE order_items (
order_id INT,
price DECIMAL,
quantity INT
);
-- Test case 1: Order with items
INSERT INTO order_items (order_id, price, quantity) VALUES
(1, 10.00, 2),
(1, 5.00, 3);
SELECT is(
calculate_total_price(1),
35.00::DECIMAL,
'Test case 1: Calculate total price for order with items'
);
-- Test case 2: Order with no items
SELECT is(
calculate_total_price(2),
0::DECIMAL,
'Test case 2: Calculate total price for order with no items'
);
-- Test case 3: Order whose only item has a NULL price
INSERT INTO order_items (order_id, price, quantity) VALUES (3, NULL, 2);
SELECT is(
calculate_total_price(3),
0::DECIMAL,
'Test case 3: NULL price yields a total of 0 because of COALESCE'
);
-- Clean up and end transaction
SELECT * FROM finish();
ROLLBACK;
"""
### 2.4. Common Mistakes in Unit Testing
* **Over-reliance on real database connections**: Application-level unit tests should mock the data access layer; tests that skip this step and hit a live database become slow and brittle.
* **Testing implementation details, not behavior**: Unit tests should test the *what*, not the *how*. Implementation details are likely to change.
* **Ignoring edge cases**: Thoroughly test null values, empty sets, zero values, and out-of-bounds parameters.
## 3. Integration Testing
Integration testing verifies the interaction between different components or modules of the system.
### 3.1. Testing Database Interactions
**Do This:**
* Test the integration between the application and the database by executing SQL queries and verifying the results.
* Use a dedicated test database to avoid affecting production data.
* Roll back transactions after each test to maintain a consistent state.
**Don't Do This:**
* Run integration tests against the production database.
**Why:** Integration tests ensure that data is correctly stored, retrieved, and processed by the application.
### 3.2. Using Stored Procedures in Tests
**Do This:**
* Encapsulate setup and teardown logic into stored procedures for managing test data and state. This is a good practice when using "pgTAP" or similar testing frameworks.
* Write stored procedures to perform common assertion tasks.
**Don't Do This:**
* Duplicate the same setup logic in every integration test.
**Why:** Stored procedures encapsulate repeated setup, teardown, and assertion logic, which makes integration tests shorter to write and easier to read.
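One possible shape for such helpers is sketched below. The names "test_seed_accounts" and "test_balance_is" are hypothetical, the "accounts" table is a temporary stand-in, and the sketch assumes "pgtap" is installed.

```sql
BEGIN;
SELECT plan(2);

-- Temporary stand-in for the real accounts table
CREATE TEMP TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL NOT NULL
);

-- Hypothetical setup helper: resets the table to a known state
CREATE FUNCTION test_seed_accounts() RETURNS void AS $$
    DELETE FROM accounts;
    INSERT INTO accounts (id, balance) VALUES (1, 100.00), (2, 50.00);
$$ LANGUAGE sql;

-- Hypothetical assertion helper: wraps is() so each check reads as one line
CREATE FUNCTION test_balance_is(p_id INT, p_expected DECIMAL) RETURNS TEXT AS $$
    SELECT is(
        (SELECT balance FROM accounts WHERE id = p_id),
        p_expected,
        format('account %s has balance %s', p_id, p_expected)
    );
$$ LANGUAGE sql;

SELECT test_seed_accounts();
SELECT test_balance_is(1, 100.00);
SELECT test_balance_is(2, 50.00);

SELECT * FROM finish();
ROLLBACK;
```

Because the helpers are created inside the test transaction, the final "ROLLBACK" removes them together with the test data.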
### 3.3. Example: Integration Testing a Stored Procedure
Consider a stored procedure for transferring funds between accounts:
"""sql
CREATE OR REPLACE PROCEDURE transfer_funds(
sender_id INT,
receiver_id INT,
amount DECIMAL
) AS $$
BEGIN
UPDATE accounts SET balance = balance - amount WHERE id = sender_id;
UPDATE accounts SET balance = balance + amount WHERE id = receiver_id;
-- No COMMIT here: the caller controls the transaction, so tests can call this inside BEGIN ... ROLLBACK
END;
$$ LANGUAGE plpgsql;
"""
Here's how to write an integration test for this stored procedure:
"""sql
-- Install pgTAP
CREATE EXTENSION IF NOT EXISTS pgtap;
-- Begin test transaction
BEGIN;
SELECT plan(4); -- Two assertions per test case
-- Create a temporary table for accounts
CREATE TEMP TABLE accounts (
id INT PRIMARY KEY,
balance DECIMAL
);
-- Insert initial account balances
INSERT INTO accounts (id, balance) VALUES
(1, 100.00),
(2, 50.00);
-- Test case 1: Transfer funds successfully
CALL transfer_funds(1, 2, 20.00);
SELECT is(balance, 80.00::DECIMAL, 'Test case 1: Sender balance after transfer')
FROM accounts WHERE id = 1;
SELECT is(balance, 70.00::DECIMAL, 'Test case 1: Receiver balance after transfer')
FROM accounts WHERE id = 2;
-- Reset balances for next test
UPDATE accounts SET balance = CASE id WHEN 1 THEN 100.00 WHEN 2 THEN 50.00 END;
-- Test case 2: Insufficient funds (assuming no exception handling for simplicity -- add proper exception handling in production)
CALL transfer_funds(1, 2, 150.00);
SELECT is(balance, -50.00::DECIMAL, 'Test case 2: Sender balance after transfer (insufficient funds)')
FROM accounts WHERE id = 1;
SELECT is(balance, 200.00::DECIMAL, 'Test case 2: Receiver balance after transfer (insufficient funds)')
FROM accounts WHERE id = 2;
-- Clean up and end transaction
SELECT * FROM finish();
ROLLBACK;
"""
**Note:** In a real-world scenario, add proper error handling (e.g., "RAISE EXCEPTION" when funds are insufficient) so that a failed transfer leaves both balances untouched. The procedure deliberately contains no "COMMIT": PostgreSQL rejects transaction control statements in a procedure that is called from within an explicit transaction block (the error is "invalid transaction termination"), and a commit would also defeat the "ROLLBACK" that cleans up after the test.
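The error handling mentioned in the note could look roughly like the sketch below. The procedure name "transfer_funds_checked", its parameter names, and the error message are assumptions made for this illustration, not part of the original example; "pgtap" is assumed to be installed.

```sql
BEGIN;
SELECT plan(4);

CREATE TEMP TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL NOT NULL
);
INSERT INTO accounts (id, balance) VALUES (1, 100.00), (2, 50.00);

-- Hypothetical variant that refuses to overdraw the sender
CREATE PROCEDURE transfer_funds_checked(p_sender INT, p_receiver INT, p_amount DECIMAL) AS $$
BEGIN
    -- Debit only if the sender exists and has enough money
    UPDATE accounts SET balance = balance - p_amount
    WHERE id = p_sender AND balance >= p_amount;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'insufficient funds for account %', p_sender;
    END IF;
    UPDATE accounts SET balance = balance + p_amount WHERE id = p_receiver;
END;
$$ LANGUAGE plpgsql;

-- The overdraft attempt must raise the expected error ...
SELECT throws_ok(
    'CALL transfer_funds_checked(1, 2, 150.00)',
    'P0001',
    'insufficient funds for account 1',
    'overdraft attempt is rejected'
);
-- ... and must leave the sender untouched
SELECT is(balance, 100.00::DECIMAL, 'sender balance is unchanged after the rejected transfer')
FROM accounts WHERE id = 1;

-- A valid transfer still succeeds
SELECT lives_ok('CALL transfer_funds_checked(1, 2, 20.00)', 'valid transfer does not raise');
SELECT is(balance, 80.00::DECIMAL, 'sender balance is reduced by the transferred amount')
FROM accounts WHERE id = 1;

SELECT * FROM finish();
ROLLBACK;
```

Checking the balance after the rejected call is what verifies atomicity here: the test fails if the error is raised only after money has already moved.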
### 3.4. Common Mistakes in Integration Testing
* **Skipping integration tests entirely**: Developers sometimes only focus on unit or end-to-end tests.
* **Not cleaning up data after tests run**: This causes cascading failures as each test depends on the broken state of the database. Always use transactions and rollback.
* **Using shared resources between tests**: All tests should execute in isolation. Each test should set up its own data, and tear it down afterwards.
* **Testing too much in one test**: Keep the scope of each integration test small and focused.
## 4. End-to-End (E2E) Testing
End-to-end testing validates the entire system flow, from the user interface to the database.
### 4.1. Simulating User Interactions
**Do This:**
* Use tools like Selenium, Cypress, or Playwright to simulate user interactions.
* Verify that data is correctly displayed and processed throughout the application.
**Don't Do This:**
* Rely solely on manual testing for end-to-end validation.
**Why:** E2E tests ensure that all components work seamlessly together and that the application meets user requirements.
### 4.2. Testing Data Integrity
**Do This:**
* Insert test data through the UI and verify that it is correctly stored in the database.
* Modify data through the UI and verify that changes are reflected in the database and other parts of the application.
* Delete data through the UI and verify that it is removed from the database.
**Don't Do This:**
* Ignore data-related issues during E2E testing.
**Why:** Data integrity is crucial for the reliability and accuracy of the application.
### 4.3. Example Outline: E2E Testing a Web Application with PostgreSQL
While a full E2E example requires a complete web application, here's an outline of how to test data interactions with PostgreSQL:
1. **Setup**:
* Use a testing framework like Playwright or Cypress to automate browser interactions.
* Create a test database for the E2E tests.
* Seed the database with initial test data.
2. **Test Scenario**: Imagine testing a scenario where a user creates a new product in an e-commerce application.
* **Navigate to the "Create Product" page**.
* **Fill in the product details (name, description, price)**.
* **Submit the form**.
* **Verify that the product is displayed in the product list**.
3. **Database Verification**:
* **Connect to the test database**.
* **Execute a query to find the new product using its name or ID**.
* **Assert that the product exists in the database**.
* **Assert that the product details (name, description, price) match the values entered in the UI**.
4. **Teardown**:
* Delete the test product from the database to clean up.
* Close the browser session.
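The database verification step of the outline above can be sketched with pgTAP. The "products" table, its columns, and the sample values are assumptions; in a real suite the row would be created by the browser test rather than by the "INSERT" shown here.

```sql
BEGIN;
SELECT plan(3);

-- Stand-in for the application's table (assumed columns)
CREATE TEMP TABLE products (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL
);

-- Stand-in for the row the UI test would have created by submitting the form
INSERT INTO products (name, description, price)
VALUES ('E2E Test Widget', 'Created by the E2E suite', 19.99);

-- Assert that the product exists exactly once
SELECT is(
    (SELECT count(*) FROM products WHERE name = 'E2E Test Widget'),
    1::BIGINT,
    'the product created through the UI exists exactly once'
);

-- Assert that the stored details match the values entered in the form
SELECT is(description, 'Created by the E2E suite', 'stored description matches the form input')
FROM products WHERE name = 'E2E Test Widget';
SELECT is(price, 19.99::DECIMAL, 'stored price matches the form input')
FROM products WHERE name = 'E2E Test Widget';

SELECT * FROM finish();
ROLLBACK; -- in a real run, teardown would delete the test product instead
```

Using a product name that is unique to the test run keeps the verification query from matching seed data.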
### 4.4. Common Mistakes in E2E Testing
* **Writing brittle tests**: UI elements change. Use robust selectors (e.g., data attributes) instead of relying on CSS classes or IDs that are likely to change.
* **Making tests too slow**: E2E tests are inherently slower. Minimize the scope of each test. Run tests in parallel. Optimize database queries.
* **Ignoring error handling**: Verify that error messages are displayed correctly in the UI when something goes wrong in the backend.
* **Not testing different browsers and devices**: Ensure compatibility across different environments.
## 5. Performance Testing
Performance testing evaluates the speed, stability, and scalability of the database and related application components.
### 5.1. Load Testing
**Do This:**
* Use tools like "pgbench" or "JMeter" to simulate concurrent user requests and measure the response time of the database.
* Monitor database performance metrics such as CPU usage, memory usage, disk I/O, and network traffic.
* Identify performance bottlenecks and optimize slow queries or database configurations.
**Don't Do This:**
* Assume that performance issues will be automatically resolved without proper testing.
* Run performance tests on production databases during peak hours.
**Why:** Load testing helps identify performance issues before they impact users, ensuring a smooth and responsive experience.
### 5.2. Stress Testing
**Do This:**
* Push the database beyond its normal operating limits to identify its breaking point and assess its resilience.
* Simulate extreme conditions such as sudden spikes in traffic, hardware failures, or network outages.
* Verify that the database can recover gracefully from errors and maintain data integrity.
**Don't Do This:**
* Underestimate the importance of stress testing in uncovering hidden vulnerabilities.
**Why:** Stress testing reveals how the database behaves under adverse conditions, helping to improve its stability and reliability.
### 5.3. Example: Using "pgbench" for Load Testing
"""bash
# Initialize pgbench with scale factor of 10
pgbench -i -s 10 -U postgres mydatabase
# Run a simple load test with 10 clients for 10 seconds
pgbench -c 10 -T 10 -U postgres mydatabase
"""
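This example initializes "pgbench" with a scale factor of 10, which populates "pgbench_accounts" with 1,000,000 rows (100,000 per scale unit). It then runs a load test with 10 concurrent clients for 10 seconds. Note that the duration flag is the uppercase "-T"; the lowercase "-t" sets the number of transactions per client instead.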
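The built-in workload rarely matches a real application, so "pgbench" also accepts custom script files through its "-f" option. The sketch below is a read-only script; the file name "read_heavy.sql" is hypothetical, and the upper bound of the random range assumes the tables were initialized with scale factor 10.

```sql
-- read_heavy.sql (hypothetical file name): a custom pgbench script
-- Assumes scale factor 10, so pgbench_accounts holds aid values 1..1000000
\set aid random(1, 1000000)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```

It could be run with "pgbench -f read_heavy.sql -c 10 -T 10 -U postgres mydatabase". Each client then repeats the script for the full duration, picking a new random account on every iteration.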
### 5.4. Monitoring and Analysis
Collecting and analyzing performance metrics is critical to identifying bottlenecks and optimizing database performance. Common metrics to monitor include:
* **CPU utilization:** High CPU utilization can indicate inefficient queries or insufficient hardware resources.
* **Memory usage:** Excessive memory usage can lead to swapping and performance degradation.
* **Disk I/O:** Slow disk I/O can be a bottleneck for database operations.
* **Network latency:** Network latency can impact the response time of database queries.
* **Query execution time:** Analyzing query execution plans can help identify slow queries that need optimization.
Use the "pg_stat_statements" extension to identify frequently executed and slow queries.
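A query along the following lines lists the most expensive statements. It is a sketch that assumes PostgreSQL 13 or later, where the timing columns are named "total_exec_time" and "mean_exec_time" (older versions use "total_time" and "mean_time"), and that "pg_stat_statements" is listed in "shared_preload_libraries".

```sql
-- The module must be preloaded by the server before the extension can collect data
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten statements that consumed the most total execution time
SELECT
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    rows,
    left(query, 80) AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Optionally clear the counters before a new load test:
-- SELECT pg_stat_statements_reset();
```

Sorting by total time surfaces cheap queries that run very often as well as rare but slow ones; sorting by "mean_exec_time" instead isolates the individually slow statements.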
### 5.5. Common Mistakes in Performance Testing
* **Using unrealistic test data**: Synthetic data may not accurately reflect real-world usage patterns. Use data that is representative of production data.
* **Not simulating concurrent users**: Performance issues often arise when multiple users access the database simultaneously.
* **Ignoring the impact of network latency**: Network latency can significantly impact the response time of database queries.
* **Not monitoring resource utilization**: Monitoring CPU, memory, and disk I/O is essential for identifying performance bottlenecks.
* **Running tests in a non-representative environment**: Performance should be tested in an environment as similar to production as possible.
## 6. Security Testing
Security testing involves identifying vulnerabilities and ensuring that the database is protected against unauthorized access and data breaches.
### 6.1. Authentication and Authorization
**Do This:**
* Use strong passwords for all database users.
* Implement role-based access control (RBAC) to restrict access to sensitive data.
* Enforce the principle of least privilege, granting users only the permissions they need to perform their tasks.
**Don't Do This:**
* Use default passwords or weak credentials.
* Grant unnecessary privileges to users.
**Why:** Proper authentication and authorization mechanisms prevent unauthorized access to the database.
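Privileges can be asserted in tests as well. The sketch below uses the built-in "has_table_privilege" function together with pgTAP; the role "report_reader" and the "payroll" table are hypothetical, and the session is assumed to be allowed to create roles.

```sql
BEGIN;
SELECT plan(3);

-- Hypothetical read-only role; the CREATE ROLE is undone by the final ROLLBACK
CREATE ROLE report_reader NOLOGIN;

-- Temporary stand-in for a sensitive table
CREATE TEMP TABLE payroll (employee_id INT, salary DECIMAL);
GRANT SELECT ON payroll TO report_reader;

-- Least privilege: the role can read ...
SELECT ok(
    has_table_privilege('report_reader', 'payroll', 'SELECT'),
    'report_reader can read payroll'
);
-- ... but cannot modify
SELECT ok(
    NOT has_table_privilege('report_reader', 'payroll', 'UPDATE'),
    'report_reader cannot update payroll'
);
-- ... and is not a superuser
SELECT is(rolsuper, false, 'report_reader is not a superuser')
FROM pg_roles WHERE rolname = 'report_reader';

SELECT * FROM finish();
ROLLBACK;
```

Asserting the privileges a role must not have is as important as asserting the ones it needs, since an accidental broad grant otherwise goes unnoticed.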
### 6.2. SQL Injection
**Do This:**
* Use parameterized queries or prepared statements to prevent SQL injection attacks.
* Validate user input to ensure that it conforms to expected formats and does not contain malicious code.
**Don't Do This:**
* Construct SQL queries by concatenating user input directly.
**Why:** SQL injection vulnerabilities can allow attackers to bypass security controls and execute arbitrary SQL code.
### 6.3. Data Encryption
**Do This:**
* Encrypt sensitive data at rest and in transit to protect it from unauthorized access.
* Use encryption keys and certificates to secure data communication.
**Don't Do This:**
* Store sensitive data in plain text.
**Why:** Data encryption adds an extra layer of security, even if other security measures are compromised.
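As one possible way to test column-level encryption at rest, the sketch below uses the "pgcrypto" extension. The table, the sample text, and the hard-coded key are assumptions for illustration only; real keys should come from outside the database.

```sql
CREATE EXTENSION IF NOT EXISTS pgtap;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

BEGIN;
SELECT plan(2);

-- Hypothetical table storing an encrypted value as BYTEA
CREATE TEMP TABLE patient_notes (
    id INT PRIMARY KEY,
    note_encrypted BYTEA NOT NULL
);
INSERT INTO patient_notes (id, note_encrypted)
VALUES (1, pgp_sym_encrypt('allergic to penicillin', 'demo-key-do-not-reuse'));

-- The stored bytes must not contain the plaintext
SELECT ok(
    position('penicillin'::bytea IN note_encrypted) = 0,
    'plaintext is not visible in the stored value'
) FROM patient_notes WHERE id = 1;

-- Decrypting with the same key returns the original text
SELECT is(
    pgp_sym_decrypt(note_encrypted, 'demo-key-do-not-reuse'),
    'allergic to penicillin',
    'value decrypts back to the original text'
) FROM patient_notes WHERE id = 1;

SELECT * FROM finish();
ROLLBACK;

-- Encryption in transit: shows whether the current connection uses SSL/TLS
SELECT ssl FROM pg_stat_ssl WHERE pid = pg_backend_pid();
```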
### 6.4. Vulnerability Scanning
**Do This:**
* Regularly scan the database for known vulnerabilities and apply security patches promptly.
* Use security tools to identify misconfigurations or weaknesses in the database setup.
**Don't Do This:**
* Ignore security alerts or delay applying critical security updates.
**Why:** Vulnerability scanning helps identify and address security risks before they can be exploited.
### 6.5. Example: Preventing SQL Injection with Parameterized Queries
"""sql
-- Vulnerable code (DO NOT USE)
-- Constructs SQL query by concatenating user input
-- username := request.getParameter("username");
-- query := "SELECT * FROM users WHERE username = '" + username + "'";
-- Safe code (USE PARAMETERIZED QUERIES)
-- Uses parameterized query to prevent SQL injection
-- query := "SELECT * FROM users WHERE username = ?";
-- PreparedStatement pstmt = connection.prepareStatement(query);
-- pstmt.setString(1, username);
-- ResultSet rs = pstmt.executeQuery();
-- Example in PL/pgSQL
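CREATE OR REPLACE FUNCTION get_user_by_username(p_username TEXT)
RETURNS TABLE (id INT, username TEXT, email TEXT) AS $$
BEGIN
-- The parameter is named p_username because RETURNS TABLE already declares an output column called username
RETURN QUERY EXECUTE
'SELECT id, username, email FROM users WHERE username = $1'
USING p_username;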
END;
$$ LANGUAGE plpgsql;
"""
The PL/pgSQL example shows how to use the "USING" clause for parameterized queries. This ensures that the "username" is treated as a literal value, preventing potential SQL injection.
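A test can confirm that a classic injection payload is treated as data. The sketch below is self-contained: "find_user" is a hypothetical function written in the same style as the example above, the "users" table is a temporary stand-in, and "pgtap" is assumed to be installed.

```sql
BEGIN;
SELECT plan(2);

CREATE TEMP TABLE users (
    id INT PRIMARY KEY,
    username TEXT NOT NULL,
    email TEXT
);
INSERT INTO users (id, username, email) VALUES
    (1, 'alice', 'alice@example.com'),
    (2, 'bob', 'bob@example.com');

-- Hypothetical lookup function that binds its argument with USING
CREATE FUNCTION find_user(p_username TEXT)
RETURNS TABLE (id INT, username TEXT, email TEXT) AS $$
BEGIN
    RETURN QUERY EXECUTE
        'SELECT id, username, email FROM users WHERE username = $1'
        USING p_username;
END;
$$ LANGUAGE plpgsql;

-- A legitimate lookup returns exactly one row
SELECT is(
    (SELECT count(*) FROM find_user('alice')),
    1::BIGINT,
    'an existing username returns one row'
);

-- The payload ' OR '1'='1 is compared as a literal string and matches nothing
SELECT is_empty(
    $$SELECT * FROM find_user(''' OR ''1''=''1')$$,
    'an injection payload returns no rows'
);

SELECT * FROM finish();
ROLLBACK;
```

If the function built its query by string concatenation instead, the second assertion would fail because the payload would turn the filter into a condition that is always true.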
## 7. Continuous Integration and Continuous Deployment (CI/CD)
### 7.1. Automate Testing
**Do This:**
* Integrate unit, integration, and E2E tests into the CI/CD pipeline.
* Run tests automatically on every commit or pull request.
**Don't Do This:**
* Rely solely on manual testing before deployment.
**Why:** Automated testing ensures that changes are thoroughly validated and reduces the risk of introducing bugs into production.
### 7.2. Database Migrations
**Do This:**
* Use a database migration tool like Flyway or Liquibase to manage database schema changes.
* Automate the application of database migrations as part of the CI/CD pipeline.
**Don't Do This:**
* Make manual changes to the database schema without proper version control.
**Why:** Database migrations ensure that schema changes are applied consistently and safely across different environments.
### 7.3. Rollback Strategy
**Do This:**
* Implement a rollback strategy for database migrations so that changes can be reverted if a migration fails. Ship stored procedures or scripts that revert each version to the previous one with every deployment.
**Don't Do This:**
* Deploy a migration without a tested way to roll it back.
**Why:** Having a rollback strategy is important for databases so you can safely fix issues that might arise from a database migration.
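A revert script is only useful if it has been exercised. The sketch below pairs a hypothetical forward migration with its revert and checks both with pgTAP; the "last_login" column is an assumption, and the temporary "users" table stands in for the real schema.

```sql
BEGIN;
SELECT plan(2);

-- Stand-in for the schema before the migration
CREATE TEMP TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    email VARCHAR(255)
);

-- Forward migration (hypothetical): add a column
ALTER TABLE users ADD COLUMN last_login TIMESTAMPTZ;
SELECT has_column('users', 'last_login', 'forward migration adds last_login');

-- Revert script (hypothetical): undo exactly what the forward step did
ALTER TABLE users DROP COLUMN last_login;
SELECT hasnt_column('users', 'last_login', 'revert script removes last_login');

SELECT * FROM finish();
ROLLBACK;
```

Running the pair in CI against a scratch database catches revert scripts that have drifted from their forward migrations. Note that reverting a destructive change (a dropped column or table) cannot restore the lost data, so those migrations need a backup-based plan rather than a script.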
### 7.4. Example: Implementing Database Migrations with Flyway
1. **Add Flyway to the project:** Include the Flyway dependency in your project's build file (e.g., Maven "pom.xml" or Gradle "build.gradle").
2. **Create migration scripts:** Create SQL migration scripts in the "db/migration" directory (default location) with filenames like "V1__Create_users_table.sql", "V2__Add_email_column.sql", etc.
Example migration script ("V1__Create_users_table.sql"):
"""sql
CREATE TABLE users (
id INT PRIMARY KEY,
username VARCHAR(255) NOT NULL,
email VARCHAR(255)
);
"""
3. **Configure Flyway:** Configure Flyway with the database URL, username, and password in a configuration file or programmatically.
4. **Run migrations:** Execute Flyway migrations as part of the CI/CD pipeline.
"""java
// Example using Flyway programmatically in Java (requires the flyway-core dependency)
import org.flywaydb.core.Flyway;

Flyway flyway = Flyway.configure()
.dataSource("jdbc:postgresql://localhost:5432/mydatabase", "username", "password")
.load();
flyway.migrate();
"""
This example shows how to configure and run Flyway migrations programmatically. This can be integrated into a CI/CD pipeline to automatically apply database schema changes during deployment.
By adhering to these testing methodologies standards, PostgreSQL developers can create robust, reliable, and secure applications that meet user requirements and provide a smooth and responsive experience.
danielsogl
Created Mar 6, 2025
This guide explains how to effectively use .clinerules with Cline, the AI-powered coding assistant.
The .clinerules file is a powerful configuration file that helps Cline understand your project's requirements, coding standards, and constraints. When placed in your project's root directory, it automatically guides Cline's behavior and ensures consistency across your codebase.
Place the .clinerules file in your project's root directory. Cline automatically detects and follows these rules for all files within the project.
# Project Overview
project:
  name: 'Your Project Name'
  description: 'Brief project description'
  stack:
    - technology: 'Framework/Language'
      version: 'X.Y.Z'
    - technology: 'Database'
      version: 'X.Y.Z'
# Code Standards
standards:
  style:
    - 'Use consistent indentation (2 spaces)'
    - 'Follow language-specific naming conventions'
  documentation:
    - 'Include JSDoc comments for all functions'
    - 'Maintain up-to-date README files'
  testing:
    - 'Write unit tests for all new features'
    - 'Maintain minimum 80% code coverage'
# Security Guidelines
security:
  authentication:
    - 'Implement proper token validation'
    - 'Use environment variables for secrets'
  dataProtection:
    - 'Sanitize all user inputs'
    - 'Implement proper error handling'
Be Specific
Maintain Organization
Regular Updates
# Common Patterns Example
patterns:
  components:
    - pattern: 'Use functional components by default'
    - pattern: 'Implement error boundaries for component trees'
  stateManagement:
    - pattern: 'Use React Query for server state'
    - pattern: 'Implement proper loading states'
Commit the Rules
.clinerules in version control
Team Collaboration
Rules Not Being Applied
Conflicting Rules
Performance Considerations
# Basic .clinerules Example
project:
  name: 'Web Application'
  type: 'Next.js Frontend'
standards:
  - 'Use TypeScript for all new code'
  - 'Follow React best practices'
  - 'Implement proper error handling'
testing:
  unit:
    - 'Jest for unit tests'
    - 'React Testing Library for components'
  e2e:
    - 'Cypress for end-to-end testing'
documentation:
  required:
    - 'README.md in each major directory'
    - 'JSDoc comments for public APIs'
    - 'Changelog updates for all changes'
# Advanced .clinerules Example
project:
  name: 'Enterprise Application'
  compliance:
    - 'GDPR requirements'
    - 'WCAG 2.1 AA accessibility'
architecture:
  patterns:
    - 'Clean Architecture principles'
    - 'Domain-Driven Design concepts'
security:
  requirements:
    - 'OAuth 2.0 authentication'
    - 'Rate limiting on all APIs'
    - 'Input validation with Zod'
# Database: Create RLS policies
You're a Supabase Postgres expert in writing row level security policies. Your purpose is to generate a policy with the constraints given by the user. You should first retrieve schema information to write policies for, usually the 'public' schema.
The output should use the following instructions:
- The generated SQL must be valid SQL.
- You can use only CREATE POLICY or ALTER POLICY queries, no other queries are allowed.
- Always use double apostrophe in SQL strings (eg. 'Night''s watch')
- You can add short explanations to your messages.
- The result should be a valid markdown. The SQL code should be wrapped in ``` (including sql language tag).
- Always use "auth.uid()" instead of "current_user".
- SELECT policies should always have USING but not WITH CHECK
- INSERT policies should always have WITH CHECK but not USING
- UPDATE policies should always have WITH CHECK and most often have USING
- DELETE policies should always have USING but not WITH CHECK
- Don't use `FOR ALL`. Instead separate into 4 separate policies for select, insert, update, and delete.
- The policy name should be short but detailed text explaining the policy, enclosed in double quotes.
- Always put explanations as separate text. Never use inline SQL comments.
- If the user asks for something that's not related to SQL policies, explain to the user that you can only help with policies.
- Discourage `RESTRICTIVE` policies and encourage `PERMISSIVE` policies, and explain why.
The output should look like this:
```sql
CREATE POLICY "My descriptive policy." ON books FOR INSERT to authenticated WITH CHECK ( (select auth.uid()) = author_id );
```
Since you are running in a Supabase environment, take note of these Supabase-specific additions below.
## Authenticated and unauthenticated roles
Supabase maps every request to one of the roles:
- `anon`: an unauthenticated request (the user is not logged in)
- `authenticated`: an authenticated request (the user is logged in)
These are actually [Postgres Roles](/docs/guides/database/postgres/roles). You can use these roles within your Policies using the `TO` clause:
```sql
create policy "Profiles are viewable by everyone"
on profiles
for select
to authenticated, anon
using ( true );
-- OR
create policy "Public profiles are viewable only by authenticated users"
on profiles
for select
to authenticated
using ( true );
```
Note that `for ...` must be added after the table but before the roles. `to ...` must be added after `for ...`:
### Incorrect
```sql
create policy "Public profiles are viewable only by authenticated users"
on profiles
to authenticated
for select
using ( true );
```
### Correct
```sql
create policy "Public profiles are viewable only by authenticated users"
on profiles
for select
to authenticated
using ( true );
```
## Multiple operations
PostgreSQL policies do not support specifying multiple operations in a single FOR clause. You need to create separate policies for each operation.
### Incorrect
```sql
create policy "Profiles can be created and deleted by any user"
on profiles
for insert, delete -- cannot create a policy on multiple operators
to authenticated
with check ( true )
using ( true );
```
### Correct
```sql
create policy "Profiles can be created by any user"
on profiles
for insert
to authenticated
with check ( true );
create policy "Profiles can be deleted by any user"
on profiles
for delete
to authenticated
using ( true );
```
## Helper functions
Supabase provides some helper functions that make it easier to write Policies.
### `auth.uid()`
Returns the ID of the user making the request.
### `auth.jwt()`
Returns the JWT of the user making the request. Anything that you store in the user's `raw_app_meta_data` column or the `raw_user_meta_data` column will be accessible using this function. It's important to know the distinction between these two:
- `raw_user_meta_data` - can be updated by the authenticated user using the `supabase.auth.update()` function. It is not a good place to store authorization data.
- `raw_app_meta_data` - cannot be updated by the user, so it's a good place to store authorization data.
The `auth.jwt()` function is extremely versatile. For example, if you store some team data inside `app_metadata`, you can use it to determine whether a particular user belongs to a team. For example, if this was an array of IDs:
```sql
create policy "User is in team"
on my_table
to authenticated
using ( team_id in (select auth.jwt() -> 'app_metadata' -> 'teams'));
```
### MFA
The `auth.jwt()` function can be used to check for [Multi-Factor Authentication](/docs/guides/auth/auth-mfa#enforce-rules-for-mfa-logins). For example, you could restrict a user from updating their profile unless they have at least 2 levels of authentication (Assurance Level 2):
```sql
create policy "Restrict updates."
on profiles
as restrictive
for update
to authenticated
using ( (select auth.jwt()->>'aal') = 'aal2' );
```
## RLS performance recommendations
Every authorization system has an impact on performance. While row level security is powerful, the performance impact is important to keep in mind. This is especially true for queries that scan every row in a table - like many `select` operations, including those using limit, offset, and ordering.
Based on a series of [tests](https://github.com/GaryAustin1/RLS-Performance), we have a few recommendations for RLS:
### Add indexes
Make sure you've added [indexes](/docs/guides/database/postgres/indexes) on any columns used within the Policies which are not already indexed (or primary keys). For a Policy like this:
```sql
create policy "Users can access their own records"
on test_table
to authenticated
using ( (select auth.uid()) = user_id );
```
You can add an index like:
```sql
create index userid
on test_table
using btree (user_id);
```
### Call functions with `select`
You can use a `select` statement to improve policies that use functions. For example, instead of this:
```sql
create policy "Users can access their own records"
on test_table
to authenticated
using ( auth.uid() = user_id );
```
You can do:
```sql
create policy "Users can access their own records"
on test_table
to authenticated
using ( (select auth.uid()) = user_id );
```
This method works well for JWT functions like `auth.uid()` and `auth.jwt()` as well as `security definer` Functions. Wrapping the function causes an `initPlan` to be run by the Postgres optimizer, which allows it to "cache" the results per-statement, rather than calling the function on each row.
Caution: You can only use this technique if the results of the query or function do not change based on the row data.
### Minimize joins
You can often rewrite your Policies to avoid joins between the source and the target table. Instead, try to organize your policy to fetch all the relevant data from the target table into an array or set, then you can use an `IN` or `ANY` operation in your filter.
For example, this is an example of a slow policy which joins the source `test_table` to the target `team_user`:
```sql
create policy "Users can access records belonging to their teams"
on test_table
to authenticated
using (
  (select auth.uid()) in (
    select user_id
    from team_user
    where team_user.team_id = team_id -- joins to the source "test_table.team_id"
  )
);
```
We can rewrite this to avoid this join, and instead select the filter criteria into a set:
```sql
create policy "Users can access records belonging to their teams"
on test_table
to authenticated
using (
  team_id in (
    select team_id
    from team_user
    where user_id = (select auth.uid()) -- no join
  )
);
```
### Specify roles in your policies
Always specify the role inside your policies, using the `TO` clause. For example, instead of this query:
```sql
create policy "Users can access their own records"
on rls_test
using ( auth.uid() = user_id );
```
Use:
```sql
create policy "Users can access their own records"
on rls_test
to authenticated
using ( (select auth.uid()) = user_id );
```
This prevents the policy `( (select auth.uid()) = user_id )` from running for any `anon` users, since the execution stops at the `to authenticated` step.
# Database: Create migration

You are a Postgres Expert who loves creating secure database schemas.

This project uses the migrations provided by the Supabase CLI.

## Creating a migration file

Given the context of the user's message, create a database migration file inside the folder `supabase/migrations/`.

The file MUST be named in the format `YYYYMMDDHHmmss_short_description.sql`, with proper casing for months, minutes, and seconds, in UTC time:

1. `YYYY` - Four digits for the year (e.g., `2024`).
2. `MM` - Two digits for the month (01 to 12).
3. `DD` - Two digits for the day of the month (01 to 31).
4. `HH` - Two digits for the hour in 24-hour format (00 to 23).
5. `mm` - Two digits for the minute (00 to 59).
6. `ss` - Two digits for the second (00 to 59).
7. Add an appropriate description for the migration.

For example:

```
20240906123045_create_profiles.sql
```

## SQL Guidelines

Write Postgres-compatible SQL code for Supabase migration files that:

- Includes a header comment with metadata about the migration, such as the purpose, affected tables/columns, and any special considerations.
- Includes thorough comments explaining the purpose and expected behavior of each migration step.
- Is written entirely in lowercase.
- Adds copious comments for any destructive SQL commands, including truncating, dropping, or column alterations.
- Enables Row Level Security (RLS) on every new table. You MUST do this even if the table is intended for public access.
- When creating RLS Policies:
  - Ensure the policies cover all relevant access scenarios (e.g. select, insert, update, delete) based on the table's purpose and data sensitivity.
  - If the table is intended for public access, the policy can simply return `true`.
  - RLS Policies should be granular: one policy per operation (one for `select`, one for `insert`, etc.) and per Supabase role (`anon` and `authenticated`). DO NOT combine policies even if the functionality is the same for both roles.
  - Include comments explaining the rationale and intended behavior of each security policy.

The generated SQL code should be production-ready, well-documented, and aligned with Supabase's best practices.
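As an illustration of the guidelines above, a migration file might look like the following minimal sketch. The file name is the one from the naming example; the `profiles` table and its columns are hypothetical, and the script assumes a Supabase project where the `anon` and `authenticated` roles and the `auth.uid()` helper already exist.

```sql
-- migration: 20240906123045_create_profiles.sql
-- purpose: create the profiles table (hypothetical example)
-- affected tables: public.profiles
-- special considerations: assumes the supabase roles anon and authenticated
--   and the auth.uid() helper are present

create table public.profiles (
  id bigint generated always as identity primary key,
  user_id uuid not null,
  display_name text not null
);
comment on table public.profiles is 'Public profile data for each user.';

-- rls is enabled on every new table, even one meant for public reads
alter table public.profiles enable row level security;

-- profiles are public, so each role gets its own select policy returning true
create policy "Profiles are viewable by anon users"
on public.profiles for select to anon
using ( true );

create policy "Profiles are viewable by authenticated users"
on public.profiles for select to authenticated
using ( true );

-- only the owning user may create a profile row for themselves
create policy "Users can insert their own profile"
on public.profiles for insert to authenticated
with check ( (select auth.uid()) = user_id );

-- only the owning user may change their profile, and they cannot reassign it
create policy "Users can update their own profile"
on public.profiles for update to authenticated
using ( (select auth.uid()) = user_id )
with check ( (select auth.uid()) = user_id );
```

No `delete` policy is defined in this sketch, so with RLS enabled neither role can delete rows; whether that is desirable depends on the table's purpose.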
# Postgres SQL Style Guide

## General

- Use lowercase for SQL reserved words to maintain consistency and readability.
- Employ consistent, descriptive identifiers for tables, columns, and other database objects.
- Use white space and indentation to enhance the readability of your code.
- Store dates in ISO 8601 format (`yyyy-mm-ddThh:mm:ss.sssss`).
- Include comments for complex logic, using `/* ... */` for block comments and `--` for line comments.

## Naming Conventions

- Avoid SQL reserved words and ensure names are unique and under 63 characters.
- Use snake_case for tables and columns.
- Prefer plurals for table names.
- Prefer singular names for columns.

## Tables

- Avoid prefixes like 'tbl\_' and ensure no table name matches any of its column names.
- Always add an `id` column defined as `generated always as identity` unless otherwise specified.
- Create all tables in the `public` schema unless otherwise specified.
- Always add the schema to SQL queries for clarity.
- Always add a comment to describe what the table does. The comment can be up to 1024 characters.

## Columns

- Use singular names and avoid generic names like 'id'.
- For references to foreign tables, use the singular of the table name with the `_id` suffix. For example, use `user_id` to reference the `users` table.
- Always use lowercase except in cases involving acronyms or when readability would be enhanced by an exception.

#### Examples:

```sql
create table books (
  id bigint generated always as identity primary key,
  title text not null,
  author_id bigint references authors (id)
);
comment on table books is 'A list of all the books in the library.';
```

## Queries

- Keep shorter queries on just a few lines. As a query gets larger, add newlines for readability.
- Add spaces for readability.

Smaller queries: ```sql select * from employees where end_date is null; update employees set end_date = '2023-12-31' where employee_id = 1001; ``` Larger queries: ```sql select first_name, last_name from employees where start_date between '2021-01-01' and '2021-12-31' and status = 'employed'; ``` ### Joins and Subqueries - Format joins and subqueries for clarity, aligning them with related SQL clauses. - Prefer full table names when referencing tables. This helps for readability. ```sql select employees.employee_name, departments.department_name from employees join departments on employees.department_id = departments.department_id where employees.start_date > '2022-01-01'; ``` ## Aliases - Use meaningful aliases that reflect the data or transformation applied, and always include the 'as' keyword for clarity. ```sql select count(*) as total_employees from employees where end_date is null; ``` ## Complex queries and CTEs - If a query is extremely complex, prefer a CTE. - Make sure the CTE is clear and linear. Prefer readability over performance. - Add comments to each block. ```sql with department_employees as ( -- Get all employees and their departments select employees.department_id, employees.first_name, employees.last_name, departments.department_name from employees join departments on employees.department_id = departments.department_id ), employee_counts as ( -- Count how many employees in each department select department_name, count(*) as num_employees from department_employees group by department_name ) select department_name, num_employees from employee_counts order by department_name; ```
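Pulling the table, column, and query conventions above together, a schema-qualified version might look like the sketch below. The `authors` table definition and the `full_name` column are hypothetical, added only so the example is self-contained.

```sql
create table public.authors (
  id bigint generated always as identity primary key,
  full_name text not null
);
comment on table public.authors is 'People who have written at least one book in the library.';

create table public.books (
  id bigint generated always as identity primary key,
  title text not null,
  -- singular of the referenced table name plus the _id suffix
  author_id bigint not null references public.authors (id)
);
comment on table public.books is 'A list of all the books in the library.';

-- count books per author, using full table names and an explicit alias
select
  authors.full_name,
  count(books.id) as total_books
from
  public.authors
left join
  public.books on books.author_id = authors.id
group by
  authors.full_name
order by
  authors.full_name;
```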
# API Integration Standards for PostgreSQL This document outlines the coding standards for integrating PostgreSQL with external APIs and backend services. These standards promote maintainability, performance, and security when building applications that rely on data and functionality outside of the database itself. It focuses on modern approaches compatible with the latest PostgreSQL version. ## 1. Architectural Considerations for API Integration ### 1.1. Standard: Define Clear API Boundaries **Do This:** * Clearly define the responsibilities of PostgreSQL and external APIs. Use PostgreSQL for data persistence, relational logic, and indexing. Offload complex computations, specialized data processing, and external data access to APIs. * Use clear and consistent naming conventions for database functions/procedures interacting with APIs. Prefix them (e.g., "api_", "ext_") to easily identify external API integration code. * Document the contract (input/output) with each API thoroughly. **Don't Do This:** * Overload PostgreSQL with tasks that APIs are better suited for (e.g., image processing, complex machine learning tasks that are not data-intensive). * Embed undocumented or magic API calls directly within SQL queries. **Why:** Defining clear boundaries ensures modularity, easier maintenance, and optimized performance. It avoids turning the database into a monolithic application component. **Example:** """sql -- Good: Function for fetching user profiles from an external API. CREATE OR REPLACE FUNCTION api_get_user_profile(user_id INT) RETURNS JSONB AS $$ BEGIN -- Call external API to get user profile details. -- Using a hypothetical extension for API calls RETURN http_get('https://api.example.com/users/' || user_id)::jsonb; EXCEPTION WHEN OTHERS THEN RAISE EXCEPTION 'Error fetching user profile from API: %', SQLERRM; END; $$ LANGUAGE plpgsql; -- Bad: Embedding API logic directly within a complex query. -- SELECT * FROM users WHERE ... AND api_call(...) ... ; -- Avoid! 
""" ### 1.2. Standard: Asynchronous vs. Synchronous API Interactions **Do This:** * Use asynchronous API calls (e.g., message queues, background workers) where possible to prevent long-running database transactions from blocking other operations. Implement retries and error handling for asynchronous tasks. * For synchronous calls, keep the execution time as short as possible to avoid holding database connections for extended periods. **Don't Do This:** * Make blocking API calls directly within critical transaction paths. This will significantly impact database performance and availability. * Assume API calls will always succeed. Implement robust error handling and retries. **Why:** Asynchronous operations improve scalability and responsiveness. Synchronous operations can lead to deadlocks and performance degradation if not managed carefully. **Example (using pg_amqp or similar queue extensions):** """sql -- Asynchronous API call using a message queue. (Hypothetical Example) CREATE OR REPLACE FUNCTION api_process_user_data(user_id INT) RETURNS VOID AS $$ BEGIN -- Send a message to a queue for processing user data via an external API. PERFORM amqp.publish('process_user_data_queue', json_build_object('user_id', user_id)); -- Hypothetical RETURN; END; $$ LANGUAGE plpgsql; -- Example of a background worker (using pg_background) that consumes from the queue to call the external API -- Code for the background worker would be in a separate file and process the queue. """ ### 1.3. Standard: Data Transformation and Mapping **Do This:** * Define clear data mapping between PostgreSQL data types and API request/response formats (e.g., JSON, XML). Use PostgreSQL's JSONB and XML support effectively. * Validate data received from APIs before inserting it into the database using "CHECK" constraints or other validation mechanisms. * Log API requests and responses for debugging and auditing purposes. 
**Don't Do This:** * Directly insert untrusted data received from APIs into the database without validation. This can lead to SQL injection and other security vulnerabilities. * Rely on implicit type conversions between PostgreSQL and API data formats. Be explicit. **Why:** Proper data transformation and validation prevent data corruption and security breaches. Logging helps troubleshoot issues and track API usage. **Example:** """sql -- Validating and inserting JSON data from an API. CREATE TABLE api_user_profiles ( user_id INT PRIMARY KEY, profile_data JSONB -- CHECK constraint is appropriate here to require the JSON object ALWAYS conform to a schema ); CREATE OR REPLACE FUNCTION api_import_user_profile(user_id INT, profile_json JSONB) RETURNS VOID AS $$ DECLARE -- Validate JSON data against a schema (hypothetical function). is_valid BOOLEAN; BEGIN -- Validate that the JSON is valid against a schema is_valid := jsonb_matches_schema('{"type": "object", "properties": {"name": {"type": "string"},"email": {"type": "string", "format": "email"} }}', profile_json); IF NOT is_valid THEN RAISE EXCEPTION 'Invalid profile data format.'; END IF; INSERT INTO api_user_profiles (user_id, profile_data) VALUES (user_id, profile_json); RETURN; EXCEPTION WHEN OTHERS THEN RAISE EXCEPTION 'Error importing user profile: %', SQLERRM; END; $$ LANGUAGE plpgsql; """ ## 2. Implementation Details ### 2.1. Standard: Choosing the Right API Interaction Method **Do This:** * Evaluate these methods: * **HTTP Requests (using extensions like "http" or "curl"):** Suitable for RESTful APIs. * **Message Queues (using extensions like "pg_amqp" or "pg_kafka"):** Ideal for asynchronous communication. * **Foreign Data Wrappers (FDWs):** For integrating with other databases or data stores directly. * Choose the method that best fits the API's protocol, data format, and communication pattern. **Don't Do This:** * Force a specific integration method because it's familiar. 
Consider alternatives based on the API's characteristics.
* Build custom, ad-hoc solutions when standard extensions and FDWs provide the necessary functionality.

**Why:** Selecting the right method simplifies integration, improves performance, and reduces development effort.

**Example (using "http" extension for a REST API):**

"""sql
-- Example using the http extension to call a REST API
CREATE EXTENSION IF NOT EXISTS http;

CREATE OR REPLACE FUNCTION api_get_weather(city TEXT)
RETURNS JSONB AS $$
DECLARE
    api_url TEXT := 'https://api.weatherapi.com/v1/current.json?key=YOUR_API_KEY&q=' || city;
    response http_response; -- composite type provided by the http extension
BEGIN
    response := http_get(api_url);

    IF response.status = 200 THEN
        RETURN response.content::jsonb;
    ELSE
        RAISE EXCEPTION 'Weather API error: %', response.content;
    END IF;
EXCEPTION WHEN OTHERS THEN
    RAISE EXCEPTION 'Error fetching weather data: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;

-- SELECT api_get_weather('London');
"""

### 2.2. Standard: Error Handling and Retries

**Do This:**

* Implement robust error handling for API calls. Catch exceptions, log errors, and implement retry mechanisms with exponential backoff.
* Distinguish between transient and permanent errors. Retry transient errors (e.g., network timeouts), and log permanent errors (e.g., invalid API key) for investigation.
* Set appropriate timeouts for API calls to prevent indefinite blocking.
* Use "BEGIN ... EXCEPTION ... END" blocks for error handling within PL/pgSQL functions. PL/pgSQL has no "TRY...CATCH" construct; the "EXCEPTION" clause of a block plays that role.

**Don't Do This:**

* Ignore errors from API calls. At a minimum, log the error so it can be investigated later.
* Retry indefinitely without a limit or backoff strategy. This can overload the API or the database.

**Why:** Robust error handling ensures resilience and prevents cascading failures. It provides valuable insights into API issues.
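The transient-versus-permanent distinction and the backoff schedule can each be captured in a small helper. This is a minimal sketch, assuming the HTTP status code is the only signal available; the function names `api_is_retryable_status` and `api_backoff_delay`, the chosen status codes (408, 429, and 5xx), and the 30-second cap are illustrative assumptions, not part of any extension.

```sql
-- Hypothetical helper: treat timeouts, rate limiting, and server errors as transient.
CREATE OR REPLACE FUNCTION api_is_retryable_status(status_code INT)
RETURNS BOOLEAN AS $$
    SELECT status_code IN (408, 429) OR status_code BETWEEN 500 AND 599;
$$ LANGUAGE sql IMMUTABLE;

-- Hypothetical helper: exponential backoff, doubling per attempt and capped.
-- attempt is zero-based: attempt 0 waits base_delay, attempt 1 waits twice that.
CREATE OR REPLACE FUNCTION api_backoff_delay(
    attempt    INT,
    base_delay INTERVAL DEFAULT INTERVAL '1 second',
    max_delay  INTERVAL DEFAULT INTERVAL '30 seconds'
)
RETURNS INTERVAL AS $$
    SELECT LEAST(base_delay * power(2::double precision, attempt), max_delay);
$$ LANGUAGE sql IMMUTABLE;

-- Usage:
-- SELECT api_is_retryable_status(503);  -- true
-- SELECT api_is_retryable_status(400);  -- false
-- SELECT api_backoff_delay(3);          -- 8 seconds
```

Keeping these decisions in separate functions lets a retry loop stay short and makes the policy testable without calling any API.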
**Example:**

"""sql
CREATE OR REPLACE FUNCTION api_get_data_with_retry(url TEXT, max_retries INT DEFAULT 3)
RETURNS JSONB AS $$
DECLARE
    response http_response; -- composite type provided by the http extension
    retries INT := 0;
    delay INTERVAL := '1 second';
BEGIN
    LOOP
        BEGIN
            response := http_get(url);

            IF response.status = 200 THEN
                RETURN response.content::jsonb;
            ELSE
                RAISE WARNING 'API call failed with status code: %', response.status;
                -- Check for non-retryable errors here!
                -- IF response.status = 400 THEN RETURN NULL; END IF; -- Bad Request (do not retry)
            END IF;
        EXCEPTION WHEN OTHERS THEN
            -- Errors raised by the call itself (e.g., a network failure) are logged and retried
            RAISE WARNING 'API call error: %', SQLERRM;
        END;

        retries := retries + 1;
        IF retries >= max_retries THEN
            RAISE EXCEPTION 'Max retries exceeded for API call.';
        END IF;

        RAISE NOTICE 'Retrying in %', delay;
        PERFORM pg_sleep(extract(epoch from delay));
        delay := delay * 2; -- Exponential backoff
    END LOOP;
EXCEPTION WHEN OTHERS THEN
    RAISE EXCEPTION 'Failed to get data after multiple retries: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;
"""

### 2.3. Standard: Security Considerations

**Do This:**

* Store API keys and secrets securely using PostgreSQL's configuration parameters or a dedicated secrets management solution. NEVER hardcode API keys in SQL code.
* Use HTTPS for all API calls to encrypt data in transit.
* Validate API responses to prevent data injection (e.g., JSON injection).
* Implement rate limiting to prevent abuse.
* Apply the principle of least privilege when granting permissions to API interaction functions.

**Don't Do This:**

* Hardcode API keys or secrets in SQL code or store them in plain text in the database.
* Trust API responses implicitly. Always validate the data.
* Expose your PostgreSQL database directly to the internet without proper firewall and security measures.

**Why:** Security is paramount. Protecting API keys, encrypting data, and rate limiting prevent unauthorized access and malicious attacks.
**Example:**

"""sql
-- Storing the API key as a custom configuration parameter
-- In postgresql.conf:
-- api.weather_api_key = 'YOUR_API_KEY'
-- Note: any role can read a custom parameter with current_setting(),
-- so prefer a dedicated secrets manager for highly sensitive keys.

-- SQL to retrieve the API key
CREATE OR REPLACE FUNCTION api_get_weather_secure(city TEXT)
RETURNS JSONB AS $$
DECLARE
    api_url TEXT := 'https://api.weatherapi.com/v1/current.json?key=' || current_setting('api.weather_api_key') || '&q=' || city;
    response http_response; -- composite type provided by the http extension
BEGIN
    response := http_get(api_url);

    IF response.status = 200 THEN
        RETURN response.content::jsonb;
    ELSE
        RAISE EXCEPTION 'Weather API error: %', response.content;
    END IF;
EXCEPTION WHEN OTHERS THEN
    RAISE EXCEPTION 'Error fetching weather data: %', SQLERRM;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER; -- runs with the owner's privileges, so callers only need EXECUTE on this function

-- Revoke execute permission from public
REVOKE EXECUTE ON FUNCTION api_get_weather_secure(TEXT) FROM PUBLIC;

-- Grant access to specific roles
GRANT EXECUTE ON FUNCTION api_get_weather_secure(TEXT) TO your_application_role;
"""

### 2.4. Standard: Performance Optimization

**Do This:**

* Cache API responses to reduce the number of API calls, especially for frequently accessed data. Use "MATERIALIZED VIEW" or a custom cache table.
* Use connection pooling to minimize the overhead of establishing new connections to APIs. Some HTTP extensions do this internally.
* Optimize data transfer by requesting only the necessary fields from the API. Use appropriate query parameters.

**Don't Do This:**

* Make redundant API calls. Identify opportunities for caching or batching.
* Retrieve large amounts of data from APIs when only a small subset is needed.

**Why:** Performance optimization improves application responsiveness and reduces API usage costs.
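For the "custom cache table" option mentioned above, one possible shape is a keyed table with a freshness check on read. This is a minimal sketch under stated assumptions: the table `api_response_cache`, the functions `api_cache_put` and `api_cache_get`, and the one-hour default lifetime are all hypothetical names and values chosen for illustration.

```sql
-- Hypothetical cache table: one row per cache key.
CREATE TABLE api_response_cache (
    cache_key  TEXT PRIMARY KEY,
    response   JSONB NOT NULL,
    fetched_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Store or overwrite a response and reset its timestamp.
CREATE OR REPLACE FUNCTION api_cache_put(p_key TEXT, p_response JSONB)
RETURNS VOID AS $$
    INSERT INTO api_response_cache (cache_key, response)
    VALUES (p_key, p_response)
    ON CONFLICT (cache_key)
    DO UPDATE SET response = EXCLUDED.response, fetched_at = now();
$$ LANGUAGE sql;

-- Return the cached response, or NULL when the key is missing or older than p_max_age.
CREATE OR REPLACE FUNCTION api_cache_get(p_key TEXT, p_max_age INTERVAL DEFAULT '1 hour')
RETURNS JSONB AS $$
    SELECT response
    FROM api_response_cache
    WHERE cache_key = p_key
      AND fetched_at > now() - p_max_age;
$$ LANGUAGE sql STABLE;

-- Usage:
-- SELECT api_cache_put('weather:London', '{"temp_c": 12}');
-- SELECT api_cache_get('weather:London');
```

Compared with a materialized view, a table like this can hold arbitrary keys and expire entries individually, at the cost of the caller deciding when to fetch and store.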
**Example (using a materialized view for caching):**

"""sql
CREATE MATERIALIZED VIEW weather_cache AS
SELECT
    city,
    api_get_weather(city) AS weather_data,
    NOW() AS last_updated
FROM (VALUES ('London'), ('New York'), ('Tokyo')) AS cities(city);

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_weather_cache_city ON weather_cache (city);

-- Refresh the cache periodically
CREATE OR REPLACE FUNCTION refresh_weather_cache()
RETURNS VOID AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY weather_cache;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Schedule daily refreshes with pg_cron or a similar scheduler:
-- SELECT cron.schedule('0 0 * * *', 'SELECT refresh_weather_cache()');

-- Usage:
CREATE OR REPLACE FUNCTION get_weather_from_cache(city TEXT)
RETURNS JSONB AS $$
DECLARE
    cached JSONB;
BEGIN
    -- Qualify both names: the parameter and the view column are both called "city"
    SELECT weather_cache.weather_data INTO cached
    FROM weather_cache
    WHERE weather_cache.city = get_weather_from_cache.city;

    -- A lookup that finds no row leaves the variable NULL rather than raising an error
    IF cached IS NULL THEN
        RETURN api_get_weather(city); -- if not in cache, fetch it from the API
    END IF;

    RETURN cached;
END;
$$ LANGUAGE plpgsql;
"""

## 3. Coding Style and Conventions

### 3.1. Standard: Code Formatting and Comments

**Do This:**

* Use consistent indentation (typically 4 spaces) and line breaks to improve readability.
* Add comments to explain complex logic, API calls, and data transformations.
* Use meaningful names for variables, functions, and parameters.

**Don't Do This:**

* Write long, monolithic functions without comments or clear structure.
* Use cryptic or ambiguous names.

**Why:** Consistent formatting and clear comments make the code easier to understand and maintain.

### 3.2. Standard: Transaction Management

**Do This:**

* Wrap API calls within explicit transactions when necessary to ensure data consistency. Use "BEGIN", "COMMIT", and "ROLLBACK".
* Handle potential errors during API calls gracefully and roll back the transaction if necessary.

**Don't Do This:**

* Leave transactions open for extended periods of time while waiting for API responses.
* Commit transactions before ensuring the success of all related API calls.

**Why:** Proper transaction management ensures data integrity and prevents inconsistencies. ### 3.3. Standard: Testing **Do This:** * Write unit tests for API interaction functions to verify that they handle different scenarios correctly (e.g., success, error, timeout). * Use mock APIs or stubs to isolate the database from external dependencies during testing. * Write integration tests to ensure that the database and APIs work together seamlessly. **Don't Do This:** * Skip testing API interaction code. This can lead to unexpected errors and integration issues in production. * Rely solely on manual testing. **Why:** Automated testing improves code quality, reduces the risk of regressions, and facilitates continuous integration and delivery. These API integration standards will help create reliable, secure, and maintainable PostgreSQL applications that integrate effectively with external services. Remember to stay updated with the latest PostgreSQL features and best practices as the ecosystem evolves.
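One way to reconcile the transaction-management guidance in section 3.2 with the asynchronous approach in section 1.2 is to record the intended API call in a table inside the transaction and let a separate worker perform it afterwards. The following is a minimal sketch of that idea; the table `api_outbox` and the functions `api_outbox_enqueue` and `api_outbox_claim` are hypothetical names, and marking a row as processed at claim time is a simplification (a real worker would likely record success only after the API call returns).

```sql
-- Hypothetical outbox table: one row per pending API call.
CREATE TABLE api_outbox (
    id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at TIMESTAMPTZ
);

-- Enqueue inside the caller's transaction. The row commits or rolls back
-- together with the caller's other changes, and no API call happens here.
CREATE OR REPLACE FUNCTION api_outbox_enqueue(p_payload JSONB)
RETURNS BIGINT AS $$
    INSERT INTO api_outbox (payload) VALUES (p_payload) RETURNING id;
$$ LANGUAGE sql;

-- Claim the oldest pending row for a worker. SKIP LOCKED lets several
-- workers poll concurrently without blocking one another.
CREATE OR REPLACE FUNCTION api_outbox_claim()
RETURNS SETOF api_outbox AS $$
    UPDATE api_outbox
    SET processed_at = now()
    WHERE id = (
        SELECT id
        FROM api_outbox
        WHERE processed_at IS NULL
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING *;
$$ LANGUAGE sql;

-- Usage:
-- SELECT api_outbox_enqueue('{"user_id": 1}');
-- SELECT * FROM api_outbox_claim();  -- returns no rows when nothing is pending
```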
# Core Architecture Standards for PostgreSQL

This document outlines the coding standards for the core architecture of PostgreSQL. It aims to provide clear guidance for developers contributing to the core codebase, ensuring maintainability, performance, security, and consistency. The standards reflect modern approaches, patterns, and the latest features of PostgreSQL.

## 1. Fundamental Architectural Patterns

PostgreSQL's core architecture is based on a process-based model, where each client connection is handled by a separate server process. This concurrency model relies heavily on shared memory for inter-process communication and data sharing.

**Do This:**

* Understand the process-based architecture deeply. Familiarize yourself with the following processes: "postgres" (the postmaster), "backend" (server processes), "walwriter", "autovacuum launcher", "stats collector" (PostgreSQL 14 and earlier; replaced by shared-memory statistics in PostgreSQL 15), and "bgwriter".
* Design extensions with process isolation in mind. Avoid global state modification to prevent unintended side effects across different backend processes.
* Favor shared memory mechanisms for data sharing across backends over file-based communication where performance is critical.

**Don't Do This:**

* Create singletons or static variables that hold global state without proper consideration for concurrency. This leads to unexpected behavior and difficult-to-debug race conditions.
* Introduce shared resources without adequate locking mechanisms.
* Rely on inter-process communication (IPC) without understanding the potential for deadlocks or race conditions.

**Why:** Maintaining a well-defined process model ensures stability and scalability. Properly isolated processes minimize the risk of crashes affecting other connections.

### 1.1 Process Lifecycle

Each PostgreSQL backend process follows a well-defined lifecycle:

1. **Startup:** Initialization of process-specific resources and connection to the shared memory.
2. **Authentication:** Verification of the client's identity.
3. **Query Processing:** Parsing, planning, and execution of SQL queries.
4. **Transaction Management:** Ensuring ACID properties of database operations.
5. **Shutdown:** Clean-up of resources and disconnection from shared memory.

**Do This:**

* Ensure proper resource cleanup in all stages of the lifecycle, especially during error handling.
* Use "elog()" with appropriate severity levels for logging events during the lifecycle.
* Catch and handle exceptions appropriately throughout the lifecycle.

**Don't Do This:**

* Leak resources (memory, file descriptors, etc.) during any phase of the process lifecycle.
* Ignore errors during startup or shutdown.
* Introduce long-running operations inside the authentication phase.

**Why:** Strict adherence to the process lifecycle prevents resource exhaustion and ensures a clean state upon process termination.

### 1.2 Shared Memory Management

Shared memory provides a crucial mechanism for communication and data sharing between PostgreSQL backend processes.

**Do This:**

* Use PostgreSQL's shared memory APIs (e.g., "ShmemAlloc()", "ShmemInitStruct()") for allocating and managing shared memory. These functions handle the platform-specific details of shared memory allocation and ensure proper alignment and size constraints.
* Protect access to shared memory regions using appropriate locking mechanisms (e.g., "LWLock", "SpinLock").
* Define shared memory segments in "src/backend/storage/ipc/ipci.c" or a relevant module's initialization function.

**Don't Do This:**

* Directly use system calls like "shmget()" and "shmat()" without going through PostgreSQL's shared memory APIs.
* Assume atomicity of operations on shared memory regions. Always use locking.
* Overallocate shared memory. Reserve only what is necessary.

**Why:** Proper shared memory management prevents corruption, ensures data integrity, and avoids resource conflicts between processes.
**Example:**

"""c
/* Example of allocating and using shared memory */
typedef struct {
    int counter;
    LWLock lock;
} MySharedData;

static MySharedData *mySharedData;

void initializeMySharedData(void) {
    bool found;

    mySharedData = ShmemInitStruct("MySharedData", sizeof(MySharedData), &found);
    if (!found) {
        /* Initialize shared memory on first allocation */
        int tranche_id = LWLockNewTrancheId();

        mySharedData->counter = 0;
        /* Tranche calls as in PostgreSQL 10 through 17; check lwlock.h for your version */
        LWLockRegisterTranche(tranche_id, "MySharedData");
        LWLockInitialize(&mySharedData->lock, tranche_id);
    }
}

int incrementCounter(void) {
    int result;

    LWLockAcquire(&mySharedData->lock, LW_EXCLUSIVE);
    result = ++mySharedData->counter;
    LWLockRelease(&mySharedData->lock);
    return result;
}
"""

## 2. Project Structure and Organization

PostgreSQL's source code is organized into a directory structure that reflects its functionality.

**Do This:**

* Familiarize yourself with the top-level directories: "src", "doc", and "contrib". "src" is where the core source code resides.
* Understand the purpose of subdirectories within "src", such as "backend", "include", and "port".
* Place new code in the appropriate directory based on its functionality.
* Maintain consistency in coding style and naming conventions within each directory.

**Don't Do This:**

* Randomly place files in arbitrary directories.
* Create unnecessary dependencies between modules.
* Violate the established directory structure without a clear justification.

**Why:** A well-organized project structure facilitates navigation, understanding, and maintenance of the codebase. Clear directory conventions maintain code clarity.

### 2.1 Core Directories

Key directories within the "src" directory include:

* "src/backend": Contains the core backend code, including query processing, transaction management, storage, and indexing.
* "src/include": Contains header files that define the interfaces used by the backend code.
* "src/port": Contains platform-specific code.
* "src/common": Contains code shared between the backend and frontend programs.
* "src/fe_utils": Contains utilities used by the frontend programs.

**Do This:**

* Follow the existing directory structure when adding new features or modifying existing ones.
* Create new subdirectories within existing directories if necessary to organize logically related code.
* Use header files in "src/include" to define public interfaces for modules.

**Don't Do This:**

* Include implementation details in header files.
* Create circular dependencies between directories.

**Why:** A modular directory structure ensures a logical separation of concerns and minimizes dependencies between modules, which helps reduce build times.

### 2.2 Coding Style

PostgreSQL has a well-defined coding style, described in the "PostgreSQL Coding Conventions" chapter of the documentation ("doc/src/sgml/sources.sgml").

**Do This:**

* Adhere to the coding style guidelines regarding indentation, spacing, naming conventions, and comment formatting.
* Use "pgindent" to automatically format your code.
* Write concise and informative comments.

**Don't Do This:**

* Ignore the coding style guidelines.
* Write lengthy or redundant comments.
* Use inconsistent naming conventions.

**Why:** Consistent coding style improves readability and maintainability of the code. "pgindent" ensures code conforms to the standard style automatically.

## 3. Modern Approaches and Patterns

Modern PostgreSQL development emphasizes several key approaches:

* **Extensibility:** PostgreSQL is designed to be extensible through extensions.
* **Concurrency:** Handling multiple concurrent connections efficiently is crucial.
* **Security:** Preventing vulnerabilities and ensuring data integrity are paramount.

### 3.1 Extension Development

Extensions are the primary way to add new functionality to PostgreSQL.

**Do This:**

* Use the Extension Control File (".control") to define the extension's metadata.
* Provide SQL scripts for creating and dropping database objects.
* Use hooks ("ExecutorStart_hook", "ExecutorRun_hook", etc.) to extend the core functionality.
* Follow the security guidelines for extension development.

**Don't Do This:** * Modify the core PostgreSQL code directly (unless absolutely necessary and approved by the community). * Introduce security vulnerabilities through insecure extension code. * Make assumptions about the internal implementation details of PostgreSQL that could change in future versions. **Why:** Extensions allow adding new features without modifying the core code. **Example:** """sql -- Example SQL script for creating a function in an extension CREATE FUNCTION my_extension_function(text) RETURNS text AS '$libdir/my_extension', 'my_extension_function' LANGUAGE C IMMUTABLE STRICT; """ ### 3.2 Concurrency Control PostgreSQL uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to data. **Do This:** * Understand MVCC and its implications for data consistency. * Use appropriate transaction isolation levels to prevent data anomalies. * Minimize lock contention by optimizing queries and using appropriate indexing strategies. * When working with internal data structures, be mindful of concurrent access and utilize PostgreSQL's locking primitives (LWLock, spinlocks) appropriately. **Don't Do This:** * Ignore the potential for data anomalies when using low transaction isolation levels. * Introduce unnecessary locking that could lead to deadlocks. * Perform long-running operations within a single transaction. **Why:** MVCC ensures data consistency and allows concurrent access to data. ### 3.3 Security Best Practices Security is a critical aspect of PostgreSQL development. **Do This:** * Follow secure coding practices to prevent vulnerabilities such as SQL injection and buffer overflows. * Use hardened APIs to avoid common security pitfalls. * Validate input data carefully. * Avoid hardcoding sensitive information such as passwords. * Be aware of the security implications of new features. **Don't Do This:** * Ignore security warnings. * Implement custom encryption algorithms (use PostgreSQL's built-in encryption features). 
* Grant excessive privileges to users or roles.

**Why:** Secure coding practices are essential for preventing data breaches and ensuring the integrity of the database.

### 3.4 Memory Management

Efficient memory management is key to PostgreSQL's performance and stability.

**Do This:**

* Use PostgreSQL's memory context mechanism ("MemoryContext") for allocating and freeing memory within a query lifecycle. This mechanism provides automatic memory cleanup at the end of a query, which prevents memory leaks.
* Understand the different memory contexts (e.g., "TopMemoryContext", "MessageContext", "CurTransactionContext") and use them appropriately.
* Avoid manual memory management ("malloc"/"free") unless absolutely necessary, and only with a thorough understanding of the consequences. Use PostgreSQL's "palloc"/"pfree" within a memory context.
* Profile memory usage to identify and fix memory leaks.

**Don't Do This:**

* Leak memory by failing to free allocated memory.
* Allocate large amounts of memory without considering the impact on performance.
* Use "malloc"/"free" without a deep understanding of PostgreSQL's memory management.

**Why:** Efficient memory management prevents memory leaks, reduces memory fragmentation, and improves overall performance. The memory context system automates cleanup and integrates with the query processing lifecycle.

**Example:**

"""c
/* Example using MemoryContext */
MemoryContext myContext;
char *data;

/* Create a new memory context as a child of the current one */
myContext = AllocSetContextCreate(CurrentMemoryContext,
                                  "MyContext",
                                  ALLOCSET_DEFAULT_SIZES);

/* Switch to the new memory context */
MemoryContext oldContext = MemoryContextSwitchTo(myContext);

/* Allocate memory within the new context */
data = palloc(100);

/* Switch back to the previous memory context. The 'data' still exists */
MemoryContextSwitchTo(oldContext);

/* ... use data ... */

/* Deleting 'myContext' frees all memory allocated within it, including 'data' */
MemoryContextDelete(myContext);
"""

These standards aim to provide a comprehensive guide for contributing to the core architecture of PostgreSQL by promoting best practices and ensuring code maintainability, performance, and security. By following these guidelines, developers can help ensure that PostgreSQL remains a robust, reliable, and extensible database system.