# Security Best Practices Standards for LLVM
This document outlines security best practices for LLVM development. Adhering to these guidelines will help produce more secure, robust, and maintainable code, mitigating potential vulnerabilities and fostering a secure development ecosystem. This document is meant to guide developers and AI coding assistants in producing higher quality code.
## 1. Input Validation & Sanitization
### 1.1 Importance of Input Validation
**Why:** LLVM, as a compiler infrastructure, processes arbitrary code provided as input. Unvalidated or improperly sanitized input can lead to vulnerabilities such as buffer overflows, command injection, and denial-of-service attacks.
**Do This:**
* Always validate input data, including code, flags, and configuration files.
* Use whitelisting (allow only known-good values) instead of blacklisting (block known-bad values) whenever feasible.
* Implement layered validation: validate at the parsing stage, semantic analysis, and code generation.
**Don't Do This:**
* Assume input is safe without validation.
* Rely solely on blacklisting patterns, as they can be easily bypassed.
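The allow-list advice above can be sketched in a few lines. This is a minimal illustration using only the C++17 standard library; the helper name `isAllowedOutputKind` and the accepted values are hypothetical and not part of LLVM.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical allow-list: the accepted values are illustrative only.
constexpr std::array<std::string_view, 3> kAllowedOutputKinds = {"asm", "obj", "bitcode"};

// Returns true only when `kind` exactly matches an entry in the allow-list.
// Anything not listed is rejected, including values nobody anticipated.
bool isAllowedOutputKind(std::string_view kind) {
  return std::find(kAllowedOutputKinds.begin(), kAllowedOutputKinds.end(), kind) !=
         kAllowedOutputKinds.end();
}
```

Because the check is an exact match against known-good values, unexpected casing or trailing shell metacharacters are rejected without needing a pattern for each bad case.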
### 1.2 Specific Validation Techniques
* **String Length Validation:** Ensure string lengths are within expected bounds *before* copying or processing them.
"""c++
#include "llvm/Support/raw_ostream.h"
#include <string>

void processName(const std::string &name) {
  constexpr size_t maxNameLength = 256;
  if (name.length() > maxNameLength) {
    // Handle error: name too long
    llvm::errs() << "Error: Name exceeds maximum length.\n";
    return;
  }
  // Correct: the length check above guarantees this copy stays within bounds
  std::string safeName = name;
  // ... further processing with safeName ...
}
"""
* **Number Range Validation:** Verify numerical input is within a reasonable range.
"""c++
#include "llvm/Support/raw_ostream.h"

void processSize(int size) {
  constexpr int minSize = 0;
  constexpr int maxSize = 1024;
  if (size < minSize || size > maxSize) {
    // Handle error: size out of range
    llvm::errs() << "Error: Size is out of acceptable range.\n";
    return;
  }
  // Correct: Proceed knowing size is within valid bounds
  // ... further processing with validated size ...
}
"""
* **Type Validation:** Validate that input is of the correct type before doing anything with it to prevent unexpected type coercion.
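One way to apply the type validation point to textual input is strict parsing that rejects anything other than a complete integer. The sketch below assumes C++17 and uses `std::from_chars`; the helper name `parseStrictInt` is hypothetical.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses `text` as a base-10 int.
// Returns std::nullopt for empty input, leading whitespace, trailing
// characters, or values that do not fit in an int.
std::optional<int> parseStrictInt(std::string_view text) {
  int value = 0;
  const char *first = text.data();
  const char *last = text.data() + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last) {
    return std::nullopt; // Not a number, out of range, or trailing garbage.
  }
  return value;
}
```

Unlike `atoi`, this never silently coerces input such as "42abc" to 42, so the caller can tell valid input from malformed input.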
* **File Path Validation:** When LLVM components accept file paths, validate them to prevent directory traversal attacks. Use "llvm::sys::path::remove_dots" and "llvm::sys::path::is_absolute" along with a whitelist of allowed directories.
"""c++
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/raw_ostream.h"
#include <string>

bool isValidFilePath(const std::string &filePath) {
  llvm::SmallString<128> path(filePath);
  // Lexically collapse "." and ".." components. This does not resolve symlinks.
  llvm::sys::path::remove_dots(path, /*remove_dot_dot=*/true);
  // Define your allowed directory here. For demonstration, we allow only a "temp" directory.
  const llvm::StringRef allowedRoot = "/tmp";
  // Check that the path is absolute and is the allowed root or lies beneath it.
  // Requiring a separator after the root rejects look-alikes such as "/tmpfoo".
  llvm::StringRef rest = path.str();
  if (!llvm::sys::path::is_absolute(path) || !rest.consume_front(allowedRoot) ||
      !(rest.empty() || rest.front() == '/')) {
    llvm::errs() << "Error: Invalid file path. Must be an absolute path rooted in /tmp.\n";
    return false;
  }
  // Further validation here, e.g., check file existence.
  if (!llvm::sys::fs::exists(path)) {
    llvm::errs() << "Error: File does not exist.\n";
    return false;
  }
  return true;
}
"""
* **Command Line Argument Validation:** Validate command-line arguments against expected types, ranges, and formats. Use LLVM's "cl::opt" for argument parsing: it rejects values that do not parse as the declared type. Range and format checks can then be applied after parsing, as shown here, or inside a custom "cl::parser".
"""c++
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

static cl::opt<int> OptimizationLevel(
    "O",
    cl::desc("Optimization level. Range 0-3"),
    cl::value_desc("level"),
    cl::init(2), // Default value is 2
    cl::Prefix,
    cl::Optional,
    cl::ValueRequired);

int main(int argc, const char **argv) {
  cl::ParseCommandLineOptions(argc, argv, "My Compiler\n");
  // cl::opt<int> has already rejected non-integer values; check the range here.
  if (OptimizationLevel < 0 || OptimizationLevel > 3) {
    errs() << "Error: Optimization level must be between 0 and 3.\n";
    return 1;
  }
  // Access the validated value safely.
  int optLevel = OptimizationLevel;
  // ... use optLevel ...
  return 0;
}
"""
* **Data Structure Validation:** Validate the integrity of internal data structures after modifications or when receiving data from external sources.
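As a sketch of data structure validation, consider a table of (offset, length) records that index into a buffer received from an external source. The `Section` type and the function name are hypothetical; the point is that every record is checked against the buffer size, without overflowing, before any of them is used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record describing a byte range inside a buffer.
struct Section {
  std::size_t offset;
  std::size_t length;
};

// Returns true only if every section lies entirely inside a buffer of
// `bufferSize` bytes. The subtraction form avoids computing
// offset + length, which could wrap around for hostile values.
bool sectionsAreValid(const std::vector<Section> &sections, std::size_t bufferSize) {
  for (const Section &s : sections) {
    if (s.offset > bufferSize || s.length > bufferSize - s.offset) {
      return false;
    }
  }
  return true;
}
```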
**Anti-Patterns:**
* **Insufficient Validation:** Validating only at one point in the code. Data should be validated at all boundaries.
* **Ignoring Errors:** Not properly handling validation errors. Log errors, report them to the user, and prevent further processing with invalid input.
* **Using Regular Expressions (Regex) Carelessly:** Complex or untrusted regex patterns, used without proper escaping or without a limit on input size or execution time, can become a denial-of-service vector. If using regex, prefer LLVM's "llvm::Regex" class.
* **Blindly Trusting External Data:** Never assume that data from external sources, such as files, network connections, or environment variables, is trustworthy. Always validate these inputs thoroughly.
## 2. Buffer Overflow Protection
### 2.1 Importance of Buffer Overflow Prevention
**Why:** Buffer overflows remain a common source of vulnerabilities. Writing beyond the bounds of a buffer can overwrite adjacent memory, leading to arbitrary code execution or denial of service. LLVM components, particularly those involved in parsing and code generation, must be extremely careful to avoid buffer overflows.
**Do This:**
* When using C-style arrays, always check the size before writing.
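* Prefer "std::vector", "std::string", "llvm::SmallVector", and "llvm::StringRef", which carry their own size; the owning containers also manage memory automatically. Note that "operator[]" on these types is not bounds-checked in release builds: LLVM's containers assert on out-of-range indices only when assertions are enabled, and "std::vector::at" is the checked accessor in the standard library.
* When working with raw memory buffers, use functions like "memcpy" and "strncpy" carefully: ensure the destination buffer is large enough, restrict the number of bytes copied to its size, and remember that "strncpy" does not null-terminate the destination when the source is too long.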
* Use AddressSanitizer (ASan) during development to detect memory errors.
**Don't Do This:**
* Use "strcpy" or "strcat", as they don't perform bounds checking.
* Make assumptions about the size of input data.
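The guidance on raw C strings can be illustrated with a bounded copy that either fits completely or fails cleanly. This is a minimal sketch using only the standard library; `copyCString` is a hypothetical helper, not an LLVM API.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into `dest` only if the whole string,
// including its terminating '\0', fits in `destSize` bytes.
// On failure the destination is left as an empty string when possible.
bool copyCString(char *dest, std::size_t destSize, const char *src) {
  if (dest == nullptr || src == nullptr || destSize == 0) {
    return false;
  }
  const std::size_t srcLen = std::strlen(src);
  if (srcLen >= destSize) {
    dest[0] = '\0'; // Refuse to truncate silently.
    return false;
  }
  std::memcpy(dest, src, srcLen + 1); // +1 copies the terminator.
  return true;
}
```

Failing instead of truncating is a design choice: a silently shortened path or identifier can itself become a security problem.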
### 2.2 Code Examples
* **Using "llvm::SmallVector":**
"""c++
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/raw_ostream.h"

void processData(llvm::ArrayRef<int> inputData) {
  // The inline capacity of 16 is chosen for illustration.
  llvm::SmallVector<int, 16> buffer;
  // Pre-allocate to avoid reallocations: at most one output per input element.
  buffer.reserve(inputData.size());
  for (int value : inputData) {
    if (value > 0) {
      buffer.push_back(value * 2);
    }
  }
  // Correct: Use the buffer safely
  for (int val : buffer) {
    llvm::outs() << val << "\n";
  }
}
"""
* **Using "llvm::StringRef" (for read-only access):**
"""c++
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/raw_ostream.h"

void printFirstNChars(llvm::StringRef str, size_t n) {
  if (n > str.size()) {
    n = str.size(); // Safeguard against out-of-bounds access.
  }
  // Correct: Access the string safely using StringRef's size.
  llvm::outs() << str.substr(0, n) << "\n";
}
"""
* **Safe "memcpy" usage (when raw memory manipulation is unavoidable):**
"""c++
#include "llvm/Support/raw_ostream.h"
#include <cstddef>
#include <cstring>

void copyData(const void *source, void *destination, size_t sourceSize, size_t destinationSize) {
  if (destinationSize < sourceSize) {
    // Handle error: buffer too small
    llvm::errs() << "Error: Destination buffer is too small.\n";
    return;
  }
  // Correct: Copy data safely using memcpy with size checks
  std::memcpy(destination, source, sourceSize);
}
"""
**Anti-Patterns:**
* **Unchecked Array Access:** Accessing array elements without validating the index.
* **Off-by-One Errors:** Incorrectly calculating buffer sizes, leading to overflows.
* **Incorrectly Sized Buffers:** Allocating a buffer that is too small for the data written into it.
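The first two anti-patterns above usually come down to one comparison. The sketch below, with the hypothetical helper `elementAt`, validates an index before use; note the `>=`, since using `>` instead is the classic off-by-one that admits `index == size()`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: returns the element at `index`, or std::nullopt
// when the index is out of range.
std::optional<int> elementAt(const std::vector<int> &values, std::size_t index) {
  if (index >= values.size()) { // Valid indices are 0 .. size() - 1.
    return std::nullopt;
  }
  return values[index];
}
```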
## 3. Integer Overflow Protection
### 3.1 Importance of Integer Overflow Prevention
**Why:** Integer overflows can cause unexpected behavior, including incorrect calculations, buffer overflows (when used to calculate buffer sizes), and even arbitrary code execution.
**Do This:**
* Use checked arithmetic operations where possible. Many compilers offer built-in functions or libraries for this purpose (e.g., "__builtin_add_overflow" in GCC/Clang).
* Explicitly check for potential overflows before performing arithmetic operations, especially when dealing with user-provided input or values derived from it.
**Don't Do This:**
* Assume that integer arithmetic always behaves as expected without checking for potential overflows.
* Ignore compiler warnings related to potential integer overflows.
### 3.2 Code Examples
* **Using compiler built-in overflow checking (Clang/GCC):**
"""c++
#include <iostream>
#include <limits> // for std::numeric_limits

bool safeAdd(int a, int b, int &result) {
  if (__builtin_add_overflow(a, b, &result)) {
    // Overflow occurred
    std::cerr << "Error: Integer overflow detected!\n";
    return false; // Indicate failure
  }
  return true; // Indicate success
}

int main() {
  int x = std::numeric_limits<int>::max();
  int y = 1;
  int sum;
  if (safeAdd(x, y, sum)) {
    std::cout << "Sum: " << sum << std::endl;
  } else {
    std::cout << "Addition failed due to overflow.\n";
  }
  return 0;
}
"""
* **Manual overflow checking:**
"""c++
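#include "llvm/Support/raw_ostream.h"
#include <limits>

bool safeMultiply(int a, int b, int &result) {
  constexpr int maxInt = std::numeric_limits<int>::max();
  constexpr int minInt = std::numeric_limits<int>::min();
  bool overflow = false;
  if (a != 0 && b != 0) { // A zero operand can never overflow (and avoids dividing by zero).
    if (a > 0) {
      overflow = (b > 0) ? (a > maxInt / b) : (b < minInt / a);
    } else {
      overflow = (b > 0) ? (a < minInt / b) : (b < maxInt / a);
    }
  }
  if (overflow) {
    // Multiplication would overflow
    llvm::errs() << "Error: Integer overflow detected!\n";
    return false;
  }
  result = a * b;
  return true;
}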
"""
**Anti-Patterns:**
* **Ignoring Overflow Possibilities:** Performing arithmetic operations without considering the range of possible results.
* **Using Unsigned Integers as a Fix:** While unsigned integers wrap around predictably, this can still mask errors and lead to unexpected behavior. It's better to explicitly check for overflows. Using wraparound that is not desired may also violate MISRA C/C++ standards.
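To make the unsigned wraparound point concrete, here is a small sketch of a size computation of the kind that precedes an allocation. The helper name `checkedByteCount` is hypothetical. Without the check, `count * elemSize` would wrap silently and yield a buffer far smaller than intended.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: computes count * elemSize into `bytes`.
// Returns false, leaving `bytes` untouched, if the product does not fit
// in std::size_t.
bool checkedByteCount(std::size_t count, std::size_t elemSize, std::size_t &bytes) {
  constexpr std::size_t maxSize = std::numeric_limits<std::size_t>::max();
  if (elemSize != 0 && count > maxSize / elemSize) {
    return false; // The multiplication would wrap around.
  }
  bytes = count * elemSize;
  return true;
}
```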
## 4. Format String Vulnerabilities
### 4.1 Importance of Preventing Format String Vulnerabilities
**Why:** Using user-controlled strings as format strings in functions like "printf" can lead to arbitrary code execution. This happens because format specifiers (e.g., "%s", "%x", "%n") can read from or write to arbitrary memory locations.
**Do This:**
* Always use string literals as format strings. If you need to print user-provided data, pass it as an argument to the format function, not as part of the format string.
* Use safer alternatives like LLVM’s "llvm::raw_ostream".
* If you are writing a function that takes a format string, use LLVM's formatting utilities ("llvm::format" or "llvm::formatv") for safer handling.
**Don't Do This:**
* Use user-controlled input directly as a format string.
* Disable format string protection flags in the compiler.
### 4.2 Code Examples
* **Vulnerable code:**
"""c++
#include <cstdio>
#include <string>

void printMessage(const std::string &message) {
  // Vulnerable: message is used directly as the format string.
  printf(message.c_str());
}
"""
* **Correct code using "llvm::outs()":**
"""c++
#include "llvm/Support/raw_ostream.h"
#include <string>

void printMessage(const std::string &message) {
  // Correct: Pass the message as an argument to llvm::outs().
  llvm::outs() << message << "\n";
}
"""
* **Correct use of the "format" library:**
"""c++
#include "llvm/Support/FormatVariadic.h" // declares llvm::formatv
#include "llvm/Support/raw_ostream.h"
#include <string>

void printFormattedMessage(const std::string &name, int value) {
  std::string formattedStr = llvm::formatv("Name: {0}, Value: {1:d}", name, value).str();
  llvm::outs() << formattedStr << "\n";
}
"""
**Anti-Patterns:**
* **Using User Input as Format String:** Directly passing user input or data derived from it as a format string to functions like "printf", "fprintf", or "sprintf".
* **Ignoring Compiler Warnings:** Disabling or ignoring compiler warnings related to format string vulnerabilities.
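Where a `printf`-family function must be used, the rule of keeping the format string a literal looks like the following sketch. The helper name `formatGreeting` and the 64-byte buffer are assumptions made for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: the format string is a literal, and the untrusted
// text is passed as a "%s" argument, so any specifiers it contains are
// copied verbatim instead of being interpreted.
std::string formatGreeting(const std::string &userName) {
  char buffer[64];
  int written = std::snprintf(buffer, sizeof(buffer), "Hello, %s", userName.c_str());
  if (written < 0) {
    return std::string(); // Encoding error reported by snprintf.
  }
  // snprintf always null-terminates; overly long input is truncated.
  return std::string(buffer);
}
```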
## 5. Resource Management (Memory Leaks, File Descriptors)
### 5.1 Importance of Resource Management
**Why:** Failure to properly manage resources, such as memory, file descriptors, and network connections, can lead to resource exhaustion, denial-of-service attacks, and other vulnerabilities.
**Do This:**
* Use RAII (Resource Acquisition Is Initialization) to tie the lifetime of resources to the lifetime of objects.
* Prefer smart pointers ("std::unique_ptr", "std::shared_ptr") for automatic memory management.
* Always close file descriptors and network connections when they are no longer needed.
**Don't Do This:**
* Allocate memory without a corresponding "delete".
* Leave file descriptors or network connections open indefinitely.
* Ignore errors when acquiring or releasing resources.
### 5.2 Code Examples
* **Using "std::unique_ptr":**
"""c++
#include <memory>

class MyObject {
public:
  MyObject() { /* ... */ }
  ~MyObject() { /* Cleanup code */ }
};

void process() {
  // Correct: MyObject will be automatically deleted when uniquePtr goes out of scope.
  auto uniquePtr = std::make_unique<MyObject>();
  // ... use uniquePtr ...
}
"""
* **RAII for File Handling**
"""c++
#include "llvm/Support/raw_ostream.h"
#include <fstream>
#include <stdexcept>
#include <string>

class FileGuard {
private:
  // Members are initialized in declaration order: filePath, then fileStream.
  std::string filePath;
  std::ofstream fileStream;

public:
  // Constructor: Opens the file.
  explicit FileGuard(const std::string &path) : filePath(path), fileStream(path) {
    if (!fileStream.is_open()) {
      throw std::runtime_error("Could not open file: " + path);
    }
    llvm::errs() << "Opened " << filePath << "\n";
  }

  // Destructor: Closes the file. Always executed.
  ~FileGuard() {
    if (fileStream.is_open()) {
      fileStream.close();
      llvm::errs() << "Closed " << filePath << "\n";
    }
  }

  // Provide access to the file stream
  std::ofstream &get() { return fileStream; }

  // Prevent copying to avoid double close.
  FileGuard(const FileGuard &) = delete;
  FileGuard &operator=(const FileGuard &) = delete;

  // Allow moving
  FileGuard(FileGuard &&) = default;
  FileGuard &operator=(FileGuard &&) = default;
};

int main() {
  try {
    // The file is guaranteed to be closed on scope exit, even if an exception is thrown.
    FileGuard guard("example.txt");
    guard.get() << "Hello, RAII!\n";
    llvm::errs() << "Wrote to file.\n";
    // Simulate an exception
    throw std::runtime_error("Simulated error!");
  } catch (const std::exception &e) {
    llvm::errs() << "Exception caught: " << e.what() << "\n";
  }
  return 0;
}
"""
**Anti-Patterns:**
* **Raw Pointers Without Ownership:** Using raw pointers without clear ownership semantics, leading to potential memory leaks or double frees.
* **Ignoring Exceptions in Destructors:** Throwing exceptions from destructors can lead to program termination or undefined behavior.
* **Manual Resource Management with Complex Logic:** Relying on manual resource management in functions with complex control flow, increasing the risk of errors.
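For C-style handles, the same RAII idea can be expressed without writing a full wrapper class. The following sketch pairs `std::unique_ptr` with a custom deleter for `std::FILE*`; the names `FileCloser`, `FilePtr`, and `openFile` are hypothetical.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes the stream when the owning unique_ptr is destroyed.
struct FileCloser {
  void operator()(std::FILE *file) const noexcept {
    if (file != nullptr) {
      std::fclose(file);
    }
  }
};

using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: returns an owning handle, or an empty FilePtr if
// the file could not be opened. Callers must check before use.
FilePtr openFile(const char *path, const char *mode) {
  return FilePtr(std::fopen(path, mode));
}
```

Every early return in a function holding a `FilePtr` closes the file automatically, which addresses the complex control flow concern in the last anti-pattern.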
## 6. Thread Safety and Data Races
### 6.1 Importance of Thread Safety
**Why:** LLVM is increasingly used in multithreaded environments. Data races and other concurrency issues can lead to unpredictable behavior, including memory corruption and security vulnerabilities.
**Do This:**
* Use appropriate locking mechanisms (e.g., mutexes, atomic operations) to protect shared data.
* Follow lock ordering conventions to prevent deadlocks.
* Minimize the scope of locks to reduce contention.
* Use thread-safe data structures provided by LLVM or the standard library.
**Don't Do This:**
* Access shared data without proper synchronization.
* Hold locks for extended periods, blocking other threads.
* Introduce dependencies between locks that can cause deadlocks.
### 6.2 Code Examples
* **Using "llvm::sys::Mutex":**
"""c++
#include "llvm/Support/Mutex.h"
#include "llvm/Support/raw_ostream.h"
#include <vector>

llvm::sys::Mutex myMutex;
std::vector<int> sharedData;

void threadFunc(int id) {
  // Correct: Acquire lock before accessing shared data.
  llvm::sys::ScopedLock lock(myMutex);
  sharedData.push_back(id);
  llvm::errs() << "Thread " << id << " added to sharedData\n";
}
"""
* **Using Atomic Operations:**
"""c++
#include <atomic>

std::atomic<int> counter(0);

void incrementCounter() {
  // Correct: Atomically increment the counter.
  counter++;
}
"""
**Anti-Patterns:**
* **Unprotected Shared Data:** Accessing shared variables from multiple threads without any synchronization mechanisms.
* **Large Critical Sections:** Holding locks for long periods or around complex operations, reducing concurrency and increasing contention.
* **Ignoring Memory Ordering:** Neglecting memory ordering constraints when using atomic operations, potentially leading to unexpected behavior.
* **Lock Inversion:** Acquiring locks in different orders in different threads, creating a deadlock risk.
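Lock inversion can be avoided when two locks must be held together by acquiring them in a single step. The sketch below assumes C++17 and uses `std::scoped_lock`, which applies a deadlock-avoidance algorithm when given several mutexes; the `Account` type and `transfer` function are hypothetical.

```cpp
#include <mutex>

// Hypothetical shared object protected by its own mutex.
struct Account {
  std::mutex guard;
  int balance = 0;
};

// Moves `amount` between two accounts. Both mutexes are acquired together,
// so transfer(a, b, ...) and transfer(b, a, ...) running concurrently
// cannot deadlock on lock order.
void transfer(Account &from, Account &to, int amount) {
  if (&from == &to) {
    return; // Locking the same mutex twice would be undefined behavior.
  }
  std::scoped_lock lock(from.guard, to.guard);
  from.balance -= amount;
  to.balance += amount;
}
```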
## 7. Error Handling and Exception Safety
### 7.1 Importance of Proper Error Handling
**Why:** Robust error handling is essential for preventing unexpected program termination and ensuring the integrity of data. Poorly handled errors can also introduce security vulnerabilities.
**Do This:**
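* In code built with exceptions enabled, use exceptions to signal errors that cannot be handled locally. Note that LLVM's own libraries are compiled without exceptions and report recoverable errors through "llvm::Error" and "llvm::Expected".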
* Ensure exception safety: guarantee that resources are properly released even in the presence of exceptions.
* Log errors with sufficient context to aid in debugging.
* Use "llvm::Expected" for functions that may fail but where throwing an exception is undesirable.
**Don't Do This:**
* Ignore error codes or exceptions.
* Rely on global error variables.
* Leak resources in the event of an exception.
* Throw exceptions across module boundaries; prefer error codes or "llvm::Expected" instead.
### 7.2 Code Examples
* **Using "llvm::Expected":**
"""c++
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"

llvm::Expected<int> divide(int a, int b) {
  if (b == 0) {
    return llvm::make_error<llvm::StringError>("Division by zero",
                                               llvm::inconvertibleErrorCode());
  }
  return a / b;
}

void processDivision(int x, int y) {
  llvm::Expected<int> result = divide(x, y);
  if (result) {
    llvm::outs() << "Result: " << *result << "\n";
  } else {
    // takeError() must be called on failure so the error is marked as handled.
    llvm::errs() << "Error: " << llvm::toString(result.takeError()) << "\n";
  }
}
"""
* **Exception safe class:**
"""c++
#include "llvm/Support/raw_ostream.h"
#include <cstddef>
#include <memory>
#include <stdexcept>

class ResourceHolder {
private:
  // Declared in the same order as the constructor's initializer list.
  size_t size;
  std::unique_ptr<int[]> data; // Use a smart pointer

public:
  explicit ResourceHolder(size_t size)
      : size(size), data(std::make_unique<int[]>(size)) {
    llvm::errs() << "ResourceHolder allocated memory.\n";
    // Simulate an exception during initialization. The destructor of a
    // partially constructed object does not run, but the unique_ptr member
    // is still destroyed, so the allocation is released.
    if (size > 1000) {
      throw std::runtime_error("Resource size too large!");
    }
  }

  ~ResourceHolder() { llvm::errs() << "ResourceHolder releasing memory.\n"; }

  // No copy semantics to prevent double deletion
  ResourceHolder(const ResourceHolder &) = delete;
  ResourceHolder &operator=(const ResourceHolder &) = delete;

  // Provide move semantics
  ResourceHolder(ResourceHolder &&) = default;
  ResourceHolder &operator=(ResourceHolder &&) = default;

  void writeData(size_t index, int value) {
    if (index >= size) {
      throw std::out_of_range("Index out of bounds.");
    }
    data[index] = value;
    llvm::errs() << "Wrote " << value << " to index " << index << "\n";
  }
};

int main() {
  try {
    ResourceHolder res(100);
    res.writeData(0, 42);
    res.writeData(1, 123);
    // ... Use the resource ...
    llvm::errs() << "ResourceHolder used.\n";
  } catch (const std::exception &e) {
    llvm::errs() << "Exception caught: " << e.what() << "\n";
    // Handle the exception
  }
  llvm::errs() << "Exiting main.\n";
  return 0;
}
"""
**Anti-Patterns:**
* **Ignoring Error Codes:** Disregarding return values from functions that indicate errors, leading to continued execution with potentially corrupted data.
* **Naked "new" and "delete":** Using raw "new" and "delete" without proper exception handling, risking memory leaks in case of exceptions.
* **Throwing Exceptions from Destructors:** Throwing exceptions from destructors can lead to program termination or undefined behavior, especially during stack unwinding.
* **Catching Exceptions by Value:** Catching by value copies the exception and can slice derived types. Catch by const reference ("catch (const std::exception& e)"), and throw exception objects, not pointers.
## 8. Security Auditing and Testing
### 8.1 Importance of Security Auditing and Testing
**Why:** Regular security audits and testing are essential for identifying and mitigating vulnerabilities before they can be exploited.
**Do This:**
* Conduct regular code reviews with a focus on security.
* Use static analysis tools to detect potential vulnerabilities.
* Write unit tests and integration tests that specifically target security-related aspects of the code.
* Use fuzzing to test the robustness of the code against unexpected or malicious input.
* Integrate security testing into the continuous integration (CI) process.
* Use AddressSanitizer, MemorySanitizer, and UndefinedBehaviorSanitizer to detect memory errors and undefined behavior.
**Don't Do This:**
* Assume that code is secure without proper testing.
* Rely solely on automated tools; manual code review is also essential.
* Neglect to update security tests as the code evolves.
### 8.2 Tools and Techniques
* **Static Analysis:** Use tools like clang-tidy and Coverity Scan to identify potential vulnerabilities.
"""bash
clang-tidy -checks='*' MyFile.cpp -- -I/path/to/llvm/include
"""
* **Fuzzing:** Use tools like libFuzzer integrated with LLVM to automatically generate test inputs.
* Example LibFuzzer usage for a simple function:
"""c++
// my_fuzzer.cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>

// The function to be fuzzed
bool MyFunction(const uint8_t *Data, size_t Size) {
  if (Size < 3) return false;
  if (Data[0] == 'F' && Data[1] == 'U' && Data[2] == 'Z') {
    std::cerr << "Found vulnerable input!\n";
    // Trigger some error here or call a potentially risky function. For demonstration purposes, exit:
    std::exit(1);
  }
  return false;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  MyFunction(Data, Size);
  return 0;
}
"""
* Compile with: "clang++ -fsanitize=address,fuzzer my_fuzzer.cpp -o my_fuzzer"
* Run with: "./my_fuzzer"
* **Dynamic Analysis:** Use AddressSanitizer (ASan), MemorySanitizer (MSan), and UndefinedBehaviorSanitizer (UBSan) to detect memory errors and undefined behavior at runtime. These sanitizers are integrated into Clang and can be enabled with compiler flags.
"""bash
clang++ -fsanitize=address -fsanitize=undefined MyFile.cpp -o MyExecutable
"""
**Anti-Patterns:**
* **Lack of Security Focus:** Performing code reviews without a specific focus on security vulnerabilities.
* **Ignoring Static Analysis Findings:** Ignoring or dismissing warnings from static analysis tools without proper investigation.
* **Insufficient Testing:** Relying solely on basic unit tests without targeted security testing or fuzzing.
* **Not Integrating Security into CI/CD:** Failing to incorporate security testing into the continuous integration and continuous deployment pipelines.
* **No Threat Model:** Not having a clear understanding of the potential threats to the system and how they might be exploited.
## 9. Dependencies and Third-Party Libraries
### 9.1 Importance of Secure Dependency Management
**Why:** Third-party libraries can introduce vulnerabilities. It's crucial to manage dependencies carefully to minimize the risk of using vulnerable code.
**Do This:**
* Keep dependencies up-to-date with the latest security patches.
* Use dependency management tools to track and manage dependencies.
* Regularly scan dependencies for known vulnerabilities using tools like Dependabot or Snyk.
* Vet external dependencies prior to inclusion for suspicious or unmaintained code. Consider their security policies.
* Prefer dependencies with active security maintenance.
* Use static analysis tools to check for vulnerabilities in third-party code.
**Don't Do This:**
* Use outdated or unmaintained dependencies.
* Ignore security warnings from dependency scanning tools.
* Bundle vulnerable code into the build.
### 9.2 Examples and Best Practices
* **Updating Dependencies:** Regularly update third-party libraries to incorporate the latest security fixes.
"""bash
# Example using a hypothetical package manager
update-dependencies --security-only
"""
* **Dependency Scanning**
Integrate a dependency scanning tool into your CI/CD pipeline to automatically check for vulnerabilities whenever dependencies are updated.
"""yaml
# Example CI configuration using Snyk (GitHub Actions syntax)
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run Snyk to check for vulnerabilities
        # Replace with the Snyk action that matches your project type.
        uses: snyk/actions/snyk-scan@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          command: monitor
"""
**Anti-Patterns:**
* **Blindly Trusting Dependencies:** Assuming that third-party libraries are always secure without verification.
* **Using Outdated Versions:** Failing to update dependencies regularly to apply security patches.
* **Ignoring Security Alerts:** Disregarding security warnings from dependency scanning tools.
* **Embedding Vulnerable Code:** Bundling vulnerable third-party code directly into the LLVM build.
* **Adding Unnecessary Dependencies:** Including any third-party dependency that is not essential.
## 10. Privilege Separation and Least Privilege
### 10.1 Importance of Privilege Separation and Least Privilege
**Why:** Limiting the privileges of LLVM components and adhering to the principle of least privilege reduces the potential impact of a security breach. If a component is compromised, the attacker's access to sensitive data and system resources will be limited.
**Do This:**
* Run LLVM components with the minimum necessary privileges.
* Use separate processes or containers to isolate components from each other.
* Avoid running components as root or with administrator privileges.
* Implement access control mechanisms to restrict access to sensitive data and functions.
**Don't Do This:**
* Run all components with the same high level of privileges.
* Grant unnecessary permissions to users or processes.
* Store sensitive data in easily accessible locations.
### 10.2 Code Examples & Usage
* **Limiting Privileges in a Build System:**
When executing build scripts, use a dedicated user account with limited privileges.
"""bash
# Example using Docker to build in a container with limited privileges
docker run --user builduser -v $(pwd):/app -w /app mybuildimage ./build.sh
"""
**Anti-Patterns:**
* **Running Components as Root:** Running LLVM components as the root user without a valid reason.
* **Granting Excessive Permissions:** Providing users or processes with more privileges than they require.
* **Weak Access Control:** Implementing inadequate access control mechanisms that allow unauthorized access to sensitive resources.
* **Ignoring Principle of Least Privilege:** Violating the principle of least privilege by granting broad access to data or resources.
By adhering to these security best practices, LLVM developers can create more resilient and secure software, reducing the risk of vulnerabilities and protecting against potential attacks.
# API Integration Standards for LLVM
This document outlines the coding standards for integrating external APIs and backend services within the LLVM project. It focuses on patterns and practices that ensure maintainability, performance, and security, while adhering to the existing LLVM coding conventions. These guidelines aim to provide a consistent approach to API integration across the LLVM ecosystem.
## 1. Introduction
Integrating LLVM components with external APIs and backend services requires careful consideration to maintain the project's stability, performance, and security. This document provides guidelines for creating robust, understandable, and maintainable integrations. It covers best practices for error handling, data serialization, asynchronous operations, authentication, and more. While LLVM traditionally avoids extensive external dependencies, certain tools and analyses may benefit significantly from external integration. This document addresses these scenarios, keeping the core LLVM principles in mind.
## 2. General Principles
* **Minimize Dependencies:** Strive to minimize external dependencies. Evaluate the cost of introducing a new dependency against the benefits it provides. Consider if the functionality can be implemented within LLVM components.
    * **Do This:** Carefully evaluate whether an external dependency is truly necessary.
    * **Don't Do This:** Introduce dependencies without a thorough assessment of their impact on the project.
* **Maintainability & Readability:** Code should be self-documenting, easy to understand, and follow LLVM's overall coding style.
    * **Do This:** Use meaningful variable and function names. Add comments explaining complex logic or integration points. Follow the LLVM coding style consistently.
    * **Don't Do This:** Write overly complex or cryptic code. Skimp on comments, especially around API calls.
* **Error Handling:** Implement robust error handling to gracefully handle API failures. Log errors appropriately.
    * **Do This:** Prefer checked error returns ("llvm::Error", "llvm::Expected<T>"). LLVM is built with exceptions disabled by default, so reserve exception handling for the boundary with a third-party library that throws. Provide informative error messages.
    * **Don't Do This:** Ignore error codes or exceptions. Allow exceptions to propagate uncaught.
* **Security:** Protect against common security vulnerabilities (e.g., injection attacks, data breaches) when interacting with external APIs.
    * **Do This:** Validate all inputs from external APIs. Use secure communication protocols (HTTPS). Follow security best practices for the external platform.
    * **Don't Do This:** Trust data received from external APIs without validation. Store sensitive information unencrypted.

## 3. Connecting with Backend Services

### 3.1 Architectural Considerations

* **Abstraction:** Introduce an abstraction layer to isolate the LLVM components from the specifics of the external API. This makes it easier to change the integration implementation or switch to a different API in the future.
    * **Why:** This reduces coupling and promotes a clear separation of concerns.
* **Design Patterns:** Employ proven design patterns like Facade, Adapter, or Repository to structure the integration code.
    * **Why:** These patterns improve code organization and testability and reduce code duplication.

"""c++
// Example: Facade pattern for API integration
#include "llvm/Support/raw_ostream.h"
#include <string>

class ExternalAPI {
public:
  virtual std::string fetchData(const std::string &Query) = 0;
  virtual ~ExternalAPI() = default;
};

class ConcreteExternalAPI : public ExternalAPI {
public:
  std::string fetchData(const std::string &Query) override {
    // Implementation to call the actual external API (e.g., using libcurl).
    // Placeholder result:
    return "Data from external API for query: " + Query;
  }
};

class LLVMDataService {
public:
  explicit LLVMDataService(ExternalAPI *API) : API(API) {}

  std::string retrieveData(const std::string &Query) {
    // Perform LLVM-specific logic before calling the API
    llvm::outs() << "Preparing to fetch data for: " << Query << "\n";
    std::string Data = API->fetchData(Query);
    // Perform LLVM-specific logic after calling the API
    llvm::outs() << "Data retrieved successfully.\n";
    return Data;
  }

private:
  ExternalAPI *API; // Not owned; must outlive this service.
};

// Usage:
// ConcreteExternalAPI APIImpl;
// LLVMDataService Service(&APIImpl);
// std::string Data = Service.retrieveData("some_query");
"""

### 3.2 Implementation Details

* **HTTP Clients:** Use a well-established HTTP client library (e.g., "libcurl"). Ensure it is properly configured to handle TLS/SSL and other security concerns. Consider using a high-level library wrapper for easier use, but ensure it doesn't add significant overhead.
    * **Do This:** Choose a robust, widely used library with good support for security features.
    * **Don't Do This:** Roll your own HTTP client or use a deprecated library.
* **Data Serialization:** Use a standardized data serialization format (e.g., JSON, Protocol Buffers). Use a library specifically designed for the format.
    * **Do This:** Choose a format appropriate for your data and performance requirements. Use "rapidjson" or "llvm::json" for small, lightweight payloads where speed is critical. Use Protocol Buffers where schemas are well defined and versioning is a concern.
    * **Don't Do This:** Use custom or ad-hoc serialization formats. Serialize sensitive data without encryption.

"""c++
// Example: Using llvm::json for serializing and deserializing data
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/JSON.h"
#include "llvm/Support/raw_ostream.h"
#include <string>
#include <utility>

namespace llvm {

void serializeData() {
  json::Object Obj;
  Obj["name"] = "Example Data";
  Obj["value"] = 42;
  Obj["items"] = json::Array{1, 2, 3, 4, 5};

  // Streaming a json::Value into a raw_ostream serializes it.
  std::string JSONString;
  raw_string_ostream OS(JSONString);
  OS << json::Value(std::move(Obj));
  outs() << "Serialized JSON: " << OS.str() << "\n";
}

void deserializeData(StringRef JSONString) {
  Expected<json::Value> Parsed = json::parse(JSONString);
  if (!Parsed) {
    errs() << "Failed to parse JSON: " << toString(Parsed.takeError()) << "\n";
    return;
  }

  const json::Object *Obj = Parsed->getAsObject();
  if (!Obj) {
    errs() << "Expected JSON object.\n";
    return;
  }

  // The typed getters return an empty optional (or a null pointer) when the
  // key is missing or holds a different type, so check before dereferencing.
  auto Name = Obj->getString("name");
  auto Value = Obj->getInteger("value");
  const json::Array *Items = Obj->getArray("items");
  if (!Name || !Value || !Items) {
    errs() << "Missing or mistyped field.\n";
    return;
  }

  outs() << "Name: " << *Name << "\n";
  outs() << "Value: " << *Value << "\n";
  outs() << "Items: ";
  for (const json::Value &Item : *Items)
    if (auto N = Item.getAsInteger())
      outs() << *N << " ";
  outs() << "\n";
}

} // namespace llvm

// Usage:
// llvm::serializeData();
// llvm::deserializeData(R"({"name":"Example Data","value":42,"items":[1,2,3,4,5]})");
"""

* **Asynchronous Operations:** Use asynchronous operations to avoid blocking the main thread, especially for long-running API calls. "std::future" and "std::async" are one option.
    * **Do This:** Launch API calls in separate threads or use a non-blocking I/O model.
    * **Don't Do This:** Perform synchronous API calls on the main thread.
* **Authentication:** Implement a secure authentication mechanism (e.g., OAuth 2.0, API keys). Store credentials securely (e.g., using a secrets manager).
    * **Do This:** Use established authentication protocols. Regularly rotate API keys.
    * **Don't Do This:** Hardcode credentials in the source code. Store credentials in plain text. Store secrets directly in git.

## 4. LLVM-Specific Considerations

* **Integration Points:** Identify appropriate extension points within LLVM for integrating with external APIs. Common areas include:
    * **Passes:** Create a new pass to interact with the API.
    * **Analysis Utilities:** Extend analysis utilities to fetch data from external sources.
    * **Target-Specific CodeGen:** Modify target-specific code generation to leverage external services.
* **LLVM Context:** Ensure the API calls do not interfere with the LLVM context or the overall compilation process.
    * **Do This:** Create a separate LLVM context for API-related operations (if necessary). Carefully synchronize access to shared resources.
    * **Don't Do This:** Directly modify the LLVM context from API callbacks without proper synchronization.
* **Error Reporting:** Use LLVM's error reporting mechanisms to provide informative error messages related to API failures.
    * **Do This:** Use "llvm::errs()" and other LLVM error reporting tools to propagate errors to the user.
    * **Don't Do This:** Use "std::cerr" or other generic error streams.

"""c++
// Example: Reporting errors using llvm::errs()
#include "llvm/Support/raw_ostream.h"
#include <string>

void handleAPIError(const std::string &ErrorMessage) {
  llvm::errs() << "Error during API call: " << ErrorMessage << "\n";
}

// Usage:
// if (APICallFailed) {
//   handleAPIError("Failed to retrieve data from the external service.");
// }
"""

## 5. Modern Approaches and Patterns

* **gRPC:** Consider using gRPC for communication with backend services. gRPC is a high-performance, open-source universal RPC framework. Its advantages include:
    * **Protocol Buffers:** Uses Protocol Buffers for efficient serialization.
    * **Code Generation:** Automatically generates client and server code from protocol definitions.
    * **Multiple Languages:** Supports multiple programming languages.
* **Microservices:** Design integrations following a microservices architecture. Decompose the API integration into smaller, independent services.
    * **Benefits:** Improved scalability, maintainability, and fault isolation.
    * **Considerations:** Increased complexity and the need to manage inter-service communication.
* **Serverless Functions:** Use serverless functions (e.g., AWS Lambda, Azure Functions) to implement API integrations.
    * **Benefits:** Scalability, cost-effectiveness, and reduced operational overhead.

## 6. Common Anti-Patterns and Mistakes

* **Tight Coupling:** Tightly coupling LLVM components with external APIs makes the code fragile and difficult to test and maintain.
* **Ignoring Rate Limits:** Exceeding API rate limits can lead to service disruptions or being blocked.
* **Lack of Monitoring:** Failing to monitor the health and performance of API integrations results in delayed problem detection and resolution.

## 7. Performance Optimization

* **Caching:** Implement caching mechanisms to reduce the number of API calls, especially for frequently requested data. Use "llvm::StringMap" or other LLVM data structures for efficient storage.
    * **Do This:** Implement a cache with a reasonable expiration policy (TTL). Use a cache key that accurately reflects the data being cached.
    * **Don't Do This:** Cache sensitive data without proper encryption. Store an unbounded cache that grows indefinitely and consumes resources.
* **Batching:** Batch multiple API requests into a single call to reduce network overhead.
    * **Why:** Reduces the overhead of multiple requests.
* **Compression:** Enable data compression to reduce the size of data transmitted over the network.
    * **Why:** Reduces bandwidth usage and improves transfer speed.

## 8. Security Best Practices

* **Input Validation:** Validate all inputs from external APIs to prevent injection attacks and other vulnerabilities.
    * **Do This:** Use whitelisting to allow only valid characters, formats, and lengths.
      Understand the specific validation requirements of the LLVM code consuming the external data.
    * **Don't Do This:** Trust data received from external APIs without validation.
* **Data Encryption:** Encrypt sensitive data both in transit and at rest. Use TLS/SSL for communication over the network.
* **Access Control:** Implement proper access control to restrict access to sensitive data and API endpoints.
* **Regular Security Audits:** Conduct regular security audits to identify and address potential vulnerabilities.

## 9. Example: Integrating with a Simple REST API

This example demonstrates a simplified integration with a REST API using "libcurl". It uses the Facade pattern to abstract the API interaction and reports failures through "llvm::Expected" rather than exceptions.

"""c++
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"
#include <curl/curl.h>
#include <string>

namespace llvm {

class RESTAPI {
public:
  virtual Expected<std::string> fetchData(const std::string &URL) = 0;
  virtual ~RESTAPI() = default;
};

class CurlRESTAPI : public RESTAPI {
public:
  Expected<std::string> fetchData(const std::string &URL) override {
    CURL *Curl = curl_easy_init();
    if (!Curl)
      return createStringError(inconvertibleErrorCode(),
                               "Failed to initialize libcurl");

    std::string Response;
    curl_easy_setopt(Curl, CURLOPT_URL, URL.c_str());
    curl_easy_setopt(Curl, CURLOPT_WRITEFUNCTION, writeCallback);
    curl_easy_setopt(Curl, CURLOPT_WRITEDATA, &Response);
    curl_easy_setopt(Curl, CURLOPT_USERAGENT, "LLVM REST Client"); // Set a user agent

    CURLcode Res = curl_easy_perform(Curl);
    curl_easy_cleanup(Curl);
    if (Res != CURLE_OK)
      return createStringError(inconvertibleErrorCode(),
                               "curl_easy_perform() failed: %s",
                               curl_easy_strerror(Res));
    return Response;
  }

private:
  // libcurl write callback: appends each received chunk to the std::string
  // registered through CURLOPT_WRITEDATA.
  static size_t writeCallback(char *Contents, size_t Size, size_t NMemb,
                              void *UserData) {
    size_t TotalSize = Size * NMemb;
    static_cast<std::string *>(UserData)->append(Contents, TotalSize);
    return TotalSize;
  }
};

class MyLLVMTool {
public:
  explicit MyLLVMTool(RESTAPI *API) : API(API) {}

  std::string fetchDataFromAPI(const std::string &Query) {
    // Query is assumed to be URL-encoded already (e.g., via curl_easy_escape).
    std::string APIURL = "https://api.example.com/data?q=" + Query;
    Expected<std::string> Data = API->fetchData(APIURL);
    if (!Data) {
      errs() << "Error fetching data: " << toString(Data.takeError()) << "\n";
      return ""; // Or handle the error more gracefully as needed
    }
    return *Data;
  }

private:
  RESTAPI *API; // Not owned.
};

} // namespace llvm

// Usage:
// llvm::CurlRESTAPI APIImpl;
// llvm::MyLLVMTool Tool{&APIImpl};
// std::string Data = Tool.fetchDataFromAPI("some_query");
"""

**Important Considerations:**

* This example is simplified. Adapt it to your specific API requirements.
* Replace "https://api.example.com/data" with the actual URL of the REST API.
* URL-encode the query before appending it to the URL.
* Implement proper error handling (e.g., checking HTTP status codes).
* Use a JSON parsing library (e.g., "rapidjson", "llvm::json") to handle the API response.
* Use dependency injection (as shown) to allow for easier testing with mock APIs.

This document provides a comprehensive set of guidelines for integrating external APIs and backend services within LLVM. Adhering to these standards will help create robust, maintainable, and secure integrations that enhance the capabilities of the LLVM project.
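The caching advice in section 7 (an expiration policy plus a bounded size) can be sketched with the standard library alone. The block below is a minimal illustration, not an LLVM API: "ResponseCache", its method names, and the drop-the-oldest eviction policy are assumptions made for this example, and "std::unordered_map" stands in for "llvm::StringMap" so the snippet compiles without LLVM headers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper (not part of LLVM): a response cache with a
// time-to-live (TTL) and a hard cap on the number of entries.
class ResponseCache {
public:
  using Clock = std::chrono::steady_clock;

  ResponseCache(std::size_t MaxEntries, Clock::duration TTL)
      : MaxEntries(MaxEntries), TTL(TTL) {
    assert(MaxEntries > 0 && "cache must hold at least one entry");
  }

  // Returns the cached response for Key, or std::nullopt when the entry is
  // missing or has reached its TTL. Expired entries are erased on lookup.
  std::optional<std::string> lookup(const std::string &Key,
                                    Clock::time_point Now = Clock::now()) {
    auto It = Entries.find(Key);
    if (It == Entries.end())
      return std::nullopt;
    if (Now - It->second.StoredAt >= TTL) {
      Entries.erase(It);
      return std::nullopt;
    }
    return It->second.Value;
  }

  // Stores Value under Key. If the cache is full and Key is new, the entry
  // with the oldest timestamp is dropped first, so the size stays bounded.
  void store(const std::string &Key, std::string Value,
             Clock::time_point Now = Clock::now()) {
    if (Entries.find(Key) == Entries.end() && Entries.size() >= MaxEntries)
      evictOldest();
    Entries[Key] = Entry{std::move(Value), Now};
  }

  std::size_t size() const { return Entries.size(); }

private:
  struct Entry {
    std::string Value;
    Clock::time_point StoredAt;
  };

  // Linear scan for the oldest entry; acceptable for a small cache.
  void evictOldest() {
    auto Oldest = Entries.begin();
    for (auto It = Entries.begin(); It != Entries.end(); ++It)
      if (It->second.StoredAt < Oldest->second.StoredAt)
        Oldest = It;
    if (Oldest != Entries.end())
      Entries.erase(Oldest);
  }

  std::size_t MaxEntries;
  Clock::duration TTL;
  std::unordered_map<std::string, Entry> Entries;
};
```

Passing the clock reading as a parameter keeps the expiry logic testable without sleeping. A caller would typically try "lookup" with the request URL as the key and fall back to the real API call, followed by "store", on a miss.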
# Component Design Standards for LLVM

This document outlines the component design standards for LLVM, focusing on creating reusable, maintainable, and performant components within the LLVM ecosystem. It provides guidelines applicable to all LLVM subprojects, including the core compiler infrastructure, Clang, LLD, and related tools. These standards are designed to promote consistency, readability, and long-term maintainability of the LLVM codebase.

## 1. Principles of Component Design in LLVM

### 1.1 Abstraction and Encapsulation

* **Do This:** Design components with clear abstractions that hide implementation details.
* **Don't Do This:** Expose internal data structures or implementation details directly.

**Why:** Abstraction simplifies the interface and reduces interdependence between components, making it easier to modify one component without affecting others. Encapsulation protects the internal state of a component from unintended external modification.

**Code Example (Good):**

"""c++
// Good: Hiding implementation details behind an abstract interface.
class TargetLoweringInfo {
public:
  virtual ~TargetLoweringInfo() = default;
  virtual unsigned getRegForInlineAsm(const TargetRegisterClass *RC,
                                      MVT VT) const = 0;
  virtual CallingConv::ID getIRCallConv() const = 0;
  // ... other abstract methods
};

// Concrete implementation, not exposed directly
class AArch64TargetLowering : public TargetLoweringInfo {
public:
  unsigned getRegForInlineAsm(const TargetRegisterClass *RC,
                              MVT VT) const override {
    // ... implementation details for AArch64
    return 0; // Dummy return
  }
  CallingConv::ID getIRCallConv() const override {
    // ...
    return CallingConv::C; // Dummy default
  }
};
"""

**Code Example (Bad):**

"""c++
// Bad: Exposing internal data structures.
struct MyComponent {
  std::vector<int> InternalData; // Directly accessible and modifiable.
  void processData();            // Public method.
};
"""

This example violates encapsulation by allowing direct access to "InternalData", which can lead to unintended side effects.

### 1.2 Single Responsibility Principle (SRP)

* **Do This:** Give each component only one reason to change.
* **Don't Do This:** Build overly complex components that perform unrelated tasks. Such components are harder to understand, test, and maintain.

**Why:** Following SRP makes components more focused, which improves their readability, testability, and reusability.

**Code Example (Good):**

"""c++
// Good: Separate class for each responsibility.
class InstructionSelector {
public:
  MachineInstr *select(const BasicBlock &BB) {
    // ... instruction selection logic
    return nullptr; // Dummy return
  }
};

class InstructionScheduler {
public:
  void schedule(MachineInstr *MI) {
    // ... instruction scheduling logic
  }
};
"""

These classes are focused on their respective tasks: instruction selection and instruction scheduling.

**Code Example (Bad):**

"""c++
// Bad: Combined responsibilities.
class CompilerPass {
public:
  void run(Module &M) {
    // ... instruction selection
    // ... instruction scheduling
    // ... register allocation
  }
};
"""

The "CompilerPass" class mixes instruction selection, scheduling, and register allocation, violating SRP.

### 1.3 Interface Segregation Principle (ISP)

* **Do This:** Design interfaces that are specific to the clients that use them.
* **Don't Do This:** Force clients to depend on interfaces they don't use.

**Why:** ISP prevents unnecessary dependencies and reduces the impact of changes to interfaces.

**Code Example (Good):**

"""c++
// Good: Segregated interfaces.
class Printable {
public:
  virtual ~Printable() = default;
  virtual void print() = 0;
};

class Serializable {
public:
  virtual ~Serializable() = default;
  virtual void serialize() = 0;
};

class MyClass : public Printable, public Serializable {
public:
  void print() override { /*...*/ }
  void serialize() override { /*...*/ }
};
"""

Each interface focuses on a specific functionality.
Components only implement the interfaces they need.

**Code Example (Bad):**

"""c++
// Bad: Monolithic interface.
class SomeInterface {
public:
  virtual ~SomeInterface() = default;
  virtual void methodA() = 0;
  virtual void methodB() = 0;
  virtual void methodC() = 0;
};

class ClientA : public SomeInterface {
public:
  void methodA() override { /*...*/ }
  void methodB() override { /*...*/ }
  // ClientA doesn't need this method, but is forced to implement it.
  void methodC() override { /* not implemented */ }
};
"""

This example forces "ClientA" to implement "methodC" even if it doesn't need it.

### 1.4 Dependency Inversion Principle (DIP)

* **Do This:** Depend on abstractions, not concretions.
* **Don't Do This:** Hardcode dependencies on concrete classes.

**Why:** DIP promotes loose coupling and makes it easier to substitute implementations.

**Code Example (Good):**

"""c++
// Good: Using dependency injection.
#include "llvm/Support/raw_ostream.h"
#include <string>

class Logger {
public:
  virtual ~Logger() = default;
  virtual void log(const std::string &Message) = 0;
};

class ConsoleLogger : public Logger {
public:
  void log(const std::string &Message) override {
    llvm::outs() << Message << "\n";
  }
};

class MyComponent {
private:
  Logger *Log; // Not owned.

public:
  explicit MyComponent(Logger *Log) : Log(Log) {}
  void doSomething() { Log->log("Doing something..."); }
};
"""

"MyComponent" depends on the "Logger" abstraction, not on the concrete "ConsoleLogger". This allows substituting different loggers easily.

**Code Example (Bad):**

"""c++
// Bad: Hardcoded dependency.
class MyBadComponent {
private:
  ConsoleLogger Log; // Hardcoded dependency on ConsoleLogger

public:
  void doSomething() { Log.log("Doing something..."); }
};
"""

This example is tightly coupled to "ConsoleLogger", making it harder to test and reuse.

## 2. Component Structure and Organization

### 2.1 Directory Structure

* **Do This:** Organize components into logical directories.
* **Don't Do This:** Put all files in a single directory or create deeply nested directory structures that are hard to navigate.
**Why:** A clear directory structure improves code discoverability and maintainability. LLVM generally follows a module-based structure.

**Example (simplified):**

"""
llvm/
  lib/
    IR/             # LLVM Intermediate Representation
    AsmParser/      # Parser for textual IR
    Analysis/       # Static analyses
    Transforms/     # Transformations
      Scalar/
      Vectorize/
  include/
    llvm/
      IR/           # Public IR headers, including instruction definitions (*.def)
      Analysis/
      Transforms/
  tools/
    llvm-as/        # LLVM assembler
    llvm-dis/       # LLVM disassembler
"""

### 2.2 Naming Conventions

* **Do This:** Use consistent naming conventions for files, classes, functions, and variables.
* **Don't Do This:** Use cryptic or inconsistent names that are hard to understand.

**Why:** Naming conventions improve code readability and reduce ambiguity. Names should be descriptive and follow LLVM's established conventions.

**Examples:**

* **Classes:** "ClassName" (CamelCase, starting with a capital letter)
* **Functions:** "functionName" (camelCase, starting with a lowercase letter)
* **Variables:** "VariableName" (CamelCase, starting with a capital letter)
* **Files:** "ComponentName.cpp", "ComponentName.h"

### 2.3 Header Files

* **Do This:** Use include guards to prevent multiple inclusions. Organize header files to minimize dependencies. Forward declare classes whenever possible.
* **Don't Do This:** Include unnecessary header files.

**Why:** Include guards prevent compilation errors. Minimizing includes reduces build times and dependencies.

**Code Example:**

"""c++
// MyComponent.h
#ifndef LLVM_MYCOMPONENT_H
#define LLVM_MYCOMPONENT_H

#include "llvm/Support/raw_ostream.h" // Only required headers

namespace llvm {

class MyClass; // Forward declaration

class MyComponent {
public:
  void doSomething(MyClass &Obj);
};

} // namespace llvm

#endif // LLVM_MYCOMPONENT_H
"""

## 3. Code Style and Formatting

### 3.1 LLVM Style

* **Do This:** Follow the LLVM coding standards (see [LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html)). Use "clang-format" to automatically format your code.
* **Don't Do This:** Deviate from the LLVM coding standards.
* **Why:** Consistency improves readability and reduces cognitive load.
* Use 2-space indentation.
* Keep lines within 80 columns.
* Use descriptive comments.

### 3.2 Using "clang-format"

* **Do This:** Configure "clang-format" with the LLVM style. Run it before committing code.
* **Don't Do This:** Rely on manual formatting or ignore "clang-format" warnings.

**Why:** "clang-format" automates code formatting and ensures compliance with LLVM's style guidelines.

**Configuration:** LLVM uses a ".clang-format" file in the root of the project to define the style.

**Usage:**

"""bash
clang-format -i MyComponent.cpp # Format the file in-place
"""

### 3.3 Comments and Documentation

* **Do This:** Write clear and concise comments that explain the purpose and behavior of your code. Use Doxygen comments ("///") to generate API documentation. Provide usage examples.
* **Don't Do This:** Write redundant comments that simply repeat what the code does.

**Why:** Good comments improve code understanding and maintainability. Doxygen documentation makes it easy to generate API references automatically.

**Code Example:**

"""c++
#include <vector>

/// Calculates the average value of a vector of integers.
///
/// \param Values The vector of integers.
/// \returns The average value, or 0.0 if the vector is empty.
double calculateAverage(const std::vector<int> &Values) {
  if (Values.empty()) {
    return 0.0;
  }
  double Sum = 0.0;
  for (int Value : Values) {
    Sum += Value;
  }
  return Sum / Values.size();
}
"""

## 4. Error Handling and Assertions

### 4.1 LLVM Error Handling

* **Do This:** Use "llvm::Error" to represent and propagate errors. Employ "Expected<T>" to represent a value that might be an error.
* **Don't Do This:** Use exceptions for error handling; LLVM is built with exceptions disabled by default. Leave an "llvm::Error" unchecked.
* **Why:** "llvm::Error" provides a consistent and efficient way to handle errors within LLVM.

**Code Example:**

"""c++
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"
#include <utility>

llvm::Error doSomething(int Value) {
  if (Value < 0) {
    return llvm::make_error<llvm::StringError>("Value must be non-negative",
                                               llvm::inconvertibleErrorCode());
  }
  // ... do something
  return llvm::Error::success();
}

llvm::Expected<int> computeValue(int Input) {
  if (Input == 0) {
    return llvm::make_error<llvm::StringError>("Input cannot be zero",
                                               llvm::inconvertibleErrorCode());
  }
  return Input * 2;
}

int main() {
  if (llvm::Error Err = doSomething(-1)) {
    // toString() consumes the error. An llvm::Error that is destroyed
    // without being handled aborts in assertion-enabled builds.
    llvm::errs() << "Error: " << llvm::toString(std::move(Err)) << "\n";
    return 1;
  }

  llvm::Expected<int> Result = computeValue(10);
  if (!Result) {
    llvm::errs() << "Error: " << llvm::toString(Result.takeError()) << "\n";
    return 1;
  }

  llvm::outs() << "Value: " << *Result << "\n";
  return 0;
}
"""

### 4.2 Assertions

* **Do This:** Use "llvm_unreachable" for cases that should never occur. Use "assert" liberally to check preconditions and invariants during development.
* **Don't Do This:** Use assertions for handling expected errors or user input validation.

**Why:** Assertions help catch bugs early in development. "llvm_unreachable" marks a code path that must never execute.

**Code Example:**

"""c++
#include "llvm/Support/ErrorHandling.h"
#include <cassert>

int getValue(int Index) {
  assert(Index >= 0 && Index < 10 && "Index out of bounds"); // Precondition.
  if (Index == 5) {
    // llvm_unreachable is a macro, so it takes no namespace qualifier.
    llvm_unreachable("This should never happen!");
  }
  return Index * 2;
}
"""

## 5. Performance Considerations

### 5.1 Data Structures Selection

* **Do This:** Choose appropriate data structures based on performance requirements. Consider using "llvm::SmallVector", "llvm::DenseMap", "llvm::SetVector", and other LLVM-specific data structures.
* **Don't Do This:** Use standard library containers without considering performance implications.

**Why:** LLVM-specific data structures are often optimized for common LLVM use cases. "SmallVector" avoids dynamic allocation for small numbers of elements. "DenseMap" is optimized for integer and pointer keys.

**Code Example:**

"""c++
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"

// Taking SmallVectorImpl<int> & accepts a SmallVector of any inline capacity.
void processValues(llvm::SmallVectorImpl<int> &Values) {
  // ... process values efficiently
}

void processMap(llvm::DenseMap<int, int> &Map) {
  // ... process map efficiently
}
"""

### 5.2 Memory Management

* **Do This:** Use RAII (Resource Acquisition Is Initialization) to manage resources. Avoid manual memory management. When dynamic allocation is unavoidable, use smart pointers ("std::unique_ptr", "std::shared_ptr").
* **Don't Do This:** Use "new" and "delete" directly. Leak memory.

**Why:** RAII and smart pointers automate resource management and prevent memory leaks.

**Code Example:**

"""c++
#include <memory>

class MyResource {
public:
  MyResource() { /* Acquire resource */ }
  ~MyResource() { /* Release resource */ }
};

void doSomething() {
  auto Resource = std::make_unique<MyResource>();
  // The resource is released automatically when Resource goes out of scope.
}
"""

### 5.3 Code Optimization

* **Do This:** Profile your code. Use optimization flags ("-O2", "-O3"). Consider using profile-guided optimization (PGO). Minimize unnecessary computations.
* **Don't Do This:** Optimize prematurely without profiling. Ignore performance bottlenecks.

**Why:** Profiling identifies performance bottlenecks. Optimization flags and PGO improve code performance.

## 6. Testing

### 6.1 Unit Tests

* **Do This:** Write unit tests for each component. Use LLVM's testing infrastructure: GoogleTest-based unit tests and lit-driven regression tests.
* **Don't Do This:** Neglect testing. Commit code without tests.

**Why:** Unit tests verify the correctness of components and prevent regressions.
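As a rough illustration of what a unit test for a small component covers, the sketch below exercises the "calculateAverage" function from section 3.3, including its empty-input edge case. It deliberately uses plain "assert" so that it compiles without any test framework; the helper names "isNear" and "testCalculateAverage" are made up for this example, and a real LLVM unit test would express the same checks with GoogleTest macros.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Function under test, mirroring the documentation example in section 3.3.
double calculateAverage(const std::vector<int> &Values) {
  if (Values.empty())
    return 0.0;
  double Sum = 0.0;
  for (int Value : Values)
    Sum += Value;
  return Sum / Values.size();
}

// Hypothetical helper: floating-point comparison with a small tolerance.
bool isNear(double A, double B) { return std::fabs(A - B) < 1e-9; }

// Hypothetical test function: one assertion per behavior worth pinning down.
void testCalculateAverage() {
  assert(isNear(calculateAverage({}), 0.0));         // Empty input.
  assert(isNear(calculateAverage({7}), 7.0));        // Single element.
  assert(isNear(calculateAverage({-2, 2, 3}), 1.0)); // Mixed signs.
  assert(isNear(calculateAverage({1, 2}), 1.5));     // Non-integer result.
}
```

Each assertion targets one behavior (edge case, typical case, rounding), which keeps a failure easy to diagnose.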
### 6.2 Integration Tests

* **Do This:** Write integration tests to ensure that components work together correctly.
* **Don't Do This:** Assume that components will work together without testing.

**Why:** Integration tests verify the interactions between components and catch integration issues.

### 6.3 Regression Tests

* **Do This:** Add regression tests for each bug fix to prevent regressions.

**Why:** Regression tests ensure that bug fixes are not inadvertently undone by future changes.

## 7. Concurrency and Thread Safety

### 7.1 Thread Safety

* **Do This:** Design components to be thread-safe if they will be used in a multi-threaded environment. Use appropriate locking mechanisms (e.g., "llvm::sys::Mutex" with "llvm::sys::ScopedLock", or "std::mutex" with "std::lock_guard").
* **Don't Do This:** Share mutable state between threads without proper synchronization. Introduce race conditions.

**Why:** Thread safety prevents data corruption and undefined behavior in concurrent environments.

**Code Example:**

"""c++
#include "llvm/Support/Mutex.h"

class ThreadSafeComponent {
private:
  llvm::sys::Mutex Lock;
  int State = 0;

public:
  int getState() {
    llvm::sys::ScopedLock Guard(Lock);
    return State;
  }

  void setState(int NewState) {
    llvm::sys::ScopedLock Guard(Lock);
    State = NewState;
  }
};
"""

Following these component design standards will result in a more robust, maintainable, and performant LLVM codebase. Adherence to these guidelines is crucial for fostering a healthy and collaborative development environment.
# Code Style and Conventions Standards for LLVM

This document outlines the code style and conventions standards for the LLVM project. Adhering to these guidelines ensures code consistency, readability, and maintainability, which are crucial for a large and complex project like LLVM. These guidelines are intended to be used by both human developers and AI coding assistants to improve the quality and consistency of LLVM code.

## 1. General Formatting

### 1.1. Indentation and Whitespace

* **Do This:** Use 2 spaces for indentation. Tabs should *never* be used.
* **Don't Do This:** Use tabs or more than 2 spaces for indentation.
* **Why:** Consistency in indentation is vital for readability. Two spaces provide a good balance between code nesting and horizontal space consumption.
* **Example:**

"""c++
if (condition) {
  for (int I = 0; I < 10; ++I) {
    // Code within the loop
    doSomething(I);
  }
} else {
  // Alternative code
}
"""

### 1.2. Line Length

* **Do This:** Keep lines under 80 characters where practical. Aim for readability, and don't obsess over fitting everything into 80 characters if it harms clarity.
* **Don't Do This:** Allow lines to routinely exceed 120 characters, making them hard to read on smaller screens or in diffs.
* **Why:** Shorter lines improve readability and facilitate code review by allowing side-by-side comparisons in diff tools.
* **Example:**

"""c++
// Good: Line split for readability
Value *Result =
    Builder->CreateAdd(Operand1, Operand2, "sum");

// Bad: Long line, harder to read
Value *Result = Builder->CreateAdd(Operand1, Operand2, "very_long_variable_name_that_makes_the_line_exceed_80_characters");
"""

### 1.3. Whitespace Usage

* **Do This:**
    * Use a single space after keywords like "if", "for", "while", and "switch".
    * Use a single space around operators like "=", "+", "-", "*", "/", "==", "!=", "<", ">", "<=", ">=", "&&", "||".
    * Do not use spaces inside parentheses, brackets, or braces except where needed for clarity.
* **Don't Do This:** * Omit spaces after keywords or around operators. * Add excessive spaces inside parentheses, brackets, or braces. * **Why:** Consistent whitespace improves readability and makes the code visually less cluttered. * **Example:** """c++ // Good if (x == 5) { y = z + 1; } // Bad if(x==5){ y=z+1; } """ ### 1.4. Vertical Whitespace * **Do This:** Use blank lines to separate logical blocks of code, such as function definitions, major sections within a function, and between different data structures. * **Don't Do This:** Overuse or underuse blank lines, resulting in either scattered or crammed code. * **Why:** Judicious use of vertical whitespace enhances the visual structure of the code, making it easier to understand. * **Example:** """c++ // Good void processData() { // Initialize variables int count = 0; std::vector<int> data; // Load data from file loadDataFromFile("data.txt", data); // Process data for (int value : data) { count += value; } // Print result std::cout << "Total: " << count << std::endl; } // Bad (crammed) void processData(){int count=0;std::vector<int> data;loadDataFromFile("data.txt",data);for(int value:data){count+=value;}std::cout<<"Total: "<<count<<std::endl;} """ ## 2. Naming Conventions ### 2.1. General Naming * **Do This:** * Use descriptive and meaningful names. * Prefer clear and explicit names over short and cryptic ones. * Be consistent in applying the same naming scheme across the project. * **Don't Do This:** * Use single-character variable names (except in very short loops). * Use abbreviations that are not widely understood in the LLVM community. * **Why:** Good naming significantly enhances code readability and reduces cognitive load. * **Example:**: "for (int i = 0; i < N; ++i)" is often fine, but "for (int elementIndex = 0; elementIndex < numberOfElements; ++elementIndex)" is easier to follow if the loop is more complex. ### 2.2. Variable Naming * **Do This:** Use "camelCase" for local variable names. 
* **Don't Do This:** Use "snake_case" or "PascalCase" for local variables. * **Why:** "camelCase" is a common convention in LLVM for local variables. * **Example:** """c++ int numberOfItems = 10; std::string itemName = "Example"; """ ### 2.3. Function Naming * **Do This:** Use "camelCase" for function names. Function names should generally be verbs or verb phrases indicating the action they perform. * **Don't Do This:** Use "snake_case" or "PascalCase" for function names. * **Why:** Consistency in function naming is important. Verb-based names accurately describe what functions do. * **Example:** """c++ int calculateSum(int a, int b) { return a + b; } void processData() { // ... } """ ### 2.4. Class and Struct Naming * **Do This:** Use "PascalCase" for class and struct names. * **Don't Do This:** Use "camelCase" or "snake_case" for class and struct names. * **Why:** "PascalCase" is the standard convention for class and struct names in LLVM. * **Example:** """c++ class MyClass { public: // ... }; struct DataStructure { int value; }; """ ### 2.5. Constant Naming * **Do This:** Use "PascalCase" for named constants (i.e. those defined with "static const"). Use all-uppercase "SCREAMING_SNAKE_CASE" for "#define" constants. Prefer "static const" over "#define" whenever possible. * **Don't Do This:** Use "camelCase" or "snake_case" for constants. * **Why:** Differentiating constants from variables helps in understanding the code. * **Example:** """c++ static const int MaxValue = 100; #define ARRAY_SIZE 256 """ ### 2.6. Enum Naming * **Do This:** Use "PascalCase" for enum names and "PascalCase" for enum values. * **Don't Do This:** Use "camelCase" or "snake_case" for enum names and values. * **Why:** Consistent enum naming enhances code clarity. * **Example:** """c++ enum class Color { Red, Green, Blue }; """ ### 2.7. Template Parameter Naming * **Do This:** Use a single uppercase letter, or a descriptive name starting with an uppercase letter. 
When a descriptive name is used, it should match the concept that the template parameter represents.
* **Don't Do This:** Use unclear abbreviations.
* **Why:** Template parameters should be easily identifiable within the template.
* **Example:**

"""c++
template <typename T>
T add(T a, T b) {
  return a + b;
}

template <typename ElementType>
class MyVector {
  // ...
};
"""

## 3. Comments

### 3.1. General Commenting

* **Do This:**
  * Write clear and concise comments to explain complex logic and design decisions.
  * Keep comments up-to-date with code changes.
  * Use proper grammar and spelling in comments.
* **Don't Do This:**
  * Write obvious comments that simply restate the code.
  * Leave outdated or incorrect comments.
  * Use excessive jargon or abbreviations without explanation.
* **Why:** Comments are essential for understanding the code's purpose and functionality. Clear and accurate comments reduce maintenance effort and improve collaboration.

### 3.2. Doxygen-Style Comments

* **Do This:** Use Doxygen-style comments for documenting functions, classes, and files.
* **Don't Do This:** Neglect to document the purpose, parameters, and return values of functions and classes.
* **Why:** Doxygen-style comments allow automatic documentation generation, which is crucial for large projects like LLVM.
* **Example:**

"""c++
/**
 * @brief Calculates the sum of two integers.
 *
 * This function adds two integers and returns the result.
 *
 * @param a The first integer.
 * @param b The second integer.
 * @return The sum of a and b.
 */
int calculateSum(int a, int b) {
  return a + b;
}
"""

### 3.3. Inline Comments

* **Do This:** Use inline comments to explain specific lines or blocks of code that are not immediately obvious.
* **Don't Do This:** Overuse inline comments for trivial code.
* **Why:** Inline comments clarify complex logic or non-obvious operations.
* **Example:**

"""c++
for (int i = 0; i < 10; ++i) {
  // Multiply i by 2 to get the even number
  int evenNumber = i * 2;
  // ...
}
"""

## 4. Code Structure and Design

### 4.1. Function Length

* **Do This:** Keep functions reasonably short (typically under 50 lines). If a function becomes too long, refactor it into smaller, more manageable functions.
* **Don't Do This:** Write very long functions that perform multiple unrelated tasks.
* **Why:** Shorter functions are easier to understand, test, and maintain. They also promote code reuse.

### 4.2. Class Design

* **Do This:**
  * Adhere to the Single Responsibility Principle (SRP): each class should have one specific responsibility.
  * Use proper encapsulation: keep internal state private and provide access through public methods.
  * Use inheritance and polymorphism appropriately to model relationships between classes.
* **Don't Do This:**
  * Create "god classes" that do everything.
  * Expose internal state directly.
  * Overuse inheritance, leading to complex and fragile class hierarchies.
* **Why:** Good class design promotes modularity, reusability, and maintainability.

### 4.3. Error Handling

* **Do This:**
  * Handle errors gracefully and provide informative error messages.
  * Use "llvm::Error" (or "llvm::Expected<T>" when a value is returned) to report recoverable errors such as invalid input.
* **Don't Do This:**
  * Ignore errors or handle them silently.
  * Use C++ exceptions for error handling or control flow. The LLVM Coding Standards disallow exceptions, and LLVM is normally built with them disabled.
* **Why:** Robust error handling is crucial for preventing crashes and providing a good user experience.
* **Example:**

"""c++
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"
#include <string>
#include <utility>

llvm::Error processData(int value) {
  if (value < 0) {
    return llvm::make_error<llvm::StringError>(
        "Invalid value: " + std::to_string(value),
        llvm::inconvertibleErrorCode());
  }

  // Process data
  llvm::outs() << "Processing value: " << value << "\n";
  return llvm::Error::success();
}

int main() {
  llvm::Error err1 = processData(10);
  if (err1) {
    // toString() consumes the error, which marks it as handled
    llvm::errs() << "Error: " << llvm::toString(std::move(err1)) << "\n";
  }

  llvm::Error err2 = processData(-5);
  if (err2) {
    llvm::errs() << "Error: " << llvm::toString(std::move(err2)) << "\n";
  }

  return 0;
}
"""

### 4.4. Resource Management

* **Do This:**
  * Use RAII (Resource Acquisition Is Initialization) to manage resources like memory, file handles, and locks.
  * Use smart pointers ("std::unique_ptr", "std::shared_ptr") to automatically manage dynamically allocated memory.
* **Don't Do This:**
  * Manually allocate and deallocate memory using "new" and "delete" without proper RAII.
  * Leak resources.
* **Why:** RAII ensures that resources are properly released on every exit path, including early returns, preventing resource leaks and improving code reliability.

### 4.5. Modern C++ Features

* **Do This:**
  * Use modern C++ features like lambda expressions, range-based for loops, and auto type deduction.
  * Prefer "constexpr" for compile-time constants.
  * Use move semantics to avoid unnecessary copying of objects.
* **Don't Do This:**
  * Rely on deprecated C++ features.
  * Write verbose code that can be simplified with modern features.
* **Why:** Modern C++ features can significantly improve code readability, efficiency, and safety.
* **Example:**

"""c++
#include <iostream>
#include <vector>

int main() {
  std::vector<int> data = {1, 2, 3, 4, 5};

  // Range-based for loop
  for (int value : data) {
    std::cout << value << " ";
  }
  std::cout << std::endl;

  // Lambda expression
  auto multiplyByTwo = [](int x) { return x * 2; };
  std::vector<int> multipliedData;
  for (int value : data) {
    multipliedData.push_back(multiplyByTwo(value));
  }
  for (int value : multipliedData) {
    std::cout << value << " ";
  }
  std::cout << std::endl;

  // Auto type deduction
  auto sum = 0;
  for (auto value : data) {
    sum += value;
  }
  std::cout << "Sum: " << sum << std::endl;

  return 0;
}
"""

## 5. LLVM-Specific Guidelines

### 5.1. LLVM Coding Conventions

* **Do This:**
  * Follow the existing coding style and conventions of the specific LLVM component you are working on.
  * Look at existing files or recent commits within the specific directory to determine local conventions.
* **Don't Do This:**
  * Introduce new coding styles that are inconsistent with the rest of the component.
* **Why:** Consistency within a component is crucial for maintainability and readability.

### 5.2. LLVM Data Structures

* **Do This:**
  * Use LLVM-specific data structures like "SmallVector", "StringRef", and "ArrayRef" where appropriate. These are optimized for common LLVM use cases.
* **Don't Do This:**
  * Use standard library containers ("std::vector", "std::string") without carefully considering performance implications.
* **Why:** LLVM data structures are designed to be efficient and memory-friendly for specific LLVM tasks.

### 5.3. LLVM Diagnostic Infrastructure

* **Do This:**
  * Use the LLVM diagnostic infrastructure ("llvm::DiagnosticInfo", "llvm::SourceMgr", "llvm::LLVMContext") for reporting errors, warnings, and remarks.
* **Don't Do This:**
  * Print diagnostic messages directly to "std::cerr" or "std::cout".
* **Why:** The LLVM diagnostic infrastructure provides a consistent and extensible way to report diagnostic information, allowing tools to handle diagnostics in a uniform manner.

### 5.4. LLVM Pass Infrastructure

* **Do This:**
  * Use the LLVM pass infrastructure for implementing compiler passes.
  * Follow the standard pass structure: a "run" method that takes the IR unit and its analysis manager under the new pass manager, or the "runOnFunction" / "runOnModule" methods in code that still targets the legacy pass manager.
* **Don't Do This:**
  * Implement custom pass management mechanisms.
* **Why:** The LLVM pass infrastructure provides a uniform and efficient way to implement and manage compiler passes.

### 5.5. Including LLVM Headers

* **Do This:** Include system and standard library headers using angle brackets "<>". Use quotes for LLVM headers and for headers that are local to your project or component, as in the example below. Order includes alphabetically within each category.
* **Don't Do This:** Mix include styles or use relative paths for LLVM distribution headers.
* **Why:** This helps distinguish standard library headers from project-specific headers and keeps include paths clean and maintainable.

**Example:**

"""c++
#include <algorithm>
#include <iostream>
#include <vector>

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

#include "MyComponent/MyHeader.h"
"""

## 6. Security Best Practices

### 6.1. Input Validation

* **Do This:** Validate all external inputs to prevent vulnerabilities such as buffer overflows, format string bugs, and code injection.
* **Don't Do This:** Assume that external inputs are safe or well-formed.
* **Why:** Input validation is essential for preventing security vulnerabilities.

### 6.2. Memory Safety

* **Do This:**
  * Use memory-safe programming techniques (e.g., bounds checking, smart pointers).
  * Be careful when using raw pointers and manual memory management.
* **Don't Do This:**
  * Write code that is prone to buffer overflows, use-after-free errors, or other memory-related vulnerabilities.
* **Why:** Memory safety is crucial for preventing security exploits.

### 6.3. Integer Overflows

* **Do This:** Check for integer overflows when performing arithmetic operations, especially when dealing with sizes and indices.
* **Don't Do This:** Assume that integer arithmetic is always safe.
* **Why:** Integer overflows can lead to unexpected behavior and security vulnerabilities.

### 6.4. Safe String Handling

* **Do This:** Use length-aware string types and their member functions (e.g., "llvm::StringRef::startswith" and "llvm::StringRef::endswith", spelled "starts_with" and "ends_with" in recent LLVM releases) instead of raw C string manipulation, to avoid buffer overflows and format string bugs.
* **Don't Do This:** Use unsafe string functions like "strcpy" or "sprintf".
* **Why:** Safe string handling prevents common security vulnerabilities related to string manipulation.

## 7. Performance Optimization

### 7.1. Data Locality

* **Do This:** Design data structures and algorithms to maximize data locality, which can improve cache utilization and reduce memory access latency.
* **Don't Do This:** Access memory in a random or scattered manner.
* **Why:** Data locality can significantly improve performance, especially for memory-bound applications.

### 7.2. Avoiding Unnecessary Copies

* **Do This:** Use move semantics to avoid unnecessary copying of objects.
* **Don't Do This:** Pass large objects by value when they can be passed by reference or moved.
* **Why:** Copying large objects can be expensive, especially when they are not modified.

### 7.3. Efficient Algorithms

* **Do This:** Choose efficient algorithms (e.g., sorting, searching) that are appropriate for the specific task.
* **Don't Do This:** Use naive or inefficient algorithms that can lead to poor performance.
* **Why:** Algorithm choice has a significant impact on performance.

### 7.4. Profiling

* **Do This:** Use profiling tools to identify performance bottlenecks and optimize the most critical sections of code. Consider using LLVM's built-in profiling capabilities.
* **Don't Do This:** Guess at performance issues without empirical evidence.
* **Why:** Profiling provides valuable insights into performance bottlenecks and helps focus optimization efforts on the most critical areas.

## 8. Tooling and Automation

### 8.1. Clang-Format

* **Do This:** Use "clang-format" to automatically format code according to the LLVM coding style. Configure your IDE or editor to automatically run "clang-format" on save.
* **Don't Do This:** Manually format code or ignore "clang-format" warnings.
* **Why:** "clang-format" ensures consistent code formatting and reduces the burden of manual formatting.

### 8.2. Clang-Tidy

* **Do This:** Use "clang-tidy" to automatically check code for style violations, potential bugs, and security vulnerabilities. Configure your build system to run "clang-tidy" as part of the build process.
* **Don't Do This:** Ignore "clang-tidy" warnings or disable checks without a valid reason.
* **Why:** "clang-tidy" helps identify and fix issues early in the development cycle.

### 8.3. Continuous Integration

* **Do This:** Use a continuous integration (CI) system to automatically build, test, and analyze code changes before they are merged into the main branch.
* **Don't Do This:** Merge code changes without proper CI testing.
* **Why:** CI ensures that code changes do not break the build, introduce new bugs, or violate coding standards.

By adhering to these code style and conventions standards, LLVM developers can contribute to a more consistent, readable, and maintainable codebase, ultimately leading to a better and more secure compiler infrastructure. These standards are intended to guide both human developers and AI coding assistants in producing high-quality LLVM code.
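As a concrete illustration of the integer-overflow guidance in section 6.3, the following is a minimal sketch in standard C++17 of checking size arithmetic before it is used for an allocation or an index. The helper names "checkedMul" and "checkedAdd" are hypothetical and are not LLVM APIs; LLVM's "llvm/Support/MathExtras.h" offers related overflow helpers, whose exact names and signatures should be checked against the LLVM version in use.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical helper (not an LLVM API): returns count * elemSize, or
// std::nullopt if the product would not fit in std::size_t.
std::optional<std::size_t> checkedMul(std::size_t count, std::size_t elemSize) {
  if (elemSize != 0 &&
      count > std::numeric_limits<std::size_t>::max() / elemSize)
    return std::nullopt; // the multiplication would wrap around
  return count * elemSize;
}

// Hypothetical helper (not an LLVM API): returns a + b, or std::nullopt if the
// sum would wrap around.
std::optional<std::size_t> checkedAdd(std::size_t a, std::size_t b) {
  if (a > std::numeric_limits<std::size_t>::max() - b)
    return std::nullopt;
  return a + b;
}
```

A caller would compute a buffer size with "checkedMul(count, sizeof(T))" and reject the input when the result is empty, rather than allocating from a silently wrapped value.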
# Deployment and DevOps Standards for LLVM

This document outlines deployment and DevOps standards for LLVM projects. It focuses on build processes, continuous integration/continuous deployment (CI/CD), and production considerations. The goal is to ensure LLVM projects are built, tested, and deployed efficiently, reliably, and securely.

## 1. Build Processes

A well-defined build process is crucial for producing consistent and reproducible artifacts. LLVM uses CMake as its primary build system.

### 1.1. CMake Standards

CMake is fundamental to building LLVM. Proper usage ensures portability and maintainability.

**Do This:**

* Use CMake targets extensively. This makes dependencies explicit and simplifies the build graph.
* Employ generator expressions ("$<...>") for conditional compilation based on build configurations (Debug, Release, etc.).
* Use CMake modules to encapsulate common build logic.
* Leverage features provided by the "LLVM-Config.cmake" module (installed with LLVM) within projects using LLVM.
* Prefer "target_link_libraries", "target_include_directories", "target_compile_definitions", and "target_compile_features" over global settings.

**Don't Do This:**

* Manipulate compiler flags directly (e.g., setting "CXX_FLAGS" by hand). Use CMake's built-in mechanisms instead.
* Overuse "execute_process" for tasks that can be handled by CMake commands.
* Hardcode paths. Rely on CMake variables (e.g., "CMAKE_SOURCE_DIR", "CMAKE_BINARY_DIR") instead.

**Why:** CMake provides a platform-independent and well-structured way to manage the build process. Adhering to these standards ensures portability and simplifies maintenance.
**Example:**

"""cmake
# CMakeLists.txt
cmake_minimum_required(VERSION 3.13) # Ensure a modern CMake version
project(MyProject)

# Find LLVM
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")

add_executable(MyTool MyTool.cpp)

# Scope the LLVM include directories to the target instead of setting them globally
target_include_directories(MyTool PRIVATE ${LLVM_INCLUDE_DIRS})

# Link against LLVM libraries
target_link_libraries(MyTool PRIVATE LLVMSupport LLVMCore)

# Add compile definitions based on build type:
target_compile_definitions(MyTool PRIVATE $
"""
# Core Architecture Standards for LLVM

This document outlines the core architecture standards for LLVM development, providing guidelines to ensure consistency, maintainability, performance, and security within the LLVM project. It focuses on the fundamental architectural patterns, project structure, and organization principles that govern the LLVM codebase.

## 1. Fundamental Architectural Patterns

### 1.1. The Three-Phase Design (Frontend, Optimizer, Backend)

**Description:** LLVM employs a three-phase design: a frontend that parses source code into an intermediate representation (IR), an optimizer that performs transformations on the IR, and a backend that translates the IR into machine code. This separation of concerns is a cornerstone of LLVM's flexibility and retargetability.

**Do This:**

* Design components within the frontend, optimizer, or backend that adhere to their respective responsibilities. Avoid mixing concerns across phases.
* Ensure each phase communicates through the well-defined LLVM IR.
* Frontend components should aim to generate semantically equivalent IR regardless of the source language's specific syntax.
* Backends should focus on target-machine-specific optimizations and code generation without modifying program semantics beyond what the target architecture requires.

**Don't Do This:**

* Implement source language-specific optimizations in the backend. These transformations belong in the optimizer (or potentially the frontend during initial lowering).
* Bypass the IR for direct communication between frontends and backends. This breaks the modularity and retargetability.

**Why This Matters:**

* **Maintainability:** Clear separation makes it easier to understand, modify, and extend individual phases without affecting others.
* **Retargetability:** The IR acts as a stable interface, allowing new frontends and backends to be added without requiring changes to the core optimizer.
* **Optimization:** Centralized optimization within the optimizer phase allows for target-independent improvements that benefit all languages and architectures.

**Code Example (Illustrative):**

"""c++
// Headers omitted for brevity; this example is illustrative only.

// Frontend (e.g., Clang) - generates LLVM IR from C++ source.
// (Simplified example; the caller owns the LLVMContext, which must outlive the module.)
std::unique_ptr<llvm::Module> generateIR(llvm::LLVMContext &context,
                                         const char *src) {
  // Parse C++ code and build AST
  // ...

  // Transform AST into LLVM IR instructions
  auto module = std::make_unique<llvm::Module>("my_module", context);
  llvm::FunctionType *funcType =
      llvm::FunctionType::get(llvm::Type::getInt32Ty(context), false);
  llvm::Function *mainFunc = llvm::Function::Create(
      funcType, llvm::Function::ExternalLinkage, "main", module.get());

  // Create a basic block and insert instructions (e.g., return 0)
  llvm::BasicBlock *entry =
      llvm::BasicBlock::Create(context, "entrypoint", mainFunc);
  llvm::IRBuilder<> builder(entry);
  llvm::Value *retVal =
      llvm::ConstantInt::get(llvm::Type::getInt32Ty(context), 0);
  builder.CreateRet(retVal);

  return module;
}

// Optimizer (LLVM core) - optimizes the LLVM IR using the new pass manager
void optimizeIR(llvm::Module &module) {
  // Analysis managers must be registered and cross-linked before passes run
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  llvm::PassBuilder pb;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);

  // Add optimization passes (instruction combining, reassociation, GVN, CFG simplification)
  llvm::FunctionPassManager fpm;
  fpm.addPass(llvm::InstCombinePass());
  fpm.addPass(llvm::ReassociatePass());
  fpm.addPass(llvm::GVNPass());
  fpm.addPass(llvm::SimplifyCFGPass());

  llvm::ModulePassManager mpm;
  mpm.addPass(llvm::createModuleToFunctionPassAdaptor(std::move(fpm)));
  mpm.run(module, mam);
}

// Backend (e.g., x86) - generates machine code from optimized IR
void generateMachineCode(llvm::Module &module) {
  // Target-specific code generation logic
  // ...
  // Use TargetMachine to emit native code.
}
"""

### 1.2. The Module, Function, BasicBlock, and Instruction Hierarchy

**Description:** LLVM IR is structured hierarchically: a "Module" contains "Function"s, which contain "BasicBlock"s, which contain "Instruction"s. This structure models the program's organization and control flow.

**Do This:**

* Understand and leverage this hierarchy when manipulating IR. Functions represent procedures, basic blocks represent straight-line code segments, and instructions are the fundamental operations.
* Use LLVM's APIs to traverse and modify the IR structure.
* Follow conventions for naming functions, basic blocks, and instructions to improve readability. Consider using debug metadata to preserve source-level names.

**Don't Do This:**

* Treat the IR as a flat list of instructions. The hierarchical structure enables powerful analyses and transformations.
* Rewire IR objects by hand through raw pointers. Use the provided LLVM APIs (e.g., iterators, "replaceAllUsesWith") for safe and correct manipulation.

**Why This Matters:**

* **Organization:** The hierarchy reflects the program's structure, enabling modular analysis and transformation.
* **Data Flow:** The basic block structure naturally aligns with data flow analysis.
* **Analysis and Optimization:** Many optimization passes rely on the hierarchical structure, such as function-level inlining, basic block reordering, and loop unrolling.
**Code Example:**

"""c++
// Creating a function and a basic block, adding instructions
llvm::Function *createFunction(llvm::Module *module, const std::string &name) {
  llvm::LLVMContext &context = module->getContext();
  llvm::FunctionType *funcType =
      llvm::FunctionType::get(llvm::Type::getInt32Ty(context), false);
  llvm::Function *func = llvm::Function::Create(
      funcType, llvm::Function::ExternalLinkage, name, module);

  llvm::BasicBlock *entryBB = llvm::BasicBlock::Create(context, "entry", func);
  llvm::IRBuilder<> builder(entryBB);

  // Create an integer constant
  llvm::Value *constant =
      llvm::ConstantInt::get(llvm::Type::getInt32Ty(context), 42);

  // Create a return instruction
  builder.CreateRet(constant);

  return func;
}

// Iterating through instructions in a basic block
void printInstructions(llvm::Function *func) {
  for (llvm::BasicBlock &bb : *func) {
    llvm::outs() << "Basic Block: " << bb.getName() << "\n";
    for (llvm::Instruction &inst : bb) {
      llvm::outs() << " " << inst << "\n";
    }
  }
}
"""

## 2. Project Structure and Organization

### 2.1. Directory Structure Conventions

**Description:** LLVM follows a strict directory structure to organize source code. This structure promotes discoverability and reduces namespace collisions.

**Do This:**

* Place source files related to a specific component (e.g., a backend, an optimization pass) within a dedicated directory under the appropriate top-level directory (e.g., "lib/Target/", "lib/Transforms/").
* Use meaningful directory and file names that clearly indicate the component's purpose.
* Maintain consistent naming conventions (e.g., "PascalCase" for class names, "camelCase" for function names).

**Don't Do This:**

* Place unrelated source files in the same directory.
* Create deeply nested directory structures that make it difficult to navigate the codebase.
* Ignore the existing directory structure and create new directories without justification.
**Why This Matters:**

* **Discoverability:** A well-defined directory structure makes it easier to find and understand the code related to a specific feature.
* **Namespace Management:** Separating components into different directories reduces the risk of naming conflicts.
* **Build System Integration:** The directory structure is closely tied to the build system, ensuring that source files are compiled and linked correctly.

**Example:** For instance, the X86 backend resides in "lib/Target/X86/". Within that directory, you will find subdirectories like "AsmParser", "Disassembler", "MCTargetDesc", etc., each dedicated to a distinct aspect of the backend.

### 2.2. Component-Based Design

**Description:** LLVM's architecture promotes modularity through the use of components. Each component encapsulates a specific functionality and exposes a well-defined interface.

**Do This:**

* Design new features as independent components with clear interfaces.
* Minimize dependencies between components to improve maintainability and testability.
* Use the pimpl idiom (pointer to implementation) to hide implementation details and help preserve binary compatibility.
* Consider a class hierarchy for extensibility if multiple derived classes provide differing implementations of an interface, and consider the visitor pattern for operations that manipulate those classes.

**Don't Do This:**

* Create monolithic components that perform multiple unrelated tasks.
* Introduce tight coupling between components, making it difficult to modify or replace them independently.

**Why This Matters:**

* **Maintainability:** Components can be developed, tested, and deployed independently.
* **Reusability:** Components can be reused in different parts of the system.
* **Testability:** Components can be tested in isolation with minimal dependencies.
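The class-hierarchy-plus-visitor suggestion in the **Do This** list above can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not LLVM code: "Expr", "Constant", "Add", "ExprVisitor", "Evaluator", and "evaluate" are all hypothetical names invented for this example. LLVM has its own visitor utilities (for instance "llvm::InstVisitor"), which use a different, template-based design.

```cpp
#include <memory>
#include <utility>

struct Constant;
struct Add;

// Interface that every operation over the hierarchy implements.
struct ExprVisitor {
  virtual ~ExprVisitor() = default;
  virtual void visit(const Constant &node) = 0;
  virtual void visit(const Add &node) = 0;
};

// Component interface: derived classes only dispatch to the visitor.
struct Expr {
  virtual ~Expr() = default;
  virtual void accept(ExprVisitor &visitor) const = 0;
};

struct Constant : Expr {
  int value;
  explicit Constant(int v) : value(v) {}
  void accept(ExprVisitor &visitor) const override { visitor.visit(*this); }
};

struct Add : Expr {
  std::unique_ptr<Expr> lhs, rhs;
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  void accept(ExprVisitor &visitor) const override { visitor.visit(*this); }
};

// One concrete operation, kept outside the node classes.
struct Evaluator : ExprVisitor {
  int result = 0;
  void visit(const Constant &node) override { result = node.value; }
  void visit(const Add &node) override {
    node.lhs->accept(*this);
    int left = result; // save the left operand before visiting the right
    node.rhs->accept(*this);
    result = left + result;
  }
};

// Convenience wrapper: evaluates an expression tree with the visitor above.
int evaluate(const Expr &expr) {
  Evaluator evaluator;
  expr.accept(evaluator);
  return evaluator.result;
}
```

The design choice here is that adding a new operation (say, a pretty-printer) means writing one new visitor class, without touching the node classes. Adding a new node type, by contrast, requires extending every visitor.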
**Code Example (Pimpl Idiom):**

"""c++
// Header file (MyComponent.h)
#include <memory>

class MyComponent {
public:
  MyComponent();
  ~MyComponent(); // Defined in the source file, where Impl is a complete type
  void doSomething();

private:
  class Impl;
  std::unique_ptr<Impl> impl; // Owns the implementation (RAII)
};

// Source file (MyComponent.cpp)
#include "MyComponent.h"
#include "llvm/Support/raw_ostream.h"

class MyComponent::Impl {
public:
  void doSomethingImpl() {
    // Implementation details
    llvm::outs() << "MyComponent is doing something!\n";
  }
};

MyComponent::MyComponent() : impl(std::make_unique<Impl>()) {}
MyComponent::~MyComponent() = default;

void MyComponent::doSomething() {
  impl->doSomethingImpl();
}
"""

## 3. Memory Management

### 3.1. RAII (Resource Acquisition Is Initialization)

**Description:** LLVM heavily relies on RAII to manage resources, particularly memory. This approach ensures that resources are automatically released when an object goes out of scope.

**Do This:**

* Use smart pointers (e.g., "std::unique_ptr", "std::shared_ptr") or custom RAII classes to manage dynamically allocated memory.
* Prefer "std::unique_ptr" for exclusive ownership and "std::shared_ptr" for shared ownership.
* Avoid raw "new" and "delete" whenever possible.

**Don't Do This:**

* Manually allocate and deallocate memory without using RAII.
* Forget to release allocated memory, leading to memory leaks.

**Why This Matters:**

* **Memory Safety:** RAII ensures that memory is released on every exit path from a scope, including early returns.
* **Resource Management:** RAII can be used to manage other resources besides memory, such as file handles and locks.
* **Code Clarity:** RAII makes code easier to read and understand by tying resource lifetime to object lifetime.

**Code Example:**

"""c++
// Using std::unique_ptr for memory management
#include <memory>

void processData() {
  auto data = std::make_unique<int[]>(100); // Allocate a zero-initialized array
  // ... use the array
} // The array is automatically deallocated when data goes out of scope
"""

### 3.2. LLVM's Memory Allocators (BumpPtrAllocator, etc.)
**Description:** LLVM provides custom memory allocators like "BumpPtrAllocator" for efficient allocation of many small objects. These allocators are optimized for specific use cases within the compiler.

**Do This:**

* Use "BumpPtrAllocator" for allocating small objects within a compilation unit or pass where memory can be freed all at once.
* Consider using other LLVM allocators, such as "SpecificBumpPtrAllocator<T>", which also runs the destructors of the objects it allocated.
* Be mindful of the lifetime of the allocator and the objects it allocates.

**Don't Do This:**

* Use "BumpPtrAllocator" for long-lived objects that need to be individually deallocated.
* Mix allocations from different allocators without careful consideration of ownership.

**Why This Matters:**

* **Performance:** Custom allocators can be significantly faster than general-purpose allocators for certain use cases.
* **Memory Efficiency:** Specialized allocators can reduce memory fragmentation and overhead.
* **Integration:** LLVM allocators are designed to work seamlessly with the LLVM ecosystem.

**Code Example:**

"""c++
#include "llvm/Support/Allocator.h"
#include <new>

void allocateObjects(llvm::BumpPtrAllocator &allocator) {
  // Allocate<T>() returns suitably aligned, uninitialized storage for one T
  int *ptr1 = new (allocator.Allocate<int>()) int(10);
  double *ptr2 = new (allocator.Allocate<double>()) double(3.14);

  // ... use the allocated objects

  // The memory is released when the BumpPtrAllocator is destroyed. Destructors
  // are not run, so only allocate trivially destructible objects this way.
}

int main() {
  llvm::BumpPtrAllocator allocator;
  allocateObjects(allocator);
  return 0; // Allocator is destroyed here, freeing all allocated memory.
}
"""

## 4. Error Handling and Assertions

### 4.1. LLVM's Error Handling Mechanisms (llvm::Error, Expected<T>)

**Description:** LLVM introduces "llvm::Error" and "llvm::Expected<T>" for explicit error handling. These mechanisms provide a structured way to represent and propagate errors, improving robustness and maintainability. In some areas they replace the older error handling based on "std::error_code".

**Do This:**

* Use "llvm::Error" to represent recoverable errors that can be handled by the caller.
* Use "llvm::Expected<T>" to return either a value of type "T" or an "llvm::Error" indicating failure.
* Propagate errors up the call stack using "llvm::Error" or by returning "llvm::Expected<T>". Check whether a result holds an error ("if (!expected_value)") before using it.

**Don't Do This:**

* Use exceptions for recoverable errors. LLVM largely avoids exceptions.
* Ignore error codes or simply print error messages and continue execution. All errors should be handled and/or propagated.
* Return a raw "bool" to signal failure. Prefer "llvm::Error".

**Why This Matters:**

* **Explicit Error Handling:** "llvm::Error" and "llvm::Expected<T>" make error handling explicit and visible in the code.
* **Error Propagation:** These mechanisms ensure that errors are propagated up the call stack, allowing calling functions to handle them appropriately.
* **Robustness:** Proper error handling improves the robustness of the system and prevents unexpected crashes.

**Code Example:**

"""c++
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"

llvm::Expected<int> divide(int a, int b) {
  if (b == 0) {
    return llvm::make_error<llvm::StringError>("Division by zero",
                                               llvm::inconvertibleErrorCode());
  }
  return a / b;
}

int main() {
  llvm::Expected<int> result = divide(10, 2);
  if (result) {
    llvm::outs() << "Result: " << *result << "\n";
  } else {
    // takeError() hands the error to toString(), which consumes it
    llvm::errs() << "Error: " << llvm::toString(result.takeError()) << "\n";
  }

  result = divide(5, 0);
  if (result) {
    llvm::outs() << "Result: " << *result << "\n";
  } else {
    llvm::errs() << "Error: " << llvm::toString(result.takeError()) << "\n";
  }

  return 0;
}
"""

### 4.2. Assertions for Internal Consistency

**Description:** LLVM uses assertions extensively to check for internal consistency and preconditions.
Assertions are used to catch programming errors during development and debugging.

**Do This:**

* Use "assert()" liberally to check for conditions that should always be true.
* Provide informative error messages in assertions to help diagnose problems.
* Disable assertions in production builds to avoid performance overhead.
* Review and update assertions regularly as the code evolves.

**Don't Do This:**

* Use assertions to check for recoverable errors. Assertions should only be used for internal consistency checks.
* Rely on assertions to prevent security vulnerabilities. Security checks should be implemented separately.
* Leave dead code and dead assertions in the code.

**Why This Matters:**

* **Early Error Detection:** Assertions catch programming errors early in the development cycle, reducing debugging time.
* **Code Clarity:** Assertions document the expected behavior of the code.
* **Debugging Aid:** Assertions provide valuable information for diagnosing problems.

**Code Example:**

"""c++
#include <cassert>

void processValue(int value) {
  assert(value >= 0 && "Value must be non-negative");
  // ... use the value
}
"""

## 5. Code Formatting and Style

### 5.1. LLVM's Formatting Style (clang-format)

**Description:** LLVM uses "clang-format" to enforce a consistent code formatting style. This ensures that all code in the LLVM project adheres to the same conventions.

**Do This:**

* Use "clang-format" to format all code before committing changes.
* Install and configure "clang-format" to integrate with your editor or IDE.
* Follow the LLVM Coding Standards, which are published on the LLVM website and whose formatting rules "clang-format" applies.
* If code that you are not otherwise changing needs reformatting, do that in a separate commit.

**Don't Do This:**

* Ignore "clang-format" warnings or manually format code.
* Introduce style inconsistencies into the codebase.
* Disable "clang-format" checks in the build system.

**Why This Matters:**

* **Consistency:** Consistent formatting makes code easier to read and understand.
* **Collaboration:** Consistent formatting reduces conflicts and improves collaboration between developers.
* **Automation:** "clang-format" automates the formatting process, saving time and effort.

### 5.2. Naming Conventions

**Description:** LLVM follows well-defined naming conventions for variables, functions, classes, and other identifiers.

**Do This:**

* Use descriptive names that clearly indicate the purpose of the identifier.
* Follow the LLVM naming conventions for different types of identifiers (e.g., "CamelCase" starting with an upper-case letter for class names, "camelCase" verb phrases starting with a lower-case letter for functions and methods).
* Use short names for variables with limited scopes (e.g., loop indices).

**Don't Do This:**

* Use cryptic or ambiguous names that are difficult to understand.
* Violate the LLVM naming conventions.
* Use overly long names, especially when the context is clear.

"""c++
// Example of naming conventions
class MyClassName { // Class name: CamelCase with an upper-case first letter
public:
  void processItems() { // Method name: camelCase verb phrase
    int i = 0; // Variable name (short name for limited scope)
    // ...
  }
};
"""

## 6. Concurrency and Thread Safety

### 6.1. Thread-Safety Considerations

**Description:** LLVM is increasingly used in multi-threaded environments. Therefore, thread safety is a critical concern.

**Do This:**

* Identify and protect shared data structures with appropriate locking mechanisms.
* Use fine-grained locking to minimize contention and improve performance.
* Follow the LLVM synchronization primitives and best practices.
* Prefer immutable data structures where possible to avoid synchronization issues.

**Don't Do This:**

* Introduce data races or other concurrency bugs.
* Use global variables without proper synchronization.
* Assume that code is thread-safe without explicit verification.

**Why This Matters:**

* **Correctness:** Thread safety ensures that code behaves correctly in multi-threaded environments.
* **Performance:** Efficient synchronization mechanisms minimize performance overhead.
* **Scalability:** Thread-safe code scales better on multi-core processors.
### 6.2. LLVM's Synchronization Primitives
**Description:** LLVM provides synchronization primitives (e.g., "llvm::sys::Mutex" in "llvm/Support/Mutex.h") that work with standard RAII guards such as "std::lock_guard".
**Do This:**
* Prefer LLVM's synchronization primitives over platform-specific primitives.
* Use an RAII guard such as "std::lock_guard" to ensure that locks are released on every exit path, including early returns and exceptions.
* Document the locking strategy used to protect shared data structures.
**Don't Do This:**
* Use raw mutexes and condition variables without proper RAII wrappers.
* Hold locks for extended periods of time, blocking other threads unnecessarily.
* Introduce deadlocks by acquiring locks in inconsistent orders.
**Code Example:**
"""c++
#include "llvm/Support/Mutex.h"
#include <mutex>
class ThreadSafeCounter {
private:
  int counter = 0;
  // Guards "counter"; mutable so that const readers can lock it.
  mutable llvm::sys::Mutex mutex;
public:
  void increment() {
    std::lock_guard<llvm::sys::Mutex> lock(mutex);
    counter++;
  }
  int getCount() const {
    std::lock_guard<llvm::sys::Mutex> lock(mutex);
    return counter;
  }
};
"""
## 7. Optimization and Performance
### 7.1. Code Profiling and Benchmarking
**Description:** LLVM provides tools for profiling and benchmarking code to identify performance bottlenecks.
**Do This:**
* Use LLVM's profiling tools to identify hot spots in the code.
* Create microbenchmarks to measure the performance of specific algorithms and data structures.
* Use regression tests to ensure that performance improvements are not lost over time.
**Don't Do This:**
* Make performance optimizations without measuring their impact.
* Ignore performance regressions in the test suite.
* Optimize prematurely without identifying the real bottlenecks.
### 7.2. Data Structure Choices
**Description:** The choice of data structures can have a significant impact on performance.
**Do This:**
* Choose data structures that are appropriate for the access patterns and data sizes.
* Use efficient data structures for frequently accessed data.
* Consider using specialized data structures provided by LLVM (e.g., "SmallVector").
**Don't Do This:**
* Use inefficient data structures without considering the performance implications.
* Rely on default data structures without performance testing.
**Code Example:**
"""c++
#include "llvm/ADT/SmallVector.h"
// Take SmallVectorImpl in interfaces so callers are not tied to one inline size.
void processValues(llvm::SmallVectorImpl<int> &values) {
  // SmallVector avoids heap allocation while the element count stays small
  for (int value : values) {
    // ...
  }
}
"""
This coding standards document serves as a guide for LLVM developers, helping to ensure consistency, maintainability, performance, and security within the LLVM project. By adhering to these guidelines, developers can contribute to a high-quality codebase that meets the needs of the LLVM community. Its specific, detailed instructions are also intended for use by AI coding assistants.
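As a closing illustration of the assertion guidance earlier in this document, the sketch below separates a recoverable check on caller-supplied input from an internal consistency check. The helper names "getElementChecked" and "lastElement" are hypothetical, and the code uses only the C++ standard library rather than any LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: returns the element at Index, or std::nullopt when the
// caller-supplied index is out of range.
std::optional<int> getElementChecked(const std::vector<int> &Values,
                                     std::size_t Index) {
  // Recoverable condition driven by untrusted input: use an explicit check,
  // which stays active even when assertions are compiled out (NDEBUG).
  if (Index >= Values.size())
    return std::nullopt;
  return Values[Index];
}

// Hypothetical internal helper whose contract requires a non-empty vector.
int lastElement(const std::vector<int> &Values) {
  // Internal invariant: a violation is a programming error, so assert on it.
  assert(!Values.empty() && "lastElement requires a non-empty vector");
  return Values.back();
}
```

The explicit bounds check is the security-relevant one because it survives release builds; the assertion only documents and enforces a contract between internal callers during development.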