
Security Best Practices Standards for Fly.io


# Security Best Practices Standards for Fly.io

This document outlines security best practices for developing and deploying applications on the Fly.io platform. Adhering to these standards will help protect your applications and data from common vulnerabilities and ensure a secure and reliable deployment.

## 1. Secure Configuration and Secrets Management

### 1.1. Secure Secrets Storage

**Standard:** Never hardcode secrets directly in your application code, Dockerfiles, or configuration files. Use Fly.io's built-in secrets management.

**Why:** Hardcoding secrets exposes them to anyone with access to your codebase or container images. Fly.io secrets are encrypted at rest and in transit, minimizing the risk of exposure.

**Do This:**

* Use "flyctl secrets" to manage secrets.

  """bash
  flyctl secrets set DATABASE_URL="postgres://user:password@host:port/database"
  flyctl secrets set API_KEY="your_super_secret_api_key"
  """

* Access secrets in your application code through environment variables.

  """python
  # Python example
  import os

  database_url = os.environ.get("DATABASE_URL")
  api_key = os.environ.get("API_KEY")

  if not database_url or not api_key:
      raise ValueError("Required secrets are not set.")

  # Use database_url and api_key to connect to your database and make API calls
  """

**Don't Do This:**

* Hardcode secrets in your code:

  """python
  # Python example - BAD PRACTICE
  database_url = "postgres://user:password@host:port/database"
  api_key = "your_super_secret_api_key"
  """

* Store secrets in version control.
* Expose secrets in logs.

**Anti-Pattern:** Using ".env" files in production. While convenient for local development, they are not secure for production deployments and can easily be accidentally committed to source control or exposed.

### 1.2. Environment-Specific Configuration

**Standard:** Separate configuration for development, staging, and production environments.

**Why:** Using the same configuration across environments can lead to misconfiguration and security vulnerabilities. For example, using production API keys in a development environment could expose sensitive data.

**Do This:**

* Utilize Fly.io's built-in support for environment variables to specify configurations.
* Use separate Fly.io apps for each environment (e.g., "myapp-dev", "myapp-staging", "myapp-prod").
* Create and manage environment-specific secrets using "flyctl secrets".

  """bash
  # Set secrets for the production app
  flyctl secrets set --app myapp-prod DATABASE_URL="..." API_KEY="..."

  # Set secrets for the staging app
  flyctl secrets set --app myapp-staging DATABASE_URL="..." API_KEY="..."
  """

**Don't Do This:**

* Use the same secrets across all environments.
* Rely on manual configuration changes between environments.

**Code Example:**

"""toml
# fly.toml - Example configuration with a build target and environment variables
[build]
  dockerfile = "Dockerfile"
  # Multi-stage target to build; choose the target that matches the environment.
  build-target = "release" # example

[env]
  # Non-sensitive settings only; sensitive values belong in "flyctl secrets".
  PORT = "8080"

[deploy]
  release_command = "/app/migrate_db"
"""

### 1.3. Principle of Least Privilege

**Standard:** Grant the minimum necessary privileges to users, applications, and services.

**Why:** Limiting access reduces the potential impact of security breaches. If a compromised account or service has limited privileges, the attacker's ability to cause damage is significantly reduced.

**Do This:**

* Use Fly.io's access control features where they are available (Fly.io currently offers limited RBAC, Role-Based Access Control).
* Ensure applications running within VMs only have the permissions they need, using "USER" directives in Dockerfiles.
* Configure firewall rules to restrict network access to only necessary ports and services.

**Don't Do This:**

* Run applications as root unless absolutely necessary.
* Grant broad permissions to services or users without a specific justification.

**Code Example (Dockerfile):**

"""dockerfile
FROM ubuntu:latest

# Update and install necessary packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-venv \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN useradd -m -s /bin/bash appuser

# Set the working directory
WORKDIR /app

# Copy application files
COPY . .

# Install Python dependencies into a virtual environment.
# A root "pip3 install --user" would install under /root, where appuser cannot see the packages.
RUN python3 -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt
ENV PATH="/opt/venv/bin:$PATH"

# Change ownership of the application directory to the non-root user
RUN chown -R appuser:appuser /app

# Switch to the non-root user
USER appuser

# Command to run the application
CMD ["python3", "app.py"]
"""

### 1.4. Regular Security Audits and Updates

**Standard:** Regularly review your application code, dependencies, and infrastructure for security vulnerabilities. Keep your software up-to-date with the latest security patches.

**Why:** New vulnerabilities are discovered regularly. Staying up-to-date with security patches helps prevent exploits. Regular audits can identify potential vulnerabilities early.

**Do This:**

* Use automated vulnerability scanning tools (e.g., Snyk, Trivy) to scan your dependencies and container images.
* Subscribe to security mailing lists and advisories for the technologies you use (e.g., Python, Node.js, PostgreSQL).
* Regularly update your base images in your Dockerfiles.
* Implement a process for reviewing and addressing security vulnerabilities promptly.

**Don't Do This:**

* Ignore security alerts or vulnerabilities.
* Use outdated versions of software without security patches.

**Code Example (using Snyk in a CI/CD pipeline):**

"""yaml
# .github/workflows/security.yml - Example GitHub Actions workflow for running Snyk tests.
name: Security Scan

on:
  push:
    branches: [ main ] # or whatever your main branch is
  pull_request:
    branches: [ main ]

jobs:
  snyk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/python@master # Or javascript etc, adjust as needed
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --file=requirements.txt --severity-threshold=high
"""

## 2. Securing Network Communications

### 2.1. HTTPS for All Traffic

**Standard:** Use HTTPS for all communication between clients and your Fly.io application.

**Why:** HTTPS encrypts data in transit, preventing eavesdropping and man-in-the-middle attacks.

**Do This:**

* Allow Fly.io to automatically provision TLS certificates for your application. Fly.io provides free TLS certificates through Let's Encrypt.

  """bash
  flyctl certs show your-app-name.fly.dev
  """

* Ensure your application is configured to redirect HTTP traffic to HTTPS. On Fly.io, setting "force_https = true" under "[http_service]" in "fly.toml" performs this redirect before traffic reaches your application.

**Don't Do This:**

* Use plain HTTP for sensitive data.
* Disable TLS encryption.

**Code Example (configuring redirection in a web server):**

"""nginx
# nginx configuration to redirect HTTP to HTTPS.
# Applies when nginx terminates TLS itself; the certificate paths are placeholders.
server {
    listen 80;
    server_name your-app-name.fly.dev;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name your-app-name.fly.dev;

    # SSL certificate configuration
    ssl_certificate /etc/letsencrypt/live/your-app-name.fly.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-app-name.fly.dev/privkey.pem;

    # ... other configurations ...
}
"""

### 2.2. Firewall Configuration

**Standard:** Configure firewall rules (e.g., using iptables or UFW) to limit network access to only necessary ports and services.

**Why:** Firewalls prevent unauthorized access to your application and reduce the attack surface.

**Do This:**

* Use Fly.io's private networking to isolate apps.
* Use a tool like "ufw" to manage firewall rules inside of your VM.

**Don't Do This:**

* Leave unnecessary ports open to the public internet.
* Disable the firewall.

**Code Example (using "ufw" to allow only SSH and HTTP/HTTPS traffic):**

"""bash
# Allow SSH access
ufw allow OpenSSH

# Allow HTTP traffic
ufw allow 80

# Allow HTTPS traffic
ufw allow 443

# Enable the firewall
ufw enable

# Check the firewall status
ufw status
"""

### 2.3. Mutual TLS (mTLS)

**Standard:** Use mTLS for secure communication between services within your Fly.io private network.

**Why:** mTLS provides strong authentication and encryption by requiring both the client and server to present valid certificates.

**Do This:**

* Generate client and server certificates using a tool like OpenSSL.
* Configure your services to require client certificates during TLS handshakes.
* Distribute client certificates securely.

**Don't Do This:**

* Use self-signed certificates in production without proper validation.
* Store private keys in insecure locations.

### 2.4. Monitoring and Logging

**Standard:** Implement comprehensive logging and monitoring to detect and respond to security incidents.

**Why:** Logging and monitoring provide visibility into your application's behavior, allowing you to identify suspicious activity and security vulnerabilities.

**Do This:**

* Use a centralized logging system to collect logs from all your Fly.io applications and services (e.g., Grafana Loki).
* Monitor key security metrics, such as authentication failures, API request rates, and error rates.

**Don't Do This:**

* Disable logging.
* Store sensitive data in logs without proper redaction.
* Ignore suspicious activity detected by monitoring systems.

## 3. Application Security

### 3.1. Input Validation and Output Encoding

**Standard:** Validate all input data from clients and other services. Encode output data to prevent cross-site scripting (XSS) and other injection attacks.

**Why:** Input validation prevents attackers from injecting malicious code or data into your application.
Output encoding prevents injected code from being executed in the client's browser.

**Do This:**

* Use server-side validation to verify the format, type, and length of all input data.
* Use a templating engine with automatic output encoding (e.g., Jinja2 for Python, Handlebars for JavaScript).

**Don't Do This:**

* Trust client-side validation alone.
* Display raw user input without encoding.

**Code Example (Python using Flask and Jinja2):**

"""python
# Flask example with Jinja2 templating engine
from flask import Flask, request, render_template
import bleach

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # Validate the input
        name = request.form.get('name')
        if not name or len(name) > 100:
            return render_template('index.html', error='Invalid name')

        # Sanitize HTML input using bleach; default to '' so a missing field does not raise
        message = bleach.clean(request.form.get('message', ''))

        # Render the template with the sanitized message
        return render_template('index.html', name=name, message=message)

    return render_template('index.html')
"""

The "index.html" Jinja2 template:

"""html
<!DOCTYPE html>
<html>
<head>
    <title>Input Validation Example</title>
</head>
<body>
    {% if error %}
        <p style="color:red;">{{ error }}</p>
    {% endif %}

    <form method="post">
        <label for="name">Name:</label><br>
        <input type="text" id="name" name="name"><br><br>

        <label for="message">Message:</label><br>
        <textarea id="message" name="message"></textarea><br><br>

        <input type="submit" value="Submit">
    </form>

    {% if name and message %}
        <h2>Hello, {{ name }}!</h2>
        <p>Your message: {{ message }}</p>
    {% endif %}
</body>
</html>
"""

### 3.2. Cross-Site Request Forgery (CSRF) Protection

**Standard:** Implement CSRF protection to prevent attackers from forging requests on behalf of authenticated users.

**Why:** CSRF attacks can allow attackers to perform unauthorized actions on behalf of logged-in users.

**Do This:**

* Use a CSRF token that is unique to each user session.
* Include the CSRF token in all forms and AJAX requests.
* Validate the CSRF token on the server before processing the request.

**Don't Do This:**

* Disable CSRF protection.
* Use the same CSRF token for all users.

**Code Example (Python using Flask and WTForms):**

"""python
# Python using Flask and WTForms
import os

from flask import Flask, render_template, session, redirect, url_for
from flask_wtf import FlaskForm, CSRFProtect
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired

app = Flask(__name__)
# Load the signing key from the environment (e.g., set via "flyctl secrets"); never hardcode it
app.config['SECRET_KEY'] = os.environ['SECRET_KEY']
csrf = CSRFProtect(app)

class MyForm(FlaskForm):
    name = StringField('Name', validators=[DataRequired()])
    submit = SubmitField('Submit')

@app.route('/', methods=['GET', 'POST'])
def index():
    form = MyForm()
    if form.validate_on_submit():
        session['name'] = form.name.data
        return redirect(url_for('success'))
    return render_template('index.html', form=form)

@app.route('/success')
def success():
    if 'name' in session:
        name = session['name']
        return render_template('success.html', name=name)
    else:
        return redirect(url_for('index'))

if __name__ == '__main__':
    # Keep debug mode off outside local development; it exposes an interactive debugger
    app.run()
"""

### 3.3. Authentication and Authorization

**Standard:** Implement strong authentication and authorization mechanisms to control access to your application.

**Why:** Authentication verifies the identity of users, while authorization determines what resources they are allowed to access.

**Do This:**

* Use strong password policies (e.g., minimum length, complexity requirements).
* Implement multi-factor authentication (MFA) for privileged accounts.
* Use a role-based access control (RBAC) system to manage user permissions.
* Store passwords securely using a strong hashing algorithm (e.g., bcrypt, Argon2).

**Don't Do This:**

* Store passwords in plain text.
* Use weak or default passwords.
* Grant excessive permissions to users.

### 3.4. Dependency Management

**Standard:** Keep your application's dependencies up-to-date and use tools to detect and prevent vulnerable dependencies.

**Why:** Vulnerabilities in dependencies can be exploited to compromise your application.

**Do This:**

* Use a dependency management tool (e.g., pip for Python, npm for Node.js) to manage your application's dependencies.
* Regularly update your dependencies to the latest versions.
* Use automated vulnerability scanning tools (e.g., Snyk, OWASP Dependency-Check).

**Don't Do This:**

* Use outdated dependencies without security patches.
* Ignore security alerts from dependency scanning tools.

### 3.5. Error Handling and Logging

**Standard:** Handle errors gracefully and log sufficient information to diagnose problems.

**Why:** Proper error handling prevents sensitive information from being exposed to users. Logging provides valuable information for debugging and security incident response.

**Do This:**

* Implement a global error handler to catch unexpected exceptions.
* Log errors with sufficient detail to identify the root cause.
* Redact sensitive information (e.g., passwords, API keys) from logs.
* Use structured logging to make logs easier to query and analyze.

**Don't Do This:**

* Expose stack traces or other sensitive information to users in error messages.
* Log sensitive data in plain text.
* Ignore errors or warnings.

## 4. Dockerfile and Image Security

### 4.1. Minimal Base Images

**Standard:** Use minimal base images for your Docker containers to reduce the attack surface.

**Why:** Smaller images contain fewer dependencies, reducing the number of potential vulnerabilities.

**Do This:**

* Use lightweight base images like Alpine Linux or distroless images.

**Don't Do This:**

* Use full-featured base images like Ubuntu or Debian unless necessary.

**Code Example (using Alpine Linux as a base image):**

"""dockerfile
FROM python:3.9-alpine

# Install dependencies
# Copy application files
# Set the working directory
# Command to run the application
"""

### 4.2. Multi-Stage Builds

**Standard:** Use multi-stage builds to separate build-time dependencies from runtime dependencies.

**Why:** Multi-stage builds allow you to include build tools and dependencies in a temporary build environment, and then copy only the necessary artifacts to the final image.

**Do This:**

* Use separate "FROM" instructions for the build and runtime stages.
* Copy only the necessary files and dependencies from the build stage to the runtime stage.

**Don't Do This:**

* Include unnecessary build tools or dependencies in the final image.

**Code Example (using multi-stage build):**

"""dockerfile
# Build Stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
# Build a static binary so it runs on the musl-based Alpine runtime image
RUN CGO_ENABLED=0 go build -o /app/mybinary

# Production Stage
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/mybinary /app/mybinary
CMD ["/app/mybinary"]
"""

### 4.3. Image Scanning

**Standard:** Scan your Docker images for vulnerabilities before deploying them to Fly.io.

**Why:** Image scanning identifies potential vulnerabilities in your container images before they can be exploited.

**Do This:**

* Use a container image scanning tool (e.g., Trivy, Clair, Anchore).
* Integrate image scanning into your CI/CD pipeline.
* Address vulnerabilities identified by the scanner before deploying the image.

These standards cover security best practices for applications on Fly.io. Following them reduces risk for development teams, and they should be enforced in CI/CD wherever a check can be automated.
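
The image-scanning step in section 4.3 can be turned into a CI gate. The sketch below is one way to do that in Python, assuming the scanner writes a JSON report shaped like Trivy's image report (a top-level "Results" list whose entries carry a "Vulnerabilities" list with "VulnerabilityID" and "Severity" fields). The field names, the `blocking_findings` helper, the severity policy, and the sample CVE IDs are illustrative assumptions; check them against the report your scanner version actually produces.

```python
# Severities that should block a deploy (assumed policy; adjust as needed).
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict) -> list:
    """Return the IDs of vulnerabilities whose severity should block deployment."""
    findings = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(vuln.get("VulnerabilityID", "unknown"))
    return findings


# Hypothetical report excerpt with placeholder vulnerability IDs.
sample_report = {
    "Results": [
        {
            "Target": "app (alpine)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"},
            ],
        },
        {"Target": "requirements.txt", "Vulnerabilities": None},
    ]
}

found = blocking_findings(sample_report)
if found:
    print("Blocking vulnerabilities:", ", ".join(found))
else:
    print("No blocking vulnerabilities found")
```

In a pipeline, the script would load the report file with `json.load` and exit with a non-zero status when `found` is non-empty, so the deploy step never runs for a failing image.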


danielsogl, Created Mar 6, 2025

Component Design Standards for Fly.io


# Component Design Standards for Fly.io

This document outlines the component design standards for applications deployed on Fly.io. Adhering to these guidelines will promote maintainability, reusability, performance, and security in your Fly.io applications.

## 1. Introduction to Component Design in Fly.io

Component design in Fly.io focuses on creating modular, independent, and reusable parts of an application that are easy to develop, test, and maintain. Given Fly.io's geographically distributed nature, well-designed components also contribute to improved latency and resilience. In this context, a "component" is a logical grouping of functionalities, often corresponding to modules, classes, or services.

* **Goal:** Build robust, scalable, and maintainable applications on Fly.io.
* **Focus:** Modularity, reusability, performance, and security.

## 2. Architectural Considerations

### 2.1 Microservices vs. Monolith with Modules

Fly.io supports both microservice and monolithic architectures (with a modular design). The choice depends on the application's complexity and scalability needs.

* **Microservices:** Independent, deployable services communicating over the network. Suited for complex applications requiring independent scaling and fault isolation.
* **Monolith with Modules:** A single application with clear module boundaries internally. Suitable for smaller applications or when the operational overhead of microservices is a concern.

**Do This:**

* For large applications, decompose into loosely coupled microservices, each handling a specific domain.
* For smaller projects, leverage a modular approach within a monolithic application.

**Don't Do This:**

* Create tightly coupled microservices that lead to a distributed monolith.
* Build a monolithic application with no modularity, resulting in unmaintainable code.

**Why:** Microservices offer better scalability and fault isolation, while modular monoliths simplify development and deployment for smaller applications.
Proper modularity reduces dependencies, which helps isolate deployment errors and simplifies development.

**Example (Microservice):**

"""dockerfile
# Dockerfile for a user service
FROM python:3.11-slim-bookworm

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "user_service.py"]
"""

**Example (Monolith with Modules):**

"""python
# app.py
from user_module import User
from product_module import Product

# Use the modules
user = User(name="John Doe")
product = Product(name="Awesome Product")

print(f"User: {user.name}, Product: {product.name}")
"""

### 2.2 Location Awareness on Fly.io

Fly.io's ability to run applications close to users means components should be designed with location awareness in mind.

* **Data locality:** Store and process data in the region closest to the users.
* **Regional deployments:** Deploy specific components to particular Fly.io regions.

**Do This:**

* Use Fly.io's region routing features to direct traffic to the nearest instance of a component.
* Implement caching strategies to minimize cross-region data access.

**Don't Do This:**

* Assume all users are geographically close to a single server.
* Ignore latency implications of cross-region data access.

**Why:** Minimizing latency improves the user experience and reduces bandwidth costs.

**Example (Fly.io Region Routing with "fly.toml"):**

"""toml
app = "my-app"
primary_region = "iad" # Initial region

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

# NOTE: The two tables below are illustrative. Confirm the supported keys
# against the current fly.toml reference before relying on them.
[[http_service.route]]
  service = "my-app-eu" # Example: Send requests from Europe to European VMs
  path = "/api/europe"

[deploy]
  regions = ["iad", "fra", "syd"] # Regions used for deployment
"""

### 2.3 Fault Tolerance & Resilience

Fly.io's distributed nature requires components to be fault-tolerant.

* **Replication:** Run multiple instances of each component across different regions.
* **Circuit Breakers:** Implement the circuit breaker pattern to prevent cascading failures.
* **Health checks:** Use Fly.io's health checks to monitor component availability and automatically restart failed instances.

**Do This:**

* Configure health checks for all critical components in your "fly.toml".
* Use retry mechanisms with exponential backoff for communication between components.
* Implement circuit breakers to isolate failing components.

**Don't Do This:**

* Rely on a single instance of a component without redundancy.
* Allow one failing component to bring down the entire application.

**Why:** Redundancy and fault isolation ensure higher availability and a better user experience.

**Example (Fly.io Health Check in "fly.toml"):**

"""toml
app = "my-app"
primary_region = "iad"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

# Checks are declared as an array of tables, hence the double brackets
[[http_service.checks]]
  path = "/healthz" # endpoint of your healthcheck
  interval = "10s"
  timeout = "5s"
"""

## 3. Coding Standards for Components

### 3.1 Single Responsibility Principle (SRP)

Each component should have one, and only one, reason to change.

**Do This:**

* Design classes and modules with a clear, focused purpose.
* Refactor large components into smaller, more manageable units.

**Don't Do This:**

* Create "god classes" or modules that handle multiple unrelated tasks.

**Why:** Makes components easier to understand, test, and maintain.

**Example (Python SRP):**

"""python
# Good: Separate classes for User and Email
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class EmailService:
    def send_welcome_email(self, user):
        print(f"Sending welcome email to {user.email}")

# Bad: User class handles both user data and email sending
class UserWithEmail:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def send_welcome_email(self):  # Violates SRP: User shouldn't handle email
        print(f"Sending welcome email to {self.email}")

user = User("John Doe", "john@example.com")
email_service = EmailService()
email_service.send_welcome_email(user)
"""

### 3.2 Open/Closed Principle (OCP)

Components should be open for extension but closed for modification.

**Do This:**

* Use inheritance or composition to add new functionality without modifying existing code.
* Favor interfaces and abstract classes to decouple components.

**Don't Do This:**

* Directly modify existing code to add new features, risking regressions.

**Why:** Reduces the risk of introducing bugs when adding new features.

**Example (Python OCP):**

"""python
# Good: Using Strategy Pattern
from abc import ABC, abstractmethod

class PaymentStrategy(ABC):
    @abstractmethod
    def pay(self, amount):
        pass

class CreditCardPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with credit card")

class PayPalPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with PayPal")

class ShoppingCart:
    def __init__(self, payment_strategy: PaymentStrategy):
        self.payment_strategy = payment_strategy

    def checkout(self, amount):
        self.payment_strategy.pay(amount)

# Bad: Modifying the ShoppingCart class directly
class ShoppingCartBad:
    def checkout(self, amount, payment_method):
        if payment_method == "credit_card":
            print(f"Paying {amount} with credit card")
        elif payment_method == "paypal":
            print(f"Paying {amount} with PayPal")
        else:
            print("Invalid payment method")

cart = ShoppingCart(CreditCardPayment())
cart.checkout(100)
"""

### 3.3 Liskov Substitution Principle (LSP)

Subtypes must be substitutable for their base types without altering the correctness of the program.

**Do This:**

* Ensure that subclasses correctly implement the behavior of their base classes.
* Avoid introducing unexpected side effects in subclasses.

**Don't Do This:**

* Create subclasses that violate the contract of their base classes.

**Why:** Prevents unexpected behavior and ensures that polymorphism works correctly.

**Example (violating Liskov Substitution):**

"""python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height

class Square(Rectangle):  # Violates LSP: Square's invariant is width == height
    def __init__(self, size):
        super().__init__(size, size)

    def set_width(self, width):
        self.width = width
        self.height = width

    def set_height(self, height):
        self.width = height
        self.height = height

def print_area(rectangle: Rectangle):
    rectangle.set_width(5)
    rectangle.set_height(4)
    print(rectangle.area())

rectangle = Rectangle(2, 3)
print_area(rectangle)  # Output: 20

square = Square(2)
print_area(square)  # Output: 16 (incorrect if we expect standard rectangle behavior)
"""

In this example, the "Square" class violates LSP because setting the width or height also sets the other dimension, which is not the behavior expected of a generic "Rectangle".

### 3.4 Interface Segregation Principle (ISP)

Clients should not be forced to depend upon interfaces that they do not use.

**Do This:**

* Create small, specific interfaces instead of large, general-purpose ones.
* Refactor interfaces to separate unrelated methods.

**Don't Do This:**

* Force classes to implement methods they don't need.

**Why:** Reduces dependencies and improves code flexibility.

**Example (Python ISP):**

"""python
# Good: Separate interfaces for different functionalities
from abc import ABC, abstractmethod

class Printer(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

class Scanner(ABC):
    @abstractmethod
    def scan_document(self, document):
        pass

class Copier(ABC):
    @abstractmethod
    def copy_document(self, document):
        pass

# Bad: One large interface with all functionalities mixed
class MultiFunctionDevice(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

    @abstractmethod
    def scan_document(self, document):
        pass

    @abstractmethod
    def copy_document(self, document):
        pass

class SimplePrinter(Printer):
    def print_document(self, document):
        print(f"Printing {document}")

class AllInOnePrinter(Printer, Scanner, Copier):
    def print_document(self, document):
        print(f"Printing {document}")

    def scan_document(self, document):
        print(f"Scanning {document}")

    def copy_document(self, document):
        print(f"Copying {document}")
"""

A client needing only printing should not depend on the "Scanner" or "Copier" methods.

### 3.5 Dependency Inversion Principle (DIP)

High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details. Details should depend on abstractions.

**Do This:**

* Use dependency injection to provide dependencies to components.
* Program to interfaces rather than concrete implementations.

**Don't Do This:**

* Hardcode dependencies within components.

**Why:** Increases code flexibility and testability.

**Example (Python DIP):**

"""python
# Good: Using dependency injection
class Switchable:
    def turn_on(self):
        raise NotImplementedError

    def turn_off(self):
        raise NotImplementedError

class LightBulb(Switchable):
    def turn_on(self):
        print("LightBulb: turned on...")

    def turn_off(self):
        print("LightBulb: turned off...")

class ElectricPowerSwitch:
    def __init__(self, client: Switchable):
        self.client = client
        self.on = False

    def press(self):
        if self.on:
            self.client.turn_off()
            self.on = False
        else:
            self.client.turn_on()
            self.on = True

# Bad: Hardcoded dependency
class SwitchBad:
    def __init__(self):
        self.bulb = LightBulb()  # Concrete dependency = Bad
        self.on = False

    def press(self):
        if self.on:
            self.bulb.turn_off()
            self.on = False
        else:
            self.bulb.turn_on()
            self.on = True

bulb = LightBulb()
switch = ElectricPowerSwitch(bulb)  # Dependency Injection
switch.press()
switch.press()
"""

## 4. Fly.io Specific Considerations

### 4.1 Using Fly.io Volumes

Components that require persistent storage should leverage Fly.io Volumes.

**Do This:**

* Mount volumes to specific directories in your Fly.io instances.
* Use volumes to store data that needs to persist across deployments.

**Don't Do This:**

* Store persistent data within the container's filesystem, risking data loss on restarts.

**Why:** Volumes provide reliable and persistent storage for your applications.

**Example (Fly.io Volume Configuration in "fly.toml"):**

"""toml
app = "my-data-app"
primary_region = "ord"

[build]

[deploy]
  release_command = "/app/migrate_db.sh"

[[mounts]]
  source = "data_volume" # Existing volume name
  destination = "/data" # Where the volume is mounted

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
"""

### 4.2 Fly.io Secrets Management

Securely manage sensitive information using Fly.io Secrets.

**Do This:**

* Store API keys, database passwords, and other sensitive data as Fly.io Secrets.
* Access secrets in your code using environment variables.

**Don't Do This:**

* Hardcode secrets in your code or configuration files.
* Commit secrets to your version control system.

**Why:** Protects sensitive data and prevents unauthorized access.

**Example (Accessing Fly.io Secret in Python):**

"""python
import os

database_password = os.environ.get("DATABASE_PASSWORD")
if database_password is None:
    raise RuntimeError("DATABASE_PASSWORD is not set")

# Use the password to connect to the database; never print or log the value itself
print("Connecting to database with the configured password")
"""

### 4.3 Fly.io Edge Network and Global Distribution

Leverage Fly.io's edge network for improved performance.

**Do This:**

* Configure your services to take full advantage of the Fly.io global network.
* Pin a component to a single region when it needs strong consistency, accepting higher latency for distant users as the trade-off.

**Don't Do This:**

* Ignore latency implications of not using Fly.io's global network effectively.

**Why:** Reduced latency provides a better user experience.

## 5. Component Communication

### 5.1 REST APIs

Use REST APIs for synchronous communication between components.

**Do This:**

* Design REST APIs using standard HTTP methods and status codes.
* Use a consistent API versioning strategy.
* Implement proper authentication and authorization for API endpoints.

**Don't Do This:**

* Expose internal implementation details through the API.
* Create overly complex or inconsistent APIs.

**Why:** REST APIs are well-established and easy to understand, enabling interoperability.

### 5.2 Message Queues (e.g. Redis, NATS)

Use message queues for asynchronous communication between components.

**Do This:**

* Choose a message queue that fits your application's needs (e.g., Redis, RabbitMQ, NATS).
* Design message formats that are easy to serialize and deserialize.
* Implement error handling and retry mechanisms for message processing.

**Don't Do This:**

* Use message queues for synchronous operations that require immediate responses.
* Create overly complex messaging topologies.

**Why:** Message queues enable decoupling, asynchronous processing, and improved scalability. Redis and NATS can be deployed on Fly.io alongside the components that use them.

### 5.3 gRPC

Consider gRPC for high-performance communication between internal components.

**Do This:**

* Define gRPC services using Protocol Buffers.
* Generate code for both client and server using gRPC tools.
* Implement proper error handling and logging.

**Don't Do This:**

* Use gRPC for external APIs that need to be easily accessible to a wide range of clients.
* Overcomplicate gRPC service definitions.

**Why:** gRPC provides high performance, efficient serialization, and strong typing. It typically requires more tooling and setup than REST.

## 6. Testing

### 6.1 Unit Testing

Write unit tests for all components to verify their functionality in isolation.

**Do This:**

* Use a testing framework appropriate for your language (e.g., pytest for Python, JUnit for Java).
* Write tests that cover all possible code paths and edge cases.
* Use mocks and stubs to isolate components from their dependencies.

**Don't Do This:**

* Skip unit testing or write tests that are too superficial.
* Write tests that are tightly coupled to the implementation details of the tested components.

**Why:** Unit tests ensure that components function correctly and prevent regressions.

### 6.2 Integration Testing

Write integration tests to verify the interaction between different components.

**Do This:**

* Test the communication between components using real or simulated dependencies.
* Verify that data is correctly passed between components and that the overall system behaves as expected.

**Don't Do This:**

* Skip integration testing or write tests that are too narrow in scope.
* Rely solely on unit tests without verifying how components work together.

**Why:** Integration tests ensure that components work together correctly.
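
The advice in 6.1 to isolate components with mocks can be sketched as follows. This example uses Python's standard-library `unittest` and `unittest.mock` rather than pytest so that it runs without extra dependencies. `SignupService` and its injected `email_sender` are hypothetical names invented for illustration, not part of any Fly.io API.

```python
import unittest
from unittest.mock import Mock


class SignupService:
    """Hypothetical component that depends on an injected email sender."""

    def __init__(self, email_sender):
        self.email_sender = email_sender

    def register(self, name: str, email: str) -> dict:
        if "@" not in email:
            raise ValueError("invalid email address")
        user = {"name": name, "email": email}
        self.email_sender.send_welcome_email(user)
        return user


class SignupServiceTest(unittest.TestCase):
    def test_register_sends_welcome_email(self):
        sender = Mock()  # stands in for the real email component
        service = SignupService(sender)
        user = service.register("John Doe", "john@example.com")
        sender.send_welcome_email.assert_called_once_with(user)

    def test_register_rejects_invalid_email(self):
        sender = Mock()
        service = SignupService(sender)
        with self.assertRaises(ValueError):
            service.register("John Doe", "not-an-email")
        # No email may be sent when validation fails.
        sender.send_welcome_email.assert_not_called()


# Run the suite programmatically so the script keeps control of the exit code.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SignupServiceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the service receives its dependency through the constructor, the same tests keep working if the real email component later moves to another service or region.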
### 6.3 End-to-End Testing Write end-to-end tests to verify the entire application flow from the user interface to the backend. **Do This:** * Use a testing framework that simulates user interactions (e.g., Selenium, Cypress). * Test the entire application flow from the user interface to the backend. * Verify that the application meets the user's requirements. **Don't Do This:** * Skip end-to-end testing or write tests that are too complex and brittle. * Rely solely on unit and integration tests without verifying the end-to-end user experience. **Why:** End-to-end tests ensure that the application meets the user's requirements and provides a good user experience. ## 7. Monitoring and Logging ### 7.1 Centralized Logging Use a centralized logging system to collect and analyze logs from all components. **Do This:** * Use a logging framework appropriate for your language (e.g., log4j for Java, logging for Python). * Configure components to log all important events, including errors, warnings, and informational messages. * Use a tool such as Grafana Loki or similar system for log aggregation. **Don't Do This:** * Skip logging or rely solely on local log files. * Log sensitive data such as passwords or API keys. **Why:** Centralized logging enables easier troubleshooting, performance monitoring, and security analysis. ### 7.2 Metrics Collection Collect metrics from all components to monitor their performance and resource usage. **Do This:** * Use a metrics library appropriate for your language (e.g., Prometheus client libraries). * Collect metrics such as CPU usage, memory usage, network traffic, and request latency. * Use a monitoring system such as Prometheus or Grafana to visualize and analyze metrics. **Don't Do This:** * Skip metrics collection or collect only a limited set of metrics. * Use metrics that are not meaningful or actionable. **Why:** Metrics provide valuable insights into the health and performance of your components. 
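To make the metrics guidance above more concrete, here is a small standard-library sketch of recording request counts, error counts, and latency for a function. The "METRICS" store, the "timed" decorator, and "handle_request" are illustrative names only; a real deployment would more likely use a Prometheus client library and expose the values for scraping.

```python
import time
from collections import defaultdict
from functools import wraps

# In-process store (assumption for this sketch): metric name -> running totals.
METRICS = defaultdict(lambda: {"count": 0, "errors": 0, "total_seconds": 0.0})


def timed(name):
    """Decorator that records call count, error count, and total latency under `name`."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                # Runs for both successful and failed calls.
                METRICS[name]["count"] += 1
                METRICS[name]["total_seconds"] += time.perf_counter() - start

        return wrapper

    return decorator


@timed("handle_request")
def handle_request(payload):
    """Hypothetical request handler used to demonstrate the decorator."""
    if payload is None:
        raise ValueError("payload is required")
    return {"ok": True, "size": len(payload)}


def average_latency(name):
    """Mean latency in seconds for a metric, or 0.0 if it was never recorded."""
    m = METRICS[name]
    return m["total_seconds"] / m["count"] if m["count"] else 0.0
```

Keeping the set of metric names small and tied to user-facing operations helps keep the resulting numbers actionable.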
### 7.3 Tracing

Implement distributed tracing to track requests as they flow through different components.

**Do This:**

* Instrument your code with a tracing library such as OpenTelemetry.
* Generate a span for each request as it enters and exits a component.
* Use a tracing backend such as Jaeger or Zipkin to collect and visualize traces.

**Don't Do This:**

* Skip tracing or trace only a limited set of requests.
* Create traces that are too granular or lack context.

**Why:** Tracing enables you to identify performance bottlenecks and diagnose issues in distributed systems. A well-designed tracing setup works well on Fly.io.
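The idea of a span per component, all sharing one trace ID, can be sketched with the Python standard library alone. This is a simplified model under stated assumptions: "span", "FINISHED_SPANS", "handle_request", and "query_database" are hypothetical names, and a production system would use a tracing library and export spans to a backend instead of appending them to a list.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Context variables carry the active trace and span through nested calls.
_current_trace_id = contextvars.ContextVar("trace_id", default=None)
_current_span_id = contextvars.ContextVar("span_id", default=None)

FINISHED_SPANS = []  # Stand-in for an exporter that would ship spans to a backend.


@contextmanager
def span(name):
    """Open a span; reuse the active trace ID or start a new trace."""
    trace_id = _current_trace_id.get() or uuid.uuid4().hex
    parent_id = _current_span_id.get()
    span_id = uuid.uuid4().hex[:16]
    trace_token = _current_trace_id.set(trace_id)
    span_token = _current_span_id.set(span_id)
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        FINISHED_SPANS.append(
            {
                "name": name,
                "trace_id": trace_id,
                "span_id": span_id,
                "parent_id": parent_id,
                "duration_s": time.perf_counter() - start,
            }
        )
        # Restore the previous span and trace so sibling spans get the right parent.
        _current_span_id.reset(span_token)
        _current_trace_id.reset(trace_token)


def query_database():
    with span("db.query"):
        return ["row"]


def handle_request():
    with span("http.request"):
        return query_database()
```

Calling "handle_request()" records two spans with the same trace ID, with the database span pointing at the request span as its parent. Across services, the trace ID would additionally need to travel in request headers.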


danielsogl, created Mar 6, 2025


# State Management Standards for Fly.io This document outlines coding standards and best practices for managing application state within Fly.io applications. It provides guidance on data flow, reactivity, and state management options specific to the Fly.io environment. These standards aim to improve maintainability, performance, and scalability of your deployments. ## 1. Introduction to State Management on Fly.io Effective state management is crucial for building robust and scalable applications on Fly.io. Fly.io's distributed architecture presents unique challenges and opportunities for managing state, requiring careful consideration of data consistency, latency, and resilience. This guide covers approaches ranging from simple in-memory state to distributed databases and caching strategies. ## 2. Choosing the Right State Management Approach Selecting the appropriate state management solution is a critical architectural decision influenced by application requirements, data volume, consistency needs, and performance goals. ### 2.1. Factors to Consider * **Data Consistency:** Determine the required consistency level (e.g., eventual consistency, strong consistency). Strong consistency typically involves more complex setups and potential performance trade-offs, but is necessary for sensitive data. * **Data Volume:** Consider the volume of data that needs to be stored and managed. Small amounts of session data may be effectively handled in-memory, while large datasets require database solutions. * **Latency Requirements:** Analyze latency constraints based on the application's user experience needs. Caching and geographically distributed data stores can minimize latency. * **Scalability:** Choose solutions that can scale horizontally to handle increasing traffic and data volume. Stateless application components coupled with externalized, scalable state management solutions are generally preferred. 
* **Complexity:** Balance the need for sophisticated state management with the overhead of implementation and maintenance. Start with simpler solutions and only introduce more complex tools when necessary.

### 2.2. State Management Options

* **In-Memory State:** Suitable for small amounts of ephemeral data (e.g., temporary UI state) or data that can be easily regenerated. Do not rely solely on in-memory state for critical data, as instances can be terminated or restarted.
    * *When to Use:* Transient, non-critical UI state; caching frequently accessed but non-essential data.
    * *Example:* A simple counter application.
* **Fly.io Volumes:** Persistent storage within a region. Each volume is attached to a single VM in one region.
    * *When to Use:* Stateful applications within a specific region that need persistent storage. Ideal for databases where regional locality is desired. Can be combined with replicated databases for higher availability.
    * *Example:* Postgres data directory on a dedicated VM within a region.
* **Fly.io Postgres:** Postgres clusters running on Fly.io, optionally with replicas in additional regions. How much of backups, scaling, and failover is automated depends on whether you use Fly.io's managed Postgres offering or run a self-managed Fly Postgres app, so confirm this against the current Fly.io documentation. Ideal for transactional data with standard SQL semantics.
    * *When to Use:* Applications requiring standard SQL functionality with ACID properties.
    * *Example:* Storing user data, product catalogs, order information.
* **Key-Value Stores (Redis, Memcached):** Fast, in-memory data stores suitable for caching and session management. When replicated, they generally provide only eventual consistency.
    * *When to Use:* Caching frequently accessed data, managing user sessions, rate limiting.
    * *Example:* Caching API responses, session data for authenticated users.
* **Distributed Databases (CockroachDB, YugabyteDB):** Distributed SQL databases providing strong consistency, fault tolerance, and scalability.
    * *When to Use:* Applications requiring strong consistency, high availability, and global distribution of data.
* *Example:* Financial transactions, inventory management, global user profiles. * **Object Storage (AWS S3, Google Cloud Storage):** Storing large unstructured data such as images, videos, and backups. * *When to Use:* Storing static assets, large media files, and backups. * *Example:* User-uploaded photos, video content, database backups. ## 3. State Management Standards The following standards apply to all state management solutions used within Fly.io applications. ### 3.1. General Principles * **Do This:** Externalize all persistent application state. Avoid storing critical data solely within application instances. * **Why:** Fly.io instances are ephemeral and can be restarted or relocated. Data stored only in-memory will be lost. * **Don't Do This:** Rely on local file storage within the VM for important data unless using Fly.io Volumes when region-specific affinity is satisfactory. * **Why:** Instance failures or relocation will result in data loss. * **Do This:** Favor stateless application components whenever possible. * **Why:** Simplifies scaling, deployment, and recovery in a distributed environment. ### 3.2. Configuration and Secrets * **Do This:** Store configuration and secrets using Fly.io's secrets management. * **Why:** Securely injects environment variables at runtime, avoiding hardcoding. * **Don't Do This:** Commit secrets directly to source code or include them in Docker images. * **Why:** Compromises security and violates best practices. * **Example:** Setting a database password as a Fly.io secret. """bash fly secrets set DATABASE_PASSWORD=your_secret_password """ Accessing it in the application (Node.js): """javascript const dbPassword = process.env.DATABASE_PASSWORD; """ ### 3.3. Database Connections * **Do This:** Use connection pooling to efficiently manage database connections. * **Why:** Reduces connection overhead and improves application performance by reusing existing connections. 
* **Do This:** Set appropriate connection timeouts to prevent resource exhaustion. * **Why:** Avoids connections being held open indefinitely, especially during network issues. * **Do This:** Use environment variables to configure database connection strings. * **Why:** Allows dynamic configuration based on the environment (development, staging, production). * **Example:** Connecting to Fly.io Postgres with connection pooling (Node.js with "pg"): """javascript const { Pool } = require('pg'); const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20, // Max number of clients in the pool idleTimeoutMillis: 30000, // Close idle clients after 30 seconds connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established }); module.exports = { query: (text, params) => pool.query(text, params), }; // Example usage: async function fetchData() { const { rows } = await pool.query('SELECT NOW()'); console.log(rows[0]); } """ ### 3.4. Caching * **Do This:** Implement caching for frequently accessed data to reduce database load and improve response times. * **Why:** Caching minimizes latency and improves application performance by serving data from memory. * **Do This:** Use appropriate cache invalidation strategies to ensure data consistency. * **Why:** Avoid serving stale data to users. Implement time-based expiration (TTL) or event-based invalidation. * **Do This:** Consider using a distributed cache like Redis or Memcached for shared caching across multiple application instances. * **Why:** Provides a centralized cache that can be accessed by all instances. 
* **Example:** Using Redis for caching (Node.js with "ioredis"):

"""javascript
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL); // Connect to Redis

// Return the cached value for `key`, or call `fetchData` and cache its result
async function getCachedData(key, fetchData) {
  const cachedData = await redis.get(key);
  if (cachedData) {
    return JSON.parse(cachedData);
  }
  const data = await fetchData(); // Fetch data from source
  await redis.set(key, JSON.stringify(data), 'EX', 3600); // Cache for 1 hour (3600 seconds)
  return data;
}

// Example usage:
async function fetchUserData(userId) {
  // Logic to fetch user data from the database
  return { id: userId, name: 'John Doe' };
}

async function getUser(userId) {
  const cacheKey = `user:${userId}`; // Template literal: backticks are required for interpolation
  const userData = await getCachedData(cacheKey, () => fetchUserData(userId));
  console.log(userData);
}
"""

### 3.5. Session Management

* **Do This:** Store session data in a reliable external data store (e.g., Redis, database).
    * **Why:** Ensures session persistence across instance restarts and scaling events.
* **Do This:** Use secure session cookies with appropriate attributes (e.g., "HttpOnly", "Secure", "SameSite").
    * **Why:** "HttpOnly" keeps the cookie out of reach of injected scripts, which limits the damage of cross-site scripting (XSS); "SameSite" helps prevent cross-site request forgery (CSRF); "Secure" restricts the cookie to HTTPS.
* **Do This:** Implement session expiration and regular session cleanup to prevent resource exhaustion.
    * **Why:** Prevents accumulation of orphaned session data.
* **Example:** Express.js session configuration using Redis (Node.js with "connect-redis" and "express-session"): """javascript const session = require('express-session'); const RedisStore = require('connect-redis').default; const Redis = require('ioredis'); const redisClient = new Redis(process.env.REDIS_URL); app.use(session({ store: new RedisStore({ client: redisClient }), secret: process.env.SESSION_SECRET, resave: false, saveUninitialized: false, cookie: { secure: process.env.NODE_ENV === 'production', // Only send over HTTPS in production httpOnly: true, // Prevent client-side JavaScript access sameSite: 'strict', // Prevent CSRF attacks maxAge: 24 * 60 * 60 * 1000, // Session expires after 24 hours } })); """ ### 3.6. Data Replication and Distribution * **Do This:** Consider using data replication or distribution strategies to improve availability and reduce latency for geographically distributed users. * **Why:** Provides redundancy and faster access to data by placing it closer to users. * **Do This:** Use caution regarding eventual consistency. Always handle conflict resolution and data reconciliation properly. * **Fly.io Postgres:** Use multi-region Postgres clusters for automatic data replication and failover. ### 3.7. Monitoring and Logging * **Do This:** Implement comprehensive monitoring and logging to track state management performance and identify potential issues. * **Why:** Allows proactive identification and resolution of problems. * **Do This:** Log relevant state transitions and errors to facilitate debugging. * **Why:** Provides insight into application behavior and helps diagnose root causes of issues. * **Do This:** Monitor database connection pool usage, cache hit rates, and other key metrics. * **Why:** Provides early warnings of performance bottlenecks or resource exhaustion. ## 4. Technology-Specific State Management ### 4.1. Remix Remix handles data loading and mutations through Actions and Loaders. 
Use these mechanisms when handling state on Fly.io.

* **Do This:** Use "getSession" and "commitSession" for managing user sessions. The example below stores the session in a signed cookie; for sessions backed by a database or Redis, "createSessionStorage" returns the same three functions.

"""javascript
// Session management example using Remix:
import { createCookieSessionStorage } from "@remix-run/node"; // or cloudflare/deno

const { getSession, commitSession, destroySession } = createCookieSessionStorage({
  cookie: {
    name: "__session",
    httpOnly: true,
    path: "/",
    sameSite: "lax",
    // Read the signing secret from a Fly.io secret instead of hardcoding it
    secrets: [process.env.SESSION_SECRET],
    secure: process.env.NODE_ENV === "production",
  },
});

export { getSession, commitSession, destroySession };
"""

* **Do This:** For Remix applications, consider using Fly.io Volumes for persistent storage where regional performance is desired.
* **Don't Do This:** Manipulate localStorage or sessionStorage directly for critical application state within Remix. This data is client-side only and is not persisted across devices and browsers.

### 4.2. Next.js

Next.js offers various options for state management, ranging from built-in solutions to third-party libraries.

* **Do This:** For global client-side state, use the Context API with "useReducer" or state management libraries like Zustand or Jotai. These run in Client Components, so fetch data in Server Components and pass it down as props.
* **Do This:** If you are using the Next.js App Router, consider using Server Actions for data mutations, which allow you to execute server-side code directly from your components. Data persistence should still be handled with external databases or storage solutions.

"""javascript
// actions.js: example Server Action for submitting a form
'use server';

import { revalidatePath } from 'next/cache';
import { redirect } from 'next/navigation';

// When used with useFormState, the action receives the previous state first
export async function createInvoice(prevState, formData) {
  const rawFormData = {
    customerId: formData.get('customerId'),
    amount: formData.get('amount'),
    status: formData.get('status'),
  };

  // Persist the data to a database
  await createInvoiceInDb(rawFormData); // Replace with your DB persistence logic

  revalidatePath('/dashboard/invoices'); // Optional: Revalidate cache automatically after mutation
  redirect('/dashboard/invoices'); // Optional: Redirect user to another page
}

// page.js: in your component (a separate file)
'use client';

import { useFormState } from 'react-dom';
import { createInvoice } from './actions';

export default function Page() {
  const [state, dispatch] = useFormState(createInvoice, null);
  return (
    <form action={dispatch}>
      {/* Form fields */}
      <button type="submit">Create Invoice</button>
    </form>
  );
}
"""

* **Don't Do This:** Rely exclusively on "getServerSideProps" for handling all dynamic data, especially if the data isn't truly required for the initial page render. This can negatively impact performance.

### 4.3. General State Management Libraries (Redux, Zustand, Jotai)

* **Do This:** Centralize state updates with reducers or update functions.
* **Do This:** Use asynchronous actions or middleware (e.g., Redux Thunk, Redux Saga) for handling data fetching and other side effects.
* **Do This:** Optimize state updates to prevent unnecessary re-renders. Use selectors or memoization techniques to derive state from the global store.

## 5. Anti-Patterns

* **Over-Reliance on Global State:** Avoid storing unnecessary data in global state, which can lead to performance issues and make debugging difficult.
* **Ignoring Concurrency Issues:** Be mindful of concurrency issues when updating shared state, especially in a distributed environment. Use appropriate locking mechanisms or optimistic concurrency control.
* **Lack of Monitoring:** Failing to monitor state management performance can lead to undetected issues and performance bottlenecks. ## 6. Optimizing for Fly.io's Architecture Fly.io offers a globally distributed platform, allowing you to place your application instances close to your users. This can significantly reduce latency, but requires careful consideration of data locality and consistency. * **Regional Data Affinity:** Consider the implications of placing data within a specific region. Data stored on a Fly.io Volume is tied to that region. This is useful when data is primarily accessed by users in that region, but can increase latency for users accessing data from other regions. * **Global Data Replication:** For data that needs to be accessed globally with low latency, consider using Fly.io Postgres with multi-region replication or a globally distributed database like CockroachDB or YugabyteDB. * **Caching Strategies:** Use a tiered caching approach to minimize latency. Cache frequently accessed data close to the user using client-side caching (e.g., browser cache, service worker) or edge caching (e.g., Fly.io CDN). For shared data, use a distributed cache like Redis. ## 7. Conclusion By following these coding standards, you can build robust, scalable, and maintainable applications on Fly.io. Choosing the right state management solution and following best practices for configuration, caching, session management, and monitoring will significantly improve the performance, reliability, and security of your deployments. Always consider the specific requirements of your application and the unique characteristics of the Fly.io environment when making state management decisions.


# Performance Optimization Standards for Fly.io

This document outlines the coding standards focused on performance optimization for applications deployed on Fly.io. Adhering to these standards will lead to faster, more responsive, and resource-efficient applications. These standards are tailored for the latest version of Fly.io and incorporate modern approaches for optimal performance within the Fly.io ecosystem.

## 1. Architectural Considerations for Performance

### 1.1. Region Selection and Geographic Distribution

**Standards:**

* **Do This:** Deploy your application to multiple regions closest to your users. Use Fly.io's built-in support for global deployments to minimize latency.
* **Don't Do This:** Deploy only to a single region, especially if your user base is geographically distributed.

**Why:** Reduces latency by serving users from the nearest available region. Improves availability by distributing load across multiple regions.

**Code Example (fly.toml):**

"""toml
app = "my-fly-app"
primary_region = "iad" # Initial region

# Additional regions (e.g., lhr, syd) are not listed in fly.toml; add Machines
# in those regions with flyctl, for example via the --region flag of "fly scale count".

[build]

[deploy]
  release_command = "/app/bin/my-fly-app migrate"
  strategy = "rolling"

# http_service already handles ports 80 and 443, so no separate [[services]]
# block is needed for the same internal port.
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]
"""

**Anti-Pattern:** Hardcoding region-specific logic into the application code. Use Fly.io's configuration and routing features instead.

### 1.2. Database Proximity

**Standards:**

* **Do This:** Locate your database (e.g., Postgres, Redis) in the same region as your application servers whenever possible to minimize network latency. Consider using Fly.io's Postgres or Redis offerings.
* **Don't Do This:** Access a database across regions unless absolutely necessary.

**Why:** Reduces latency for database queries, improving overall application responsiveness.

**Code Example (Connecting to Fly.io Postgres):**

"""python
import os

import psycopg2

# Fetch database credentials from environment variables
db_host = os.environ.get("FLY_POSTGRES_FQDN")
db_name = os.environ.get("PGDATABASE")
db_user = os.environ.get("PGUSER")
db_password = os.environ.get("PGPASSWORD")  # Set as a Fly.io secret; never hardcode it

try:
    conn = psycopg2.connect(
        host=db_host,
        database=db_name,
        user=db_user,
        password=db_password,
        port=5432,  # Usually 5432 for PostgreSQL
    )
    print("Database connection successful")

    cur = conn.cursor()
    cur.execute("SELECT version();")
    db_version = cur.fetchone()
    print(f"PostgreSQL version: {db_version}")

    cur.close()
    conn.close()
except psycopg2.Error as e:
    print(f"Error connecting to database: {e}")
"""

**Anti-Pattern:** Ignoring database latency. Profile database queries to identify and optimize slow operations.

### 1.3. Caching Strategies

**Standards:**

* **Do This:** Implement caching at multiple levels: browser, CDN (using Fly.io's global edge network), application server (in-memory), and database (query caching). Use appropriate cache invalidation strategies. Implement HTTP caching headers (e.g., "Cache-Control", "Expires").
* **Don't Do This:** Rely solely on database caching. Cache frequently accessed data closer to the user.

**Why:** Reduces load on application servers and databases, resulting in faster response times and lower resource utilization.
**Code Example (HTTP Caching with Flask):**

"""python
from flask import Flask, make_response

app = Flask(__name__)


@app.route('/')
def index():
    response = make_response("<h1>Hello, World!</h1>")
    response.headers['Cache-Control'] = 'public, max-age=3600'  # Cache for 1 hour
    return response


if __name__ == '__main__':
    # Local development server only; debug mode must stay off in production
    app.run(host='0.0.0.0', port=8080)
"""

**Anti-Pattern:** Aggressively caching dynamic content. Use appropriate cache invalidation techniques when data changes.

### 1.4. Connection Pooling

**Standards:**

* **Do This:** Use connection pooling for database connections to reduce the overhead of establishing new connections for each request.
* **Don't Do This:** Create a new database connection for every request, especially under high load.

**Why:** Reduces database load and improves application response time by reusing existing connections.

**Code Example (Connection Pooling with SQLAlchemy):**

"""python
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

db_host = os.environ.get("FLY_POSTGRES_FQDN")
db_name = os.environ.get("PGDATABASE")
db_user = os.environ.get("PGUSER")
db_password = os.environ.get("PGPASSWORD")  # Set as a Fly.io secret; never hardcode it

# Database URL (adjust username, password, host, and database name)
db_url = f"postgresql://{db_user}:{db_password}@{db_host}/{db_name}"

# Create a database engine with connection pooling
engine = create_engine(db_url, pool_size=5, max_overflow=10)  # Adjust pool_size and max_overflow

# Create a session factory
Session = sessionmaker(bind=engine)


# Example usage:
def get_data_from_db():
    session = Session()
    try:
        # Perform database operations using the session
        # Example:
        # results = session.query(MyTable).all()
        print("Querying the DB... Replace with your actual query here")
    except Exception as e:
        print(f"Error during database operation: {e}")
    finally:
        session.close()  # Always close the session so the connection returns to the pool


if __name__ == '__main__':
    get_data_from_db()
"""

**Anti-Pattern:** Setting the connection pool size too small or too large. Tune based on application load and database capacity.

## 2. Code-Level Optimizations

### 2.1. Efficient Data Structures and Algorithms

**Standards:**

* **Do This:** Choose appropriate data structures (e.g., dictionaries, sets) and algorithms (e.g., sorting algorithms, search algorithms) for the specific task. Weigh time complexity against space complexity for the workload at hand.
* **Don't Do This:** Use inefficient data structures or algorithms that lead to slow execution or high memory consumption.

**Why:** Improves application performance by minimizing resource usage and execution time.

**Code Example (Using Sets for Efficient Membership Testing):**

"""python
my_list = [1, 2, 3, 4, 5]  # Original data
my_set = set(my_list)  # Convert to a set

# Membership checks are O(1) on average in a set, versus O(n) in a list
if 3 in my_set:
    print("3 exists in my_set")

if 6 in my_set:
    print("6 exists in my_set")
else:
    print("6 does not exist in my_set")
"""

**Anti-Pattern:** Linear search on large, unsorted lists. Consider using binary search or hash tables.

### 2.2. Asynchronous Operations

**Standards:**

* **Do This:** Use asynchronous operations (e.g., async/await in Python, Promises in JavaScript) for I/O-bound tasks such as network requests, file I/O, and database queries to avoid blocking the main thread.
* **Don't Do This:** Perform blocking I/O operations on the main thread.

**Why:** Prevents blocking the event loop, allowing the application to handle more requests concurrently. Improves responsiveness and throughput.
**Code Example (Asynchronous HTTP Request with Python aiohttp):**

"""python
import asyncio

import aiohttp


async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()


async def main():
    data = await fetch_data('https://example.com')
    print(data[:100])  # Print the first 100 characters


if __name__ == '__main__':
    asyncio.run(main())
"""

**Anti-Pattern:** Mixing synchronous and asynchronous code without proper thread management. Use appropriate executors or thread pools.

### 2.3. Resource Management

**Standards:**

* **Do This:** Explicitly release resources such as file handles, database connections, and memory as soon as they are no longer needed. Use "try...finally" blocks or context managers ("with" statement in Python) to ensure proper resource cleanup. Use Fly.io's autoscaling to match resources to demand, and consider scaling to zero during off-peak hours.
* **Don't Do This:** Leak resources, which can lead to memory exhaustion or other performance problems.

**Why:** Prevents resource leaks, ensuring efficient utilization of system resources. Improves application stability and scalability.

**Code Example (Using "with" Statement for File Handling):**

"""python
try:
    # The file is closed automatically when the "with" block exits
    with open('my_file.txt', 'r') as f:
        data = f.read()
        print(data)
except FileNotFoundError:
    print("my_file.txt was not found")
"""


# Testing Methodologies Standards for Fly.io

This document outlines testing methodology standards for Fly.io applications, covering unit, integration, and end-to-end testing. It provides guidance on how to apply these principles specifically within the Fly.io environment, highlighting platform-specific considerations and best practices.

## 1. General Testing Principles for Fly.io

These principles apply to all levels of testing and are crucial for ensuring the reliability and performance of Fly.io applications.

* **Do This:** Employ the testing pyramid, prioritizing unit tests, then integration tests, and finally end-to-end tests, balancing coverage against cost.
* **Don't Do This:** Rely heavily on end-to-end tests at the expense of unit tests, as they offer less isolation and slower feedback.

**Why:** A balanced testing approach provides a comprehensive view of application correctness, catching issues early and often. Unit tests quickly verify individual components, integration tests validate interactions between components, and end-to-end tests ensure the entire system behaves as expected. Focusing too much on end-to-end tests makes debugging more difficult and slows down development cycles.

* **Do This:** Write tests that are independent, repeatable, and deterministic.
* **Don't Do This:** Create tests that depend on external services or require specific data states that are hard to reproduce consistently.

**Why:** Reliable tests provide a strong foundation for continuous integration and continuous delivery (CI/CD). Non-deterministic tests undermine trust in the testing process and can lead to false positives or negatives. Isolating dependencies and ensuring repeatable test environments keeps results trustworthy.

* **Do This:** Use descriptive test names that clearly explain what the test is verifying.
* **Don't Do This:** Use vague or cryptic test names that make it difficult to understand the purpose of the test.

**Why:** Clear test names improve readability and maintainability. When a test fails, a descriptive name allows developers to quickly understand the issue and its context.

### 1.1 Fly.io Specific Considerations

* **Do This:** Consider regional testing when deploying to multiple Fly.io regions. Write tests that verify regional data consistency and performance (latency).
* **Don't Do This:** Assume that your application behaves identically across all regions without specific checks in place.

**Why:** Fly.io's multi-region deployments introduce complexity. Regional data replication and network latency can impact application behavior differently across the globe.

* **Do This:** Include tests that simulate Fly.io platform events, such as restarts, scaling, and health checks. Create corresponding unit tests that cover the handler logic.
* **Don't Do This:** Assume that the application will always run uninterrupted.

**Why:** Fly.io is a dynamic platform. Handling events such as restarts and scaling gracefully is crucial for ensuring high availability and a smooth user experience. Testing ensures that the application recovers correctly from unexpected events.

## 2. Unit Testing

Unit testing focuses on testing individual components of an application in isolation.

* **Do This:** Write unit tests for all non-trivial functions and methods.
* **Don't Do This:** Skip unit testing for "simple" functions, as they can still contain errors and are often refactored later.

**Why:** Unit tests are the fastest and most reliable way to catch errors early in the development cycle. They provide a safety net when refactoring code and improve overall code quality.

* **Do This:** Use mocking and stubbing techniques to isolate units of code from external dependencies.
* **Don't Do This:** Directly call external services or databases in unit tests. This makes tests slow, unreliable, and difficult to maintain.

**Why:** Isolation is key to effective unit testing.
Mocking and stubbing allow you to control the behavior of dependencies, ensuring focused tests that verify the logic of a single unit of code.

### 2.1 Code Examples (Go)

"""go
package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

func GetGreeting(name string) string {
	return "Hello, " + name + "!"
}

func handler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	greeting := GetGreeting(name)
	w.WriteHeader(http.StatusOK)
	w.Write([]byte(greeting))
}

func TestGetGreeting(t *testing.T) {
	expected := "Hello, World!"
	actual := GetGreeting("World")
	if actual != expected {
		t.Errorf("Expected %s, but got %s", expected, actual)
	}
}

func TestHandler(t *testing.T) {
	req, err := http.NewRequest("GET", "/?name=Test", nil)
	if err != nil {
		t.Fatal(err)
	}

	// The recorder captures the response, so no real server is started
	rr := httptest.NewRecorder()
	h := http.HandlerFunc(handler)
	h.ServeHTTP(rr, req)

	if status := rr.Code; status != http.StatusOK {
		t.Errorf("handler returned wrong status code: got %v want %v", status, http.StatusOK)
	}

	expected := "Hello, Test!"
	if rr.Body.String() != expected {
		t.Errorf("handler returned unexpected body: got %v want %v", rr.Body.String(), expected)
	}
}
"""

In this example, "TestGetGreeting" tests the "GetGreeting" function in isolation. "TestHandler" exercises the HTTP handler with a request built in the test and a response recorder from "net/http/httptest", so no real network traffic is involved.
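The Go example above does not replace any dependency, so the following sketch illustrates the mocking guidance itself. It is written in Python with the standard "unittest.mock" module; the "get_greeting_for_user" function and the store's "get_name" method are hypothetical names introduced only for this illustration.

```python
from unittest.mock import Mock


def get_greeting_for_user(user_id, user_store):
    """Build a greeting using a store object that exposes get_name(user_id)."""
    name = user_store.get_name(user_id)
    if not name:
        return "Hello, stranger!"
    return f"Hello, {name}!"


def test_greeting_uses_store():
    # The mock stands in for a database-backed store, so no database is needed.
    store = Mock()
    store.get_name.return_value = "Test"

    assert get_greeting_for_user(42, store) == "Hello, Test!"
    # Verify the unit asked the dependency for the right user.
    store.get_name.assert_called_once_with(42)


def test_greeting_when_user_missing():
    store = Mock()
    store.get_name.return_value = None

    assert get_greeting_for_user(7, store) == "Hello, stranger!"
```

Passing the dependency in as an argument is what makes the substitution possible; the same idea applies in Go by accepting an interface instead of a concrete type.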
### 2.2 Fly.io Specific Unit Testing Examples

Given a Fly.io app that relies on the "FLY_REGION" environment variable:

"""go
package main

import (
	"os"
	"testing"
)

// GetRegion returns the region the app is running in, or "unknown"
// when FLY_REGION is not set (for example, on a developer machine).
func GetRegion() string {
	region := os.Getenv("FLY_REGION")
	if region == "" {
		return "unknown"
	}
	return region
}

func TestGetRegion(t *testing.T) {
	// Set the FLY_REGION environment variable for testing
	os.Setenv("FLY_REGION", "ord")
	defer os.Unsetenv("FLY_REGION") // Clean up after the test

	expected := "ord"
	actual := GetRegion()
	if actual != expected {
		t.Errorf("Expected region %s, but got %s", expected, actual)
	}
}

func TestGetRegion_NoEnv(t *testing.T) {
	// Ensure FLY_REGION is not set for this test case
	os.Unsetenv("FLY_REGION")

	expected := "unknown"
	actual := GetRegion()
	if actual != expected {
		t.Errorf("Expected region %s, but got %s", expected, actual)
	}
}
"""

**Why:** These tests ensure that the application correctly retrieves and handles the "FLY_REGION" environment variable, which is critical for region-aware logic within a Fly.io application. The "defer os.Unsetenv("FLY_REGION")" call cleans up the environment variable after the first test, preventing interference with other tests. Testing with no environment variable set ensures that the code falls back to its default correctly.

## 3. Integration Testing

Integration testing focuses on testing interactions between different components or services of an application.

* **Do This:** Test the interactions between modules, services, or databases to ensure they work together correctly.
* **Don't Do This:** Test individual units of code in isolation during integration testing. That's the scope of unit tests.

**Why:** Integration tests verify that components correctly exchange data and behave as expected when integrated. They catch issues that are not apparent when testing individual units of code.

* **Do This:** Use lightweight test databases or mock external services to control the test environment.
* **Don't Do This:** Use production databases or rely on live external services during integration testing. This can lead to data corruption, performance issues, and unreliable test results.

**Why:** Controlled test environments ensure that integration tests are predictable and repeatable. Using production resources introduces risks and dependencies that complicate testing.

### 3.1 Code Examples (Go)

Assume the following function is under test.

"""go
package main

import (
	"database/sql"

	_ "github.com/lib/pq" // PostgreSQL driver
)

type User struct {
	ID    int
	Name  string
	Email string
}

// GetUserByID loads one user row; Scan returns sql.ErrNoRows if no row matches.
func GetUserByID(db *sql.DB, id int) (*User, error) {
	query := "SELECT id, name, email FROM users WHERE id = $1"
	row := db.QueryRow(query, id)

	user := &User{}
	err := row.Scan(&user.ID, &user.Name, &user.Email)
	if err != nil {
		return nil, err
	}
	return user, nil
}
"""

Here is an example integration test.

"""go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"
	"testing"

	_ "github.com/lib/pq" // PostgreSQL driver
)

var testDB *sql.DB

func setupTestDB() (*sql.DB, error) {
	connStr := os.Getenv("TEST_DATABASE_URL")
	if connStr == "" {
		connStr = "postgres://user:password@localhost:5432/testdb?sslmode=disable" // Default for local testing
		log.Println("Using default test database URL. Set TEST_DATABASE_URL for explicit configuration.")
	}

	db, err := sql.Open("postgres", connStr)
	if err != nil {
		return nil, fmt.Errorf("failed to open database: %w", err)
	}

	err = db.Ping()
	if err != nil {
		return nil, fmt.Errorf("failed to connect to database: %w", err)
	}

	// Initialize the database schema (create tables, etc.).
	// A raw string literal (backticks) is required for a multi-line query.
	_, err = db.Exec(`
		CREATE TABLE IF NOT EXISTS users (
			id SERIAL PRIMARY KEY,
			name TEXT NOT NULL,
			email TEXT NOT NULL
		);
		INSERT INTO users (name, email) VALUES ('Test User', 'test@example.com');
	`)
	if err != nil {
		return nil, fmt.Errorf("failed to initialize database schema: %w", err)
	}

	return db, nil
}

func cleanupTestDB(db *sql.DB) error {
	_, err := db.Exec("DROP TABLE IF EXISTS users;")
	if err != nil {
		return fmt.Errorf("failed to drop table: %w", err)
	}
	return nil
}

func TestGetUserByID(t *testing.T) {
	if testDB == nil {
		// Skip if TestMain could not connect to a test database.
		t.Skip("Test database not initialized. Set TEST_DATABASE_URL.")
	}

	user, err := GetUserByID(testDB, 1)
	if err != nil {
		t.Fatalf("Error getting user: %v", err)
	}
	if user == nil {
		t.Fatalf("User not found")
	}
	if user.Name != "Test User" {
		t.Errorf("Expected user name 'Test User', got '%s'", user.Name)
	}
	if user.Email != "test@example.com" {
		t.Errorf("Expected user email 'test@example.com', got '%s'", user.Email)
	}
}

func TestMain(m *testing.M) {
	var err error
	testDB, err = setupTestDB()
	if err != nil {
		// Log and continue with testDB == nil so database tests skip
		// instead of aborting the whole test run.
		log.Printf("Failed to set up test database: %v", err)
	}

	code := m.Run()

	if testDB != nil {
		if err := cleanupTestDB(testDB); err != nil {
			log.Printf("Failed to clean up test database: %v", err)
		}
		testDB.Close() // Close the DB connection.
	}

	os.Exit(code)
}
"""

Key points:

* Reads the "TEST_DATABASE_URL" environment variable so the integration tests connect to the intended database both during local development and in CI/CD (set it in the CI environment, or with "fly secrets set" if the tests run on Fly.io).
* Initializes and cleans up the test database, which matters because the tests create a temporary table.
* Uses "TestMain" for one-time setup and teardown of costly resources like the test database.
* Skips the database tests if the database cannot be reached, so "go test" still works on machines without a test database.
* Closes the test database connection in "TestMain".
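One way to back up the rule against production databases is to validate the connection string before opening it. The sketch below assumes a hypothetical naming convention in which test database names start or end with "test"; the function name "checkTestDatabaseURL" and the example host names are likewise illustrative and not part of the original example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// checkTestDatabaseURL rejects connection strings that do not look like a
// test database. The "name starts or ends with test" rule is an assumed
// convention for this sketch, not something Fly.io or lib/pq enforces.
func checkTestDatabaseURL(connStr string) error {
	u, err := url.Parse(connStr)
	if err != nil {
		// The parse error is not wrapped because it would echo the
		// connection string, which may contain credentials.
		return errors.New("database URL could not be parsed")
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	name := strings.TrimPrefix(u.Path, "/")
	if !strings.HasPrefix(name, "test") && !strings.HasSuffix(name, "test") {
		return fmt.Errorf("database %q does not look like a test database", name)
	}
	return nil
}

func main() {
	urls := []string{
		"postgres://user:password@localhost:5432/testdb?sslmode=disable",
		"postgres://user:password@db.example.internal:5432/myapp",
	}
	for _, u := range urls {
		if err := checkTestDatabaseURL(u); err != nil {
			fmt.Println("refusing to run:", err)
			continue
		}
		fmt.Println("ok to run integration tests")
	}
}
```

A setup function such as "setupTestDB" could call this check before "sql.Open" and return the error, so a misconfigured "TEST_DATABASE_URL" causes the tests to skip rather than modify real data.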
### 3.2 Fly.io Specific Integration Testing Examples

Testing interactions between different Fly.io services:

"""go
// Imagine a service that depends on a Redis instance
package main

import (
	"context"
	"fmt"
	"os"
	"testing"

	"github.com/go-redis/redis/v8"
)

var rdb *redis.Client

func setupRedis() (*redis.Client, error) {
	redisURL := os.Getenv("FLY_REDIS_CACHE_URL") // Get from Fly env
	if redisURL == "" {
		return nil, fmt.Errorf("FLY_REDIS_CACHE_URL not set")
	}

	opt, err := redis.ParseURL(redisURL)
	if err != nil {
		return nil, fmt.Errorf("failed to parse Redis URL: %w", err)
	}

	// Use a local name so the package-level rdb is not shadowed.
	client := redis.NewClient(opt)

	_, err = client.Ping(context.Background()).Result()
	if err != nil {
		return nil, fmt.Errorf("failed to connect to Redis: %w", err)
	}

	return client, nil
}

func TestRedisConnection(t *testing.T) {
	if rdb == nil {
		t.Skip("Redis not initialized. Make sure FLY_REDIS_CACHE_URL is set.")
	}

	ctx := context.Background()

	err := rdb.Set(ctx, "testkey", "testvalue", 0).Err()
	if err != nil {
		t.Fatalf("Failed to set value in Redis: %v", err)
	}

	val, err := rdb.Get(ctx, "testkey").Result()
	if err != nil {
		t.Fatalf("Failed to get value from Redis: %v", err)
	}

	if val != "testvalue" {
		t.Errorf("Expected 'testvalue', got '%s'", val)
	}

	// Cleanup
	rdb.Del(ctx, "testkey")
}

func TestMain(m *testing.M) {
	var err error
	rdb, err = setupRedis()
	if err != nil {
		// rdb stays nil, so Redis tests skip instead of failing.
		fmt.Printf("Failed to set up Redis: %v\n", err)
	}

	code := m.Run()

	if rdb != nil {
		rdb.Close()
	}

	os.Exit(code)
}
"""

**Why:** This example demonstrates how to test the integration between a Fly.io application and a Redis instance. It retrieves the Redis connection URL from the environment, connects to Redis, performs a basic set/get round trip, and cleans up. This verifies that the application can communicate with stateful services deployed on Fly.io. The example reads "FLY_REDIS_CACHE_URL"; this name is not required, so read whichever environment variable or secret holds your Redis connection URL.

## 4. End-to-End (E2E) Testing

End-to-end testing verifies the behavior of the whole system, from the user's entry point through every service it depends on.
* **Do This:** Use E2E tests to validate critical user flows from start to finish.
* **Don't Do This:** Test every possible scenario with E2E tests, as they are slow and expensive to maintain. Focus on the most important workflows.

**Why:** E2E tests provide the highest level of confidence that the application is functioning correctly from the user's perspective. They simulate real user interactions and catch issues that may not be apparent in unit or integration tests.

* **Do This:** Use tools like Cypress, Playwright, or Selenium to automate browser-based E2E tests. For CLI tools, use shell scripting or dedicated testing frameworks to drive flows.
* **Don't Do This:** Manually run E2E tests, as this is time-consuming and prone to human error.

**Why:** Automation is essential for efficient E2E testing. Automated tests can be run frequently as part of CI/CD pipelines.

### 4.1 Example (Playwright - Node.js)

First, install Playwright:

"""bash
npm install -D @playwright/test
npx playwright install
"""

Then, create a test file, e.g., "tests/example.spec.ts":

"""typescript
import { test, expect } from '@playwright/test';

test('homepage has title and links to intro page', async ({ page }) => {
  // '/' is resolved against baseURL from playwright.config.ts,
  // so the BASE_URL override applies to this test.
  await page.goto('/');

  // Expect a title "to contain" a substring.
  await expect(page).toHaveTitle(/Your App Title/); // Replace with your app title

  // create a locator
  const getStarted = page.getByRole('link', { name: 'Get started' });

  // Expect an attribute "to be strictly equal" to the expected value.
  await expect(getStarted).toHaveAttribute('href', '/intro');

  // Click the get started link.
  await getStarted.click();

  // Expects the URL to contain intro.
  await expect(page).toHaveURL(/.*intro/);
});
"""

Update the "playwright.config.ts" file:

"""typescript
import { defineConfig, devices } from '@playwright/test';

// Replace the fallback with your Fly.io app URL.
const baseURL = process.env.BASE_URL || 'https://your-fly-io-app.fly.dev/';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  reporter: 'html',
  use: {
    baseURL: baseURL,
    trace: 'on-first-retry',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});
"""

Run the tests:

"""bash
npx playwright test
"""

**Notes:**

1. Set the "BASE_URL" environment variable to your Fly.io app URL. This lets the same tests target different environments (staging vs. production).
2. Use "fly secrets" commands to create and change environment variables for the deployed app.
3. Replace "/Your App Title/" with your application's title.

### 4.2 Fly.io CI Integration Example - GitHub Actions

This configuration is a ".github/workflows/playwright.yml" file.

"""yaml
name: Playwright Tests
on:
  push:
    branches: [ "main" ]
  pull_request:
jobs:
  test:
    timeout-minutes: 60
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright Browsers
        run: npx playwright install --with-deps
      - name: Run Playwright tests
        run: npx playwright test
      # "show-report" starts a local web server and would block the job,
      # so the generated HTML report is uploaded as an artifact instead.
      - name: Upload HTML report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30
"""

**Why:** The GitHub Actions workflow installs all dependencies and the Playwright browsers, then runs the Playwright tests. It uploads the HTML report of the test results as a build artifact, even when tests fail.

## 5. Performance Testing

While not strictly a type of unit, integration, or E2E testing, performance testing is crucial for Fly.io environments.

* **Do This:** Use load testing tools to simulate concurrent users accessing specific resources. Use "fly scale" to scale up the application based on the results of load tests.
* **Don't Do This:** Neglect performance testing until after deployment, as this can lead to unexpected issues in production.

**Why:** Performance testing helps identify bottlenecks and optimize code for scalability.

* **Do This:** Monitor application performance using tools like Grafana and Prometheus, and Fly.io's own metrics dashboard.
* **Don't Do This:** Rely solely on manual observation to assess application performance.

**Why:** Continuous monitoring provides valuable insights into application behavior over time.

### 5.1 Example (k6)

k6 is a popular open-source load testing tool for performance testing HTTP(S) services.

Create a script ("script.js") with a sample request:

"""javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10, // Virtual Users
  duration: '10s',
};

export default function () {
  http.get('https://your-fly-io-app.fly.dev/'); // Replace with your Fly.io app URL
  sleep(1);
}
"""

To collect your application's own Prometheus metrics during the test, declare the metrics endpoint in "fly.toml":

"""toml
[metrics]
port = 9091 # Assumed port; use the port where your app serves metrics
path = "/metrics"
"""

Run the k6 test:

"""bash
k6 run script.js
"""

**Why:** This runs a 10-second load test with 10 virtual users against your specified Fly.io application URL. Each virtual user sends a request and then sleeps for one second, so the test generates roughly 10 requests per second.

## 6. Security Testing

* **Do This:** Use static analysis tools to identify potential security vulnerabilities in the code.
* **Don't Do This:** Rely solely on manual code reviews to catch security issues.

**Why:** Automated tools can quickly scan large codebases for common insecure patterns that human reviewers may miss.

* **Do This:** Use vulnerability scanning tools to identify security issues in dependencies. Regularly update dependencies to patch known vulnerabilities.
* **Don't Do This:** Use outdated dependencies without assessing the security risks.

**Why:** Dependencies often contain security vulnerabilities that can be exploited by attackers. Keeping dependencies up-to-date is essential for maintaining a secure application.
* **Do This:** Implement security tests that cover various aspects of your Fly.io app (authentication, authorization, input validation, etc.).
* **Don't Do This:** Deploy an application without running any security tests.

**Why:** Without adequate security tests, your application is at higher risk.

### 6.1 Common Anti-Patterns

* **Inadequate test coverage:** Failing to write tests for critical parts of the application. This leaves potential vulnerabilities and bugs undiscovered.
* **Ignoring test failures:** Ignoring failing tests and continuing to develop new features. This leads to a build-up of technical debt and makes it harder to maintain the application. Failing tests should be addressed immediately.
* **Writing flaky tests:** Creating tests that sometimes pass and sometimes fail without any code changes. This undermines trust in the testing process and makes it difficult to identify real issues. Flaky tests should be investigated to remove non-determinism, or rewritten.
* **Over-reliance on manual testing:** Depending solely on manual testing, which can lead to missed bugs and security vulnerabilities.
* **Not testing Fly.io platform interactions:** Neglecting to test how the application interacts with Fly.io platform features, such as restarts, scaling, and health checks.

By following these guidelines, developers can create high-quality, reliable, and performant Fly.io applications.


danielsogl, created Mar 6, 2025
