# Security Best Practices Standards for Fly.io
This document outlines security best practices for developing and deploying applications on the Fly.io platform. Adhering to these standards will help protect your applications and data from common vulnerabilities and ensure a secure and reliable deployment.
## 1. Secure Configuration and Secrets Management
### 1.1. Secure Secrets Storage
**Standard:** Never hardcode secrets directly in your application code, Dockerfiles, or configuration files. Use Fly.io's built-in secrets management.
**Why:** Hardcoding secrets exposes them to anyone with access to your codebase or container images. Fly.io secrets are encrypted at rest and in transit, minimizing the risk of exposure.
**Do This:**
* Use "flyctl secrets" to manage secrets.
```bash
flyctl secrets set DATABASE_URL="postgres://user:password@host:port/database"
flyctl secrets set API_KEY="your_super_secret_api_key"
```
* Access secrets in your application code through environment variables.
```python
# Python example
import os

database_url = os.environ.get("DATABASE_URL")
api_key = os.environ.get("API_KEY")
if not database_url or not api_key:
    # Fail fast at startup instead of running with missing credentials
    raise ValueError("Required secrets are not set.")
# Use database_url and api_key to connect to your database and make API calls
```
**Don't Do This:**
* Hardcode secrets in your code:
```python
# Python example - BAD PRACTICE
database_url = "postgres://user:password@host:port/database"
api_key = "your_super_secret_api_key"
```
* Store secrets in version control.
* Expose secrets in logs.
**Anti-Pattern:** Using ".env" files in production. While convenient for local development, they are not secure for production deployments and can easily be accidentally committed to source control or exposed.
### 1.2. Environment-Specific Configuration
**Standard:** Separate configuration for development, staging, and production environments.
**Why:** Using the same configuration across environments can lead to misconfiguration and security vulnerabilities. For example, using production API keys in a development environment could expose sensitive data.
**Do This:**
* Use the "[env]" section of "fly.toml" for non-sensitive, environment-specific configuration, and Fly.io secrets for sensitive values.
* Use separate Fly.io apps for each environment (e.g., "myapp-dev", "myapp-staging", "myapp-prod").
* Create and manage environment-specific secrets using "flyctl secrets".
```bash
# Set secrets for the production app
flyctl secrets set --app myapp-prod DATABASE_URL="..." API_KEY="..."
# Set secrets for the staging app
flyctl secrets set --app myapp-staging DATABASE_URL="..." API_KEY="..."
```
**Don't Do This:**
* Use the same secrets across all environments.
* Rely on manual configuration changes between environments.
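
One way to avoid manual per-environment changes is to have the application derive its environment from the app it runs in and refuse to start when required secrets are missing. The sketch below is a minimal illustration under stated assumptions. It relies on the `FLY_APP_NAME` variable that Fly.io sets inside Machines. The "-dev/-staging/-prod" suffix convention, the `REQUIRED_SECRETS` tuple, the `DEBUG` flag and the function name are all hypothetical and should be adapted to your setup.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical naming convention: one Fly.io app per environment
ENVIRONMENT_SUFFIXES = {"-dev": "development", "-staging": "staging", "-prod": "production"}
# Hypothetical secrets every environment must define with "flyctl secrets set"
REQUIRED_SECRETS = ("DATABASE_URL", "API_KEY")


@dataclass(frozen=True)
class Settings:
    environment: str
    database_url: str
    api_key: str
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables and fail fast on misconfiguration."""
    env = os.environ if env is None else env
    app_name = env.get("FLY_APP_NAME", "")  # set by Fly.io inside each Machine
    if not app_name:
        environment = "development"  # running locally, outside Fly.io
    else:
        environment = next(
            (name for suffix, name in ENVIRONMENT_SUFFIXES.items() if app_name.endswith(suffix)),
            None,
        )
        if environment is None:
            raise RuntimeError(f"Cannot infer environment from app name {app_name!r}")
    # Report which secrets are missing, never their values
    missing = [key for key in REQUIRED_SECRETS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return Settings(
        environment=environment,
        database_url=env["DATABASE_URL"],
        api_key=env["API_KEY"],
        # Debug mode can never be switched on in production
        debug=environment != "production" and env.get("DEBUG") == "1",
    )
```

Failing at startup surfaces a missing secret during the deploy rather than on the first request that needs it.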
**Code Example:**
```toml
# fly.toml - Example configuration for defining environment-specific build arguments and env vars
[build]
  dockerfile = "Dockerfile"
  # Select a stage from a multi-stage Dockerfile
  build-target = "release" # example
# Build-time variables that depend on the target environment
[build.args]
  NODE_ENV = "production"
# Non-sensitive runtime configuration (use "flyctl secrets" for sensitive values)
[env]
  PORT = "8080"
[deploy]
  release_command = "/app/migrate_db"
```
### 1.3. Principle of Least Privilege
**Standard:** Grant the minimum necessary privileges to users, applications, and services.
**Why:** Limiting access reduces the potential impact of security breaches. If a compromised account or service has limited privileges, the attacker's ability to cause damage is significantly reduced.
**Do This:**
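* Use Fly.io's RBAC (Role-Based Access Control) features as described in the Fly.io documentation (Fly.io currently offers limited RBAC), and prefer narrowly scoped tokens, such as app-scoped deploy tokens created with "flyctl tokens create deploy", for CI/CD.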
* Ensure applications running within VMs only have the permissions they need, using "USER" directives in Dockerfiles.
* Configure firewall rules to restrict network access to only necessary ports and services.
**Don't Do This:**
* Run applications as root unless absolutely necessary.
* Grant broad permissions to services or users without a specific justification.
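
As a defence-in-depth check alongside the "USER" directive, the application can refuse to start if it is still running as root. The following is a minimal sketch that assumes a POSIX system, because `os.geteuid` is not available on Windows. The function name is illustrative. You would call it at the top of your entry point, for example `app.py`.

```python
import os
from typing import Optional


def ensure_not_root(euid: Optional[int] = None) -> None:
    """Raise if the process has root privileges (effective UID 0)."""
    # Allow an explicit UID so the check can be exercised in tests
    effective_uid = os.geteuid() if euid is None else euid
    if effective_uid == 0:
        raise PermissionError(
            "Refusing to run as root; add a USER directive to the Dockerfile"
        )
```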
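**Code Example (Dockerfile):**
```dockerfile
FROM ubuntu:24.04
# Install Python and venv support, then clear the apt cache to keep the image small
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-venv \
    && rm -rf /var/lib/apt/lists/*
# Create a non-root user
RUN useradd -m -s /bin/bash appuser
# Set the working directory
WORKDIR /app
# Install Python dependencies into a virtual environment: Ubuntu 23.04+ rejects
# pip installs into the system Python (PEP 668), and "pip install --user" run as
# root would place packages under /root, which appuser cannot read
COPY requirements.txt .
RUN python3 -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt
ENV PATH="/opt/venv/bin:$PATH"
# Copy application files owned by the non-root user
COPY --chown=appuser:appuser . .
# Switch to the non-root user
USER appuser
# Command to run the application
CMD ["python3", "app.py"]
```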
### 1.4. Regular Security Audits and Updates
**Standard:** Regularly review your application code, dependencies, and infrastructure for security vulnerabilities. Keep your software up-to-date with the latest security patches.
**Why:** New vulnerabilities are discovered regularly. Staying up-to-date with security patches helps prevent exploits. Regular audits can identify potential vulnerabilities early.
**Do This:**
* Use automated vulnerability scanning tools (e.g., Snyk, Trivy) to scan your dependencies and container images.
* Subscribe to security mailing lists and advisories for the technologies you use (e.g., Python, Node.js, PostgreSQL).
* Regularly update your base images in your Dockerfiles.
* Implement a process for reviewing and addressing security vulnerabilities promptly.
**Don't Do This:**
* Ignore security alerts or vulnerabilities.
* Use outdated versions of software without security patches.
**Code Example (using Snyk in a CI/CD pipeline):**
```yaml
# .github/workflows/security.yml - Example GitHub Actions workflow for running Snyk tests.
name: Security Scan
on:
  push:
    branches: [ main ] # or whatever your main branch is
  pull_request:
    branches: [ main ]
jobs:
  snyk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/python@master # Or javascript etc, adjust as needed
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --file=requirements.txt --severity-threshold=high
```
## 2. Securing Network Communications
### 2.1. HTTPS for All Traffic
**Standard:** Use HTTPS for all communication between clients and your Fly.io application.
**Why:** HTTPS encrypts data in transit, preventing eavesdropping and man-in-the-middle attacks.
**Do This:**
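* Let Fly.io handle TLS. Fly.io's proxy terminates TLS for your app: "*.fly.dev" hostnames are covered automatically, and free Let's Encrypt certificates are provisioned for custom domains you add with "flyctl certs".
```bash
# Add and inspect a certificate for a custom domain (example.com is a placeholder)
flyctl certs add example.com
flyctl certs show example.com
```
* Ensure HTTP traffic is redirected to HTTPS, for example with "force_https = true" in the "[http_service]" section of "fly.toml".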
**Don't Do This:**
* Use plain HTTP for sensitive data.
* Disable TLS encryption.
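
Fly.io's proxy terminates TLS and reports the scheme the client used in the `X-Forwarded-Proto` header. A Python app served without nginx can therefore perform the redirect in WSGI middleware. The following is a minimal sketch. The class name is illustrative. A missing header, as in local development, is treated as already secure.

```python
from typing import Callable, Iterable
from wsgiref.util import request_uri


class ForceHTTPSMiddleware:
    """WSGI middleware that redirects requests that reached Fly.io's proxy over plain HTTP."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Fly.io's proxy reports the client's original scheme in X-Forwarded-Proto
        if environ.get("HTTP_X_FORWARDED_PROTO", "https") == "http":
            # request_uri rebuilds host, path and query string from the WSGI environ
            url = request_uri(environ, include_query=True)
            location = "https://" + url.split("://", 1)[1]
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return self.app(environ, start_response)
```

With Flask, for example, this would be applied as `app.wsgi_app = ForceHTTPSMiddleware(app.wsgi_app)`.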
**Code Example (configuring redirection in a web server):**
```nginx
# nginx configuration to redirect HTTP to HTTPS behind Fly.io's proxy.
# Fly.io terminates TLS at its edge and forwards plain HTTP to the app's
# internal_port, so no certificates are configured here; the scheme the
# client originally used arrives in the X-Forwarded-Proto header.
server {
    listen 8080;
    server_name your-app-name.fly.dev;
    if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
    }
    # ... other configurations ...
}
```
### 2.2. Firewall Configuration
**Standard:** Configure firewall rules (e.g., using iptables or UFW) to limit network access to only necessary ports and services.
**Why:** Firewalls prevent unauthorized access to your application and reduce the attack surface.
**Do This:**
* Use Fly.io's private networking to isolate apps.
* Use a tool like "ufw" to manage firewall rules inside of your VM.
**Don't Do This:**
* Leave unnecessary ports open to the public internet.
* Disable the firewall.
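**Code Example (using "ufw" to allow only SSH and the app's internal port):**
```bash
# Deny inbound traffic by default and allow outbound traffic
ufw default deny incoming
ufw default allow outgoing
# Allow SSH access (remove if you do not need it)
ufw allow 22/tcp
# Fly.io's proxy terminates TLS and forwards requests to the app's internal_port
# (8080 in the fly.toml examples in this document), so that port must stay open;
# ports 80 and 443 are served by the proxy, not by the VM
ufw allow 8080/tcp
# Enable the firewall without the interactive confirmation prompt
ufw --force enable
# Check the firewall status
ufw status verbose
```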
### 2.3. Mutual TLS (mTLS)
**Standard:** Use mTLS for secure communication between services within your Fly.io private network.
**Why:** mTLS provides strong authentication and encryption by requiring both the client and server to present valid certificates.
**Do This:**
* Generate client and server certificates using a tool like OpenSSL.
* Configure your services to require client certificates during TLS handshakes.
* Distribute client certificates securely.
**Don't Do This:**
* Use self-signed certificates in production without proper validation.
* Store private keys in insecure locations.
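
With Python's standard `ssl` module, "require client certificates" comes down to setting `verify_mode = ssl.CERT_REQUIRED` on the server context and trusting only your private CA. The sketch below is a minimal illustration. The function names and the file paths you pass in are assumptions. Fly.io does not provide these certificates; you generate and distribute them yourself.

```python
import ssl
from typing import Optional


def build_mtls_server_context(certfile: Optional[str] = None, keyfile: Optional[str] = None,
                              ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server context that only accepts clients presenting a certificate signed by ca_file."""
    # PROTOCOL_TLS_SERVER starts with no trusted CAs, so only the private CA below is trusted
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if certfile and keyfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)  # this service's identity
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    return context


def build_mtls_client_context(certfile: Optional[str] = None, keyfile: Optional[str] = None,
                              ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that presents its own certificate and verifies the server against ca_file."""
    # PROTOCOL_TLS_CLIENT enables hostname checking and CERT_REQUIRED by default
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile and keyfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    return context
```

The server context is used with `context.wrap_socket(sock, server_side=True)` or your framework's TLS options. Because the client checks hostnames, server certificates should list the private-network name the client connects to, for example a hypothetical `my-api.internal`, in their subject alternative names.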
### 2.4. Monitoring and Logging
**Standard:** Implement comprehensive logging and monitoring to detect and respond to security incidents.
**Why:** Logging and monitoring provide visibility into your application's behavior, allowing you to identify suspicious activity and security vulnerabilities.
**Do This:**
* Use a centralized logging system to collect logs from all your Fly.io applications and services (e.g., Grafana Loki).
* Monitor key security metrics, such as authentication failures, API request rates, and error rates.
**Don't Do This:**
* Disable logging.
* Store sensitive data in logs without proper redaction.
* Ignore suspicious activity detected by monitoring systems.
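
A metric such as authentication failures becomes actionable once it is tied to a threshold over a time window. The sketch below is a minimal, in-process illustration that assumes a single instance. With several Machines, you would aggregate these counts in your central logging or metrics system instead. The class name, threshold and window are illustrative. The `source` key could be the client IP that Fly.io's proxy passes in the `Fly-Client-IP` header.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class AuthFailureMonitor:
    """Flags sources that exceed a threshold of authentication failures in a sliding window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for tests
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, source: str) -> bool:
        """Record one failed login from `source`; return True if it should raise an alert."""
        now = self.clock()
        events = self._failures[source]
        events.append(now)
        # Drop failures that fell outside the sliding window
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```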
## 3. Application Security
### 3.1. Input Validation and Output Encoding
**Standard:** Validate all input data from clients and other services. Encode output data to prevent cross-site scripting (XSS) and other injection attacks.
**Why:** Input validation prevents attackers from injecting malicious code or data into your application. Output encoding prevents injected code from being executed in the client's browser.
**Do This:**
* Use server-side validation to verify the format, type, and length of all input data.
* Use a templating engine with automatic output encoding (e.g., Jinja2 for Python, Handlebars for JavaScript).
**Don't Do This:**
* Trust client-side validation alone.
* Display raw user input without encoding.
**Code Example (Python using Flask and Jinja2):**
```python
# Flask example with Jinja2 templating engine
from flask import Flask, request, render_template
from markupsafe import Markup
import bleach

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # Validate the input on the server
        name = request.form.get('name', '').strip()
        if not name or len(name) > 100:
            return render_template('index.html', error='Invalid name')
        # Sanitize HTML input using bleach (disallowed tags are escaped)
        message = bleach.clean(request.form.get('message', ''))
        # Mark the sanitized HTML as safe so Jinja2 does not escape it a second time;
        # "name" stays a plain string and is auto-escaped by the template
        return render_template('index.html', name=name, message=Markup(message))
    return render_template('index.html')
```
```html
<!-- index.html Jinja2 template (Flask enables autoescaping for .html templates) -->
{% if error %}
<p>{{ error }}</p>
{% endif %}
<form method="post">
  Name:<br>
  <input type="text" name="name" maxlength="100"><br><br>
  Message:<br>
  <textarea name="message"></textarea><br><br>
  <input type="submit" value="Send">
</form>
{% if name and message %}
<p>Hello, {{ name }}!</p>
<p>Your message: {{ message }}</p>
{% endif %}
```
### 3.2. Cross-Site Request Forgery (CSRF) Protection
**Standard:** Implement CSRF protection to prevent attackers from forging requests on behalf of authenticated users.
**Why:** CSRF attacks can allow attackers to perform unauthorized actions on behalf of logged-in users.
**Do This:**
* Use a CSRF token that is unique to each user session.
* Include the CSRF token in all forms and AJAX requests.
* Validate the CSRF token on the server before processing the request.
**Don't Do This:**
* Disable CSRF protection.
* Use the same CSRF token for all users.
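
A framework such as Flask-WTF handles the token for you. To make the mechanism concrete, here is a minimal standard-library sketch of a per-session token that the server can verify without storing it: an HMAC-SHA256 signature over the session ID, compared in constant time. The function names and the in-process key are illustrative assumptions. A real key would come from a Fly.io secret, and a framework-provided implementation remains preferable.

```python
import hashlib
import hmac
import secrets

# Hypothetical signing key; in production, load it from a Fly.io secret
SECRET_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token bound to one session by signing its ID with HMAC-SHA256."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted: str) -> bool:
    """Check a submitted token in constant time before processing a state-changing request."""
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```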
**Code Example (Python using Flask and WTForms):**
```python
# Python using Flask and WTForms
import os

from flask import Flask, render_template, session, redirect, url_for
from flask_wtf import FlaskForm, CSRFProtect
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired

app = Flask(__name__)
# Load a strong random key from a Fly.io secret instead of hardcoding it (see section 1.1)
app.config['SECRET_KEY'] = os.environ['SECRET_KEY']
csrf = CSRFProtect(app)

class MyForm(FlaskForm):
    name = StringField('Name', validators=[DataRequired()])
    submit = SubmitField('Submit')

@app.route('/', methods=['GET', 'POST'])
def index():
    form = MyForm()
    # validate_on_submit() also checks the CSRF token rendered by form.hidden_tag()
    if form.validate_on_submit():
        session['name'] = form.name.data
        return redirect(url_for('success'))
    return render_template('index.html', form=form)

@app.route('/success')
def success():
    if 'name' in session:
        return render_template('success.html', name=session['name'])
    return redirect(url_for('index'))

if __name__ == '__main__':
    # Never enable debug mode in production: it exposes an interactive debugger
    app.run()
```
### 3.3. Authentication and Authorization
**Standard:** Implement strong authentication and authorization mechanisms to control access to your application.
**Why:** Authentication verifies the identity of users, while authorization determines what resources they are allowed to access.
**Do This:**
* Use strong password policies (e.g., minimum length, complexity requirements).
* Implement multi-factor authentication (MFA) for privileged accounts.
* Use a role-based access control (RBAC) system to manage user permissions.
* Store passwords securely using a strong hashing algorithm (e.g., bcrypt, Argon2).
**Don't Do This:**
* Store passwords in plain text.
* Use weak or default passwords.
* Grant excessive permissions to users.
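
The standard above names bcrypt and Argon2, which require third-party packages such as `bcrypt` or `argon2-cffi`. To stay self-contained, this sketch instead uses the standard library's `hashlib.scrypt`, another memory-hard password-hashing function. The cost parameters and the `scrypt$n$r$p$salt$hash` storage format are illustrative assumptions. Tune the cost upward for production after benchmarking on your Machines.

```python
import hashlib
import hmac
import secrets

# scrypt cost parameters (n must be a power of two); about 16 MiB of memory per hash
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1
SALT_BYTES = 16
KEY_BYTES = 32


def hash_password(password: str) -> str:
    """Return a self-describing 'scrypt$n$r$p$salt$hash' string that is safe to store."""
    salt = secrets.token_bytes(SALT_BYTES)  # unique random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_BYTES)
    return f"scrypt${SCRYPT_N}${SCRYPT_R}${SCRYPT_P}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored parameters and compare in constant time."""
    try:
        scheme, n, r, p, salt_hex, digest_hex = stored.split("$")
    except ValueError:
        return False
    if scheme != "scrypt":
        return False
    expected = bytes.fromhex(digest_hex)
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=bytes.fromhex(salt_hex),
                               n=int(n), r=int(r), p=int(p), dklen=len(expected))
    return hmac.compare_digest(candidate, expected)
```

Storing the parameters alongside the hash allows you to raise the cost later without invalidating existing passwords.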
### 3.4. Dependency Management
**Standard:** Keep your application's dependencies up-to-date and use tools to detect and prevent vulnerable dependencies.
**Why:** Vulnerabilities in dependencies can be exploited to compromise your application.
**Do This:**
* Use a dependency management tool (e.g., pip for Python, npm for Node.js) to manage your application's dependencies.
* Regularly update your dependencies to the latest versions.
* Use automated vulnerability scanning tools (e.g., Snyk, OWASP Dependency-Check).
**Don't Do This:**
* Use outdated dependencies without security patches.
* Ignore security alerts from dependency scanning tools.
### 3.5. Error Handling and Logging
**Standard:** Handle errors gracefully and log sufficient information to diagnose problems.
**Why:** Proper error handling prevents sensitive information from being exposed to users. Logging provides valuable information for debugging and security incident response.
**Do This:**
* Implement a global error handler to catch unexpected exceptions.
* Log errors with sufficient detail to identify the root cause.
* Redact sensitive information (e.g., passwords, API keys) from logs.
* Use structured logging to make logs easier to query and analyze.
**Don't Do This:**
* Expose stack traces or other sensitive information to users in error messages.
* Log sensitive data in plain text.
* Ignore errors or warnings.
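
The following minimal standard-library sketch combines structured logging, redaction, and a generic user-facing error. Each record is emitted as one JSON line. Fields whose names look sensitive are masked, and unexpected exceptions are logged in full while the user sees only an opaque error ID. The field-name pattern, the `context` extra key and the function names are illustrative assumptions. In Flask, for example, `error_response` could be called from a handler registered with `@app.errorhandler(Exception)`.

```python
import json
import logging
import re
import uuid

# Field names whose values must never be written to logs (illustrative pattern)
SENSITIVE_FIELD = re.compile(r"password|secret|token|api_?key|authorization", re.IGNORECASE)


class RedactingJSONFormatter(logging.Formatter):
    """Emit one JSON object per record and mask sensitive fields in the context dict."""

    def format(self, record: logging.LogRecord) -> str:
        context = getattr(record, "context", {}) or {}
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "context": {
                key: "[REDACTED]" if SENSITIVE_FIELD.search(key) else value
                for key, value in context.items()
            },
        }
        if record.exc_info:
            # The full stack trace stays in server-side logs only
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def error_response(logger: logging.Logger, exc: BaseException) -> dict:
    """Log an unexpected exception in detail; return a generic body safe to show users."""
    error_id = uuid.uuid4().hex  # lets support staff match a user report to the log entry
    logger.error("unhandled exception", exc_info=exc, extra={"context": {"error_id": error_id}})
    return {"error": "Internal server error", "error_id": error_id}
```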
## 4. Dockerfile and Image Security
### 4.1. Minimal Base Images
**Standard:** Use minimal base images for your Docker containers to reduce the attack surface.
**Why:** Smaller images contain fewer dependencies, reducing the number of potential vulnerabilities.
**Do This:**
* Use lightweight base images like Alpine Linux or distroless images.
**Don't Do This:**
* Use full-featured base images like Ubuntu or Debian unless necessary.
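**Code Example (using Alpine Linux as a base image):**
```dockerfile
FROM python:3.12-alpine
# Set the working directory
WORKDIR /app
# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY . .
# Run as an unprivileged user (BusyBox adduser syntax on Alpine)
RUN adduser -D appuser
USER appuser
# Command to run the application
CMD ["python", "app.py"]
```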
### 4.2. Multi-Stage Builds
**Standard:** Use multi-stage builds to separate build-time dependencies from runtime dependencies.
**Why:** Multi-stage builds allow you to include build tools and dependencies in a temporary build environment, and then copy only the necessary artifacts to the final image.
**Do This:**
* Use separate "FROM" instructions for the build and runtime stages.
* Copy only the necessary files and dependencies from the build stage to the runtime stage.
**Don't Do This:**
* Include unnecessary build tools or dependencies in the final image.
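**Code Example (using multi-stage build):**
```dockerfile
# Build Stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
# CGO_ENABLED=0 produces a static binary; a cgo-linked binary built against
# glibc in this Debian-based stage would fail to start on musl-based Alpine
RUN CGO_ENABLED=0 go build -o /app/mybinary .
# Production Stage
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/mybinary /app/mybinary
# Run as an unprivileged user
RUN adduser -D appuser
USER appuser
CMD ["/app/mybinary"]
```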
### 4.3. Image Scanning
**Standard:** Scan your Docker images for vulnerabilities before deploying them to Fly.io.
**Why:** Image scanning identifies potential vulnerabilities in your container images before they can be exploited.
**Do This:**
* Use a container image scanning tool (e.g., Trivy, Clair, Anchore).
* Integrate image scanning into your CI/CD pipeline.
* Address vulnerabilities identified by the scanner before deploying the image.
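
Trivy can fail a pipeline on its own, for example with `trivy image --exit-code 1 --severity HIGH,CRITICAL <image>`. If you need custom gating rules, you can parse its JSON report (`trivy image --format json --output report.json <image>`). The sketch below assumes the report layout `Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName` and `Severity` fields. Verify that layout against your Trivy version. The function names are illustrative.

```python
import json
from typing import Iterable, List

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_vulnerabilities(report: dict,
                             severities: Iterable[str] = BLOCKING_SEVERITIES) -> List[str]:
    """Return 'ID (package)' strings for findings at the blocking severities."""
    wanted = set(severities)
    findings = []
    for result in report.get("Results") or []:
        # Targets without findings may omit "Vulnerabilities" or set it to null
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in wanted:
                findings.append(f"{vuln.get('VulnerabilityID')} ({vuln.get('PkgName')})")
    return findings


def gate(report_path: str) -> int:
    """Print blocking findings and return a CI exit code (1 blocks the deploy)."""
    with open(report_path, encoding="utf-8") as fh:
        findings = blocking_vulnerabilities(json.load(fh))
    for finding in findings:
        print(f"BLOCKING: {finding}")
    return 1 if findings else 0
```

In CI, the script would exit with `gate("report.json")` before running `flyctl deploy`.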
Following these practices improves the security of applications on Fly.io; where possible, enforce them automatically in your CI/CD pipeline.
danielsogl
Created Mar 6, 2025
This guide explains how to effectively use .clinerules with Cline, the AI-powered coding assistant.
The .clinerules file is a powerful configuration file that helps Cline understand your project's requirements, coding standards, and constraints. When placed in your project's root directory, it automatically guides Cline's behavior and ensures consistency across your codebase.
Place the .clinerules file in your project's root directory. Cline automatically detects and follows these rules for all files within the project.
```yaml
# Project Overview
project:
  name: 'Your Project Name'
  description: 'Brief project description'
  stack:
    - technology: 'Framework/Language'
      version: 'X.Y.Z'
    - technology: 'Database'
      version: 'X.Y.Z'

# Code Standards
standards:
  style:
    - 'Use consistent indentation (2 spaces)'
    - 'Follow language-specific naming conventions'
  documentation:
    - 'Include JSDoc comments for all functions'
    - 'Maintain up-to-date README files'
  testing:
    - 'Write unit tests for all new features'
    - 'Maintain minimum 80% code coverage'

# Security Guidelines
security:
  authentication:
    - 'Implement proper token validation'
    - 'Use environment variables for secrets'
  dataProtection:
    - 'Sanitize all user inputs'
    - 'Implement proper error handling'
```
* Be Specific
* Maintain Organization
* Regular Updates
```yaml
# Common Patterns Example
patterns:
  components:
    - pattern: 'Use functional components by default'
    - pattern: 'Implement error boundaries for component trees'
  stateManagement:
    - pattern: 'Use React Query for server state'
    - pattern: 'Implement proper loading states'
```
* Commit the Rules: keep .clinerules in version control
* Team Collaboration
* Rules Not Being Applied
* Conflicting Rules
* Performance Considerations
```yaml
# Basic .clinerules Example
project:
  name: 'Web Application'
  type: 'Next.js Frontend'
standards:
  - 'Use TypeScript for all new code'
  - 'Follow React best practices'
  - 'Implement proper error handling'
testing:
  unit:
    - 'Jest for unit tests'
    - 'React Testing Library for components'
  e2e:
    - 'Cypress for end-to-end testing'
documentation:
  required:
    - 'README.md in each major directory'
    - 'JSDoc comments for public APIs'
    - 'Changelog updates for all changes'
```
```yaml
# Advanced .clinerules Example
project:
  name: 'Enterprise Application'
compliance:
  - 'GDPR requirements'
  - 'WCAG 2.1 AA accessibility'
architecture:
  patterns:
    - 'Clean Architecture principles'
    - 'Domain-Driven Design concepts'
security:
  requirements:
    - 'OAuth 2.0 authentication'
    - 'Rate limiting on all APIs'
    - 'Input validation with Zod'
```
# Component Design Standards for Fly.io
This document outlines the component design standards for applications deployed on Fly.io. Adhering to these guidelines will promote maintainability, reusability, performance, and security in your Fly.io applications.
## 1. Introduction to Component Design in Fly.io
Component design in Fly.io focuses on creating modular, independent, and reusable parts of an application that are easy to develop, test, and maintain. Given Fly.io's geographically distributed nature, well-designed components also contribute to improved latency and resilience. In this context, "component" is a logical grouping of functionalities, often corresponding to modules, classes, or services.
* **Goal:** Build robust, scalable, and maintainable applications on Fly.io.
* **Focus:** Modularity, reusability, performance, and security.
## 2. Architectural Considerations
### 2.1 Microservices vs. Monolith with Modules
Fly.io supports both microservice and monolithic architectures (with a modular design). The choice depends on the application's complexity and scalability needs.
* **Microservices:** Independent, deployable services communicating over the network. Suited for complex applications requiring independent scaling and fault isolation.
* **Monolith with Modules:** A single application with clear module boundaries internally. Suitable for smaller applications or when operational overhead of microservices is a concern.
**Do This:**
* For large applications, decompose into loosely coupled microservices, each handling a specific domain.
* For smaller projects, leverage a modular approach within a monolithic application.
**Don't Do This:**
* Create tightly coupled microservices that lead to a distributed monolith.
* Build a monolithic application with no modularity, resulting in unmaintainable code.
**Why:** Microservices offer better scalability and fault isolation, while modular monoliths simplify development and deployment for smaller applications.
Proper modularity reduces dependencies, which helps isolate deployment errors and simplifies development.
**Example (Microservice):**
```dockerfile
# Dockerfile for a user service
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "user_service.py"]
```
**Example (Monolith with Modules):**
```python
# app.py
from user_module import User
from product_module import Product

# Use the modules
user = User(name="John Doe")
product = Product(name="Awesome Product")
print(f"User: {user.name}, Product: {product.name}")
```
### 2.2 Location Awareness on Fly.io
Fly.io's ability to run applications close to users means components should be designed with location awareness in mind.
* **Data locality:** Store and process data in the region closest to the users.
* **Regional deployments:** Deploy specific components to particular Fly.io regions.
**Do This:**
* Use Fly.io's region routing features to direct traffic to the nearest instance of a component.
* Implement caching strategies to minimize cross-region data access.
**Don't Do This:**
* Assume all users are geographically close to a single server.
* Ignore latency implications of cross-region data access.
**Why:** Minimizing latency improves the user experience and reduces bandwidth costs.
**Example (Fly.io Region Routing with "fly.toml"):**
```toml
app = "my-app"
primary_region = "iad" # Default region for new Machines

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

# fly.toml has no per-path or per-region routing table: Fly.io's Anycast proxy
# sends each request to the nearest region with a running Machine for this app.
# Add Machines in more regions with the CLI, for example:
#   fly scale count 3 --region iad,fra,syd
```
### 2.3 Fault Tolerance & Resilience
Fly.io's distributed nature requires components to be fault-tolerant.
* **Replication:** Run multiple instances of each component across different regions.
* **Circuit Breakers:** Implement the circuit breaker pattern to prevent cascading failures.
* **Health checks:** Use Fly.io's health checks to monitor component availability so that Fly.io's proxy stops routing traffic to unhealthy instances.
**Do This:**
* Configure health checks for all critical components in your "fly.toml".
* Use retry mechanisms with exponential backoff for communication between components.
* Implement circuit breakers to isolate failing components.
**Don't Do This:**
* Rely on a single instance of a component without redundancy.
* Allow one failing component to bring down the entire application.
**Why:** Redundancy and fault isolation ensure higher availability and a better user experience.
**Example (Fly.io Health Check in "fly.toml"):**
```toml
app = "my-app"
primary_region = "iad"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

# Checks are an array of tables, so several can be defined
[[http_service.checks]]
  grace_period = "10s" # time to wait after start before checking
  interval = "10s"
  method = "GET"
  timeout = "5s"
  path = "/healthz" # endpoint of your health check
```
## 3. Coding Standards for Components
### 3.1 Single Responsibility Principle (SRP)
Each component should have one, and only one, reason to change.
**Do This:**
* Design classes and modules with a clear, focused purpose.
* Refactor large components into smaller, more manageable units.
**Don't Do This:**
* Create "god classes" or modules that handle multiple unrelated tasks.
**Why:** Makes components easier to understand, test, and maintain.
**Example (Python SRP):**
```python
# Good: Separate classes for User and Email
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class EmailService:
    def send_welcome_email(self, user):
        print(f"Sending welcome email to {user.email}")

# Bad: User class handles both user data and email sending
class UserWithEmail:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def send_welcome_email(self):  # Violates SRP: User shouldn't handle email
        print(f"Sending welcome email to {self.email}")

user = User("John Doe", "john@example.com")
email_service = EmailService()
email_service.send_welcome_email(user)
```
### 3.2 Open/Closed Principle (OCP)
Components should be open for extension but closed for modification.
**Do This:**
* Use inheritance or composition to add new functionality without modifying existing code.
* Favor interfaces and abstract classes to decouple components.
**Don't Do This:**
* Directly modify existing code to add new features, risking regressions.
**Why:** Reduces the risk of introducing bugs when adding new features.
**Example (Python OCP):**
```python
# Good: Using Strategy Pattern
from abc import ABC, abstractmethod

class PaymentStrategy(ABC):
    @abstractmethod
    def pay(self, amount):
        pass

class CreditCardPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with credit card")

class PayPalPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with PayPal")

class ShoppingCart:
    def __init__(self, payment_strategy: PaymentStrategy):
        self.payment_strategy = payment_strategy

    def checkout(self, amount):
        self.payment_strategy.pay(amount)

# Bad: Modifying the ShoppingCart class directly
class ShoppingCartBad:
    def checkout(self, amount, payment_method):
        if payment_method == "credit_card":
            print(f"Paying {amount} with credit card")
        elif payment_method == "paypal":
            print(f"Paying {amount} with PayPal")
        else:
            print("Invalid payment method")

cart = ShoppingCart(CreditCardPayment())
cart.checkout(100)
```
### 3.3 Liskov Substitution Principle (LSP)
Subtypes must be substitutable for their base types without altering the correctness of the program.
**Do This:**
* Ensure that subclasses correctly implement the behavior of their base classes.
* Avoid introducing unexpected side effects in subclasses.
**Don't Do This:**
* Create subclasses that violate the contract of their base classes.
**Why:** Prevents unexpected behavior and ensures that polymorphism works correctly.
**Example (violating Liskov Substitution):**
```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height

class Square(Rectangle):  # Violates LSP as Square's invariant is width == height
    def __init__(self, size):
        super().__init__(size, size)

    def set_width(self, width):
        self.width = width
        self.height = width

    def set_height(self, height):
        self.width = height
        self.height = height

def print_area(rectangle: Rectangle):
    rectangle.set_width(5)
    rectangle.set_height(4)
    print(rectangle.area())

rectangle = Rectangle(2, 3)
print_area(rectangle)  # Output: 20
square = Square(2)
print_area(square)  # Output: 16 (incorrect if we expect a standard rectangle behavior)
```
In this example, the "Square" class violates LSP because setting the width or height also sets the other dimension, which is not the behavior expected of a generic "Rectangle".
### 3.4 Interface Segregation Principle (ISP)
Clients should not be forced to depend upon interfaces that they do not use.
**Do This:**
* Create small, specific interfaces instead of large, general-purpose ones.
* Refactor interfaces to separate unrelated methods.
**Don't Do This:**
* Force classes to implement methods they don't need.
**Why:** Reduces dependencies and improves code flexibility.
**Example (Python ISP):**
```python
# Good: Separate interfaces for different functionalities
from abc import ABC, abstractmethod

class Printer(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

class Scanner(ABC):
    @abstractmethod
    def scan_document(self, document):
        pass

class Copier(ABC):
    @abstractmethod
    def copy_document(self, document):
        pass

# Bad: One large interface with all functionalities mixed
class MultiFunctionDevice(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

    @abstractmethod
    def scan_document(self, document):
        pass

    @abstractmethod
    def copy_document(self, document):
        pass

class SimplePrinter(Printer):
    def print_document(self, document):
        print(f"Printing {document}")

class AllInOnePrinter(Printer, Scanner, Copier):
    def print_document(self, document):
        print(f"Printing {document}")

    def scan_document(self, document):
        print(f"Scanning {document}")

    def copy_document(self, document):
        print(f"Copying {document}")
```
A client needing only printing should not depend on the "Scanner" or "Copier" methods.
### 3.5 Dependency Inversion Principle (DIP)
High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details. Details should depend on abstractions.
**Do This:**
* Use dependency injection to provide dependencies to components.
* Program to interfaces rather than concrete implementations.
**Don't Do This:**
* Hardcode dependencies within components.
**Why:** Increases code flexibility and testability.
**Example (Python DIP):**
```python
# Good: Using dependency injection
class Switchable:
    def turn_on(self):
        raise NotImplementedError

    def turn_off(self):
        raise NotImplementedError

class LightBulb(Switchable):
    def turn_on(self):
        print("LightBulb: turned on...")

    def turn_off(self):
        print("LightBulb: turned off...")

class ElectricPowerSwitch:
    def __init__(self, client: Switchable):
        self.client = client
        self.on = False

    def press(self):
        if self.on:
            self.client.turn_off()
            self.on = False
        else:
            self.client.turn_on()
            self.on = True

# Bad: Hardcoded dependency
class SwitchBad:
    def __init__(self):
        self.bulb = LightBulb()  # Concrete dependency = Bad
        self.on = False

    def press(self):
        if self.on:
            self.bulb.turn_off()
            self.on = False
        else:
            self.bulb.turn_on()
            self.on = True

bulb = LightBulb()
switch = ElectricPowerSwitch(bulb)  # Dependency Injection
switch.press()
switch.press()
```
## 4. Fly.io Specific Considerations
### 4.1 Using Fly.io Volumes
Components that require persistent storage should leverage Fly.io Volumes.
**Do This:**
* Mount volumes to specific directories in your Fly.io instances.
* Use volumes to store data that needs to persist across deployments.
**Don't Do This:**
* Store persistent data within the container's filesystem, risking data loss on restarts.
**Why:** Volumes provide reliable and persistent storage for your applications.
**Example (Fly.io Volume Configuration in "fly.toml"):**
```toml
app = "my-data-app"
primary_region = "ord"

[build]

[deploy]
  # The release command runs in a temporary Machine without volumes mounted,
  # so it cannot migrate data stored on the volume itself
  release_command = "/app/migrate_db.sh"

[[mounts]]
  source = "data_volume" # Existing volume name
  destination = "/data" # Where the volume is mounted

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
```
### 4.2 Fly.io Secrets Management
Securely manage sensitive information using Fly.io Secrets.
**Do This:**
* Store API keys, database passwords, and other sensitive data as Fly.io Secrets.
* Access secrets in your code using environment variables.
**Don't Do This:**
* Hardcode secrets in your code or configuration files.
* Commit secrets to your version control system.
**Why:** Protects sensitive data and prevents unauthorized access.
**Example (Accessing Fly.io Secret in Python):**
```python
import os

database_password = os.environ.get("DATABASE_PASSWORD")
if database_password is None:
    raise RuntimeError("DATABASE_PASSWORD secret is not set")
# Use the password to connect to the database; never print or log its value
print("Connecting to database...")
```
### 4.3 Fly.io Edge Network and Global Distribution
Leverage Fly.io's edge network for improved performance.
**Do This:**
* Configure your services to take full advantage of the Fly.io global network.
* Pin a component to a single region when consistency matters more than latency, accepting that trade-off.
**Don't Do This:**
* Ignore the latency implications of not using Fly.io's global network effectively.
**Why:** Reduced latency provides a better user experience.
## 5. Component Communication
### 5.1 REST APIs
Use REST APIs for synchronous communication between components.
**Do This:**
* Design REST APIs using standard HTTP methods and status codes.
* Use a consistent API versioning strategy.
* Implement proper authentication and authorization for API endpoints.
**Don't Do This:**
* Expose internal implementation details through the API.
* Create overly complex or inconsistent APIs.
**Why:** REST APIs are well-established and easy to understand, enabling interoperability.
### 5.2 Message Queues (e.g. Redis, NATS)
Use message queues for asynchronous communication between components.
**Do This:**
* Choose a message queue that fits your application's needs (e.g., Redis, RabbitMQ, NATS).
* Design message formats that are easy to serialize and deserialize.
* Implement error handling and retry mechanisms for message processing.
**Don't Do This:**
* Use message queues for synchronous operations that require immediate responses.
* Create overly complex messaging topologies.
**Why:** Message queues enable decoupling, asynchronous processing, and improved scalability. Fly.io makes it easy to deploy Redis and NATS alongside your application.
### 5.3 gRPC
Consider gRPC for high-performance communication between internal components.
**Do This:**
* Define gRPC services using Protocol Buffers.
* Generate code for both client and server using gRPC tools.
* Implement proper error handling and logging.
**Don't Do This:**
* Use gRPC for external APIs that need to be easily accessible to a wide range of clients.
* Overcomplicate gRPC service definitions.
**Why:** gRPC provides high performance, efficient serialization, and strong typing, although it typically requires more tooling and expertise than REST.
## 6. Testing
### 6.1 Unit Testing
Write unit tests for all components to verify their functionality in isolation.
**Do This:**
* Use a testing framework appropriate for your language (e.g., pytest for Python, JUnit for Java).
* Write tests that cover all possible code paths and edge cases.
* Use mocks and stubs to isolate components from their dependencies.
**Don't Do This:**
* Skip unit testing or write tests that are too superficial.
* Write tests that are tightly coupled to the implementation details of the tested components.
**Why:** Unit tests ensure that components function correctly and prevent regressions.
### 6.2 Integration Testing
Write integration tests to verify the interaction between different components.
**Do This:**
* Test the communication between components using real or simulated dependencies.
* Verify that data is correctly passed between components and that the overall system behaves as expected.
**Don't Do This:**
* Skip integration testing or write tests that are too narrow in scope.
* Rely solely on unit tests without verifying how components work together.
**Why:** Integration tests ensure that components work together correctly.
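To make "use mocks and stubs to isolate components" concrete, here is a sketch that tests the `ElectricPowerSwitch` class from the dependency-injection example with the standard library's `unittest.mock`, so no real `LightBulb` is needed. The class is repeated so the snippet runs on its own; the test names are illustrative, and pytest can collect the same `TestCase` unchanged.

```python
import unittest
from unittest.mock import Mock


class ElectricPowerSwitch:
    """Same shape as the dependency-injection example: the client is injected."""

    def __init__(self, client):
        self.client = client
        self.on = False

    def press(self):
        if self.on:
            self.client.turn_off()
            self.on = False
        else:
            self.client.turn_on()
            self.on = True


class ElectricPowerSwitchTest(unittest.TestCase):
    def make_client(self):
        # spec limits the mock to the Switchable interface, so a typo such as
        # client.turn_onn() raises AttributeError instead of passing silently
        return Mock(spec=["turn_on", "turn_off"])

    def test_first_press_turns_client_on(self):
        client = self.make_client()
        switch = ElectricPowerSwitch(client)
        switch.press()
        client.turn_on.assert_called_once_with()
        client.turn_off.assert_not_called()
        self.assertTrue(switch.on)

    def test_second_press_turns_client_off(self):
        client = self.make_client()
        switch = ElectricPowerSwitch(client)
        switch.press()
        switch.press()
        client.turn_off.assert_called_once_with()
        self.assertFalse(switch.on)
```

Run it with `python -m unittest` or `pytest`. Because the switch only depends on the injected interface, the test never touches real hardware or I/O, which is exactly what the hardcoded `SwitchBad` would prevent.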
### 6.3 End-to-End Testing
Write end-to-end tests to verify the entire application flow from the user interface to the backend.
**Do This:**
* Use a testing framework that simulates user interactions (e.g., Selenium, Cypress).
* Test the entire application flow from the user interface to the backend.
* Verify that the application meets the user's requirements.
**Don't Do This:**
* Skip end-to-end testing or write tests that are too complex and brittle.
* Rely solely on unit and integration tests without verifying the end-to-end user experience.
**Why:** End-to-end tests ensure that the application meets the user's requirements and provides a good user experience.
## 7. Monitoring and Logging
### 7.1 Centralized Logging
Use a centralized logging system to collect and analyze logs from all components.
**Do This:**
* Use a logging framework appropriate for your language (e.g., log4j for Java, logging for Python).
* Configure components to log all important events, including errors, warnings, and informational messages.
* Use a log aggregation system such as Grafana Loki.
**Don't Do This:**
* Skip logging or rely solely on local log files.
* Log sensitive data such as passwords or API keys.
**Why:** Centralized logging enables easier troubleshooting, performance monitoring, and security analysis.
### 7.2 Metrics Collection
Collect metrics from all components to monitor their performance and resource usage.
**Do This:**
* Use a metrics library appropriate for your language (e.g., Prometheus client libraries).
* Collect metrics such as CPU usage, memory usage, network traffic, and request latency.
* Use Prometheus to collect and query metrics, and Grafana to visualize and analyze them.
**Don't Do This:**
* Skip metrics collection or collect only a limited set of metrics.
* Use metrics that are not meaningful or actionable.
**Why:** Metrics provide valuable insights into the health and performance of your components.
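The logging rules (log important events in a form an aggregator can parse, never log secrets) can be enforced in one place. The following is a minimal sketch using only Python's standard `logging` module. It assumes logs go to stdout as one JSON object per line, since Fly.io captures app stdout and a shipper can forward it to a system such as Loki. The `SENSITIVE_KEYS` set and the `JsonRedactingFormatter` and `build_logger` names are illustrative choices, not a standard API.

```python
import json
import logging
import sys

# Hypothetical list of field names whose values must never reach the logs
SENSITIVE_KEYS = {"password", "api_key", "authorization", "database_url"}


class JsonRedactingFormatter(logging.Formatter):
    """Format records as one JSON object per line, masking sensitive fields."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured context is passed with logging's extra={"context": {...}}
        context = getattr(record, "context", {})
        for key, value in context.items():
            entry[key] = "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        return json.dumps(entry)


def build_logger(name, stream=sys.stdout):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonRedactingFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers if called twice
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("billing")
log.info("charge created", extra={"context": {"user_id": 42, "api_key": "sk_live_123"}})
# prints: {"level": "INFO", "logger": "billing", "message": "charge created", "user_id": 42, "api_key": "[REDACTED]"}
```

Redacting in the formatter, rather than at every call site, means a developer who accidentally passes a secret as context still does not leak it.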
### 7.3 Tracing
Implement distributed tracing to track requests as they flow through different components.
**Do This:**
* Use a tracing library such as OpenTelemetry; Jaeger and Zipkin can serve as backends for its traces.
* Instrument code to generate spans for each request as it enters and exits a component.
* Use a tracing backend to collect and visualize traces.
**Don't Do This:**
* Skip tracing or trace only a limited set of requests.
* Create traces that are too granular or lack context.
**Why:** Tracing enables you to identify performance bottlenecks and diagnose issues in distributed systems. Fly.io works well with properly configured tracing setups.
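To show what "generate spans for each request" involves, here is a dependency-free sketch of span nesting and cross-service propagation via the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). It is an illustration only: `span`, `traceparent`, `parse_traceparent`, and `FINISHED_SPANS` are hypothetical names, and a real deployment would use the OpenTelemetry SDK, which implements this and exports spans to Jaeger or Zipkin.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

_current_span = contextvars.ContextVar("current_span", default=None)
FINISHED_SPANS = []  # stand-in for an exporter that ships spans to a backend


@contextmanager
def span(name, remote_parent=None):
    """Open a span; spans opened inside it become its children.

    remote_parent is a (trace_id, span_id) pair parsed from an incoming
    traceparent header, used when the caller lives in another service.
    """
    parent = _current_span.get()
    if parent is not None:
        trace_id, parent_id = parent["trace_id"], parent["span_id"]
    elif remote_parent is not None:
        trace_id, parent_id = remote_parent
    else:
        trace_id, parent_id = uuid.uuid4().hex, None  # new 32-hex-char trace ID
    record = {
        "name": name,
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],  # 16 hex chars, as Trace Context requires
        "parent_id": parent_id,
    }
    start = time.perf_counter()
    token = _current_span.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _current_span.reset(token)
        FINISHED_SPANS.append(record)


def traceparent(record):
    """Build the traceparent header to send with outgoing requests (01 = sampled)."""
    return f"00-{record['trace_id']}-{record['span_id']}-01"


def parse_traceparent(header):
    _version, trace_id, span_id, _flags = header.split("-")
    return trace_id, span_id


# Service A handles a request, queries its database, then calls service B
with span("GET /orders") as root:
    with span("db.query"):
        time.sleep(0.01)
    outgoing = traceparent(root)

# Service B continues the same trace from the incoming header
with span("GET /inventory", remote_parent=parse_traceparent(outgoing)) as child:
    pass
```

The key property is that every span in both services shares one `trace_id`, and each `parent_id` points at the span that caused it; that is what lets a tracing backend draw the request as a single tree.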
# Core Architecture Standards for Fly.io
This document outlines the core architectural standards for developing applications on Fly.io. Adhering to these standards will result in more maintainable, performant, and secure applications. It focuses on principles and patterns particularly relevant to Fly.io's distributed, edge-based architecture.
## 1. Fundamental Architectural Patterns
### 1.1. Microservices Architecture
**Standard:** Favor a microservices architecture for complex applications.
* **Do This:** Decompose large monolithic applications into smaller, independent services with well-defined APIs. Each service should own its data.
* **Don't Do This:** Create a single, monolithic codebase for large applications.
**Why:** Microservices promote modularity, independent scaling, and faster development cycles. Each service can be deployed and scaled independently, which aligns well with Fly.io's global distribution.
**Fly.io Considerations:**
* Use Fly.io Regions effectively. Deploy services to regions close to your users for low latency.
* Use Fly.io's internal DNS (e.g., "service-b.internal") for service discovery and communication.
**Example:**
"""toml
# fly.toml for service A
app = "service-a"
primary_region = "iad"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
"""
"""toml
# fly.toml for service B
app = "service-b"
primary_region = "lhr"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
"""
**Anti-pattern:** Tightly coupled microservices that defeat independent deployment and scaling.
### 1.2. Event-Driven Architecture
**Standard:** Use event-driven architecture for asynchronous communication between services.
* **Do This:** Use message queues (e.g., Kafka, RabbitMQ, Redis Streams) to decouple services and enable resilient communication. Apply the Saga pattern where necessary.
* **Don't Do This:** Rely on synchronous HTTP calls for all inter-service communication.
**Why:** Event-driven architectures enhance scalability and fault tolerance. Fly.io's globally distributed nature benefits from asynchronous communication, which minimizes the impact of network latency and temporary outages.
**Fly.io Considerations:**
* Run message brokers as Fly.io apps, using the global network for distribution.
* Consider using Fly.io Volumes for persistent storage of message queues.
**Example:** (Using Redis Streams)
"""python
# Service A (producer)
import os

import redis

# REDIS_HOST is typically the Redis app's private address, e.g. "my-redis.internal"
redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"


def publish_event(user_id, event_type):
    # XADD appends an entry with an auto-generated ID to the stream
    r.xadd(stream_name, {"user_id": user_id, "event_type": event_type})


publish_event("123", "user_created")
"""
"""python
# Service B (consumer)
import os

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"

last_id = "$"  # "$" means: only read entries added after the consumer starts
while True:
    # Block for up to 1 second (1000 ms) waiting for new entries
    response = r.xread({stream_name: last_id}, block=1000)
    if response:
        stream, messages = response[0]
        for message_id, data in messages:
            print(f"Received event: {data}")
            last_id = message_id  # continue after the last entry seen
"""
**Anti-pattern:** Implementing complex distributed transactions with synchronous calls across multiple services.
### 1.3. Serverless Functions
**Standard:** Use serverless functions for event-driven and background processing tasks.
* **Do This:** Use serverless functions for asynchronous tasks, lightweight API endpoints, and event-driven triggers.
* **Don't Do This:** Use serverless functions for long-running processes or stateful services.
**Why:** Serverless functions scale automatically and are billed only for actual usage, optimizing resource utilization.
**Fly.io Considerations:**
* Fly.io does not offer a pure serverless (functions-as-a-service) product. Instead, consider lightweight Fly Machines orchestrated by an external event source, or a framework designed for fast-scaling workloads on Fly.io.
* Be mindful of cold starts, and keep function startup and execution time short.
**Example:** (Simulated serverless-style function with Fly Machines and a Redis queue)
"""python
# Processing function (deployed as a Fly Machine)
import os
import time

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"


def process_item(item):
    print(f"Processing item: {item}")
    time.sleep(2)  # Simulate processing time
    print(f"Item processed: {item}")


while True:
    # Wait up to 10 seconds for an item; returns None on timeout
    item = r.blpop(queue_name, timeout=10)
    if item:
        _, data = item
        process_item(data.decode("utf-8"))
"""
"""python
# Enqueueing script (deployed as another Fly Machine or run externally)
import os

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"

for i in range(5):
    r.rpush(queue_name, f"Item-{i}")  # RPUSH + BLPOP gives first-in, first-out order
    print(f"Enqueued Item-{i}")
"""
**Anti-pattern:** Using serverless functions for tasks that require significant persistent storage or are inherently stateful.
## 2. Project Structure and Organization
### 2.1. Monorepo vs. Polyrepo
**Standard:** For most projects on Fly.io, especially those involving microservices, prefer a polyrepo structure unless there's a strong reason for a monorepo.
* **Do This:** Keep each microservice in its own repository.
* **Don't Do This:** Force all microservices into one giant monorepo without carefully considering dependencies and build pipelines.
**Why:** Polyrepos offer better isolation between services, independent versioning, and clear ownership. This suits Fly.io's model of independent deployments.
**Fly.io Considerations:**
* Each repository maps directly to a Fly.io app.
* Use CI/CD pipelines to automate deployments from each repository to Fly.io.
**Alternative:** If a monorepo is chosen (e.g., for shared libraries), proper tooling and processes are crucial.
**Example:**
* "repository: service-a" (maps to "app = "service-a"" in "fly.toml")
* "repository: service-b" (maps to "app = "service-b"" in "fly.toml")
**Anti-pattern:** Unnecessarily large monorepos that create complex build dependencies and slow down deployments.
### 2.2. Standard Directory Structure
**Standard:** Define a consistent directory structure within each service repository.
* **Do This:**
  * "src/": Source code
  * "config/": Configuration files (including "fly.toml")
  * "tests/": Unit and integration tests
  * "deploy/": Deployment scripts and configurations
  * Versioning and changelog: keep versioning consistent across all services and commit frequently.
* **Don't Do This:** Scatter files randomly throughout the repository without a clear organization.
**Why:** A well-defined directory structure improves code discoverability and maintainability.
**Example:**
"""
service-a/
├── src/
│   ├── main.py
│   ├── utils.py
│   └── ...
├── config/
│   ├── fly.toml
│   └── settings.py
├── tests/
│   ├── test_main.py
│   └── ...
├── deploy/
│   └── Dockerfile
└── README.md
"""
**Anti-pattern:** A flat or deeply nested directory structure that makes it difficult to locate specific files.
### 2.3. Configuration Management
**Standard:** Externalize configuration using environment variables, and use Fly.io secrets for sensitive data.
* **Do This:** Store configuration parameters in environment variables.
  Use "fly secrets" to manage sensitive information (API keys, database passwords). Use ".env" files only for local development, and keep them out of version control.
* **Don't Do This:** Hardcode configuration values directly in your codebase, or commit sensitive data to your repository.
**Why:** Externalized configuration enhances security and simplifies deployments across different environments.
**Fly.io Considerations:**
* Use "fly secrets" to set secrets that are securely injected into your Fly.io apps as environment variables.
* Use Fly Volumes for persistent storage if the configuration needs to be dynamically updated.
**Example:**
"""bash
# Setting a secret
fly secrets set API_KEY="your_api_key"
"""
"""python
# Accessing the secret in your code (Python)
import os

api_key = os.environ.get("API_KEY")
if api_key:
    print("API key loaded.")  # never print the key itself
else:
    print("API key not found.")
"""
**Anti-pattern:** Storing passwords or API keys directly in the codebase or committing them to version control.
## 3. Deployment and CI/CD
### 3.1. Automated Deployments
**Standard:** Implement automated CI/CD pipelines for deploying changes to Fly.io.
* **Do This:** Use GitHub Actions, GitLab CI, or similar tools to trigger deployments on code changes.
* **Don't Do This:** Deploy code changes to Fly.io manually (except perhaps for initial setup or testing).
**Why:** Automated deployments ensure consistency and reduce the risk of human error.
**Fly.io Considerations:**
* Use the "flyctl deploy" CLI command in your CI/CD pipelines.
* Use Fly.io's built-in health checks for zero-downtime deployments.
**Example:** (GitHub Actions)
"""yaml
# .github/workflows/deploy.yml
name: Deploy to Fly.io
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Deploy to Fly.io
        run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
"""
**Anti-pattern:** Manual deployments that are error-prone and impossible to reproduce consistently.
### 3.2. Immutable Infrastructure
**Standard:** Treat infrastructure as immutable. Deploy new versions of your application instead of modifying existing instances in place.
* **Do This:** Use Docker containers and "flyctl deploy" to create new application instances. Use Fly Machines for fine-grained control.
* **Don't Do This:** SSH into running instances and make manual changes.
**Why:** Immutable infrastructure ensures consistency and simplifies rollback procedures.
**Fly.io Considerations:**
* Fly.io encourages immutable deployments using Docker images.
* Rollbacks are quick because the images of previous releases remain available and can be redeployed.
**Example:**
"""dockerfile
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so this layer stays cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
"""
**Anti-pattern:** Modifying running instances directly, leading to configuration drift and inconsistencies.
### 3.3. Health Checks and Monitoring
**Standard:** Implement health checks and monitoring to detect and recover from failures.
* **Do This:** Define health check endpoints in your applications. Use Fly.io's built-in health checks so that traffic is routed away from unhealthy instances. Monitor application metrics using Prometheus, Grafana, or similar tools.
* **Don't Do This:** Rely solely on manual observation to identify and resolve issues.
**Why:** Health checks and monitoring ensure that your application is running as expected and that problems are detected and addressed quickly.
**Fly.io Considerations:**
* Configure health checks in your "fly.toml" file.
* Integrate with monitoring services to track application performance and resource utilization.
**Example:**
"""toml
# fly.toml
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

  [[http_service.checks]]
    path = "/health"
    interval = "10s"
    timeout = "2s"
    grace_period = "5s"
"""
**Anti-pattern:** No monitoring, so that even basic restarts require manual intervention.
## 4. Data Management and Persistence
### 4.1. Database Choice
**Standard:** Choose the right database for your application's needs.
* **Do This:** Consider PostgreSQL for relational data, Redis for caching and real-time data, and object storage for files.
* **Don't Do This:** Use a single database for all use cases without considering performance and scalability requirements.
**Why:** Choosing the right database improves performance and reduces complexity.
**Fly.io Considerations:**
* Fly.io offers managed PostgreSQL and Redis databases.
* Use Fly.io Volumes for persistent storage of database data.
**Example:**
"""toml
# fly.toml for a PostgreSQL app
app = "my-postgres-app"
primary_region = "iad"

[build]
  image = "postgres:14"

[env]
  POSTGRES_USER = "your_user"
  POSTGRES_DB = "your_db"
  # Use a subdirectory: a freshly formatted volume contains "lost+found"
  PGDATA = "/var/lib/postgresql/data/pgdata"
  # Never put the password in [env]; set it as a secret instead:
  #   fly secrets set POSTGRES_PASSWORD="your_password"

[[mounts]]
  source = "pg_data"
  destination = "/var/lib/postgresql/data"
"""
**Anti-pattern:** Using a relational database for high-velocity, unstructured data, or ignoring geographic data locality.
### 4.2. Data Locality and Replication
**Standard:** Consider data locality and replication for optimal performance and availability.
* **Do This:** Deploy your database close to your application servers. Use database replication to ensure data availability across different regions. Use Fly.io Regions to place both.
* **Don't Do This:** Store all data in a single region without considering latency or disaster recovery.
**Why:** Data locality minimizes latency and improves application performance. Replication protects against data loss and ensures high availability.
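A common way to combine regional read replicas with a single write primary on Fly.io is to serve reads locally and forward writes to the primary region. Below is a minimal sketch under stated assumptions: the Fly.io-provided `FLY_REGION` variable, a `PRIMARY_REGION` variable matching "primary_region" in "fly.toml", and Fly.io's "fly-replay" response header, which asks the proxy to replay the request in another region. The function name `replay_headers_for` is hypothetical; verify the header semantics against current Fly.io documentation before relying on them.

```python
import os

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def replay_headers_for(method, env=os.environ):
    """Return extra response headers for this request, or {} to handle it here.

    Reads are served locally (e.g. from the nearest read replica). Writes
    that arrive outside the primary region get a "fly-replay" header so the
    proxy re-runs them where the writable primary database lives.
    """
    current = env.get("FLY_REGION")
    primary = env.get("PRIMARY_REGION")
    if method.upper() not in WRITE_METHODS:
        return {}
    if not current or not primary or current == primary:
        return {}  # already in the primary region, or running locally
    return {"fly-replay": f"region={primary}"}
```

In a web framework, check this before executing a write: if the result is non-empty, return an empty response carrying these headers instead of touching the read-only replica.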
**Fly.io Considerations:**
* Use Fly.io Regions to deploy your database and application servers in the same geographic location.
* Configure database replication to replicate data across multiple regions.
**Example:** Configure PostgreSQL replication across multiple Fly.io regions. (Setting up streaming replication is outside the scope of this document.)
**Anti-pattern:** Fetching all data from a single, distant region instead of creating regional read replicas.
### 4.3. Database Migrations
**Standard:** Use database migrations to manage schema changes.
* **Do This:** Use a database migration tool (e.g., Alembic, Flyway) to manage schema changes in a controlled and repeatable manner.
* **Don't Do This:** Make manual schema changes directly in your database.
**Why:** Database migrations ensure that schema changes are applied consistently across different environments and simplify rollback procedures.
**Fly.io Considerations:**
* Include database migrations as part of your CI/CD pipeline, for example by running them through "release_command" in "fly.toml".
* Ship migration scripts inside your application image and keep them in version control; the release command runs without access to volumes.
## 5. Security Best Practices
### 5.1. Least Privilege
**Standard:** Follow the principle of least privilege.
* **Do This:** Grant only the necessary permissions to users and services. Avoid using root or admin accounts unless absolutely necessary. Use environment-specific service accounts with limited scope.
* **Don't Do This:** Grant excessive permissions that could be exploited by attackers.
**Why:** The principle of least privilege limits the impact of security breaches.
**Fly.io Considerations:**
* Use Fly.io's built-in security features, such as private networking for internal services, to restrict access to your applications and data.
* Use environment variables populated from Fly.io secrets to supply credentials instead of hardcoding them in your code.
### 5.2. Input Validation and Output Encoding
**Standard:** Validate all user inputs and encode outputs to prevent security vulnerabilities.
* **Do This:** Validate input and use parameterized queries to prevent SQL injection, cross-site scripting (XSS), and other attacks.
  Encode outputs to prevent XSS vulnerabilities when displaying user-generated content.
* **Don't Do This:** Trust user input blindly or display user-generated content without proper encoding.
**Why:** Input validation and output encoding prevent common security vulnerabilities.
### 5.3. Dependency Management
**Standard:** Manage your application's dependencies carefully.
* **Do This:** Use a dependency management tool (e.g., pip, npm, Maven) to track and manage your application's dependencies. Regularly update dependencies to patch security vulnerabilities. Scan dependencies for known vulnerabilities using tools like "npm audit" or "pip-audit" ("pip check" only verifies that installed requirements are compatible).
* **Don't Do This:** Use outdated or unmaintained dependencies.
**Why:** Dependency management helps prevent security vulnerabilities and ensures that your application includes the latest security patches.
**Fly.io Considerations:**
* Pin dependencies and use a lockfile to ensure repeatable deployments.
* Regularly rebuild Docker images to pick up security patches in base images.
## 6. Performance Optimization
### 6.1. Caching
**Standard:** Implement caching to improve application performance.
* **Do This:** Cache frequently accessed data in memory or on disk. Use a cache store such as Redis or Memcached to simplify caching and share it between instances. Use Fly.io Regions for geographically distributed caching.
* **Don't Do This:** Cache sensitive data or data that changes frequently.
**Why:** Caching reduces database load and improves response times.
**Fly.io Considerations:**
* Use Fly.io's Redis add-on for a managed Redis cache.
* Configure HTTP caching headers so that CDNs can cache static assets.
### 6.2. Connection Pooling
**Standard:** Use connection pooling to reduce the overhead of creating database connections.
* **Do This:** Use a connection pooling library to manage database connections efficiently. Configure the connection pool size based on your application's workload.
* **Don't Do This:** Create a new database connection for every request.
**Why:** Connection pooling reduces database load and improves response times.
### 6.3. Asynchronous Operations
**Standard:** Use asynchronous operations to improve application responsiveness.
* **Do This:** Perform long-running operations in background tasks. Use a task queue (e.g., Celery with a RabbitMQ or Redis broker) to manage them.
* **Don't Do This:** Block the main thread with long-running operations.
**Why:** Asynchronous operations keep the application responsive while long-running work completes in the background.
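The asynchronous-operations standard can be sketched with the standard library alone: a request handler enqueues work and answers immediately, while background workers drain the queue. This is an in-process illustration with hypothetical names (`handle_request`, `slow_job`, `worker`). Across several Fly.io instances, a shared task queue such as Celery would replace the in-memory `asyncio.Queue` so that jobs survive restarts.

```python
import asyncio


async def slow_job(payload, results):
    await asyncio.sleep(0.05)  # stands in for I/O-bound work (email, API call)
    results.append(f"done:{payload}")


async def worker(queue, results):
    while True:
        payload = await queue.get()
        try:
            await slow_job(payload, results)
        finally:
            queue.task_done()  # mark the item finished even if the job failed


async def handle_request(queue, payload):
    # Enqueue and answer right away instead of awaiting slow_job inline
    await queue.put(payload)
    return {"status": "accepted", "payload": payload}


async def main():
    queue = asyncio.Queue()
    results = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(2)]
    responses = [await handle_request(queue, i) for i in range(4)]
    await queue.join()  # wait for background work (only needed in this demo)
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return responses, results


responses, results = asyncio.run(main())
```

Each "request" returns an "accepted" response without waiting for its job; the caller can later poll for the result or be notified, which is the same contract a Celery-backed endpoint would offer.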
# State Management Standards for Fly.io
This document outlines coding standards and best practices for managing application state within Fly.io applications. It provides guidance on data flow, reactivity, and state management options specific to the Fly.io environment. These standards aim to improve the maintainability, performance, and scalability of your deployments.
## 1. Introduction to State Management on Fly.io
Effective state management is crucial for building robust and scalable applications on Fly.io. Fly.io's distributed architecture presents unique challenges and opportunities for managing state, requiring careful consideration of data consistency, latency, and resilience. This guide covers approaches ranging from simple in-memory state to distributed databases and caching strategies.
## 2. Choosing the Right State Management Approach
Selecting the appropriate state management solution is a critical architectural decision influenced by application requirements, data volume, consistency needs, and performance goals.
### 2.1. Factors to Consider
* **Data Consistency:** Determine the required consistency level (e.g., eventual consistency, strong consistency). Strong consistency typically involves more complex setups and potential performance trade-offs, but it is necessary for data whose correctness is critical, such as financial records.
* **Data Volume:** Consider the volume of data that needs to be stored and managed. Small amounts of session data may be handled effectively in memory, while large datasets require database solutions.
* **Latency Requirements:** Analyze latency constraints based on the application's user experience needs. Caching and geographically distributed data stores can minimize latency.
* **Scalability:** Choose solutions that can scale horizontally to handle increasing traffic and data volume. Stateless application components coupled with externalized, scalable state management solutions are generally preferred.
* **Complexity:** Balance the need for sophisticated state management with the overhead of implementation and maintenance. Start with simpler solutions and only introduce more complex tools when necessary.
### 2.2. State Management Options
* **In-Memory State:** Suitable for small amounts of ephemeral data (e.g., temporary UI state) or data that can be easily regenerated. Do not rely solely on in-memory state for critical data, because instances can be terminated or restarted.
  * *When to Use:* Transient, non-critical UI state; caching frequently accessed but non-essential data.
  * *Example:* A simple counter application.
* **Fly.io Volumes:** Persistent storage within a region. Each volume is attached to a single Machine in one region.
  * *When to Use:* Stateful applications within a specific region that need persistent storage. Ideal for databases where regional locality is desired. Can be combined with replicated databases for higher availability.
  * *Example:* A Postgres data directory on a dedicated Machine within a region.
* **Fly.io Postgres:** Postgres clusters on Fly.io that can be replicated across regions. Provides automated backups, scaling, and fault tolerance. Ideal for transactional data with standard SQL semantics.
  * *When to Use:* Applications requiring standard SQL functionality with ACID properties, automated backups, and scaling.
  * *Example:* Storing user data, product catalogs, and order information.
* **Key-Value Stores (Redis, Memcached):** Fast, in-memory data stores suitable for caching and session management. Generally provide eventual consistency.
  * *When to Use:* Caching frequently accessed data, managing user sessions, rate limiting.
  * *Example:* Caching API responses, session data for authenticated users.
* **Distributed Databases (CockroachDB, YugabyteDB):** Distributed SQL databases providing strong consistency, fault tolerance, and scalability.
  * *When to Use:* Applications requiring strong consistency, high availability, and global distribution of data.
  * *Example:* Financial transactions, inventory management, global user profiles.
* **Object Storage (AWS S3, Google Cloud Storage):** Stores large unstructured data such as images, videos, and backups.
  * *When to Use:* Storing static assets, large media files, and backups.
  * *Example:* User-uploaded photos, video content, database backups.
## 3. State Management Standards
The following standards apply to all state management solutions used within Fly.io applications.
### 3.1. General Principles
* **Do This:** Externalize all persistent application state. Avoid storing critical data solely within application instances.
  * **Why:** Fly.io instances are ephemeral and can be restarted or relocated. Data stored only in memory will be lost.
* **Don't Do This:** Rely on local file storage within the Machine for important data, unless you use a Fly.io Volume and tying the data to a single region is acceptable.
  * **Why:** Instance failures or relocation will result in data loss.
* **Do This:** Favor stateless application components whenever possible.
  * **Why:** Simplifies scaling, deployment, and recovery in a distributed environment.
### 3.2. Configuration and Secrets
* **Do This:** Store configuration and secrets using Fly.io's secrets management.
  * **Why:** Secrets are securely injected as environment variables at runtime, avoiding hardcoding.
* **Don't Do This:** Commit secrets directly to source code or include them in Docker images.
  * **Why:** Compromises security and violates best practices.
* **Example:** Setting a database password as a Fly.io secret.
"""bash
fly secrets set DATABASE_PASSWORD=your_secret_password
"""
Accessing it in the application (Node.js):
"""javascript
const dbPassword = process.env.DATABASE_PASSWORD;
"""
### 3.3. Database Connections
* **Do This:** Use connection pooling to efficiently manage database connections.
  * **Why:** Reduces connection overhead and improves application performance by reusing existing connections.
* **Do This:** Set appropriate connection timeouts to prevent resource exhaustion.
  * **Why:** Avoids connections being held open indefinitely, especially during network issues.
* **Do This:** Use environment variables to configure database connection strings.
  * **Why:** Allows dynamic configuration based on the environment (development, staging, production).
* **Example:** Connecting to Fly.io Postgres with connection pooling (Node.js with "pg"):
"""javascript
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Max number of clients in the pool
  idleTimeoutMillis: 30000, // Close idle clients after 30 seconds
  connectionTimeoutMillis: 2000, // Return an error after 2 seconds if a connection could not be established
});

module.exports = {
  query: (text, params) => pool.query(text, params),
};

// Example usage:
async function fetchData() {
  const { rows } = await pool.query('SELECT NOW()');
  console.log(rows[0]);
}
"""
### 3.4. Caching
* **Do This:** Implement caching for frequently accessed data to reduce database load and improve response times.
  * **Why:** Caching minimizes latency and improves application performance by serving data from memory.
* **Do This:** Use appropriate cache invalidation strategies to ensure data consistency.
  * **Why:** Avoids serving stale data to users. Implement time-based expiration (TTL) or event-based invalidation.
* **Do This:** Consider using a distributed cache like Redis or Memcached for shared caching across multiple application instances.
  * **Why:** Provides a centralized cache that can be accessed by all instances.
* **Example:** Using Redis for caching (Node.js with "ioredis"):
"""javascript
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL); // Connect to Redis

async function getCachedData(key, fetchData) {
  const cachedData = await redis.get(key);
  if (cachedData) {
    return JSON.parse(cachedData);
  }
  const data = await fetchData(); // Cache miss: fetch data from the source
  await redis.set(key, JSON.stringify(data), 'EX', 3600); // Cache for 1 hour (3600 seconds)
  return data;
}

// Example usage:
async function fetchUserData(userId) {
  // Logic to fetch user data from the database
  return { id: userId, name: 'John Doe' };
}

async function getUser(userId) {
  const cacheKey = `user:${userId}`; // template literal so userId is interpolated
  const userData = await getCachedData(cacheKey, () => fetchUserData(userId));
  console.log(userData);
}
"""
### 3.5. Session Management
* **Do This:** Store session data in a reliable external data store (e.g., Redis, database).
  * **Why:** Ensures session persistence across instance restarts and scaling events.
* **Do This:** Use secure session cookies with appropriate attributes (e.g., "HttpOnly", "Secure", "SameSite").
  * **Why:** "HttpOnly" keeps the cookie out of reach of client-side scripts, limiting the damage of cross-site scripting (XSS); "SameSite" mitigates cross-site request forgery (CSRF); "Secure" restricts the cookie to HTTPS.
* **Do This:** Implement session expiration and regular session cleanup to prevent resource exhaustion.
  * **Why:** Prevents accumulation of orphaned session data.
* **Example:** Express.js session configuration using Redis (Node.js with "connect-redis" v7 and "express-session"):
"""javascript
const express = require('express');
const session = require('express-session');
const RedisStore = require('connect-redis').default; // connect-redis v7 export style
const Redis = require('ioredis');

const app = express();
const redisClient = new Redis(process.env.REDIS_URL);

// TLS terminates at the Fly.io proxy, so trust it; otherwise Express sees
// plain HTTP and will not set cookies marked "secure"
app.set('trust proxy', 1);

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET, // set with "fly secrets set"
  resave: false,
  saveUninitialized: false,
  cookie: {
    secure: process.env.NODE_ENV === 'production', // Only send over HTTPS in production
    httpOnly: true, // Prevent client-side JavaScript access
    sameSite: 'strict', // Mitigate CSRF attacks
    maxAge: 24 * 60 * 60 * 1000, // Session expires after 24 hours
  },
}));
"""
### 3.6. Data Replication and Distribution
* **Do This:** Consider using data replication or distribution strategies to improve availability and reduce latency for geographically distributed users.
  * **Why:** Provides redundancy and faster access to data by placing it closer to users.
* **Do This:** Be cautious with eventual consistency. Always handle conflict resolution and data reconciliation properly.
* **Fly.io Postgres:** Use multi-region Postgres clusters, with read replicas in additional regions, for data replication and failover.
### 3.7. Monitoring and Logging
* **Do This:** Implement comprehensive monitoring and logging to track state management performance and identify potential issues.
  * **Why:** Allows proactive identification and resolution of problems.
* **Do This:** Log relevant state transitions and errors to facilitate debugging.
  * **Why:** Provides insight into application behavior and helps diagnose root causes of issues.
* **Do This:** Monitor database connection pool usage, cache hit rates, and other key metrics.
  * **Why:** Provides early warnings of performance bottlenecks or resource exhaustion.
## 4. Technology-Specific State Management
### 4.1. Remix
Remix handles data loading and mutations through Actions and Loaders.
Use these mechanisms for Fly.io-specific concerns, such as where session data is stored.
* **Do This:** Use "getSession" and "commitSession" to manage user sessions. The example below uses cookie-backed storage ("createCookieSessionStorage"), which keeps the session data in the cookie itself; for sessions backed by a database or Redis, build the storage with "createSessionStorage" and your own create, read, update, and delete functions.
"""javascript
// Session management example using Remix:
import { createCookieSessionStorage } from "@remix-run/node"; // or cloudflare/deno

const { getSession, commitSession, destroySession } = createCookieSessionStorage({
  cookie: {
    name: "__session",
    httpOnly: true,
    path: "/",
    sameSite: "lax",
    secrets: [process.env.SESSION_SECRET], // Never hardcode cookie-signing secrets
    secure: process.env.NODE_ENV === "production",
  },
});

export { getSession, commitSession, destroySession };
"""
* **Do This:** For Remix applications, consider using Fly.io Volumes for persistent storage where regional performance is desired.
* **Don't Do This:** Directly manipulate "localStorage" or "sessionStorage" for critical application state within Remix; this data is client-side only and is not persisted across devices and browsers.
### 4.2. Next.js
Next.js offers various options for state management, ranging from built-in solutions to third-party libraries.
* **Do This:** For global client-side state, use the Context API with "useReducer" or state management libraries like Zustand or Jotai. These hooks only run in Client Components (files marked with the "use client" directive), so keep them out of Server Components and pass server data down as props.
* **Do This:** If you are using the Next.js App Router, consider using Server Actions for data mutations, which allow you to execute server-side code directly from your components. Data persistence should still be handled with external databases or storage solutions.
"""typescript
// actions.ts: Server Action for submitting a form
'use server';

import { revalidatePath } from 'next/cache';
import { redirect } from 'next/navigation';

export async function createInvoice(formData: FormData) {
  const rawFormData = {
    customerId: formData.get('customerId'),
    amount: formData.get('amount'),
    status: formData.get('status'),
  };
  // Persist the data to a database
  await createInvoiceInDb(rawFormData); // Placeholder: replace with your DB persistence logic
  revalidatePath('/dashboard/invoices'); // Optional: revalidate cached data for this path after the mutation
  redirect('/dashboard/invoices'); // Optional: redirect the user to another page
}

// page.tsx (same directory): a Server Component can pass the action straight to <form>
import { createInvoice } from './actions';

export default function Page() {
  return (
    <form action={createInvoice}>
      {/* Form fields */}
      <button type="submit">Create Invoice</button>
    </form>
  );
}
"""
To show the action's result in a Client Component, wrap it with "useActionState" (named "useFormState" in "react-dom" before React 19); the action then receives the previous state as its first argument and the form data as its second.
* **Don't Do This:** Rely exclusively on "getServerSideProps" (Pages Router) for handling all dynamic data, especially if the data isn't required for the initial page render. Fetching everything on every request delays the response and can hurt performance.
### 4.3. General State Management Libraries (Redux, Zustand, Jotai)
* **Do This:** Centralize state updates with reducers or update functions.
* **Do This:** Use asynchronous actions or middleware (e.g., Redux Thunk, Redux Saga) for handling data fetching and other side effects.
* **Do This:** Optimize state updates to prevent unnecessary re-renders. Use selectors or memoization techniques to derive state from the global store.
## 5. Anti-Patterns
* **Over-Reliance on Global State:** Avoid storing unnecessary data in global state, which can lead to performance issues and make debugging difficult.
* **Ignoring Concurrency Issues:** Be mindful of concurrency issues when updating shared state, especially in a distributed environment. Use appropriate locking mechanisms or optimistic concurrency control.
* **Lack of Monitoring:** Failing to monitor state management performance can lead to undetected issues and performance bottlenecks.
## 6. Optimizing for Fly.io's Architecture
Fly.io offers a globally distributed platform, allowing you to place your application instances close to your users. This can significantly reduce latency, but requires careful consideration of data locality and consistency.
* **Regional Data Affinity:** Consider the implications of placing data within a specific region. Data stored on a Fly.io Volume is tied to that region. This is useful when data is primarily accessed by users in that region, but can increase latency for users accessing data from other regions.
* **Global Data Replication:** For data that needs to be accessed globally with low latency, consider using Fly.io Postgres with multi-region replication or a globally distributed database like CockroachDB or YugabyteDB.
* **Caching Strategies:** Use a tiered caching approach to minimize latency. Cache frequently accessed data close to the user using client-side caching (e.g., browser cache, service worker) or edge caching (e.g., a CDN placed in front of your Fly.io app). For shared data, use a distributed cache like Redis.
## 7. Conclusion
By following these coding standards, you can build robust, scalable, and maintainable applications on Fly.io. Choosing the right state management solution and following best practices for configuration, caching, session management, and monitoring will significantly improve the performance, reliability, and security of your deployments. Always consider the specific requirements of your application and the unique characteristics of the Fly.io environment when making state management decisions.
# Performance Optimization Standards for Fly.io
This document outlines the coding standards focused on performance optimization for applications deployed on Fly.io. Adhering to these standards will lead to faster, more responsive, and resource-efficient applications. These standards are tailored for the latest version of Fly.io and incorporate modern approaches for optimal performance within the Fly.io ecosystem.
## 1. Architectural Considerations for Performance
### 1.1. Region Selection and Geographic Distribution
**Standards:**
* **Do This:** Deploy your application to multiple regions closest to your users. Use Fly.io's built-in support for global deployments to minimize latency.
* **Don't Do This:** Deploy only to a single region, especially if your user base is geographically distributed.
**Why:** Reduces latency by serving users from the nearest available region. Improves availability by distributing load across multiple regions.
**Code Example (fly.toml):**
"""toml
app = "my-fly-app"
primary_region = "iad" # Initial region; also exposed to the app as PRIMARY_REGION

[build]

[deploy]
  release_command = "/app/bin/my-fly-app migrate" # Runs once per deploy, before new Machines take traffic
  strategy = "rolling"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1 # Counted in the primary region only
  processes = ["app"]
"""
Regions are not listed in "fly.toml": the Fly.io proxy routes each request to the nearest running Machine, and you add Machines in other regions with "fly scale count" and its "--region" flag (for example, "fly scale count 3 --region iad,lhr,syd"). A single "[http_service]" section already serves ports 80 and 443 (with TLS), so a duplicate "[[services]]" block for the same internal port is not needed.
**Anti-Pattern:** Hardcoding region-specific logic into the application code. Use Fly.io's configuration and routing features instead.
### 1.2. Database Proximity
**Standards:**
* **Do This:** Locate your database (e.g., Postgres, Redis) in the same region as your application servers whenever possible to minimize network latency. Consider using Fly.io's managed Postgres or Redis services.
* **Don't Do This:** Access a database across regions unless absolutely necessary.
**Why:** Reduces latency for database queries, improving overall application responsiveness.
**Code Example (Connecting to Fly.io Postgres):**
"""python
import os

import psycopg2

# "fly postgres attach" stores the connection string as the DATABASE_URL secret,
# so neither the host nor the password needs to appear in code.
database_url = os.environ["DATABASE_URL"]

try:
    conn = psycopg2.connect(database_url)
    print("Database connection successful")
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        db_version = cur.fetchone()
        print(f"PostgreSQL version: {db_version[0]}")
    conn.close()
except psycopg2.Error as e:
    print(f"Error connecting to database: {e}")
"""
**Anti-Pattern:** Ignoring database latency. Profile database queries to identify and optimize slow operations.
### 1.3. Caching Strategies
**Standards:**
* **Do This:** Implement caching at multiple levels: browser, CDN (e.g., a CDN placed in front of your Fly.io app), application server (in-memory), and database (query caching). Use appropriate cache invalidation strategies, and set HTTP caching headers (e.g., "Cache-Control", "Expires").
* **Don't Do This:** Rely solely on database caching. Cache frequently accessed data closer to the user.
**Why:** Reduces load on application servers and databases, resulting in faster response times and lower resource utilization.
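The application-server (in-memory) tier with explicit invalidation can be sketched as a small TTL cache. This is a minimal Go illustration under stated assumptions, not a Fly.io or library API: "TTLCache", "Get", and "Invalidate" are hypothetical names, values are plain strings, and a production cache would also bound its size.

"""go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry holds a cached value and the moment it stops being valid.
type entry struct {
	value     string
	expiresAt time.Time
}

// TTLCache is a minimal in-process cache with per-entry expiry and explicit invalidation.
type TTLCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]entry
	now   func() time.Time // Injectable clock so tests can control expiry
}

func NewTTLCache(ttl time.Duration) *TTLCache {
	return &TTLCache{ttl: ttl, items: make(map[string]entry), now: time.Now}
}

// Get returns a fresh cached value, or calls load on a miss or expiry and caches the result.
func (c *TTLCache) Get(key string, load func() (string, error)) (string, error) {
	c.mu.Lock()
	e, ok := c.items[key]
	c.mu.Unlock()
	if ok && c.now().Before(e.expiresAt) {
		return e.value, nil
	}
	v, err := load() // Called outside the lock so a slow load does not block other keys
	if err != nil {
		return "", err // Errors are not cached
	}
	c.mu.Lock()
	c.items[key] = entry{value: v, expiresAt: c.now().Add(c.ttl)}
	c.mu.Unlock()
	return v, nil
}

// Invalidate removes a key, e.g. right after the underlying record changes.
func (c *TTLCache) Invalidate(key string) {
	c.mu.Lock()
	delete(c.items, key)
	c.mu.Unlock()
}

func main() {
	cache := NewTTLCache(time.Minute)
	calls := 0
	load := func() (string, error) {
		calls++
		return fmt.Sprintf("value-%d", calls), nil
	}

	a, _ := cache.Get("user:1", load) // Miss: loads value-1
	b, _ := cache.Get("user:1", load) // Hit: still value-1
	cache.Invalidate("user:1")
	c, _ := cache.Get("user:1", load) // Miss after invalidation: loads value-2
	fmt.Println(a, b, c, calls)       // value-1 value-1 value-2 2
}
"""

Each Fly.io Machine has its own memory, so a cache like this is per instance; data that must stay consistent across Machines or regions belongs in a shared cache such as Redis. Concurrent misses for the same key may each call "load"; add request coalescing if that matters for your workload.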
**Code Example (HTTP Caching with Flask):**
"""python
from flask import Flask, make_response

app = Flask(__name__)

@app.route('/')
def index():
    response = make_response("<h1>Hello, World!</h1>")
    response.headers['Cache-Control'] = 'public, max-age=3600'  # Cache for 1 hour
    return response

if __name__ == '__main__':
    # Local development only; in production, run behind a WSGI server (e.g., Gunicorn)
    # listening on 0.0.0.0 and the "internal_port" from fly.toml.
    app.run(debug=True)
"""
**Anti-Pattern:** Aggressively caching dynamic content. Use appropriate cache invalidation techniques when data changes.
### 1.4. Connection Pooling
**Standards:**
* **Do This:** Use connection pooling for database connections to reduce the overhead of establishing new connections for each request.
* **Don't Do This:** Create a new database connection for every request, especially under high load.
**Why:** Reduces database load and improves application response time by reusing existing connections.
**Code Example (Connection Pooling with SQLAlchemy):**
"""python
import os

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Read the connection string from the DATABASE_URL secret instead of hardcoding credentials.
# SQLAlchemy 1.4+ only accepts the "postgresql://" scheme, so normalize "postgres://".
db_url = os.environ["DATABASE_URL"].replace("postgres://", "postgresql://", 1)

# Create a database engine with connection pooling (tune pool_size and max_overflow)
engine = create_engine(db_url, pool_size=5, max_overflow=10)

# Create a session factory
Session = sessionmaker(bind=engine)

# Example usage:
def get_data_from_db():
    session = Session()
    try:
        # Perform database operations using the session, e.g.:
        # results = session.query(MyTable).all()
        session.execute(text("SELECT 1"))
        print("Querying the DB... Replace with your actual query here")
    except Exception as e:
        print(f"Error during database operation: {e}")
    finally:
        session.close()  # Always close the session to return its connection to the pool

if __name__ == '__main__':
    get_data_from_db()
"""
**Anti-Pattern:** Setting the connection pool size too small or too large. Tune based on application load and database capacity.
## 2. Code-Level Optimizations
### 2.1. Efficient Data Structures and Algorithms
**Standards:**
* **Do This:** Choose appropriate data structures (e.g., dictionaries, sets) and algorithms (e.g., sorting algorithms, search algorithms) for the specific task. Optimize for time and space complexity appropriately.
* **Don't Do This:** Use inefficient data structures or algorithms that lead to slow execution or high memory consumption.
**Why:** Improves application performance by minimizing resource usage and execution time.
**Code Example (Using Sets for Efficient Membership Testing):**
"""python
my_list = [1, 2, 3, 4, 5]  # Original data
my_set = set(my_list)  # Convert to a set

# Membership checks are O(1) on average for sets versus O(n) for lists,
# so prefer a set when you mainly need "is x present?" lookups.
if 3 in my_set:
    print("3 exists in my_set")

if 6 in my_set:
    print("6 exists in my_set")
else:
    print("6 does not exist in my_set")
"""
**Anti-Pattern:** Linear search on large lists. Use hash-based structures (sets, dictionaries) for lookups, or binary search on data that is already sorted.
### 2.2. Asynchronous Operations
**Standards:**
* **Do This:** Use asynchronous operations (e.g., async/await in Python, Promises in JavaScript) for I/O-bound tasks such as network requests, file I/O, and database queries to avoid blocking the main thread.
* **Don't Do This:** Perform blocking I/O operations on the main thread.
**Why:** Prevents blocking the event loop, allowing the application to handle more requests concurrently. Improves responsiveness and throughput.
**Code Example (Asynchronous HTTP Request with Python aiohttp):**
"""python
import asyncio

import aiohttp

async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    data = await fetch_data('https://example.com')
    print(data[:100])  # Print the first 100 characters

if __name__ == '__main__':
    asyncio.run(main())
"""
**Anti-Pattern:** Mixing synchronous and asynchronous code without proper thread management. Run unavoidable blocking calls in an executor or thread pool (e.g., "asyncio.to_thread" in Python 3.9+) instead of calling them directly from a coroutine.
### 2.3. Resource Management
**Standards:**
* **Do This:** Explicitly release resources such as file handles, database connections, and memory as soon as they are no longer needed. Use "try...finally" blocks or context managers ("with" statement in Python) to ensure proper resource cleanup. Use Fly.io's autostop and autostart features so the number of running Machines follows demand; setting "min_machines_running = 0" lets the app scale to zero during off-peak hours.
* **Don't Do This:** Leak resources, which can lead to memory exhaustion or other performance problems.
**Why:** Prevents resource leaks, ensuring efficient utilization of system resources. Improves application stability and scalability.
**Code Example (Using "with" Statement for File Handling):**
"""python
try:
    with open('my_file.txt', 'r') as f:  # The file is closed automatically when the block exits
        data = f.read()
        print(data)
except FileNotFoundError:
    print("my_file.txt not found")
"""
# Testing Methodologies Standards for Fly.io
This document outlines testing methodology standards for Fly.io applications, covering unit, integration, and end-to-end testing. It provides guidance on how to apply these principles specifically within the Fly.io environment, highlighting platform-specific considerations and best practices.
## 1. General Testing Principles for Fly.io
These principles apply to all levels of testing and are crucial for ensuring the reliability and performance of Fly.io applications.
* **Do This:** Employ the testing pyramid: prioritize unit tests, then integration tests, and finally end-to-end tests, balancing coverage against cost.
* **Don't Do This:** Rely heavily on end-to-end tests at the expense of unit tests, as they offer less isolation and slower feedback.
**Why:** A balanced testing approach provides a comprehensive view of application correctness, catching issues early and often. Unit tests quickly verify individual components, integration tests validate interactions between components, and end-to-end tests ensure the entire system behaves as expected. Focusing too much on end-to-end tests makes debugging more difficult and slows down development cycles.
* **Do This:** Write tests that are independent, repeatable, and deterministic.
* **Don't Do This:** Create tests that depend on external services or require specific data states that are hard to reproduce consistently.
**Why:** Reliable tests provide a strong foundation for continuous integration and continuous delivery (CI/CD). Non-deterministic tests undermine trust in the testing process and can lead to false positives or negatives. Isolating dependencies and keeping test environments repeatable avoids both problems.
* **Do This:** Use descriptive test names that clearly explain what the test is verifying.
* **Don't Do This:** Use vague or cryptic test names that make it difficult to understand the purpose of the test.
**Why:** Clear test names improve readability and maintainability.
When a test fails, a descriptive name allows developers to quickly understand the issue and its context.
### 1.1 Fly.io Specific Considerations
* **Do This:** Consider regional testing when deploying to multiple Fly.io regions. Write tests that verify regional data consistency and performance (latency).
* **Don't Do This:** Assume that your application behaves identically across all regions without specific checks in place.
**Why:** Fly.io's multi-region deployments introduce complexity. Regional data replication and network latency can affect application behavior differently in each region.
* **Do This:** Include tests that simulate Fly.io platform events, such as restarts, scaling, and health checks. Create corresponding unit tests that cover the handler logic.
* **Don't Do This:** Assume that the application will always run uninterrupted.
**Why:** Fly.io is a dynamic platform: Machines are restarted, stopped, and started as you deploy and scale. Handling these events gracefully is crucial for high availability and a smooth user experience, and testing ensures that the application recovers correctly from unexpected events.
## 2. Unit Testing
Unit testing focuses on testing individual components of an application in isolation.
* **Do This:** Write unit tests for all non-trivial functions and methods.
* **Don't Do This:** Skip unit testing for "simple" functions, as they can still contain errors and are often refactored later.
**Why:** Unit tests are the fastest and most reliable way to catch errors early in the development cycle. They provide a safety net when refactoring code and improve overall code quality.
* **Do This:** Use mocking and stubbing techniques to isolate units of code from external dependencies.
* **Don't Do This:** Directly call external services or databases in unit tests. This makes tests slow, unreliable, and difficult to maintain.
**Why:** Isolation is key to effective unit testing.
Mocking and stubbing allow you to control the behavior of dependencies, ensuring focused tests that verify the logic of a single unit of code.
### 2.1 Code Examples (Go)
"""go
package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

func GetGreeting(name string) string {
	return "Hello, " + name + "!"
}

func handler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	greeting := GetGreeting(name)
	w.WriteHeader(http.StatusOK)
	w.Write([]byte(greeting))
}

// In a real project, the tests below live in a separate *_test.go file.
func TestGetGreeting(t *testing.T) {
	expected := "Hello, World!"
	actual := GetGreeting("World")
	if actual != expected {
		t.Errorf("Expected %s, but got %s", expected, actual)
	}
}

func TestHandler(t *testing.T) {
	req, err := http.NewRequest("GET", "/?name=Test", nil)
	if err != nil {
		t.Fatal(err)
	}
	rr := httptest.NewRecorder()
	h := http.HandlerFunc(handler)
	h.ServeHTTP(rr, req)

	if status := rr.Code; status != http.StatusOK {
		t.Errorf("handler returned wrong status code: got %v want %v", status, http.StatusOK)
	}
	expected := "Hello, Test!"
	if rr.Body.String() != expected {
		t.Errorf("handler returned unexpected body: got %v want %v", rr.Body.String(), expected)
	}
}
"""
In this example, "TestGetGreeting" tests the "GetGreeting" function in isolation. "TestHandler" exercises the HTTP handler without starting a server: it builds the request directly and captures the response with "httptest.NewRecorder".
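Stubbing and the platform-event advice above can be combined in one small test target: a health-check handler whose database dependency is replaced by a stub. This is a hedged Go sketch; "Pinger", "healthHandler", "stubPinger", and the "/healthz" path are illustrative assumptions. In production, a "*sql.DB" satisfies "Pinger" through its "PingContext" method.

"""go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Pinger is the only capability the handler needs from its database.
// *sql.DB satisfies it through its PingContext method.
type Pinger interface {
	PingContext(ctx context.Context) error
}

// healthHandler returns 200 when the dependency responds and 503 otherwise,
// which is what an HTTP health check polling this path would observe.
func healthHandler(db Pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "unhealthy", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
	}
}

// stubPinger replaces the real database in tests; err decides the outcome.
type stubPinger struct{ err error }

func (s stubPinger) PingContext(context.Context) error { return s.err }

func main() {
	cases := []stubPinger{{}, {err: errors.New("connection refused")}}
	for _, p := range cases {
		rr := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
		healthHandler(p).ServeHTTP(rr, req)
		fmt.Println(rr.Code) // 200 for the healthy stub, then 503
	}
}
"""

In a real project, "stubPinger" and the assertions would live in a *_test.go file and report failures with "t.Errorf"; the handler only depends on the interface, so no database is needed to test both outcomes.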
### 2.2 Fly.io Specific Unit Testing Examples
Given a Fly.io app that relies on the "FLY_REGION" environment variable:
"""go
package main

import (
	"os"
	"testing"
)

func GetRegion() string {
	region := os.Getenv("FLY_REGION")
	if region == "" {
		return "unknown"
	}
	return region
}

func TestGetRegion(t *testing.T) {
	// t.Setenv sets FLY_REGION for this test and restores the previous value afterwards (Go 1.17+)
	t.Setenv("FLY_REGION", "ord")

	expected := "ord"
	actual := GetRegion()
	if actual != expected {
		t.Errorf("Expected region %s, but got %s", expected, actual)
	}
}

func TestGetRegion_NoEnv(t *testing.T) {
	// An empty value exercises the same fallback as an unset variable
	t.Setenv("FLY_REGION", "")

	expected := "unknown"
	actual := GetRegion()
	if actual != expected {
		t.Errorf("Expected region %s, but got %s", expected, actual)
	}
}
"""
**Why:** These tests ensure that the application correctly retrieves and handles the "FLY_REGION" environment variable, which is critical for region-aware logic within a Fly.io application. "t.Setenv" restores the original value when each test finishes, so the tests neither interfere with each other nor clobber a real "FLY_REGION" on the machine running them. Testing without a region ensures that the code handles unexpected situations and falls back to a sensible default.
## 3. Integration Testing
Integration testing focuses on testing interactions between different components or services of an application.
* **Do This:** Test the interactions between modules, services, or databases to ensure they work together correctly.
* **Don't Do This:** Test individual units of code in isolation during integration testing. That's the scope of unit tests.
**Why:** Integration tests verify that components correctly exchange data and behave as expected when integrated. They catch issues that are not apparent when testing individual units of code.
* **Do This:** Use lightweight test databases or mock external services to control the test environment.
* **Don't Do This:** Use production databases or rely on live external services during integration testing. This can lead to data corruption, performance issues, and unreliable test results.
**Why:** Controlled test environments ensure that integration tests are predictable and repeatable. Using production resources introduces risks and dependencies that complicate testing.
### 3.1 Code Examples (Go)
Assume the following code is being tested:
"""go
package main

import (
	"database/sql"

	_ "github.com/lib/pq" // PostgreSQL driver
)

type User struct {
	ID    int
	Name  string
	Email string
}

func GetUserByID(db *sql.DB, id int) (*User, error) {
	query := "SELECT id, name, email FROM users WHERE id = $1"
	row := db.QueryRow(query, id)
	user := &User{}
	err := row.Scan(&user.ID, &user.Name, &user.Email)
	if err != nil {
		return nil, err
	}
	return user, nil
}
"""
Here is an example integration test:
"""go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"
	"testing"

	_ "github.com/lib/pq" // PostgreSQL driver
)

var testDB *sql.DB

func setupTestDB() (*sql.DB, error) {
	connStr := os.Getenv("TEST_DATABASE_URL")
	if connStr == "" {
		connStr = "postgres://user:password@localhost:5432/testdb?sslmode=disable" // Default for local testing
		log.Println("Using default test database URL. Set TEST_DATABASE_URL for explicit configuration.")
	}
	db, err := sql.Open("postgres", connStr)
	if err != nil {
		return nil, fmt.Errorf("failed to open database: %w", err)
	}
	if err := db.Ping(); err != nil {
		db.Close()
		return nil, fmt.Errorf("failed to connect to database: %w", err)
	}
	// Initialize the database schema (create tables, etc.).
	// A raw string literal (backticks) is required for a multi-line SQL string in Go.
	_, err = db.Exec(`
		CREATE TABLE IF NOT EXISTS users (
			id SERIAL PRIMARY KEY,
			name TEXT NOT NULL,
			email TEXT NOT NULL
		);
		INSERT INTO users (name, email) VALUES ('Test User', 'test@example.com');
	`)
	if err != nil {
		db.Close()
		return nil, fmt.Errorf("failed to initialize database schema: %w", err)
	}
	return db, nil
}

func cleanupTestDB(db *sql.DB) error {
	if _, err := db.Exec("DROP TABLE IF EXISTS users;"); err != nil {
		return fmt.Errorf("failed to drop table: %w", err)
	}
	return nil
}

func TestGetUserByID(t *testing.T) {
	if testDB == nil {
		t.Skip("Test database not initialized. Set TEST_DATABASE_URL.")
	}
	user, err := GetUserByID(testDB, 1)
	if err != nil {
		t.Fatalf("Error getting user: %v", err)
	}
	if user == nil {
		t.Fatalf("User not found")
	}
	if user.Name != "Test User" {
		t.Errorf("Expected user name 'Test User', got '%s'", user.Name)
	}
	if user.Email != "test@example.com" {
		t.Errorf("Expected user email 'test@example.com', got '%s'", user.Email)
	}
}

func TestMain(m *testing.M) {
	var err error
	testDB, err = setupTestDB()
	if err != nil {
		// testDB stays nil, so database tests are skipped instead of aborting the whole run.
		log.Printf("Test database unavailable, skipping database tests: %v", err)
	}
	code := m.Run()
	if testDB != nil {
		if err := cleanupTestDB(testDB); err != nil {
			log.Printf("Failed to clean up test database: %v", err)
		}
		testDB.Close() // Close the DB connection.
	}
	os.Exit(code)
}
"""
Key points:
* Reads the connection string from the "TEST_DATABASE_URL" environment variable, so the same tests can target a local database during development and a dedicated test database in CI/CD (for example, a variable stored as a secret in your CI provider).
* Initializes and cleans up the test database, which matters because the tests create a temporary table.
* Uses "TestMain" for setup and teardown of costly resources like the test database.
* Skips the tests if the database cannot be initialized, so "go test" still works without a reachable test database.
* Closes the test database connection in "TestMain".
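For the "mock external services" advice, Go's standard "net/http/httptest" package can start a throwaway HTTP server that stands in for a third-party API, so integration tests exercise real HTTP calls without touching the live service. A minimal sketch; "fetchStatus" and the "/status" endpoint are hypothetical.

"""go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchStatus calls an external API's /status endpoint. The base URL and client
// are injected, so tests can point them at a fake server instead of the real service.
func fetchStatus(client *http.Client, baseURL string) (string, error) {
	resp, err := client.Get(baseURL + "/status")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// httptest.NewServer listens on a local loopback port and serves the handler below.
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/status" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "ok")
	}))
	defer fake.Close()

	status, err := fetchStatus(fake.Client(), fake.URL)
	fmt.Println(status, err) // ok <nil>
}
"""

Injecting the base URL is the design choice that makes this work: in production it would come from configuration (for example, a Fly.io secret), while tests pass the fake server's URL and can also simulate failures such as 503 responses.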
### 3.2 Fly.io Specific Integration Testing Examples
Testing interactions between different Fly.io services:
"""go
// Integration test for a service that depends on a Redis instance
package main

import (
	"context"
	"fmt"
	"os"
	"testing"

	"github.com/go-redis/redis/v8"
)

var rdb *redis.Client

func setupRedis() (*redis.Client, error) {
	redisURL := os.Getenv("FLY_REDIS_CACHE_URL") // Connection string provided through the app's environment
	if redisURL == "" {
		return nil, fmt.Errorf("FLY_REDIS_CACHE_URL not set")
	}
	opt, err := redis.ParseURL(redisURL)
	if err != nil {
		return nil, fmt.Errorf("failed to parse Redis URL: %w", err)
	}
	client := redis.NewClient(opt)
	if _, err := client.Ping(context.Background()).Result(); err != nil {
		client.Close()
		return nil, fmt.Errorf("failed to connect to Redis: %w", err)
	}
	return client, nil
}

func TestRedisConnection(t *testing.T) {
	if rdb == nil {
		t.Skip("Redis not initialized. Make sure FLY_REDIS_CACHE_URL is set.")
	}
	ctx := context.Background()
	if err := rdb.Set(ctx, "testkey", "testvalue", 0).Err(); err != nil {
		t.Fatalf("Failed to set value in Redis: %v", err)
	}
	defer rdb.Del(ctx, "testkey") // Cleanup, even if an assertion below fails

	val, err := rdb.Get(ctx, "testkey").Result()
	if err != nil {
		t.Fatalf("Failed to get value from Redis: %v", err)
	}
	if val != "testvalue" {
		t.Errorf("Expected 'testvalue', got '%s'", val)
	}
}

func TestMain(m *testing.M) {
	var err error
	rdb, err = setupRedis()
	if err != nil {
		fmt.Printf("Failed to set up Redis: %v\n", err)
	}
	code := m.Run()
	if rdb != nil {
		rdb.Close()
	}
	os.Exit(code)
}
"""
**Why:** This example demonstrates how to test the integration between a Fly.io application and a Redis instance. It retrieves the Redis connection URL from the environment, connects to Redis, performs basic operations, and cleans up. This verifies that the application can correctly communicate with stateful services deployed on Fly.io. Point these tests at a dedicated test Redis instance rather than production. "FLY_REDIS_CACHE_URL" is simply the variable name this example reads; use whichever environment variable or secret holds your Redis connection string.
## 4. End-to-End (E2E) Testing
End-to-end testing verifies the behavior of the complete system, exercising real user flows across the frontend, backend services, and data stores.
* **Do This:** Use E2E tests to validate critical user flows from start to finish.
* **Don't Do This:** Test every possible scenario with E2E tests, as they are slow and expensive to maintain. Focus on the most important workflows.
**Why:** E2E tests provide the highest level of confidence that the application is functioning correctly from the user's perspective. They simulate real user interactions and catch issues that may not be apparent in unit or integration tests.
* **Do This:** Use tools like Cypress, Playwright, or Selenium to automate browser-based E2E tests. For CLI tools, use shell scripts or dedicated testing frameworks to drive flows.
* **Don't Do This:** Manually run E2E tests, as this is time-consuming and prone to human error.
**Why:** Automation is essential for efficient E2E testing. Automated tests can be run frequently as part of CI/CD pipelines.
### 4.1 Example (Playwright - Node.js)
First, install Playwright:
"""bash
npm install -D @playwright/test
npx playwright install
"""
Then, create a test file, e.g., "tests/example.spec.ts":
"""typescript
import { test, expect } from '@playwright/test';

test('homepage has title and links to intro page', async ({ page }) => {
  // Resolved against "baseURL" from playwright.config.ts (your Fly.io app URL)
  await page.goto('/');

  // Expect a title "to contain" a substring.
  await expect(page).toHaveTitle(/Your App Title/); // Replace with your app title

  // Create a locator
  const getStarted = page.getByRole('link', { name: 'Get started' });

  // Expect an attribute "to be strictly equal" to the expected value.
  await expect(getStarted).toHaveAttribute('href', '/intro');

  // Click the get started link.
  await getStarted.click();

  // Expects the URL to contain intro.
  await expect(page).toHaveURL(/.*intro/);
});
"""
Update the "playwright.config.ts" file:
"""typescript
import { defineConfig, devices } from '@playwright/test';

const baseURL = process.env.BASE_URL || 'https://your-fly-io-app.fly.dev/';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,
  reporter: 'html',
  use: {
    baseURL: baseURL,
    trace: 'on-first-retry',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
  ],
});
"""
Run the tests:
"""bash
npx playwright test
"""
**Notes:**
1. Set the "BASE_URL" environment variable to the Fly.io app URL you want to test. This allows overriding the target in different environments (e.g., a staging app vs. production).
2. "BASE_URL" is read by the test runner, not by your deployed app, so set it in your shell or CI job rather than with "fly secrets".
3. Replace "/Your App Title/" with your application's title.
### 4.2 Fly.io CI Integration Example - GitHub Actions
This configuration is a ".github/workflows/playwright.yml" file.
"""yaml
name: Playwright Tests
on:
  push:
    branches: [ "main" ]
  pull_request:
jobs:
  test:
    timeout-minutes: 60
    runs-on: ubuntu-latest
    env:
      BASE_URL: https://your-fly-io-app.fly.dev/ # Replace with the app under test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright Browsers
        run: npx playwright install --with-deps
      - name: Run Playwright tests
        run: npx playwright test
      - name: Upload HTML report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
"""
**Why:** The GitHub Action installs all dependencies and Playwright browsers, then executes the Playwright tests. The HTML reporter writes its report to "playwright-report/", which the last step uploads as a build artifact even when tests fail. ("npx playwright show-report" is not used in CI because it starts a local web server and waits.)
## 5. Performance Testing
While not strictly a type of unit, integration, or E2E testing, performance testing is crucial for Fly.io environments.
* **Do This:** Use load testing tools to simulate concurrent users accessing specific resources. Use "fly scale" to scale up the application based on the results of load tests.
* **Don't Do This:** Neglect performance testing until after deployment, as this can lead to unexpected issues in production.
**Why:** Performance testing helps identify bottlenecks and optimize code for scalability.
* **Do This:** Monitor application performance using tools like Grafana and Prometheus, and Fly.io's own metrics dashboard.
* **Don't Do This:** Rely solely on manual observation to assess application performance.
**Why:** Continuous monitoring provides valuable insights into application behavior over time.
### 5.1 Example (k6)
k6 is a popular open-source tool for load testing HTTP(S) services.
Create a script ("script.js") with a sample request:
"""javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 10, // Virtual users
  duration: '10s',
};

export default function () {
  http.get('https://your-fly-io-app.fly.dev/'); // Replace with your Fly.io app URL
  sleep(1);
}
"""
To watch your app's own metrics during the test, expose a Prometheus endpoint for Fly.io to scrape by adding a "[metrics]" section to "fly.toml" (this example assumes your app serves metrics on port 9091):
"""toml
[metrics]
  port = 9091
  path = "/metrics"
"""
Run the k6 test:
"""bash
k6 run script.js
"""
**Why:** This runs a 10-second load test with 10 virtual users against your specified Fly.io application URL.
## 6. Security Testing
* **Do This:** Use static analysis tools to identify potential security vulnerabilities in the code.
* **Don't Do This:** Rely solely on manual code reviews to catch security issues.
**Why:** Automated tools can quickly scan large codebases for common security patterns that may be missed by human reviewers.
* **Do This:** Use vulnerability scanning tools to identify security issues in dependencies. Regularly update dependencies to patch known vulnerabilities.
* **Don't Do This:** Use outdated dependencies without assessing the security risks.
**Why:** Dependencies often contain security vulnerabilities that can be exploited by attackers. Keeping dependencies up-to-date is essential for maintaining a secure application.
* **Do This:** Implement security tests that cover various aspects of your Fly.io app (authentication, authorization, input validation, etc.).
* **Don't Do This:** Deploy an application without running any security tests.
**Why:** Without adequate security tests, flaws in authentication, authorization, or input handling can reach production unnoticed.
### 6.1 Common Anti-Patterns
* **Inadequate test coverage:** Failing to write tests for critical parts of the application. This leaves potential vulnerabilities and bugs undiscovered.
* **Ignoring test failures:** Ignoring failing tests and continuing to develop new features. This leads to a build-up of technical debt and makes it harder to maintain the application. Failing tests should be addressed immediately.
* **Writing flaky tests:** Creating tests that sometimes pass and sometimes fail without any code changes. This undermines trust in the testing process and makes it difficult to identify real issues. Investigate flaky tests and either remove the source of non-determinism or rewrite them.
* **Over-reliance on manual testing:** Depending solely on manual testing, which can lead to missed bugs and security vulnerabilities.
* **Not testing Fly.io platform interactions:** Neglecting to test how the application interacts with Fly.io platform features, such as restarts, scaling, and health checks.
By following these guidelines, developers can create high-quality, reliable, and performant Fly.io applications.