# Tooling and Ecosystem Standards for Fly.io
This document outlines coding standards focused on tooling and the Fly.io ecosystem, promoting maintainability, performance, and security. It's designed to guide developers and inform AI coding assistants.
## 1. Development Environment Setup
### 1.1 Fly CLI Tooling
**Standard:** Use the latest version of the Fly CLI tool.
**Do This:** Regularly update the Fly CLI:
"""bash
flyctl version update
"""
**Don't Do This:** Use outdated Fly CLI versions.
**Why:** Newer versions include bug fixes, performance improvements, and access to the latest Fly.io features.
**Example:** Checking the current version and updating:
"""bash
flyctl version
# Expected Output (example): flyctl v0.2.21 darwin/arm64 Commit: 3a3... BuildDate: 2023-10-27T15:00:00Z
flyctl version update
"""
### 1.2 Editor Configuration
**Standard:** Configure your editor for Fly.io project development. Use editorconfig or similar for standardized formatting.
**Do This:**
* Install language-specific extensions for syntax highlighting and linting.
* Use editorconfig to define consistent indentation, line endings, and character encoding.
* Integrate linters and formatters (e.g., ESLint, Prettier) via editor plugins.
**Don't Do This:** Rely solely on default editor settings without language-specific configuration.
**Why:** Consistent formatting increases code readability and reduces merge conflicts. Linting helps catch potential errors early.
**Example:** ".editorconfig" file:
"""ini
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.py]
indent_style = space
indent_size = 4
[*.js]
indent_style = space
indent_size = 2
"""
### 1.3 Version Control (Git)
**Standard:** Version control all Fly.io project code.
**Do This:**
* Use meaningful commit messages following conventional commits.
* Use branches for feature development and bug fixes.
* Use pull requests for code review.
* Use ".gitignore" to exclude unnecessary files from version control (e.g., ".fly", "node_modules").
**Don't Do This:** Commit directly to the main branch. Ignore code review practices. Include secrets or credentials in the repository.
**Why:** Version control is crucial for collaboration, code tracking, and rollbacks.
**Example:** ".gitignore" File:
"""
.fly/
node_modules/
*.log
.DS_Store
"""
## 2. Dependency Management
### 2.1 Application Dependencies
**Standard:** Use appropriate dependency management tools for your language/framework (e.g., "npm" for Node.js, "pip" for Python, "go mod" for Go).
**Do This:**
* Pin dependencies to specific versions or use version ranges (e.g., "~1.2.3" or "^1.2.3").
* Use lockfiles ("package-lock.json", "go.sum", or a fully pinned "requirements.txt" generated with "pip freeze") to ensure reproducible builds.
* Periodically update dependencies to address security vulnerabilities and bug fixes.
* Use Dependabot (or similar tool) to automate dependency updates and vulnerability scanning.
**Don't Do This:** Use wildcard versions ("*") or excessively broad version ranges. Neglect updating dependencies regularly.
**Why:** Dependency management ensures consistent builds and mitigates the risk of vulnerabilities and breaking changes.
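The pinning rule above can be checked automatically, for example in CI. The following is a minimal sketch, assuming a pip-style "requirements.txt"; the function name "find_unpinned" is hypothetical, and the regular expression only covers simple "name==version" and "name~=version" lines, not every form pip accepts.

```python
import re

# Accept only exact ("==") or compatible-release ("~=") pins with a concrete version.
PINNED = re.compile(r"[A-Za-z0-9._\[\],-]+\s*(==|~=)\s*[0-9][A-Za-z0-9.+!_-]*")


def find_unpinned(requirements_text):
    # Return the requirement lines that lack an exact or compatible-release pin.
    unpinned = []
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.fullmatch(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "Flask==2.0.1\nrequests~=2.25.1\nnumpy\ngunicorn>=20\n"
    print(find_unpinned(sample))
```

A pipeline step could fail the build whenever the returned list is not empty.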
**Example (Node.js package.json):**
"""json
{
  "name": "my-fly-app",
  "version": "1.0.0",
  "dependencies": {
    "express": "^4.17.1",
    "node-fetch": "~2.6.1"
  },
  "devDependencies": {
    "eslint": "^7.0.0"
  }
}
"""
**Example (Python requirements.txt):**
"""
Flask==2.0.1
requests~=2.25.1
"""
### 2.2 Fly.io Specific Dependencies
**Standard:** Use official or well-maintained community libraries where available for interacting with Fly.io services (e.g., distributed locks, leader election), rather than re-inventing the wheel.
**Do This:** Explore the Fly.io documentation and community forums for relevant libraries. Evaluate libraries based on their maturity, maintainership, and compatibility with your application.
**Don't Do This:** Use undocumented or abandoned libraries. Build core Fly.io platform integrations from scratch without researching the existing ecosystem.
**Why:** Using community libraries accelerates development, leverages existing expertise, and promotes code reusability.
## 3. Logging and Monitoring
### 3.1 Centralized Logging
**Standard:** Utilize a centralized logging system accessible from within Fly.io.
**Do This:**
* Integrate your application with a logging service (e.g., Grafana Loki, ELK stack, CloudWatch Logs, Datadog).
* Structure logs in a standardized format (e.g., JSON) for easier parsing and analysis.
* Include relevant context in log messages (e.g., request ID, user ID, instance ID).
* Use environment variables to configure your logging backend to avoid hardcoding credentials:
"""python
import logging
import os
logging_level = os.environ.get("LOGGING_LEVEL", "INFO").upper()
logging.basicConfig(level=logging_level)
logger = logging.getLogger(__name__)
logger.info("Application started")
"""
* Use structured logging to facilitate querying and analysis, allowing you to aggregate logs by machine, application version, and severity:
"""python
import json
import logging
from flask import Flask, request
# Configure the root logger so that INFO-level messages are actually emitted
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def log_request(request):
    # Emit one JSON object per request so the log backend can parse the fields
    log_data = {
        "level": "info",
        "message": "Request received",
        "method": request.method,
        "path": request.path,
        "user_agent": request.user_agent.string
    }
    logger.info(json.dumps(log_data))
# Example usage in a Flask route
app = Flask(__name__)
@app.route('/')
def hello_world():
    log_request(request)  # Log the incoming request
    return 'Hello, World!'
"""
**Don't Do This:** Rely solely on console logging. Hardcode sensitive information in log messages.
**Why:** Centralized logging makes it easier to troubleshoot issues and monitor application health across multiple Fly.io instances.
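The context fields recommended above (request ID, instance ID) can be attached in one place with a custom formatter. This is a minimal sketch using only the standard library; "JsonFormatter" and "build_logger" are hypothetical names, and it assumes the Fly.io runtime exposes "FLY_REGION" and "FLY_MACHINE_ID" as environment variables, falling back to "unknown" elsewhere.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    # Render each log record as a single JSON object.
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Assumed Fly.io runtime variables; "unknown" when run elsewhere.
            "region": os.environ.get("FLY_REGION", "unknown"),
            "machine_id": os.environ.get("FLY_MACHINE_ID", "unknown"),
        }
        # Values passed through `extra=` become attributes on the record.
        request_id = getattr(record, "request_id", None)
        if request_id is not None:
            payload["request_id"] = request_id
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    logger = logging.getLogger(name)
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers to avoid duplicate output
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("app")
    log.info("Request received", extra={"request_id": "req-123"})
```

Writing one JSON object per line to standard output keeps the application independent of the log backend that eventually ships the logs.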
**Example (simplified Python logging setup):**
"""python
import logging
import os
logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
logger = logging.getLogger(__name__)
logger.info("Application starting...")
"""
### 3.2 Health Checks
**Standard:** Implement health checks for your application to detect and recover from failures. Configure these endpoints in your "fly.toml" file.
**Do This:**
* Provide an endpoint that reports the application's health status (e.g., "/healthz").
* Include checks for critical dependencies (e.g., database connection, Redis connectivity).
* Configure Fly.io health checks to use the health endpoint.
* Ensure the health check endpoint returns appropriate HTTP status codes (e.g., 200 OK for healthy, 503 Service Unavailable for unhealthy).
**Don't Do This:** Rely on basic ping checks. Neglect to monitor the health check status.
**Why:** Health checks allow Fly.io to automatically restart unhealthy instances, improving application availability.
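The dependency checks and status codes described above can be combined in a small, framework-independent helper. This is a sketch under stated assumptions: "run_health_checks" is a hypothetical name, and "check_database" and "check_cache" are placeholders standing in for real connectivity tests.

```python
def run_health_checks(checks):
    # `checks` maps a dependency name to a zero-argument callable that
    # raises an exception when the dependency is unavailable.
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            # Report only the exception type so internal details do not leak.
            results[name] = f"error: {type(exc).__name__}"
    healthy = all(value == "ok" for value in results.values())
    status_code = 200 if healthy else 503
    body = {"status": "ok" if healthy else "unhealthy", "checks": results}
    return body, status_code


def check_database():
    # Hypothetical placeholder: replace with a real query such as "SELECT 1".
    pass


def check_cache():
    # Hypothetical placeholder that simulates an unreachable Redis server.
    raise ConnectionError("redis unreachable")


if __name__ == "__main__":
    print(run_health_checks({"database": check_database}))
    print(run_health_checks({"database": check_database, "cache": check_cache}))
```

In a Flask handler the result could be returned as "jsonify(body), status_code", so a failing dependency produces the 503 response that the platform health check reacts to.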
**Example (basic health check endpoint in Flask):**
"""python
from flask import Flask, jsonify
app = Flask(__name__)
@app.route("/healthz")
def healthz():
    # Add checks for database connection, etc. here
    return jsonify({"status": "ok"}), 200
"""
**Example ("fly.toml" configuration):**
"""toml
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[http_service.checks]]
path = "/healthz"
interval = "15s"
timeout = "2s"
grace_period = "5s"
"""
### 3.3 Metrics and Monitoring
**Standard:** Collect and monitor application metrics to gain insights into performance and resource utilization. Integrate with Prometheus.
**Do This:**
* Expose application metrics using a standard format (e.g., Prometheus exposition format).
* Use a metrics collection and visualization tool (e.g., Prometheus, Grafana, Datadog).
* Monitor key metrics such as CPU usage, memory usage, network traffic, and request latency.
* Collect custom metrics that are relevant to your specific application and business logic.
**Don't Do This:** Ignore application performance metrics. Fail to set up alerts for critical conditions.
**Why:** Metrics and monitoring provide valuable insights into application behavior and allow you to identify and resolve performance bottlenecks.
**Example (Exposing metrics in Python using Prometheus client library):**
"""python
from prometheus_client import start_http_server, Summary
import random
import time
# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    # A dummy function that takes some time.
    time.sleep(t)
if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8000)
    # Generate some requests.
    while True:
        process_request(random.random())
"""
## 4. Secret Management
### 4.1 Fly.io Secrets Store
**Standard:** Store secrets securely using Fly.io's built-in secrets store.
**Do This:**
* Use the "flyctl secrets" command to set secrets for your application: "flyctl secrets set MY_SECRET=myvalue".
* Access secrets from your application using environment variables: "os.environ.get("MY_SECRET")".
* Rotate secrets regularly.
* Prefer machine secrets over app secrets when the secret needs to follow the life cycle of the machine, such as unique auth tokens that are generated per machine. Confirm with "flyctl machine --help" that your flyctl version provides this subcommand before relying on it:
"flyctl machine secret set MY_MACHINE_SECRET=machine_secret --machine "
**Don't Do This:** Hardcode secrets in your code or configuration files. Store secrets in version control.
**Why:** Fly.io's secrets store provides a secure and convenient way to manage sensitive data.
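Because secrets arrive as environment variables, an application can verify at startup that all of them are present. The following is a minimal sketch; "load_secrets" and "MissingSecretsError" are hypothetical names, and the secret names in the usage example are illustrative.

```python
import os


class MissingSecretsError(RuntimeError):
    pass


def load_secrets(names, environ=None):
    # Read every required secret and report all missing names at once.
    # Secret values are never included in the error message.
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingSecretsError(
            "Missing required secrets: " + ", ".join(sorted(missing))
        )
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    # Illustrative environment; in production the default os.environ is used.
    fake_env = {"DATABASE_URL": "postgres://example", "API_KEY": ""}
    try:
        load_secrets(["DATABASE_URL", "API_KEY"], environ=fake_env)
    except MissingSecretsError as exc:
        print(exc)
```

Failing fast at startup surfaces a forgotten "flyctl secrets set" during deployment rather than at the first request that needs the value.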
**Example (setting and accessing a secret):**
"""bash
flyctl secrets set DATABASE_URL="postgres://user:password@host:port/database"
# In your application code (Python), read the value from the environment:
#   import os
#   database_url = os.environ.get("DATABASE_URL")
"""
### 4.2 Vault Integration
**Standard:** For more complex secret management requirements, consider integrating with HashiCorp Vault.
**Do This:**
* Deploy a Vault instance within your Fly.io organization.
* Configure your application to authenticate with Vault and retrieve secrets.
* Use Vault's features, like dynamic secrets and secret leasing, to minimize risk.
**Don't Do This:** Introduce Vault without a clear understanding of its concepts and best practices.
**Why:** Vault provides advanced features for managing and securing secrets, such as audit logging, secret rotation, and access control. These features matter most for highly sensitive secrets and compliance-sensitive workloads.
## 5. Deployment Pipelines
### 5.1 Automated Deployments
**Standard:** Automate the deployment process using a CI/CD pipeline.
**Do This:**
* Use a CI/CD tool (e.g., GitHub Actions, GitLab CI, CircleCI) to build, test, and deploy your application.
* Create a "fly.toml" file that defines your application's configuration.
* Use the "flyctl deploy" command to deploy your application.
* Integrate automated tests into your deployment pipeline.
**Don't Do This:** Manually deploy your application from your local machine. Skip testing before deploying.
**Why:** Automation reduces the risk of human error and allows you to deploy changes quickly and reliably.
**Example (GitHub Actions workflow):**
"""yaml
name: Deploy to Fly.io
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
"""
## 6. Local Development with Fly.io Resources
### 6.1 Use "flyctl dev console"
**Standard:** Use flyctl's dev console to streamline usage of Fly.io resources during local development.
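**Do This:**
* Start a dev console:
"""bash
flyctl dev console
"""
* Utilize the dev console in conjunction with your other debugging, logging, and testing processes.
* If your flyctl version does not provide "flyctl dev console", check "flyctl console" and "flyctl proxy", which are documented commands for opening a console on a machine and forwarding a local port to an app.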
## 7. Fly.io Extensions: Community and Official
### 7.1 Utilizing Extensions
**Standard:** Explore and leverage available Fly.io extensions to enhance functionality and streamline development processes.
**Do This:**
* Browse the available extensions to see if any are helpful to your deployment.
* Use them with the "flyctl extensions install" command; run "flyctl extensions --help" first to confirm the subcommands your flyctl version supports.
* Use only extensions you understand and trust.
**Don't Do This:** Blindly install extensions without fully understanding their functionality and security implications. Neglect to explore available extensions that could streamline your deployment.
"""bash
flyctl extensions install
"""
## 8. Data Persistence with Volumes
### 8.1 Volume Management
**Standard:** Create and manage persistent volumes to retain data across deployments and machine restarts.
**Do This:**
* Create a directory for holding all volume-persistence related code.
* Use the "flyctl volumes create" command for managing volumes.
* Configure the "[mounts]" section of "fly.toml" to mount each volume, and resolve the mount path in one place in your app (for example, a helper function or an environment variable) instead of hardcoding volume names and paths throughout your code.
* Ensure proper backup and restore strategies for data stored on volumes.
**Don't Do This:** Store critical data directly on the instance's local disk. Neglect to back up data stored on volumes.
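One way to avoid hardcoded volume paths is a single helper that resolves the mount point. This is a sketch under stated assumptions: "DATA_DIR" is a hypothetical environment variable, the default "/data" matches the mount destination used in this section's "[mounts]" example, and "data_dir" and "data_path" are illustrative names.

```python
import os
import tempfile
from pathlib import Path


def data_dir(environ=None):
    # DATA_DIR is a hypothetical variable name; "/data" mirrors the mount destination.
    environ = os.environ if environ is None else environ
    return Path(environ.get("DATA_DIR", "/data"))


def data_path(*parts, environ=None):
    # Build a path under the volume and make sure its parent directory exists.
    path = data_dir(environ).joinpath(*parts)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Use a temporary directory locally so the sketch runs without a volume.
    with tempfile.TemporaryDirectory() as tmp:
        target = data_path("uploads", "report.txt", environ={"DATA_DIR": tmp})
        target.write_text("hello")
        print(target.read_text())
```

Local development and tests can then point "DATA_DIR" at a scratch directory, while deployed machines use the mounted volume.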
"""bash
flyctl volumes create data_volume --size 1 # size in GB
"""
"""toml
[mounts]
source="data_volume"
destination="/data"
"""
## 9. Resource Optimization
### 9.1 Fly Machines Resource Allocation
**Standard:** Optimize resource allocation for Fly Machines to minimize costs and improve performance.
**Do This:**
* Right-size instances based on application requirements.
* Use autoscaling to dynamically adjust the number of running instances based on load.
* Monitor resource utilization and adjust instance sizes as needed.
**Don't Do This:** Over-provision resources. Neglect to monitor resource utilization.
**Why:** Proper resource allocation can significantly reduce costs and improve application performance.
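Right-sizing can start from simple arithmetic: divide the expected peak of concurrent requests by the per-machine soft limit and clamp the result. The sketch below assumes request-based concurrency with a soft limit of 20, as in this section's example configuration; "machines_needed" is a hypothetical helper, and the default bounds of 1 and 4 machines are illustrative.

```python
import math


def machines_needed(peak_concurrent_requests, soft_limit, min_machines=1, max_machines=4):
    # Estimate how many machines keep each one at or below its soft limit.
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    wanted = math.ceil(peak_concurrent_requests / soft_limit)
    # Clamp the estimate to the configured lower and upper bounds.
    return max(min_machines, min(wanted, max_machines))


if __name__ == "__main__":
    for peak in (10, 45, 70, 500):
        print(peak, machines_needed(peak, soft_limit=20))
```

When the estimate regularly hits the upper bound, that is a signal to raise the machine count or choose a larger machine size.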
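**Example ("fly.toml" Autoscaling config):**
"""toml
[processes]
app = "python app.py"
[deploy]
auto_rollback = true
[http_service]
internal_port = 8080
# Stop idle machines and start them again when requests arrive
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
[http_service.concurrency]
# These limits are request-based, not CPU-based
hard_limit = 25
soft_limit = 20
type = "requests"
[[http_service.checks]]
path = "/healthz"
interval = "15s"
timeout = "2s"
grace_period = "5s"
# "[scale]" is not a documented "fly.toml" section (verify against the fly.toml reference).
# The upper bound is the number of machines created for the app, e.g. with "flyctl scale count 4".
"""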
danielsogl
Created Mar 6, 2025
# Security Best Practices Standards for Fly.io
This document outlines security best practices for developing and deploying applications on the Fly.io platform. Adhering to these standards will help protect your applications and data from common vulnerabilities and ensure a secure and reliable deployment.
## 1. Secure Configuration and Secrets Management
### 1.1. Secure Secrets Storage
**Standard:** Never hardcode secrets directly in your application code, Dockerfiles, or configuration files. Use Fly.io's built-in secrets management.
**Why:** Hardcoding secrets exposes them to anyone with access to your codebase or container images. Fly.io secrets are encrypted at rest and in transit, minimizing the risk of exposure.
**Do This:**
* Use "flyctl secrets" to manage secrets.
"""bash
flyctl secrets set DATABASE_URL="postgres://user:password@host:port/database"
flyctl secrets set API_KEY="your_super_secret_api_key"
"""
* Access secrets in your application code through environment variables.
"""python
# Python example
import os
database_url = os.environ.get("DATABASE_URL")
api_key = os.environ.get("API_KEY")
if not database_url or not api_key:
    raise ValueError("Required secrets are not set.")
# Use database_url and api_key to connect to your database and make API calls
"""
**Don't Do This:**
* Hardcode secrets in your code:
"""python
# Python example - BAD PRACTICE
database_url = "postgres://user:password@host:port/database"
api_key = "your_super_secret_api_key"
"""
* Store secrets in version control.
* Expose secrets in logs.
**Anti-Pattern:** Using ".env" files in production. While convenient for local development, they are not secure for production deployments and can easily be accidentally committed to source control or exposed.
### 1.2. Environment-Specific Configuration
**Standard:** Separate configuration for development, staging, and production environments.
**Why:** Using the same configuration across environments can lead to misconfiguration and security vulnerabilities. For example, using production API keys in a development environment could expose sensitive data.
**Do This:**
* Utilize Fly.io's built-in support for environment variables to specify configurations.
* Use separate Fly.io apps for each environment (e.g., "myapp-dev", "myapp-staging", "myapp-prod").
* Create and manage environment-specific secrets using "flyctl secrets".
"""bash
# Set secrets for the production app
flyctl secrets set --app myapp-prod DATABASE_URL="..." API_KEY="..."
# Set secrets for the staging app
flyctl secrets set --app myapp-staging DATABASE_URL="..." API_KEY="..."
"""
**Don't Do This:**
* Use the same secrets across all environments.
* Rely on manual configuration changes between environments.
**Code Example:**
"""toml
# fly.toml - Example configuration for defining specific build arguments and env vars
[build]
# Build from the Dockerfile in the project root
dockerfile = "Dockerfile"
# Pass in build-time variables that depend on target environment.
# For example, NODE_ENV = "production" when building for production.
build-target = "release" # example
[env]
PORT = "8080"
[deploy]
release_command = "/app/migrate_db"
"""
### 1.3. Principle of Least Privilege
**Standard:** Grant the minimum necessary privileges to users, applications, and services.
**Why:** Limiting access reduces the potential impact of security breaches. If a compromised account or service has limited privileges, the attacker's ability to cause damage is significantly reduced.
**Do This:**
* Use Fly.io's RBAC (Role-Based Access Control) features documented here: (Fly.io currently offers limited RBAC).
* Ensure applications running within VMs only have the permissions they need, using "USER" directives in Dockerfiles.
* Configure firewall rules to restrict network access to only necessary ports and services.
**Don't Do This:**
* Run applications as root unless absolutely necessary.
* Grant broad permissions to services or users without a specific justification.
**Code Example (Dockerfile):**
"""dockerfile
# Pin the base image to a tested release instead of "latest" (22.04 is an example)
FROM ubuntu:22.04
# Update and install necessary packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip
# Create a non-root user
RUN useradd -m -s /bin/bash appuser
# Set the working directory
WORKDIR /app
# Copy application files, owned by the non-root user
COPY --chown=appuser:appuser . .
# Switch to the non-root user before installing, so "--user" packages land in its home directory
USER appuser
# Install Python dependencies
RUN pip3 install -r requirements.txt --user
# Command to run the application
CMD ["python3", "app.py"]
"""
### 1.4. Regular Security Audits and Updates
**Standard:** Regularly review your application code, dependencies, and infrastructure for security vulnerabilities. Keep your software up-to-date with the latest security patches.
**Why:** New vulnerabilities are discovered regularly. Staying up-to-date with security patches helps prevent exploits. Regular audits can identify potential vulnerabilities early.
**Do This:**
* Use automated vulnerability scanning tools (e.g., Snyk, Trivy) to scan your dependencies and container images.
* Subscribe to security mailing lists and advisories for the technologies you use (e.g., Python, Node.js, PostgreSQL).
* Regularly update your base images in your Dockerfiles.
* Implement a process for reviewing and addressing security vulnerabilities promptly.
**Don't Do This:**
* Ignore security alerts or vulnerabilities.
* Use outdated versions of software without security patches.
**Code Example (using Snyk in a CI/CD pipeline):**
"""yaml
# .github/workflows/security.yml - Example GitHub Actions workflow for running Snyk tests.
name: Security Scan
on:
  push:
    branches: [ main ] # or whatever your main branch is
  pull_request:
    branches: [ main ]
jobs:
  snyk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/python@master # Or javascript etc, adjust as needed
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --file=requirements.txt --severity-threshold=high
"""
## 2. Securing Network Communications
### 2.1. HTTPS for All Traffic
**Standard:** Use HTTPS for all communication between clients and your Fly.io application.
**Why:** HTTPS encrypts data in transit, preventing eavesdropping and man-in-the-middle attacks.
**Do This:**
* Allow Fly.io to automatically provision TLS certificates for your application. Fly.io automatically provides free TLS certificates through Let's Encrypt.
"""bash
flyctl certs show your-app-name.fly.dev
"""
* Ensure your application is configured to redirect HTTP traffic to HTTPS.
**Don't Do This:**
* Use plain HTTP for sensitive data.
* Disable TLS encryption.
**Code Example (configuring redirection in a web server):**
"""nginx
# nginx configuration to redirect HTTP to HTTPS
server {
    listen 80;
    server_name your-app-name.fly.dev;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name your-app-name.fly.dev;
    # SSL certificate configuration
    ssl_certificate /etc/letsencrypt/live/your-app-name.fly.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-app-name.fly.dev/privkey.pem;
    # ... other configurations ...
}
"""
### 2.2. Firewall Configuration
**Standard:** Configure firewall rules (e.g., using iptables or UFW) to limit network access to only necessary ports and services.
**Why:** Firewalls prevent unauthorized access to your application and reduce the attack surface.
**Do This:**
* Use Fly.io's private networking to isolate apps.
* Use a tool like "ufw" to manage firewall rules inside of your VM.
**Don't Do This:**
* Leave unnecessary ports open to the public internet.
* Disable the firewall.
**Code Example (using "ufw" to allow only SSH and HTTP/HTTPS traffic):**
"""bash
# Allow SSH access
ufw allow OpenSSH
# Allow HTTP traffic
ufw allow 80
# Allow HTTPS traffic
ufw allow 443
# Enable the firewall
ufw enable
# Check the firewall status
ufw status
"""
### 2.3. Mutual TLS (mTLS)
**Standard:** Use mTLS for secure communication between services within your Fly.io private network.
**Why:** mTLS provides strong authentication and encryption by requiring both the client and server to present valid certificates.
**Do This:**
* Generate client and server certificates using a tool like OpenSSL.
* Configure your services to require client certificates during TLS handshakes.
* Distribute client certificates securely.
**Don't Do This:**
* Use self-signed certificates in production without proper validation.
* Store private keys in insecure locations.
### 2.4. Monitoring and Logging
**Standard:** Implement comprehensive logging and monitoring to detect and respond to security incidents.
**Why:** Logging and monitoring provide visibility into your application's behavior, allowing you to identify suspicious activity and security vulnerabilities.
**Do This:**
* Use a centralized logging system to collect logs from all your Fly.io applications and services (e.g., Grafana Loki).
* Monitor key security metrics, such as authentication failures, API request rates, and error rates.
**Don't Do This:**
* Disable logging.
* Store sensitive data in logs without proper redaction.
* Ignore suspicious activity detected by monitoring systems.
## 3. Application Security
### 3.1. Input Validation and Output Encoding
**Standard:** Validate all input data from clients and other services. Encode output data to prevent cross-site scripting (XSS) and other injection attacks.
**Why:** Input validation prevents attackers from injecting malicious code or data into your application.
Output encoding prevents injected code from being executed in the client's browser.
**Do This:**
* Use server-side validation to verify the format, type, and length of all input data.
* Use a templating engine with automatic output encoding (e.g., Jinja2 for Python, Handlebars for JavaScript).
**Don't Do This:**
* Trust client-side validation alone.
* Display raw user input without encoding.
**Code Example (Python using Flask and Jinja2):**
"""python
# Flask example with Jinja2 templating engine
from flask import Flask, request, render_template
import bleach
app = Flask(__name__)
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # Validate the input
        name = request.form.get('name')
        if not name or len(name) > 100:
            return render_template('index.html', error='Invalid name')
        # Sanitize HTML input using bleach (default to an empty string if the field is missing)
        message = bleach.clean(request.form.get('message', ''))
        # Render the template with the sanitized message
        return render_template('index.html', name=name, message=message)
    return render_template('index.html')
"""
"""html
<!-- index.html Jinja2 template -->
<!DOCTYPE html>
<html>
<head>
  <title>Input Validation Example</title>
</head>
<body>
  {% if error %}
  <p style="color:red;">{{ error }}</p>
  {% endif %}
  <form method="post">
    <label for="name">Name:</label><br>
    <input type="text" id="name" name="name"><br><br>
    <label for="message">Message:</label><br>
    <textarea id="message" name="message"></textarea><br><br>
    <input type="submit" value="Submit">
  </form>
  {% if name and message %}
  <h2>Hello, {{ name }}!</h2>
  <p>Your message: {{ message }}</p>
  {% endif %}
</body>
</html>
"""
### 3.2. Cross-Site Request Forgery (CSRF) Protection
**Standard:** Implement CSRF protection to prevent attackers from forging requests on behalf of authenticated users.
**Why:** CSRF attacks can allow attackers to perform unauthorized actions on behalf of logged-in users.
**Do This:**
* Use a CSRF token that is unique to each user session.
* Include the CSRF token in all forms and AJAX requests.
* Validate the CSRF token on the server before processing the request.
**Don't Do This:**
* Disable CSRF protection.
* Use the same CSRF token for all users.
**Code Example (Python using Flask and WTForms):**
"""python
# Python using Flask and WTForms
import os
from flask import Flask, render_template, session, redirect, url_for
from flask_wtf import FlaskForm, CSRFProtect
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired
app = Flask(__name__)
# Read the signing key from the environment (set it with "flyctl secrets set SECRET_KEY=...")
app.config['SECRET_KEY'] = os.environ['SECRET_KEY']
csrf = CSRFProtect(app)
class MyForm(FlaskForm):
    name = StringField('Name', validators=[DataRequired()])
    submit = SubmitField('Submit')
@app.route('/', methods=['GET', 'POST'])
def index():
    form = MyForm()
    if form.validate_on_submit():
        session['name'] = form.name.data
        return redirect(url_for('success'))
    return render_template('index.html', form=form)
@app.route('/success')
def success():
    if 'name' in session:
        name = session['name']
        return render_template('success.html', name=name)
    else:
        return redirect(url_for('index'))
if __name__ == '__main__':
    # Keep debug mode off outside local development; it exposes an interactive debugger
    app.run(debug=False)
"""
### 3.3. Authentication and Authorization
**Standard:** Implement strong authentication and authorization mechanisms to control access to your application.
**Why:** Authentication verifies the identity of users, while authorization determines what resources they are allowed to access.
**Do This:**
* Use strong password policies (e.g., minimum length, complexity requirements).
* Implement multi-factor authentication (MFA) for privileged accounts.
* Use a role-based access control (RBAC) system to manage user permissions.
* Store passwords securely using a strong hashing algorithm (e.g., bcrypt, Argon2).
**Don't Do This:**
* Store passwords in plain text.
* Use weak or default passwords.
* Grant excessive permissions to users.
### 3.4. Dependency Management
**Standard:** Keep your application's dependencies up-to-date and use tools to detect and prevent vulnerable dependencies.
**Why:** Vulnerabilities in dependencies can be exploited to compromise your application.
**Do This:**
* Use a dependency management tool (e.g., pip for Python, npm for Node.js) to manage your application's dependencies.
* Regularly update your dependencies to the latest versions.
* Use automated vulnerability scanning tools (e.g., Snyk, OWASP Dependency-Check).
**Don't Do This:**
* Use outdated dependencies without security patches.
* Ignore security alerts from dependency scanning tools.
### 3.5. Error Handling and Logging
**Standard:** Handle errors gracefully and log sufficient information to diagnose problems.
**Why:** Proper error handling prevents sensitive information from being exposed to users. Logging provides valuable information for debugging and security incident response.
**Do This:**
* Implement a global error handler to catch unexpected exceptions.
* Log errors with sufficient detail to identify the root cause.
* Redact sensitive information (e.g., passwords, API keys) from logs.
* Use structured logging to make logs easier to query and analyze.
**Don't Do This:**
* Expose stack traces or other sensitive information to users in error messages.
* Log sensitive data in plain text.
* Ignore errors or warnings.
## 4. Dockerfile and Image Security
### 4.1. Minimal Base Images
**Standard:** Use minimal base images for your Docker containers to reduce the attack surface.
**Why:** Smaller images contain fewer dependencies, reducing the number of potential vulnerabilities.
**Do This:**
* Use lightweight base images like Alpine Linux or distroless images.
**Don't Do This:**
* Use full-featured base images like Ubuntu or Debian unless necessary.
**Code Example (using Alpine Linux as a base image):**
"""dockerfile
FROM python:3.9-alpine
# Set the working directory
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY . .
# Command to run the application ("main.py" is a placeholder entry point)
CMD ["python", "main.py"]
"""
### 4.2. Multi-Stage Builds
**Standard:** Use multi-stage builds to separate build-time dependencies from runtime dependencies.
**Why:** Multi-stage builds allow you to include build tools and dependencies in a temporary build environment, and then copy only the necessary artifacts to the final image.
**Do This:**
* Use separate "FROM" instructions for the build and runtime stages.
* Copy only the necessary files and dependencies from the build stage to the runtime stage.
**Don't Do This:**
* Include unnecessary build tools or dependencies in the final image.
**Code Example (using multi-stage build):**
"""dockerfile
# Build Stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
# Disable cgo so the binary is statically linked and runs on Alpine (musl libc)
RUN CGO_ENABLED=0 go build -o /app/mybinary

# Production Stage
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/mybinary /app/mybinary
CMD ["/app/mybinary"]
"""
### 4.3. Image Scanning
**Standard:** Scan your Docker images for vulnerabilities before deploying them to Fly.io.
**Why:** Image scanning identifies potential vulnerabilities in your container images before they can be exploited.
**Do This:**
* Use a container image scanning tool (e.g., Trivy, Clair, Anchore).
* Integrate image scanning into your CI/CD pipeline.
* Address vulnerabilities identified by the scanner before deploying the image.

These practices cover the core security standards for applications on Fly.io. Enforce them in CI/CD wherever they can be automated.
# Component Design Standards for Fly.io
This document outlines the component design standards for applications deployed on Fly.io. Adhering to these guidelines will promote maintainability, reusability, performance, and security in your Fly.io applications.
## 1. Introduction to Component Design in Fly.io
Component design in Fly.io focuses on creating modular, independent, and reusable parts of an application that are easy to develop, test, and maintain. Given Fly.io's geographically distributed nature, well-designed components also contribute to improved latency and resilience. In this context, a "component" is a logical grouping of functionality, often corresponding to a module, class, or service.
* **Goal:** Build robust, scalable, and maintainable applications on Fly.io.
* **Focus:** Modularity, reusability, performance, and security.
## 2. Architectural Considerations
### 2.1 Microservices vs. Monolith with Modules
Fly.io supports both microservice and monolithic architectures (with a modular design). The choice depends on the application's complexity and scalability needs.
* **Microservices:** Independent, deployable services communicating over the network. Suited for complex applications requiring independent scaling and fault isolation.
* **Monolith with Modules:** A single application with clear module boundaries internally. Suitable for smaller applications or when operational overhead of microservices is a concern.
**Do This:**
* For large applications, decompose into loosely coupled microservices, each handling a specific domain.
* For smaller projects, leverage a modular approach within a monolithic application.
**Don't Do This:**
* Create tightly coupled microservices that lead to a distributed monolith.
* Build a monolithic application with no modularity, resulting in unmaintainable code.
**Why:** Microservices offer better scalability and fault isolation, while modular monoliths simplify development and deployment for smaller applications.
Proper modularity reduces dependencies, which helps isolate deployment errors and simplifies development.
**Example (Microservice):**
"""dockerfile
# Dockerfile for a user service
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "user_service.py"]
"""
**Example (Monolith with Modules):**
"""python
# app.py
# "user_module" and "product_module" are illustrative modules inside the same codebase
from user_module import User
from product_module import Product

# Use the modules
user = User(name="John Doe")
product = Product(name="Awesome Product")
print(f"User: {user.name}, Product: {product.name}")
"""
### 2.2 Location Awareness on Fly.io
Fly.io's ability to run applications close to users means components should be designed with location awareness in mind.
* **Data locality:** Store and process data in the region closest to the users.
* **Regional deployments:** Deploy specific components to particular Fly.io regions.
**Do This:**
* Use Fly.io's region routing features to direct traffic to the nearest instance of a component.
* Implement caching strategies to minimize cross-region data access.
**Don't Do This:**
* Assume all users are geographically close to a single server.
* Ignore latency implications of cross-region data access.
**Why:** Minimizing latency improves the user experience and reduces bandwidth costs.
**Example (Fly.io Region Routing with "fly.toml"):**
"""toml
app = "my-app"
primary_region = "iad" # Initial region
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
# Requests are routed to the nearest healthy Machine by the Fly.io proxy,
# so no per-region route table is declared in this file.
# Additional regions (for example "fra" and "syd") are added by running
# Machines there, e.g. with "fly scale count" and its "--region" flag.
"""
### 2.3 Fault Tolerance & Resilience
Fly.io's distributed nature requires components to be fault-tolerant.
* **Replication:** Run multiple instances of each component across different regions.
* **Circuit Breakers:** Implement the circuit breaker pattern to prevent cascading failures.
* **Health checks:** Use Fly.io's health checks to monitor component availability and automatically restart failed instances.
**Do This:**
* Configure health checks for all critical components in your "fly.toml".
* Use retry mechanisms with exponential backoff for communication between components.
* Implement circuit breakers to isolate failing components.
**Don't Do This:**
* Rely on a single instance of a component without redundancy.
* Allow one failing component to bring down the entire application.
**Why:** Redundancy and fault isolation ensure higher availability and a better user experience.
**Example (Fly.io Health Check in "fly.toml"):**
"""toml
app = "my-app"
primary_region = "iad"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
# Checks are an array of tables, hence the double brackets
[[http_service.checks]]
path = "/healthz" # Endpoint of your health check
interval = "10s"
timeout = "5s"
"""
## 3. Coding Standards for Components
### 3.1 Single Responsibility Principle (SRP)
Each component should have one, and only one, reason to change.
**Do This:**
* Design classes and modules with a clear, focused purpose.
* Refactor large components into smaller, more manageable units.
**Don't Do This:**
* Create "god classes" or modules that handle multiple unrelated tasks.
**Why:** Makes components easier to understand, test, and maintain.
**Example (Python SRP):**
"""python
# Good: Separate classes for User and Email
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class EmailService:
    def send_welcome_email(self, user):
        print(f"Sending welcome email to {user.email}")

# Bad: User class handles both user data and email sending
class UserWithEmail:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def send_welcome_email(self):  # Violates SRP: User shouldn't handle email
        print(f"Sending welcome email to {self.email}")

user = User("John Doe", "john@example.com")
email_service = EmailService()
email_service.send_welcome_email(user)
"""
### 3.2 Open/Closed Principle (OCP)
Components should be open for extension but closed for modification.
**Do This:**
* Use inheritance or composition to add new functionality without modifying existing code.
* Favor interfaces and abstract classes to decouple components.
**Don't Do This:**
* Directly modify existing code to add new features, risking regressions.
**Why:** Reduces the risk of introducing bugs when adding new features.
**Example (Python OCP):**
"""python
# Good: Using Strategy Pattern
from abc import ABC, abstractmethod

class PaymentStrategy(ABC):
    @abstractmethod
    def pay(self, amount):
        pass

class CreditCardPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with credit card")

class PayPalPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with PayPal")

class ShoppingCart:
    def __init__(self, payment_strategy: PaymentStrategy):
        self.payment_strategy = payment_strategy

    def checkout(self, amount):
        self.payment_strategy.pay(amount)

# Bad: Modifying the ShoppingCart class directly
class ShoppingCartBad:
    def checkout(self, amount, payment_method):
        if payment_method == "credit_card":
            print(f"Paying {amount} with credit card")
        elif payment_method == "paypal":
            print(f"Paying {amount} with PayPal")
        else:
            print("Invalid payment method")

cart = ShoppingCart(CreditCardPayment())
cart.checkout(100)
"""
### 3.3 Liskov Substitution Principle (LSP)
Subtypes must be substitutable for their base types without altering the correctness of the program.
**Do This:**
* Ensure that subclasses correctly implement the behavior of their base classes.
* Avoid introducing unexpected side effects in subclasses.
**Don't Do This:**
* Create subclasses that violate the contract of their base classes.
**Why:** Prevents unexpected behavior and ensures that polymorphism works correctly.
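One way to keep subtypes substitutable is to make the shared contract as small as the callers actually need. The sketch below is illustrative only: the "Shape" base class and the "total_area" helper are hypothetical names, not part of any Fly.io API. Both subclasses promise nothing beyond "area()", so neither can break a caller that relies on that contract.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Shared contract: area() returns the shape's current area."""

    @abstractmethod
    def area(self) -> float:
        ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Shape):
    # A sibling of Rectangle, not a subclass, so it inherits no width/height setters
    def __init__(self, size: float):
        self.size = size

    def area(self) -> float:
        return self.size * self.size


def total_area(shapes: list[Shape]) -> float:
    # Relies only on the Shape contract, so any subtype can be substituted
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(5, 4), Square(2)]))  # 24
```

Modelling "Square" as a sibling rather than a child of "Rectangle" is a design choice: it gives up code reuse in exchange for a contract that every subtype can honor.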
**Example (violating Liskov Substitution):**
"""python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height

class Square(Rectangle):  # Violates LSP as Square's invariant is width == height
    def __init__(self, size):
        super().__init__(size, size)

    def set_width(self, width):
        self.width = width
        self.height = width

    def set_height(self, height):
        self.width = height
        self.height = height

def print_area(rectangle: Rectangle):
    rectangle.set_width(5)
    rectangle.set_height(4)
    print(rectangle.area())

rectangle = Rectangle(2, 3)
print_area(rectangle)  # Output: 20
square = Square(2)
print_area(square)  # Output: 16 (incorrect if we expect a standard rectangle behavior)
"""
In this example, the "Square" class violates LSP because setting the width or height also sets the other dimension, which is not the behavior expected of a generic "Rectangle".
### 3.4 Interface Segregation Principle (ISP)
Clients should not be forced to depend upon interfaces that they do not use.
**Do This:**
* Create small, specific interfaces instead of large, general-purpose ones.
* Refactor interfaces to separate unrelated methods.
**Don't Do This:**
* Force classes to implement methods they don't need.
**Why:** Reduces dependencies and improves code flexibility.
**Example (Python ISP):**
"""python
# Good: Separate interfaces for different functionalities
from abc import ABC, abstractmethod

class Printer(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

class Scanner(ABC):
    @abstractmethod
    def scan_document(self, document):
        pass

class Copier(ABC):
    @abstractmethod
    def copy_document(self, document):
        pass

# Bad: One large interface with all functionalities mixed
class MultiFunctionDevice(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

    @abstractmethod
    def scan_document(self, document):
        pass

    @abstractmethod
    def copy_document(self, document):
        pass

class SimplePrinter(Printer):
    def print_document(self, document):
        print(f"Printing {document}")

class AllInOnePrinter(Printer, Scanner, Copier):
    def print_document(self, document):
        print(f"Printing {document}")

    def scan_document(self, document):
        print(f"Scanning {document}")

    def copy_document(self, document):
        print(f"Copying {document}")
"""
A client needing only printing should not depend on the "Scanner" or "Copier" methods.
### 3.5 Dependency Inversion Principle (DIP)
High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details. Details should depend on abstractions.
**Do This:**
* Use dependency injection to provide dependencies to components.
* Program to interfaces rather than concrete implementations.
**Don't Do This:**
* Hardcode dependencies within components.
**Why:** Increases code flexibility and testability.
**Example (Python DIP):**
"""python
# Good: Using dependency injection
class Switchable:
    def turn_on(self):
        raise NotImplementedError

    def turn_off(self):
        raise NotImplementedError

class LightBulb(Switchable):
    def turn_on(self):
        print("LightBulb: turned on...")

    def turn_off(self):
        print("LightBulb: turned off...")

class ElectricPowerSwitch:
    def __init__(self, client: Switchable):
        self.client = client
        self.on = False

    def press(self):
        if self.on:
            self.client.turn_off()
            self.on = False
        else:
            self.client.turn_on()
            self.on = True

# Bad: Hardcoded dependency
class SwitchBad:
    def __init__(self):
        self.bulb = LightBulb()  # Concrete dependency = Bad
        self.on = False

    def press(self):
        if self.on:
            self.bulb.turn_off()
            self.on = False
        else:
            self.bulb.turn_on()
            self.on = True

bulb = LightBulb()
switch = ElectricPowerSwitch(bulb)  # Dependency Injection
switch.press()
switch.press()
"""
## 4. Fly.io Specific Considerations
### 4.1 Using Fly.io Volumes
Components that require persistent storage should leverage Fly.io Volumes.
**Do This:**
* Mount volumes to specific directories in your Fly.io instances.
* Use volumes to store data that needs to persist across deployments.
**Don't Do This:**
* Store persistent data within the container's filesystem, risking data loss on restarts.
**Why:** Volumes provide reliable and persistent storage for your applications.
**Example (Fly.io Volume Configuration in "fly.toml"):**
"""toml
app = "my-data-app"
primary_region = "ord"
[build]
[deploy]
release_command = "/app/migrate_db.sh"
[[mounts]]
source = "data_volume" # Existing volume name
destination = "/data" # Where the volume is mounted
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
"""
### 4.2 Fly.io Secrets Management
Securely manage sensitive information using Fly.io Secrets.
**Do This:**
* Store API keys, database passwords, and other sensitive data as Fly.io Secrets.
* Access secrets in your code using environment variables.
**Don't Do This:**
* Hardcode secrets in your code or configuration files.
* Commit secrets to your version control system.
**Why:** Protects sensitive data and prevents unauthorized access.
**Example (Accessing Fly.io Secret in Python):**
"""python
import os

database_password = os.environ.get("DATABASE_PASSWORD")
if database_password is None:
    raise RuntimeError("DATABASE_PASSWORD is not set")
# Use the password to connect to the database; never print or log the value itself
print("Connecting to database using the configured password")
"""
### 4.3 Fly.io Edge Network and Global Distribution
Leverage Fly.io's edge network for improved performance.
**Do This:**
* Configure your services to take full advantage of the Fly.io global network.
* Pin a component to a region when it needs strong consistency, accepting higher latency for distant users as the trade-off.
**Don't Do This:**
* Ignore latency implications of not using Fly.io's global network effectively.
**Why:** Reduced latency provides a better user experience.
## 5. Component Communication
### 5.1 REST APIs
Use REST APIs for synchronous communication between components.
**Do This:**
* Design REST APIs using standard HTTP methods and status codes.
* Use a consistent API versioning strategy.
* Implement proper authentication and authorization for API endpoints.
**Don't Do This:**
* Expose internal implementation details through the API.
* Create overly complex or inconsistent APIs.
**Why:** REST APIs are well established and easy to understand, which enables interoperability.
### 5.2 Message Queues (e.g. Redis, NATS)
Use message queues for asynchronous communication between components.
**Do This:**
* Choose a message queue that fits your application's needs (e.g., Redis, RabbitMQ, NATS).
* Design message formats that are easy to serialize and deserialize.
* Implement error handling and retry mechanisms for message processing.
**Don't Do This:**
* Use message queues for synchronous operations that require immediate responses.
* Create overly complex messaging topologies.
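The retry advice in the "Do This" list above (and the exponential backoff recommended earlier for communication between components) could be sketched roughly as follows. This is a minimal illustration using only the standard library; "retry_with_backoff" and "flaky_handler" are hypothetical names, and the delays are arbitrary defaults rather than recommended values.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation(); after each failure wait base_delay * 2**attempt (capped, plus jitter)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying at the same instant
            sleep(delay + random.uniform(0, delay / 2))


# Usage: a hypothetical message handler that fails twice, then succeeds
calls = {"count": 0}


def flaky_handler():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("broker unavailable")
    return "processed"


# The sleep function is injected so the example runs instantly
print(retry_with_backoff(flaky_handler, sleep=lambda seconds: None))  # processed
```

In real code you would usually catch only the exception types that are safe to retry, and send messages that keep failing to a dead-letter queue instead of retrying forever.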
**Why:** Message queues enable decoupling, asynchronous processing, and improved scalability. Fly.io makes it easy to deploy Redis and NATS in a colocated fashion.
### 5.3 gRPC
Consider gRPC for high-performance communication between internal components.
**Do This:**
* Define gRPC services using Protocol Buffers.
* Generate code for both client and server using gRPC tools.
* Implement proper error handling and logging.
**Don't Do This:**
* Use gRPC for external APIs that need to be easily accessible to a wide range of clients.
* Overcomplicate gRPC service definitions.
**Why:** gRPC provides high performance, efficient serialization, and strong typing. It typically requires more sophistication than REST.
## 6. Testing
### 6.1 Unit Testing
Write unit tests for all components to verify their functionality in isolation.
**Do This:**
* Use a testing framework appropriate for your language (e.g., pytest for Python, JUnit for Java).
* Write tests that cover all possible code paths and edge cases.
* Use mocks and stubs to isolate components from their dependencies.
**Don't Do This:**
* Skip unit testing or write tests that are too superficial.
* Write tests that are tightly coupled to the implementation details of the tested components.
**Why:** Unit tests ensure that components function correctly and prevent regressions.
### 6.2 Integration Testing
Write integration tests to verify the interaction between different components.
**Do This:**
* Test the communication between components using real or simulated dependencies.
* Verify that data is correctly passed between components and that the overall system behaves as expected.
**Don't Do This:**
* Skip integration testing or write tests that are too narrow in scope.
* Rely solely on unit tests without verifying how components work together.
**Why:** Integration tests ensure that components work together correctly.
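To make the unit-testing advice about mocks concrete, here is a small sketch written as pytest-style test functions. "SignupService" is a hypothetical component invented for this illustration; the mailer it depends on is replaced with "unittest.mock.Mock" so the test exercises the component in isolation.

```python
from unittest.mock import Mock


class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class SignupService:
    """Hypothetical component: registers a user, then asks a mailer to send a welcome email."""

    def __init__(self, mailer):
        self.mailer = mailer  # Injected dependency, replaced by a mock in tests

    def register(self, name, email):
        if "@" not in email:
            raise ValueError("invalid email address")
        user = User(name, email)
        self.mailer.send_welcome_email(user)
        return user


def test_register_sends_welcome_email():
    mailer = Mock()
    user = SignupService(mailer).register("John Doe", "john@example.com")
    # The component must hand exactly the new user to its collaborator
    mailer.send_welcome_email.assert_called_once_with(user)


def test_register_rejects_invalid_email():
    mailer = Mock()
    try:
        SignupService(mailer).register("John Doe", "not-an-email")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # No email may be sent when validation fails
    mailer.send_welcome_email.assert_not_called()
```

pytest collects functions whose names start with "test_", so saving this in a file such as "test_signup.py" and running "pytest" should execute both tests. An integration test would swap the mock for a real or simulated mailer.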
### 6.3 End-to-End Testing
Write end-to-end tests to verify the entire application flow from the user interface to the backend.
**Do This:**
* Use a testing framework that simulates user interactions (e.g., Selenium, Cypress).
* Test the entire application flow from the user interface to the backend.
* Verify that the application meets the user's requirements.
**Don't Do This:**
* Skip end-to-end testing or write tests that are too complex and brittle.
* Rely solely on unit and integration tests without verifying the end-to-end user experience.
**Why:** End-to-end tests ensure that the application meets the user's requirements and provides a good user experience.
## 7. Monitoring and Logging
### 7.1 Centralized Logging
Use a centralized logging system to collect and analyze logs from all components.
**Do This:**
* Use a logging framework appropriate for your language (e.g., log4j for Java, logging for Python).
* Configure components to log all important events, including errors, warnings, and informational messages.
* Use a tool such as Grafana Loki or a similar system for log aggregation.
**Don't Do This:**
* Skip logging or rely solely on local log files.
* Log sensitive data such as passwords or API keys.
**Why:** Centralized logging enables easier troubleshooting, performance monitoring, and security analysis.
### 7.2 Metrics Collection
Collect metrics from all components to monitor their performance and resource usage.
**Do This:**
* Use a metrics library appropriate for your language (e.g., Prometheus client libraries).
* Collect metrics such as CPU usage, memory usage, network traffic, and request latency.
* Use a monitoring system such as Prometheus or Grafana to visualize and analyze metrics.
**Don't Do This:**
* Skip metrics collection or collect only a limited set of metrics.
* Use metrics that are not meaningful or actionable.
**Why:** Metrics provide valuable insights into the health and performance of your components.
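The logging guidance in section 7.1 (log important events, never log passwords or API keys) can be illustrated with Python's standard "logging" module. The sketch below is one possible approach, not a prescribed setup: "JsonFormatter", the "context" field, and the "SENSITIVE_KEYS" set are assumptions made for this example.

```python
import json
import logging

SENSITIVE_KEYS = {"password", "api_key", "token"}  # Assumed field names; extend as needed


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object and mask sensitive context fields."""

    def format(self, record):
        # Values passed as extra={"context": {...}} become the record's "context" attribute
        context = getattr(record, "context", {})
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "context": {
                key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
                for key, value in context.items()
            },
        }
        return json.dumps(payload)


handler = logging.StreamHandler()  # Writes to stderr, where a log shipper can collect it
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("user_service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("login attempt", extra={"context": {"user": "john", "password": "hunter2"}})
```

Emitting one JSON object per line keeps the output easy to parse for an aggregator such as Grafana Loki. Note that this only masks fields passed through "context"; secrets interpolated directly into the message string would still be logged.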
### 7.3 Tracing
Implement distributed tracing to track requests as they flow through different components.
**Do This:**
* Use a tracing library such as OpenTelemetry, with a backend such as Jaeger or Zipkin.
* Instrument code to generate spans for each request as it enters and exits a component.
* Use a tracing backend to collect and visualize traces.
**Don't Do This:**
* Skip tracing or trace only a limited set of requests.
* Create traces that are too granular or lack context.
**Why:** Tracing enables you to identify performance bottlenecks and diagnose issues in distributed systems. Fly.io has solid support for well-designed tracing setups.
# Core Architecture Standards for Fly.io
This document outlines the core architectural standards for developing applications on Fly.io. Adhering to these standards will result in more maintainable, performant, and secure applications. It focuses on principles and patterns particularly relevant to Fly.io's distributed, edge-based architecture.
## 1. Fundamental Architectural Patterns
### 1.1. Microservices Architecture
**Standard:** Favor a microservices architecture for complex applications.
* **Do This:** Decompose large monolithic applications into smaller, independent services with well-defined APIs. Each service should own its data.
* **Don't Do This:** Create a single, monolithic codebase for large applications.
**Why:** Microservices promote modularity, independent scaling, and faster development cycles. Each service can be deployed and scaled independently, which aligns perfectly with Fly.io's global distribution.
**Fly.io Considerations:**
* Use Fly.io Regions effectively. Deploy services to regions close to your users for low latency.
* Utilize Fly.io's internal DNS for service discovery and communication.
**Example:**
"""toml
# fly.toml for service A
app = "service-a"
primary_region = "iad"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1

# fly.toml for service B (a separate file in service B's repository)
app = "service-b"
primary_region = "lhr"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
"""
**Anti-pattern:** Tightly coupled microservices defeating independent deployment and scaling.
### 1.2. Event-Driven Architecture
**Standard:** Employ event-driven architecture for asynchronous communication between services.
* **Do This:** Use message queues (e.g., Kafka, RabbitMQ, Redis Streams) to decouple services and enable resilient communication. Apply the Saga pattern where necessary.
* **Don't Do This:** Rely on synchronous HTTP calls for every inter-service communication.
**Why:** Event-driven architectures enhance scalability and fault tolerance. Fly.io's globally distributed nature benefits from asynchronous communication, minimizing the impact of network latency and temporary outages.
**Fly.io Considerations:**
* Run message brokers as Fly.io apps, leveraging the global network for distribution.
* Consider using Fly.io Volumes for persistent storage of message queues.
**Example:** (Using Redis Streams)
"""python
# Service A (producer)
import redis
import os

# On Fly.io, REDIS_HOST is typically the Redis app's private "<app-name>.internal" address
redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"

def publish_event(user_id, event_type):
    r.xadd(stream_name, {"user_id": user_id, "event_type": event_type})

publish_event("123", "user_created")

# Service B (consumer, a separate program)
import redis
import os

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"
last_id = '$'  # Start reading from the end for new messages

while True:
    response = r.xread({stream_name: last_id}, block=1000)  # Block for up to 1 second
    if response:
        stream, messages = response[0]
        for message_id, data in messages:
            print(f"Received event: {data}")
            last_id = message_id  # Resume after the last message seen
"""
**Anti-pattern:** Implementing complex distributed transactions with synchronous calls across multiple services.
### 1.3. Serverless Functions
**Standard:** Utilize serverless functions for event-driven and short-lived processing tasks.
* **Do This:** Employ serverless functions for asynchronous tasks, lightweight API endpoints, and event-driven triggers.
* **Don't Do This:** Use serverless functions for long-running processes or stateful services.
**Why:** Serverless functions scale automatically and are billed only for actual usage, optimizing resource utilization.
**Fly.io Considerations:**
* Fly.io does not offer a pure serverless product. Consider lightweight Fly Machines orchestrated by an external event source, or a framework designed for fast-scaling workloads on Fly.io.
* Be mindful of cold starts in serverless environments, and optimize function execution time.
**Example:** (Simulated serverless-style function with Fly Machines and Redis Queue)
"""python
# Processing function (deployed as a Fly Machine)
import redis
import os
import time

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"

def process_item(item):
    print(f"Processing item: {item}")
    time.sleep(2)  # Simulate processing time
    print(f"Item processed: {item}")

while True:
    item = r.blpop(queue_name, timeout=10)  # Wait up to 10 seconds for an item
    if item:
        _, data = item
        item_data = data.decode('utf-8')
        process_item(item_data)

# Enqueueing script (deployed as another Fly Machine or run externally)
import redis
import os

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"

for i in range(5):
    r.rpush(queue_name, f"Item-{i}")
    print(f"Enqueued Item-{i}")
"""
**Anti-pattern:** Using serverless functions for tasks that require significant persistent storage or are inherently stateful.
## 2. Project Structure and Organization
### 2.1. Monorepo vs. Polyrepo
**Standard:** For most projects on Fly.io, especially those involving microservices, prefer a polyrepo structure unless there's a strong reason for a monorepo.
* **Do This:** Keep each microservice in its own repository.
* **Don't Do This:** Force all microservices into one giant monorepo without carefully considering dependencies and build pipelines.
**Why:** Polyrepos offer better isolation between services, independent versioning, and clear ownership. This suits Fly.io's philosophy of independent deployments.
**Fly.io Considerations:**
* Each repository maps directly to a Fly.io app.
* Use CI/CD pipelines to automate deployments from each repo to Fly.io.
**Alternative:** If a monorepo is chosen (e.g. for shared libraries), proper tooling and processes are crucial.
**Example:**
* "repository: service-a" (maps to "app = "service-a"" in "fly.toml")
* "repository: service-b" (maps to "app = "service-b"" in "fly.toml")
**Anti-pattern:** Unnecessarily large monorepos creating complex build dependencies and slowing down deployments.
### 2.2. Standard Directory Structure
**Standard:** Define a consistent directory structure within each service repository.
* **Do This:**
    * "src/": Source code
    * "config/": Configuration files (including "fly.toml")
    * "tests/": Unit and integration tests
    * "deploy/": Deployment scripts and configurations
    * Versioning and Changelog: Keep consistent versioning across all services with frequent commits.
* **Don't Do This:** Scatter files randomly throughout the repository without a clear organization.
**Why:** A well-defined directory structure improves code discoverability and maintainability.
**Example:**
"""
service-a/
├── src/
│   ├── main.py
│   ├── utils.py
│   └── ...
├── config/
│   ├── fly.toml
│   └── settings.py
├── tests/
│   ├── test_main.py
│   └── ...
├── deploy/
│   └── Dockerfile
└── README.md
"""
**Anti-pattern:** A flat or deeply nested directory structure that makes it difficult to locate specific files.
### 2.3. Configuration Management
**Standard:** Externalize configuration using environment variables, and utilize Fly.io secrets for sensitive data.
* **Do This:** Store configuration parameters in environment variables.
Use "fly secrets" to manage sensitive information (API keys, database passwords). Use ".env" files for local development and keep them out of version control.
* **Don't Do This:** Hardcode configuration values directly in your codebase, or commit sensitive data to your repository.
**Why:** Externalized configuration enhances security and simplifies deployments across different environments.
**Fly.io Considerations:**
* Use "fly secrets" to set secrets that are securely injected into your Fly.io apps.
* Use Fly Volumes for persistent storage if the configuration needs to be dynamically updated.
**Example:**
"""bash
# Setting a secret
fly secrets set API_KEY="your_api_key"

# Accessing the secret in your code (Python); the secret arrives as an environment variable:
#   import os
#   api_key = os.environ.get("API_KEY")
#   if api_key is None:
#       raise RuntimeError("API_KEY is not set")
# Never print or log the key itself.
"""
**Anti-pattern:** Storing passwords or API keys directly in the codebase or committing them to version control.
## 3. Deployment and CI/CD
### 3.1. Automated Deployments
**Standard:** Implement automated CI/CD pipelines for deploying changes to Fly.io.
* **Do This:** Use GitHub Actions, GitLab CI, or similar tools to trigger deployments on code changes.
* **Don't Do This:** Manually deploy code changes to Fly.io (except for initial setup or testing).
**Why:** Automated deployments ensure consistency and reduce the risk of human error.
**Fly.io Considerations:**
* Use the "flyctl deploy" CLI command in your CI/CD pipelines.
* Leverage Fly.io's built-in health checks for zero-downtime deployments.
**Example:** (GitHub Actions)
"""yaml
# .github/workflows/deploy.yml
name: Deploy to Fly.io
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs flyctl on the runner
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Deploy to Fly.io
        run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
"""
**Anti-pattern:** Manual deployments that are error-prone and impossible to reproduce consistently.
### 3.2. Immutable Infrastructure
**Standard:** Treat infrastructure as immutable. Deploy new versions of your application instead of modifying existing instances in place.
* **Do This:** Use Docker containers and "flyctl deploy" to create new application instances. Utilize Fly Machines for fine-grained control.
* **Don't Do This:** SSH into running instances and make manual changes.
**Why:** Immutable infrastructure ensures consistency and simplifies rollback procedures.
**Fly.io Considerations:**
* Fly.io encourages immutable deployments using Docker images.
* Rollbacks are quick because a previously built image can be redeployed.
**Example:**
"""dockerfile
# Dockerfile
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
"""
**Anti-pattern:** Modifying running instances directly, leading to configuration drift and inconsistencies.
### 3.3. Health Checks and Monitoring
**Standard:** Implement health checks and monitoring to detect and recover from failures.
* **Do This:** Define health check endpoints in your applications. Use Fly.io's built-in health checks to automatically restart unhealthy instances. Monitor application metrics using Prometheus, Grafana, or similar tools.
* **Don't Do This:** Rely solely on manual observation to identify and resolve issues.
**Why:** Health checks and monitoring ensure that your application is running as expected and that problems are detected and addressed quickly.
**Fly.io Considerations:**
* Configure health checks in your "fly.toml" file.
* Integrate with monitoring services to track application performance and resource utilization.
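On the application side, "define health check endpoints" might look like the sketch below, which uses only Python's standard library. The "/health" path and port 8080 are chosen to line up with the "fly.toml" examples in this document; "check_dependencies" is a hypothetical placeholder to replace with real probes such as a database ping.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies():
    """Placeholder probe; replace with real checks (database ping, cache ping)."""
    return {"database": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        checks = check_dependencies()
        healthy = all(checks.values())
        body = json.dumps({"status": "ok" if healthy else "degraded", "checks": checks}).encode()
        # A non-2xx status tells the platform's check that this instance is unhealthy
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Bind to all interfaces so the platform's proxy can reach the process
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling "serve()" starts the server and blocks. In a real service the same route would normally live in your existing web framework rather than in a separate server.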
**Example:**
"""toml
# fly.toml
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  [[http_service.checks]]
    path = "/health"
    interval = "10s"
    timeout = "2s"
    grace_period = "5s"
"""
**Anti-pattern:** No monitoring, so that even basic restarts require manual intervention.
## 4. Data Management and Persistence
### 4.1. Database Choice
**Standard:** Choose the right database for your application's needs.
* **Do This:** Consider PostgreSQL for relational data, Redis for caching and real-time data, and object storage for storing files.
* **Don't Do This:** Use a single database for all use cases without considering performance and scalability requirements.
**Why:** Choosing the right database improves performance and reduces complexity.
**Fly.io Considerations:**
* Fly.io offers managed PostgreSQL and Redis databases.
* Use Fly.io Volumes for persistent storage of database data.
**Example:**
"""toml
# fly.toml for a PostgreSQL app
app = "my-postgres-app"
primary_region = "iad"
[build]
  image = "postgres:14"
[env]
  POSTGRES_USER = "your_user"
  POSTGRES_DB = "your_db"
# Set POSTGRES_PASSWORD with "fly secrets set" instead of storing it in [env]
"""
**Anti-pattern:** Using a relational database for high-velocity, unstructured data, or forgetting to consider geographic data locality.
### 4.2. Data Locality and Replication
**Standard:** Consider data locality and replication for optimal performance and availability.
* **Do This:** Deploy your database close to your application servers. Use database replication to ensure data availability across different regions. Leverage Fly.io Regions.
* **Don't Do This:** Store all data in a single region without considering latency or disaster recovery.
**Why:** Data locality minimizes latency and improves application performance. Replication protects against data loss and ensures high availability.
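One way to act on data locality in application code is to send writes to the primary database and reads to a nearby replica. The sketch below assumes the "FLY_REGION" and "PRIMARY_REGION" environment variables are available at runtime; "REPLICA_DATABASE_URL" is a hypothetical variable name used only for illustration.

```python
import os
from typing import Mapping


def pick_database_url(is_write: bool, env: Mapping[str, str] = os.environ) -> str:
    """Choose a connection URL based on the operation and the current region."""
    primary_url = env["DATABASE_URL"]
    # Hypothetical variable pointing at a read replica in this region
    replica_url = env.get("REPLICA_DATABASE_URL")
    in_primary_region = env.get("FLY_REGION") == env.get("PRIMARY_REGION")

    # Writes must reach the primary; reads in the primary region can too
    if is_write or in_primary_region or not replica_url:
        return primary_url
    return replica_url
```

Reads served from a replica may lag behind the primary, so read-after-write paths may still need to go to the primary.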
**Fly.io Considerations:**
* Use Fly.io Regions to deploy your database and application servers in the same geographic location.
* Configure database replication to replicate data across multiple regions.
**Example:** Configure PostgreSQL replication across multiple Fly.io regions. (This requires setting up streaming replication, which is outside the scope of this document.)
**Anti-pattern:** Fetching all data across the globe instead of creating regional read replicas.
### 4.3. Database Migrations
**Standard:** Use database migrations to manage schema changes.
* **Do This:** Use a database migration tool (e.g., Alembic, Flyway) to manage schema changes in a controlled and repeatable manner.
* **Don't Do This:** Make manual schema changes directly in your database.
**Why:** Database migrations ensure that schema changes are applied consistently across different environments and simplify rollback procedures.
**Fly.io Considerations:**
* Include database migrations as part of your CI/CD pipeline.
* Keep migration scripts in version control and ship them in the application image, so that they can run as part of a deploy (for example, through a release command).
## 5. Security Best Practices
### 5.1. Least Privilege
**Standard:** Follow the principle of least privilege.
* **Do This:** Grant only the necessary permissions to users and services. Avoid using root or admin accounts unless absolutely necessary. Use environment-specific service accounts with limited scope.
* **Don't Do This:** Grant excessive permissions that could be exploited by attackers.
**Why:** The principle of least privilege limits the impact of security breaches.
**Fly.io Considerations:**
* Use Fly.io's built-in security features (for example, private networking and narrowly scoped access tokens) to restrict access to your applications and data.
* Use environment variables to store credentials instead of hardcoding them in your code.
### 5.2. Input Validation and Output Encoding
**Standard:** Validate all user inputs and encode outputs to prevent security vulnerabilities.
* **Do This:** Validate input and use parameterized queries to prevent SQL injection and related attacks. Encode outputs to prevent cross-site scripting (XSS) when displaying user-generated content.
* **Don't Do This:** Trust user input blindly or allow user-generated content to be displayed without proper encoding.
**Why:** Input validation and output encoding prevent common security vulnerabilities.
### 5.3. Dependency Management
**Standard:** Manage your application's dependencies carefully.
* **Do This:** Use a dependency management tool (e.g., pip, npm, Maven) to track and manage your application's dependencies. Regularly update dependencies to patch security vulnerabilities. Scan dependencies for known vulnerabilities using tools like "npm audit" or "pip-audit" (note that "pip check" only verifies that installed packages have compatible dependencies).
* **Don't Do This:** Use outdated or unmaintained dependencies.
**Why:** Dependency management helps to prevent security vulnerabilities and ensures that your application is using the latest security patches.
**Fly.io Considerations:**
* Pin dependencies and use a lockfile to ensure repeatable deployments.
* Regularly rebuild Docker images to update base images with security patches.
## 6. Performance Optimization
### 6.1. Caching
**Standard:** Implement caching to improve application performance.
* **Do This:** Use caching to store frequently accessed data in memory or on disk. Use a dedicated cache store (e.g., Redis, Memcached) to simplify caching. Leverage Fly.io Regions for geographically distributed caching.
* **Don't Do This:** Cache sensitive data or data that changes frequently.
**Why:** Caching reduces database load and improves response times.
**Fly.io Considerations:**
* Use Fly.io's Redis add-on for a managed Redis cache.
* Configure HTTP caching headers to cache static assets on CDNs.
### 6.2. Connection Pooling
**Standard:** Use connection pooling to reduce the overhead of creating database connections.
* **Do This:** Use a connection pooling library to manage database connections efficiently. Configure the connection pool size based on your application's workload.
* **Don't Do This:** Create a new database connection for every request.
**Why:** Connection pooling reduces database load and improves response times.
### 6.3. Asynchronous Operations
**Standard:** Use asynchronous operations to improve application responsiveness.
* **Do This:** Use asynchronous tasks to perform long-running operations in the background. Use a task queue (e.g., Celery backed by a broker such as RabbitMQ or Redis) to manage asynchronous tasks.
* **Don't Do This:** Block the main thread with long-running operations.
**Why:** Asynchronous operations keep request handlers free to respond while slow work completes in the background.
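The idea of moving long-running work off the request path can be sketched with the Python standard library alone. This is an illustration rather than a replacement for a real task queue: the in-process job registry is lost when the instance restarts, and the function names ("generate_report", "enqueue_report", "job_status") are hypothetical.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

# A small worker pool standing in for a real task queue
executor = ThreadPoolExecutor(max_workers=4)
jobs: Dict[str, Future] = {}


def generate_report(customer_id: int) -> str:
    """Simulate a slow operation such as building a report."""
    time.sleep(0.1)
    return f"report for customer {customer_id}"


def enqueue_report(customer_id: int) -> str:
    """Schedule the work and return a job id immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(generate_report, customer_id)
    return job_id


def job_status(job_id: str) -> dict:
    """Report whether the job has finished, without blocking."""
    future = jobs[job_id]
    if not future.done():
        return {"state": "pending"}
    return {"state": "done", "result": future.result()}
```

A request handler would call "enqueue_report" and return the job id right away, and the client would poll "job_status" later.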
# State Management Standards for Fly.io
This document outlines coding standards and best practices for managing application state within Fly.io applications. It provides guidance on data flow, reactivity, and state management options specific to the Fly.io environment. These standards aim to improve maintainability, performance, and scalability of your deployments.
## 1. Introduction to State Management on Fly.io
Effective state management is crucial for building robust and scalable applications on Fly.io. Fly.io's distributed architecture presents unique challenges and opportunities for managing state, requiring careful consideration of data consistency, latency, and resilience. This guide covers approaches ranging from simple in-memory state to distributed databases and caching strategies.
## 2. Choosing the Right State Management Approach
Selecting the appropriate state management solution is a critical architectural decision influenced by application requirements, data volume, consistency needs, and performance goals.
### 2.1. Factors to Consider
* **Data Consistency:** Determine the required consistency level (e.g., eventual consistency, strong consistency). Strong consistency typically involves more complex setups and potential performance trade-offs, but is necessary for sensitive data.
* **Data Volume:** Consider the volume of data that needs to be stored and managed. Small amounts of session data may be effectively handled in-memory, while large datasets require database solutions.
* **Latency Requirements:** Analyze latency constraints based on the application's user experience needs. Caching and geographically distributed data stores can minimize latency.
* **Scalability:** Choose solutions that can scale horizontally to handle increasing traffic and data volume. Stateless application components coupled with externalized, scalable state management solutions are generally preferred.
* **Complexity:** Balance the need for sophisticated state management with the overhead of implementation and maintenance. Start with simpler solutions and only introduce more complex tools when necessary.
### 2.2. State Management Options
* **In-Memory State:** Suitable for small amounts of ephemeral data (e.g., temporary UI state) or data that can be easily regenerated. Do not rely solely on in-memory state for critical data as instances can be terminated or restarted.
    * *When to Use:* Transient, non-critical UI state, caching frequently accessed but non-essential data.
    * *Example:* A simple counter application.
* **Fly.io Volumes:** Persistent storage within a region. Volumes are attached to a single VM in one region.
    * *When to Use:* Stateful applications within a specific region that need persistent storage. Ideal for databases where regional locality is desired. Can be combined with replicated databases for higher availability.
    * *Example:* Postgres data directory on a dedicated VM within a region.
* **Fly.io Postgres:** Fly.io-managed Postgres clusters distributed globally. Provides automated backups, scaling, and fault tolerance. Ideal for transactional data with standard SQL semantics.
    * *When to Use:* Applications requiring standard SQL functionality with ACID properties, automated backups and scaling.
    * *Example:* Storing user data, product catalogs, order information.
* **Key-Value Stores (Redis, Memcached):** Fast, in-memory data stores suitable for caching and session management. Generally provide eventual consistency.
    * *When to Use:* Caching frequently accessed data, managing user sessions, rate limiting.
    * *Example:* Caching API responses, session data for authenticated users.
* **Distributed Databases (CockroachDB, YugabyteDB):** Distributed SQL databases providing strong consistency, fault tolerance, and scalability.
    * *When to Use:* Applications requiring strong consistency, high availability, and global distribution of data.
    * *Example:* Financial transactions, inventory management, global user profiles.
* **Object Storage (AWS S3, Google Cloud Storage):** Storing large unstructured data such as images, videos, and backups.
    * *When to Use:* Storing static assets, large media files, and backups.
    * *Example:* User-uploaded photos, video content, database backups.
## 3. State Management Standards
The following standards apply to all state management solutions used within Fly.io applications.
### 3.1. General Principles
* **Do This:** Externalize all persistent application state. Avoid storing critical data solely within application instances.
    * **Why:** Fly.io instances are ephemeral and can be restarted or relocated. Data stored only in-memory will be lost.
* **Don't Do This:** Rely on local file storage within the VM for important data. The exception is a Fly.io Volume, and only when being tied to a single region is acceptable.
    * **Why:** Instance failures or relocation will result in data loss.
* **Do This:** Favor stateless application components whenever possible.
    * **Why:** Simplifies scaling, deployment, and recovery in a distributed environment.
### 3.2. Configuration and Secrets
* **Do This:** Store configuration and secrets using Fly.io's secrets management.
    * **Why:** Securely injects environment variables at runtime, avoiding hardcoding.
* **Don't Do This:** Commit secrets directly to source code or include them in Docker images.
    * **Why:** Compromises security and violates best practices.
* **Example:** Setting a database password as a Fly.io secret.
    """bash
    fly secrets set DATABASE_PASSWORD=your_secret_password
    """
    Accessing it in the application (Node.js):
    """javascript
    const dbPassword = process.env.DATABASE_PASSWORD;
    """
### 3.3. Database Connections
* **Do This:** Use connection pooling to efficiently manage database connections.
    * **Why:** Reduces connection overhead and improves application performance by reusing existing connections.
* **Do This:** Set appropriate connection timeouts to prevent resource exhaustion.
    * **Why:** Avoids connections being held open indefinitely, especially during network issues.
* **Do This:** Use environment variables to configure database connection strings.
    * **Why:** Allows dynamic configuration based on the environment (development, staging, production).
* **Example:** Connecting to Fly.io Postgres with connection pooling (Node.js with "pg"):
    """javascript
    const { Pool } = require('pg');

    const pool = new Pool({
      connectionString: process.env.DATABASE_URL,
      max: 20, // Max number of clients in the pool
      idleTimeoutMillis: 30000, // Close idle clients after 30 seconds
      connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
    });

    module.exports = {
      query: (text, params) => pool.query(text, params),
    };

    // Example usage:
    async function fetchData() {
      const { rows } = await pool.query('SELECT NOW()');
      console.log(rows[0]);
    }
    """
### 3.4. Caching
* **Do This:** Implement caching for frequently accessed data to reduce database load and improve response times.
    * **Why:** Caching minimizes latency and improves application performance by serving data from memory.
* **Do This:** Use appropriate cache invalidation strategies to ensure data consistency.
    * **Why:** Avoid serving stale data to users. Implement time-based expiration (TTL) or event-based invalidation.
* **Do This:** Consider using a distributed cache like Redis or Memcached for shared caching across multiple application instances.
    * **Why:** Provides a centralized cache that can be accessed by all instances.
* **Example:** Using Redis for caching (Node.js with "ioredis"):
    """javascript
    const Redis = require('ioredis');
    const redis = new Redis(process.env.REDIS_URL); // Connect to Redis

    async function getCachedData(key, fetchData) {
      const cachedData = await redis.get(key);
      if (cachedData) {
        return JSON.parse(cachedData);
      }
      const data = await fetchData(); // Fetch data from source
      await redis.set(key, JSON.stringify(data), 'EX', 3600); // Cache for 1 hour (3600 seconds)
      return data;
    }

    // Example usage:
    async function fetchUserData(userId) {
      // Logic to fetch user data from the database
      return { id: userId, name: 'John Doe' };
    }

    async function getUser(userId) {
      const cacheKey = `user:${userId}`; // Template literal, so userId is interpolated
      const userData = await getCachedData(cacheKey, () => fetchUserData(userId));
      console.log(userData);
    }
    """
### 3.5. Session Management
* **Do This:** Store session data in a reliable external data store (e.g., Redis, database).
    * **Why:** Ensures session persistence across instance restarts and scaling events.
* **Do This:** Use secure session cookies with appropriate attributes (e.g., "HttpOnly", "Secure", "SameSite").
    * **Why:** Enhances security by mitigating cross-site scripting (XSS) and cross-site request forgery (CSRF) attacks.
* **Do This:** Implement session expiration and regular session cleanup to prevent resource exhaustion.
    * **Why:** Prevents accumulation of orphaned session data.
* **Example:** Express.js session configuration using Redis (Node.js with "connect-redis" and "express-session"):
    """javascript
    const express = require('express');
    const session = require('express-session');
    const RedisStore = require('connect-redis').default; // Export style used by connect-redis v7
    const Redis = require('ioredis');

    const app = express();
    const redisClient = new Redis(process.env.REDIS_URL);

    app.use(session({
      store: new RedisStore({ client: redisClient }),
      secret: process.env.SESSION_SECRET,
      resave: false,
      saveUninitialized: false,
      cookie: {
        secure: process.env.NODE_ENV === 'production', // Only send over HTTPS in production
        httpOnly: true, // Prevent client-side JavaScript access
        sameSite: 'strict', // Mitigate CSRF attacks
        maxAge: 24 * 60 * 60 * 1000, // Session expires after 24 hours
      }
    }));
    """
### 3.6. Data Replication and Distribution
* **Do This:** Consider using data replication or distribution strategies to improve availability and reduce latency for geographically distributed users.
    * **Why:** Provides redundancy and faster access to data by placing it closer to users.
* **Do This:** Treat eventual consistency with caution. Always handle conflict resolution and data reconciliation properly.
* **Fly.io Postgres:** Use multi-region Postgres clusters for automatic data replication and failover.
### 3.7. Monitoring and Logging
* **Do This:** Implement comprehensive monitoring and logging to track state management performance and identify potential issues.
    * **Why:** Allows proactive identification and resolution of problems.
* **Do This:** Log relevant state transitions and errors to facilitate debugging.
    * **Why:** Provides insight into application behavior and helps diagnose root causes of issues.
* **Do This:** Monitor database connection pool usage, cache hit rates, and other key metrics.
    * **Why:** Provides early warnings of performance bottlenecks or resource exhaustion.
## 4. Technology-Specific State Management
### 4.1. Remix
Remix handles data loading and mutations through Actions and Loaders.
Apply Fly.io-specific considerations within this mechanism.
* **Do This:** Use "getSession" and "commitSession" for managing user sessions. The example below uses cookie-backed storage, which keeps session data in the cookie itself; for sessions backed by a database or Redis, use "createSessionStorage" with custom storage functions.
    """javascript
    // Session management example using Remix:
    import { createCookieSessionStorage } from "@remix-run/node"; // or cloudflare/deno

    const { getSession, commitSession, destroySession } =
      createCookieSessionStorage({
        cookie: {
          name: "__session",
          httpOnly: true,
          path: "/",
          sameSite: "lax",
          secrets: [process.env.SESSION_SECRET], // Set with "fly secrets set"; never hardcode
          secure: process.env.NODE_ENV === "production",
        },
      });

    export { getSession, commitSession, destroySession };
    """
* **Do This:** For Remix applications, consider using Fly.io Volumes for persistent storage where regional performance is desired.
* **Don't Do This:** Directly manipulate localStorage or sessionStorage for critical application state within Remix. This data is client-side only and is not persisted across devices and browsers.
### 4.2. Next.js
Next.js offers various options for state management ranging from built-in solutions to third-party libraries.
* **Do This:** For global client-side state, utilize Context API with "useReducer" or state management libraries like Zustand or Jotai. These run in Client Components, so keep them at the interactive edges of the tree and fetch data in Server Components where possible.
* **Do This:** If you are using Next.js App Router, consider using Server Actions for data mutations, which allow you to execute server-side code directly from your components. Data persistence should still be handled with external databases or storage solutions.
    """typescript
    // actions.ts: example Server Action for submitting a form
    'use server'
    import { revalidatePath } from 'next/cache';
    import { redirect } from 'next/navigation';

    // "useFormState" passes the previous state as the first argument
    export async function createInvoice(prevState: unknown, formData: FormData) {
      const rawFormData = {
        customerId: formData.get('customerId'),
        amount: formData.get('amount'),
        status: formData.get('status'),
      };
      // Persist the data to a database
      await createInvoiceInDb(rawFormData); // Replace with your DB persistence logic
      revalidatePath('/dashboard/invoices'); // Optional: Revalidate cache automatically after mutation
      redirect('/dashboard/invoices'); // Optional: Redirect user to another page
    }

    // page.tsx: in your component (a separate file)
    'use client'
    import { createInvoice } from './actions';
    import { useFormState } from 'react-dom';

    export default function Page() {
      const [state, dispatch] = useFormState(createInvoice, null);
      return (
        <form action={dispatch}>
          {/* Form fields */}
          <button type="submit">Create Invoice</button>
        </form>
      );
    }
    """
* **Don't Do This:** Rely exclusively on "getServerSideProps" for handling all dynamic data, especially if the data isn't truly required for initial page render. This can negatively impact performance.
### 4.3. General State Management Libraries (Redux, Zustand, Jotai)
* **Do This:** Centralize state updates with reducers or update functions.
* **Do This:** Use asynchronous actions or middleware (e.g., Redux Thunk, Redux Saga) for handling data fetching and other side effects.
* **Do This:** Optimize state updates to prevent unnecessary re-renders. Use selectors or memoization techniques to derive state from the global store.
## 5. Anti-Patterns
* **Over-Reliance on Global State:** Avoid storing unnecessary data in global state, which can lead to performance issues and make debugging difficult.
* **Ignoring Concurrency Issues:** Be mindful of concurrency issues when updating shared state, especially in a distributed environment. Use appropriate locking mechanisms or optimistic concurrency control.
* **Lack of Monitoring:** Failing to monitor state management performance can lead to undetected issues and performance bottlenecks.
## 6. Optimizing for Fly.io's Architecture
Fly.io offers a globally distributed platform, allowing you to place your application instances close to your users. This can significantly reduce latency, but requires careful consideration of data locality and consistency.
* **Regional Data Affinity:** Consider the implications of placing data within a specific region. Data stored on a Fly.io Volume is tied to that region. This is useful when data is primarily accessed by users in that region, but can increase latency for users accessing data from other regions.
* **Global Data Replication:** For data that needs to be accessed globally with low latency, consider using Fly.io Postgres with multi-region replication or a globally distributed database like CockroachDB or YugabyteDB.
* **Caching Strategies:** Use a tiered caching approach to minimize latency. Cache frequently accessed data close to the user using client-side caching (e.g., browser cache, service worker) or edge caching (e.g., a CDN in front of your app). For shared data, use a distributed cache like Redis.
## 7. Conclusion
By following these coding standards, you can build robust, scalable, and maintainable applications on Fly.io. Choosing the right state management solution and following best practices for configuration, caching, session management, and monitoring will improve the performance, reliability, and security of your deployments. Always consider the specific requirements of your application and the characteristics of the Fly.io environment when making state management decisions.
# Performance Optimization Standards for Fly.io
This document outlines the coding standards focused on performance optimization for applications deployed on Fly.io. Adhering to these standards will lead to faster, more responsive, and resource-efficient applications. These standards are tailored for the latest version of Fly.io and incorporate modern approaches for optimal performance within the Fly.io ecosystem.
## 1. Architectural Considerations for Performance
### 1.1. Region Selection and Geographic Distribution
**Standards:**
* **Do This:** Deploy your application to multiple regions closest to your users. Use Fly.io's built-in support for global deployments to minimize latency.
* **Don't Do This:** Deploy only to a single region, especially if your user base is geographically distributed.
**Why:** Reduces latency by serving users from the nearest available region. Improves availability by distributing load across multiple regions.
**Code Example (fly.toml):**
"""toml
app = "my-fly-app"
primary_region = "iad" # Initial region
# Additional regions (for example "lhr" and "syd") are not declared in fly.toml;
# Machines are placed in them when you scale the app.

[build]

[deploy]
  release_command = "/app/bin/my-fly-app migrate"
  strategy = "rolling"

# [http_service] already exposes ports 80 and 443, so a separate
# [[services]] section for the same internal port is not needed.
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]
"""
**Anti-Pattern:** Hardcoding region-specific logic into the application code. Use Fly.io's configuration and routing features instead.
### 1.2. Database Proximity
**Standards:**
* **Do This:** Locate your database (e.g., Postgres, Redis) in the same region as your application servers whenever possible to minimize network latency. Consider using Fly.io's managed Postgres or Redis services.
* **Don't Do This:** Access a database across regions unless absolutely necessary.
**Why:** Reduces latency for database queries, improving overall application responsiveness.
**Code Example (Connecting to Fly.io Postgres):**
"""python
import os

import psycopg2

# Fetch database credentials from environment variables.
# These variable names are placeholders; attaching a Fly.io Postgres app
# typically provides a single DATABASE_URL secret instead.
db_host = os.environ.get("FLY_POSTGRES_FQDN")
db_name = os.environ.get("PGDATABASE")
db_user = os.environ.get("PGUSER")
db_password = os.environ.get("PGPASSWORD")  # Set with "fly secrets set"; never hardcode

conn = None
try:
    conn = psycopg2.connect(
        host=db_host,
        database=db_name,
        user=db_user,
        password=db_password,
        port=5432,  # Usually 5432 for PostgreSQL
    )
    print("Database connection successful")
    cur = conn.cursor()
    cur.execute("SELECT version();")
    db_version = cur.fetchone()[0]
    print(f"PostgreSQL version: {db_version}")
    cur.close()
except psycopg2.Error as e:
    print(f"Error connecting to database: {e}")
finally:
    # Close the connection even if a query fails
    if conn is not None:
        conn.close()
"""
**Anti-Pattern:** Ignoring database latency. Profile database queries to identify and optimize slow operations.
### 1.3. Caching Strategies
**Standards:**
* **Do This:** Implement caching at multiple levels: browser, CDN or edge cache, application server (in-memory), and database (query caching). Use appropriate cache invalidation strategies. Implement HTTP caching headers (e.g., "Cache-Control", "Expires").
* **Don't Do This:** Rely solely on database caching. Cache frequently accessed data closer to the user.
**Why:** Reduces load on application servers and databases, resulting in faster response times and lower resource utilization.
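The application-server (in-memory) tier with time-based invalidation can be sketched as a small TTL cache. This is an illustrative sketch, not a library API: the "TTLCache" class and its method names are hypothetical, and a per-instance cache like this is not shared between Machines.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # Injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call loader() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry immediately, for event-based invalidation."""
        self._entries.pop(key, None)
```

Calling "invalidate" when the underlying record changes keeps the TTL as a safety net rather than the only freshness mechanism.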
**Code Example (HTTP Caching with Flask):**
"""python
from flask import Flask, make_response

app = Flask(__name__)

@app.route('/')
def index():
    response = make_response("<h1>Hello, World!</h1>")
    response.headers['Cache-Control'] = 'public, max-age=3600'  # Cache for 1 hour
    return response

if __name__ == '__main__':
    # Local development only; use a production WSGI server when deployed
    app.run(debug=True)
"""
**Anti-Pattern:** Aggressively caching dynamic content. Use appropriate cache invalidation techniques when data changes.
### 1.4. Connection Pooling
**Standards:**
* **Do This:** Use connection pooling for database connections to reduce the overhead of establishing new connections for each request.
* **Don't Do This:** Create a new database connection for every request, especially under high load.
**Why:** Reduces database load and improves application response time by reusing existing connections.
**Code Example (Connection Pooling with SQLAlchemy):**
"""python
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Placeholder variable names; adjust to match your app's secrets
db_host = os.environ.get("FLY_POSTGRES_FQDN")
db_name = os.environ.get("PGDATABASE")
db_user = os.environ.get("PGUSER")
db_password = os.environ.get("PGPASSWORD")  # Set with "fly secrets set"; never hardcode

# Database URL (adjust username, password, host, and database name)
db_url = f"postgresql://{db_user}:{db_password}@{db_host}/{db_name}"

# Create a database engine with connection pooling
engine = create_engine(db_url, pool_size=5, max_overflow=10)  # Adjust pool_size and max_overflow

# Create a session factory
Session = sessionmaker(bind=engine)

# Example Usage:
def get_data_from_db():
    session = Session()
    try:
        # Perform database operations using the session
        # Example:
        # results = session.query(MyTable).all()
        print("Querying the DB... Replace with your actual query here")
    except Exception as e:
        print(f"Error during database operation: {e}")
    finally:
        session.close()  # Always close the session!

if __name__ == '__main__':
    get_data_from_db()
"""
**Anti-Pattern:** Setting the connection pool size too small or too large. Tune based on application load and database capacity.
## 2. Code-Level Optimizations
### 2.1. Efficient Data Structures and Algorithms
**Standards:**
* **Do This:** Choose appropriate data structures (e.g., dictionaries, sets) and algorithms (e.g., sorting algorithms, search algorithms) for the specific task. Optimize for time and space complexity appropriately.
* **Don't Do This:** Use inefficient data structures or algorithms that lead to slow execution or high memory consumption.
**Why:** Improves application performance by minimizing resource usage and execution time.
**Code Example (Using Sets for Efficient Membership Testing):**
"""python
my_list = [1, 2, 3, 4, 5]  # Original data
my_set = set(my_list)  # Convert to a set

# Membership checks are much faster on sets, if that is all you need
if 3 in my_set:
    print("3 exists in my_set")
if 6 in my_set:
    print("6 exists in my_set")
else:
    print("6 does not exist in my_set")
"""
**Anti-Pattern:** Linear search on large, unsorted lists. Consider using binary search or hash tables.
### 2.2. Asynchronous Operations
**Standards:**
* **Do This:** Use asynchronous operations (e.g., async/await in Python, Promises in JavaScript) for I/O-bound tasks such as network requests, file I/O, and database queries to avoid blocking the main thread.
* **Don't Do This:** Perform blocking I/O operations on the main thread.
**Why:** Prevents blocking the event loop, allowing the application to handle more requests concurrently. Improves responsiveness and throughput.
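To see why non-blocking I/O improves throughput, the sketch below runs three simulated I/O calls concurrently with "asyncio.gather". The "fake_io" coroutine is a hypothetical stand-in for a real network or database call; "asyncio.sleep" plays the role of waiting on I/O.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_io(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound call; yields to the event loop while waiting."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_all() -> Tuple[List[str], float]:
    """Run three I/O-bound tasks concurrently and measure the elapsed time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_io("db", 0.2),
        fake_io("api", 0.2),
        fake_io("file", 0.2),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_all())
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time is close to the longest single delay rather than the sum of all three.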
**Code Example (Asynchronous HTTP Request with Python aiohttp):**
"""python
import asyncio

import aiohttp

async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    data = await fetch_data('https://example.com')
    print(data[:100])  # Print the first 100 characters

if __name__ == '__main__':
    asyncio.run(main())
"""
**Anti-Pattern:** Mixing synchronous and asynchronous code without proper thread management. Use appropriate executors or thread pools.
### 2.3. Resource Management
**Standards:**
* **Do This:** Explicitly release resources such as file handles, database connections, and memory as soon as they are no longer needed. Use "try...finally" blocks or context managers ("with" statement in Python) to ensure proper resource cleanup. Use Fly.io's automatic stopping and starting of Machines ("auto_stop_machines" and "auto_start_machines" in "fly.toml") to match capacity to demand, and consider scaling to zero during off-peak hours.
* **Don't Do This:** Leak resources, which can lead to memory exhaustion or other performance problems.
**Why:** Prevents resource leaks, ensuring efficient utilization of system resources. Improves application stability and scalability.
**Code Example (Using "with" Statement for File Handling):**
"""python
try:
    with open('my_file.txt', 'r') as f:
        data = f.read()
        print(data)
except FileNotFoundError: