# Core Architecture Standards for Fly.io
This document outlines the core architectural standards for developing applications on Fly.io. Adhering to these standards will result in more maintainable, performant, and secure applications. It focuses on principles and patterns particularly relevant to Fly.io's distributed, edge-based architecture.
## 1. Fundamental Architectural Patterns
### 1.1. Microservices Architecture
**Standard:** Favor a microservices architecture for complex applications.
* **Do This:** Decompose large monolithic applications into smaller, independent services with well-defined APIs. Each service should own its data.
* **Don't Do This:** Create a single, monolithic codebase for large applications.
**Why:** Microservices promote modularity, independent scaling, and faster development cycles. Each service can be deployed and scaled independently, which aligns perfectly with Fly.io's global distribution.
**Fly.io Considerations:**
* Use Fly.io Regions effectively. Deploy services to regions close to your users for low latency.
* Utilize Fly.io's internal DNS for service discovery and communication.
**Example:**
"""toml
# fly.toml for service A
app = "service-a"
primary_region = "iad"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
# fly.toml for service B
app = "service-b"
primary_region = "lhr"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
"""
**Anti-pattern:** Tightly coupled microservices defeating independent deployment and scaling.
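The internal DNS mentioned above can be sketched as a small helper. This is a minimal illustration, assuming the target app listens on port 8080 on its private address as in the "fly.toml" example; the function name "internal_url" is hypothetical, while the "<app>.internal" and "<region>.<app>.internal" hostname forms follow Fly.io's private networking documentation.

```python
from typing import Optional


def internal_url(app: str, port: int = 8080, path: str = "/", region: Optional[str] = None) -> str:
    """Build a private-network URL for another Fly.io app.

    "<app>.internal" resolves to the app's Machines; prefixing a region
    ("<region>.<app>.internal") narrows the lookup to that region.
    """
    host = f"{region}.{app}.internal" if region else f"{app}.internal"
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{host}:{port}{path}"


# Service A calling service B over the private network
print(internal_url("service-b", path="/users/123"))
print(internal_url("service-b", region="lhr", path="health"))
```

Keeping the hostname construction in one place makes it easier to switch between global and region-specific lookups without touching call sites.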
### 1.2. Event-Driven Architecture
**Standard:** Employ event-driven architecture for asynchronous communication between services.
* **Do This:** Use message queues (e.g., Kafka, RabbitMQ, Redis Streams) to decouple services and enable resilient communication. Apply the Saga pattern where necessary.
* **Don't Do This:** Rely on synchronous HTTP calls for every inter-service communication.
**Why:** Event-driven architectures enhance scalability and fault tolerance. Fly.io's globally distributed nature benefits from asynchronous communication, minimizing the impact of network latency and temporary outages.
**Fly.io Considerations:**
* Run message brokers as Fly.io apps, leveraging the global network for distribution.
* Consider using Fly.io Volumes for persistent storage of message queues.
**Example:** (Using Redis Streams)
"""python
# Service A (producer)
import os

import redis

# Hostname of the Redis app, e.g. its "<app-name>.internal" address on Fly.io
redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"

def publish_event(user_id, event_type):
    r.xadd(stream_name, {"user_id": user_id, "event_type": event_type})

publish_event("123", "user_created")

# Service B (consumer)
import os

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
stream_name = "user_events"
last_id = "$"  # Start reading from the end for new messages

while True:
    response = r.xread({stream_name: last_id}, block=1000)  # Block for up to 1 second
    if response:
        stream, messages = response[0]
        for message_id, data in messages:
            print(f"Received event: {data}")
            last_id = message_id  # Resume after the last message seen
"""
**Anti-pattern:** Implementing complex distributed transactions with synchronous calls across multiple services.
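The Saga pattern mentioned above replaces one distributed transaction with a sequence of local steps, each paired with a compensating action. The following is a minimal in-process sketch under stated assumptions: the step names and the "run_saga" helper are hypothetical, and in a real deployment each action would publish a command or event to another service rather than call a local function.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run each action in order; if one fails, undo the completed steps
    by running their compensations in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log


def noop() -> None:
    pass


def fail_payment() -> None:
    raise RuntimeError("payment declined")


result = run_saga([
    ("reserve_stock", noop, noop),
    ("charge_card", fail_payment, noop),
    ("ship_order", noop, noop),
])
print(result)
```

Steps after the failing one never run, and only steps that completed are compensated, which is the property that makes the pattern safe to use across services.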
### 1.3. Serverless Functions
**Standard:** Use serverless-style functions for event-driven and short-lived processing tasks.
* **Do This:** Employ serverless functions for asynchronous tasks, lightweight API endpoints, and event-driven triggers.
* **Don't Do This:** Use serverless functions for long-running processes or stateful services.
**Why:** Serverless functions scale automatically and only charge for actual usage, optimizing resource utilization.
**Fly.io Considerations:**
* Fly.io does not offer a pure functions-as-a-service product. Approximate one with lightweight Fly Machines that start on demand (for example via "auto_start_machines" and "auto_stop_machines") or that are triggered by an external event source.
* Be mindful of cold starts in serverless environments, and optimize function execution time.
**Example:** (Simulated serverless-style function with Fly Machines and Redis Queue)
"""python
# Processing function (deployed as a Fly Machine)
import os
import time

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"

def process_item(item):
    print(f"Processing item: {item}")
    time.sleep(2)  # Simulate processing time
    print(f"Item processed: {item}")

while True:
    item = r.blpop(queue_name, timeout=10)  # Wait up to 10 seconds for an item
    if item:
        _, data = item
        item_data = data.decode("utf-8")
        process_item(item_data)

# Enqueueing script (deployed as another Fly Machine or run externally)
import os

import redis

redis_host = os.environ.get("REDIS_HOST", "redis")
redis_port = int(os.environ.get("REDIS_PORT", 6379))
r = redis.Redis(host=redis_host, port=redis_port)
queue_name = "processing_queue"

for i in range(5):
    r.rpush(queue_name, f"Item-{i}")
    print(f"Enqueued Item-{i}")
"""
**Anti-pattern:** Using serverless functions for tasks that require significant persistent storage or are inherently stateful.
## 2. Project Structure and Organization
### 2.1. Monorepo vs. Polyrepo
**Standard:** For most projects on Fly.io, especially those involving microservices, prefer a polyrepo structure unless there's a strong reason for a monorepo.
* **Do This:** Keep each microservice in its own repository.
* **Don't Do This:** Force all microservices into one giant monorepo without carefully considering dependencies and build pipelines.
**Why:** Polyrepos offer better isolation between services, independent versioning, and clear ownership. This suits Fly.io's philosophy of independent deployments.
**Fly.io Considerations:**
* Each repository maps directly to a Fly.io app.
* Use CI/CD pipelines to automate deployments from each repo to Fly.io.
**Alternative:** If a monorepo is chosen (e.g. for shared libraries), proper tooling and processes are crucial.
**Example:**
* "repository: service-a" (maps to "app = "service-a"" in "fly.toml")
* "repository: service-b" (maps to "app = "service-b"" in "fly.toml")
**Anti-pattern:** Unnecessarily large monorepos creating complex build dependencies and slowing down deployments.
### 2.2. Standard Directory Structure
**Standard:** Define a consistent directory structure within each service repository.
* **Do This:**
  * "src/": Source code
  * "config/": Configuration files (including "fly.toml"; flyctl looks for it in the working directory by default, so pass "--config config/fly.toml" when deploying)
  * "tests/": Unit and integration tests
  * "deploy/": Deployment scripts and configurations
  * Versioning and changelog: Apply the same versioning scheme across all services, keep a changelog, and commit frequently.
* **Don't Do This:** Scatter files randomly throughout the repository without a clear organization.
**Why:** A well-defined directory structure improves code discoverability and maintainability.
**Example:**
"""
service-a/
├── src/
│ ├── main.py
│ ├── utils.py
│ └── ...
├── config/
│ ├── fly.toml
│ └── settings.py
├── tests/
│ ├── test_main.py
│ └── ...
├── deploy/
│ └── Dockerfile
└── README.md
"""
**Anti-pattern:** A flat or deeply nested directory structure that makes it difficult to locate specific files.
### 2.3. Configuration Management
**Standard:** Externalize configuration using environment variables, and utilize Fly.io secrets for sensitive data.
* **Do This:** Store configuration parameters in environment variables. Use "fly secrets" to manage sensitive information (API keys, database passwords). Utilize ".env" files for local development.
* **Don't Do This:** Hardcode configuration values directly in your codebase, or commit sensitive data to your repository.
**Why:** Externalized configuration enhances security and simplifies deployments across different environments.
**Fly.io Considerations:**
* Use "fly secrets" to set secrets that are securely injected into your Fly.io apps.
* Use Fly Volumes for persistent storage if the configuration needs to be dynamically updated.
**Example:**
"""bash
# Setting a secret
fly secrets set API_KEY="your_api_key"
"""
"""python
# Accessing the secret in your code (Python)
import os

api_key = os.environ.get("API_KEY")
if api_key:
    print("API key loaded.")  # Never print or log the secret itself
else:
    print("API Key not found.")
"""
**Anti-pattern:** Storing passwords or API keys directly in the codebase or committing them to version control.
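Externalized configuration is easier to work with when it is read and validated once at startup. The sketch below is one possible shape, not a prescribed one: the "Settings" class and "load_settings" function are hypothetical names, and the variable names reuse those from the examples in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    redis_host: str
    redis_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables and fail fast when a
    required secret is missing, instead of starting in a broken state."""
    api_key = env.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; set it with 'fly secrets set'")
    return Settings(
        api_key=api_key,
        redis_host=env.get("REDIS_HOST", "redis"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )


# Passing a plain dict keeps the example independent of the real environment
settings = load_settings({"API_KEY": "example-key", "REDIS_PORT": "6380"})
print(settings.redis_host, settings.redis_port)
```

Accepting the environment as a parameter also makes the loader straightforward to unit test.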
## 3. Deployment and CI/CD
### 3.1. Automated Deployments
**Standard:** Implement automated CI/CD pipelines for deploying changes to Fly.io.
* **Do This:** Use GitHub Actions, GitLab CI, or similar tools to trigger deployments on code changes.
* **Don't Do This:** Manually deploy code changes to Fly.io (except maybe for initial setup/testing).
**Why:** Automated deployments ensure consistency and reduce the risk of human error.
**Fly.io Considerations:**
* Use the "flyctl deploy" CLI command in your CI/CD pipelines.
* Leverage Fly.io's built-in health checks for zero-downtime deployments.
**Example:** (GitHub Actions)
"""yaml
# .github/workflows/deploy.yml
name: Deploy to Fly.io
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs flyctl on the runner
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - name: Deploy to Fly.io
        run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
"""
**Anti-pattern:** Manual deployments that are error-prone and impossible to reproduce consistently.
### 3.2. Immutable Infrastructure
**Standard:** Treat infrastructure as immutable. Deploy new versions of your application instead of modifying existing instances in place.
* **Do This:** Use Docker containers and "flyctl deploy" to create new application instances. Utilize Fly Machines for fine-grained control.
* **Don't Do This:** SSH into running instances and make manual changes.
**Why:** Immutable infrastructure ensures consistency and simplifies rollback procedures.
**Fly.io Considerations:**
* Fly.io encourages immutable deployments using Docker images.
* Rollbacks are straightforward because every release is built from an image: redeploy a previous image with "fly deploy --image".
**Example:**
"""dockerfile
# Dockerfile
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
"""
**Anti-pattern:** Modifying running instances directly, leading to configuration drift and inconsistencies.
### 3.3. Health Checks and Monitoring
**Standard:** Implement health checks and monitoring to detect and recover from failures.
* **Do This:** Define health check endpoints in your applications. Use Fly.io's built-in health checks to automatically restart unhealthy instances. Monitor application metrics using Prometheus, Grafana, or similar tools.
* **Don't Do This:** Rely solely on manual observation to identify and resolve issues.
**Why:** Health checks and monitoring ensure that your application is running as expected and that problems are detected and addressed quickly.
**Fly.io Considerations:**
* Configure health checks in your "fly.toml" file.
* Integrate with monitoring services to track application performance and resource utilization.
**Example:**
"""toml
# fly.toml
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[http_service.checks]]
path = "/health"
interval = "10s"
timeout = "2s"
grace_period = "5s"
"""
**Anti-pattern:** Running without monitoring, so that even basic restarts require manual intervention.
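On the application side, the "/health" path configured above needs a handler that reports whether the service and its dependencies are usable. The following is a minimal standard-library sketch; the names "health_status", "HealthHandler", and "make_server", and the example dependency checks, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check; return 200 if all pass, otherwise 503."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # A crashing check counts as a failing dependency
        results[name] = "ok" if ok else "failing"
    code = 200 if all(value == "ok" for value in results.values()) else 503
    return code, results


class HealthHandler(BaseHTTPRequestHandler):
    checks: Dict[str, Callable[[], bool]] = {}

    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        code, results = health_status(self.checks)
        body = json.dumps(results).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)


def broken_check() -> bool:
    raise ConnectionError("cache unreachable")


healthy = health_status({"database": lambda: True})
degraded = health_status({"database": lambda: True, "cache": broken_check})
print(healthy, degraded)
```

Returning a non-2xx status when a dependency fails is what lets the platform's check mark the instance unhealthy.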
## 4. Data Management and Persistence
### 4.1. Database Choice
**Standard:** Choose the right database for your application's needs.
* **Do This:** Consider PostgreSQL for relational data, Redis for caching and real-time data, and object storage for storing files.
* **Don't Do This:** Use a single database for all use cases without considering performance and scalability requirements.
**Why:** Choosing the right database improves performance and reduces complexity.
**Fly.io Considerations:**
* Fly.io offers managed PostgreSQL and Redis databases.
* Use Fly.io Volumes for persistent storage of database data.
**Example:**
"""toml
# fly.toml for a PostgreSQL app
app = "my-postgres-app"
primary_region = "iad"

[build]
image = "postgres:14"

[env]
POSTGRES_USER = "your_user"
POSTGRES_DB = "your_db"
# Do not put the password in [env]; set it as a secret instead:
#   fly secrets set POSTGRES_PASSWORD="your_password"
"""
**Anti-pattern:** Using a relational database for high-velocity, unstructured data, or forgetting to consider geographic data locality.
### 4.2. Data Locality and Replication
**Standard:** Consider data locality and replication for optimal performance and availability.
* **Do This:** Deploy your database close to your application servers. Use database replication to ensure data availability across different regions. Leverage Fly.io Regions.
* **Don't Do This:** Store all data in a single region without considering latency or disaster recovery.
**Why:** Data locality minimizes latency and improves application performance. Replication protects against data loss and ensures high availability.
**Fly.io Considerations:**
* Use Fly.io Regions to deploy your database and application servers in the same geographic location.
* Configure database replication to replicate data across multiple regions.
**Example:**
Configure PostgreSQL replication across multiple Fly.io regions. (Setting up streaming replication is outside the scope of this document.)
**Anti-pattern:** Fetching all data across the globe instead of creating regional read replicas.
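With regional read replicas, reads can be served locally while writes still need to reach the primary region. One way to sketch this, assuming the runtime exposes "FLY_REGION" and "PRIMARY_REGION" environment variables and that the Fly proxy honors a "fly-replay" response header as Fly.io's documentation describes, is shown below. The helper names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def replay_target(method: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the region a request should be replayed to, or None when the
    current instance can handle it (reads, or writes in the primary region)."""
    current = env.get("FLY_REGION")
    primary = env.get("PRIMARY_REGION")
    if method.upper() in WRITE_METHODS and current and primary and current != primary:
        return primary
    return None


def replay_headers(region: str) -> Dict[str, str]:
    """Response headers asking the proxy to replay the request in another region."""
    return {"fly-replay": f"region={region}"}


replica_env = {"FLY_REGION": "lhr", "PRIMARY_REGION": "iad"}
print(replay_target("GET", replica_env))
print(replay_target("POST", replica_env))
```

A request handler would call "replay_target" first and, when it returns a region, respond with "replay_headers(region)" instead of touching the local read replica.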
### 4.3. Database Migrations
**Standard:** Use database migrations to manage schema changes.
* **Do This:** Use a database migration tool (e.g., Alembic, Flyway) to manage schema changes in a controlled and repeatable manner.
* **Don't Do This:** Make manual schema changes directly in your database.
**Why:** Database migrations ensure that schema changes are applied consistently across different environments and simplify rollback procedures.
**Fly.io Considerations:**
* Include database migrations as part of your CI/CD pipeline.
* Ship migration scripts inside the application image and run them before each release, for example through a "release_command" in "fly.toml".
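To illustrate what a migration tool does under the hood, here is a minimal sketch of a versioned migration runner. It is not how Alembic or Flyway are implemented; it uses SQLite from the standard library so it runs anywhere, and the "MIGRATIONS" list, the "schema_migrations" table name, and the example schema are all hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Ordered (version, SQL) pairs; each is applied at most once
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> List[int]:
    """Apply migrations not yet recorded in schema_migrations and return
    the versions that were applied on this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied: List[int] = []
    for version, sql in MIGRATIONS:
        if version in applied:
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)
second_run = migrate(conn)  # Nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(first_run, second_run, columns)
```

Recording applied versions in the database is what makes the run repeatable: the same command is safe to execute on every deployment.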
## 5. Security Best Practices
### 5.1. Least Privilege
**Standard:** Follow the principle of least privilege.
* **Do This:** Grant only the necessary permissions to users and services. Avoid using root or admin accounts unless absolutely necessary. Use environment-specific service accounts with limited scope.
* **Don't Do This:** Grant excessive permissions that could be exploited by attackers.
**Why:** The principle of least privilege limits the impact of security breaches.
**Fly.io Considerations:**
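* Use Fly.io's access controls to restrict access to your applications and data: keep internal services on the private network without public IP addresses, and give CI/CD pipelines an app-scoped deploy token rather than a personal token.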
* Use environment variables to store credentials instead of hardcoding them in your code.
### 5.2. Input Validation and Output Encoding
**Standard:** Validate all user inputs and encode outputs to prevent security vulnerabilities.
* **Do This:** Use input validation to prevent SQL injection, cross-site scripting (XSS), and other attacks. Encode outputs to prevent XSS vulnerabilities when displaying user-generated content.
* **Don't Do This:** Trust user input blindly or allow user-generated content to be displayed without proper encoding.
**Why:** Input validation and output encoding prevent common security vulnerabilities.
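The three techniques named above (validating input, keeping values out of SQL text, and encoding output) can be sketched with the standard library alone. The username rule, table layout, and function names here are assumptions chosen for illustration.

```python
import html
import re
import sqlite3
from typing import Optional, Tuple

USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,30}")


def validate_username(value: str) -> str:
    """Reject anything that is not 3-30 letters, digits, or underscores."""
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("username must be 3-30 letters, digits, or underscores")
    return value


def find_user(conn: sqlite3.Connection, username: str) -> Optional[Tuple[int, str]]:
    # The "?" placeholder binds the value separately from the SQL text
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (username,)).fetchone()


def render_comment(comment: str) -> str:
    # Escape HTML metacharacters before embedding user content in markup
    return f"<p>{html.escape(comment)}</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

found = find_user(conn, validate_username("alice"))
injected = find_user(conn, "alice' OR '1'='1")  # Treated as a literal name
rendered = render_comment("<script>alert(1)</script>")
print(found, injected, rendered)
```

Validation and parameter binding are complementary: the first rejects malformed input early, while the second keeps the query safe even for input that slips through.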
### 5.3. Dependency Management
**Standard:** Manage your application's dependencies carefully.
* **Do This:** Use a dependency management tool (e.g., pip, npm, Maven) to track and manage your application's dependencies. Regularly update dependencies to patch security vulnerabilities. Scan dependencies for known vulnerabilities using tools like "npm audit" or "pip-audit" (note that "pip check" only verifies that installed packages have compatible dependencies).
* **Don't Do This:** Use outdated or unmaintained dependencies.
**Why:** Dependency management helps to prevent security vulnerabilities and ensures that your application is using the latest security patches.
**Fly.io Considerations:**
* Pin dependencies and use a lockfile to ensure repeatable deployments.
* Regularly rebuild Docker images to update base images with security patches.
## 6. Performance Optimization
### 6.1. Caching
**Standard:** Implement caching to improve application performance.
* **Do This:** Use caching to store frequently accessed data in memory or on disk. Use an established cache store (e.g., Redis, Memcached) rather than building your own. Leverage Fly.io Regions for geographically distributed caching.
* **Don't Do This:** Cache sensitive data or data that changes frequently.
**Why:** Caching reduces database load and improves response times.
**Fly.io Considerations:**
* Use Fly.io's Redis add-on for a managed Redis cache.
* Configure HTTP caching headers to cache static assets on CDNs.
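The core idea behind the caching guidance, serving repeated reads from memory until an entry expires, can be shown with a small in-process cache. This sketch deliberately swaps Redis for a plain dictionary so it runs without a server; the "TTLCache" class and the injectable clock are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Fresh entry: skip the loader
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value


fake_now = [0.0]  # Controllable clock so the example does not need to sleep
loads: List[str] = []


def load_profile() -> dict:
    loads.append("db-query")  # Stands in for a database read
    return {"name": "Ada"}


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
cache.get_or_load("user:1", load_profile)  # Miss: loads from the "database"
cache.get_or_load("user:1", load_profile)  # Hit: served from memory
fake_now[0] = 31.0
cache.get_or_load("user:1", load_profile)  # Expired: loads again
print(len(loads))
```

The same get-or-load shape applies when the dictionary is replaced by a shared store, with the TTL passed as the key's expiry.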
### 6.2. Connection Pooling
**Standard:** Use connection pooling to reduce the overhead of creating database connections.
* **Do This:** Use a connection pooling library to manage database connections efficiently. Configure the connection pool size based on your application's workload.
* **Don't Do This:** Create a new database connection for every request.
**Why:** Connection pooling reduces database load and improves response times.
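To show what a pool does, the sketch below keeps a fixed set of open connections in a queue and lends them out one at a time. It uses SQLite so it runs without a database server; production code would normally rely on the pooling built into a database driver or ORM rather than this hypothetical "ConnectionPool" class.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class ConnectionPool:
    """Hold a fixed number of open connections and reuse them."""

    def __init__(self, database: str, size: int) -> None:
        self._pool: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        # Raises queue.Empty if no connection frees up within the timeout
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._pool.put(conn)  # Always return the connection to the pool


pool = ConnectionPool(":memory:", size=2)

with pool.connection() as first:
    first_id = id(first)
    value = first.execute("SELECT 1 + 1").fetchone()[0]

# Borrow both connections at once; the one used above is among them
with pool.connection() as second, pool.connection() as third:
    borrowed_ids = {id(second), id(third)}

print(value, first_id in borrowed_ids)
```

The pool size caps how many connections the application opens, which is the knob to tune against the database's own connection limit.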
### 6.3. Asynchronous Operations
**Standard:** Use asynchronous operations to improve application responsiveness.
* **Do This:** Use asynchronous tasks to perform long-running operations in the background. Use a task queue (e.g., Celery backed by a broker such as RabbitMQ or Redis) to manage asynchronous tasks.
* **Don't Do This:** Block the main thread with long-running operations.
**Why:** Asynchronous operations improve application responsiveness and prevent the application from becoming unresponsive.
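Within a single process, the same idea can be sketched with "asyncio": blocking work is handed to worker threads so the event loop stays free to serve other requests. The "generate_report" function is a hypothetical stand-in for slow work, and "asyncio.to_thread" requires Python 3.9 or newer.

```python
import asyncio
import time
from typing import List


def generate_report(user_id: str) -> str:
    time.sleep(0.1)  # Stands in for slow, blocking work
    return f"report-for-{user_id}"


async def build_reports(user_ids: List[str]) -> List[str]:
    # Each blocking call runs in a worker thread; gather preserves input order
    tasks = [asyncio.to_thread(generate_report, user_id) for user_id in user_ids]
    return list(await asyncio.gather(*tasks))


reports = asyncio.run(build_reports(["1", "2", "3"]))
print(reports)
```

For work that must survive restarts or run on other Machines, a task queue remains the better fit; this in-process approach only keeps a single service responsive.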
danielsogl
Created Mar 6, 2025
# Component Design Standards for Fly.io
This document outlines the component design standards for applications deployed on Fly.io. Adhering to these guidelines will promote maintainability, reusability, performance, and security in your Fly.io applications.
## 1. Introduction to Component Design in Fly.io
Component design in Fly.io focuses on creating modular, independent, and reusable parts of an application that are easy to develop, test, and maintain. Given Fly.io's geographically distributed nature, well-designed components also contribute to improved latency and resilience. In this context, a "component" is a logical grouping of functionality, often corresponding to modules, classes, or services.
* **Goal:** Build robust, scalable, and maintainable applications on Fly.io.
* **Focus:** Modularity, reusability, performance, and security.
## 2. Architectural Considerations
### 2.1 Microservices vs. Monolith with Modules
Fly.io supports both microservice and monolithic architectures (with a modular design). The choice depends on the application's complexity and scalability needs.
* **Microservices:** Independent, deployable services communicating over the network. Suited for complex applications requiring independent scaling and fault isolation.
* **Monolith with Modules:** A single application with clear module boundaries internally. Suitable for smaller applications or when the operational overhead of microservices is a concern.
**Do This:**
* For large applications, decompose into loosely coupled microservices, each handling a specific domain.
* For smaller projects, leverage a modular approach within a monolithic application.
**Don't Do This:**
* Create tightly coupled microservices that lead to a distributed monolith.
* Build a monolithic application with no modularity, resulting in unmaintainable code.
**Why:** Microservices offer better scalability and fault isolation, while modular monoliths simplify development and deployment for smaller applications.
Proper modularity reduces dependencies, which helps isolate deployment errors and simplifies development.
**Example (Microservice):**
"""dockerfile
# Dockerfile for a user service
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "user_service.py"]
"""
**Example (Monolith with Modules):**
"""python
# app.py
from user_module import User
from product_module import Product

# Use the modules
user = User(name="John Doe")
product = Product(name="Awesome Product")
print(f"User: {user.name}, Product: {product.name}")
"""
### 2.2 Location Awareness on Fly.io
Fly.io's ability to run applications close to users means components should be designed with location awareness in mind.
* **Data locality:** Store and process data in the region closest to the users.
* **Regional deployments:** Deploy specific components to particular Fly.io regions.
**Do This:**
* Use Fly.io's region routing features to direct traffic to the nearest instance of a component.
* Implement caching strategies to minimize cross-region data access.
**Don't Do This:**
* Assume all users are geographically close to a single server.
* Ignore latency implications of cross-region data access.
**Why:** Minimizing latency improves the user experience and reduces bandwidth costs.
**Example (Fly.io Region Routing with "fly.toml"):**
"""toml
app = "my-app"
primary_region = "iad" # Initial region

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1

# "fly.toml" has no per-region routing table or deploy region list.
# The Fly proxy routes each request to the nearest region that has a
# Machine for this app, so regions are chosen by where Machines run,
# e.g. by scaling with "fly scale count" and its "--region" flag
# for regions such as iad, fra, and syd.
"""
### 2.3 Fault Tolerance & Resilience
Fly.io's distributed nature requires components to be fault-tolerant.
* **Replication:** Run multiple instances of each component across different regions.
* **Circuit Breakers:** Implement the circuit breaker pattern to prevent cascading failures.
* **Health checks:** Use Fly.io's health checks to monitor component availability and automatically restart failed instances.
**Do This:**
* Configure health checks for all critical components in your "fly.toml".
* Use retry mechanisms with exponential backoff for communication between components.
* Implement circuit breakers to isolate failing components.
**Don't Do This:**
* Rely on a single instance of a component without redundancy.
* Allow one failing component to bring down the entire application.
**Why:** Redundancy and fault isolation ensure higher availability and a better user experience.
**Example (Fly.io Health Check in "fly.toml"):**
"""toml
app = "my-app"
primary_region = "iad"

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1

# Checks are an array of tables, hence the double brackets
[[http_service.checks]]
path = "/healthz" # Endpoint of your health check
interval = "10s"
timeout = "5s"
"""
## 3. Coding Standards for Components
### 3.1 Single Responsibility Principle (SRP)
Each component should have one, and only one, reason to change.
**Do This:**
* Design classes and modules with a clear, focused purpose.
* Refactor large components into smaller, more manageable units.
**Don't Do This:**
* Create "god classes" or modules that handle multiple unrelated tasks.
**Why:** Makes components easier to understand, test, and maintain.
**Example (Python SRP):**
"""python
# Good: Separate classes for User and Email
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class EmailService:
    def send_welcome_email(self, user):
        print(f"Sending welcome email to {user.email}")


# Bad: User class handles both user data and email sending
class UserWithEmail:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def send_welcome_email(self):  # Violates SRP: User shouldn't handle email
        print(f"Sending welcome email to {self.email}")


user = User("John Doe", "john@example.com")
email_service = EmailService()
email_service.send_welcome_email(user)
"""
### 3.2 Open/Closed Principle (OCP)
Components should be open for extension but closed for modification.
**Do This:**
* Use inheritance or composition to add new functionality without modifying existing code.
* Favor interfaces and abstract classes to decouple components.
**Don't Do This:**
* Directly modify existing code to add new features, risking regressions.
**Why:** Reduces the risk of introducing bugs when adding new features.
**Example (Python OCP):**
"""python
# Good: Using Strategy Pattern
from abc import ABC, abstractmethod


class PaymentStrategy(ABC):
    @abstractmethod
    def pay(self, amount):
        pass


class CreditCardPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with credit card")


class PayPalPayment(PaymentStrategy):
    def pay(self, amount):
        print(f"Paying {amount} with PayPal")


class ShoppingCart:
    def __init__(self, payment_strategy: PaymentStrategy):
        self.payment_strategy = payment_strategy

    def checkout(self, amount):
        self.payment_strategy.pay(amount)


# Bad: Modifying the ShoppingCart class directly
class ShoppingCartBad:
    def checkout(self, amount, payment_method):
        if payment_method == "credit_card":
            print(f"Paying {amount} with credit card")
        elif payment_method == "paypal":
            print(f"Paying {amount} with PayPal")
        else:
            print("Invalid payment method")


cart = ShoppingCart(CreditCardPayment())
cart.checkout(100)
"""
### 3.3 Liskov Substitution Principle (LSP)
Subtypes must be substitutable for their base types without altering the correctness of the program.
**Do This:**
* Ensure that subclasses correctly implement the behavior of their base classes.
* Avoid introducing unexpected side effects in subclasses.
**Don't Do This:**
* Create subclasses that violate the contract of their base classes.
**Why:** Prevents unexpected behavior and ensures that polymorphism works correctly.
**Example (violating Liskov Substitution):**
"""python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def set_height(self, height):
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):  # Violates LSP as Square's invariant is width == height
    def __init__(self, size):
        super().__init__(size, size)

    def set_width(self, width):
        self.width = width
        self.height = width

    def set_height(self, height):
        self.width = height
        self.height = height


def print_area(rectangle: Rectangle):
    rectangle.set_width(5)
    rectangle.set_height(4)
    print(rectangle.area())


rectangle = Rectangle(2, 3)
print_area(rectangle)  # Output: 20
square = Square(2)
print_area(square)  # Output: 16 (incorrect if we expect standard rectangle behavior)
"""
In this example, the "Square" class violates LSP because setting the width or height also sets the other dimension, which is not the behavior expected of a generic "Rectangle".
### 3.4 Interface Segregation Principle (ISP)
Clients should not be forced to depend upon interfaces that they do not use.
**Do This:**
* Create small, specific interfaces instead of large, general-purpose ones.
* Refactor interfaces to separate unrelated methods.
**Don't Do This:**
* Force classes to implement methods they don't need.
**Why:** Reduces dependencies and improves code flexibility.
**Example (Python ISP):**
"""python
# Good: Separate interfaces for different functionalities
from abc import ABC, abstractmethod


class Printer(ABC):
    @abstractmethod
    def print_document(self, document):
        pass


class Scanner(ABC):
    @abstractmethod
    def scan_document(self, document):
        pass


class Copier(ABC):
    @abstractmethod
    def copy_document(self, document):
        pass


# Bad: One large interface with all functionalities mixed
class MultiFunctionDevice(ABC):
    @abstractmethod
    def print_document(self, document):
        pass

    @abstractmethod
    def scan_document(self, document):
        pass

    @abstractmethod
    def copy_document(self, document):
        pass


class SimplePrinter(Printer):
    def print_document(self, document):
        print(f"Printing {document}")


class AllInOnePrinter(Printer, Scanner, Copier):
    def print_document(self, document):
        print(f"Printing {document}")

    def scan_document(self, document):
        print(f"Scanning {document}")

    def copy_document(self, document):
        print(f"Copying {document}")
"""
A client needing only printing should not depend on the "Scanner" or "Copier" methods.
### 3.5 Dependency Inversion Principle (DIP)
High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details. Details should depend on abstractions.
**Do This:**
* Use dependency injection to provide dependencies to components.
* Program to interfaces rather than concrete implementations.
**Don't Do This:**
* Hardcode dependencies within components.
**Why:** Increases code flexibility and testability.
**Example (Python DIP):**
"""python
# Good: Using dependency injection
class Switchable:
    def turn_on(self):
        raise NotImplementedError

    def turn_off(self):
        raise NotImplementedError


class LightBulb(Switchable):
    def turn_on(self):
        print("LightBulb: turned on...")

    def turn_off(self):
        print("LightBulb: turned off...")


class ElectricPowerSwitch:
    def __init__(self, client: Switchable):
        self.client = client
        self.on = False

    def press(self):
        if self.on:
            self.client.turn_off()
            self.on = False
        else:
            self.client.turn_on()
            self.on = True


# Bad: Hardcoded dependency
class SwitchBad:
    def __init__(self):
        self.bulb = LightBulb()  # Concrete dependency = Bad
        self.on = False

    def press(self):
        if self.on:
            self.bulb.turn_off()
            self.on = False
        else:
            self.bulb.turn_on()
            self.on = True


bulb = LightBulb()
switch = ElectricPowerSwitch(bulb)  # Dependency Injection
switch.press()
switch.press()
"""
## 4. Fly.io Specific Considerations
### 4.1 Using Fly.io Volumes
Components that require persistent storage should leverage Fly.io Volumes.
**Do This:**
* Mount volumes to specific directories in your Fly.io instances.
* Use volumes to store data that needs to persist across deployments.
**Don't Do This:**
* Store persistent data within the container's filesystem, risking data loss on restarts.
**Why:** Volumes provide reliable and persistent storage for your applications.
**Example (Fly.io Volume Configuration in "fly.toml"):**
"""toml
app = "my-data-app"
primary_region = "ord"

[build]

[deploy]
release_command = "/app/migrate_db.sh"

[[mounts]]
source = "data_volume" # Existing volume name
destination = "/data" # Where the volume is mounted

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
"""
### 4.2 Fly.io Secrets Management
Securely manage sensitive information using Fly.io Secrets.
**Do This:**
* Store API keys, database passwords, and other sensitive data as Fly.io Secrets.
* Access secrets in your code using environment variables.
**Don't Do This:**
* Hardcode secrets in your code or configuration files.
* Commit secrets to your version control system.
**Why:** Protects sensitive data and prevents unauthorized access.
**Example (Accessing Fly.io Secret in Python):**
"""python
import os

database_password = os.environ.get("DATABASE_PASSWORD")
if database_password is None:
    raise RuntimeError("DATABASE_PASSWORD is not set")
# Use the password to connect to the database; never print or log its value
print("Connecting to database...")
"""
### 4.3 Fly.io Edge Network and Global Distribution
Leverage Fly.io's edge network for improved performance.
**Do This:**
* Configure your services to take full advantage of the Fly.io global network.
* Pin a service to a single region when consistency matters more than latency, and treat this as a deliberate trade-off.
**Don't Do This:**
* Ignore the latency implications of not using Fly.io's global network effectively.
**Why:** Reduced latency provides a better user experience.
## 5. Component Communication
### 5.1 REST APIs
Use REST APIs for synchronous communication between components.
**Do This:**
* Design REST APIs using standard HTTP methods and status codes.
* Use a consistent API versioning strategy.
* Implement proper authentication and authorization for API endpoints.
**Don't Do This:**
* Expose internal implementation details through the API.
* Create overly complex or inconsistent APIs.
**Why:** REST APIs are well-established and easy to understand, enabling interoperability.
### 5.2 Message Queues (e.g. Redis, NATS)
Use message queues for asynchronous communication between components.
**Do This:**
* Choose a message queue that fits your application's needs (e.g., Redis, RabbitMQ, NATS).
* Design message formats that are easy to serialize and deserialize.
* Implement error handling and retry mechanisms for message processing.
**Don't Do This:**
* Use message queues for synchronous operations that require immediate responses.
* Create overly complex messaging topologies.
**Why:** Message queues enable decoupling, asynchronous processing, and improved scalability. Fly.io makes it easy to deploy Redis and NATS in a colocated fashion. ### 5.3 gRPC Consider gRPC for high-performance communication between internal components. **Do This:** * Define gRPC services using Protocol Buffers. * Generate code for both client and server using gRPC tools. * Implement proper error handling and logging. **Don't Do This:** * Use gRPC for external APIs that need to be easily accessible to a wide range of clients. * Overcomplicate gRPC service definitions. **Why:** gRPC provides high performance, efficient serialization, and strong typing. It typically requires more sophistication than REST. ## 6. Testing ### 6.1 Unit Testing Write unit tests for all components to verify their functionality in isolation. **Do This:** * Use a testing framework appropriate for your language (e.g., pytest for Python, JUnit for Java). * Write tests that cover all possible code paths and edge cases. * Use mocks and stubs to isolate components from their dependencies. **Don't Do This:** * Skip unit testing or write tests that are too superficial. * Write tests that are tightly coupled to the implementation details of the tested components. **Why:** Unit tests ensure that components function correctly and prevent regressions. ### 6.2 Integration Testing Write integration tests to verify the interaction between different components. **Do This:** * Test the communication between components using real or simulated dependencies. * Verify that data is correctly passed between components and that the overall system behaves as expected. **Don't Do This:** * Skip integration testing or write tests that are too narrow in scope. * Rely solely on unit tests without verifying how components work together. **Why:** Integration tests ensure that components work together correctly. 
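The integration testing guidance above can be illustrated with a minimal sketch that wires two components together and replaces only the external dependency with a simulated one. "OrderService", "InMemoryInventory", and their methods are hypothetical names introduced for this illustration; they are not part of any Fly.io API.

```python
import unittest


class InMemoryInventory:
    # Simulated dependency standing in for a real inventory service (hypothetical).
    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, item, quantity):
        if self.stock.get(item, 0) < quantity:
            return False
        self.stock[item] -= quantity
        return True


class OrderService:
    # Component under test; it receives its dependency through the constructor.
    def __init__(self, inventory):
        self.inventory = inventory
        self.orders = []

    def place_order(self, item, quantity):
        if not self.inventory.reserve(item, quantity):
            raise ValueError(f"insufficient stock for {item}")
        order = {"id": len(self.orders) + 1, "item": item, "quantity": quantity}
        self.orders.append(order)
        return order


class OrderFlowIntegrationTest(unittest.TestCase):
    def setUp(self):
        self.inventory = InMemoryInventory({"widget": 3})
        self.service = OrderService(self.inventory)

    def test_order_reduces_stock(self):
        # Verifies that data passes correctly from one component to the other.
        order = self.service.place_order("widget", 2)
        self.assertEqual(order["id"], 1)
        self.assertEqual(self.inventory.stock["widget"], 1)

    def test_order_rejected_when_stock_is_short(self):
        with self.assertRaises(ValueError):
            self.service.place_order("widget", 5)
        self.assertEqual(self.inventory.stock["widget"], 3)


def run_tests():
    # Builds and runs the suite without calling sys.exit().
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderFlowIntegrationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the dependency is injected, the same tests could later run against a real inventory client in a staging environment without changing the assertions.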
### 6.3 End-to-End Testing Write end-to-end tests to verify the entire application flow from the user interface to the backend. **Do This:** * Use a testing framework that simulates user interactions (e.g., Selenium, Cypress). * Test the entire application flow from the user interface to the backend. * Verify that the application meets the user's requirements. **Don't Do This:** * Skip end-to-end testing or write tests that are too complex and brittle. * Rely solely on unit and integration tests without verifying the end-to-end user experience. **Why:** End-to-end tests ensure that the application meets the user's requirements and provides a good user experience. ## 7. Monitoring and Logging ### 7.1 Centralized Logging Use a centralized logging system to collect and analyze logs from all components. **Do This:** * Use a logging framework appropriate for your language (e.g., log4j for Java, logging for Python). * Configure components to log all important events, including errors, warnings, and informational messages. * Use a tool such as Grafana Loki or similar system for log aggregation. **Don't Do This:** * Skip logging or rely solely on local log files. * Log sensitive data such as passwords or API keys. **Why:** Centralized logging enables easier troubleshooting, performance monitoring, and security analysis. ### 7.2 Metrics Collection Collect metrics from all components to monitor their performance and resource usage. **Do This:** * Use a metrics library appropriate for your language (e.g., Prometheus client libraries). * Collect metrics such as CPU usage, memory usage, network traffic, and request latency. * Use a monitoring system such as Prometheus or Grafana to visualize and analyze metrics. **Don't Do This:** * Skip metrics collection or collect only a limited set of metrics. * Use metrics that are not meaningful or actionable. **Why:** Metrics provide valuable insights into the health and performance of your components. 
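To make the metrics guidance above concrete, here is a dependency-free sketch that records request counts and latencies per route. "MetricsRegistry" and "timed" are hypothetical helpers written for this example; a production service would more likely use a Prometheus client library and expose a scrape endpoint instead.

```python
import math
import time
from collections import defaultdict


class MetricsRegistry:
    # Hypothetical in-process registry, standing in for a real metrics library.
    def __init__(self):
        self.counters = defaultdict(int)     # (route, status) -> request count
        self.latencies = defaultdict(list)   # route -> latency samples in seconds

    def observe_request(self, route, status, seconds):
        self.counters[(route, status)] += 1
        self.latencies[route].append(seconds)

    def error_rate(self, route):
        # Fraction of requests to the route that returned a 5xx status.
        total = sum(n for (r, _), n in self.counters.items() if r == route)
        errors = sum(n for (r, s), n in self.counters.items() if r == route and s >= 500)
        return errors / total if total else 0.0

    def percentile(self, route, pct):
        # Nearest-rank percentile over the recorded latencies.
        samples = sorted(self.latencies[route])
        if not samples:
            return None
        rank = max(1, math.ceil(pct / 100 * len(samples)))
        return samples[rank - 1]


def timed(registry, route, handler):
    # Runs handler(), records its status and latency, and returns the status.
    # If the handler raises, the request is recorded as a 500 and the error propagates.
    start = time.perf_counter()
    status = 500
    try:
        status = handler()
        return status
    finally:
        registry.observe_request(route, status, time.perf_counter() - start)
```

Error rate and tail latency per route are usually more actionable than raw totals, which is why the sketch computes them directly.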
### 7.3 Tracing
Implement distributed tracing to track requests as they flow through different components.
**Do This:**
* Use a tracing library such as OpenTelemetry.
* Instrument code to generate spans for each request as it enters and exits a component.
* Use a tracing backend such as Jaeger or Zipkin to collect and visualize traces.
**Don't Do This:**
* Skip tracing or trace only a limited set of requests.
* Create traces that are too granular or lack context.
**Why:** Tracing enables you to identify performance bottlenecks and diagnose issues in distributed systems. Fly.io works well with well-designed tracing setups.
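The span instrumentation described above can be sketched without any external library. The "Tracer" class and "handle_request" function below are hypothetical and only illustrate how spans share a trace ID and record their parent; they are not the API of any real tracing library, and the sketch is not thread-safe.

```python
import contextlib
import time
import uuid


class Tracer:
    # Hypothetical minimal tracer; a real setup would export spans to a backend.
    def __init__(self):
        self.finished = []   # completed spans, in completion order
        self._stack = []     # currently open spans, innermost last

    @contextlib.contextmanager
    def span(self, name, trace_id=None):
        parent = self._stack[-1] if self._stack else None
        record = {
            "name": name,
            # Child spans inherit the trace ID; a root span reuses an incoming ID if given.
            "trace_id": parent["trace_id"] if parent else (trace_id or uuid.uuid4().hex),
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent["span_id"] if parent else None,
            "start": time.perf_counter(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            self._stack.pop()
            record["duration"] = time.perf_counter() - record["start"]
            self.finished.append(record)


def handle_request(tracer, incoming_trace_id=None):
    # One span when the request enters the component, one per downstream call.
    with tracer.span("handle_request", trace_id=incoming_trace_id) as root:
        with tracer.span("query_database"):
            time.sleep(0.001)
        with tracer.span("call_billing_service"):
            time.sleep(0.001)
        return root["trace_id"]
```

Passing the incoming trace ID through to the root span is what lets a backend stitch spans from several services into a single trace.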
# API Integration Standards for Fly.io
This document outlines coding standards for API integration within the Fly.io ecosystem. It provides guidelines for connecting to backend services and external APIs, emphasizing maintainability, performance, and security. It is intended to be used as a central reference for developers and as context for AI coding assistants. It reflects best practices as of late 2024 and will be updated regularly as the Fly.io platform evolves.
## 1. General Principles
### 1.1. Idempotency
* **Do This:** Ensure that API calls are idempotent where applicable. This is especially important for operations that modify data, such as creating or updating records. Use UUIDs or similar unique identifiers for requests to enable retries without unintended side effects.
* **Don't Do This:** Assume that every API call succeeds on the first attempt. Network issues or server errors can lead to failed requests, and retries may be necessary.
**Why this matters:** Idempotency ensures that repeated API requests have the same effect as a single request. This enhances system reliability, especially in distributed environments like Fly.io, where network hiccups are possible.
**Code Example (Go):**
"""go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"

	"github.com/google/uuid"
)

func createResource(url string, data []byte, idempotencyKey string) error {
	client := &http.Client{}
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Idempotency-Key", idempotencyKey) // Add Idempotency Key
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
		fmt.Println("Resource created successfully.")
		return nil
	}
	return fmt.Errorf("failed to create resource, status code: %d", resp.StatusCode)
}

func main() {
	// A raw string literal avoids escaping the quotes inside the JSON payload
	resourceData := []byte(`{"name": "Example Resource"}`)
	resourceURL := "https://api.example.com/resources" // Replace with actual API endpoint
	// Reuse the same key when retrying the same logical request
	idempotencyKey := uuid.New().String()
	err := createResource(resourceURL, resourceData, idempotencyKey)
	if err != nil {
		log.Fatalf("Failed to create resource: %v", err)
	}
}
"""
### 1.2. Error Handling
* **Do This:** Implement robust error handling for API calls. Log errors with sufficient context for debugging, including request details, response codes, and error messages. Use structured logging formats like JSON for easier analysis. Implement exponential backoff for retries in case of transient errors.
* **Don't Do This:** Ignore errors or simply print error messages to the console. This makes it difficult to diagnose and resolve issues.
**Why this matters:** Proper error handling ensures that your application can gracefully recover from failures and provides valuable insights into system behavior.
**Code Example (Python):**
"""python
import logging
import os
import time

import requests

# Configure logging (adjust level as needed)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def call_api_with_retry(url, data, max_retries=3, backoff_factor=2):
    # Calls an API endpoint with exponential backoff retry logic.
    for attempt in range(max_retries):
        try:
            # A timeout keeps a stalled connection from blocking the retry loop
            response = requests.post(url, json=data, timeout=10)
            response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
            return response.json()
        except requests.exceptions.RequestException as e:
            logging.error(f"API request failed (attempt {attempt + 1}/{max_retries}): {e}")
            if attempt < max_retries - 1:
                wait_time = backoff_factor ** attempt
                logging.info(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                logging.error("Max retries reached. API call failed.")
                raise  # Re-raise the exception after the last retry

def main():
    api_url = os.environ.get("MY_API_URL", "https://example.com/api")
    payload = {"message": "Hello from Fly.io!"}
    try:
        result = call_api_with_retry(api_url, payload)
        logging.info(f"API response: {result}")
    except Exception:
        # logging.exception records the traceback along with the message
        logging.exception("An unrecoverable error occurred during API call.")

if __name__ == "__main__":
    main()
"""
* Note the use of "os.environ.get" to retrieve the API URL. This keeps the configuration dynamic on Fly.io and avoids hardcoding. The "logging.exception" call in the "except" block captures the full stack trace, which simplifies debugging.
### 1.3. Security
* **Do This:** Securely store and manage API keys and other sensitive credentials using Fly.io secrets. Avoid hardcoding credentials directly in your application code or configuration files. Implement proper authentication and authorization mechanisms to protect your APIs from unauthorized access. Enforce TLS/SSL encryption for all API communication.
* **Don't Do This:** Commit credentials to your Git repository or expose them in client-side code.
**Why this matters:** Protecting sensitive credentials and enforcing access control are essential for preventing security breaches and protecting user data.
**Code Example (Node.js):**
"""javascript
// Requires the 'node-fetch' package (npm install node-fetch)
import fetch from 'node-fetch';

async function callApi() {
  const apiKey = process.env.MY_API_KEY; // Retrieve API key from Fly.io secrets
  if (!apiKey) {
    console.error("API key not found in environment variables.");
    return;
  }
  try {
    const response = await fetch('https://api.example.com/data', { // Replace with your API endpoint
      headers: {
        // Template literals need backticks for ${...} to be interpolated
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      method: 'GET' // Or POST, PUT, DELETE as needed
    });
    if (!response.ok) {
      console.error(`API request failed with status: ${response.status}`);
      const errorData = await response.json(); // Attempt to parse the error response
      console.error("Error data:", errorData);
      return;
    }
    const data = await response.json();
    console.log('API Response:', data);
  } catch (error) {
    console.error('Error calling API:', error);
  }
}

callApi();
"""
* In this example, the API key is retrieved from the "process.env" object, corresponding to a Fly.io secret. Never hardcode the API key in the source code. When the request fails, the example parses the response body and prints the error data, which makes debugging easier.
* Remember to set the secret using "flyctl secrets set MY_API_KEY=<your_api_key>".
### 1.4. Rate Limiting
* **Do This:** Implement rate limiting to protect your APIs from abuse and prevent resource exhaustion. Use a sliding window or token bucket algorithm to enforce rate limits. Provide informative error messages to clients when they exceed the rate limit.
* **Don't Do This:** Allow unlimited API requests, as this can lead to denial-of-service attacks or unexpected costs.
**Why this matters:** Rate limiting protects your APIs from abuse, ensures fair resource allocation, and prevents your application from being overwhelmed by excessive traffic.
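The token bucket algorithm mentioned above can be sketched in a few lines. This is a minimal single-process illustration; "TokenBucket" and "RateLimiter" are hypothetical names, the injectable "clock" parameter exists only to make the behavior testable, and a deployment with several Machines would need shared state (for example in Redis) instead of an in-memory dictionary.

```python
import math
import time


class TokenBucket:
    # Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second.
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1):
        # Refill based on elapsed time, then spend tokens if enough are available.
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    # Keeps one bucket per client key (IP address, user ID, or API key).
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def check(self, client_key):
        # Returns an HTTP status code and the headers to send with the response.
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = self.buckets[client_key] = TokenBucket(self.rate, self.capacity, self.clock)
        if bucket.allow():
            return 200, {}
        # Tell the client how long to wait until one full token is available.
        retry_after = (1 - bucket.tokens) / self.rate
        return 429, {"Retry-After": str(math.ceil(retry_after))}
```

Returning "429 Too Many Requests" with a "Retry-After" header gives clients the informative error the standard asks for.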
**Implementation:** Rate limiting can be implemented at various levels, including:
* **Application Level:** Using middleware or custom code to track and limit requests based on IP address, user ID, or API key.
* **Fly Proxy:** Fly.io's proxy enforces the per-Machine concurrency limits ("soft_limit" and "hard_limit") set in "fly.toml". These cap the load on each Machine but do not limit individual clients, so they complement application-level rate limiting rather than replace it.
* **External API Gateway:** Use a dedicated API gateway service (e.g., Kong, Tyk) for advanced rate limiting and other API management features.
Consider using a library like "go-rate" (for Go) or "Flask-Limiter" (for Python) to simplify rate limiting implementation.
### 1.5. Data Serialization
* **Do This:** Use a consistent data serialization format (e.g., JSON, Protocol Buffers) for API communication. Define clear schemas for request and response payloads to ensure data integrity and facilitate validation.
* **Don't Do This:** Use inconsistent or poorly defined data formats, as this can lead to parsing errors and interoperability issues.
**Why this matters:** Consistent data serialization ensures that data can be easily exchanged between different systems and programming languages. Clear schemas improve data validation and reduce the risk of errors.
**Example (JSON schema):**
"""json
{
  "type": "object",
  "properties": {
    "userId": {
      "type": "integer",
      "description": "Unique identifier for the user"
    },
    "username": {
      "type": "string",
      "minLength": 3,
      "maxLength": 50,
      "description": "User's username"
    },
    "email": {
      "type": "string",
      "format": "email",
      "description": "User's email address"
    }
  },
  "required": ["userId", "username", "email"]
}
"""
Use libraries like "jsonschema" (Python) or "ajv" (JavaScript) to validate data against a JSON schema.
## 2. Fly.io Specific Considerations
### 2.1. Fly.io Secrets
* **Do This:** Use Fly.io secrets (accessed via environment variables) for any configuration value that should not be checked into source control, especially API keys, database passwords, and other credentials.
* **Don't Do This:** Hardcode secrets into your source code or configuration files. **Code Example (Python):** """python import os api_key = os.environ.get("MY_API_KEY") if not api_key: print("API key not found in environment.") # Handle the missing key appropriately, e.g., exit or use a default # Use api_key in your API calls """ ### 2.2. Fly.io Regions * **Do This:** Design your API integrations to be region-aware. If your backend services are deployed in multiple Fly.io regions, use the "FLY_REGION" environment variable to determine the optimal region for connecting to those services. Consider using a service discovery mechanism to dynamically locate the nearest instance of your backend services. * **Don't Do This:** Hardcode specific region URLs or IP addresses, as this can lead to performance issues and increased latency. **Code Example (Go):** """go package main import ( "fmt" "os" ) func main() { flyRegion := os.Getenv("FLY_REGION") if flyRegion == "" { fmt.Println("FLY_REGION environment variable not set.") return } var apiEndpoint string switch flyRegion { case "ams": apiEndpoint = "https://api.example.com/ams" // Amsterdam case "iad": apiEndpoint = "https://api.example.com/iad" // Washington, D.C. case "sjc": apiEndpoint = "https://api.example.com/sjc" // San Jose default: apiEndpoint = "https://api.example.com/default" // Default region } fmt.Printf("Connecting to API endpoint: %s\n", apiEndpoint) // Your API call logic here, using apiEndpoint. } """ ### 2.3. Fly.io Private Networking * **Do This:** Utilize Fly.io's private networking features to securely communicate between your applications and backend services. Deploy your backend services within the same Fly.io organization and use internal DNS names (e.g., "<app-name>.internal") to access them. * **Don't Do This:** Expose your backend services directly to the public internet if they are only intended for internal use. 
**Example "fly.toml" configuration for internal service:**
"""toml
app = "my-backend-service"
primary_region = "iad"
# No [http_service] or [[services]] section is defined, so the app gets no public ports
# and is reachable only over the private network.
# The process should listen on port 8080 on an IPv6 address (for example "[::]:8080").
"""
Access this service from another Fly.io app named "my-frontend-app" using "http://my-backend-service.internal:8080". No need to expose public ports.
### 2.4. Fly.io Volumes
* **Do This:** Consider using Fly.io volumes to persist data that is generated or consumed by your API integrations, such as caches, logs, or temporary files. This ensures data durability across application restarts and deployments.
* **Don't Do This:** Store sensitive data directly on the volume without proper encryption and access controls.
**Example "fly.toml" configuration for attaching a volume:**
"""toml
app = "my-api-app"
primary_region = "iad"
[build]
[deploy]
[env]
[mounts]
source = "my_volume"
destination = "/data"
"""
Then, inside your application, you can access the volume at the "/data" path. Remember to create the volume using "flyctl volumes create my_volume --region iad --size 10" first.
### 2.5. Health Checks
* **Do This:** Implement comprehensive health checks for your API integrations. Ensure that your health checks verify not only that your application is running but also that it can successfully connect to and communicate with all required backend services. This allows Fly.io to detect unhealthy instances and route traffic away from them.
* **Don't Do This:** Rely solely on basic "ping" health checks that only verify that the application process is running.
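A dependency-aware health check of the kind described above can be sketched as a function that runs one probe per backend and reports an overall status. "run_health_checks", "check_database", and "check_payment_api" are hypothetical names for this illustration; real probes would issue a cheap query or request against the actual dependency.

```python
import json


def run_health_checks(checks):
    # `checks` maps a dependency name to a zero-argument callable that raises on failure.
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # a failing dependency must not crash the health endpoint
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    # 503 signals the platform that this instance should not receive traffic.
    status = 200 if healthy else 503
    return status, json.dumps({"healthy": healthy, "checks": results})


def check_database():
    # Placeholder probe: a real check might run "SELECT 1" on the connection.
    pass


def check_payment_api():
    # Placeholder probe that simulates an unreachable upstream service.
    raise ConnectionError("payment API unreachable")
```

A "/healthz" handler would call "run_health_checks" and return its status code and JSON body; keeping each probe fast matters because the check itself runs under a short timeout.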
**Example (Fly.toml health check):**
"""toml
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
[[http_service.checks]]
method = "GET"
path = "/healthz"
interval = "15s" # Check every 15 seconds
timeout = "2s" # Timeout after 2 seconds
grace_period = "5s" # Wait 5 seconds before starting checks
"""
And corresponding Go "/healthz" handler:
"""go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func healthzHandler(w http.ResponseWriter, r *http.Request) {
	// Add logic to check connections to backend services here.
	// For example, check DB connection, external API connection, etc.
	// For demonstration purposes, simply return 200 OK.
	fmt.Fprint(w, "OK")
}

func main() {
	http.HandleFunc("/healthz", healthzHandler)
	// ListenAndServe only returns on error, so log it and exit
	log.Fatal(http.ListenAndServe(":8080", nil))
}
"""
## 3. Patterns for Connecting to Backend Services and External APIs
### 3.1. API Gateway Pattern
* **Do This:** Use an API gateway to centralize API management, routing, authentication, authorization, and other cross-cutting concerns. This improves security, simplifies application development, and enables features like rate limiting and request transformation.
* **Don't Do This:** Expose your backend services directly to the public internet without an API gateway.
**Implementation:** You can use a dedicated API gateway service (e.g., Kong, Tyk, Ambassador), or you can build a lightweight API gateway using a reverse proxy like Nginx or Traefik. Fly.io integrates well with these options.
### 3.2. Backend for Frontend (BFF) Pattern
* **Do This:** Create separate backend services tailored to the specific needs of different frontends (e.g., web, mobile). This allows you to optimize data retrieval, transformation, and presentation for each frontend, improving performance and user experience.
* **Don't Do This:** Use a single monolithic backend service that serves all frontends, as this can lead to unnecessary complexity and performance bottlenecks.
**Implementation:** Deploy separate backend services for each frontend type, each with its own API endpoints and data models.
### 3.3. Circuit Breaker Pattern
* **Do This:** Implement the circuit breaker pattern to prevent cascading failures when communicating with external APIs. This involves monitoring the success rate of API calls and automatically opening the circuit breaker if the failure rate exceeds a certain threshold. When the circuit breaker is open, subsequent API calls fail immediately without attempting to connect to the external API. After a timeout, the circuit breaker moves to "half-open" and lets a limited number of calls through to see if the external API has recovered.
* **Don't Do This:** Allow your application to continuously attempt to connect to a failing external API, as this can lead to resource exhaustion and application instability.
**Why this matters:** The Circuit Breaker pattern is crucial for building resilient applications that can gracefully handle failures in external services.
**Code Example (Go using "github.com/sony/gobreaker"):**
"""go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/sony/gobreaker"
)

var cb *gobreaker.CircuitBreaker

// The per-call timeout lives on the HTTP client, not on the circuit breaker
var httpClient = &http.Client{Timeout: 5 * time.Second}

func init() {
	settings := gobreaker.Settings{
		Name:        "my-api",
		MaxRequests: 5,                // Requests allowed through while half-open
		Interval:    10 * time.Second, // Closed-state period after which counts are cleared
		Timeout:     3 * time.Second,  // Time spent open before moving to half-open
		ReadyToTrip: func(counts gobreaker.Counts) bool {
			// Trip after at least 10 requests with a failure rate of 60% or more
			failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
			return counts.Requests >= 10 && failureRatio >= 0.6
		},
		OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
			log.Printf("Circuit Breaker %s changed from %s to %s\n", name, from, to)
		},
	}
	cb = gobreaker.NewCircuitBreaker(settings)
}

func callExternalAPI(url string) (string, error) {
	resp, err := httpClient.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
		return "API call successful!", nil
	}
	// Treat server errors as failures so they count toward tripping the breaker
	if resp.StatusCode == 500 {
		return "", errors.New("received 500 error")
	}
	return "", fmt.Errorf("API call failed with status: %d", resp.StatusCode)
}

func handleAPIRequest() (string, error) {
	result, err := cb.Execute(func() (interface{}, error) {
		return callExternalAPI("https://example.com/api") // Replace with your actual API endpoint
	})
	if err != nil {
		return "", fmt.Errorf("Circuit Breaker Error: %v", err)
	}
	return result.(string), nil // Type assertion after cb.Execute
}

func main() {
	for i := 0; i < 20; i++ {
		result, err := handleAPIRequest()
		if err != nil {
			fmt.Printf("Request %d failed: %v\n", i, err)
		} else {
			fmt.Printf("Request %d successful: %s\n", i, result)
		}
		time.Sleep(200 * time.Millisecond)
	}
	time.Sleep(10 * time.Second) // Give the circuit breaker time to half-open
	fmt.Println("Testing after some time...")
	result, err := handleAPIRequest()
	if err != nil {
		fmt.Printf("Request after waiting failed: %v\n", err)
	} else {
		fmt.Printf("Request after waiting successful: %s\n", result)
	}
}
"""
Key settings:
* **"MaxRequests"**: The maximum number of requests allowed to pass through while the circuit breaker is half-open.
* **"Interval"**: The cyclic period, while the circuit breaker is closed, after which its internal counts are cleared.
* **"Timeout"**: How long the circuit breaker stays open before it moves to half-open. It is not a timeout for the API call itself.
* **"ReadyToTrip"**: A function that determines whether to trip the circuit breaker. Here it trips only if *both* of the following are true:
  * At least 10 requests have been made ("counts.Requests >= 10"). This avoids tripping prematurely due to a single initial failure.
  * The failure rate is 60% or higher ("failureRatio >= 0.6").
* **"OnStateChange"**: A function that gets called any time the circuit breaker changes state (Closed, Open, Half-Open). This is useful for logging and monitoring.
### 3.4. Asynchronous Communication (Queues)
* **Do This:** Use message queues (e.g., Redis Queue, RabbitMQ) for asynchronous communication between your applications and backend services. This decouples your applications, improves scalability, and enhances resilience.
* **Don't Do This:** Rely solely on synchronous API calls, as this can lead to blocking operations and performance bottlenecks.
**Implementation:** Use a message queue library to publish messages to a queue and consume messages from the queue in your backend services. Fly.io can easily run Redis or RabbitMQ instances.
## 4. API Versioning
* **Do This:** Implement API versioning to maintain backward compatibility as your APIs evolve. Use a version number in the API endpoint URL (e.g., "/api/v1/users") or in the request headers (e.g., "Accept: application/json; version=1").
* **Don't Do This:** Make breaking changes to your APIs without introducing a new version, as this can break existing clients.
**Example using URL versioning:**
"""
https://api.example.com/v1/users
https://api.example.com/v2/users
"""
**Example using header versioning:**
"""
Accept: application/json; version=1
"""
The server must inspect the "Accept" header and route the request to the appropriate version handler.
## 5. Monitoring and Logging
* **Do This:** Implement comprehensive monitoring and logging for your API integrations. Track key metrics such as request latency, error rates, and throughput. Use a centralized logging system to collect and analyze logs from all your applications.
* **Don't Do This:** Neglect monitoring and logging, as this makes it difficult to identify and resolve issues.
**Implementation:** Use a monitoring tool like Prometheus or Grafana to collect and visualize metrics. Use a logging system like ELK stack (Elasticsearch, Logstash, Kibana) or Splunk to collect and analyze logs. Fly.io provides excellent tools for monitoring. Consider setting up Grafana and Prometheus on Fly.io for in-depth analytics.
This coding standards document provides a comprehensive guide to API integration within the Fly.io ecosystem. By following these standards, you can build robust, scalable, and secure applications that leverage the full potential of the Fly.io platform. Remember to regularly review and update these standards as the Fly.io platform evolves.
# Performance Optimization Standards for Fly.io
This document outlines the coding standards focused on performance optimization for applications deployed on Fly.io. Adhering to these standards will lead to faster, more responsive, and resource-efficient applications. These standards are tailored for the latest version of Fly.io and incorporate modern approaches for optimal performance within the Fly.io ecosystem.
## 1. Architectural Considerations for Performance
### 1.1. Region Selection and Geographic Distribution
**Standards:**
* **Do This:** Deploy your application to multiple regions closest to your users. Use Fly.io's built-in support for global deployments to minimize latency.
* **Don't Do This:** Deploy only to a single region, especially if your user base is geographically distributed.
**Why:** Reduces latency by serving users from the nearest available region. Improves availability by distributing load across multiple regions.
**Code Example (fly.toml):**
"""toml
app = "my-fly-app"
primary_region = "iad" # Initial region
[build]
[deploy]
release_command = "/app/bin/my-fly-app migrate"
strategy = "rolling"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
"""
Additional regions are not listed in "fly.toml". To expand reach, add Machines in other regions with a command such as "fly scale count 3 --region iad,lhr,syd".
**Anti-Pattern:** Hardcoding region-specific logic into the application code. Use Fly.io's configuration and routing features instead.
### 1.2. Database Proximity
**Standards:**
* **Do This:** Locate your database (e.g., Postgres, Redis) in the same region as your application servers whenever possible to minimize network latency. Consider using Fly.io's managed Postgres or Redis services.
* **Don't Do This:** Access a database across regions unless absolutely necessary.
**Why:** Reduces latency for database queries, improving overall application responsiveness.
**Code Example (Connecting to Fly.io Postgres):**
"""python
import os

import psycopg2

# Fetch database credentials from environment variables.
# Adjust the variable names to match the secrets configured for your app.
db_host = os.environ.get("FLY_POSTGRES_FQDN")
db_name = os.environ.get("PGDATABASE")
db_user = os.environ.get("PGUSER")
db_password = os.environ.get("PGPASSWORD")  # Set as a Fly.io secret, never hardcoded

try:
    conn = psycopg2.connect(
        host=db_host,
        database=db_name,
        user=db_user,
        password=db_password,
        port=5432,  # Usually 5432 for PostgreSQL
    )
    print("Database connection successful")
    cur = conn.cursor()
    cur.execute("SELECT version();")
    db_version = cur.fetchone()
    print(f"PostgreSQL version: {db_version[0]}")
    cur.close()
    conn.close()
except psycopg2.Error as e:
    print(f"Error connecting to database: {e}")
"""
**Anti-Pattern:** Ignoring database latency. Profile database queries to identify and optimize slow operations.
### 1.3. Caching Strategies
**Standards:**
* **Do This:** Implement caching at multiple levels: browser, CDN (using Fly.io's global edge network), application server (in-memory), and database (query caching). Use appropriate cache invalidation strategies. Implement HTTP caching headers (e.g., "Cache-Control", "Expires").
* **Don't Do This:** Rely solely on database caching. Cache frequently accessed data closer to the user.
**Why:** Reduces load on application servers and databases, resulting in faster response times and lower resource utilization.
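For the application-server (in-memory) level and the invalidation strategy mentioned above, a minimal sketch might look like the following. "TTLCache" and "get_or_load" are hypothetical names for this illustration, the injectable "clock" exists only to make expiry testable, and the cache is per-process, so each Machine would hold its own copy.

```python
import time


class TTLCache:
    # In-memory cache whose entries expire after `ttl_seconds`.
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        # Return the cached value if it is still fresh; otherwise call loader(key).
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]
        value = loader(key)
        self._entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call this when the underlying data changes, so stale values are not served.
        self._entries.pop(key, None)
```

Combining a time-based expiry with explicit invalidation on writes bounds how stale a value can get even if an invalidation call is missed.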
**Code Example (HTTP Caching with Flask):** """python from flask import Flask, make_response app = Flask(__name__) @app.route('/') def index(): response = make_response("<h1>Hello, World!</h1>") response.headers['Cache-Control'] = 'public, max-age=3600' # Cache for 1 hour return response if __name__ == '__main__': app.run(debug=True) """ **Anti-Pattern:** Aggressively caching dynamic content. Use appropriate cache invalidation techniques when data changes. ### 1.4. Connection Pooling **Standards:** * **Do This:** Use connection pooling for database connections to reduce the overhead of establishing new connections for each request. * **Don't Do This:** Create a new database connection for every request, especially under high load. **Why:** Reduces database load and improves application response time by reusing existing connections. **Code Example (Connection Pooling with SQLAlchemy):** """python from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker import os db_host = os.environ.get("FLY_POSTGRES_FQDN") db_name = os.environ.get("PGDATABASE") db_user = os.environ.get("PGUSER") db_password = 'your_db_password' # get this from a secrets manager! # Database URL (adjust username, password, host, and database name) db_url = f"postgresql://{db_user}:{db_password}@{db_host}/{db_name}" # Create a database engine with connection pooling engine = create_engine(db_url, pool_size=5, max_overflow=10) # Adjust pool_size and max_overflow # Create a session factory Session = sessionmaker(bind=engine) # Example Usage: def get_data_from_db(): session = Session() try: # Perform database operations using the session # Example: # results = session.query(MyTable).all() print("Querying the DB... Replace with your actual query here") except Exception as e: print(f"Error during database operation: {e}") finally: session.close() # Always close the session! 
if __name__ == '__main__':
    get_data_from_db()
"""
**Anti-Pattern:** Setting the connection pool size too small or too large. Tune based on application load and database capacity.
## 2. Code-Level Optimizations
### 2.1. Efficient Data Structures and Algorithms
**Standards:**
* **Do This:** Choose appropriate data structures (e.g., dictionaries, sets) and algorithms (e.g., sorting algorithms, search algorithms) for the specific task. Optimize for time and space complexity appropriately.
* **Don't Do This:** Use inefficient data structures or algorithms that lead to slow execution or high memory consumption.
**Why:** Improves application performance by minimizing resource usage and execution time.
**Code Example (Using Sets for Efficient Membership Testing):**
"""python
my_list = [1, 2, 3, 4, 5]  # Original data
my_set = set(my_list)  # Convert to a set

# Membership tests on a set take constant time on average; on a list they scan every element
if 3 in my_set:
    print("3 exists in my_set")
if 6 in my_set:
    print("6 exists in my_set")
else:
    print("6 does not exist in my_set")
"""
**Anti-Pattern:** Linear search on large, unsorted lists. Consider binary search (which requires sorted data) or hash-based structures such as sets and dictionaries.
### 2.2. Asynchronous Operations
**Standards:**
* **Do This:** Use asynchronous operations (e.g., async/await in Python, Promises in JavaScript) for I/O-bound tasks such as network requests, file I/O, and database queries to avoid blocking the main thread.
* **Don't Do This:** Perform blocking I/O operations on the main thread.
**Why:** Prevents blocking the event loop, allowing the application to handle more requests concurrently. Improves responsiveness and throughput.
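The throughput benefit described above can be illustrated with the standard library alone. In this minimal sketch, "fake_io" is a hypothetical stand-in for a network or database call (it only sleeps), and "run_concurrently" is an illustrative name; the three waits overlap instead of running back to back.

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound call; awaiting sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_concurrently() -> list:
    """Start three I/O-bound tasks together; gather returns results in argument order."""
    return await asyncio.gather(
        fake_io("a", 0.1),
        fake_io("b", 0.1),
        fake_io("c", 0.1),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_concurrently())
    elapsed = time.perf_counter() - start
    # The waits overlap, so the total is close to one delay rather than the sum of all three
    print(results, f"{elapsed:.2f}s")
```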
**Code Example (Asynchronous HTTP Request with Python aiohttp):**
"""python
import asyncio
import aiohttp

async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()  # Surface HTTP errors instead of returning an error page
            return await response.text()

async def main():
    data = await fetch_data('https://example.com')
    print(data[:100])  # Print the first 100 characters

if __name__ == '__main__':
    asyncio.run(main())
"""
**Anti-Pattern:** Mixing synchronous and asynchronous code without proper thread management. Use appropriate executors or thread pools.
### 2.3. Resource Management
**Standards:**
* **Do This:** Explicitly release resources such as file handles, database connections, and memory as soon as they are no longer needed. Use "try...finally" blocks or context managers ("with" statement in Python) to ensure proper resource cleanup. Use Fly.io's automatic stopping and starting of Machines ("auto_stop_machines", "auto_start_machines") to use resources efficiently. Consider scaling to zero during off-peak hours.
* **Don't Do This:** Leak resources, which can lead to memory exhaustion or other performance problems.
**Why:** Prevents resource leaks, ensuring efficient utilization of system resources. Improves application stability and scalability.
**Code Example (Using "with" Statement for File Handling):**
"""python
try:
    # The file is closed automatically when the "with" block exits, even on error
    with open('my_file.txt', 'r') as f:
        data = f.read()
        print(data)
except FileNotFoundError:
    print("File not found.")
"""
# Code Style and Conventions Standards for Fly.io
This document outlines the coding style and conventions to be followed when developing applications for the Fly.io platform. Adhering to these standards ensures code maintainability, readability, and consistency, leading to improved collaboration and reduced debugging efforts. These guidelines also optimize application performance and security within the Fly.io environment.
## 1. General Principles
### 1.1 Consistency
**Do This:** Maintain a consistent coding style across the entire project. Use a linter and formatter to enforce these rules automatically.
**Don't Do This:** Mix different coding styles within the same file or project.
**Why:** Consistency improves readability and reduces cognitive load when developers work on different parts of the application. It also helps AI coding assistants to generate code that fits seamlessly with existing code.
### 1.2 Readability
**Do This:** Write code that is easy to understand and follow. Use meaningful variable and function names, and add comments where necessary.
**Don't Do This:** Write overly complex or cryptic code that is difficult to decipher.
**Why:** Readability is crucial for maintainability. Code should be self-documenting where possible.
### 1.3 Brevity
**Do This:** Keep code concise and avoid unnecessary complexity. Use language features to express ideas clearly and efficiently.
**Don't Do This:** Write verbose or repetitive code that can be simplified.
**Why:** Brevity reduces the size of the codebase, makes it easier to understand, and can improve performance by reducing the amount of code that needs to be executed.
### 1.4 Testability
**Do This:** Write code that is easy to test. Use dependency injection and other techniques to decouple components.
**Don't Do This:** Write tightly coupled code that is difficult to isolate and test.
**Why:** Testability ensures that the application functions correctly and reduces the risk of introducing bugs.
Automated tests are essential for continuous integration and deployment.
## 2. Formatting
### 2.1 Indentation and Spacing
**Do This:** Use 4 spaces for indentation unless the language's standard formatter dictates otherwise (for example, "gofmt" indents Go code with tabs). Use consistent spacing around operators and after commas.
**Don't Do This:** Mix tabs and spaces, or use tabs in languages whose convention is spaces. Omit spaces around operators or after commas.
**Example (Python):**
"""python
def calculate_total(price, quantity, tax_rate=0.07):
    """Calculates the total cost including tax."""
    subtotal = price * quantity
    tax = subtotal * tax_rate
    total = subtotal + tax
    return total
"""
**Example (Go):**
"""go
package main

import "fmt"

func calculateTotal(price float64, quantity int, taxRate float64) float64 {
	subtotal := price * float64(quantity)
	tax := subtotal * taxRate
	total := subtotal + tax
	return total
}

func main() {
	fmt.Println(calculateTotal(19.99, 2, 0.08))
}
"""
**Why:** Consistent indentation and spacing improve readability and make it easier to visually parse the code structure.
### 2.2 Line Length
**Do This:** Limit lines to a maximum of 120 characters. Break long lines into multiple lines using appropriate line breaks.
**Don't Do This:** Write very long lines that require horizontal scrolling.
**Why:** Limiting line length improves readability, especially when viewing code on different screen sizes or in diff tools.
### 2.3 Blank Lines
**Do This:** Use blank lines to separate logical sections of code, such as function definitions, class definitions, and blocks of code within a function.
**Don't Do This:** Use an excessive number of blank lines, or omit blank lines where they are needed.
**Why:** Blank lines improve readability by visually separating different parts of the code.
### 2.4 File Encoding
**Do This:** Use UTF-8 encoding for all source files.
**Don't Do This:** Use other encodings that may not be universally supported.
**Why:** UTF-8 is the standard encoding for text files and ensures that characters are displayed correctly across different platforms.
## 3. Naming Conventions
### 3.1 Variables
**Do This:** Use descriptive and meaningful names for variables. Follow the language's naming convention: camelCase in JavaScript, Go, and Java (e.g., "userName", "orderTotal"), snake_case in Python (e.g., "user_name", "order_total").
**Don't Do This:** Use single-letter variable names or cryptic abbreviations.
**Example (JavaScript):**
"""javascript
const userFirstName = "John";
const orderTotalAmount = 120.50;
"""
**Why:** Meaningful variable names make the code easier to understand and reduce the need for comments.
### 3.2 Functions and Methods
**Do This:** Use descriptive verb-noun names for functions and methods. Follow the language's naming convention: camelCase in JavaScript, Go, and Java (e.g., "getUserDetails", "calculateOrderTotal"), snake_case in Python (e.g., "get_user_profile").
**Don't Do This:** Use vague or ambiguous names.
**Example (Python):**
"""python
def get_user_profile(user_id):
    """Retrieves user profile from the database."""
    # ... implementation ...
    return profile

def calculate_shipping_cost(order_subtotal, destination):
    """Calculates the shipping cost to a given destination."""
    # ... implementation ...
    return shipping_cost
"""
**Why:** Clear and descriptive function and method names make the code easier to understand and maintain.
### 3.3 Classes
**Do This:** Use PascalCase for class names (e.g., "UserProfile", "OrderManager").
**Don't Do This:** Use lowercase or underscore-separated names for classes.
**Example (Java):**
"""java
public class UserProfile {
    private String userName;
    private String emailAddress;
    // ... methods ...
}
"""
**Why:** PascalCase for class names is a common convention that improves code readability.
### 3.4 Constants
**Do This:** Use SCREAMING_SNAKE_CASE for constant names (e.g., "MAX_RETRIES", "DEFAULT_TIMEOUT").
**Don't Do This:** Use lowercase or camelCase for constants.
**Example (JavaScript):**
"""javascript
const MAX_CONNECTIONS = 100;
const API_ENDPOINT = "https://api.example.com/v1";
"""
**Why:** SCREAMING_SNAKE_CASE clearly indicates that a variable is a constant and should not be modified.
## 4. Code Comments
### 4.1 Documentation Comments
**Do This:** Use documentation comments to describe the purpose, parameters, and return values of functions, methods, and classes. Use a standard documentation format, such as JSDoc for JavaScript or docstrings for Python.
**Don't Do This:** Omit documentation comments for important code elements.
**Example (JavaScript with JSDoc):**
"""javascript
/**
 * Retrieves user details from the database.
 * @param {string} userId - The ID of the user to retrieve.
 * @returns {Promise<object>} A promise that resolves to the user details.
 */
async function getUserDetails(userId) {
  // ... implementation ...
  return user;
}
"""
**Example (Python with docstrings):**
"""python
def process_order(order_id):
    """
    Processes an order by updating the order status and sending a confirmation email.

    :param order_id: The ID of the order to process.
    :type order_id: str
    :raises OrderProcessingError: If an error occurs during order processing.
    :returns: None
    """
    # ... implementation ...
    pass
"""
**Why:** Documentation comments provide valuable information for other developers and can be used to generate API documentation automatically.
### 4.2 Inline Comments
**Do This:** Use inline comments to explain complex or non-obvious code.
**Don't Do This:** Over-comment code that is already clear. Avoid stating the obvious.
**Example (Go):**
"""go
// Calculate the discount amount based on the order total.
discount := orderTotal * discountRate
"""
**Why:** Inline comments can help to clarify the intent of the code and make it easier to understand.
## 5. Error Handling
### 5.1 Explicit Error Handling
**Do This:** Handle errors explicitly using try-catch blocks or error return values. Log errors with sufficient context for debugging (using structured logging).
**Don't Do This:** Ignore errors or rely on default error handling.
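For Python services, the same rule might look like the sketch below. It is illustrative only: "parse_quantity" and the "orders" logger name are hypothetical, and the JSON payload is just one possible shape for a structured log entry.

```python
import json
import logging

logger = logging.getLogger("orders")


def parse_quantity(raw: str) -> int:
    """Parse a positive integer quantity, logging context before re-raising on bad input."""
    try:
        quantity = int(raw)
    except ValueError as exc:
        # Structured log entry with enough context to debug, then re-raise with the cause attached
        logger.error(json.dumps({"event": "invalid_quantity", "raw": raw, "error": str(exc)}))
        raise ValueError(f"quantity must be an integer, got {raw!r}") from exc
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity
```

Chaining with "from exc" keeps the original traceback available to whoever handles the error further up the call stack.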
**Example (Node.js with try/catch):**
"""javascript
async function processPayment(paymentDetails) {
  try {
    const result = await paymentGateway.charge(paymentDetails);
    console.log({ message: "Payment successful", transactionId: result.transactionId });
    return result;
  } catch (error) {
    // Log only non-sensitive context; never write raw payment details to logs
    console.error({ message: "Payment failed", error: error.message });
    throw new Error("Payment processing error", { cause: error });
  }
}
"""
**Example (Go with error return values):**
"""go
package main

import (
	"fmt"
	"log"
	"os"
)

func readFile(filename string) ([]byte, error) {
	data, err := os.ReadFile(filename)
	if err != nil {
		log.Printf("Error reading file %s: %v", filename, err)
		return nil, fmt.Errorf("failed to read file: %w", err)
	}
	return data, nil
}

func main() {
	content, err := readFile("myFile.txt")
	if err != nil {
		// Handle the error appropriately; log.Fatalf logs the message and exits
		log.Fatalf("Could not read file: %v", err)
	}
	fmt.Println(string(content))
}
"""
**Why:** Explicit error handling ensures that errors are detected and handled gracefully, preventing application crashes and data loss. Logging provides valuable information for debugging and troubleshooting.
### 5.2 Custom Exceptions
**Do This:** Create custom exceptions to represent specific error conditions in your application.
**Don't Do This:** Use generic exceptions for all error conditions.
**Example (Python):**
"""python
class OrderProcessingError(Exception):
    """Custom exception for order processing errors."""

def process_order(order_id):
    # ... order processing logic ...
    something_went_wrong = True  # Placeholder for a real failure condition
    if something_went_wrong:
        raise OrderProcessingError(f"Failed to process order {order_id}")

# The caller decides how to handle the specific error
try:
    process_order("order-123")
except OrderProcessingError as e:
    print(f"Error processing order: {e}")
"""
**Why:** Custom exceptions make the code more readable and allow for more specific error handling.
## 6. Security
### 6.1 Input Validation
**Do This:** Validate all user inputs to prevent injection attacks. Use appropriate validation techniques for different types of inputs, such as regular expressions for strings and type checking for numbers.
**Don't Do This:** Trust user inputs without validation.
**Example (Node.js with input validation):**
"""javascript
const validator = require('validator');

function createUser(userInput) {
  // validator functions accept strings only, so check the type first
  if (typeof userInput.email !== 'string' || !validator.isEmail(userInput.email)) {
    throw new Error("Invalid email address");
  }
  if (typeof userInput.password !== 'string' || validator.isEmpty(userInput.password)) {
    throw new Error("Password cannot be empty");
  }
  // ... create user logic ...
}
"""
**Why:** Input validation is a first line of defense against security vulnerabilities such as SQL injection, cross-site scripting (XSS), and command injection. Combine it with parameterized queries and output encoding.
### 6.2 Authentication and Authorization
**Do This:** Implement secure authentication and authorization mechanisms to protect sensitive data and resources. Use established standards, such as OAuth 2.0 and OpenID Connect, and standard token formats such as JWT.
**Don't Do This:** Roll your own authentication system.
**Why:** Secure authentication and authorization are essential for protecting user data and preventing unauthorized access to sensitive resources. Always use industry-standard practices.
### 6.3 Secrets Management
**Do This:** Store sensitive information, such as API keys and database passwords, securely using environment variables or a secrets management system. Fly.io provides built-in secret management.
**Don't Do This:** Hardcode secrets in your code or store them in version control.
**Example (Fly.io secrets):**
Use the "flyctl secrets set" command to set secrets:
"""bash
flyctl secrets set API_KEY=your_api_key DATABASE_URL=your_database_url
"""
Access secrets in your application:
**Example (Go):**
"""go
package main

import (
	"log"
	"os"
)

func main() {
	apiKey := os.Getenv("API_KEY")
	databaseURL := os.Getenv("DATABASE_URL")
	if apiKey == "" || databaseURL == "" {
		log.Fatal("API_KEY and DATABASE_URL must be set")
	}
	// Never print secret values; pass them to the clients that need them
	log.Println("Secrets loaded")
	// Your application logic here
}
"""
**Why:** Storing secrets securely prevents unauthorized access to sensitive information and protects against security breaches.
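In Python, reading secrets from the environment can be combined with a fail-fast check at startup. The helper below is a minimal sketch: "load_required_secrets" is a hypothetical name, and the secret names passed to it are whatever was set with "flyctl secrets set".

```python
import os
from typing import Dict, Iterable, Mapping


def load_required_secrets(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Return the named secrets, raising early if any is unset or empty."""
    wanted = list(names)
    missing = [name for name in wanted if not environ.get(name)]
    if missing:
        # Report only the names of missing secrets, never any values
        raise RuntimeError("Missing required secrets: " + ", ".join(sorted(missing)))
    return {name: environ[name] for name in wanted}
```

Calling this once at startup, for example "load_required_secrets(["API_KEY", "DATABASE_URL"])", surfaces a misconfigured deployment immediately instead of at the first request that needs the secret.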
### 6.4 Dependency Management
**Do This:** Keep dependencies up to date to patch security vulnerabilities. Use a dependency management tool, such as npm, pip, or Go modules, to manage dependencies and track versions.
**Don't Do This:** Use outdated dependencies with known security vulnerabilities.
**Why:** Keeping dependencies up to date ensures that your application benefits from the latest security patches and bug fixes.
## 7. Concurrency and Parallelism (Fly.io Specific)
Fly.io's architecture allows for easy scaling and distribution of your application across multiple regions. Consequently, attention to concurrency and parallelism is especially important.
### 7.1 Region Awareness
**Do This:** Design your application to be aware of the region it's running in. Use the "FLY_REGION" environment variable to customize behavior based on region.
**Example (Python):**
"""python
import os

def get_database_connection_string():
    region = os.environ.get("FLY_REGION")
    if region == "ams":
        return "postgres://ams_db"  # Amsterdam DB
    elif region == "sfo":
        return "postgres://sfo_db"  # San Francisco DB
    else:
        return "postgres://default_db"
"""
**Why:** Region-aware applications can optimize for latency and data locality, improving performance and user experience.
### 7.2 Handling Global State
**Do This:** Avoid relying on global state. If global state is necessary, use a distributed caching system such as Redis or Memcached, which can run on Fly.io.
**Don't Do This:** Store critical state in-memory on a single VM.
**Why:** Since Fly.io applications are often distributed across multiple VMs, relying on local in-memory state can lead to inconsistencies and data loss.
### 7.3 Database Connections
**Do This:** Use connection pooling to manage database connections efficiently. Properly configure connection limits to avoid exhausting resources.
**Don't Do This:** Create a new database connection for every request.
**Why:** Connection pooling reduces the overhead of establishing new database connections, improving performance and scalability.
### 7.4 Background Tasks and Queues
**Do This:** Offload long-running or resource-intensive tasks to background queues using tools such as Redis Queue, Celery, or similar asynchronous task managers. This keeps your web application responsive.
**Don't Do This:** Execute long-running tasks directly within request handlers.
**Why:** Long-running tasks in request handlers cause delays and degrade the user experience. Background queues allow you to process tasks asynchronously, without blocking the main application thread. This is particularly important in edge deployments like Fly.io.
## 8. Fly.io Platform Specific Conventions
### 8.1 "fly.toml" Configuration
**Do This:** Maintain a well-structured and documented "fly.toml" file. Use comments to explain the purpose of each section and key.
**Don't Do This:** Leave unnecessary or commented-out configuration options in "fly.toml".
**Why:** The "fly.toml" file is the central configuration file for your Fly.io application, so readability is paramount.
"""toml
# fly.toml app configuration file
app = "my-cool-app"
primary_region = "sfo" # Primary region for deployment
[build] # Build configuration
dockerfile = "Dockerfile"
[http_service] # HTTP service configuration
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[http_service.concurrency]
type = "requests"
hard_limit = 100 # Handles up to 100 requests simultaneously
soft_limit = 75
[[vm]] # VM configuration
cpu_kind = "shared"
cpus = 1
memory_mb = 512
"""
### 8.2 Health Checks
**Do This:** Define robust health checks in "fly.toml" so that Fly.io can monitor the health of your application instances. Ensure health checks accurately reflect the state of the app (e.g., database connectivity).
**Don't Do This:** Rely on simple HTTP status code checks that might not catch underlying issues.
"""toml
[checks]
[checks.status]
port = 8080
type = "http"
path = "/healthz" # Endpoint for health status; have the handler verify dependencies such as the database
timeout = "2s"
interval = "10s"
[checks.database] # Second example check; a TCP check only confirms that the port accepts connections
port = 8080
type = "tcp"
timeout = "2s"
interval = "15s"
"""
**Why:** Robust health checks let Fly.io detect unhealthy instances, for example during deploys and when routing traffic, which supports high availability.
### 8.3 Logging
**Do This:** Use structured logging (e.g., JSON format) to make logs easier to parse and analyze. Use appropriate log levels (e.g., DEBUG, INFO, WARN, ERROR) to indicate the severity of events.
**Don't Do This:** Emit unstructured, free-form log messages.
**Example (Python):**
"""python
import json
import logging

# Wrap each JSON payload in a JSON envelope so every log line stays machine-parseable
logging.basicConfig(
    level=logging.INFO,
    format='{"time": "%(asctime)s", "level": "%(levelname)s", "payload": %(message)s}',
)

def process_data(data):
    logging.info(json.dumps({"event": "data_received", "data": data}))
    try:
        # ... process data ...
        # Emitted only when the log level is set to DEBUG
        logging.debug(json.dumps({"event": "data_processed", "status": "success"}))
        return True
    except Exception as e:
        logging.error(json.dumps({"event": "data_processing_failed", "error": str(e)}))
        return False
"""
**Why:** Structured logging makes it easier to search, filter, and analyze logs, facilitating debugging and monitoring.
### 8.4 Using Fly Volumes
**Do This:** Use Fly Volumes for persistent storage of data that needs to survive instance restarts. Understand the performance characteristics of Fly Volumes.
**Don't Do This:** Store persistent data directly on the instance's ephemeral storage.
**Why:** Fly Volumes provide persistent storage that survives restarts and deploys. A volume lives on a single host and is not replicated automatically, so keep backups or run multiple Machines, each with its own volume, when durability matters.
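Writing application state to a volume-backed directory might look like the sketch below. It assumes the volume is mounted at "/data", a hypothetical mount path that would be configured in "fly.toml"; "save_state" is likewise an illustrative name. The write goes to a temporary file first and is then renamed, so a crash mid-write does not leave a half-written file behind.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(directory: Path, name: str, state: dict) -> Path:
    """Atomically write state as JSON to directory/name and return the final path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    # Create the temp file in the same directory so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # Ask the OS to push the bytes to disk before renaming
        os.replace(tmp_path, target)  # Atomic replacement of any previous version
    except BaseException:
        # Remove the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target


# Hypothetical usage on a Machine with a volume mounted at /data:
# save_state(Path("/data"), "state.json", {"last_processed_id": 42})
```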
### 8.5 Fly Machines API
**Do This:** If your application requires more advanced control over your Fly.io deployments, leverage the Fly Machines API for custom orchestration and management. Be mindful of the API rate limits.
**Why:** The Machines API is a low-level API granting precise control over Fly.io resources, enabling automation and specialized deployment strategies.
This document provides a foundation for writing clean, maintainable, and secure code for the Fly.io platform. By adhering to these guidelines, developers can build robust and scalable applications that take full advantage of Fly.io's features. This guidance will serve as valuable context for AI coding assistants, helping ensure they generate code that aligns with best practices and Fly.io's unique environment.
# Security Best Practices Standards for Fly.io
This document outlines security best practices for developing and deploying applications on the Fly.io platform. Adhering to these standards will help protect your applications and data from common vulnerabilities and ensure a secure and reliable deployment.
## 1. Secure Configuration and Secrets Management
### 1.1. Secure Secrets Storage
**Standard:** Never hardcode secrets directly in your application code, Dockerfiles, or configuration files. Use Fly.io's built-in secrets management.
**Why:** Hardcoding secrets exposes them to anyone with access to your codebase or container images. Fly.io secrets are encrypted at rest and in transit, minimizing the risk of exposure.
**Do This:**
* Use "flyctl secrets" to manage secrets.
"""bash
flyctl secrets set DATABASE_URL="postgres://user:password@host:port/database"
flyctl secrets set API_KEY="your_super_secret_api_key"
"""
* Access secrets in your application code through environment variables.
"""python
# Python example
import os

database_url = os.environ.get("DATABASE_URL")
api_key = os.environ.get("API_KEY")
if not database_url or not api_key:
    raise ValueError("Required secrets are not set.")
# Use database_url and api_key to connect to your database and make API calls
"""
**Don't Do This:**
* Hardcode secrets in your code:
"""python
# Python example - BAD PRACTICE
database_url = "postgres://user:password@host:port/database"
api_key = "your_super_secret_api_key"
"""
* Store secrets in version control.
* Expose secrets in logs.
**Anti-Pattern:** Using ".env" files in production. While convenient for local development, they are not secure for production deployments and can easily be accidentally committed to source control or exposed.
### 1.2. Environment-Specific Configuration
**Standard:** Separate configuration for development, staging, and production environments.
**Why:** Using the same configuration across environments can lead to misconfiguration and security vulnerabilities. For example, using production API keys in a development environment could expose sensitive data.
**Do This:**
* Use Fly.io's built-in support for environment variables to specify configurations.
* Use separate Fly.io apps for each environment (e.g., "myapp-dev", "myapp-staging", "myapp-prod").
* Create and manage environment-specific secrets using "flyctl secrets".
"""bash
# Set secrets for the production app
flyctl secrets set --app myapp-prod DATABASE_URL="..." API_KEY="..."
# Set secrets for the staging app
flyctl secrets set --app myapp-staging DATABASE_URL="..." API_KEY="..."
"""
**Don't Do This:**
* Use the same secrets across all environments.
* Rely on manual configuration changes between environments.
**Code Example:**
"""toml
# fly.toml - Example configuration defining a build target and env vars
[build]
dockerfile = "Dockerfile"
# Select the Dockerfile stage that matches the target environment.
# Environment-dependent build-time values (for example, NODE_ENV = "production") can be passed as build arguments.
build-target = "release" # example
[env]
PORT = "8080"
[deploy]
release_command = "/app/migrate_db"
"""
### 1.3. Principle of Least Privilege
**Standard:** Grant the minimum necessary privileges to users, applications, and services.
**Why:** Limiting access reduces the potential impact of security breaches. If a compromised account or service has limited privileges, the attacker's ability to cause damage is significantly reduced.
**Do This:**
* Use Fly.io's RBAC (Role-Based Access Control) features where they are available (Fly.io currently offers limited RBAC).
* Ensure applications running within VMs only have the permissions they need, using "USER" directives in Dockerfiles.
* Configure firewall rules to restrict network access to only necessary ports and services.
**Don't Do This:**
* Run applications as root unless absolutely necessary.
* Grant broad permissions to services or users without a specific justification.
**Code Example (Dockerfile):**
"""dockerfile
# Pin a specific, slim base image instead of a floating "latest" tag
FROM python:3.12-slim
# Create a non-root user
RUN useradd -m -s /bin/bash appuser
# Set the working directory
WORKDIR /app
# Install Python dependencies system-wide so the non-root user can import them
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files, owned by the non-root user
COPY --chown=appuser:appuser . .
# Switch to the non-root user
USER appuser
# Command to run the application
CMD ["python3", "app.py"]
"""
### 1.4. Regular Security Audits and Updates
**Standard:** Regularly review your application code, dependencies, and infrastructure for security vulnerabilities. Keep your software up-to-date with the latest security patches.
**Why:** New vulnerabilities are discovered regularly. Staying up-to-date with security patches helps prevent exploits. Regular audits can identify potential vulnerabilities early.
**Do This:**
* Use automated vulnerability scanning tools (e.g., Snyk, Trivy) to scan your dependencies and container images.
* Subscribe to security mailing lists and advisories for the technologies you use (e.g., Python, Node.js, PostgreSQL).
* Regularly update your base images in your Dockerfiles.
* Implement a process for reviewing and addressing security vulnerabilities promptly.
**Don't Do This:**
* Ignore security alerts or vulnerabilities.
* Use outdated versions of software without security patches.
**Code Example (using Snyk in a CI/CD pipeline):**
"""yaml
# .github/workflows/security.yml - Example GitHub Actions workflow for running Snyk tests.
name: Security Scan
on:
  push:
    branches: [ main ] # or whatever your main branch is
  pull_request:
    branches: [ main ]
jobs:
  snyk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/python@master # Or javascript etc, adjust as needed
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --file=requirements.txt --severity-threshold=high
"""
## 2. Securing Network Communications
### 2.1. HTTPS for All Traffic
**Standard:** Use HTTPS for all communication between clients and your Fly.io application.
**Why:** HTTPS encrypts data in transit, preventing eavesdropping and man-in-the-middle attacks.
**Do This:**
* Allow Fly.io to automatically provision TLS certificates for your application. Fly.io provides free TLS certificates through Let's Encrypt.
"""bash
flyctl certs show your-app-name.fly.dev
"""
* Ensure your application is configured to redirect HTTP traffic to HTTPS. When Fly.io terminates TLS for you, setting "force_https = true" under "[http_service]" in "fly.toml" handles the redirect.
**Don't Do This:**
* Use plain HTTP for sensitive data.
* Disable TLS encryption.
**Code Example (configuring redirection in a web server):**
This configuration applies only if you terminate TLS in your own web server instead of relying on Fly.io's proxy.
"""nginx
# nginx configuration to redirect HTTP to HTTPS
server {
    listen 80;
    server_name your-app-name.fly.dev;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name your-app-name.fly.dev;
    # SSL certificate configuration
    ssl_certificate /etc/letsencrypt/live/your-app-name.fly.dev/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-app-name.fly.dev/privkey.pem;
    # ... other configurations ...
}
"""
### 2.2. Firewall Configuration
**Standard:** Configure firewall rules (e.g., using iptables or UFW) to limit network access to only necessary ports and services.
**Why:** Firewalls prevent unauthorized access to your application and reduce the attack surface.
**Do This:**
* Use Fly.io's private networking to isolate apps.
* Use a tool like "ufw" to manage firewall rules inside of your VM.
**Don't Do This:**
* Leave unnecessary ports open to the public internet.
* Disable the firewall.
**Code Example (using "ufw" to allow only SSH and HTTP/HTTPS traffic):**
"""bash
# Allow SSH access
ufw allow OpenSSH
# Allow HTTP traffic
ufw allow 80
# Allow HTTPS traffic
ufw allow 443
# Enable the firewall
ufw enable
# Check the firewall status
ufw status
"""
### 2.3. Mutual TLS (mTLS)
**Standard:** Use mTLS for secure communication between services within your Fly.io private network.
**Why:** mTLS provides strong authentication and encryption by requiring both the client and server to present valid certificates.
**Do This:**
* Generate client and server certificates using a tool like OpenSSL.
* Configure your services to require client certificates during TLS handshakes.
* Distribute client certificates securely.
**Don't Do This:**
* Use self-signed certificates in production without proper validation.
* Store private keys in insecure locations.
### 2.4. Monitoring and Logging
**Standard:** Implement comprehensive logging and monitoring to detect and respond to security incidents.
**Why:** Logging and monitoring provide visibility into your application's behavior, allowing you to identify suspicious activity and security vulnerabilities.
**Do This:**
* Use a centralized logging system to collect logs from all your Fly.io applications and services (e.g., Grafana Loki).
* Monitor key security metrics, such as authentication failures, API request rates, and error rates.
**Don't Do This:**
* Disable logging.
* Store sensitive data in logs without proper redaction.
* Ignore suspicious activity detected by monitoring systems.
## 3. Application Security
### 3.1. Input Validation and Output Encoding
**Standard:** Validate all input data from clients and other services. Encode output data to prevent cross-site scripting (XSS) and other injection attacks.
**Why:** Input validation prevents attackers from injecting malicious code or data into your application.
Output encoding prevents injected code from being executed in the client's browser.
**Do This:**
* Use server-side validation to verify the format, type, and length of all input data.
* Use a templating engine with automatic output encoding (e.g., Jinja2 for Python, Handlebars for JavaScript).
**Don't Do This:**
* Trust client-side validation alone.
* Display raw user input without encoding.
**Code Example (Python using Flask and Jinja2):**
"""python
# Flask example with Jinja2 templating engine
from flask import Flask, request, render_template
import bleach

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # Validate the input
        name = request.form.get('name')
        if not name or len(name) > 100:
            return render_template('index.html', error='Invalid name')
        # Sanitize HTML input using bleach (a missing field becomes an empty string)
        message = bleach.clean(request.form.get('message', ''))
        # Render the template with the sanitized message
        return render_template('index.html', name=name, message=message)
    return render_template('index.html')
"""
The "index.html" Jinja2 template:
"""html
<!DOCTYPE html>
<html>
<head>
    <title>Input Validation Example</title>
</head>
<body>
    {% if error %}
    <p style="color:red;">{{ error }}</p>
    {% endif %}
    <form method="post">
        <label for="name">Name:</label><br>
        <input type="text" id="name" name="name"><br><br>
        <label for="message">Message:</label><br>
        <textarea id="message" name="message"></textarea><br><br>
        <input type="submit" value="Submit">
    </form>
    {% if name and message %}
    <h2>Hello, {{ name }}!</h2>
    <p>Your message: {{ message }}</p>
    {% endif %}
</body>
</html>
"""
### 3.2. Cross-Site Request Forgery (CSRF) Protection
**Standard:** Implement CSRF protection to prevent attackers from forging requests on behalf of authenticated users.
**Why:** CSRF attacks can allow attackers to perform unauthorized actions on behalf of logged-in users.
**Do This:**
* Use a CSRF token that is unique to each user session.
* Include the CSRF token in all forms and AJAX requests.
* Validate the CSRF token on the server before processing the request.
**Don't Do This:**
* Disable CSRF protection.
* Use the same CSRF token for all users.
**Code Example (Python using Flask and WTForms):**
"""python
# Python using Flask and WTForms
import os

from flask import Flask, render_template, session, redirect, url_for
from flask_wtf import FlaskForm
from flask_wtf.csrf import CSRFProtect
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired

app = Flask(__name__)
# Read a strong random key from the environment (e.g. set with "fly secrets set SECRET_KEY=...")
# instead of hard-coding it in source
app.config['SECRET_KEY'] = os.environ['SECRET_KEY']
csrf = CSRFProtect(app)

class MyForm(FlaskForm):
    name = StringField('Name', validators=[DataRequired()])
    submit = SubmitField('Submit')

@app.route('/', methods=['GET', 'POST'])
def index():
    form = MyForm()
    # validate_on_submit() also verifies the form's CSRF token on POST
    if form.validate_on_submit():
        session['name'] = form.name.data
        return redirect(url_for('success'))
    return render_template('index.html', form=form)

@app.route('/success')
def success():
    if 'name' in session:
        name = session['name']
        return render_template('success.html', name=name)
    else:
        return redirect(url_for('index'))

if __name__ == '__main__':
    # Do not enable debug mode in production: the interactive debugger can execute arbitrary code
    app.run()
"""
### 3.3. Authentication and Authorization
**Standard:** Implement strong authentication and authorization mechanisms to control access to your application.
**Why:** Authentication verifies the identity of users, while authorization determines what resources they are allowed to access.
**Do This:**
* Use strong password policies (e.g., minimum length, complexity requirements).
* Implement multi-factor authentication (MFA) for privileged accounts.
* Use a role-based access control (RBAC) system to manage user permissions.
* Store passwords securely using a strong hashing algorithm (e.g., bcrypt, Argon2).
**Don't Do This:**
* Store passwords in plain text.
* Use weak or default passwords.
* Grant excessive permissions to users.
### 3.4. Dependency Management
**Standard:** Keep your application's dependencies up-to-date and use tools to detect and prevent vulnerable dependencies.
**Why:** Vulnerabilities in dependencies can be exploited to compromise your application.
**Do This:**
* Use a dependency management tool (e.g., pip for Python, npm for Node.js) to manage your application's dependencies.
* Regularly update your dependencies to the latest versions.
* Use automated vulnerability scanning tools (e.g., Snyk, OWASP Dependency-Check).
**Don't Do This:**
* Use outdated dependencies without security patches.
* Ignore security alerts from dependency scanning tools.
### 3.5. Error Handling and Logging
**Standard:** Handle errors gracefully and log sufficient information to diagnose problems.
**Why:** Proper error handling prevents sensitive information from being exposed to users. Logging provides valuable information for debugging and security incident response.
**Do This:**
* Implement a global error handler to catch unexpected exceptions.
* Log errors with sufficient detail to identify the root cause.
* Redact sensitive information (e.g., passwords, API keys) from logs.
* Use structured logging to make logs easier to query and analyze.
**Don't Do This:**
* Expose stack traces or other sensitive information to users in error messages.
* Log sensitive data in plain text.
* Ignore errors or warnings.
## 4. Dockerfile and Image Security
### 4.1. Minimal Base Images
**Standard:** Use minimal base images for your Docker containers to reduce the attack surface.
**Why:** Smaller images contain fewer dependencies, reducing the number of potential vulnerabilities.
**Do This:**
* Use lightweight base images like Alpine Linux or distroless images.
**Don't Do This:**
* Use full-featured base images like Ubuntu or Debian unless necessary.
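A CI step can approximate the minimal-base-image rule by inspecting the final "FROM" line of a Dockerfile. The sketch below is only a name-based heuristic: the helper names ("final_base_image", "looks_minimal") and the marker list are assumptions made for illustration, and a final stage that refers to an earlier build stage by alias would need extra handling.

```python
import re
from typing import Optional

# Name fragments treated as "minimal" in this sketch (assumption; adjust per team policy).
MINIMAL_MARKERS = ("alpine", "distroless")

# Matches "FROM [--platform=...] image[:tag] [AS name]" and captures the image reference.
FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)


def final_base_image(dockerfile_text: str) -> Optional[str]:
    """Return the image named by the last FROM line, or None if there is none."""
    image = None
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if match:
            # Later FROM lines override earlier ones: only the last stage ships.
            image = match.group(1)
    return image


def looks_minimal(image: str) -> bool:
    """Heuristic check: does the image reference contain a minimal-image marker?"""
    lowered = image.lower()
    return any(marker in lowered for marker in MINIMAL_MARKERS)
```

A pipeline could read the Dockerfile, call "final_base_image", and fail the build (or request a justification) when "looks_minimal" returns False.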
**Code Example (using Alpine Linux as a base image):**
"""dockerfile
FROM python:3.9-alpine
# Set the working directory
WORKDIR /app
# Install dependencies (assumes a requirements.txt at the project root)
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY . .
# Command to run the application (app.py is a placeholder entry point)
CMD ["python", "app.py"]
"""
### 4.2. Multi-Stage Builds
**Standard:** Use multi-stage builds to separate build-time dependencies from runtime dependencies.
**Why:** Multi-stage builds allow you to include build tools and dependencies in a temporary build environment, and then copy only the necessary artifacts to the final image.
**Do This:**
* Use separate "FROM" instructions for the build and runtime stages.
* Copy only the necessary files and dependencies from the build stage to the runtime stage.
**Don't Do This:**
* Include unnecessary build tools or dependencies in the final image.
**Code Example (using multi-stage build):**
"""dockerfile
# Build Stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . ./
# Disable cgo so the binary is statically linked and runs on Alpine (musl) without glibc
RUN CGO_ENABLED=0 go build -o /app/mybinary
# Production Stage (pin a specific tag rather than "latest"; 3.20 is an example)
FROM alpine:3.20
WORKDIR /app
COPY --from=builder /app/mybinary /app/mybinary
CMD ["/app/mybinary"]
"""
### 4.3. Image Scanning
**Standard:** Scan your Docker images for vulnerabilities before deploying them to Fly.io.
**Why:** Image scanning identifies potential vulnerabilities in your container images before they can be exploited.
**Do This:**
* Use a container image scanning tool (e.g., Trivy, Clair, Anchore).
* Integrate image scanning into your CI/CD pipeline.
* Address vulnerabilities identified by the scanner before deploying the image.
Together, these standards describe security best practices for applications on Fly.io. Following them improves an application's security posture, and teams should enforce them in CI/CD wherever possible.
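As one illustration of CI/CD enforcement for the image scanning standard in section 4.3, a pipeline step can fail the build when a scan report contains serious findings. The sketch below assumes a Trivy JSON report (for example from "trivy image --format json") whose top-level "Results" entries carry a "Vulnerabilities" list with "VulnerabilityID" and "Severity" fields. The function names and the blocking threshold are hypothetical choices for this example, not part of any Trivy or Fly.io API.

```python
import json

# Severities that block a deploy in this sketch (assumption; set per your own policy).
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict) -> list:
    """Return (target, vulnerability id, severity) tuples for blocking findings."""
    findings = []
    for result in report.get("Results") or []:
        target = result.get("Target", "unknown")
        # "Vulnerabilities" may be missing or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if severity in BLOCKING_SEVERITIES:
                findings.append((target, vuln.get("VulnerabilityID", "?"), severity))
    return findings


def exit_code_for(report: dict) -> int:
    """Return 1 if the report has blocking findings, otherwise 0."""
    return 1 if blocking_findings(report) else 0


def check_report_file(path: str) -> int:
    """Load a JSON report from disk, print blocking findings, return an exit code."""
    with open(path, encoding="utf-8") as handle:
        report = json.load(handle)
    for target, vuln_id, severity in blocking_findings(report):
        print(f"{severity}: {vuln_id} in {target}")
    return exit_code_for(report)
```

A CI job could call "sys.exit(check_report_file("report.json"))" after the scan, where "report.json" is a placeholder file name. Trivy can also enforce a threshold itself through its "--severity" and "--exit-code" flags; a script like this is mainly useful when the policy needs custom logic.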