# Code Style and Conventions Standards for Jupyter Notebooks
This document outlines the code style and conventions standards for developing Jupyter Notebooks. Adhering to these guidelines will ensure maintainability, readability, collaboration, and overall code quality. These guidelines are designed to work with modern Jupyter Notebooks and related tools, including AI coding assistants.
## 1. General Philosophy
### 1.1. Readability and Maintainability First
* **Do This**: Prioritize code that is easy to understand and maintain over code that is marginally shorter or "clever."
* **Don't Do This**: Sacrifice readability for minimal gains in performance or conciseness.
### 1.2. Consistency is Key
* **Do This**: Follow a consistent style throughout the notebook, and across all notebooks in a project.
* **Don't Do This**: Mix different styles within the same notebook without a clear reason (e.g., working with legacy code).
### 1.3. Explain Yourself
* **Do This**: Use comments judiciously to explain the *why*, not just the *what*. Provide context, rationale, and assumptions.
* **Don't Do This**: Over-comment obvious code or write comments that merely restate it; save comments for complex logic and non-obvious choices.
## 2. Notebook Structure and Organization
### 2.1. Linear Narrative
* **Do This**: Structure notebooks as a linear narrative with a clear progression from introduction to conclusion.
* **Don't Do This**: Jump randomly between unrelated topics or analysis steps.
### 2.2. Sections and Headings
* **Do This**: Use Markdown headings ("#", "##", "###") to divide the notebook into logical sections and subsections.
* **Don't Do This**: Rely solely on cell outputs to guide the reader through the analysis.
"""markdown
# 1. Introduction
## 1.1. Project Overview
### 1.1.1. Objectives
"""
### 2.3. Table of Contents (TOC)
* **Do This**: Generate and include a table of contents at the beginning of the notebook using a Jupyter extension or a short code snippet. JupyterLab's built-in Table of Contents panel, "ipywidgets", or JavaScript TOC extensions can accomplish this.
* **Why**: Enables easier navigation, especially in large notebooks.
"""python
# Example (using ipywidgets):
import ipywidgets as widgets
from IPython.display import display
# (Assumes you have headings defined in Markdown cells; the anchor targets
# below must match your actual heading text.)
toc = widgets.HTML('''
<b>Table of Contents</b>
<ul>
  <li><a href="#1.-Introduction">1. Introduction</a></li>
  <li><a href="#2.-Data-Loading">2. Data Loading</a></li>
  <li><a href="#3.-Data-Cleaning">3. Data Cleaning</a></li>
</ul>
''')
display(toc)
"""
### 2.4. Clear Introduction and Conclusion
* **Do This**: Start with a clear introduction outlining the notebook's purpose, objectives, and data sources. End with a summary of findings and conclusions.
* **Don't Do This**: Leave the reader unsure of the notebook's goals or key takeaways.
## 3. Code Style and Formatting
### 3.1. Pythonic Code
* **Do This**: Adhere to PEP 8 guidelines for Python code. Use a linter (e.g., "flake8", "pylint") or formatter (e.g., "black", "autopep8") to enforce these guidelines.
* **Don't Do This**: Ignore PEP 8 recommendations without a strong justification.
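For example, assuming the optional "nbqa" adapter is installed ("pip install nbqa"), standard linters and formatters can be run directly against a notebook file from a code cell or a terminal:
"""python
# Hypothetical notebook name; nbqa lets command-line tools operate on .ipynb files
!nbqa black my_notebook.ipynb    # format code cells in place
!nbqa flake8 my_notebook.ipynb   # report style violations in code cells
"""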
### 3.2. Line Length
* **Do This**: Limit line length to 79 characters for code and 72 characters for comments, aligning with PEP 8.
* **Don't Do This**: Allow lines to become excessively long, making the code difficult to read.
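When a statement runs long, prefer PEP 8's implicit continuation inside parentheses, brackets, or braces over backslashes. A minimal sketch (the column names and aggregation are illustrative):
"""python
import pandas as pd

# Break long calls across lines inside parentheses instead of using backslashes
df = pd.DataFrame({"segment": ["a", "a", "b"], "age": [30, 40, 50], "purchase_amount": [10.0, 20.0, 5.0]})
summary_statistics = df.groupby("segment").agg(
    average_age=("age", "mean"),
    total_purchases=("purchase_amount", "sum"),
)
print(summary_statistics)
"""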
### 3.3. Indentation
* **Do This**: Use 4 spaces for indentation.
* **Don't Do This**: Mix tabs and spaces for indentation.
### 3.4. White Space
* **Do This**: Use blank lines to separate logical blocks of code, improving readability. Per PEP 8, separate top-level function and class definitions with two blank lines, and use single blank lines between major logic blocks.
* **Don't Do This**: Cram code together without any visual separation.
### 3.5. Naming Conventions
* **Do This**:
* Use descriptive names for variables, functions, and classes.
* Follow Python's naming conventions (e.g., "snake_case" for variables and functions, "CamelCase" for classes).
* Be consistent with naming conventions within the notebook.
* **Don't Do This**: Use single-character variable names (except in very limited contexts, like loop counters) or cryptic abbreviations.
"""python
# Do This
customer_name = "Alice Smith"
calculate_average_score(scores)
# Don't Do This
cn = "Alice Smith"
calc_avg(s)
"""
### 3.6. Imports
* **Do This**:
* Group imports at the top of the notebook.
* Use standard library imports before third-party library imports.
* Use absolute imports where possible.
* Import specific functions/classes instead of entire modules when appropriate (e.g., "from math import sqrt" instead of "import math" if you only need "sqrt").
* **Don't Do This**: Scatter imports throughout the notebook or use wildcard imports ("from module import *").
"""python
# Do This
import os
import sys
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Don't Do This
import os, sys # Poor readability
from pandas import * # Avoid wildcard imports
"""
### 3.7. String Formatting
* **Do This**: Use f-strings (formatted string literals) for string formatting, as they are more readable and efficient. Use triple quotes for docstrings and multiline strings.
* **Don't Do This**: Rely on older string formatting methods (e.g., "%" operator or ".format()") unless working with legacy code.
"""python
# Do This
name = "Bob"
age = 30
message = f"Hello, my name is {name} and I am {age} years old."
# Multiline string
long_string = """
This is a very long string that spans multiple lines.
It's useful for writing documentation or generating
large blocks of text.
"""
# Don't Do This
message = "Hello, my name is %s and I am %d years old." % (name, age) # Old style
"""
### 3.8. Code Comments
* **Do This**: Use comments to explain complex logic, non-obvious choices, and the purpose of code blocks.
* Write docstrings for functions and classes.
* **Don't Do This**: Comment obvious code or write comments that contradict the code.
"""python
# Do This
def calculate_area(radius):
    """
    Calculates the area of a circle.

    Args:
        radius (float): The radius of the circle.

    Returns:
        float: The area of the circle.
    """
    # Use the formula: area = pi * radius^2
    area = 3.14159 * radius * radius
    return area
"""
### 3.9. Error Handling
* **Do This**: Use "try...except" blocks to handle exceptions gracefully. Log errors and provide informative error messages.
* **Don't Do This**: Let exceptions crash the notebook without any handling.
"""python
# Do This
try:
    result = 10 / 0
except ZeroDivisionError as e:
    print(f"Error: Division by zero - {e}")
    # Log the error
"""
### 3.10. Cell Execution Order
* **Do This**: Ensure that the notebook can be executed from top to bottom without errors. Restart the kernel and run all cells to verify this.
* **Don't Do This**: Rely on a hidden cell execution order that is not reflected in the notebook's top-to-bottom structure.
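One way to verify top-to-bottom execution non-interactively (assuming your notebook is named "my_notebook.ipynb") is to execute it end-to-end with "nbconvert":
"""python
# Run from a code cell; drop the leading "!" to run the same command in a terminal
!jupyter nbconvert --to notebook --execute my_notebook.ipynb --output executed_notebook.ipynb
"""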
## 4. Data Handling
### 4.1. Data Loading
* **Do This**: Load data at the beginning of the notebook in a dedicated "Data Loading" section. Specify the data source, file format, and any relevant loading parameters.
* **Don't Do This**: Load data multiple times throughout the notebook or hardcode file paths without explanation.
"""python
# Do This
DATA_PATH = "data/my_dataset.csv" # Define data path
try:
    df = pd.read_csv(DATA_PATH)
    print("Data loaded successfully.")
except FileNotFoundError:
    print(f"Error: File not found at {DATA_PATH}")
    # Handle the error appropriately
"""
### 4.2. Data Exploration and Visualization
* **Do This**: Use visualizations to explore data and communicate findings effectively. Label axes, add titles, and provide captions to explain the plots. Consider using interactive visualizations with libraries like "plotly" or "bokeh".
* **Don't Do This**: Create plots without clear labels or context.
"""python
# Do This
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(data=df, x="age")
plt.title("Distribution of Ages")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
"""
### 4.3. Data Cleaning and Transformation
* **Do This**: Document all data cleaning and transformation steps clearly. Explain the rationale behind each step and handle missing values and outliers appropriately.
* **Don't Do This**: Perform data cleaning without documenting the steps or making assumptions about the data.
"""python
# Data Cleaning: Handling Missing Values
def impute_missing_values(df, column, method='mean'):
    """
    Imputes missing values in a specified column using a given method.

    Args:
        df (pd.DataFrame): The DataFrame to impute.
        column (str): The name of the column with missing values.
        method (str): The imputation method ('mean', 'median', 'mode').

    Raises:
        ValueError: If an unsupported imputation method is specified.

    Returns:
        pd.DataFrame: The DataFrame with imputed values.
    """
    # Input validation
    if method not in ['mean', 'median', 'mode']:
        raise ValueError("Unsupported imputation method")

    # Copy the DataFrame to avoid modifying the original data
    df = df.copy()

    if df[column].isnull().any():  # Check for any null values in the column
        if method == 'mean':
            fill_value = df[column].mean()
        elif method == 'median':
            fill_value = df[column].median()
        else:  # mode
            fill_value = df[column].mode()[0]
        # Fill missing values with the chosen method
        df[column] = df[column].fillna(fill_value)
        print(f"Missing values in column '{column}' imputed using {method}.")
    else:
        print(f"No missing values found in column '{column}'.")
    return df

df = impute_missing_values(df, 'age')
"""
### 4.4. Memory Management
* **Do This**: Be mindful of memory usage, especially when working with large datasets. Use techniques like chunking, data type optimization (e.g., using "int8" instead of "int64"), and garbage collection to reduce memory footprint.
* **Don't Do This**: Load entire datasets into memory if it's not necessary or create unnecessary copies of dataframes.
"""python
# Do This: Optimize data types
df['age'] = pd.to_numeric(df['age'], downcast='integer')
# Do This: Chunking when reading files
for chunk in pd.read_csv("large_file.csv", chunksize=10000):
    # Process each chunk here
    print(chunk.head())  # Example operation
"""
## 5. Modeling and Machine Learning
### 5.1. Model Training and Evaluation
* **Do This**: Clearly separate model training and evaluation steps. Use appropriate metrics to evaluate model performance and document the evaluation process.
* **Don't Do This**: Train and evaluate models without proper validation or use inappropriate metrics.
"""python
# Do This
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Split data into training and testing sets
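# X (feature matrix) and y (target vector) are assumed to be defined in an earlier cell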
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(classification_report(y_test, y_pred))
"""
### 5.2. Hyperparameter Tuning
* **Do This**: Use techniques like cross-validation or grid search to tune hyperparameters. Document the hyperparameter tuning process and the best hyperparameter values.
* **Don't Do This**: Use default hyperparameter values without any tuning or tune hyperparameters without proper validation.
"""python
# Do This: Grid Search for Hyperparameter Tuning
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
grid.fit(X_train, y_train)
print(f"Best parameters: {grid.best_params_}")
print(f"Best estimator: {grid.best_estimator_}")
"""
### 5.3. Model Persistence
* **Do This**: Save trained models to disk using libraries like "pickle" or "joblib". Load models when needed for predictions or further analysis.
* **Don't Do This**: Retrain models every time the notebook is executed or store models directly in the notebook.
"""python
# Do This
import joblib
# Save the trained model
filename = 'my_model.joblib'
joblib.dump(model, filename)
# Load the model
loaded_model = joblib.load(filename)
"""
## 6. Interactivity and Widgets
### 6.1. Interactive Controls
* **Do This**: Provide interactive controls using ipywidgets. The widgets should allow users to easily change parameters and see the results.
* **Don't Do This**: Create notebooks that must be manually edited to explore different results or parameter values.
"""python
import ipywidgets as widgets
from IPython.display import display
# Defining a slider widget
slider = widgets.IntSlider(
    min=0,
    max=100,
    step=1,
    description='Value:',
    value=50
)

# Display the slider
display(slider)

def on_value_change(change):
    new_value = change['new']
    print(f"Slider value changed. New value: {new_value}")

# Observe the slider's value change
slider.observe(on_value_change, names='value')
"""
## 7. Collaboration and Version Control
### 7.1. Git and Version Control
* **Do This**: Use Git for version control. Commit changes frequently with meaningful commit messages.
* **Don't Do This**: Store notebooks without version control or commit large binary files (e.g., large datasets or model files) to the repository.
* **Why**: Proper version control helps avoid conflicts and preserves the change history.
### 7.2. Avoiding Output in Commits
* **Do This**: Clear all outputs (cell outputs, figures) before committing changes to Git. This reduces the size of the repository and avoids conflicts caused by changing outputs. A tool such as "nbstripout" can automate this.
* **Don't Do This**: Commit notebooks with large embedded outputs; this bloats the repository, slows Git operations, and makes diffs hard to review.
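A typical setup (assuming pip and Git are available) installs "nbstripout" and registers it as a Git filter for the current repository:
"""python
# Run from a code cell at the repository root; drop the leading "!" to run in a terminal
!pip install nbstripout
!nbstripout --install   # registers a Git filter that strips outputs on commit
"""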
### 7.3. Environment Management
* **Do This**: Provide a "requirements.txt" or "environment.yml" file that lists all the dependencies required to run the notebook.
* **Don't Do This**: Rely on implicit dependencies or require users to manually install packages.
* **Why**: An explicit environment specification ensures reproducibility and prevents dependency conflicts.
"""bash
# requirements.txt
pandas==1.5.0
numpy==1.23.0
scikit-learn==1.1.3
"""
## 8. Performance Optimization
### 8.1. Vectorization
* **Do This**: Use vectorized operations with libraries like NumPy and Pandas for efficiency. Avoid explicit loops when possible.
* **Don't Do This**: Use Python loops to perform operations that can be vectorized.
"""python
# Do This
import numpy as np
# Vectorized operation
arr = np.array([1, 2, 3, 4, 5])
squared_arr = arr ** 2
# Don't Do This: Inefficient loop
squared_arr = []
for i in arr:
    squared_arr.append(i ** 2)
print(squared_arr)
"""
### 8.2. Jupyter Caching
* **Do This**: Cache slow operations or expensive function calls with Jupyter-compatible caching utilities (e.g., "functools.lru_cache" for pure functions or "joblib.Memory" for disk-backed caching) so repeated runs do not redo the same work.
* **Don't Do This**: Recompute expensive results on every run when the inputs have not changed.
"""python
from functools import lru_cache
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
print(fibonacci(10)) # The result will be cached
"""
### 8.3. Profiling Code
* **Do This**: Use profiling tools (e.g., "%timeit", "%prun") to identify performance bottlenecks. Optimize the code based on the profiling results.
* **Don't Do This**: Guess where the performance bottlenecks are without any profiling.
"""python
# Do This: Use timeit magic
%timeit sum(range(1000))
# Do This: Use prun magic for profiling
def my_function():
    # Some code here
    pass
%prun my_function()
"""
## 9. Security Considerations
### 9.1. Input Validation
* **Do This**: Validate all user inputs to prevent security vulnerabilities such as code injection or cross-site scripting (XSS).
* **Don't Do This**: Trust user inputs without any validation.
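A minimal sketch of validating user-supplied parameters before they are used; the allowed column names and bounds below are illustrative assumptions:
"""python
# Whitelist-based validation of user-supplied inputs (illustrative values)
ALLOWED_COLUMNS = {"age", "income", "score"}

def validate_inputs(column, threshold):
    # Reject anything outside the known set of column names
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Invalid column {column!r}; allowed: {sorted(ALLOWED_COLUMNS)}")
    # Enforce type and range constraints on numeric parameters
    if not isinstance(threshold, (int, float)) or not (0 <= threshold <= 100):
        raise ValueError("threshold must be a number between 0 and 100")
    return column, threshold

column, threshold = validate_inputs("age", 42)      # OK
# validate_inputs("age; DROP TABLE users", 42)      # raises ValueError
"""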
### 9.2. Secrets Management
* **Do This**: Store sensitive information (e.g., API keys, passwords) securely using environment variables or dedicated secrets management tools.
* **Don't Do This**: Hardcode sensitive information directly in the notebook.
"""python
# Do This
import os
api_key = os.environ.get("MY_API_KEY")
if api_key is None:
    raise ValueError("API key not found in environment variables.")
"""
### 9.3. Avoid Executing Untrusted Code
* **Do This**: Be cautious when executing code from untrusted sources. Review the code carefully before executing it.
* **Don't Do This**: Execute code from untrusted sources without any review.
## 10. AI Coding Assistant Integration
### 10.1. Prompt Engineering
* **Do This**: Use clear and specific prompts when using AI coding assistants to generate or modify code. Provide context, examples, and desired outcomes.
* **Don't Do This**: Use vague or ambiguous prompts that lead to unpredictable results.
"""
# Good Prompt:
# "Write a function that calculates the factorial of a number using recursion in Python."
# Bad Prompt:
# "Write a factorial function."
"""
### 10.2. Code Review and Validation
* **Do This**: Always review and validate code generated by AI coding assistants. Test the code thoroughly to ensure it meets the requirements and doesn't introduce any errors or security vulnerabilities.
* **Don't Do This**: Trust AI-generated code blindly without any review or testing.
### 10.3. Leveraging AI for Documentation and Comments
* **Do This**: Use AI coding assistants to generate documentation and comments based on the code. Review and refine the AI-generated documentation to ensure it is accurate and informative.
* **Don't Do This**: Rely solely on AI-generated documentation without any human review.
## 11. Conclusion
These code style and conventions standards are designed to help you write high-quality Jupyter Notebooks that are maintainable, readable, and secure. By following these guidelines, you can improve collaboration, reduce errors, optimize performance, and streamline your development workflow. Regularly review and update these standards to stay current with the latest best practices and tools. Always prioritize readability and understandability when using AI coding assistants. Be sure to validate and test any AI-generated code.
# Component Design Standards for Jupyter Notebooks This document outlines the coding standards for component design in Jupyter Notebooks. Adhering to these standards will improve code reusability, maintainability, and overall project quality. These guidelines focus on applying general software engineering principles specifically within the Jupyter Notebooks environment, leveraging its unique features and limitations. ## 1. Principles of Component Design in Notebooks Effective component design in Jupyter Notebooks involves structuring your code into modular, reusable units. This contrasts with writing monolithic scripts, promoting clarity, testability, and collaboration. Components should encapsulate specific functionality with well-defined inputs and outputs. ### 1.1. Single Responsibility Principle (SRP) **Standard:** Each component (function, class, or logical code block) should have one, and only one, reason to change. **Do This:** * Create dedicated functions for specific tasks, such as data loading, preprocessing, model training, and visualization. * Separate configuration from code logic to allow for easy adjustment of parameters. * Ensure each cell primarily focuses on one aspect of the analysis or workflow. **Don't Do This:** * Create large, monolithic functions that perform multiple unrelated operations. * Embed configuration parameters directly within code logic, making it difficult to modify. * Combine data cleaning, analysis, and visualization in a single cell. **Why:** SRP simplifies debugging and maintenance. If a component has multiple responsibilities, changes in one area can unintentionally affect others. By isolating functionality, you reduce the scope of potential errors and make it easier to understand and modify the code. **Example:** """python # Do This: Separate data loading and preprocessing def load_data(filepath): """Loads data from a CSV file.""" import pandas as pd try: data = pd.read_csv(filepath) return data except FileNotFoundError: print(f"Error: File not found at {filepath}") return None def preprocess_data(data): """Performs data cleaning and feature engineering.""" if data is None: return None # Example preprocessing steps: data = data.dropna() # Remove rows with missing values data['feature1'] = data['feature1'] / 100 # Scale feature1 return data # Usage: data = load_data("data.csv") processed_data = preprocess_data(data) # Don't Do This: Combine data loading and preprocessing def load_and_preprocess_data(filepath): """Loads and preprocesses data from a CSV file.""" import pandas as pd try: data = pd.read_csv(filepath) data = data.dropna() data['feature1'] = data['feature1'] / 100 return data except FileNotFoundError: print(f"Error: File not found at {filepath}") return None # Usage: data = load_and_preprocess_data("data.csv") """ ### 1.2. Abstraction **Standard:** Components should expose only essential information and hide complex implementation details. **Do This:** * Use function and class docstrings to clearly define inputs, outputs, and purpose. * Implement helper functions to encapsulate complex logic within a component. * Use "_" prefix for internal functions or variables that should not be directly accessed. **Don't Do This:** * Expose internal implementation details to the user. * Write overly complex functions that are difficult to understand and use. * Fail to document your code clearly. **Why:** Abstraction simplifies the usage of components and reduces dependencies. 
Users can interact with the component without needing to understand its internal workings. This also allows you to modify the internal implementation without affecting the user's code, as long as the interface remains consistent. **Example:** """python # Do This: Use a class to abstract the details of model training class ModelTrainer: """ A class to train a machine learning model. Args: model: The machine learning model to train. optimizer: The optimization algorithm. loss_function: The loss function to minimize. """ def __init__(self, model, optimizer, loss_function): self.model = model self.optimizer = optimizer self.loss_function = loss_function def _train_epoch(self, data_loader): """ Trains the model for one epoch. This is an internal method. """ # Training loop implementation pass # Replace with real training loop def train(self, data_loader, epochs=10): """ Trains the model. Args: data_loader: The data loader for training data. epochs: The number of training epochs. """ for epoch in range(epochs): self._train_epoch(data_loader) print(f"Epoch {epoch+1}/{epochs} completed.") # Don't Do This: Expose training loop details directly def train_model(model, data_loader, optimizer, loss_function, epochs=10): """ Trains a machine learning model. Exposes implementation details. Args: model: The machine learning model to train. data_loader: The data loader for training data. optimizer: The optimization algorithm. loss_function: The loss function to minimize. epochs: The number of training epochs. """ for epoch in range(epochs): # Training loop code here (exposed to the user) pass # Replace with real training loop print(f"Epoch {epoch+1}/{epochs} completed.") """ ### 1.3. Loose Coupling **Standard:** Components should be as independent as possible, minimizing dependencies on other components. **Do This:** * Use dependency injection to provide components with the resources they need. * Define clear interfaces or abstract classes to decouple components. * Favor composition over inheritance to reduce tight coupling between classes. **Don't Do This:** * Create components that rely heavily on the internal state of other components. * Use global variables or shared mutable state to communicate between components. * Create deep inheritance hierarchies that are difficult to understand and maintain. **Why:** Loose coupling makes components easier to reuse and test independently. Changes in one component are less likely to affect other components. This promotes modularity and reduces the complexity of the overall system. **Example:** """python # Do This: Use Dependency Injection class DataProcessor: def __init__(self, data_source): self.data_source = data_source def process_data(self): data = self.data_source.load_data() # Process the data return data class CSVDataSource: def __init__(self, filepath): self.filepath = filepath def load_data(self): import pandas as pd return pd.read_csv(self.filepath) csv_source = CSVDataSource("data.csv") processor = DataProcessor(csv_source) data = processor.process_data() # Don't Do This: Hardcode the data source within the processor class DataProcessor: def __init__(self, filepath): self.filepath = filepath def process_data(self): import pandas as pd data = pd.read_csv(self.filepath) # Process the data return data processor = DataProcessor("data.csv") # Tightly coupled to CSV data = processor.process_data() """ ## 2. 
Component Structure and Organization The way you structure and organize your code within a Jupyter Notebook significantly impacts readability and maintainability. ### 2.1. Cell Structure **Standard:** Each cell should contain a logical unit of code with a clear purpose. **Do This:** * Use markdown cells to provide context and explanations before code cells. * Group related code into a single cell. * Keep cells relatively short and focused on a single task. * When writing functions/classes, place their definitions in separate cells from call/execution examples. **Don't Do This:** * Write excessively long cells that are difficult to read and understand. * Combine unrelated code into a single cell. * Leave code cells without any explanation or context. **Why:** Proper cell structure improves the flow of the notebook and makes it easier to follow the analysis or workflow. Clear separation of code and explanations allows for better understanding and collaboration. **Example:** """markdown ## Loading the Data This cell loads the data from a CSV file using pandas. """ """python # Load the data import pandas as pd data = pd.read_csv("data.csv") print(data.head()) """ """markdown ## Data Cleaning This cell cleans the data by removing missing values and irrelevant columns. """ """python # Clean the data data = data.dropna() data = data.drop(columns=['column1', 'column2']) print(data.head()) """ ### 2.2. Notebook Modularity **Standard:** Break down complex tasks into smaller, manageable notebooks that can interact or be chained together. **Do This:** * Use separate notebooks for data loading, preprocessing, analysis, and visualization. * Utilize "%run" magic command or "import" to execute code from other notebooks. * Consider using tools like "papermill" for parameterizing and executing notebooks programmatically. **Don't Do This:** * Create a single massive notebook that performs all tasks. * Copy and paste code between notebooks, leading to redundancy and inconsistencies. * Rely on manual execution of notebooks in a specific order. **Why:** Notebook modularity promotes reusability and simplifies the development process. It allows you to focus on specific parts of the workflow without being overwhelmed by the entire complexity. It also supports easier parallel development and testing. **Example:** """python # Notebook 1: data_loading.ipynb import pandas as pd def load_data(filepath): data = pd.read_csv(filepath) return data # Save the processed data for use in other notebooks data = load_data("data.csv") data.to_pickle("loaded_data.pkl") """ """python # Notebook 2: data_analysis.ipynb import pandas as pd # Load the data from the previous notebook data = pd.read_pickle("loaded_data.pkl") # Perform data analysis # ... """ ### 2.3. External Modules and Packages **Standard:** Leverage external libraries and packages to encapsulate complex functionality. **Do This:** * Use established libraries like "pandas", "numpy", "scikit-learn", and "matplotlib" for common tasks. * Create custom modules to encapsulate reusable code and functionality. * Use "%pip install" or "%conda install" for dependency management, preferably with "requirements.txt" files. **Don't Do This:** * Reinvent the wheel by writing code for tasks that are already handled by existing libraries. * Include large amounts of code directly in the notebook when it could be encapsulated in a module. * Neglect dependency management, leading to environment inconsistencies and reproducibility issues. 
**Why:** External libraries provide pre-built solutions for common problems, saving time and effort. Custom modules allow you to organize and reuse your own code effectively. Proper dependency management ensures that your notebooks can be easily reproduced in different environments. **Example:** """python # Install the necessary libraries # Cell 1 in a new notebook %pip install pandas numpy scikit-learn """ """python # Cell 2: Import and use the libraries import pandas as pd import numpy as np from sklearn.model_selection import train_test_split # Load the data data = pd.read_csv("data.csv") # Split the data into training and testing sets X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2) """ ## 3. Coding Style within Components Consistent coding style within components significantly improves readability and maintainability. ### 3.1. Naming Conventions **Standard:** Follow consistent naming conventions for variables, functions, and classes. **Do This:** * Use descriptive names that clearly indicate the purpose of the variable or function. * Use lowercase names with underscores for variables and functions (e.g., "data_frame", "calculate_mean"). * Use CamelCase for class names (e.g., "ModelTrainer", "DataProcessor"). * Use meaningful abbreviations sparingly and consistently. **Don't Do This:** * Use single-letter variable names (except for loop counters). * Use ambiguous or cryptic names that are difficult to understand. * Mix different naming conventions within the same notebook or project. **Why:** Consistent naming conventions make code easier to read and understand. Descriptive names provide valuable context and reduce the need for comments. **Example:** """python # Correct data_frame = pd.read_csv("data.csv") number_of_rows = len(data_frame) def calculate_average(numbers): return sum(numbers) / len(numbers) class DataProcessor: pass # Incorrect df = pd.read_csv("data.csv") # df is ambiguous n = len(df) # n provides no context def calc_avg(nums): # calc_avg is unclear return sum(nums) / len(nums) class DP: # DP is cryptic pass """ ### 3.2. Comments and Documentation **Standard:** Provide clear and concise comments to explain the purpose of the code. **Do This:** * Write docstrings for all functions and classes, explaining their purpose, inputs, and outputs. Use NumPy Docstring standard . * Add comments to explain complex or non-obvious code. * Keep comments up-to-date with the code. * Use markdown cells to provide high-level explanations and context. **Don't Do This:** * Write obvious comments that simply restate the code. * Neglect to document your code, making it difficult for others to understand. * Write lengthy comments that are difficult to read and maintain. **Why:** Comments and documentation are essential for understanding and maintaining code. They provide valuable context and explanations that are not always apparent from the code itself. Tools like "nbdev" (mentioned in search results) leverage well-written documentation within notebooks. **Example:** """python def calculate_mean(numbers): """ Calculates the mean of a list of numbers. Args: numbers (list): A list of numbers. Returns: float: The mean of the numbers. """ # Sum the numbers and divide by the count return sum(numbers) / len(numbers) """ ### 3.3. Error Handling **Standard:** Implement robust error handling to prevent unexpected crashes and provide informative error messages. **Do This:** * Use "try-except" blocks to handle potential exceptions. 
* Provide informative error messages that help the user understand the problem and how to fix it. * Log errors and warnings for debugging purposes. * Consider using assertions to check for invalid inputs or states. **Don't Do This:** * Ignore exceptions, leading to silent failures. * Provide generic error messages that don't help the user. * Fail to handle potential edge cases or invalid inputs. **Why:** Proper error handling makes your notebooks more robust and reliable. It prevents unexpected crashes and provides valuable information for debugging and troubleshooting. This is especially important in interactive environments where unexpected errors can disrupt the analysis or workflow. **Example:** """python def load_data(filepath): """Loads data from a CSV file.""" import pandas as pd try: data = pd.read_csv(filepath) return data except FileNotFoundError: print(f"Error: File not found at {filepath}") return None except pd.errors.EmptyDataError: print(f"Error: The CSV file at '{filepath}' is empty.") return None except Exception as e: print(f"An unexpected error occurred: {e}") return None data = load_data("data.csv") if data is not None: print("Data loaded successfully.") else: print("Failed to load data.") """ ## 4. Testing Components Testing is critical for ensuring the correctness and reliability of components. ### 4.1. Unit Testing **Standard:** Write unit tests to verify the functionality of individual components. **Do This:** * Use a testing framework like "pytest" or "unittest". * Write tests for all critical functions and classes. * Test both positive and negative cases (e.g., valid and invalid inputs). * Automate the execution of tests using a continuous integration system. **Don't Do This:** * Neglect to test your code, leading to undetected bugs. * Write tests that are too complex or that test multiple components at once. * Rely solely on manual testing. **Why:** Unit tests provide a safety net that allows you to make changes to your code with confidence. They help to detect bugs early in the development process and ensure that components behave as expected. Tools like "nbdev" encourage including tests directly within the notebook environment. **Example (using pytest; assuming function "calculate_mean" is defined):** """python # File: test_utils.py (separate file to store the tests) import pytest from your_notebook import calculate_mean # Import from your notebook def test_calculate_mean_positive(): assert calculate_mean([1, 2, 3, 4, 5]) == 3.0 def test_calculate_mean_empty_list(): with pytest.raises(ZeroDivisionError): # Or handle the error differently calculate_mean([]) def test_calculate_mean_negative_numbers(): assert calculate_mean([-1, -2, -3]) == -2.0 """ Run tests from the command line: "pytest test_utils.py" ### 4.2. Integration Testing **Standard:** Write integration tests to verify the interaction between multiple components. **Do This:** * Test the flow of data between components. * Test the interaction between different modules or notebooks. * Use mock objects to isolate components during testing. **Don't Do This:** * Neglect to test the integration between components, leading to compatibility issues. * Rely solely on unit tests, which may not catch integration problems. **Why:** Integration tests ensure that components work together correctly. They help to detect problems that may not be apparent from unit tests alone. 
**Example (Illustrative):** """python # Assuming data loading and preprocessing functions from earlier examples # import load_data, preprocess_data # From notebook/module def test_data_loading_and_preprocessing(): data = load_data("test_data.csv") # Create a small test_data.csv processed_data = preprocess_data(data) assert processed_data is not None # Check if processing was successful # Add more specific assertions about processed_data content """ ### 4.3. Testing within Notebooks **Standard:** While external tests are preferred for robust component testing, use simple assertions within notebooks for quick validation during interactive development. **Do This:** * Use "assert" statements in cells to test data types, shapes, and values at key points in the notebook. * These assertions are meant for rapid validation and should not replace dedicated external testing suites. **Don't Do This:** * Rely solely on in-notebook assertions for production-level testing. **Why:** Inline assertions provide immediate feedback during interactive development and help catch errors early. They enhance the debugging experience within the notebook environment. **Example:** """python # After loading data... data = load_data("data.csv") assert isinstance(data, pd.DataFrame), "Data should be a DataFrame" assert not data.empty, "DataFrame should not be empty" """ By adhering to these component design standards, you can create more maintainable, reusable, and robust Jupyter Notebooks. This promotes better collaboration, reduces debugging time, and improves the overall quality of your data science projects.
# Deployment and DevOps Standards for Jupyter Notebooks This document outlines the standards and best practices for deploying and managing Jupyter Notebooks in production environments. Following these guidelines will enable robust, maintainable, and scalable deployments with proper CI/CD pipelines. ## 1. Build Processes and CI/CD ### 1.1 Notebook Conversion and Formatting Jupyter Notebooks in their raw form (.ipynb) are not directly executable in many production environments. Therefore, a conversion process is essential to transform them into deployable formats like Python scripts or executable notebooks via tools like "papermill". Also, ensure clean formatting for better readability and consistency using tools like "black" and "flake8". **Do This:** * Convert notebooks to Python scripts or use "papermill" for parameterized execution. * Apply code formatting using "black" and "linting" using "flake8" to the final generated ".py" file. * Use a dedicated script for conversion and cleaning. **Don't Do This:** * Deploy ".ipynb" files directly into production without conversion and parameterization. * Skip code formatting and linting, leading to unreadable and inconsistent code. **Example:** Conversion script ("convert_notebook.sh"): """bash #!/bin/bash # Convert notebook to script jupyter nbconvert --to script my_notebook.ipynb # Format generated script black my_notebook.py # Lint generated script flake8 my_notebook.py # Optionally, execute the script using papermill: # papermill my_notebook.ipynb output_notebook.ipynb -p param1 value1 -p param2 value2 """ Notebook structure ("my_notebook.ipynb"): """python # my_notebook.ipynb import pandas as pd def process_data(input_file): df = pd.read_csv(input_file) # data processing logic here return df if __name__ == "__main__": input_data = "data.csv" # or use papermill parameters processed_df = process_data(input_data) print(processed_df.head()) """ ### 1.2 Version Control and Branching Strategy Treat Jupyter Notebooks like any other source code: utilize version control with Git. Implement a coherent branching strategy, such as Gitflow or GitHub Flow, to manage features, hotfixes, and releases. **Do This:** * Use Git for version control. * Store notebooks in a Git repository. * Adopt a branching strategy (e.g., Gitflow) for managing changes. * Commit frequently with descriptive messages. * Utilize ".gitignore" to exclude temporary files, large data files, and sensitive information. **Don't Do This:** * Skip version control, leading to lost changes and difficulty in collaboration. * Commit large data files or sensitive credentials directly into the repository. * Avoid descriptive commit messages, making it difficult to understand the history. **Example:** ".gitignore" file: """ .ipynb_checkpoints/ *.csv *.xlsx config.yaml """ ### 1.3 Automated Testing Integrate automated testing into your CI/CD pipeline to ensure the integrity of your notebooks. Use testing frameworks like "pytest" or "unittest" to validate the output and behavior of notebook code. **Do This:** * Write unit tests for functions and classes defined in notebooks. * Use "pytest" or "unittest" to run tests. * Implement continuous integration (CI) to automatically run tests on every commit. * Test the converted ".py" script. **Don't Do This:** * Rely solely on manual testing, which is error-prone and time-consuming. * Skip testing of boundary conditions and edge cases. 
**Example:** Test script ("test_my_notebook.py"): """python # test_my_notebook.py import pytest import pandas as pd from my_notebook import process_data # Assuming we converted notebook to my_notebook.py def test_process_data(): # Create a dummy CSV file for testing dummy_data = {'col1': [1, 2], 'col2': [3, 4]} dummy_df = pd.DataFrame(dummy_data) dummy_df.to_csv("test_data.csv", index=False) # Call the function and check the output result_df = process_data("test_data.csv") assert isinstance(result_df, pd.DataFrame) assert result_df.shape == (2, 2) assert result_df['col1'].sum() == 3 # Clean up the dummy file import os os.remove("test_data.csv") """ To integrate this with pytest, your notebook ("my_notebook.ipynb") should be converted to a Python ".py" file ("my_notebook.py") using "jupyter nbconvert --to script my_notebook.ipynb". CI configuration (e.g., ".github/workflows/ci.yml" for GitHub Actions): """yaml name: CI on: push: branches: [ main ] pull_request: branches: [ main ] jobs: build: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Set up Python 3.9 uses: actions/setup-python@v4 with: python-version: 3.9 - name: Install dependencies run: | python -m pip install --upgrade pip pip install pytest pandas flake8 black jupyter nbconvert papermill - name: Convert and Lint Notebook run: | bash convert_notebook.sh - name: Run tests with pytest run: | pytest test_my_notebook.py """ ### 1.4 Dependency Management Explicitly define and manage dependencies using tools like "pip" and potentially "conda" if your notebook's environment necessitates it. A "requirements.txt" file ensures that the deployment environment mirrors the development environment. **Do This:** * Use "pip freeze > requirements.txt" to generate a list of dependencies. * Include the "requirements.txt" file in your repository. * Consider using virtual environments to isolate dependencies. * Use "pip install -r requirements.txt" to install the necessary dependencies in the deployment environment. * For more complex environments, consider using "conda env export > environment.yml" and "conda env create -f environment.yml". **Don't Do This:** * Rely on globally installed packages, which may not be available in the deployment environment. * Forget to update "requirements.txt" when adding or removing dependencies. **Example:** "requirements.txt": """ pandas==1.3.0 numpy==1.21.0 requests==2.26.0 """ ### 1.5 Secret Management Never hardcode sensitive information such as API keys, database passwords, or other credentials directly into the notebook. Use environment variables or a secure configuration management system (e.g., HashiCorp Vault) to inject secrets at runtime. **Do This:** * Store secrets in environment variables or a secure configuration management system. * Retrieve secrets using "os.environ.get("SECRET_KEY")" in Python. * Use libraries like "python-dotenv" for local development. **Don't Do This:** * Hardcode secrets directly in the notebook. * Commit secrets to the Git repository. **Example:** Retrieve secrets from environment variables within the notebook or converted script: """python import os api_key = os.environ.get("API_KEY") if api_key: print("API Key:", api_key) else: print("API Key not found in environment variables.") """ ### 1.6 Containerization (Docker) Package your Jupyter Notebooks and their dependencies into Docker containers for consistent and reproducible deployments across different environments. **Do This:** * Create a "Dockerfile" to define the container image. 
* Install all necessary dependencies using "pip install -r requirements.txt" inside the container. * Set the working directory. * Copy the notebook and any required files to the container. * Expose any necessary ports. * Use Multi-stage builds where appropriate. **Don't Do This:** * Use overly large base images. * Install unnecessary packages. * Hardcode secrets in the "Dockerfile". **Example:** "Dockerfile": """dockerfile FROM python:3.9-slim-buster WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . # If using papermill, example entrypoint: # CMD ["papermill", "my_notebook.ipynb", "output.ipynb", "-p", "input_data", "/data/input.csv"] # If running as a script, example entrypoint: CMD ["python", "my_notebook.py"] """ ## 2. Production Considerations ### 2.1 Parameterization Notebooks often need to be executed with different input parameters (e.g., dates, file paths, model configurations). Use "papermill" to parameterize notebooks and execute them with varying inputs. **Do This:** * Use "papermill" to inject parameters into notebooks. * Define parameters as variables in a dedicated "parameters" cell. * Provide default values for parameters. **Don't Do This:** * Hardcode input values directly in the notebook, making it inflexible. * Modify the notebook code to change parameters. **Example:** Notebook with parameterization ("my_parameterized_notebook.ipynb"): """python # Parameters input_file = "default_data.csv" # papermill: input_file threshold = 0.5 # papermill: threshold import pandas as pd def process_data(input_file, threshold): df = pd.read_csv(input_file) filtered_df = df[df['value'] > threshold] return filtered_df processed_df = process_data(input_file, threshold) print(processed_df.head()) """ Executing with "papermill": """bash papermill my_parameterized_notebook.ipynb output_notebook.ipynb -p input_file "new_data.csv" -p threshold 0.7 """ ### 2.2 Scheduling and Orchestration Use task schedulers like Airflow, Prefect, or Celery to automate the execution of notebooks on a recurring basis. These tools provide features for dependency management, retries, and monitoring. **Do This:** * Integrate notebook execution into a scheduling/orchestration framework. * Define workflows to manage dependencies between notebooks. * Implement retry mechanisms for failed executions. * Monitor notebook execution and log results. **Don't Do This:** * Rely on manual execution of notebooks. * Lack proper monitoring and error handling. **Example (Airflow):** Example Airflow DAG ("notebook_dag.py"): """python from airflow import DAG from airflow.operators.bash import BashOperator from datetime import datetime with DAG( dag_id='notebook_execution', start_date=datetime(2023, 1, 1), schedule_interval='@daily', catchup=False ) as dag: execute_notebook = BashOperator( task_id='execute_my_notebook', bash_command='papermill /path/to/my_notebook.ipynb /path/to/output_notebook.ipynb -p input_date "{{ ds }}"' ) """ ### 2.3 Logging and Monitoring Implement comprehensive logging to capture information about notebook execution, errors, and performance. Use monitoring tools (e.g., Prometheus, Grafana) to track the health and performance of your deployments. **Do This:** * Use the "logging" module in Python to log messages at different levels (e.g., INFO, WARNING, ERROR). * Log input parameters, output values, execution time, and any errors. * Integrate with monitoring tools to track key metrics (e.g., CPU usage, memory usage, execution time). 
**Don't Do This:** * Rely solely on "print" statements for debugging. * Lack proper error handling and monitoring. **Example:** Logging setup: """python import logging # Configure logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') # Example usage logging.info("Starting data processing...") try: # Data processing code here result = 1/0 # Example code that raises error logging.info("Data processing completed successfully.") except Exception as e: logging.error(f"An error occurred: {e}") """ ### 2.4 Security Considerations Ensure that your Jupyter Notebook deployments are secure. Apply security best practices such as: * **Authentication and Authorization:** Implement authentication and authorization mechanisms to control access to notebooks and data. * **Data Encryption:** Encrypt sensitive data at rest and in transit. * **Input Validation:** Validate all input parameters to prevent injection attacks. * **Regular Security Audits:** Conduct regular security audits to identify and address vulnerabilities. * **Limit Resource Access:** Provide the notebook process with the least amount of privileges required to function. Example, limiting resource access by running process as a non-root user inside a docker container. "Dockerfile": """dockerfile FROM python:3.9-slim-buster WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . # Add a non-root user RUN adduser -D myuser # Change ownership of the application directory to the non-root user RUN chown -R myuser:myuser /app USER myuser CMD ["python", "my_notebook.py"] """ ### 2.5 Scalability and Performance Optimize your notebooks for performance and scalability. Consider using distributed computing frameworks like Spark or Dask to process large datasets in parallel. **Do This:** * Profile your code to identify performance bottlenecks. * Use vectorized operations in NumPy and Pandas. * Leverage distributed computing frameworks for large datasets. * Optimize data storage and retrieval. * Use appropriate data structures. **Don't Do This:** * Use inefficient loops for data processing. * Load entire datasets into memory at once. Example utilizing Dask: """python import dask.dataframe as dd # Read a large CSV file in parallel ddf = dd.read_csv("large_data.csv") # Perform computations on the Dask DataFrame result = ddf.groupby('column1').agg({'column2': 'sum'}).compute() print(result) """ ## 3. Conclusion By following these guidelines, you can create robust, maintainable, and scalable Jupyter Notebook deployments suitable for production environments. This ensures that your data science projects are reliable, secure, and efficient. Remember to adapt these standards to your specific use case and environment. Regularly review and update these best practices as the Jupyter Notebook ecosystem evolves.
# API Integration Standards for Jupyter Notebooks This document outlines the coding standards for integrating APIs within Jupyter Notebooks. It aims to provide clear guidelines for developers to ensure maintainable, performant, and secure API interactions in a Jupyter Notebook environment. These standards are designed with the latest Jupyter Notebook features and best practices in mind. ## 1. Architecture and Design ### 1.1. Separation of Concerns **Do This:** Isolate API interaction logic from data processing and visualization code. Use functions or classes to encapsulate API calls. **Don't Do This:** Mix API calls directly within data analysis or visualization code, leading to tangled and unreadable notebooks. **Why:** Improves readability, testability, and reusability of code. Allows for easier modifications to API interactions without affecting other parts of the notebook. **Example:** """python # Correct: Separate API interaction import requests import pandas as pd def fetch_data_from_api(api_url, params=None): """Fetches data from the specified API endpoint.""" try: response = requests.get(api_url, params=params) response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx) return response.json() except requests.exceptions.RequestException as e: print(f"API Error: {e}") return None def process_data(data): """Processes the raw data from the API.""" if data: df = pd.DataFrame(data) # Data cleaning and transformation logic here return df else: return None API_URL = "https://api.example.com/data" data = fetch_data_from_api(API_URL, params={"limit": 100}) df = process_data(data) if df is not None: print(df.head()) """ """python # Incorrect: Mixing API interaction with data processing import requests import pandas as pd API_URL = "https://api.example.com/data" try: response = requests.get(API_URL, params={"limit": 100}) response.raise_for_status() data = response.json() df = pd.DataFrame(data) # Data cleaning and transformation logic here print(df.head()) except requests.exceptions.RequestException as e: print(f"API Error: {e}") """ ### 1.2. Modularization **Do This:** Break down complex API interactions into smaller, reusable modules or functions. Consider creating a separate ".py" file for API-related utilities and importing them into the notebook. **Don't Do This:** Create large, monolithic functions handling multiple API endpoints or complex data transformations. **Why:** Promotes code reuse, simplifies testing, and improves overall notebook structure. Enhances collaboration by making the code easier to understand and modify. **Example:** """python # Correct: Using a separate module (api_utils.py) # api_utils.py import requests def fetch_data(url, params=None): try: response = requests.get(url, params=params) response.raise_for_status() return response.json() except requests.exceptions.RequestException as e: print(f"API Error: {e}") return None # In the notebook: from api_utils import fetch_data API_URL = "https://api.example.com/data" data = fetch_data(API_URL, params={"limit": 100}) """ ### 1.3. Configuration Management **Do This:** Store API keys, URLs, and other configuration parameters in a separate configuration file (e.g., ".env" or "config.json") or environment variables. Use libraries like "python-dotenv" or "configparser" to load these configurations. **Don't Do This:** Hardcode sensitive information directly in the notebook or share notebooks with hardcoded API keys. **Why:** Improves security by preventing exposure of sensitive credentials. 
Simplifies modification and deployment across different environments (development, testing, production). **Example:** """python # Correct: Using dotenv import os from dotenv import load_dotenv load_dotenv() # Load environment variables from .env file API_KEY = os.getenv("API_KEY") API_URL = os.getenv("API_URL") if not API_KEY or not API_URL: print("API_KEY or API_URL not found in .env file.") else: print("API Key and URL loaded successfully.") # Use the API_KEY and API_URL in your requests """ Create a ".env" file (add this to ".gitignore"!): """ API_KEY=your_actual_api_key API_URL=https://api.example.com/data """ ## 2. Implementation Details ### 2.1. Error Handling **Do This:** Implement robust error handling for API calls using "try...except" blocks. Handle different types of exceptions (e.g., "requests.exceptions.RequestException", "json.JSONDecodeError") gracefully. Log errors for debugging and monitoring purposes. **Don't Do This:** Ignore potential errors from API calls or use generic "except Exception" blocks without specific error handling. **Why:** Prevents notebook execution from crashing due to API failures. Provides informative error messages for debugging and troubleshooting. **Example:** """python import requests import json import logging # Import the logging module # Setup basic logging configuration logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') def fetch_data_from_api(api_url, params=None): """Fetches data from the specified API endpoint with error handling and logging.""" try: response = requests.get(api_url, params=params) response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx) return response.json() except requests.exceptions.RequestException as e: logging.error(f"API request failed: {e}") return None except json.JSONDecodeError as e: logging.error(f"Failed to decode JSON response: {e}") return None except Exception as e: logging.exception(f"An unexpected error occurred: {e}") return None # Example usage API_URL = "https://api.example.com/data" data = fetch_data_from_api(API_URL, params={"limit": 100}) if data: print("Data fetched successfully.") # Process data else: print("Failed to fetch data.") """ ### 2.2. Request Management **Do This:** Use the "requests" library (or similar) for making HTTP requests to APIs. Configure request timeouts, retry mechanisms (using libraries like "retry"), and session management for optimized performance. **Don't Do This:** Use basic, unoptimized methods for API requests that can lead to timeouts, connection errors, or excessive resource consumption. **Why:** Improves the reliability and efficiency of API interactions. Handles network issues and rate limits gracefully. 
**Example:**

"""python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    """Creates a session with retry logic."""
    session = requests.Session()
    retry = Retry(total=3,  # Number of retries
                  backoff_factor=0.5,  # Exponential backoff factor
                  status_forcelist=[500, 502, 503, 504])  # HTTP status codes to retry on
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def fetch_data_from_api(api_url, params=None, timeout=10):
    """Fetches data from the API using a session with retries and a timeout."""
    session = create_session()
    try:
        response = session.get(api_url, params=params, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API Error: {e}")
        return None

# Example usage
API_URL = "https://api.example.com/data"
data = fetch_data_from_api(API_URL, params={"limit": 100})
"""

### 2.3. Data Serialization and Deserialization

**Do This:** Handle data serialization (e.g., JSON encoding for sending data to the API) and deserialization (e.g., JSON decoding for processing API responses) efficiently. Use the "json" library for JSON data, and consider using "pandas" for complex data structures.

**Don't Do This:** Use inefficient or insecure methods for handling data serialization and deserialization.

**Why:** Ensures data integrity during API communication. Optimizes data processing and integration with other libraries.

**Example:**

"""python
import json
import pandas as pd
import requests

def post_data_to_api(api_url, data):
    """Posts data to the API with JSON serialization."""
    try:
        headers = {'Content-Type': 'application/json'}
        response = requests.post(api_url, data=json.dumps(data), headers=headers)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API Error: {e}")
        return None

# Example usage
API_URL = "https://api.example.com/endpoint"
data = {"key1": "value1", "key2": "value2"}  # Sample data as Python dictionary
response = post_data_to_api(API_URL, data)
if response:
    print("API Response:", response)
"""
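Where an API returns nested JSON, flattening it before analysis keeps the rest of the notebook simple. The following is a minimal sketch using "pandas.json_normalize"; the payload shape and field names are hypothetical:

"""python
# Minimal sketch: flattening a nested API response with pandas.
# The payload below is hypothetical and stands in for a decoded response.json().
import pandas as pd

nested_payload = [
    {"id": 1, "user": {"name": "Ada", "country": "UK"}, "metrics": {"score": 0.9}},
    {"id": 2, "user": {"name": "Bob", "country": "US"}, "metrics": {"score": 0.7}},
]

# json_normalize expands nested dictionaries into dotted column names
df = pd.json_normalize(nested_payload)
print(df.columns.tolist())  # columns include 'id', 'user.name', 'user.country', 'metrics.score'
print(df.head())
"""

Nested keys become dotted column names, which downstream cells can rename or select as needed.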
### 2.4. Asynchronous Requests (if applicable)

**Do This:** For long-running API requests, consider using asynchronous programming (the "asyncio" library) to prevent blocking the Jupyter Notebook kernel. This is particularly important for interactive notebooks used for real-time data analysis.

**Don't Do This:** Block the main thread with synchronous API calls, leading to an unresponsive user interface and slow execution.

**Why:** Improves the responsiveness and performance of the Jupyter Notebook, especially when dealing with multiple or time-consuming API requests.

**Example:**

"""python
import asyncio
import aiohttp
import nest_asyncio  # Required because Jupyter already runs an event loop

nest_asyncio.apply()  # apply nest_asyncio to allow nested event loops

async def fetch_data_async(url, session):
    """Asynchronously fetches data from the specified URL."""
    try:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.json()
    except aiohttp.ClientError as e:
        print(f"Async API Error: {e}")
        return None

async def main():
    """Main function to fetch data from multiple APIs concurrently."""
    api_urls = ["https://api.example.com/data1", "https://api.example.com/data2"]  # Replace with actual API URLs
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_data_async(url, session) for url in api_urls]
        results = await asyncio.gather(*tasks)
        return results

# Run the asynchronous main function
results = asyncio.run(main())  # or loop.run_until_complete(main())
if results:
    print("Async API Responses:", results)
else:
    print("Failed to fetch data asynchronously")
"""

## 3. Security

### 3.1. Secure API Keys

**Do This:** Never hardcode API keys directly into your notebook. Use environment variables, encrypted configuration files, or dedicated secret management services (e.g., HashiCorp Vault). Ensure your ".env" file is added to ".gitignore" if you are using git.

**Don't Do This:** Commit notebooks containing API keys to public repositories or share them without redacting the secrets.

**Why:** Prevents unauthorized access to API resources and potential financial or data breaches.

### 3.2. Input Validation and Sanitization

**Do This:** Validate and sanitize any user inputs before sending them to the API. Use parameterized queries or prepared statements to prevent injection attacks.

**Don't Do This:** Directly pass unsanitized user inputs into API requests, leading to potential security vulnerabilities.

**Why:** Protects against malicious inputs that could compromise the API or the underlying system.

### 3.3. Data Encryption

**Do This:** If working with sensitive data transmitted over the API, ensure that data is encrypted in transit (HTTPS) and at rest. Consider using client-side encryption for highly sensitive data.

**Don't Do This:** Transmit sensitive data over unencrypted channels (HTTP) or store it without encryption.

**Why:** Prevents eavesdropping and data breaches during transmission and storage.

### 3.4. Rate Limiting and Throttling

**Do This:** Implement rate limiting or throttling mechanisms to prevent abuse or overload of the API. Cache API responses to reduce the number of requests.

**Don't Do This:** Make excessive API requests without considering rate limits or caching, leading to potential service disruptions or account suspension.

**Why:** Ensures fair usage of API resources and prevents denial-of-service attacks.
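To make the caching and throttling advice above concrete, the sketch below keeps a naive in-memory cache keyed by URL and parameters and pauses between real requests. It is a minimal, assumption-laden example (the endpoint is hypothetical and the delay is illustrative); production code may prefer a dedicated caching library or the rate-limit headers returned by the API.

"""python
# Minimal sketch: naive response caching plus a fixed delay between real requests.
import time
import requests

_response_cache = {}

def cached_get(api_url, params=None, min_interval=1.0, timeout=10):
    """Fetches JSON from the API, caching responses and pausing between real requests."""
    cache_key = (api_url, tuple(sorted((params or {}).items())))
    if cache_key in _response_cache:
        return _response_cache[cache_key]  # served from cache: no extra request
    try:
        response = requests.get(api_url, params=params, timeout=timeout)
        response.raise_for_status()
        data = response.json()
    except requests.exceptions.RequestException as e:
        print(f"API Error: {e}")
        return None
    _response_cache[cache_key] = data
    time.sleep(min_interval)  # crude throttle before the next real request is allowed
    return data

# Hypothetical endpoint: the second call is answered from the cache.
first = cached_get("https://api.example.com/data", params={"limit": 100})
second = cached_get("https://api.example.com/data", params={"limit": 100})
"""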
## 4. Documentation and Style

### 4.1. Code Comments and Docstrings

**Do This:** Provide clear and concise comments explaining the purpose of each function, variable, and block of code. Include docstrings for all functions and classes, following the PEP 257 guidelines.

**Don't Do This:** Write code without comments or docstrings, making it difficult to understand and maintain.

**Why:** Improves code readability, facilitates collaboration, and reduces the learning curve for new developers.

**Example:**

"""python
def calculate_average(numbers):
    """
    Calculates the average of a list of numbers.

    Args:
        numbers (list): A list of numerical values.

    Returns:
        float: The average of the numbers.
        None: If the input list is empty.
    """
    if not numbers:
        return None
    return sum(numbers) / len(numbers)
"""

### 4.2. Notebook Structure

**Do This:** Organize the notebook into logical sections with clear headings and subheadings (using Markdown). Include a table of contents for easy navigation. Break up large code blocks into smaller, manageable cells.

**Don't Do This:** Create a disorganized notebook with large, monolithic code blocks and no clear structure.

**Why:** Improves notebook readability, facilitates collaboration, and makes it easier to find and understand specific parts of the code.

### 4.3. Naming Conventions

**Do This:** Use descriptive and consistent naming conventions for variables, functions, and classes, following the PEP 8 style guide.

**Don't Do This:** Use cryptic or inconsistent names, making it difficult to understand the purpose of each element.

**Why:** Improves code readability and reduces the risk of errors.

## 5. Best Practices for Jupyter Notebooks

### 5.1. Kernel Management

**Do This:** Restart the kernel regularly to clear memory and avoid potential issues with stale variables or libraries. Use "%reset -f" sparingly, only when absolutely necessary, as it can be disruptive.

**Don't Do This:** Rely on the state of the kernel across multiple sessions, as it can lead to unexpected behavior.

**Why:** Ensures a clean and predictable execution environment.

### 5.2. Dependency Management

**Do This:** Explicitly declare all dependencies used in the notebook using a "requirements.txt" file or similar mechanism. Use "pip freeze > requirements.txt" to create this file. Consider using virtual environments to isolate project dependencies.

**Don't Do This:** Rely on globally installed libraries without specifying the required versions.

**Why:** Ensures reproducibility and avoids compatibility issues when sharing or deploying the notebook.

### 5.3. Output Management

**Do This:** Clear unnecessary outputs before sharing or committing the notebook. Use "Cell -> All Output -> Clear All Output" to remove all outputs.

**Don't Do This:** Include large or irrelevant outputs in the notebook, making it difficult to load and review.

**Why:** Reduces the notebook size, improves readability, and prevents sensitive data from being accidentally exposed.
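Clearing outputs can also be scripted, which is convenient in pre-commit hooks or CI. A minimal sketch using "nbformat" (a library assumed here; the notebook filename is illustrative):

"""python
# Minimal sketch: strip all cell outputs from a notebook file with nbformat.
# The notebook path is illustrative; adjust it to your project.
import nbformat

def clear_outputs(notebook_path):
    """Removes outputs and execution counts from every code cell, rewriting the file in place."""
    nb = nbformat.read(notebook_path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, notebook_path)

clear_outputs("my_analysis_notebook.ipynb")
"""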
### 5.4. Version Control

**Do This:** Use version control (e.g., Git) to track changes to the notebook. Commit frequently with descriptive commit messages. Use ".gitignore" to exclude sensitive files (e.g., ".env", API key files) and large data files.

**Don't Do This:** Make large, infrequent commits without clear commit messages. Fail to track changes to the notebook, leading to potential data loss or conflicts.

**Why:** Enables collaboration, facilitates debugging, and allows you to revert to previous versions of the notebook.

By adhering to these coding standards, developers can create robust, maintainable, and secure Jupyter Notebooks for API integration, leveraging the latest features and best practices of the Jupyter ecosystem. This ultimately leads to more efficient and effective data analysis and development workflows.

# State Management Standards for Jupyter Notebooks

This document outlines coding standards specifically for state management within Jupyter Notebooks. Effective state management is crucial for creating reproducible, maintainable, and scalable notebooks. These standards aim to provide guidance on how to manage application state, data flow, and reactivity effectively within the Jupyter Notebook environment.

## 1. Introduction to State Management in Jupyter Notebooks

State management refers to the practice of maintaining and controlling the data and information an application uses throughout its execution. In Jupyter Notebooks, this encompasses variable assignments, dataframes, model instances, and any other persistent data structures. Poor state management leads to unpredictable behavior, difficulty in debugging, and challenges in reproducibility.

### Why State Management Matters in Notebooks

* **Reproducibility**: Ensures consistent outputs given the same input and code by explicitly managing dependencies and data.
* **Maintainability**: Makes notebooks easier to understand, debug, and modify by clearly defining data flow and state transitions.
* **Collaboration**: Simplifies collaboration by providing a clear understanding of how the notebook's state is managed and shared.
* **Performance**: Optimizes resource usage by efficiently managing and releasing memory occupied by state variables.

## 2. General Principles of State Management

Before diving into Jupyter Notebook specifics, understanding general principles is essential.

* **Explicit State**: All variables and data structures representing application state should be explicitly declared and documented.
* **Immutability**: Where possible, state should be treated as immutable to prevent unintended side effects.
* **Data Flow**: Clearly define and document the flow of data throughout the notebook.
* **Reactivity**: Employ reactive patterns to automatically update dependent components when state changes.

### 2.1. Global vs. Local State

* **Global State**: Variables defined outside of functions or classes and accessible throughout the notebook.
* **Local State**: Variables defined within functions or classes, limiting their scope.

**Do This**: Favor local state within functions and classes to encapsulate data and prevent naming conflicts.

**Don't Do This**: Overuse global state, which can lead to unpredictable behavior and difficulty in debugging.

**Example (Local State)**:

"""python
def calculate_mean(data):
    """Calculates the mean of a list of numbers."""
    local_sum = sum(data)  # Local variable
    local_count = len(data)  # Local variable
    mean = local_sum / local_count
    return mean

data = [1, 2, 3, 4, 5]
mean_value = calculate_mean(data)
print(f"Mean: {mean_value}")
"""

**Example (Anti-Pattern: Global State)**:

"""python
global_sum = 0  # Global variable - Avoid
global_count = 0  # Global variable - Avoid

def calculate_mean_global(data):
    """Calculates the mean, using global variables (bad practice)."""
    global global_sum, global_count
    global_sum = sum(data)
    global_count = len(data)
    mean = global_sum / global_count
    return mean

data = [1, 2, 3, 4, 5]
mean_value = calculate_mean_global(data)
print(f"Mean: {mean_value}")
print(f"Global Sum: {global_sum}")  # Avoid accessing directly
"""

**Why**: Using local state enforces encapsulation and reduces the risk of unintended side effects from modifying global variables.

## 3. State Management Techniques in Jupyter Notebooks
### 3.1. Using Functions and Classes

Functions and classes are fundamental for encapsulating state and logic within a notebook.

**Do This**: Organize code into functions and classes to manage state and avoid monolithic scripts.

**Don't Do This**: Write long, unstructured sequences of code without encapsulation, making the notebook hard to understand and maintain.

**Example (Class-Based State Management)**:

"""python
class DataProcessor:
    def __init__(self, data):
        self.data = data
        self.processed_data = None

    def clean_data(self):
        """Removes missing values from the data."""
        self.data = [x for x in self.data if x is not None]

    def calculate_statistics(self):
        """Calculates basic statistics on the data."""
        if self.data:
            self.processed_data = {
                'mean': sum(self.data) / len(self.data),
                'median': sorted(self.data)[len(self.data) // 2],
                'min': min(self.data),
                'max': max(self.data)
            }
        else:
            self.processed_data = {}

    def get_processed_data(self):
        """Returns the processed data."""
        return self.processed_data

# Usage
data = [1, 2, None, 4, 5]
processor = DataProcessor(data)
processor.clean_data()
processor.calculate_statistics()
results = processor.get_processed_data()
print(results)
"""

**Why**: Classes encapsulate data (state) and methods (behavior) in a structured way, making code more modular and reusable.

### 3.2. Caching Intermediate Results

Jupyter Notebooks often involve computationally expensive operations. Caching intermediate results can save time and resources.

**Do This**: Use caching mechanisms like "functools.lru_cache" to store and reuse results of expensive function calls.

**Don't Do This**: Recompute the same results multiple times, especially in exploratory data analysis.

**Example (Caching with "lru_cache")**:

"""python
import functools
import time

@functools.lru_cache(maxsize=None)
def expensive_operation(n):
    """A computationally expensive operation."""
    time.sleep(2)  # Simulate a long-running process
    return n * n

start_time = time.time()
result1 = expensive_operation(5)
end_time = time.time()
print(f"Result 1: {result1}, Time: {end_time - start_time:.2f} seconds")

start_time = time.time()
result2 = expensive_operation(5)  # Retrieve from cache
end_time = time.time()
print(f"Result 2: {result2}, Time: {end_time - start_time:.2f} seconds (cached)")

expensive_operation.cache_info()
"""

**Why**: Caching avoids redundant computations, improving notebook performance.

### 3.3. Data Persistence

In some cases, you might need to persist state between different notebook sessions.

**Do This**: Use libraries like "pickle", "joblib", or "pandas" to save and load dataframes, models, or other stateful objects.

**Don't Do This**: Rely solely on in-memory state, which is lost when the notebook kernel is restarted.

**Example (Saving and Loading a DataFrame)**:

"""python
import pandas as pd

# Create a DataFrame
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Save the DataFrame to a file
df.to_pickle('my_dataframe.pkl')

# Load the DataFrame from the file
loaded_df = pd.read_pickle('my_dataframe.pkl')
print(loaded_df)
"""

**Why**: Data persistence allows you to resume work from where you left off, and share state between notebooks or scripts.

### 3.4. Reactivity and Widgets

For interactive notebooks, consider using ipywidgets or similar libraries to create reactive components that respond to state changes.

**Do This**: Use widgets to create interactive controls that modify and display state dynamically.

**Don't Do This**: Hardcode static values in notebooks intended for interactive use.
**Example (Interactive Widget)**:

"""python
import ipywidgets as widgets
from IPython.display import display

# Create a slider widget
slider = widgets.IntSlider(
    value=7,
    min=0,
    max=10,
    step=1,
    description='Value:'
)

# Create an output widget
output = widgets.Output()

# Define a function to update the output based on the slider value
def update_output(value):
    with output:
        print(f"Current value: {value['new']}")

# Observe the slider for changes
slider.observe(update_output, names='value')

# Display the widgets
display(slider, output)
"""

**Why**: Interactive widgets allow users to explore and modify state variables in real-time, enhancing the notebook's usability.

### 3.5. Managing Complex State with Dictionaries and Named Tuples

For managing complex state within a function or class, dictionaries or named tuples can be highly effective.

**Do This**: Use dictionaries or named tuples to structure and organize related state variables.

**Don't Do This**: Rely on scattered individual variables, particularly as complexity grows.

**Example (State Management with Dictionaries)**:

"""python
def process_data(input_data):
    """Processes input data and returns a state dictionary."""
    state = {
        'raw_data': input_data,
        'cleaned_data': None,
        'transformed_data': None,
        'summary_statistics': None
    }

    # Cleaning step
    cleaned_data = [x for x in state['raw_data'] if x is not None]
    state['cleaned_data'] = cleaned_data

    # Transformation step
    transformed_data = [x * 2 for x in state['cleaned_data']]
    state['transformed_data'] = transformed_data

    # Summary statistics
    if state['transformed_data']:
        state['summary_statistics'] = {
            'mean': sum(state['transformed_data']) / len(state['transformed_data']),
            'max': max(state['transformed_data']),
            'min': min(state['transformed_data'])
        }
    else:
        state['summary_statistics'] = None

    return state

# Usage
data = [1, 2, None, 4, 5]
final_state = process_data(data)
print(final_state)
"""

**Example (State Management with Named Tuples)**:

"""python
from collections import namedtuple

DataState = namedtuple('DataState', ['raw_data', 'cleaned_data', 'transformed_data', 'summary_statistics'])

def process_data_namedtuple(input_data):
    """Processes input data and returns a DataState namedtuple."""
    initial_state = DataState(raw_data=input_data,
                              cleaned_data=None,
                              transformed_data=None,
                              summary_statistics=None)

    # Cleaning step
    cleaned_data = [x for x in initial_state.raw_data if x is not None]

    # Transformation step
    transformed_data = [x * 2 for x in cleaned_data]

    # Summary statistics
    if transformed_data:
        summary_statistics = {
            'mean': sum(transformed_data) / len(transformed_data),
            'max': max(transformed_data),
            'min': min(transformed_data)
        }
    else:
        summary_statistics = None

    final_state = DataState(raw_data=input_data,
                            cleaned_data=cleaned_data,
                            transformed_data=transformed_data,
                            summary_statistics=summary_statistics)
    return final_state

# Usage
data = [1, 2, None, 4, 5]
final_state = process_data_namedtuple(data)
print(final_state)
print(final_state.summary_statistics)  # Access attributes directly
"""

**Why**: Dictionaries and named tuples provide a structured way to bundle related state variables together. Named tuples offer the added benefit of named attribute access, which improves readability.
### 3.6. Using Third-Party State Management Libraries

Although this is not common, for complex applications with heavy reactivity requirements you can consider adapting a front-end state management library to a Python backend; a custom implementation may be needed. Note that these libraries are not designed for native Jupyter Notebook usage, and adapting them requires special consideration. Examples include Redux-style patterns adapted for Python web frameworks such as Flask.

**Do This**: Investigate the feasibility of adapting well-known state management frameworks for complex reactive applications, and consider custom implementations if your needs are very specific.

**Don't Do This**: Automatically include these libraries without considering customizability and overhead.

**Note**: Due to the special structure of Jupyter Notebooks, direct usage of existing state management libraries is limited. Adaptation may require considerable developer effort.

## 4. Anti-Patterns and Common Mistakes

* **Modifying DataFrames In-Place**: Avoid modifying DataFrames in-place without explicitly creating a copy ("df = df.copy()"). In-place modifications can lead to unexpected side effects (see the sketch after this list).
* **Unclear Variable Naming**: Use descriptive variable names to clearly convey the purpose and contents of state variables. Avoid single-letter variable names except in very limited scopes.
* **Lack of Documentation**: Document the purpose, usage, and data types of all state variables.
* **Ignoring Exceptions**: Handle exceptions gracefully to prevent the notebook from crashing and losing state.
* **Over-reliance on Jupyter's Implicit State**: Jupyter Notebooks have a degree of implicit state through the execution order of cells. Avoid relying on this implicit state to an extreme, as it reduces reproducibility and makes debugging difficult. Always define the data dependencies within the cell.
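To make the first anti-pattern concrete, the sketch below contrasts an in-place modification with deriving a new DataFrame or working on an explicit copy (the data is illustrative):

"""python
# Minimal sketch: avoid mutating a shared DataFrame; derive new ones or copy first.
import pandas as pd

raw_df = pd.DataFrame({"value": [1, 2, None, 4]})

# Anti-pattern: raw_df.dropna(inplace=True) silently changes raw_df for every later cell.

# Preferred: derive a new, clearly named DataFrame (raw_df stays untouched).
clean_df = raw_df.dropna().reset_index(drop=True)

# Preferred when you do need to mutate: work on an explicit copy.
working_df = raw_df.copy()
working_df["value"] = working_df["value"].fillna(0)

print(len(raw_df), len(clean_df))  # 4 3 -> the original is preserved
"""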
## 5. Performance Optimization

* **Minimize Memory Usage**: Release large data structures when they are no longer needed using "del" to free up memory.
* **Use Efficient Data Structures**: Choose data structures that are appropriate for the task. For example, use NumPy arrays for numerical computations and Pandas DataFrames for tabular data.
* **Avoid Unnecessary Copies**: Minimize the creation of unnecessary copies of data structures. Use views or references where possible.
* **Serialization Considerations**: When saving larger data objects with "pickle" or "joblib", experiment with different protocols or compression parameters.

## 6. Security Best Practices

* **Sanitize Inputs**: Sanitize user inputs to prevent code injection attacks, especially if you are using ipywidgets or similar tools.
* **Secure Credentials**: Avoid storing sensitive credentials (passwords, API keys) directly in the notebook. Use environment variables or secure configuration files.
* **Limit Access**: Restrict access to notebooks containing sensitive information.
* **Review Dependencies**: Regularly review and update the dependencies used in your notebook to address security vulnerabilities.
* **Be Careful About Code Execution**: Make sure only trusted code gets executed in an environment where credentials or other sensitive information is being used.

## 7. Conclusion

Effective state management is paramount for building robust, reproducible, and maintainable Jupyter Notebooks. By adhering to these standards, developers can create notebooks that are easier to understand, debug, and collaborate on, ultimately leading to more efficient and reliable data analysis workflows. Remember to tailor these guidelines to the specific needs and complexity of your projects. Modern approaches focus on explicitness, modularity, and optimization to ensure the highest quality of notebook development for current Jupyter environments, and should be followed diligently.

# Testing Methodologies Standards for Jupyter Notebooks

This document outlines the testing methodologies standards for Jupyter Notebooks, providing guidelines for unit, integration, and end-to-end testing. Adhering to these standards ensures code reliability, maintainability, and performance specific to the Jupyter Notebook environment.

## 1. Introduction to Testing in Jupyter Notebooks

Effective testing is crucial for creating robust and dependable Jupyter Notebooks. Unlike traditional scripts, notebooks combine code, documentation, and outputs, necessitating adapted testing strategies. This section establishes fundamental principles and discusses their importance in the notebook context.

### 1.1 Importance of Testing

* **Why:** Testing helps identify bugs early, improves code reliability, and facilitates easier maintenance and collaboration. Testing in notebooks is often overlooked, leading to fragile and error-prone analyses and models.
* **Do This:** Implement testing methodologies as an integral part of your notebook development workflow.
* **Don't Do This:** Neglect testing or assume that visual inspection is sufficient.

### 1.2 Types of Tests Relevant to Notebooks

* **Unit Tests:** Verify that individual functions or code blocks work as expected.
* **Integration Tests:** Ensure that different components of the notebook interact correctly.
* **End-to-End Tests:** Confirm that the entire notebook performs as expected from start to finish.

### 1.3 Specific Challenges in Testing Notebooks

* **State Management:** Notebooks maintain state across cells, making it difficult to isolate tests.
* **Interactive Nature:** The interactive execution flow can complicate test automation.
* **Mixed Content:** Testing code alongside documentation and outputs requires specific tools and strategies.

## 2. Unit Testing in Jupyter Notebooks

Unit testing focuses on validating the smallest testable parts of your code. This section provides standards and best practices for writing effective unit tests within the Jupyter Notebook environment.

### 2.1 Strategies for Unit Testing

* **Why:** Unit tests isolate code blocks, making it easier to identify and fix bugs.
* **Do This:** Write unit tests for all significant functions and classes defined in your notebook.
* **Don't Do This:** Neglect unit testing for complex functions or assume they are correct without verification.

### 2.2 Tools and Frameworks

* **"pytest":** A popular testing framework that provides a clean and simple syntax for writing tests.
* **"unittest":** Python's built-in testing framework, suitable for more complex test setups.
* **"nbconvert":** Can be used to execute notebooks in a non-interactive environment for testing.

### 2.3 Implementing Unit Tests

* **Creating Test Files:** Define tests in separate ".py" files or directly within the notebook using "%run" or cell magic commands.
* **Test Organization:** Structure your tests to reflect the organization of your codebase.
**Example**:

"""python
# content of my_functions.py
def add(x, y):
    return x + y

def subtract(x, y):
    return x - y
"""

"""python
# content of test_my_functions.py
import pytest
from my_functions import add, subtract

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

def test_subtract():
    assert subtract(5, 2) == 3
    assert subtract(-1, -1) == 0
    assert subtract(0, 0) == 0
"""

To run the unit tests:

"""bash
pytest test_my_functions.py
"""

### 2.4 In-Notebook Unit Testing

* **Why**: Sometimes it is practical to include tests directly in the notebook, specifically for functions defined at the top.
* **Do This**: Use the "assert" statement for small unit tests to perform checks inline.
* **Don't Do This**: Create large and complex tests that hinder readability. Rely more on external files.

**Example**:

"""python
def multiply(x, y):
    return x * y

assert multiply(2, 3) == 6
assert multiply(-1, 1) == -1
assert multiply(0, 5) == 0
"""

### 2.5 Mocking

* **Why:** Unit tests should be isolated and not rely on external dependencies or data sources.
* **Do This:** Use mocking libraries like "unittest.mock" or "pytest-mock" to replace external dependencies with controlled substitutes.
* **Don't Do This:** Directly call external APIs or access real databases during unit tests.

**Example**:

"""python
import unittest
from unittest.mock import patch
import requests

def get_data_from_api(url):
    response = requests.get(url)
    return response.json()

class TestGetDataFromApi(unittest.TestCase):
    @patch('requests.get')
    def test_get_data_from_api(self, mock_get):
        mock_get.return_value.json.return_value = {'key': 'value'}
        result = get_data_from_api('http://example.com')
        self.assertEqual(result, {'key': 'value'})
"""

### 2.6 Common Anti-Patterns

* **Ignoring Edge Cases:** Failing to test boundary conditions or unusual inputs (see the sketch after this list).
* **Testing Implementation Details:** Writing tests that are tightly coupled to the implementation and break when refactoring.
* **Long Test Functions:** Writing tests that are too long and complex, making them hard to understand and maintain.
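One way to avoid the edge-case anti-pattern is to enumerate boundary inputs with "pytest.mark.parametrize". A minimal sketch; "safe_divide" is a hypothetical helper defined here only for illustration:

"""python
# Minimal sketch: covering edge cases with pytest.mark.parametrize.
import pytest

def safe_divide(x, y):
    """Hypothetical helper: divides x by y, rejecting a zero denominator."""
    if y == 0:
        raise ValueError("division by zero")
    return x / y

@pytest.mark.parametrize(
    "x, y, expected",
    [
        (10, 2, 5),      # typical case
        (-9, 3, -3),     # negative operand
        (0, 5, 0),       # zero numerator
        (1, 4, 0.25),    # non-integer result
    ],
)
def test_safe_divide(x, y, expected):
    assert safe_divide(x, y) == expected

def test_safe_divide_rejects_zero_denominator():
    with pytest.raises(ValueError):
        safe_divide(1, 0)
"""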
## 3. Integration Testing in Jupyter Notebooks

Integration testing verifies that different parts of your notebook work together correctly. This section outlines standards for creating effective integration tests.

### 3.1 Strategies for Integration Testing

* **Why:** Integration tests ensure that components interact as expected, catching interface and communication issues.
* **Do This:** Test how different functions, classes, and modules work together.
* **Don't Do This:** Assume that components will work together correctly without verification.

### 3.2 Implementation

* **Defining Integration Points:** Identify the key interactions between components that require testing.
* **Using Test Data:** Create representative test data that simulates real-world scenarios.

**Example**:

"""python
# my_module.py
class DataProcessor:
    def __init__(self, data_source):
        self.data_source = data_source

    def load_data(self):
        return self.data_source.get_data()

class DataSource:
    def get_data(self):
        # Simulate reading data from a file or API
        return [1, 2, 3, 4, 5]

# test_my_module.py
import unittest
from unittest.mock import patch
from my_module import DataProcessor, DataSource

class TestDataProcessor(unittest.TestCase):
    def test_data_processor_integration(self):
        data_source = DataSource()
        data_processor = DataProcessor(data_source)
        data = data_processor.load_data()
        self.assertEqual(data, [1, 2, 3, 4, 5])
"""

### 3.3 Testing Data Pipelines

* **Why:** Data pipelines involve multiple stages of data processing, making integration testing essential.
* **Do This:** Test the flow of data through each stage of the pipeline to ensure data integrity and transformation correctness (see the sketch after this list).
* **Don't Do This:** Test each stage in isolation without verifying the end-to-end flow.
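A minimal sketch of a pipeline-level integration test follows; the three stage functions are hypothetical stand-ins for your own load, clean, and transform steps:

"""python
# Minimal sketch: integration test for a small, hypothetical data pipeline.
import unittest

def load_data():
    return [1, None, 3, 4]

def clean_data(rows):
    return [r for r in rows if r is not None]

def transform_data(rows):
    return [r * 10 for r in rows]

class TestPipelineIntegration(unittest.TestCase):
    def test_stages_compose_correctly(self):
        """Data should flow through load -> clean -> transform without loss or corruption."""
        result = transform_data(clean_data(load_data()))
        self.assertEqual(result, [10, 30, 40])

if __name__ == "__main__":
    unittest.main()
"""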
""" try: with open(notebook_path, 'r') as f: notebook_content = json.load(f) # Example: check the last cell executed output specifically, implement better last_cell_output = notebook_content['cells'][-1]['outputs'][0]['text'] if expected_output in last_cell_output : return True else: return False except FileNotFoundError: return False # main example notebook_path = "my_analysis_notebook.ipynb" execution_success, message = run_notebook(notebook_path) if execution_success: print("Notebook executed successfully!") if verify_output("temp_notebook.ipynb", "MyExpectedOutputHere"): print("Output verification passed!") else: print("Output verification failed.") else: print(f"Error: {message}") """ **Example Using "papermill"**: """python import papermill as pm def run_notebook_papermill(notebook_path, output_path, parameters=None): try: pm.execute_notebook( notebook_path, output_path, parameters=parameters, kernel_name='python3', report_save_mode=pm.ReportSaveMode.WRITE ) return True, "Notebook executed successfully" except Exception as e: return False, f"Notebook execution failed: {str(e)}" # Example notebook_path = "my_analysis_notebook.ipynb" output_path = "output_notebook.ipynb" parameters = {"input_data": "test_data.csv"} execution_success, message = run_notebook_papermill(notebook_path, output_path, parameters) if execution_success: print("Notebook executed successfully!") else: print(f"Error: {message}") """ ### 4.4 Parameterized Testing * **Why:** Parameterized tests allow you to run the same notebook with different inputs, covering a wider range of scenarios. * **Do This:** Use "papermill" to pass parameters to your notebook and run it multiple times with different inputs. * **Don't Do This:** Hardcode input values in your notebook, making it difficult to run tests with different configurations. ### 4.5 Common Anti-Patterns * **Manual Verification:** Manually inspecting the outputs of end-to-end tests is error-prone and time-consuming. Automate the verification process whenever possible. * **Ignoring Error Handling:** Failing to test how the notebook handles errors or unexpected inputs. ## 5. Test-Driven Development (TDD) in Notebooks Test-Driven Development is a software development process where you first write a failing test before you write any production code. ### 5.1 TDD Cycle 1. **Write a failing test:** Define the desired behavior and write a test that fails because the code doesn't exist yet. 2. **Write the minimal code:** Write only the minimal amount of code required to pass the test. 3. **Refactor:** Improve the code without changing its behavior, ensuring that all tests still pass. ### 5.2 Applying TDD to Notebooks * **Why:** TDD promotes a clear understanding of requirements and encourages modular, testable code. * **Do This:** Start by writing a test for a function or code block, then implement the code to pass the test. * **Don't Do This:** Write code without a clear understanding of its purpose or without writing tests first. ### 5.3 Example 1. **Write a failing test:** """python # test_calculator.py import pytest from calculator import Calculator def test_add(): calculator = Calculator() assert calculator.add(2, 3) == 5 """ 2. **Write the minimal code:** """python # calculator.py class Calculator: def add(self, x, y): return x + y """ 3. **Refactor (if necessary):** If you have some logic that could be made more performant but is already functionally running, refactor while still passing the test. 
### 6.3 Secrets Management

* **Why:** Storing secrets in your notebooks can expose them to unauthorized users.
* **Do This:** Use environment variables or secure storage solutions like HashiCorp Vault to manage secrets. Access them via libraries instead of typing the strings directly into code.
* **Don't Do This:** Hardcode passwords or API keys in your notebooks.

## 7. Conclusion

Adhering to these testing standards helps create robust, maintainable, and secure Jupyter Notebooks. By implementing unit, integration, and end-to-end tests, you can significantly reduce the risk of errors, improve code quality, and enhance collaboration. Always prioritize testing and integrate it into your notebook development workflow.