You’ve just received a Python project – maybe from a vendor or your internal team – and they assure you it’s well-written. But blind trust rarely works with code. To verify it’s solid and secure, you need a professional code review.
Today, your software runs perfectly, but if the code is written poorly, in 3-4 years, adding the simplest stuff to your app will be a massive headache. It will cost you more and take forever. Evaluate your code quality early on so that you don’t need to rewrite it entirely when your business expands and new features need to be added.
We’ve created a comprehensive Python code review checklist to help you navigate this process. In this article, we’ll share how to properly review Python code and our experience with auditing Python projects. And if you’re looking for a Python code review example, you’ll find it at the end of the article, so keep reading!
PEP 8 compliance
A key aspect of a Python code review is ensuring code adheres to a style guide like PEP 8. PEP 8, the official Python style guide, promotes code consistency by offering standardized formatting and naming conventions.
Why is this important? Code is read far more than it’s written. A consistent style, as enforced by PEP 8, makes your code clean, organized, and easier to maintain. This translates to smoother collaboration – team members can easily understand each other’s work.
Code Layout:
- Check if there are 4 spaces per indentation level
- Ensure there’s no mixing tabs and spaces for indentation
- Verify if all lines are limited to a maximum of 79 characters
- Check if top-level functions and class definitions are surrounded with two blank lines
- Check if methods within a class are separated by a single blank line
- Verify if imports are put at the top of the file, just after any module comments and docstrings, and before module globals and constants
- Check if each Python module or library is imported on a separate line
# Wrong:
# Arguments on first line forbidden when not using vertical alignment
foo = long_function_name(var_one, var_two,
    var_three, var_four)
# Further indentation required as indentation is not distinguishable
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)
# Correct:
# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)
# Hanging indents should add a level
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)
Whitespace in Expressions and Statements:
- Check if there’s no extraneous whitespace immediately inside parentheses, brackets, or braces
- Check if there’s no extraneous whitespace between a trailing comma and a following close parenthesis
- Check if there’s no extraneous whitespace immediately before a comma, semicolon, or colon
# Wrong:
spam( ham[ 1 ], { eggs: 2 } )
# Correct:
spam(ham[1], {eggs: 2})
Comments:
- Verify if comments are complete sentences and are easy to understand
- If a comment is a phrase or sentence, its first word should be capitalized, unless it is an identifier that begins with a lowercase letter
- Check if inline comments are used sparingly and only when the code itself isn’t clear
- Use docstrings to provide comprehensive explanations for functions, classes, and modules
- Ensure docstrings are formatted according to PEP 257
# Incorrect usage of docstrings
def add_numbers_incorrect(a, b):
    # Function to add two numbers a and b.
    # Returns the sum.
    return a + b
# Correct usage of docstrings
def add_numbers_correct(a, b):
    """
    Add two numbers and return the result.

    Parameters:
        a (int): The first number.
        b (int): The second number.

    Returns:
        int: The sum of a and b.
    """
    return a + b
Naming Conventions:
- Check if module names use lowercase letters and underscores (snake_case). For example, my_module.py
- Verify if class names use CapWords (also known as CamelCase) convention, where the first letter of each word is capitalized without underscores. For example, MyClass
- Ensure variable and function names are descriptive and clearly indicate their purpose. For example, calculate_distance, format_string
- Check if function and variable names use lowercase letters and underscores (snake_case). For example, my_function, my_variable
- Check if constants use all capital letters with underscores separating words. For example, MAX_SIZE, PI
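- Ensure non-public variables and functions use a single underscore prefix (_) to indicate that they are intended for internal use. Python does not enforce this; it is a convention, and it also covers what other languages call protected members, which subclasses may use. For example, _internal_function
- Check if a double underscore prefix (__) is used only where name mangling is intended: inside a class, __attribute is rewritten to _ClassName__attribute to avoid accidental clashes with names defined in subclasses, so subclasses cannot reach it by its plain name. For example, __mangled_attribute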
- If a variable or function doesn’t have any prefix, it’s considered public. For example, public_variable, public_function()
# Not recommended (use descriptive names)
a = 30
items = ["apple", "banana", "milk"]
# Good variable names
age_of_customer = 30
shopping_cart_items = ["apple", "banana", "milk"]
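To make the underscore conventions concrete, here is a minimal sketch. The Account and SavingsAccount classes and their attributes are hypothetical, invented only for illustration. It shows that a single underscore is purely a convention, while a double underscore triggers name mangling.

```python
class Account:
    """Hypothetical class used only to illustrate underscore prefixes."""

    def __init__(self, owner, balance):
        self.owner = owner        # public attribute
        self._balance = balance   # single underscore: internal by convention only
        self.__pin = "0000"       # double underscore: stored as _Account__pin

    def balance(self):
        return self._balance


class SavingsAccount(Account):
    def pin_via_plain_name(self):
        # Inside this class, self.__pin is rewritten to self._SavingsAccount__pin,
        # which was never set, so this raises AttributeError.
        return self.__pin


account = SavingsAccount("Ada", 100)
print(account._balance)       # 100 (works, but signals "don't touch" to callers)
print(account._Account__pin)  # 0000 (the mangled name is still reachable)
```

During a review, a double underscore used as a stand-in for "protected" is worth flagging, because subclasses will not find the attribute under its plain name.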
Code quality
Code quality in Python development is key to creating reliable, maintainable, and efficient software. This part of the checklist will help you maintain high Python coding standards.
Code Readability and Simplicity:
- Assess the code for its readability: Python’s syntax and design principles advocate for simplicity and clarity
- Ensure the logic is clear and concise: there should be no overly complex or convoluted code structures
- Confirm that the code’s structure promotes maintainability with proper use of functions and classes
Pythonic Code Practices:
- Evaluate adherence to Pythonic idioms and best practices. This includes using list comprehensions, generator expressions, and context managers effectively
- Review the implementation of Python’s advanced features like decorators and metaclasses
- Examine the use of Python’s dynamic typing and duck typing principles
# Filtering even numbers greater than 10
numbers = [12, 3, 45, 22, 18, 7, 4, 102, 20]
# Using overly complex logic with nested conditions
def filter_numbers():
    filtered_numbers = []
    for number in numbers:
        if number % 2 == 0:
            if number > 10:
                filtered_numbers.append(number)
    return filtered_numbers
# Using list comprehension for simplicity and readability
def filter_numbers():
    return [number for number in numbers if number % 2 == 0 and number > 10]
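The checklist also mentions generator expressions and context managers. The sketch below shows both; the timed helper and the function names are made up for this illustration and are not part of any library.

```python
from contextlib import contextmanager
import time


@contextmanager
def timed(label, results):
    """Record how long the enclosed block took under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, like the cleanup in any with statement.
        results[label] = time.perf_counter() - start


def sum_of_even_squares(numbers):
    """Sum squares of even numbers lazily, without building a temporary list."""
    return sum(n * n for n in numbers if n % 2 == 0)


timings = {}
with timed("even_squares", timings):
    total = sum_of_even_squares(range(10))

print(total)  # 120
```

The generator expression inside sum() never materializes an intermediate list, which matters once the input grows large.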
Efficient Use of Python Standard Library:
- Check if Python’s built-in functions and libraries are leveraged to a full extent
- Review the use of Python’s built-in data structures, such as lists, tuples, dictionaries, and sets, ensuring they are used optimally for their intended purposes
- Check for the effective use of Python’s file handling and I/O operations
from collections import Counter
# Inefficient use of file handling and word counting
def count_words():
    word_count = {}
    file = open('example.txt', 'r')  # Opening file in a less optimal way
    lines = file.readlines()
    for line in lines:
        words = line.strip().split(' ')
        for word in words:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
    file.close()  # Manually closing the file
    return word_count
# Efficient use of file handling and word counting using Python's built-in functions
# The with statement ensures that the file is properly closed after its suite finishes,
# even if an exception is raised during the execution of that suite
def count_words():
    with open('example.txt', 'r') as file:
        return Counter(file.read().split())
Python-Specific Challenges:
- Evaluate the code for proper handling of Python-specific challenges, such as the Global Interpreter Lock (GIL) in multi-threaded applications
- Review the handling of dynamic features, such as late binding and runtime type checking
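# Fragile runtime type checking
def add_incorrect(a, b):
    if type(a) is not int or type(b) is not int:  # Exact type comparison rejects subclasses
        raise ValueError("Both arguments must be integers")  # TypeError is the conventional exception here
    return a + b
# Better: type hints (Python 3.5+) document intent for static checkers such as mypy,
# but they are not enforced at runtime, so isinstance() performs the actual check
def add_correct(a: int, b: int) -> int:
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("Both arguments must be integers")
    return a + b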
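Late binding is easiest to spot with closures created in a loop. The following sketch uses hypothetical function names. It shows the surprising behavior and one common fix, binding the loop value through a default argument.

```python
def make_multipliers_late():
    # Each lambda looks up i when it is called, after the loop has finished.
    return [lambda x: x * i for i in range(3)]


def make_multipliers_bound():
    # A default argument captures the value of i at definition time.
    return [lambda x, i=i: x * i for i in range(3)]


print([f(10) for f in make_multipliers_late()])   # [20, 20, 20]
print([f(10) for f in make_multipliers_bound()])  # [0, 10, 20]
```

functools.partial is another way to bind the value early; which one reads better is a matter of team preference.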
Code functionality
When reviewing code functionality, ask yourself this fundamental question: Does the code do what it should? Here’s what else to keep in mind.
Intended Purpose:
- Confirm that the code meets desired outcomes and aligns with project specifications while considering Python’s dynamic and interpreted nature
- Test the code under various scenarios to validate consistent and accurate results, given Python’s versatility in handling different types of data
- Check if the code leverages Python’s idiomatic features like list comprehensions or generator expressions for efficient logic implementation
Proper Handling of Edge Cases:
- Assess the code’s ability to handle unexpected scenarios: Python’s dynamic typing and extensive library support necessitate careful handling of edge cases
- Evaluate responses to unexpected inputs, such as null values or incorrect data types
- Check for anticipatory logic in managing potential errors and anomalies
# Incorrect handling: Does not anticipate or properly handle unexpected inputs
def divide_incorrect(a, b):
    return a / b
# Correct handling: Includes anticipatory logic for null values and incorrect data types
def divide_correct(a, b):
    if a is None or b is None:
        raise ValueError("Input values cannot be None")
    if not all(isinstance(x, (int, float)) for x in [a, b]):
        raise TypeError("Input values must be numbers")
    if b == 0:
        raise ValueError("Division by zero is not allowed")
    return a / b
Efficient Error Handling and Logging:
- Check how effectively the exception handling mechanism is used to catch and manage errors
- Review error messages for clarity and helpfulness: Python enables customizable error messages for informative troubleshooting
- Inspect if Python’s built-in logging module is used to its full potential to facilitate monitoring and debugging
import math
import logging
# Less effective exception handling
def sqrt_less_effective(num):
    try:
        result = math.sqrt(num)
        return result
    except Exception:
        return "Error occurred"  # Generic error message, not very helpful
# Effective exception handling with clear error messages and logging
def sqrt_effective(num):
    try:
        if num < 0:
            raise ValueError("Cannot compute the square root of a negative number")
        result = math.sqrt(num)
    except ValueError as e:
        logging.error(f"ValueError encountered: {e}")
        raise  # Reraising the exception for further handling or logging
    return result
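Beyond calling logging.error directly, a reviewer may want to see a named logger and tracebacks in the log. Here is a minimal sketch, assuming a hypothetical logger name ("orders") and a made-up parse_quantity function:

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_quantity(raw_value):
    """Convert user-supplied text to a positive integer quantity."""
    try:
        quantity = int(raw_value)
    except (TypeError, ValueError):
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Could not parse quantity from %r", raw_value)
        raise
    if quantity <= 0:
        logger.warning("Rejected non-positive quantity: %d", quantity)
        raise ValueError(f"Quantity must be positive, got {quantity}")
    return quantity


if __name__ == "__main__":
    # Configure logging once, at the application entry point.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    print(parse_quantity("3"))
```

Passing arguments to the logger with %-style placeholders, rather than pre-formatting the string, defers formatting until the record is actually emitted.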
Python-Specific Functionalities:
- Evaluate the use of Python decorators for enhancing functionality without modifying the core logic
- Check for the effective use of Python’s standard library, which offers diverse modules to simplify complex functionalities
- Assess the integration with Python frameworks or third-party libraries where necessary, such as requests for HTTP operations or Pandas for data manipulation
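As an illustration of a decorator that adds behavior without touching the core logic, here is a small retry sketch. The retry decorator and flaky_operation are hypothetical names invented for this example, not part of the standard library.

```python
import functools
import time


def retry(times, delay=0.0):
    """Retry the wrapped function for up to `times` attempts on any Exception."""
    def decorator(func):
        @functools.wraps(func)  # keeps func.__name__ and its docstring intact
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(times=3)
def flaky_operation():
    """Fail twice, then succeed (stands in for an unreliable network call)."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(flaky_operation())         # ok
print(flaky_operation.__name__)  # flaky_operation
```

A missing functools.wraps is a common review finding: without it, the decorated function loses its name and docstring, which hurts debugging and documentation tools.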
Testing and Validation:
- Assess the quality of the unit tests, ensuring they are well-structured, readable, and maintainable
- Check if there are integration tests to simulate real-world interactions with external systems like databases or APIs
- Examine the use of mock objects and test fixtures for isolating and testing specific components or functionalities in Python
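A brief sketch of what an isolated unit test with a mock object and a fixture can look like. The get_username function and the client interface (a get method returning an object with status_code and json()) are assumptions made up for this example.

```python
import unittest
from unittest.mock import Mock


def get_username(client, user_id):
    """Fetch a user through `client` and return the name, or None if not found."""
    response = client.get(f"/users/{user_id}")
    if response.status_code != 200:
        return None
    return response.json()["name"]


class GetUsernameTests(unittest.TestCase):
    def setUp(self):
        # Fixture: a fresh fake client for every test, so no network is needed.
        self.client = Mock()

    def test_returns_name_on_success(self):
        self.client.get.return_value = Mock(
            status_code=200, json=Mock(return_value={"name": "Ada"})
        )
        self.assertEqual(get_username(self.client, 7), "Ada")
        self.client.get.assert_called_once_with("/users/7")

    def test_returns_none_when_missing(self):
        self.client.get.return_value = Mock(status_code=404)
        self.assertIsNone(get_username(self.client, 7))
```

Saved in a test module, these cases can be run with python -m unittest. Because the client is injected rather than imported inside the function, the test needs no patching at all.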
Performance
Performance evaluation in Python development focuses on code efficiency and optimization. It involves assessing resource utilization, execution speed, and data processing efficiency.
Efficient Algorithms and Data Structures:
- Examine algorithm choices, ensuring they are optimal for the tasks at hand and consider Python-specific implications like the Global Interpreter Lock (GIL)
- Assess algorithm efficiency in terms of time complexity, particularly important in Python where certain operations can be slow and need optimization for scalability
- Check if the code uses Python’s built-in data types and structures, like lists, dictionaries, and sets, in the most efficient manner
# Check for duplicates in list
numbers_list = [1, 3, 5, 7, 9, 3]
# Inefficient usage: Using a list where a set would be more appropriate
def check_duplicates_inefficient(numbers_list):
    for i in range(len(numbers_list)):
        for j in range(i + 1, len(numbers_list)):
            if numbers_list[i] == numbers_list[j]:
                return True
    return False
# Efficient usage: Using a set to check for duplicates
def check_duplicates_efficient(numbers_list):
    unique_numbers = set(numbers_list)
    return len(numbers_list) != len(unique_numbers)
Minimal Computational Complexity:
- Evaluate computational complexity, key in Python to minimize slow execution due to the language’s interpreted nature
- Identify areas for computational efficiency improvement, such as optimizing loops, list comprehensions, or leveraging the efficiency of NumPy arrays for numerical data
- Check if redundant processes can be streamlined with Python’s standard library and optimized third-party packages to replace inefficient custom implementations
# Find common elements
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]
# Redundant approach (every "in list2" test scans the whole list)
def redundant_approach(list1, list2):
    common_elements = []
    for element in list1:
        if element in list2:
            common_elements.append(element)
    return common_elements
# Streamlined approach (set membership tests take constant time on average)
def streamlined_approach(list1, list2):
    lookup = set(list2)
    return [element for element in list1 if element in lookup]
Performance Optimization:
- Review the code for optimization techniques, especially in computation-heavy applications
- Analyze potential bottlenecks, such as I/O operations, network latency, or inefficient use of Python’s threading and multiprocessing capabilities
- Confirm the absence of common performance issues, like memory leaks, which can be particularly tricky in Python due to its garbage collection system, or excessive CPU usage due to inefficient algorithms
import math
# Inefficient algorithm causing excessive CPU usage
def find_prime_numbers(n):
    primes = []
    for num in range(2, n):
        prime = True
        for i in range(2, num):
            if num % i == 0:
                prime = False
                break
        if prime:
            primes.append(num)
    return primes
# Efficient algorithm to reduce CPU usage (Sieve of Eratosthenes)
def find_prime_numbers(n):
    is_prime = [True] * (n + 1)
    # Include the integer square root itself so that squares such as 9 or 25 are crossed out
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            j = i * i
            while j <= n:
                is_prime[j] = False
                j += i
    return [i for i in range(2, n) if is_prime[i]]
Profiling and Performance Testing:
- Verify if the codebase utilizes profiling tools like cProfile or line_profiler to identify sections of code with potential performance bottlenecks
- Determine if the project establishes performance benchmarks, for example with the standard library's timeit module or the pytest-benchmark plugin; PyTest and unittest on their own verify correctness, not speed
- Examine the use of alternative interpreters with a JIT compiler, like PyPy, or optimization tools like Cython for performance-critical code sections
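For reference, a minimal profiling sketch using the standard library's cProfile and pstats modules. The workload function and the profile_call helper are hypothetical, written only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(limit):
    """Deliberately naive workload for the profiler to measure."""
    total = 0
    for number in range(limit):
        total += number * number
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

For a whole script, running python -m cProfile -s cumulative script.py from the command line gives a similar report without any code changes.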
Leveraging Python’s Asynchronous Capabilities:
- Evaluate the use of asynchronous programming with asyncio or other libraries for handling I/O-bound and high-latency operations efficiently
- Check for the effective use of async/await syntax in Python 3.5+, crucial for writing non-blocking code and improving the performance of I/O-bound applications
- Review the implementation of concurrent.futures for managing a pool of threads or processes, optimizing CPU-bound tasks in Python
import requests
import asyncio
import aiohttp
# Wrong usage (blocking I/O calls)
def download_file(url):
    """Downloads a file synchronously (blocking)"""
    response = requests.get(url)
    with open(f"file_{url.split('/')[-1]}", "wb") as f:
        f.write(response.content)
def run_download():
    url1 = "https://example.com/file1.txt"
    url2 = "https://example.com/file2.txt"
    download_file(url1)
    download_file(url2)
run_download()
# Correct usage (using async/await for non-blocking network I/O)
async def download_file_async(url):
    """Downloads a file asynchronously (non-blocking)"""
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            if response.status == 200:
                filename = f"file_{url.split('/')[-1]}"
                content = await response.read()
                # Built-in open() is not an async context manager, so use a regular with block;
                # for large files, offload the write (e.g. asyncio.to_thread) or use a library such as aiofiles
                with open(filename, "wb") as f:
                    f.write(content)
async def main():
    url1 = "https://example.com/file1.txt"
    url2 = "https://example.com/file2.txt"
    tasks = [download_file_async(url) for url in [url1, url2]]
    await asyncio.gather(*tasks)  # Run tasks concurrently
asyncio.run(main())
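The checklist also mentions concurrent.futures. Below is a small sketch of the executor pattern; count_primes_below and run_in_pool are hypothetical names. It defaults to ThreadPoolExecutor so it runs anywhere. For genuinely CPU-bound work in CPython, passing ProcessPoolExecutor as executor_class is the usual choice, and on platforms that spawn worker processes that call should sit under the `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes_below(limit):
    """CPU-heavy stand-in task: count primes below limit by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(limits, executor_class=ThreadPoolExecutor, max_workers=4):
    """Map count_primes_below over limits using the given executor type."""
    with executor_class(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(count_primes_below, limits))


if __name__ == "__main__":
    print(run_in_pool([10, 100, 1000]))  # [4, 25, 168]
```

Threads help when tasks wait on I/O; because of the GIL, they will not speed up the pure-Python arithmetic shown here, which is exactly the kind of mismatch a review should look for.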
Scalability and maintainability
Imagine your Python project taking off – more users, more data, more features. Here’s how to ensure your code can handle the growth and remain manageable.
Modular and Reusable Design:
- Assess whether the code is segmented into logical units like modules or packages
- Check if modules are loosely coupled and have minimal dependencies between them
- Check if functionalities are broken down into well-defined, reusable functions and classes
- Evaluate the reusability of code components: a wealth of reusable modules and functions can be found in Python’s standard library
import math
# Wrong usage (calculations are specific to this scenario)
def main():
    # All calculations done within the main block (not reusable)
    radius = 5
    area_circle = 3.14 * radius * radius
    print(f"Circle Area: {area_circle}")
    length = 10
    width = 6
    area_rectangle = length * width
    print(f"Rectangle Area: {area_rectangle}")
if __name__ == "__main__":
    main()
# Correct usage (using reusable functions)
def calculate_circle_area(radius):
    """Calculates the area of a circle"""
    return math.pi * radius * radius
def calculate_rectangle_area(length, width):
    """Calculates the area of a rectangle"""
    return length * width
def main():
    # Use reusable functions for calculations
    radius = 5
    circle_area = calculate_circle_area(radius)
    print(f"Circle Area: {circle_area}")
    length = 10
    width = 6
    rectangle_area = calculate_rectangle_area(length, width)
    print(f"Rectangle Area: {rectangle_area}")
if __name__ == "__main__":
    main()
Large Dataset Considerations:
- Review the code’s handling of large datasets, a common requirement in Python apps, particularly in data science and machine learning
- Analyze the architecture and design patterns for scalability, such as using lazy loading, generators, or efficient data processing techniques like vectorization with NumPy
- Identify bottlenecks that may impede scaling, such as excessive memory allocations or inefficient database queries
import numpy as np
data = np.random.rand(1000000)
# Inefficient data processing without vectorization
def process_data_inefficient(data):
    total = sum(data)
    mean = total / len(data)
    return mean
# Efficient data processing with NumPy
def process_data_efficient(data):
    # Using NumPy for vectorized operations
    np_data = np.array(data)
    mean = np.mean(np_data)
    return mean
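Where the data does not fit in memory at all, generators offer a lazy alternative. The sketch below streams a CSV column row by row; the column name ("amount") and the helper names are assumptions for illustration.

```python
import csv
import io


def read_amounts(lines):
    """Yield the 'amount' column one row at a time instead of loading every row."""
    for row in csv.DictReader(lines):
        yield float(row["amount"])


def running_mean(values):
    """Compute a mean in a single pass with constant memory."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
    return total / count if count else 0.0


# A small in-memory stand-in for a large CSV file; an open file object works the same way.
sample = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,60.0\n")
print(running_mean(read_amounts(sample)))  # 30.0
```

Only one row is held in memory at any moment, so the same code handles a three-row sample and a multi-gigabyte export.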
Leveraging Python Ecosystem for Scalability:
- Evaluate the integration with Python frameworks like Django or Flask, which can impact scalability in web apps
- Assess the use of Python libraries such as Pandas, NumPy, or SciPy in data-heavy applications for their impact on performance and scalability
- Review the implementation of asynchronous programming using asyncio or other frameworks
Compatibility, dependencies, and security
This part of our Python code review checklist ensures the code functions as intended across different environments and dependencies are properly managed. Here are the steps to follow.
Proper Management of Dependencies:
- Evaluate the use and integration of external libraries and frameworks, given Python’s rich ecosystem of third-party packages
- Check for accurate specification and management of dependencies, using tools like pip for package installation and virtualenv or pipenv for creating isolated environments
- Review the compatibility of library versions with the project’s Python version
- Check for deprecated libraries or those with known security vulnerabilities
Minimal Dependency Conflicts:
- Identify potential conflicts among dependencies, a common issue in Python projects due to multiple packages interacting
- Inspect the code for dependency redundancy: look for multiple packages that offer similar functionalities
- Examine strategies for resolving dependency conflicts, crucial for Python projects to maintain stability and functionality
Version Pinning and Dependency Locking:
- Check for version pinning of critical dependencies to avoid unexpected breaks due to package updates
- Review the use of dependency locking mechanisms, such as Pipfile.lock, poetry.lock, or a fully pinned requirements.txt (a requirements.txt with loose version ranges does not lock anything), to ensure reproducible builds and deployments
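A quick way to spot unpinned dependencies during a review is a small script like the sketch below. It is deliberately simplified: the regular expression, the find_unpinned name, and the sample requirements are assumptions for illustration, and it does not handle every form pip accepts (such as URLs or environment markers).

```python
import re

# Matches lines such as "requests==2.31.0"; extras like "package[extra]==1.0" also pass.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[^\]]+\])?==[^=\s]+")


def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


example = """
# hypothetical requirements.txt contents
requests==2.31.0
flask>=2.0
numpy
"""
print(find_unpinned(example))  # ['flask>=2.0', 'numpy']
```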
Security Considerations:
- Review the use of protection mechanisms against well-known security vulnerabilities like SQL injection, cross-site scripting, and insecure direct object references
- Verify if sensitive data like passwords, tokens, or credit card numbers is encrypted in transit and at rest and if proper access controls are in place
- Ensure all user input is validated and sanitized: check data types, lengths, and expected formats
- Check how user roles and permissions are managed to ensure only authorized users can access specific functionalities
- Ensure proper session timeouts and invalidation mechanisms are in place to prevent unauthorized access
import sqlite3
from flask import Flask, request
app = Flask(__name__)
@app.route('/search', methods=['GET'])
def search():
    user_query = request.args.get('query', '')  # Default to '' so a missing parameter does not break concatenation
    db_connection = sqlite3.connect('database.db')
    cursor = db_connection.cursor()
    # Wrong: vulnerable to SQL injection (left commented out on purpose)
    # cursor.execute(f"SELECT * FROM users WHERE name LIKE '%{user_query}%'")
    # Correct: a parameterized query protects against SQL injection
    query = "SELECT * FROM users WHERE name LIKE ?"
    cursor.execute(query, ('%' + user_query + '%',))
    results = cursor.fetchall()
    cursor.close()
    db_connection.close()
    return str(results)
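Input validation complements parameterized queries rather than replacing them. Here is a minimal sketch of type, length, and format checks; the specific rules (the username pattern and the 100-character limit) are hypothetical and would come from your own requirements.

```python
import re

USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,30}")  # hypothetical format rule
MAX_QUERY_LENGTH = 100                                 # hypothetical limit


def validate_username(value):
    """Return the username if it matches the allowed format, else raise."""
    if not isinstance(value, str):
        raise TypeError("username must be a string")
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("username must be 3-30 letters, digits, or underscores")
    return value


def validate_search_query(value):
    """Trim a search query and enforce a type and length limit."""
    if not isinstance(value, str):
        raise TypeError("query must be a string")
    query = value.strip()
    if not query or len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query must be 1-{MAX_QUERY_LENGTH} characters")
    return query


print(validate_username("ada_lovelace"))    # ada_lovelace
print(validate_search_query("  python  "))  # python
```

Using fullmatch rather than match ensures the whole value conforms, so a valid prefix followed by unexpected characters is rejected.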
Version control
Version control is essential for managing and tracking code changes effectively, especially in collaborative environments. It ensures a clear history of development and is crucial for maintaining high-quality code.
Meaningful Commit Messages:
- Review commit messages for clarity and informativeness: they should succinctly describe changes, updates to logic, optimizations, or bug fixes
- Ensure consistency in the format and style of commit messages
- Confirm that commit messages provide enough context for understanding changes, such as modifications in data structures, algorithm enhancements, or important refactoring decisions
Bad commit messages: “fixed bug”, “updates”, “fixed all errors”, “improved performance”
Good commit messages: “fix: handle null values in user input”, “feat: implement user registration feature”, “test: add unit tests for login functionality”
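If a team wants to enforce this style automatically, a commit-message check can be as small as the sketch below. The list of allowed prefixes and the minimum description length are assumptions extrapolated from the examples above, not a standard, and is_descriptive is a hypothetical name.

```python
import re

# A type prefix, an optional scope in parentheses, then a description of at least 10 characters.
SUBJECT_PATTERN = re.compile(
    r"^(feat|fix|test|docs|refactor|perf|chore)(\([\w-]+\))?: \S.{9,}$"
)


def is_descriptive(message):
    """Check the first line of a commit message against the assumed convention."""
    subject = message.splitlines()[0] if message else ""
    return bool(SUBJECT_PATTERN.match(subject))


for message in ["fixed bug", "fix: handle null values in user input"]:
    print(f"{message!r}: {is_descriptive(message)}")
# 'fixed bug': False
# 'fix: handle null values in user input': True
```

A function like this could be wired into a commit-msg Git hook or a CI step so that the convention is checked before review rather than during it.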
Proper Use of Branches and Pull Requests:
- Ensure good version control hygiene is practiced – using descriptive branching strategies and no large, unfocused commits
- Ensure pull requests are thoroughly reviewed, focusing on code quality, adherence to project standards, and integration with existing code
- Check if there are overly complex or redundant branches to maintain a clean and manageable codebase
Effective Collaboration and Code Integration:
- Review collaboration processes on code changes: ensure they promote effective teamwork and are suited to the project’s scale and complexity
- Confirm smooth handling of code integrations and effective resolution of merge conflicts
- Assess the use of Python code review tools and practices, ensuring they are contributing to maintaining code quality and consistency
Leveling up your Python code review
While this Python code review checklist empowers you to conduct effective reviews in-house, there are significant benefits to leveraging independent code review services.
Benefits of independent code reviews
Here is why hiring an external expert to perform an in-depth analysis of your codebase is a sound idea:
- Fresh Perspective: Developers who are very familiar with their own codebase can develop a “blind spot” and lose sight of the bigger picture. External reviewers help identify overlooked mistakes.
- Objective Analysis: External reviewers are not influenced by team dynamics and existing relationships, so their reviews tend to be more critical and impartial.
- Deeper Review: Code reviews at Redwerk go beyond just functionality. Our reviewers will inspect your code for security vulnerabilities, identify performance bottlenecks, and help you achieve readable and maintainable code.
- Stronger Expertise: External reviewers work on diverse projects, encountering a wide range of coding styles and potential pitfalls. This multifaceted experience allows them to recognize patterns that can indicate potential problems, whereas your internal team is usually more focused on the specific functionalities of your project.
Our Python code review expertise & services
Before we talk about how you can benefit from our services, we’d like to briefly mention the results of our most recent Python code review.
Complete Network, a US network IT support company, partnered with Redwerk to audit their quote management app, which is written in Python. We were asked to review the app’s backend API. Our code reviewers reported 40 critical issues concerning the architecture, performance, and security.
We also shared some easy wins for increasing performance with Django caching and Python speed-up tools. With our help, Complete Network boosted their code maintainability by 80% and learned Python code review best practices to avoid future headaches.
Whether you’re prepping your product for a future acquisition or major release, a comprehensive and unbiased code review can be your safety net. Here is how we can help:
- Project Review: Our static code analysis coupled with automated code review will highlight areas for improvement in functionality, security, maintainability, and performance.
- Due Diligence: Our code review can be part of a due diligence audit, providing a clear picture of the code’s quality, potential risks, and long-term maintainability.
- Pre-Deployment Review: Ensure your Python project is ready for launch with our pre-deployment review. We’ll identify and address any lingering issues to guarantee a smooth rollout.
- Security Review: We can conduct a targeted Python security code review to scrutinize your code against industry-standard practices, uncover security vulnerabilities, and carefully examine external libraries and dependencies.
Have specific areas you’d like us to focus on? Contact us to discuss how we can tailor our review process to address your unique needs.
Final thoughts
A code review is an investment in the future of your project, saving you time, money, and headaches down the road. By using the expertise of external reviewers, you gain access to a wider pool of knowledge and experience. This, combined with their fresh perspective and focus on code review best practices, will allow you to identify subtle issues and optimization opportunities that your internal team, fantastic as they are, might miss.