MATLAB Performance Optimization

This file contains comprehensive performance optimization guidelines for MATLAB development. Use these rules when writing performance-critical MATLAB code, optimizing existing code, and building responsive applications.

Based on official MathWorks documentation and community best practices.

Profiling and Benchmarking

Measure Before Optimizing

Always profile code before attempting optimization to identify actual bottlenecks
Use timeit for accurate function benchmarking; it handles warm-up runs and returns the median of multiple measurements
Use tic/toc for timing code sections within larger programs, but run multiple iterations for reliable measurements
Use the MATLAB Profiler (profile on, profile viewer) for line-by-line analysis, but note it disables some JIT optimizations

Benchmarking Best Practices

Wrap code in a function handle for timeit: t = timeit(@() myFunction(input))
Perform 3-5 warm-up runs before measuring with tic/toc to allow JIT compilation
Run timing tests multiple times and use the median (more robust than mean)
Ensure code executes for at least 0.1 seconds for reliable measurements
Close unnecessary applications and maintain consistent test conditions
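
As a rough sketch of the practices above, the snippet below times the same operation with timeit and with a manual tic/toc loop. The workload (sorting a random vector), the handle name f, and the run counts are illustrative assumptions, not recommendations.

```matlab
% Illustrative workload (assumption): sort a random vector.
x = rand(1e6, 1);
f = @() sort(x);

% timeit handles warm-up and repetition internally.
tTimeit = timeit(f);

% Manual tic/toc: warm up first, then take the median of several runs.
nWarmup = 3;
nRuns = 11;
for k = 1:nWarmup
    f();
end
times = zeros(nRuns, 1);
for k = 1:nRuns
    tStart = tic;
    y = f(); %#ok<NASGU>
    times(k) = toc(tStart);
end
tManual = median(times);

fprintf('timeit: %.4f s, tic/toc median: %.4f s\n', tTimeit, tManual);
```

The two numbers should be of similar magnitude; a large gap usually points to an unstable test environment or a workload that is too short to time reliably.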

Profiler Usage

Start profiling with profile on, stop with profile off, view with profile viewer
Look for functions consuming more than 10% of total time
Examine self-time (excluding child calls) to find actual bottlenecks
Use flame graphs in the Profile Summary to visualize call hierarchies
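
A minimal sketch of a programmatic profiling session follows. The workload is an arbitrary assumption; profile('info') is used so the results can be inspected without opening the viewer.

```matlab
profile on
A = magic(300);          % arbitrary workload for illustration
B = A * A';
s = sort(B(:));
profile off

info = profile('info');  % struct that includes a FunctionTable field
ft = info.FunctionTable;

% List up to five entries with the largest total time.
[~, order] = sort([ft.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-40s %8.4f s  (%d calls)\n', ...
        ft(k).FunctionName, ft(k).TotalTime, ft(k).NumCalls);
end

% profile viewer   % uncomment to open the interactive report
```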

Vectorization Guidelines

Core Vectorization Principles

Replace explicit loops with array operations whenever possible
Use element-wise operators (.*, ./, .^) for array computations
Leverage built-in functions that operate on entire arrays: sum, mean, max, min, cumsum, diff
Use logical indexing instead of find() when you need values rather than indices
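
The principles above can be illustrated with a small before/after comparison. The polynomial and the threshold are arbitrary choices made for this sketch.

```matlab
x = linspace(0, 1, 1e5);

% Loop version: one scalar evaluation per iteration.
yLoop = zeros(size(x));
for k = 1:numel(x)
    yLoop(k) = 3*x(k)^2 + 2*x(k);
end

% Vectorized version: element-wise operators act on the whole array.
yVec = 3*x.^2 + 2*x;

% Logical indexing returns the values directly.
mask = yVec > 2;
bigValues = yVec(mask);        % instead of yVec(find(yVec > 2))

% find is still the right tool when the index itself is needed.
firstIdx = find(mask, 1);
```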

Implicit Expansion (R2016b+)

Use implicit expansion instead of repmat or bsxfun for broadcasting operations
Two arrays have compatible sizes when each dimension is either identical or one of them equals 1
Replace legacy bsxfun calls: use A - mean(A) instead of bsxfun(@minus, A, mean(A))
Implicit expansion is faster and uses less memory than repmat because the replicated array is never materialized
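
A short sketch of the size-compatibility rule, using column centering as the example. The variable names are illustrative.

```matlab
A = magic(4);                 % 4-by-4 matrix
colMeans = mean(A);           % 1-by-4 row vector

% Implicit expansion: the 1-by-4 row is applied to every row of A.
centered = A - colMeans;

% Legacy equivalents, shown only for comparison.
centeredBsxfun = bsxfun(@minus, A, colMeans);
centeredRepmat = A - repmat(colMeans, size(A, 1), 1);

% A 4-by-1 column combined with a 1-by-3 row expands to 4-by-3.
products = (1:4)' .* (1:3);
```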

Page-wise Operations (R2020b+)

Use pagemtimes and pagetranspose (R2020b+) or pageinv (R2022a+) for batch matrix operations on 3-D arrays
Page-wise functions can be 30-40x faster than equivalent loops over slices
Page dimensions support implicit expansion for flexible batch operations
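
The following sketch compares a loop over slices with pagemtimes and shows expansion along the page dimension. Array sizes are arbitrary, and no timing claim is made here; measure on your own data.

```matlab
rng(0);
A = rand(3, 4, 100);          % 100 pages of 3-by-4 matrices
B = rand(4, 2, 100);          % 100 pages of 4-by-2 matrices

% Loop over slices.
Cloop = zeros(3, 2, 100);
for k = 1:size(A, 3)
    Cloop(:, :, k) = A(:, :, k) * B(:, :, k);
end

% Page-wise product in one call.
Cpage = pagemtimes(A, B);

% Page dimension expands: a single 4-by-2 matrix is applied to every page of A.
Cshared = pagemtimes(A, B(:, :, 1));
```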

When Not to Vectorize

Avoid vectorization when it requires creating very large temporary arrays that exceed memory
Do not sacrifice significant code readability for marginal performance gains
Recognize that modern MATLAB (R2015b+) optimizes many loop patterns well
Accept that some algorithms with sequential dependencies cannot be vectorized

Avoid arrayfun/cellfun for Performance

arrayfun and cellfun are typically not faster than explicit loops
Use them for code brevity only, not performance optimization
Exception: arrayfun on gpuArray can create efficient GPU kernels

Memory Management

Pre-allocation

Always pre-allocate arrays before loops: result = zeros(n, 1)
Use NaN(m, n) when distinguishing uninitialized elements from zeros is important
Pre-allocate directly with the target type: A = zeros(100, 'int8'), not A = int8(zeros(100)), which first builds a temporary double array
Pre-allocate cell arrays with cell(m, n) and string arrays with strings(m, n)

Avoid Growing Arrays in Loops

Never use patterns like result = [result, newValue] inside loops
Each concatenation forces memory reallocation and data copying
Pre-allocation can improve execution speed by 10-25x or more
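
A minimal sketch contrasting the growing-array anti-pattern with pre-allocation. The loop body and the size n are placeholders.

```matlab
n = 1e4;

% Anti-pattern: the array is reallocated and copied as it grows.
grown = [];
for k = 1:n
    grown = [grown, k^2]; %#ok<AGROW>
end

% Pre-allocated: memory is reserved once, then filled in place.
pre = zeros(1, n);
for k = 1:n
    pre(k) = k^2;
end

% Typed pre-allocation for other containers.
counts = zeros(n, 1, 'int32');
labels = strings(n, 1);
items = cell(n, 1);
```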

Memory-Efficient Data Types

Use single instead of double when 7 decimal digits of precision is sufficient (halves memory)
Use logical for boolean data (8x less memory than double)
Use appropriate integer types (uint8, int16, uint32) for whole numbers
Specify output class in file operations: fread(fid, n, 'uint8=>uint8')
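
To make the memory figures concrete, this sketch allocates the same number of elements in several classes and prints the storage each one uses. whos is called here only for inspection, not inside performance-critical code.

```matlab
n = 1e6;
d = zeros(n, 1);              % double: 8 bytes per element
s = zeros(n, 1, 'single');    % single: 4 bytes per element
b = false(n, 1);              % logical: 1 byte per element
u = zeros(n, 1, 'uint8');     % uint8: 1 byte per element

info = whos('d', 's', 'b', 'u');
for k = 1:numel(info)
    fprintf('%s: %-8s %d bytes\n', info(k).name, info(k).class, info(k).bytes);
end
```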

Copy-on-Write and In-Place Operations

MATLAB uses copy-on-write; data is only copied when modified
Enable in-place optimization by using the same variable for input and output: x = processData(x)
In-place optimization only works in functions, not scripts or command line
In-place optimization does not apply inside try/catch blocks or with global/persistent variables

Large Data Strategies

Use matfile to access parts of MAT-file variables without loading entire files (requires v7.3 format)
Avoid using the end keyword with matfile; it loads the entire variable into memory
Use tall arrays for data too large to fit in memory
Use datastores for incremental processing of large file collections
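
The matfile guidance above might look like the following sketch. The temporary file and the variable name A are assumptions made for illustration.

```matlab
% Create a small v7.3 MAT-file to read from (illustrative data).
fname = [tempname '.mat'];
A = rand(1000, 50);
save(fname, 'A', '-v7.3');

m = matfile(fname);

% Query dimensions without loading the variable.
[nRows, nCols] = size(m, 'A');

% Read one contiguous block rather than many small pieces.
block = m.A(1:100, :);

% Use explicit indices in place of the end keyword.
lastRows = m.A(nRows-9:nRows, :);

clear m
delete(fname);
```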

Parallel Computing

When to Parallelize

Parallelize when loop iterations are independent and each takes significant time (more than about 100 ms)
Avoid parallelization for short iterations (under about 1 ms) where overhead exceeds computation time
Verify that data transfer overhead is less than computation time saved
Profile serial code first; do not parallelize already-fast vectorized code

parfor Best Practices

Use parfor for loops with independent iterations of roughly uniform duration
Run the outer loop in parallel to minimize per-iteration overhead
Understand variable classifications: sliced, broadcast, and reduction variables
Avoid large broadcast variables; use parallel.pool.Constant for repeated large data
parfor loops cannot be nested inside other parfor loops, and the loop body cannot use break or return or modify the loop variable
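
The variable classifications can be seen in a small sketch like the one below. The data and the per-row computation are placeholders, and these iterations are far too short to benefit from a pool; the point is only the structure. Without Parallel Computing Toolbox the loop simply runs serially.

```matlab
n = 200;
data = rand(n, 50);           % sliced input: iteration i reads only row i
scale = 2;                    % broadcast variable: sent whole to every worker
rowNorms = zeros(n, 1);       % sliced output: iteration i writes only element i
total = 0;                    % reduction variable: accumulated across iterations

parfor i = 1:n
    r = scale * norm(data(i, :));   % temporary variable, local to the iteration
    rowNorms(i) = r;
    total = total + r;
end
```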

parfeval for Asynchronous Execution

Use parfeval when you need intermediate results, progress updates, or early termination
Retrieve results with fetchOutputs (blocking) or fetchNext (as-completed order)
Use cancel(future) to stop pending computations
Chain operations with afterEach and afterAll callbacks
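
A minimal sketch of submitting work with parfeval and collecting results as they complete. It assumes R2021b or later so that backgroundPool is available; the task function and task count are hypothetical.

```matlab
pool = backgroundPool;        % assumption: R2021b or later
nTasks = 4;

% Submit the tasks; each call returns immediately with a Future.
futures(1:nTasks) = parallel.FevalFuture;
for k = 1:nTasks
    futures(k) = parfeval(pool, @(n) sum((1:n).^2), 1, 1000 * k);
end

% Collect results in completion order; idx identifies the finished task.
results = zeros(1, nTasks);
for k = 1:nTasks
    [idx, value] = fetchNext(futures);
    results(idx) = value;
end
```

fetchOutputs(futures) would instead block until every task has finished and return the results in submission order.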

backgroundPool for Responsive Apps

Use backgroundPool with parfeval to keep apps responsive during calculations
Background workers cannot directly update UI components; update UI in afterEach/afterAll callbacks
Support cancellation by storing and managing Future objects
Some functions (file I/O, Java) are not supported in thread-based workers

Parallel Pool Management

Use parpool('Threads') for lower overhead and shared memory (but limited function support)
Use parpool('Processes') for full MATLAB language support and robustness (the profile is named 'local' before R2022b)
Plan for at least 4 GB of RAM per worker
Use ticBytes/tocBytes to measure data transfer overhead

GPU Computing

Use gpuArray for large arrays with highly parallel operations
Keep data on GPU as long as possible; minimize CPU-GPU transfers
Use single precision for better GPU performance
Use gather only for final results that need to return to CPU
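
The GPU guidance could be sketched as follows. The computation is a placeholder, and the code falls back to the CPU when no supported GPU is present (canUseGPU is assumed to be available, R2019b or later).

```matlab
useGPU = canUseGPU;                 % true only if a supported GPU is usable

x = rand(1e6, 1, 'single');         % single precision, as suggested above
if useGPU
    x = gpuArray(x);                % one transfer to the device
end

% Intermediate results stay where x lives; no transfers in between.
y = sqrt(x.^2 + 1);
total = sum(y);

if useGPU
    total = gather(total);          % bring back only the final scalar
end
```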

App Designer Performance

Startup Optimization

Use a lightweight default tab that displays first
Defer non-essential initialization to user-triggered callbacks
Implement lazy loading for tree nodes and large datasets
Distribute components across multiple tabs rather than concentrating in one

Responsive UI Design

Use backgroundPool with parfeval for long computations
Update UI in afterEach/afterAll callbacks, not from worker threads
Use drawnow limitrate in loops for smooth animations without blocking
Implement cancellation support for long-running operations

Efficient Callback Design

Prefer ValueChangedFcn over ValueChangingFcn for heavy operations
Share callbacks between related components to reduce code duplication
Implement debouncing for rapidly-firing events
Wrap callbacks in try/catch for graceful error handling

Graphics Performance

Reuse plot objects by updating XData/YData instead of recreating plots
Use animatedline for streaming data visualization
Disable unnecessary axes interactions and toolbars for static plots
Set fixed axis limits (XLimMode = 'manual') to prevent auto-scaling overhead
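
A small sketch of reusing a line object in an animation loop. The figure is created invisible so the snippet can run unattended; the data and frame count are arbitrary.

```matlab
fig = figure('Visible', 'off');
ax = axes(fig);
x = linspace(0, 2*pi, 200);

h = plot(ax, x, sin(x));      % create the line object once

% Setting the limits explicitly switches the limit modes to manual.
ax.XLim = [0 2*pi];
ax.YLim = [-1.5 1.5];
limMode = ax.XLimMode;

for k = 1:50
    h.YData = sin(x + 0.1 * k);   % update data instead of calling plot again
    drawnow limitrate             % limits how often the display is refreshed
end

close(fig);
```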

Memory in Apps

Store data in appropriate property types (single, integers when applicable)
Clear temporary variables and cached data when no longer needed
Implement pagination for large table displays
Copy properties to local variables before loops to avoid repeated property access overhead

Timer Objects

Use timer objects for periodic updates with configurable ExecutionMode and Period
Always clean up timers in CloseRequestFcn: stop(t) then delete(t), where t is the timer object
Use 'BusyMode', 'drop' to skip callbacks if previous execution is still running
Reuse graphics objects in timer callbacks instead of creating new plots
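
The timer settings above might be combined as in this sketch. The callback only increments a counter stored in UserData, and TasksToExecute is set so the example ends on its own; in an app the stop/delete pair would live in CloseRequestFcn.

```matlab
t = timer( ...
    'ExecutionMode', 'fixedRate', ...
    'Period', 0.2, ...
    'BusyMode', 'drop', ...          % skip a tick if the previous one is still running
    'TasksToExecute', 3, ...         % illustrative: stop automatically after 3 ticks
    'UserData', 0, ...
    'TimerFcn', @(src, ~) set(src, 'UserData', get(src, 'UserData') + 1));

start(t);
wait(t);                             % block until the timer stops

nTicks = t.UserData;

% Cleanup: stop first, then delete.
stop(t);
delete(t);
```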

uihtml for Custom Components

Use uihtml to embed high-performance web-based visualizations (D3.js, Chart.js)
Communicate between MATLAB and JavaScript via Data property and events
All supporting files must be local; cannot link to external URLs or CDNs
Use for custom interactive widgets, rich text editors, or third-party JavaScript libraries

JIT Compiler Optimization

JIT-Friendly Code Patterns

Use functions instead of scripts for better JIT optimization
Prefer local functions over nested functions when variable sharing is not needed
Keep loop bounds constant; define bounds before the loop
Maintain consistent data types within variables throughout execution
Use short-circuit operators (&&, ||) instead of element-wise (&, |) for scalar conditions

Patterns That Prevent JIT Optimization

Avoid eval and dynamic code execution; use direct operations or function handles instead
Avoid clear all in code; clear specific variables if needed
Avoid functions that query MATLAB state: exist, whos, inputname, dbstack
Avoid changing the current folder or the MATLAB path during execution (cd, addpath, rmpath)
Minimize use of global and persistent variables
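
As an illustration of replacing eval, the sketch below uses dynamic field names and a function handle. The field names and the operation name are hypothetical.

```matlab
names = {'alpha', 'beta', 'gamma'};
results = struct();

for k = 1:numel(names)
    % Anti-pattern: eval([names{k} ' = k^2;'])
    results.(names{k}) = k^2;       % dynamic field access instead
end

% Selecting an operation by name: build a handle once, then call it directly.
opName = 'max';
op = str2func(opName);
best = op([3 7 5]);
```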

Data Structure Performance

Use string arrays instead of cell arrays of character vectors (2-40x faster, less memory)
Use struct of arrays instead of array of structs for large datasets
Access table columns (T.columnName) rather than individual cells (T{i, 'column'}) in loops
Pre-allocate tables; never grow tables row by row in loops
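
The table-access and struct-layout advice can be sketched as follows. The table contents are random placeholder data.

```matlab
n = 1000;
T = table((1:n)', rand(n, 1), 'VariableNames', {'id', 'value'});

% Slower pattern (shown for contrast): scalar table indexing on every pass.
totalSlow = 0;
for i = 1:n
    totalSlow = totalSlow + T{i, 'value'};
end

% Faster pattern: extract the column once, then index the plain array.
values = T.value;
totalFast = 0;
for i = 1:n
    totalFast = totalFast + values(i);
end

% Struct of arrays: one struct whose fields are whole columns.
soa.id = T.id;
soa.value = T.value;
meanValue = mean(soa.value);
```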

MEX Files and Code Generation

Consider MEX files only when pure MATLAB cannot meet performance requirements
Use MATLAB Coder for automatic C/C++ code generation with optimizations
Generated code leverages BLAS, LAPACK, and FFTW libraries automatically
MEX provides greatest benefit for complex iterative algorithms, less benefit for built-in operations

File I/O Performance

MAT-File Versions

Use default v7 format for general use with mixed data types (faster for structs, cells, tables)
Use v7.3 format (save(..., '-v7.3')) only when variables exceed 2 GB or partial loading is needed
Consider disabling compression for faster saves when disk space is not a concern

Partial Loading with matfile

Use matfile objects to read/write portions of variables without loading entire files
Requires v7.3 format for efficient partial operations
Avoid the end keyword; use size(m, 'varName') to get dimensions
Read large chunks at once rather than many small reads

Text and CSV Files

Convert frequently-used CSV files to MAT or Parquet format for faster subsequent access
Use detectImportOptions and specify VariableTypes to avoid auto-detection overhead
Read only needed columns with SelectedVariableNames
Use textscan with explicit format specifiers for fastest text parsing
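
A minimal sketch of the import-options workflow. It writes a small temporary CSV first so the example is self-contained; the column names are made up.

```matlab
% Create a small CSV file to read (illustrative data).
fname = [tempname '.csv'];
T = table((1:5)', [0.1; 0.2; 0.3; 0.4; 0.5], ["a"; "b"; "c"; "d"; "e"], ...
    'VariableNames', {'id', 'value', 'label'});
writetable(T, fname);

% Detect once, then pin the types and columns that are actually needed.
opts = detectImportOptions(fname);
opts = setvartype(opts, {'id'}, 'int32');
opts.SelectedVariableNames = {'id', 'value'};

subset = readtable(fname, opts);

delete(fname);
```

For a file that is read repeatedly, the opts object can be built once and reused so detection is not repeated on every read.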

Parquet Files

Prefer Parquet over CSV for large datasets (10-12x smaller, 10-150x faster reads)
Use row filters with parquetread for efficient data subsetting
Column-oriented storage allows reading only needed columns
Use parquetDatastore for processing multiple Parquet files

Binary I/O

Read/write entire arrays in single fread/fwrite operations
Avoid fseek in loops; use the skip parameter of fread instead
Binary format is approximately 10x faster than text format
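
The binary I/O points could look like this in practice. The file is temporary and the byte pattern is arbitrary.

```matlab
fname = [tempname '.bin'];
data = uint8(0:255);

% Write the whole array in one call.
fid = fopen(fname, 'w');
fwrite(fid, data, 'uint8');
fclose(fid);

% Read the whole file in one call, keeping the uint8 class.
fid = fopen(fname, 'r');
allBytes = fread(fid, Inf, 'uint8=>uint8')';
fclose(fid);

% Read every 4th byte using the skip argument instead of fseek in a loop.
fid = fopen(fname, 'r');
every4th = fread(fid, Inf, 'uint8=>uint8', 3)';   % read 1 byte, skip 3
fclose(fid);

delete(fname);
```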

Network I/O

Copy large files locally before processing; network access can be 10-30x slower
Write files locally first, then copy to network location
Avoid v7.3 format over network when possible (compression can bottleneck on single core)

Caching Strategies

Use memoize for automatic caching of expensive function calls
Use persistent variables for manual caching; call mlock inside the function if the cache must survive clear
Clear old data before loading new data to reduce peak memory usage
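
A short sketch of memoize. nchoosek stands in for an expensive function here purely for illustration; any deterministic function without side effects could be wrapped the same way.

```matlab
cachedChoose = memoize(@nchoosek);
clearCache(cachedChoose);           % start from an empty cache for this demo

first = cachedChoose(30, 15);       % computed and stored
second = cachedChoose(30, 15);      % same inputs: served from the cache

% Release cached results once they are no longer needed.
clearCache(cachedChoose);
```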

Quick Reference: Common Anti-Patterns

| Anti-Pattern | Better Approach |
| --- | --- |
| Growing arrays in loops | Pre-allocate before loop |
| Using `find()` to get values | Use logical indexing directly |
| `repmat` for broadcasting | Use implicit expansion |
| `eval` for dynamic code | Use function handles or dynamic field access |
| `arrayfun` for speed | Use explicit loops or vectorization |
| Array of structs | Struct of arrays |
| Scalar table indexing in loops | Extract column, then index array |
| `clear all` in code | Clear specific variables |
| Global variables | Pass as function arguments |
| Nested `parfor` | Parallelize outer loop only |
