
Commit 78f7580

Update computer architecture content
1 parent 01baa12 commit 78f7580

20 files changed: +9006 / -683 lines

Computer_architecture/ARM_Architecture.md

Lines changed: 807 additions & 0 deletions

Computer_architecture/CPU_Architecture.md

Lines changed: 774 additions & 0 deletions

Computer_architecture/Direct_Memory_Access.md

Lines changed: 683 additions & 0 deletions

Computer_architecture/Memory_Systems.md

Lines changed: 1100 additions & 0 deletions

Operating_System/Embedded_Linux.md

Lines changed: 699 additions & 0 deletions

Operating_System/Multi_threading.md

Lines changed: 1280 additions & 0 deletions

Operating_System/Real_time_Linux.md

Lines changed: 640 additions & 0 deletions

Operating_System/System_Programming.md

Lines changed: 1015 additions & 0 deletions

Performance_Optimization/Benchmarking_Frameworks.md

Lines changed: 258 additions & 65 deletions

Performance_Optimization/Code_Optimization_Techniques.md

Lines changed: 174 additions & 0 deletions
@@ -1,11 +1,101 @@
# Code Optimization Techniques

## Quick Reference: Key Facts

- **Algorithmic optimization** provides orders-of-magnitude improvements vs. the 10-20% typical of other techniques
- **Compiler optimization** can outperform hand-optimized assembly when code is written clearly
- **Memory access patterns** often impact performance more than algorithmic complexity
- **Loop optimization** (unrolling, vectorization) targets the most performance-critical code sections
- **Function inlining** eliminates call overhead but increases code size (see the sketch after this list)
- **Branch prediction** optimization lays out the common case as the fall-through path
- **SIMD instructions** can process multiple data elements simultaneously
- **Compiler flags** balance performance against compilation time and reliability
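
To make the inlining and branch-prediction bullets concrete, here is a minimal sketch (assuming a GCC or Clang toolchain; the `unlikely` macro and the function names are illustrative, not from this repository):

```c
#include <stddef.h>

/* Inlining: the call overhead disappears, at the cost of code size. */
static inline int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Branch prediction: mark the error path as rare so the compiler lays
 * out the common case as the fall-through path (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int *buf, size_t n) {
    if (unlikely(buf == NULL)) {
        return -1;                       /* rare error path */
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = clamp(buf[i], 0, 255);  /* hot path, inlined call */
    }
    return 0;
}
```
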
## The Foundation of Performance Optimization

Code optimization represents the most fundamental level of performance improvement in embedded systems, where the choice of algorithms, data structures, and compiler configurations can have orders-of-magnitude impact on system performance. Unlike other optimization techniques that might provide 10-20% improvements, algorithmic optimization can transform an unusable system into a highly efficient one. This makes it the first and most important consideration in any optimization effort.

The optimization process begins with understanding that performance is not a single metric but a complex interplay of multiple factors: execution speed, memory usage, power consumption, and real-time responsiveness. Each of these factors can become a bottleneck depending on the specific requirements of the application. A system optimized for speed might consume excessive power, while a system optimized for power might fail to meet real-time deadlines. The art of optimization lies in finding the right balance for each specific use case.

## Core Concepts

### **Concept: Algorithmic Complexity vs. Real Performance**
**Why it matters**: Big-O notation provides theoretical guidance, but constant factors, cache behavior, and data characteristics determine real-world performance.

**Minimal example**:
```c
// O(n²) but cache-friendly vs. O(n log n) but cache-unfriendly
void cache_friendly_sort(int arr[], int n) {
    // Bubble sort: O(n²) comparisons, but its sequential, adjacent
    // accesses give excellent cache locality
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];       // swap adjacent elements
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}
```

**Try it**: Profile this against a cache-unfriendly O(n log n) sort (heap sort is the classic example) with different data sizes and cache configurations.

**Takeaways**: Cache behavior often dominates algorithmic complexity for small to medium datasets.
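
The "Try it" above needs a measurement loop. Here is a minimal wall-clock harness (a sketch assuming a POSIX system with `clock_gettime`; it reuses `cache_friendly_sort` from the example above, and the array size is a placeholder to vary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void cache_friendly_sort(int arr[], int n);   /* from the example above */

/* Milliseconds elapsed between two timespec readings. */
static double elapsed_ms(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void) {
    enum { N = 10000 };                       /* vary across cache sizes */
    int *arr = malloc(N * sizeof *arr);
    for (int i = 0; i < N; i++) arr[i] = rand();

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    cache_friendly_sort(arr, N);              /* function under test */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("sorted %d ints in %.3f ms\n", N, elapsed_ms(t0, t1));
    free(arr);
    return 0;
}
```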

### **Concept: Compiler Optimization Leverage**
**Why it matters**: Modern compilers can transform naive code into highly efficient machine code, often outperforming hand-optimized assembly.

**Minimal example**:
```c
// Let the compiler optimize this
int sum_array(int arr[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}

// Compiler can vectorize, unroll, and optimize this automatically
```

**Try it**: Compare assembly output with different optimization levels (-O0, -O2, -O3).

**Takeaways**: Write clear, predictable code and let the compiler do the heavy lifting.
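
One way to see this leverage (a sketch; verify on your own toolchain by comparing the `-S` assembly output at -O2) is that "clever" source-level tricks often compile to exactly the same machine code as the clear version:

```c
/* At -O2, mainstream compilers perform strength reduction themselves:
 * these two functions typically compile to identical machine code. */
int double_clear(int x)  { return x * 2; }   /* clear intent */
int double_clever(int x) { return x << 1; }  /* no faster, harder to read */
```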

### **Concept: Memory Access Patterns**
**Why it matters**: Memory access patterns often impact performance more than algorithmic complexity due to cache behavior.

**Minimal example**:
```c
// Good: Row-major access (cache-friendly)
int sum_matrix_good(int matrix[][100], int rows) {
    int sum = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < 100; j++) {
            sum += matrix[i][j];  // Sequential memory access
        }
    }
    return sum;
}

// Bad: Column-major access (cache-unfriendly)
int sum_matrix_bad(int matrix[][100], int rows) {
    int sum = 0;
    for (int j = 0; j < 100; j++) {
        for (int i = 0; i < rows; i++) {
            sum += matrix[i][j];  // Strided memory access
        }
    }
    return sum;
}
```

**Try it**: Benchmark both functions with different matrix sizes.

**Takeaways**: Access data in the order it is stored in memory (row-major in C).
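
When the computation inherently touches memory in two orders (a matrix transpose reads rows and writes columns), loop blocking/tiling keeps the working set cache-resident. A minimal sketch, where the tile size `B` is an assumed tunable, typically chosen so a tile fits in the L1 cache:

```c
#define B 32  /* tile edge; tune so a B x B tile fits in L1 cache */

/* Blocked transpose: each B x B tile is read and written while it is
 * still cached, instead of striding across the whole matrix. */
void transpose_tiled(int n, const int src[n][n], int dst[n][n]) {
    for (int ii = 0; ii < n; ii += B)
        for (int jj = 0; jj < n; jj += B)
            for (int i = ii; i < ii + B && i < n; i++)
                for (int j = jj; j < jj + B && j < n; j++)
                    dst[j][i] = src[i][j];
}
```
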
## Algorithmic Optimization: The Foundation of Performance

Algorithmic choices set the ceiling on achievable performance: no amount of compiler tuning or hand-optimization can rescue a poorly chosen algorithm or data structure. As the introduction noted, this is where orders-of-magnitude improvements live, so selecting the right algorithm and data structure for the problem and the target hardware should come before any lower-level optimization effort.
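
As a sketch of how large that leverage is: replacing a linear scan with binary search on sorted data drops lookups from O(n) to O(log n) with no low-level tuning at all (this uses the standard library's `bsearch`; the `contains` helper is illustrative):

```c
#include <stdlib.h>

/* Three-way comparison suitable for qsort/bsearch. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* O(log n) membership test, assuming `arr` is sorted ascending. */
int contains(const int *arr, size_t n, int key) {
    return bsearch(&key, arr, n, sizeof arr[0], cmp_int) != NULL;
}
```
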
@@ -81,6 +171,90 @@ Instruction scheduling is critical for performance. The compiler can often reord
SIMD (Single Instruction, Multiple Data) instructions can process multiple data elements simultaneously, providing significant performance improvements for data-parallel operations. Modern compilers can automatically vectorize many loops to use SIMD instructions, but the code must be written in a way that allows the compiler to recognize vectorization opportunities.
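
A sketch of what vectorization-friendly code looks like (assuming GCC or Clang; the `saxpy` name is illustrative): independent iterations, unit-stride accesses, and `restrict`-qualified pointers let the compiler prove that SIMD execution is safe. Vectorization reports (e.g., GCC's `-fopt-info-vec`) can confirm which loops were transformed.

```c
#include <stddef.h>

/* y[i] = a*x[i] + y[i]. `restrict` promises that x and y do not alias,
 * removing a common barrier to auto-vectorization at -O2/-O3. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];   /* independent, unit-stride iterations */
    }
}
```
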
## Visual Representations

### Optimization Impact Hierarchy
```
Performance Impact

├── Algorithmic (10x - 100x)
├── Memory Access (2x - 10x)
├── Compiler Optimization (1.5x - 3x)
├── Loop Optimization (1.2x - 2x)
└── Instruction-Level (1.1x - 1.5x)
```

### Compiler Optimization Flow
```
Source Code → Parse → Optimize → Generate Assembly
     │          │         │              │
     │          │         ├── Local (Safe)
     │          │         ├── Global (Risky)
     │          │         └── Target-Specific
     │          └── AST
     └── Compiler Flags
```

### Memory Access Pattern Comparison
```
Row-Major (Good):         Column-Major (Bad):
[ 1][ 2][ 3][ 4]          [ 1][ 5][ 9][13]
[ 5][ 6][ 7][ 8]          [ 2][ 6][10][14]
[ 9][10][11][12]          [ 3][ 7][11][15]
[13][14][15][16]          [ 4][ 8][12][16]

Cache hits:   ████████    Cache hits:   ██
Cache misses: ██          Cache misses: ████████
```

## Guided Labs

### Lab 1: Compiler Optimization Analysis
1. **Setup**: Create a simple function with loops and function calls (a starter is sketched below)
2. **Compile**: Use different optimization levels (-O0, -O1, -O2, -O3)
3. **Analyze**: Compare assembly output and execution time
4. **Document**: Note which optimizations the compiler applied
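
A candidate Lab 1 starter (a sketch; the function names are illustrative): a loop plus a small function call gives the compiler something to inline, unroll, and vectorize. Compile with `gcc -O0 -S lab1.c`, then repeat at -O1/-O2/-O3 and diff the assembly:

```c
/* Lab 1 starter: watch for inlining of square() and unrolling or
 * vectorization of the loop as the optimization level rises. */
static int square(int x) { return x * x; }

int sum_of_squares(const int *arr, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += square(arr[i]);
    }
    return sum;
}
```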

### Lab 2: Memory Access Pattern Impact
1. **Implement**: Both row-major and column-major matrix operations
2. **Profile**: Use cache profiling tools (perf, valgrind's cachegrind)
3. **Measure**: Execution time with different data sizes
4. **Analyze**: When does cache behavior dominate?

### Lab 3: Algorithmic vs. Implementation Trade-offs
1. **Compare**: A simple O(n²) algorithm vs. a complex O(n log n) algorithm
2. **Profile**: Memory usage, cache misses, execution time
3. **Vary**: Data sizes from small (fits in cache) to large (exceeds cache)
4. **Conclude**: When does each approach win?

## Check Yourself

### Understanding Check
- [ ] Can you explain why O(n²) might be faster than O(n log n) for small datasets?
- [ ] Do you understand when to rely on the compiler vs. optimize by hand?
- [ ] Can you identify cache-friendly vs. cache-unfriendly memory access patterns?
- [ ] Do you know which compiler flags to use for different optimization goals?

### Application Check
- [ ] Can you profile code to identify the actual performance bottlenecks?
- [ ] Can you restructure loops to improve cache locality?
- [ ] Can you choose appropriate optimization levels for your target system?
- [ ] Can you balance performance vs. code size vs. compilation time?

### Analysis Check
- [ ] Can you analyze assembly output to understand compiler optimizations?
- [ ] Can you use profiling tools to measure cache performance?
- [ ] Can you identify when algorithmic changes vs. implementation changes are needed?
- [ ] Can you measure the real-world impact of optimizations?
## Cross-links

- **[Memory Management](./Memory_Management.md)** - Understanding memory layout and allocation
- **[Performance Profiling](./Performance_Profiling.md)** - Measuring optimization effectiveness
- **[Build Systems](../System_Integration/Build_Systems.md)** - Configuring compiler optimization
- **[Real-Time Systems](../Real_Time_Systems/FreeRTOS_Basics.md)** - Performance requirements and constraints
- **[Hardware Fundamentals](../Hardware_Fundamentals/Clock_Management.md)** - Understanding system timing

## Conclusion

Code optimization techniques provide the foundation for high-performance embedded systems. Algorithmic optimization can deliver orders-of-magnitude improvements, while compiler optimization adds significant further gains with minimal effort. The key is to understand the available techniques and apply them systematically, based on the specific requirements and constraints of the target system.
