Skip to content

marioprieta/DLX-Assembly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

DLX Assembly Implementation of the 3n + 1 Sequence

This project provides two versions of an assembly program that implements the 3n + 1 algorithm (Collatz Conjecture) on a DLX architecture. The code calculates a sequence starting from a given integer and generates key statistics on the sequence, including its maximum, mean, and derived values stored in a list. The project includes an optimized and a non-optimized version of the program, demonstrating various assembly-level optimization techniques.

Files

  • OPTIMIZADO.S: Optimized assembly version of the 3n + 1 algorithm. This version includes optimizations for reduced execution time, lower instruction count, and minimized stalls.
  • SINOPTIMIZAR.S: Non-optimized assembly version of the 3n + 1 algorithm. This version executes the algorithm in a straightforward manner, without optimizations for speed or efficiency.
  • utils: Contains a shell script to compile and execute the code in the DLX architecture simulator.
  • WINDLX.BAT: A batch file in the WINDLX directory to run the DLX architecture simulator on Windows.

Environment Setup

Compilation and Execution

In the utils folder, there is a shell script that compiles the code and runs it in the DLX architecture simulator. For Windows users, the WINDLX.BAT file in the WINDLX directory launches the simulator directly.

Hardware Configuration

This project uses a simulated DLX CPU with the following hardware specifications:

  • DLX CPU with a 5-stage pipelined architecture (image available in PIPELINE).

  • Registers: 32 general-purpose registers (GPRs) and 32 floating-point registers (FPRs), each 32 bits. FPRs can be paired for 64-bit double-precision values.

  • Memory: Byte-addressable, big-endian format with 32-bit addresses.

  • Instructions: All instructions are 32-bit and must be aligned in memory.

  • Special Registers: A few special registers can be transferred to/from the GPRs.

    PIPELINE

Program Details

The program computes the following sequence statistics:

  1. Maximum Value: The largest value encountered in the sequence.
  2. Mean Value: The average of the sequence values.
  3. Sequence Length: The total number of terms in the sequence.
  4. List Calculations: Various derived values based on the initial, mean, maximum values, and sequence length.

Key Components

1. Main Algorithm

Both versions of the algorithm follow similar structures with variations in efficiency and instruction count:

  • Sequence Generation: For a given integer n, the sequence is generated by the rules:
    • If n is even, divide it by 2.
    • If n is odd, multiply it by 3 and add 1.
  • Termination Condition: The sequence generation continues until the value 1 is reached.

2. Sequence Statistics

  • Maximum and Mean Calculations: The program tracks the maximum sequence value and the sum of terms to calculate the mean.
  • List Calculations: Various products and ratios of the initial, mean, maximum values, and sequence length are computed and stored in a list, including (Initial/Maximum) * Length and (Mean/Initial) * Length.

Optimizations (in OPTIMIZADO.S)

The optimized version applies several techniques to improve performance:

  • Reduced Instruction Count: Redundant instructions are minimized, using fewer registers and combining operations.
  • Loop Unrolling: The main sequence loop is unrolled to reduce branch instructions and increase parallelism.
  • Multiplication and Division Reductions: Integer and floating-point calculations are simplified by replacing divisions with multiplications where possible (e.g., multiplying by 0.1111111 instead of dividing by 9).
  • Stall Minimization: Dependencies between instructions are reduced by reordering calculations, improving pipeline efficiency and reducing RAW stalls.
  • Conditional Branch Reduction: Only one conditional branch is used per loop iteration.

Performance Comparison

The optimized version demonstrates significant improvements in execution cycles and stall reduction:

  • Execution Cycles: The optimized program executes fewer cycles due to minimized instruction count.
  • Instruction Count: Reduced through efficient loop unrolling and instruction reordering.
  • Stalls: Fewer RAW stalls and control stalls, as the optimized program incurs only 17 RAW stalls versus 702 in the non-optimized version.

Getting Started

  1. Compile and Run: Compile the .S files using an assembler compatible with DLX architecture and load them into a DLX simulator.
  2. Input Configuration: Set valor_inicial to the desired starting integer. The program initializes this value and computes the sequence and statistics automatically.
  3. Results: Upon execution, the sequence statistics are saved in memory locations defined in the .data section, including secuencia_valor_medio (mean of the sequence), secuencia_maximo (maximum of the sequence), and lista_valor_medio (mean of derived list values).

About

DLX assembly project implementing the 3n + 1 algorithm with optimized and non-optimized versions. Calculates sequence stats like max, mean, and derived values. Optimizations reduce cycles, instructions, and stalls. Includes compile scripts for DLX simulators on Windows/Linux, detailed hardware config, and performance comparisons.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors