
ELEC522
4 x 4 Linear System Solver with ARM Core and CORDIC QRD and Matrix-Multiplication Accelerator Integration
In this project, we explore the integration of a programmable processing core with a custom accelerator in Vivado, using Xilinx Platform Studio (Xilinx SDK) and Xilinx System Generator. We then assess the speed-up of C code that calls the custom accelerator over a pure C-code version of the algorithm.
We reuse our work from Project 5 (QR decomposition modules using CORDIC) and Project 2 (4x4 matrix multiplication or 4x4 matrix-vector multiplication). We first had to modify Project 5 because it had a few errors with inputs in different quadrants. It now works correctly but cannot handle complex inputs, so we deal only with real linear equations.
The goal is to solve 4 linear equations in the form Ax = b, where A is a 4x4 matrix, b is a 4x1 vector, and x is the unknown 4x1 vector that solves the system. We do this using QR decomposition of A, which factors it into 4x4 matrices Q and R. Q is orthogonal with normalized columns (so Q’Q = I), while R is upper triangular. With this factorization, Ax = b becomes QRx = b. Multiplying both sides by Q’ gives Rx = Q’b, where Q’b is a 4x1 vector, and from this we can solve for x using back substitution.
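Written out step by step, the reduction above and the back-substitution recurrence it leads to look like this. Here Q^T denotes Q’, y is a name introduced only for this illustration, and the recurrence assumes every diagonal entry of R is nonzero (that is, A is nonsingular).

```latex
\begin{align*}
Ax &= b, \qquad A = QR \\
QRx &= b \\
Q^{\top} Q R x &= Q^{\top} b \qquad (Q^{\top} Q = I) \\
Rx &= y, \qquad y = Q^{\top} b \\
x_4 &= \frac{y_4}{r_{44}}, \qquad
x_i = \frac{1}{r_{ii}} \Bigl( y_i - \sum_{j=i+1}^{4} r_{ij}\, x_j \Bigr), \quad i = 3, 2, 1
\end{align*}
```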
Our plan is therefore to first decompose A into Q and R using our Project 5 modules, then multiply Q’ by b using our Project 2 module, and finally perform back substitution on the ARM core.
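The back-substitution step that runs on the ARM core could look roughly like the sketch below. It is only an illustration: the function name `back_substitute`, the use of `float`, and the row-major 4x4 array layout are assumptions, not the project's actual code or the accelerator's number format.

```c
#define N 4

/* Solve R x = y for x, where R is upper triangular (entries below the
 * diagonal are ignored). Returns 0 on success, or -1 if a diagonal entry
 * of R is zero, in which case the system has no unique solution.
 * Hypothetical helper: name, types, and layout are assumptions. */
int back_substitute(float R[N][N], const float y[N], float x[N])
{
    for (int i = N - 1; i >= 0; --i) {
        if (R[i][i] == 0.0f)
            return -1;
        float acc = y[i];
        /* Subtract the contribution of the unknowns already solved. */
        for (int j = i + 1; j < N; ++j)
            acc -= R[i][j] * x[j];
        x[i] = acc / R[i][i];
    }
    return 0;
}
```

In this sketch, `R` would hold the 10 non-trivial values read back from the accelerator and `y` would hold the Q’b result. The loop starts at the last row because that equation has only one unknown.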
Here is our System Generator (Sysgen) implementation block:


In more detail, the left side is entirely Project 5: the rows of the matrix (padded with a couple of 1’s to produce Q) are input and then decomposed into Q and R.
On the left we added registers to store the values of Q and the 10 non-trivial values of R until they are all ready to be fed sequentially into the matrix multiplication through multiplexers. Below is the HLS block that performs the matrix multiplication once both matrices (Q and b padded with 0’s) have been fed in as two vectors.
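As a software model of this data flow, the sketch below pads b with zeros into a 4x4 matrix, treats both operands as flat 16-element vectors, multiplies them, and reads the product back from the first column. The row-major ordering, the placement of b in the first column, and all function names are assumptions made for illustration. The actual ordering expected by the HLS block is not specified here.

```c
#define N 4

/* Place b in the first column of a zeroed 4x4 matrix, stored row-major.
 * Assumption: the accelerator's real padding layout may differ. */
void pad_vector(const float b[N], float out[N * N])
{
    for (int i = 0; i < N * N; ++i)
        out[i] = 0.0f;
    for (int i = 0; i < N; ++i)
        out[i * N] = b[i];
}

/* c = a * m for 4x4 matrices given as flat row-major vectors. */
void matmul4_flat(const float a[N * N], const float m[N * N], float c[N * N])
{
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += a[i * N + k] * m[k * N + j];
            c[i * N + j] = acc;
        }
    }
}

/* Copy the first column of a flat row-major 4x4 matrix into a 4x1 vector. */
void first_column(const float c[N * N], float y[N])
{
    for (int i = 0; i < N; ++i)
        y[i] = c[i * N];
}
```

Calling `pad_vector`, then `matmul4_flat` with the flattened Q’ as the first operand, then `first_column` yields the Q’b vector. The remaining three columns of the product are zero because of the padding, so reusing the 4x4 multiplier costs unused work but needs no new hardware.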

