Verilog Coding Tips and Tricks: Verilog Code for Matrix Multiplication - for 2 by 2 Matrices

Wednesday, November 18, 2015

Here is the Verilog code for a simple matrix multiplier. The input matrices are of fixed size 2 by 2 and so the output matrix is also fixed at 2 by 2. I have kept the size of each matrix element as 8 bits.

Verilog (unlike SystemVerilog) doesn't allow multi-dimensional arrays as input or output ports. So I have flattened each 2 by 2 matrix of 8-bit elements (a three-dimensional array, if the bit width is counted as a dimension) into a one-dimensional 32-bit vector. Inside the module I have declared temporary array variables, which are loaded from the inputs at the beginning of the always block.
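
The flattening puts element [0][0] in the top 8 bits of the 32-bit vector and element [1][1] in the bottom 8 bits. As a rough sketch, the same unpacking can be written with a loop and an indexed part-select (`+:`, available from Verilog-2001), which scales better to larger matrices. The module name `unpack_2x2` and its port names are hypothetical and not part of the original design.

```verilog
// Hypothetical helper (not from the original post): unpacks a flat 32-bit
// vector into a 2x2 array of 8-bit elements using indexed part-selects.
module unpack_2x2 (
    input  wire [31:0] flat,            // {m00, m01, m10, m11}, m00 in bits [31:24]
    output reg  [7:0]  m00, m01, m10, m11
);
    reg [7:0] M [0:1][0:1];
    integer r, c;

    always @(flat)
    begin
        for (r = 0; r < 2; r = r + 1)
            for (c = 0; c < 2; c = c + 1)
                // Element (r,c) is the (r*2+c)-th byte counted from the MSB end,
                // so its lowest bit sits at (3 - (r*2+c)) * 8.
                M[r][c] = flat[(3 - (r*2 + c))*8 +: 8];
        // Expose the array elements as ordinary ports.
        m00 = M[0][0]; m01 = M[0][1]; m10 = M[1][0]; m11 = M[1][1];
    end
endmodule
```

For (r,c) = (0,0) the select is `flat[24 +: 8]`, that is bits [31:24], and for (1,1) it is `flat[0 +: 8]`, that is bits [7:0], which matches the order of a `{m00, m01, m10, m11}` concatenation.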

The matrix multiplier is also synthesisable. When synthesised for a Virtex-4 FPGA using Xilinx XST, a maximum combinational path delay of 9 ns was obtained.

Matrix multiplier:

//Module for calculating Res = A*B
//Where A, B and Res are 2 by 2 matrices.
module Mat_mult(A,B,Res);

    //input and output ports.
    //Each port is 32 bits wide: 2*2=4 elements, each of which is 8 bits wide.
    input [31:0] A;
    input [31:0] B;
    output [31:0] Res;
    //internal variables    
    reg [31:0] Res;
    reg [7:0] A1 [0:1][0:1];
    reg [7:0] B1 [0:1][0:1];
    reg [7:0] Res1 [0:1][0:1]; 
    integer i,j,k;

    always@ (A or B)
    begin
    //Initialize the matrices-convert 1D to 3D arrays
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        i = 0;
        j = 0;
        k = 0;
        {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]} = 32'd0; //initialize to zeros.
        //Matrix multiplication
        for(i=0;i < 2;i=i+1)
            for(j=0;j < 2;j=j+1)
                for(k=0;k < 2;k=k+1)
                    Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
        //final output assignment - 3D array to 1D array conversion.
        Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};
    end

endmodule
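
Because each result element in the module above is only 8 bits wide, the sum of products wraps modulo 256 once the true value exceeds 255. A possible full-precision variant is sketched below. It assumes unsigned elements; the module name `Mat_mult_wide` and the 68-bit output width are choices made for this illustration, not part of the original design.

```verilog
// Sketch only: full-precision 2x2 multiplier for unsigned 8-bit elements.
// An 8x8-bit product needs 16 bits, and the sum of two such products
// needs one more bit, so each result element is 17 bits wide.
module Mat_mult_wide(A, B, Res);

    input  [31:0] A;
    input  [31:0] B;
    output [67:0] Res;              // 4 elements x 17 bits
    reg    [67:0] Res;
    reg    [7:0]  A1   [0:1][0:1];
    reg    [7:0]  B1   [0:1][0:1];
    reg    [16:0] Res1 [0:1][0:1];
    integer i, j, k;

    always @(A or B)
    begin
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        for (i = 0; i < 2; i = i + 1)
            for (j = 0; j < 2; j = j + 1)
            begin
                Res1[i][j] = 17'd0;
                // The 17-bit left-hand side makes the multiply and add
                // evaluate at 17 bits, so no bits are lost.
                for (k = 0; k < 2; k = k + 1)
                    Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
            end
        Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};
    end

endmodule
```

The worst case is 255*255 + 255*255 = 130050, which is below 2^17 = 131072, so 17 bits per element are enough for the 2 by 2 case.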


Testbench Code:

module tb;

    // Inputs
    reg [31:0] A;
    reg [31:0] B;
    // Outputs
    wire [31:0] Res;

    // Instantiate the Unit Under Test (UUT)
    Mat_mult uut (
        .A(A),
        .B(B),
        .Res(Res)
    );

    initial begin
        // Apply Inputs
        A = 0;  B = 0;  #100;
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
    end

endmodule

Simulation waveform:

The code was simulated using Xilinx ISE 13.1. The waveform verifies that the design works correctly for the inputs applied in the testbench.
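
For the testbench inputs, the expected product can be worked out by hand: [[1,2],[3,4]] times [[5,6],[7,8]] gives [[19,22],[43,50]], and every element fits in 8 bits. As a rough sketch, a self-checking testbench can compare against these values and print the matrices in the simulator console instead of relying on the waveform. It assumes the `Mat_mult` module from this post is compiled alongside it; the module name `tb_check` is hypothetical.

```verilog
// Hypothetical self-checking testbench (not from the original post).
module tb_check;

    reg  [31:0] A;
    reg  [31:0] B;
    wire [31:0] Res;

    Mat_mult uut (.A(A), .B(B), .Res(Res));

    initial begin
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
        #10;  // give the combinational logic time to settle

        // Internal arrays can be read through a hierarchical name.
        $display("A1  = | %0d %0d |", uut.A1[0][0], uut.A1[0][1]);
        $display("      | %0d %0d |", uut.A1[1][0], uut.A1[1][1]);

        // The flat output can be printed in matrix form by slicing it.
        $display("Res = | %0d %0d |", Res[31:24], Res[23:16]);
        $display("      | %0d %0d |", Res[15:8],  Res[7:0]);

        // Compare against the hand-computed product.
        if (Res !== {8'd19,8'd22,8'd43,8'd50})
            $display("FAIL: unexpected result %h", Res);
        else
            $display("PASS");
        $finish;
    end

endmodule
```

The hierarchical names `uut.A1`, `uut.B1` and `uut.Res1` are also one way to view the internal arrays element by element from a testbench.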


  1. When you multiply 8-bit elements and then add the products together (in this case, two multiplications and one addition for each element of the resultant matrix), the elements of the resultant matrix need to be much wider to avoid overflow. This code is not correct.

    1. The code is basic. It doesn't take care of overflows. That doesn't mean it's not correct. For small values of the matrix elements it will work fine.

  2. Sir, can you help us with this question?
    Implement an 8-bit ALU with an 8-bit register.
    a) Design an 8-bit ALU that takes X as input (e.g. A, B, or more inputs) and produces one 8-bit result. The ALU should have 5 operations for ALU and LOGIC.
    b) Design an 8x8-bit register file.
    c) Design a control unit.
    I hope you can help us.

    1. I can write the code for a fee. Contact me at lalnitt (at) gmail (dot) com.

  3. How did you display A1, B1 and Res1 in the simulation (the testbench code)?
    We are using ISE 14.2.

  4. When I tried this code in Vivado 2016.2, it threw an error during implementation. Synthesis works fine.
    'The design is empty. There are no leaf cells in the design.
    Check if opt_design has removed all the leaf cells of your design. Check whether you have instantiated and connected all of the top level ports.'
    Please let me know what the issue is. Thank you.


  5. The code works as you said. Thanks for your explanations.

  6. How did you display A1,B1 and Res1 in matrix form?