Verilog Coding Tips and Tricks: Verilog Code for Matrix Multiplication - for 2 by 2 Matrices

Wednesday, November 18, 2015

UPDATE: A better synthesizable matrix multiplier is available here.



Here is the Verilog code for a simple matrix multiplier. The input matrices are of fixed size 2 by 2 and so the output matrix is also fixed at 2 by 2. I have kept the size of each matrix element as 8 bits.

Verilog doesn't allow multi-dimensional arrays as input or output ports, so I have flattened each 2 by 2 matrix of 8-bit elements into a one-dimensional 32-bit vector. Inside the module I have created temporary two-dimensional arrays (three-dimensional if you count the 8 bits of each element), which are loaded from the inputs at the beginning of the always block.
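As a minimal sketch of that flattening, element (i, j) of a row-major 2 by 2 matrix sits 8*(2*i + j) bits below the top of the 32-bit vector. The module name flatten_demo and the helper function elem are made up for illustration and are not part of the design in this post.

```verilog
// Illustrative only: flatten_demo and elem are hypothetical names.
module flatten_demo;
    reg [31:0] A;

    // Return element (row, col) of a row-major 2x2 matrix packed into 32 bits.
    // Element (0,0) occupies bits [31:24], element (1,1) occupies bits [7:0].
    function [7:0] elem;
        input [31:0] flat;
        input integer row;
        input integer col;
        begin
            elem = flat[31 - 8*(2*row + col) -: 8];
        end
    endfunction

    initial begin
        A = {8'd1, 8'd2, 8'd3, 8'd4};   // first element lands in the top byte
        $display("A[0][0]=%0d A[0][1]=%0d", elem(A, 0, 0), elem(A, 0, 1));
        $display("A[1][0]=%0d A[1][1]=%0d", elem(A, 1, 0), elem(A, 1, 1));
    end
endmodule
```

With this packing the four elements read back as 1, 2, 3 and 4, in row-major order.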

The matrix multiplier is also synthesisable. When I synthesised it for a Virtex-4 FPGA using Xilinx XST, the maximum combinational path delay was 9 ns.

Matrix multiplier:

//Module for calculating Res = A*B
//where A, B and Res are 2 by 2 matrices.
module Mat_mult(A,B,Res);

    //input and output ports.
    //Each port is 32 bits wide: 2*2=4 elements, each of which is 8 bits wide.
    input [31:0] A;
    input [31:0] B;
    output [31:0] Res;
    //internal variables    
    reg [31:0] Res;
    reg [7:0] A1 [0:1][0:1];
    reg [7:0] B1 [0:1][0:1];
    reg [7:0] Res1 [0:1][0:1]; 
    integer i,j,k;

    always @(A or B)
    begin
    //Initialize the matrices - unpack the flat 32-bit inputs into 2 by 2 arrays of 8-bit elements
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        i = 0;
        j = 0;
        k = 0;
        {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]} = 32'd0; //initialize to zeros.
        //Matrix multiplication
        for(i=0;i < 2;i=i+1)
            for(j=0;j < 2;j=j+1)
                for(k=0;k < 2;k=k+1)
                    Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
        //final output assignment - pack the 2 by 2 result array back into the flat 32-bit output.
        Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};            
    end 

endmodule
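The module above keeps each result element at 8 bits, so any sum of products above 255 wraps around. Here is a minimal sketch of a wider variant, assuming unsigned elements: each 8 by 8 bit product needs up to 16 bits and the sum of two products up to 17 bits, so the output grows to 4*17 = 68 bits. The module name Mat_mult_wide and the 68-bit output layout are my own assumptions for illustration.

```verilog
// Sketch only: Mat_mult_wide is a hypothetical name; elements are assumed unsigned.
module Mat_mult_wide(A, B, Res);
    input  [31:0] A;      // four 8-bit elements, row-major, A[0][0] in bits [31:24]
    input  [31:0] B;
    output [67:0] Res;    // four 17-bit elements, row-major, Res[0][0] in bits [67:51]
    reg    [67:0] Res;

    reg [7:0]  A1 [0:1][0:1];
    reg [7:0]  B1 [0:1][0:1];
    reg [16:0] acc;       // 17 bits hold 2 * 255 * 255 = 130050 without overflow
    integer i, j, k;

    always @(A or B)
    begin
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        for (i = 0; i < 2; i = i + 1)
            for (j = 0; j < 2; j = j + 1)
            begin
                acc = 17'd0;
                for (k = 0; k < 2; k = k + 1)
                    // the 17-bit left-hand side widens both operands first,
                    // so the 16-bit product is not truncated
                    acc = acc + (A1[i][k] * B1[k][j]);
                Res[67 - 17*(2*i + j) -: 17] = acc;
            end
    end
endmodule
```

With every element of A and B set to 255, each result element should come out as 130050 instead of wrapping. The trade-off is a wider output port and wider adders.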

Testbench Code:

module tb;

    // Inputs
    reg [31:0] A;
    reg [31:0] B;
    // Outputs
    wire [31:0] Res;

    // Instantiate the Unit Under Test (UUT)
    Mat_mult uut (
        .A(A), 
        .B(B), 
        .Res(Res)
    );

    initial begin
        // Apply Inputs
        A = 0;  B = 0;  #100;
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
    end
      
endmodule
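For the stimulus above, the expected product can be worked out by hand: [1 2; 3 4] * [5 6; 7 8] = [19 22; 43 50], and every element fits in 8 bits. Below is a minimal self-checking sketch that prints the result in matrix form. The name tb_check is hypothetical, and the sketch assumes the Mat_mult module from this post is compiled alongside it.

```verilog
// Sketch only: tb_check is a hypothetical name; Mat_mult is the module from this post.
module tb_check;
    reg  [31:0] A, B;
    wire [31:0] Res;

    Mat_mult uut (.A(A), .B(B), .Res(Res));

    initial begin
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
        #10;  // let the combinational logic settle
        // print the flat result row by row
        $display("Res = | %0d %0d |", Res[31:24], Res[23:16]);
        $display("      | %0d %0d |", Res[15:8],  Res[7:0]);
        // compare against the hand-computed product
        if (Res === {8'd19,8'd22,8'd43,8'd50})
            $display("PASS");
        else
            $display("FAIL");
        $finish;
    end
endmodule
```

Slicing the flat vector in the testbench is one simple way to view the matrices without probing the internal arrays of the module.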

Simulation waveform:

The code was simulated using Xilinx ISE 13.1. The following waveform verifies that the design is working correctly.



20 comments:

  1. when you perform multiplication of 8 bit elements and then add them together (in this case 2 multiplications and one addition for obtaining each element of the resultant matrix), you need to have the elements of the resultant matrix to be much wider to avoid overflow. This code is not correct.

    Replies
    1. The code is basic. It doesn't handle overflow, but that doesn't mean it is incorrect. For small matrix element values it will work fine.

  2. sir..can you help us with this question.
    Implement 8 bits ALU with 8 bits register.
    a)Design 8 bits ALU that X as input (e.g:A,B..more inputs) and produces ones 8 bits result. The ALU should have 5 operation for ALU and LOGIC.
    b)Design a 8x8 bit register file
    c)Design a Control Unit
    I hope you can help us.
    THANK YOU.

    Replies
    1. I can write the code for a fee. Contact me at lalnitt (at) gmail (dot) com.

    2. I want 8 by 8 Matrix multiplication verilog code asap

  3. how did you display A1,B1 and Res1 in the simulation(the test bench code)?
    we are using 14.2 ISE

  4. When I tried this code in Vivado 2016.2, it threw an error during implementation. Synthesis works well.
    'The design is empty. there are no leaf cells in the design.
    Check if opt_design has removed all the leaf cells of your design. check whether you have initiated and connected all of the top level ports.'
    Please let me know what's the issue. Thank you.

  5. The code works as you said. thanks, for your explanations.

  6. How did you display A1,B1 and Res1 in matrix form?

  7. What changes can be done to implement matrix multiplication as FSM?

  8. please send the code for strassen matrix multiplication

  9. I created an IP from this code using Xilinx Vivado, then added this IP alongside the Zynq processing system IP and generated the bitstream successfully. Now I am trying to write code in C in XSDK,
    but my SDK code is not working properly.
    The code is below:

    #include "xparameters.h"
    #include "xil_io.h"
    #include "xbasic_types.h"
    #include <stdio.h>
    #include "myip_matix_Ani.h"


    #define MAT_A_ROWS 2
    #define MAT_A_COLS 2
    #define MAT_B_ROWS 2
    #define MAT_B_COLS 2

    int main()
    {
    int A[2][2], B[2][2],j, i;
    int C[2][2];


    xil_printf("enter 4 numbers for A matrix\n");

    for(i=0;i<2;i++)
    for(j=0;j<2;j++)
    scanf("%d", &A[i][j]);
    {
    if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
    Xil_Out32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)), A[i][j]);
    }

    xil_printf("enter 4 numbers for B matrix\n");


    for(i=0;i<2;i++)
    for(j=0;j<2;j++)
    scanf("%d", &B[i][j]);
    {

    if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+4+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
    Xil_Out32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+4+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)), B[i][j]);
    }
    for(i=0;i<2;i++)
    for(j=0;j<2;j++)

    {
    if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+8+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
    C[i][j]= Xil_In32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+8+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)));
    }
    xil_printf("{\r\n");
    for (i = 0; i < MAT_A_ROWS; i++)

    for (j = 0; j < MAT_B_COLS; j++)
    xil_printf("%d\n",C[i][j]);
    }

    can anyone tell me why it is not giving correct results..

    thanks

    Replies
    1. Hey, can I get the files you used for the IP with the Zynq processing system IP? Please, I am doing some experiments.

  10. Sir does it require FPGA kit for execution

  11. how to write verilog code for n*n matrix transpose

    Replies
    1. https://github.com/RodSernaPerez/verilog-utils/blob/master/transpose.v

  12. How to write code for a generic matrix multiplier? I want the code asap!!

  13. How to write code for transpose of matrix of size 64×64

  14. how to write 8x8 matrix multiplication
