UPDATE: A better synthesizable matrix multiplier is available here.
Here is the Verilog code for a simple matrix multiplier. The input matrices are fixed at 2 by 2, so the output matrix is also 2 by 2. Each matrix element is 8 bits wide.
Verilog doesn't allow multi-dimensional arrays as input or output ports, so I have flattened each matrix into a one-dimensional 32-bit vector at the ports. Inside the module I have declared temporary 2 by 2 arrays of 8-bit elements (three dimensions if you count the bits), which are loaded from the inputs at the beginning of the always block.
The matrix multiplier is also synthesisable. When synthesised for a Virtex 4 FPGA using Xilinx XST, a maximum combinational path delay of 9 ns was obtained.
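To make the flattening concrete, here is a small sketch of how the four elements sit inside a 32-bit port, using the same row-major order as the concatenations in the listing below. The module name flat_index_demo and its output ports are hypothetical and exist only for illustration; they are not part of the design.

```verilog
// Hypothetical helper, for illustration only: pulls the four 8-bit
// elements out of a flattened 2 by 2 matrix.
// Element (i,j) occupies bits [31 - 8*(2*i + j) -: 8].
module flat_index_demo(A, a00, a01, a10, a11);
    input  [31:0] A;
    output [7:0]  a00, a01, a10, a11;

    assign a00 = A[31 -: 8];  // same as A[31:24], row 0, column 0
    assign a01 = A[23 -: 8];  // same as A[23:16], row 0, column 1
    assign a10 = A[15 -: 8];  // same as A[15:8],  row 1, column 0
    assign a11 = A[7  -: 8];  // same as A[7:0],   row 1, column 1
endmodule
```

The indexed part-select (-:) needs Verilog-2001 or later. With this layout, A = {8'd1,8'd2,8'd3,8'd4} represents the matrix with rows (1, 2) and (3, 4).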
Matrix multiplier:
//Module for calculating Res = A*B
//where A, B and Res are 2 by 2 matrices.
module Mat_mult(A,B,Res);
    //input and output ports.
    //Each port is 32 bits: 2*2=4 elements, each of which is 8 bits wide.
    input [31:0] A;
    input [31:0] B;
    output [31:0] Res;
    //internal variables
    reg [31:0] Res;
    reg [7:0] A1 [0:1][0:1];
    reg [7:0] B1 [0:1][0:1];
    reg [7:0] Res1 [0:1][0:1];
    integer i,j,k;
    always@ (A or B)
    begin
        //Initialize the matrices - unpack the 1D ports into 2 by 2 arrays
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        i = 0;
        j = 0;
        k = 0;
        {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]} = 32'd0; //initialize to zeros.
        //Matrix multiplication
        for(i=0;i < 2;i=i+1)
            for(j=0;j < 2;j=j+1)
                for(k=0;k < 2;k=k+1)
                    Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
        //final output assignment - pack the 2 by 2 array back into the 1D port.
        Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};
    end
endmodule
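Each result element in the module above is stored in 8 bits, so any sum of products above 255 wraps around. If that matters for your data, one option is to widen the result elements. The sketch below is only an illustration: the module name Mat_mult_wide and the 68-bit output port are my assumptions, not part of the original design. It uses 17 bits per element, which is enough for two 8-bit by 8-bit products added together (2 * 255 * 255 = 130050, which is less than 2^17).

```verilog
// Sketch only: Mat_mult_wide and its 68-bit Res port are hypothetical.
// Same algorithm as Mat_mult, but each result element is 17 bits wide,
// so two 8x8-bit products can be summed without overflow.
module Mat_mult_wide(A, B, Res);
    input  [31:0] A;     // four 8-bit elements: a00, a01, a10, a11
    input  [31:0] B;     // four 8-bit elements: b00, b01, b10, b11
    output [67:0] Res;   // four 17-bit elements in the same order
    reg    [67:0] Res;

    reg [7:0]  A1   [0:1][0:1];
    reg [7:0]  B1   [0:1][0:1];
    reg [16:0] Res1 [0:1][0:1];
    integer i, j, k;

    always @(A or B)
    begin
        // Unpack the flattened inputs.
        {A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
        {B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
        for (i = 0; i < 2; i = i + 1)
            for (j = 0; j < 2; j = j + 1)
            begin
                Res1[i][j] = 17'd0;
                // The 17-bit left-hand side makes the multiply and add
                // evaluate at 17 bits, so nothing is truncated.
                for (k = 0; k < 2; k = k + 1)
                    Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
            end
        // Pack the four 17-bit results into the output port.
        Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};
    end
endmodule
```

The wider port costs extra logic and routing, so for small element values the 8-bit version may still be the better fit.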
Testbench Code:
module tb;
    // Inputs
    reg [31:0] A;
    reg [31:0] B;
    // Outputs
    wire [31:0] Res;
    // Instantiate the Unit Under Test (UUT)
    Mat_mult uut (
        .A(A),
        .B(B),
        .Res(Res)
    );
    initial begin
        // Apply Inputs
        A = 0; B = 0; #100;
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
    end
endmodule
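If you prefer to check the result without reading a waveform, the testbench can print and compare the output itself. The following is a minimal sketch under that assumption: the module name tb_check and the PASS/FAIL messages are hypothetical, and the expected values come from multiplying the same two test matrices by hand, which gives rows (19, 22) and (43, 50).

```verilog
// Sketch of a self-checking testbench; tb_check is a hypothetical name.
module tb_check;
    reg  [31:0] A;
    reg  [31:0] B;
    wire [31:0] Res;

    Mat_mult uut (
        .A(A),
        .B(B),
        .Res(Res)
    );

    initial begin
        A = {8'd1,8'd2,8'd3,8'd4};
        B = {8'd5,8'd6,8'd7,8'd8};
        #10;  // give the combinational logic time to settle
        // Print the flattened result in matrix form, one 8-bit slice per element.
        $display("Res = [%0d %0d; %0d %0d]",
                 Res[31:24], Res[23:16], Res[15:8], Res[7:0]);
        // Compare against the hand-computed product.
        if (Res !== {8'd19,8'd22,8'd43,8'd50})
            $display("FAIL");
        else
            $display("PASS");
        $finish;
    end
endmodule
```

Slicing the flat Res vector this way is also a simple route to viewing the matrices element by element when the simulator does not show the internal arrays A1, B1 and Res1.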
Simulation waveform:
The code was simulated using Xilinx ISE 13.1. The following waveform verifies that the design is working correctly.
When you multiply 8-bit elements and then add the products together (in this case two multiplications and one addition for each element of the resultant matrix), the elements of the resultant matrix need to be much wider to avoid overflow. This code is not correct.
The code is basic. It doesn't take care of overflows. That doesn't mean it's not correct. For small matrix element values it will work fine.
Sir, can you help us with this question?
Implement an 8-bit ALU with an 8-bit register.
a) Design an 8-bit ALU that takes X as input (e.g. A, B, or more inputs) and produces one 8-bit result. The ALU should have 5 operations for arithmetic and logic.
b) Design an 8x8-bit register file.
c) Design a control unit.
I hope you can help us.
Thank you.
I can write the code for a fee. Contact me at lalnitt (at) gmail (dot) com.
I want 8 by 8 matrix multiplication Verilog code ASAP.
Did you get the code?
How did you display A1, B1 and Res1 in the simulation (the testbench code)?
We are using ISE 14.2.
When I tried this code in Vivado 2016.2, it throws an error during implementation. Synthesis works fine.
'The design is empty. there are no leaf cells in the design.
Check if opt_design has removed all the leaf cells of your design. check whether you have initiated and connected all of the top level ports.'
Please let me know what the issue is. Thank you.
The code works as you said. Thanks for your explanations.
How did you display A1, B1 and Res1 in matrix form?
What changes can be made to implement matrix multiplication as an FSM?
Please send the code for Strassen matrix multiplication.
I created an IP from this code using Xilinx Vivado, added it to a design with the Zynq processing system IP, and generated the bitstream successfully. Now I am trying to write the code in XSDK in C.
But my SDK code is not working properly.
The code is below:
#include "xparameters.h"
#include "xil_io.h"
#include "xbasic_types.h"
#include <stdio.h>
#include "myip_matix_Ani.h"
#define MAT_A_ROWS 2
#define MAT_A_COLS 2
#define MAT_B_ROWS 2
#define MAT_B_COLS 2
int main()
{
int A[2][2], B[2][2],j, i;
int C[2][2];
xil_printf("enter 4 numbers for A matrix\n");
for(i=0;i<2;i++)
for(j=0;j<2;j++)
scanf("%d", &A[i][j]);
{
if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
Xil_Out32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)), A[i][j]);
}
xil_printf("enter 4 numbers for B matrix\n");
for(i=0;i<2;i++)
for(j=0;j<2;j++)
scanf("%d", &B[i][j]);
{
if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+4+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
Xil_Out32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+4+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)), B[i][j]);
}
for(i=0;i<2;i++)
for(j=0;j<2;j++)
{
if((XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+8+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)))<=XPAR_MYIP_MATIX_ANI_0_S00_AXI_HIGHADDR)
C[i][j]= Xil_In32(XPAR_MYIP_MATIX_ANI_0_S00_AXI_BASEADDR+8+(j*sizeof(int)+i*MAT_A_ROWS*sizeof(int)));
}
xil_printf("{\r\n");
for (i = 0; i < MAT_A_ROWS; i++)
for (j = 0; j < MAT_B_COLS; j++)
xil_printf("%d\n",C[i][j]);
}
Can anyone tell me why it is not giving correct results?
Thanks.
Hey, can I get the files you used for the IP with the Zynq processing system IP? Please, I am doing some experiments.
Sir, does it require an FPGA kit for execution?
How to write Verilog code for an n*n matrix transpose?
https://github.com/RodSernaPerez/verilog-utils/blob/master/transpose.v
How to write code for a generic matrix multiplier? I want the code asap!!
How to write code for the transpose of a matrix of size 64×64?
How to write 8x8 matrix multiplication?