A few years back I wrote VHDL code for a 4-bit Wallace tree multiplier. In this post I convert that VHDL into Verilog. A Wallace tree multiplier is much faster than a simple array multiplier because it adds the partial products in parallel with a tree of adders, so the number of adder stages grows roughly logarithmically with the operand width instead of linearly.
The design uses the half adder and full adder Verilog modules I implemented a few weeks back. These modules are instantiated to build the 4-bit Wallace multiplier.
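The earlier half adder and full adder modules are not reproduced in this post. The sketch below is a hypothetical reconstruction, not the original code, so that the multiplier can be compiled on its own. The port names (a, b, cin, sum, carry) are assumptions. Only the port order matters, because the wallace module connects these cells positionally: inputs first, then sum, then carry.

```verilog
// Hypothetical reconstruction of the two leaf cells (not the original modules).
// Port order assumed from the positional instantiations in module wallace:
//   half_adder(a, b, sum, carry)
//   full_adder(a, b, cin, sum, carry)
module half_adder(a, b, sum, carry);
input a, b;
output sum, carry;
assign sum   = a ^ b;  // sum bit stays in the same column
assign carry = a & b;  // carry moves to the next (higher-weight) column
endmodule

module full_adder(a, b, cin, sum, carry);
input a, b, cin;
output sum, carry;
assign sum   = a ^ b ^ cin;                     // odd number of 1s -> sum = 1
assign carry = (a & b) | (a & cin) | (b & cin); // at least two 1s -> carry = 1
endmodule
```

A full adder is symmetric in its three inputs, so the order in which wallace passes the three addends does not matter. Only the positions of the sum and carry outputs do.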
4-bit Wallace tree multiplier:
module wallace(A,B,prod);
//inputs and outputs
input [3:0] A,B;
output [7:0] prod;
//internal variables (sum and carry outputs of each adder).
wire s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s34,s35,s36,s37;
wire c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c34,c35,c36,c37;
//partial products: p_i[j] = A[j] & B[i], with weight 2^(i+j).
wire [3:0] p0,p1,p2,p3;
assign p0 = A & {4{B[0]}};
assign p1 = A & {4{B[1]}};
assign p2 = A & {4{B[2]}};
assign p3 = A & {4{B[3]}};
//final product assignments
assign prod[0] = p0[0];
assign prod[1] = s11;
assign prod[2] = s22;
assign prod[3] = s32;
assign prod[4] = s34;
assign prod[5] = s35;
assign prod[6] = s36;
assign prod[7] = s37;
//first stage: one adder per column of weight 2^1 to 2^5 (p3[0] and p3[3] pass through)
half_adder ha11 (p0[1],p1[0],s11,c11);
full_adder fa12(p0[2],p1[1],p2[0],s12,c12);
full_adder fa13(p0[3],p1[2],p2[1],s13,c13);
full_adder fa14(p1[3],p2[2],p3[1],s14,c14);
half_adder ha15(p2[3],p3[2],s15,c15);
//second stage
//note: fa24 uses c32 (from ha32 in the third stage) and fa25/fa26 use c24/c25,
//so the second and third stages together propagate carries column by column.
half_adder ha22 (c11,s12,s22,c22);
full_adder fa23 (p3[0],c12,s13,s23,c23);
full_adder fa24 (c13,c32,s14,s24,c24);
full_adder fa25 (c14,c24,s15,s25,c25);
full_adder fa26 (c15,c25,p3[3],s26,c26);
//third stage
half_adder ha32(c22,s23,s32,c32);
half_adder ha34(c23,s24,s34,c34);
half_adder ha35(c34,s25,s35,c35);
half_adder ha36(c35,s26,s36,c36);
half_adder ha37(c36,c26,s37,c37); //c37 is always 0 because 15*15 = 225 fits in 8 bits
endmodule
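To see why each adder receives the inputs it does, it may help to write the product in terms of the partial-product bits. With $p_i[j] = A_j \wedge B_i$ (the AND performed by the four `assign p...` lines), each bit has weight $2^{i+j}$, so the adders are reducing these columns:

```latex
P = A \cdot B = \sum_{i=0}^{3} \sum_{j=0}^{3} p_i[j]\, 2^{i+j},
\qquad p_i[j] = A_j \wedge B_i

\begin{array}{c|l}
\text{weight} & \text{bits in the column} \\ \hline
2^0 & p_0[0] \\
2^1 & p_0[1],\ p_1[0] \\
2^2 & p_0[2],\ p_1[1],\ p_2[0] \\
2^3 & p_0[3],\ p_1[2],\ p_2[1],\ p_3[0] \\
2^4 & p_1[3],\ p_2[2],\ p_3[1] \\
2^5 & p_2[3],\ p_3[2] \\
2^6 & p_3[3]
\end{array}
```

A sum output stays in its column, while a carry out of column $2^k$ is added into column $2^{k+1}$. For example, ha11 reduces column $2^1$ to s11 (which is prod[1]) and sends c11 into column $2^2$, where ha22 adds it to s12.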
Testbench code:
The testbench checks the result for the whole range of inputs A and B, that is, all 16*16 = 256 input combinations.
module tb;
// Inputs
reg [3:0] A;
reg [3:0] B;
// Outputs
wire [7:0] prod;
integer i,j,error;
// Instantiate the Unit Under Test (UUT)
wallace uut (
.A(A),
.B(B),
.prod(prod)
);
initial begin
// Apply inputs for the whole range of A and B.
// 16*16 = 256 inputs.
error = 0;
for(i=0;i <=15;i = i+1)
for(j=0;j <=15;j = j+1)
begin
A = i;
B = j;
#1;
//A*B is evaluated at 8 bits here because it is compared with the 8-bit prod.
if(prod != A*B) //if the result isn't correct, increment "error".
error = error + 1;
end
//report how many of the 256 cases failed
$display("Test finished with %0d error(s).", error);
$finish;
end
endmodule
Simulation waveform:
The code was simulated using Xilinx ISE 13.1, and its functionality was verified. A part of the waveform is pasted below:
Synthesis Results:
The design was successfully synthesised for a Virtex-4 FPGA, with a maximum combinational path delay of 8.652 ns.