Verilog Coding Tips and Tricks: Verilog code for 4 bit Wallace tree multiplier

Monday, November 16, 2015

Verilog code for 4 bit Wallace tree multiplier

Few years back I wrote the VHDL code for a 4 bit Wallace tree multiplier. In this post I want to convert the VHDL into a Verilog code. A Wallace tree multiplier is much faster than the normal multiplier designs.

The design uses half adder and full adder Verilog designs I have implemented few weeks. back. These modules will be instantiated for the implementation 4 bit Wallace multiplier.

4 bit Wallace tree multiplier:

module wallace(A,B,prod);
    //inputs and outputs
    input [3:0] A,B;
    output [7:0] prod;
    //internal variables.
    wire s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s33,s34,s35,s36,s37;
    wire c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c33,c34,c35,c36,c37;
    wire [6:0] p0,p1,p2,p3;

//initialize the p's.
    assign  p0 = A & {4{B[0]}};
    assign  p1 = A & {4{B[1]}};
    assign  p2 = A & {4{B[2]}};
    assign  p3 = A & {4{B[3]}};

//final product assignments    
    assign prod[0] = p0[0];
    assign prod[1] = s11;
    assign prod[2] = s22;
    assign prod[3] = s32;
    assign prod[4] = s34;
    assign prod[5] = s35;
    assign prod[6] = s36;
    assign prod[7] = s37;

//first stage
    half_adder ha11 (p0[1],p1[0],s11,c11);
    full_adder fa12(p0[2],p1[1],p2[0],s12,c12);
    full_adder fa13(p0[3],p1[2],p2[1],s13,c13);
    full_adder fa14(p1[3],p2[2],p3[1],s14,c14);
    half_adder ha15(p2[3],p3[2],s15,c15);

//second stage
    half_adder ha22 (c11,s12,s22,c22);
    full_adder fa23 (p3[0],c12,s13,s23,c23);
    full_adder fa24 (c13,c32,s14,s24,c24);
    full_adder fa25 (c14,c24,s15,s25,c25);
    full_adder fa26 (c15,c25,p3[3],s26,c26);

//third stage
    half_adder ha32(c22,s23,s32,c32);
    half_adder ha34(c23,s24,s34,c34);
    half_adder ha35(c34,s25,s35,c35);
    half_adder ha36(c35,s26,s36,c36);
    half_adder ha37(c36,c26,s37,c37);


Testbench code:

The testbench code checks the correctness of results for the whole range of inputs A and B. 

module tb;

    // Inputs
    reg [3:0] A;
    reg [3:0] B;

    // Outputs
    wire [7:0] prod;
    integer i,j,error;

    // Instantiate the Unit Under Test (UUT)
    wallace uut (

    initial begin
        // Apply inputs for the whole range of A and B.
        // 16*16 = 256 inputs.
        error = 0;
        for(i=0;<=15;= i+1)
            for(j=0;<=15;= j+1) 
                A <= i; 
                B <= j;
                if(prod != A*B) //if the result isnt correct increment "error".
                    error = error + 1;  

Simulation waveform:

The codes were simulated using Xilinx ISE 13.1. The functionality of the codes were verified. A part of the waveform is pasted below:

Synthesis Results:

The design was successfully synthesised for Virtex 4 fpga and a maximum combinational path delay of 8.652ns was obtained.


  1. Can you show the dot diagram of Wallace tree multiplier with stages how you wrote the code

  2. can you upload vhdl code for 8 bit wallace tree multiplier

  3. can you upload vhdl code for 8 bit wallace tree multiplier

  4. This comment has been removed by the author.