Verilog Coding Tips and Tricks: square root

Some time ago I shared a Verilog function for finding the square root of a number. That function was also synthesisable, but because it was implemented with a conventional 'for' loop, it was purely combinatorial. For relatively wide inputs, the resource usage becomes very high.

In such cases, it makes sense to use a clocked design, which lets us reuse one set of resources over and over. The advantage of such a design is that it uses far fewer resources; the disadvantage is lower speed, because the result takes several clock cycles to appear.

For example, with the design shared in this post, finding the square root of an N-bit number takes N/2 clock cycles.

The code is written based on Figure (8) from this paper: A New Non-Restoring Square Root Algorithm and Its VLSI Implementations.

The code is well commented, so I won't write much about how it works here. Please refer to the block diagram in the paper if you have any doubts.
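To make the "reuse one set of resources" idea a bit more concrete, the add/subtract step from the block diagram can be pulled out into a small combinational block. This is only a sketch: the module name sqrt_step and its port names are made up for illustration and are not part of the design below, which keeps the same logic inline inside its clocked block.

```verilog
//Hypothetical helper, not part of the original design: one add/subtract
//step of the non-restoring square root as pure combinational logic.
module sqrt_step
    #(parameter N = 32)
    (   input [N/2+1:0] r_in,     //partial remainder. MSB is the sign bit.
        input [N/2-1:0] q_in,     //partial root found so far.
        input [1:0] d_pair,       //next two bits of the input, MSB first.
        output [N/2+1:0] r_out,   //updated partial remainder.
        output [N/2-1:0] q_out    //partial root with one more bit.
    );

    //The two operands of the adder/subtractor.
    wire [N/2+1:0] left  = {r_in[N/2-1:0], d_pair};
    wire [N/2+1:0] right = {q_in, r_in[N/2+1], 1'b1};

    //Add if the previous remainder was negative, subtract otherwise.
    assign r_out = r_in[N/2+1] ? (left + right) : (left - right);
    //The new root bit is 1 when the updated remainder is not negative.
    assign q_out = {q_in[N/2-2:0], ~r_out[N/2+1]};

endmodule
```

A clocked design only needs one copy of this step plus registers for the remainder and the root, whereas unrolling the loop combinatorially needs N/2 copies chained together.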

Here is the code:

square_root.v:

//Synthesisable Design for Finding Square root of a number.
module square_root
    #(parameter N = 32)
    (   input Clock,  //Clock
        input reset,  //Asynchronous active high reset.      
        input [N-1:0] num_in,   //this is the number for which we want to find square root.
        output reg done,     //This signal goes high when output is ready
        output reg [N/2-1:0] sq_root  //square root of 'num_in'
    );

    reg [N-1:0] a;   //original input.
    reg [N/2+1:0] left,right;     //inputs to the adder/subtractor.
    reg signed [N/2+1:0] r;       //remainder.
    reg [N/2-1:0] q;    //result.
    integer i;   //index of the loop. 

    always @(posedge Clock or posedge reset) 
    begin
        if (reset == 1) begin   //reset the variables.
            done <= 0;
            sq_root <= 0;
            i = 0;
            a = 0;
            left = 0;
            right = 0;
            r = 0;
            q = 0;
        end    
        else begin
            //Before we start the first clock cycle get the 'input' to the variable 'a'.
            if(i == 0) begin  
                a = num_in;
                done <= 0;    //reset 'done' signal.
                i = i+1;   //increment the loop index.
            end
            else if(i < N/2) begin //keep incrementing the loop index.
                i = i+1;  
            end
            //These statements below are derived from the block diagram.
            right = {q,r[N/2+1],1'b1};
            left = {r[N/2-1:0], a[N-1:N-2]};
            a = {a[N-3:0], 2'b0};  //shifting left by 2 bit.
            if ( r[N/2+1] == 1)    //add or subtract as per this bit.
                r = left + right;
            else
                r = left - right;
            q = {q[N/2-2:0], ~r[N/2+1]};
            if(i == N/2) begin    //The loop index has reached its maximum value.
                done <= 1;    //make 'done' high because output is ready.
                i = 0; //reset loop index for beginning the next cycle.
                sq_root <= q;   //assign 'q' to the output port.
                //reset other signals for using in the next cycle.
                left = 0;
                right = 0;
                r = 0;
                q = 0;
            end
        end    
    end

endmodule

Testbench: tb.v

//Testbench for our square root calculator design.
module tb;  //testbench module is always empty. No input or output ports.

reg Clock, reset;
wire done;
parameter N = 16;   //width of the input.
reg [N-1:0] num_in;
reg [N:0] i;
wire [N/2-1:0] sq_root;
integer error,actual_result;  //'error' counts the mismatches found during simulation; 'actual_result' holds the expected root.
parameter Clock_period = 10;    //Change clock period here. 

//Apply the inputs to the design and check if the results are correct. 
//The number of inputs for which the results were wrongly calculated are counted by 'error'. 
initial
begin
    Clock = 1;
    error = 0;
    i=1;
    //First we apply reset input for one clock period.
    reset = 1;
    #Clock_period;
    reset = 0;
    //Test the design for all the combination of inputs.
    //Since we have (2^16)-1 inputs, we test all of them one by one. 
    while(i<=2**N-1) begin
        apply_input(i);
        i = i+1;    
    end
    #Clock_period;
    reset = 1;   //all inputs are tested. Apply reset
    num_in = 0;     //reset the 'num_in'
    $stop;  //Stop the simulation, as we have finished testing the design.
end

task apply_input;
    input [N:0] i;
begin
    num_in = i[N-1:0];  
    wait(~done);    //wait for the 'done' to finish its previous high state
    wait(done); //wait until 'done' output goes High.
    wait(~Clock);   //we sample the output at the falling edge of the clock.
    actual_result = $rtoi($floor($pow(i,0.5))); //Calculate the actual result.
    //if actual result and calculated result are different increment 'error' by 1.
    if(actual_result != sq_root) 
        error = error + 1; 
end
endtask

//generate the clock for testing the design: the period is Clock_period time units (100 MHz if the time unit is 1 ns).
always #(Clock_period/2) Clock <= ~Clock;

//Instantiate the square root calculator
square_root #(.N(N)) find_sq_root 
        (.Clock(Clock), 
        .reset(reset), 
        .num_in(num_in), 
        .done(done),
        .sq_root(sq_root)
        );

endmodule   //End of testbench.

Simulation Waveform from ModelSim:

To reach the end of the testbench, you only need to simulate for about 5.5 msec of simulation time. The simulation stops automatically once all the input combinations have been tested.
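If you want to check the N/2-cycle latency yourself, a small extra testbench along these lines may help. It is a rough sketch, not part of the original post: it assumes square_root.v from above is compiled alongside it, and the names tb_latency, cycles and pulses are made up for this example.

```verilog
//Sketch only: assumes square_root.v from this post is compiled alongside.
//'tb_latency', 'cycles' and 'pulses' are names made up for this example.
module tb_latency;

parameter N = 16;   //width of the input.
reg Clock, reset;
reg [N-1:0] num_in;
wire done;
wire [N/2-1:0] sq_root;
integer cycles;   //rising clock edges seen since the last 'done' pulse.
integer pulses;   //number of 'done' pulses seen so far.

square_root #(.N(N)) dut
        (.Clock(Clock),
        .reset(reset),
        .num_in(num_in),
        .done(done),
        .sq_root(sq_root)
        );

always #5 Clock = ~Clock;   //clock period of 10 time units.

initial begin
    Clock = 0;
    pulses = 0;
    cycles = 0;
    num_in = 1234;   //any constant input will do.
    reset = 1;
    #12;   //hold reset across the first rising edge, release between edges.
    reset = 0;
end

//Count rising clock edges while out of reset.
always @(posedge Clock)
    cycles = reset ? 0 : cycles + 1;

//Sample on the falling edge, when 'done' and 'sq_root' are stable.
always @(negedge Clock)
    if (done == 1) begin
        $display("done after %0d clock cycles, sq_root = %0d", cycles, sq_root);
        cycles = 0;
        pulses = pulses + 1;
        if (pulses == 3)
            $finish;
    end

endmodule
```

If the N/2 figure holds, each line should report 8 clock cycles for N = 16, and sq_root should read 35, the integer part of the square root of 1234.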

UPDATE: A CLOCKED SQUARE ROOT CALCULATOR IN VERILOG IS AVAILABLE HERE.

In Verilog, there is no built-in operator for finding the square root of a number. There are algorithms for doing this, but you have to write the code from scratch.

Here, I want to share a Verilog function for finding the square root of a binary number. The function is based on the "Non-Restoring Square Root" algorithm. You can learn more about the algorithm from this paper. The function takes a 32-bit input number and returns a 16-bit square root. The block diagram of the algorithm is given below:

     Here D is the input number. R is the remainder of the operation for non-perfect squares. Q contains the square root of 'D'.

The Verilog function along with the testbench code is given below:

module testbench;

reg [15:0] sqr;

//Verilog function to find square root of a 32 bit number.
//The output is 16 bit.
function [15:0] sqrt;
    input [31:0] num;  //declare input
    //intermediate signals.
    reg [31:0] a;
    reg [15:0] q;
    reg [17:0] left,right,r;
    integer i;
begin
    //initialize all the variables.
    a = num;
    q = 0;
    i = 0;
    left = 0;   //input to adder/sub
    right = 0;  //input to adder/sub
    r = 0;  //remainder
    //run the calculations for 16 iterations.
    for(i=0;i<16;i=i+1) begin
        right = {q,r[17],1'b1};
        left = {r[15:0],a[31:30]};
        a = {a[29:0],2'b00};   //left shift by 2 bits.
        if (r[17] == 1) //add if r is negative
            r = left + right;
        else   //subtract if r is positive
            r = left - right;
        q = {q[14:0],!r[17]};
    end
    sqrt = q;   //final assignment of output.
end
endfunction //end of Function

//simulation-Apply inputs.
initial begin
    sqr = sqrt(32'd4000000);   #100;
    sqr = sqrt(32'd96100);  #100;
    sqr = sqrt(32'd25); #100;
    sqr = sqrt(32'd100000000);  #100;
    sqr = sqrt(32'd33); #100;
    sqr = sqrt(32'd3300);   #100;
    sqr = sqrt(32'd330000); #100;
    sqr = sqrt(32'd3300000000); #100;
end

endmodule

Simulation waveform:

The code was synthesised and simulated using Xilinx ISE 14.6.

From the waveform, you can see that for perfect squares we get the exact square root as the result. For other inputs the fractional part is dropped, so the result is the integer part of the true root and the error is always less than 1.

One way to increase the precision is to append an even number of zeros to the right of the decimal input, that is, to multiply it by a power of 100. Each pair of zeros adds one decimal digit to the result.

Notice the last 4 inputs in the testbench:

The actual square root of 33 is 5.7445, but we get 5.

For input 3300 we get 57 as output. Adding 2 zeros gives one fractional digit in the output.

For input 330000 we get 574 as output. Adding 4 zeros gives two fractional digits in the output.

For input 3300000000 we get 57445 as output. Adding 8 zeros gives four fractional digits in the output.
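The same trick works in binary, which is usually more convenient in hardware: shifting the input left by 2k bits should give a root with k fractional bits. Below is a minimal sketch of that idea. The names tb_fraction, FRAC_BITS, scaled and root_fx are made up for this example, and the function body is simply repeated from above so that the sketch compiles on its own.

```verilog
//Sketch only: 'tb_fraction', 'FRAC_BITS', 'scaled' and 'root_fx' are
//names made up for this example.
module tb_fraction;

//Same algorithm as the sqrt function above, repeated here so that the
//sketch compiles on its own.
function [15:0] sqrt;
    input [31:0] num;
    reg [31:0] a;
    reg [15:0] q;
    reg [17:0] left,right,r;
    integer i;
begin
    a = num;
    q = 0;
    r = 0;
    for(i=0;i<16;i=i+1) begin
        right = {q,r[17],1'b1};
        left = {r[15:0],a[31:30]};
        a = {a[29:0],2'b00};
        if (r[17] == 1)
            r = left + right;
        else
            r = left - right;
        q = {q[14:0],!r[17]};
    end
    sqrt = q;
end
endfunction

localparam FRAC_BITS = 8;   //number of fractional bits wanted.
reg [15:0] x;        //value to take the root of.
reg [31:0] scaled;   //x multiplied by 4**FRAC_BITS.
reg [15:0] root_fx;  //root of x multiplied by 2**FRAC_BITS.

initial begin
    x = 16'd33;
    //Shifting left by 2*FRAC_BITS bits multiplies x by 4**FRAC_BITS,
    //so the root comes out multiplied by 2**FRAC_BITS.
    scaled = {16'd0, x} << (2*FRAC_BITS);
    root_fx = sqrt(scaled);
    //33 << 16 = 2162688, whose integer root is 1470 (1470/256 = 5.742).
    $display("root_fx = %0d, integer part = %0d, fraction = %0d/%0d",
             root_fx, root_fx >> FRAC_BITS, root_fx[FRAC_BITS-1:0], 1 << FRAC_BITS);
    //The result should be the largest integer whose square fits in 'scaled'.
    if (root_fx*root_fx <= scaled && scaled < (root_fx+1)*(root_fx+1))
        $display("result is the integer part of the true root");
    else
        $display("result is NOT the integer part of the true root");
end

endmodule
```

With 8 fractional bits the input itself has to fit in 16 bits, since the function only accepts 32 bits in total.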

Tuesday, December 15, 2020

Synthesizable Clocked Square Root Calculator In Verilog

Thursday, November 2, 2017

A Verilog Function for finding SQUARE ROOT
