Verilog Coding Tips and Tricks: synthesisable


Tuesday, December 15, 2020

Synthesizable Clocked Square Root Calculator In Verilog

A long time back I shared a Verilog function for finding the square root of a number. That function was synthesisable too, but since it was implemented with a conventional 'for' loop, it was purely combinatorial. For relatively large input numbers, its resource usage becomes very high.

In such cases, it makes sense to use a clocked design, which lets us reuse one set of resources over and over. The advantage is that it uses far fewer resources; the disadvantage is lower speed, since the result takes several clock cycles to appear.

For example, with the design shared in this post, finding the square root of an N-bit number takes N/2 clock cycles.

The code is written based on Figure (8) from this paper: A New Non-Restoring Square Root Algorithm and Its VLSI Implementations.

The code is well commented, so I won't write much about how it works here. Please refer to the block diagram in the paper if you have any doubts.

Let me share the code now:

square_root.v:

//Synthesisable Design for Finding Square root of a number.
module square_root
    #(parameter N = 32)
    (   input Clock,  //Clock
        input reset,  //Asynchronous active high reset.      
        input [N-1:0] num_in,   //this is the number for which we want to find square root.
        output reg done,     //This signal goes high when output is ready
        output reg [N/2-1:0] sq_root  //square root of 'num_in'
    );

    reg [N-1:0] a;   //original input.
    reg [N/2+1:0] left,right;     //input to adder/sub.r-remainder.
    reg signed [N/2+1:0] r;
    reg [N/2-1:0] q;    //result.
    integer i;   //index of the loop. 

    always @(posedge Clock or posedge reset) 
    begin
        if (reset == 1) begin   //reset the variables.
            done <= 0;
            sq_root <= 0;
            i = 0;
            a = 0;
            left = 0;
            right = 0;
            r = 0;
            q = 0;
        end    
        else begin
            //Before we start the first clock cycle get the 'input' to the variable 'a'.
            if(i == 0) begin  
                a = num_in;
                done <= 0;    //reset 'done' signal.
                i = i+1;   //increment the loop index.
            end
            else if(i < N/2) begin //keep incrementing the loop index.
                i = i+1;  
            end
            //These statements below are derived from the block diagram.
            right = {q,r[N/2+1],1'b1};
            left = {r[N/2-1:0], a[N-1:N-2]};
            a = {a[N-3:0], 2'b0};  //shifting left by 2 bit.
            if ( r[N/2+1] == 1)    //add or subtract as per this bit.
                r = left + right;
            else
                r = left - right;
            q = {q[N/2-2:0], ~r[N/2+1]};
            if(i == N/2) begin    //This means the max value of loop index has reached. 
                done <= 1;    //make 'done' high because output is ready.
                i = 0; //reset loop index for beginning the next cycle.
                sq_root <= q;   //assign 'q' to the output port.
                //reset other signals for using in the next cycle.
                left = 0;
                right = 0;
                r = 0;
                q = 0;
            end
        end    
    end

endmodule

Testbench: tb.v

//Testbench for our square root calculator design.
module tb;  //testbench module is always empty. No input or output ports.

reg Clock, reset;
wire done;
parameter N = 16;   //width of the input.
reg [N-1:0] num_in;
reg [N:0] i;
wire [N/2-1:0] sq_root;
integer error,actual_result;  //this indicates the number of errors encountered during simulation.
parameter Clock_period = 10;    //Change clock period here. 

//Apply the inputs to the design and check if the results are correct. 
//The number of inputs for which the results were wrongly calculated are counted by 'error'. 
initial
begin
    Clock = 1;
    error = 0;
    i=1;
    //First we apply reset input for one clock period.
    reset = 1;
    #Clock_period;
    reset = 0;
    //Test the design for all the combination of inputs.
    //Since we have (2^16)-1 inputs, we test all of them one by one. 
    while(i<=2**N-1) begin
        apply_input(i);
        i = i+1;    
    end
    #Clock_period;
    reset = 1;   //all inputs are tested. Apply reset
    num_in = 0;     //reset the 'num_in'
    $stop;  //Stop the simulation, as we have finished testing the design.
end

task apply_input;
    input [N:0] i;
begin
    num_in = i[N-1:0];  
    wait(~done);    //wait for the 'done' to finish its previous high state
    wait(done); //wait until 'done' output goes High.
    wait(~Clock);   //we sample the output at the falling edge of the clock.
    actual_result = $rtoi($floor($pow(i,0.5))); //Calculate the actual result.
    //if actual result and calculated result are different increment 'error' by 1.
    if(actual_result != sq_root) 
        error = error + 1; 
end
endtask

//generate a clock with a period of 'Clock_period' time units (100 MHz with a 1 ns time unit).
always #(Clock_period/2) Clock <= ~Clock;

//Instantiate the square root calculator
square_root #(.N(N)) find_sq_root 
        (.Clock(Clock), 
        .reset(reset), 
        .num_in(num_in), 
        .done(done),
        .sq_root(sq_root)
        );

endmodule   //End of testbench.

Simulation Waveform from ModelSim:

To reach the end of the testbench, you only need to simulate for about 5.5 ms of simulation time. The simulation stops automatically once all the input combinations have been tested.
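If your simulator does not support real-valued system functions such as $pow, the result can also be checked with integer arithmetic only. The sketch below is not part of the original testbench; the module name 'sqrt_check_demo' and the function name 'is_floor_sqrt' are made up for illustration. It relies on the fact that q is the integer square root of n exactly when q*q <= n < (q+1)*(q+1).

```verilog
//Hypothetical helper, not part of the original testbench.
module sqrt_check_demo;

parameter N = 16;   //width of the input, same meaning as in the testbench.

//Returns 1 when 'q' is the integer (floor) square root of 'n'.
function is_floor_sqrt;
    input [N-1:0] n;
    input [N/2-1:0] q;
    reg [N+1:0] lo, hi;     //wide enough to hold (q+1)*(q+1) without overflow.
begin
    lo = q * q;
    hi = (q + 1) * (q + 1);
    is_floor_sqrt = (lo <= n) && (n < hi);
end
endfunction

initial begin
    $display("%b", is_floor_sqrt(16'd33, 8'd5));        //5*5 <= 33 < 6*6, prints 1
    $display("%b", is_floor_sqrt(16'd33, 8'd6));        //6*6 > 33, prints 0
    $display("%b", is_floor_sqrt(16'd65535, 8'd255));   //255*255 <= 65535 < 256*256, prints 1
end

endmodule
```

In a testbench like the one above, the comparison against 'actual_result' could then be replaced by a call such as is_floor_sqrt(num_in, sq_root).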

Sunday, December 13, 2020

Generic Verilog Code for Binary to Gray and Gray to Binary converter

A few years back I wrote a 4-bit converter for conversion between Gray and binary codes. After receiving a lot of positive responses, I decided to write a generic version of the same.

Let me share the codes...

Binary to Gray Code Converter:

//Binary to Gray code Converter
//The 'parameter' keyword below is how we give the inputs/outputs a generic size.
module bin2gray #(parameter N = 4)
        ( input [N-1:0] bin,    //binary input
        output [N-1:0] G);      //Gray output

assign G[N-1] = bin[N-1];

//generate xor gates.
//the loop index needs to be declared as 'genvar'; this can be done
//inside the 'for' loop itself, as shown below (this inline form needs SystemVerilog).
//The generated instances are labelled 'xor_gates_b2g'. 
//Always put a label when you generate instantiations.
//The 'generate' keyword need not be explicitly written.
for(genvar i=N-2;i>=0;i=i-1) begin : xor_gates_b2g
    xor(G[i],bin[i+1],bin[i]);
end

endmodule

Gray Code to Binary Converter:

//Gray code to Binary Converter
module gray2bin #(parameter N = 4)
        ( input [N-1:0] G,    //Gray input
        output [N-1:0] bin);      //Binary output

assign bin[N-1] = G[N-1];

for(genvar i=N-2;i>=0;i=i-1) begin : xor_gates_g2b
    xor(bin[i],G[i],bin[i+1]);
end

endmodule

Testbench:

//Testbench which connects both the converters back to back.
module tb;  //testbench module is always empty.

parameter N = 16;   //Change this to control the number of bits in the input/output.
reg [N-1:0] bin;
wire [N-1:0] G,bin_out;
reg [N:0] i;
integer error;  //this counts the number of errors during simulation.

    //Both the converters are connected back to back to see the binary input going to the
    //first module is the same as the output coming out of the second module.
    bin2gray #(.N(N)) uut1
        (
          .bin(bin),
          .G(G)
        );
 
    gray2bin #(.N(N)) uut2
        (
          .G(G),  
          .bin(bin_out)
        );
          
    initial 
    begin
        error = 0;  //initialize the error as zero.
        for(i=0;i<2**N;i=i+1) begin     //loop through all the  available inputs 
            bin = i[N-1:0];
            #5;
            //Count the number of errors.It should be zero at the end of simulation.
            if(bin != bin_out)  
                error = error + 1;
            #5;
        end
        #10;
        $stop;  //All possible inputs are tested. So stop the simulation.
    end          

endmodule

The codes were tested using Modelsim 10.4a version. Simply change the value of the parameter 'N' in the testbench to test for different sized converters.
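For comparison, the same conversions can also be written without gate instantiations. This is only a sketch of an alternative style, not the code tested above; the module names 'bin2gray_shift' and 'gray2bin_reduce' are made up for illustration. It uses two facts: the Gray code is the binary value XORed with itself shifted right by one bit, and each binary bit is the XOR of all Gray bits at or above that position.

```verilog
//Alternative sketch: binary to Gray using a shift and a bitwise XOR.
module bin2gray_shift #(parameter N = 4)
        ( input [N-1:0] bin,    //binary input
        output [N-1:0] G);      //Gray output

//G[N-1] = bin[N-1], and G[i] = bin[i+1] ^ bin[i] for the other bits.
assign G = bin ^ (bin >> 1);

endmodule

//Alternative sketch: Gray to binary using a reduction XOR per bit.
module gray2bin_reduce #(parameter N = 4)
        ( input [N-1:0] G,      //Gray input
        output [N-1:0] bin);    //binary output

genvar i;
generate
    for(i=0;i<N;i=i+1) begin : xor_reduce_g2b
        //(G >> i) keeps bits G[N-1:i]; '^' XORs all of them together.
        assign bin[i] = ^(G >> i);
    end
endgenerate

endmodule
```

These two modules have the same ports as 'bin2gray' and 'gray2bin', so they should be usable in the testbench above by changing only the module names in the instantiations.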

A screenshot of the simulation waveform is shown below:

Saturday, December 12, 2020

Synthesizable Matrix Multiplication in Verilog

A long time back I posted a simple matrix multiplier that worked well in simulation but couldn't be synthesized. Many people have since requested a synthesizable version of that code. So here we go.

The design takes two matrices of 3 by 3 and outputs a matrix of 3 by 3. Each element is stored as 8 bits. This is not a generic multiplier, but if you understand the code well, you can easily extend it for different sized matrices.

Each matrix has 9 elements, each of which is 8 bits in size. So I am passing the matrix as a 72 bit 1-Dimensional array in the design. The following table shows how the 2-D elements are mapped into the 1-D array.

Row	Column	Bit positions in 1-D array
0	0	7:0
0	1	15:8
0	2	23:16
1	0	31:24
1	1	39:32
1	2	47:40
2	0	55:48
2	1	63:56
2	2	71:64
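The mapping in the table is simply: the element at (row, column) starts at bit (row*3+column)*8 and is 8 bits wide. As a small illustration, the sketch below prints this mapping for one example matrix using the indexed part-select operator '+:'. The module name 'matrix_index_demo' is made up for this example.

```verilog
//Hypothetical demo of the 2-D to 1-D mapping, for simulation only.
module matrix_index_demo;

reg [71:0] A;
integer i,j;

initial begin
    //The rightmost element in the concatenation lands in bits 7:0, i.e. row 0, column 0.
    A = {8'd9,8'd8,8'd7,8'd6,8'd5,8'd4,8'd3,8'd2,8'd1};
    for(i=0;i<=2;i=i+1) begin
        for(j=0;j<=2;j=j+1) begin
            //'+:8' selects 8 bits starting at the given lowest bit position.
            $display("row %0d, column %0d -> bits %0d:%0d = %0d",
                     i, j, (i*3+j)*8+7, (i*3+j)*8, A[(i*3+j)*8+:8]);
        end
    end
end

endmodule
```

The first line printed is for row 0, column 0, which occupies bits 7:0 and holds the value 1 here.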

Let me share the codes now...

matrix_mult.v:

//3 by 3 matrix multiplier. Each element of the matrix is 8 bits wide. 
//Inputs are named A and B and the output is named C. 
//Each matrix has 9 elements, each of which is 8 bits wide. So each input is 9*8=72 bits long.
module matrix_mult
    (   input Clock,
        input reset, //active high reset
        input Enable,    //This should be High throughout the matrix multiplication process.
        input [71:0] A,
        input [71:0] B,
        output reg [71:0] C,
        output reg done     //A High indicates that multiplication is done and the result is available at C.
    );   

//temporary registers. 
reg signed [7:0] matA [2:0][2:0];
reg signed [7:0] matB [2:0][2:0];
reg signed [7:0] matC [2:0][2:0];
integer i,j,k;  //loop indices
reg first_cycle;    //indicates it's the first clock cycle after Enable went High.
reg end_of_mult;    //indicates multiplication has ended.
reg signed [15:0] temp; //a temporary register to hold the product of two elements.

//Matrix multiplication.
always @(posedge Clock or posedge reset)    
begin
    if(reset == 1) begin    //Active high reset
        i = 0;
        j = 0;
        k = 0;
        temp = 0;
        first_cycle = 1;
        end_of_mult = 0;
        done = 0;
        //Initialize all the matrix register elements to zero.
        for(i=0;i<=2;i=i+1) begin
            for(j=0;j<=2;j=j+1) begin
                matA[i][j] = 8'd0;
                matB[i][j] = 8'd0;
                matC[i][j] = 8'd0;
            end 
        end 
    end
    else begin  //for the positive edge of Clock.
        if(Enable == 1)     //Any action happens only when Enable is High.
            if(first_cycle == 1) begin     //the very first cycle after Enable is high.
                //the matrices which are in a 1-D array are converted to 2-D matrices first.
                for(i=0;i<=2;i=i+1) begin
                    for(j=0;j<=2;j=j+1) begin
                        matA[i][j] = A[(i*3+j)*8+:8];
                        matB[i][j] = B[(i*3+j)*8+:8];
                        matC[i][j] = 8'd0;
                    end 
                end
                //re-initialize registers before the start of multiplication.
                first_cycle = 0;
                end_of_mult = 0;
                temp = 0;
                i = 0;
                j = 0;
                k = 0;
            end
            else if(end_of_mult == 0) begin     //multiplication hasn't ended. Keep multiplying.
                //Actual matrix multiplication starts from now on.
                temp = matA[i][k]*matB[k][j];
                matC[i][j] = matC[i][j] + temp[7:0];    //Lower half of the product is accumulatively added to form the result.
                if(k == 2) begin
                    k = 0;
                    if(j == 2) begin
                        j = 0;
                        if (i == 2) begin
                            i = 0;
                            end_of_mult = 1;
                        end
                        else
                            i = i + 1;
                    end
                    else
                        j = j+1;    
                end
                else
                    k = k+1;
            end
            else if(end_of_mult == 1) begin     //End of multiplication has reached
                //convert 3 by 3 matrix into a 1-D matrix.
                for(i=0;i<=2;i=i+1) begin   //run through the rows
                    for(j=0;j<=2;j=j+1) begin    //run through the columns
                        C[(i*3+j)*8+:8] = matC[i][j];
                    end
                end   
                done = 1;   //Set this output High, to say that C has the final result.
            end
    end
end
 
endmodule

tb_matrix_mult.v:

//Testbench for testing the 3 by 3 matrix multiplier.
module tb_matrix_mult;  //testbench module is always empty. No input or output ports.

reg [71:0] A;
reg [71:0] B;
wire [71:0] C;
reg Clock,reset, Enable;
wire done;
reg [7:0] matC [2:0][2:0];
integer i,j;
parameter Clock_period = 10;    //Change clock period here. 

initial
begin
    Clock = 1;
    reset = 1;
    #100;   //Apply reset for 100 ns before applying inputs.
    reset = 0;
    #Clock_period;
    //input matrices are set and Enable input is set High
    A = {8'd9,8'd8,8'd7,8'd6,8'd5,8'd4,8'd3,8'd2,8'd1};
    B = {8'd1,8'd9,8'd8,8'd7,8'd6,8'd5,8'd4,8'd3,8'd2};
    Enable = 1;
    wait(done); //wait until 'done' output goes High.
    //The result C should be (93,150,126,57,96,81,21,42,36)
    #(Clock_period/2);  //wait for half a clock cycle.
    //convert the 1-D matrix into 2-D format to easily verify the results.
    for(i=0;i<=2;i=i+1) begin
        for(j=0;j<=2;j=j+1) begin
            matC[i][j] = C[(i*3+j)*8+:8];
        end
    end
    #Clock_period;  //wait for one clock cycle.
    Enable = 0; //reset Enable.
    #Clock_period;
    $stop;  //Stop the simulation, as we have finished testing the design.
end

//generate a clock with a period of 'Clock_period' time units (100 MHz with a 1 ns time unit).
always #(Clock_period/2) Clock <= ~Clock;

//Instantiate the matrix multiplier
matrix_mult matrix_multiplier 
        (.Clock(Clock), 
        .reset(reset), 
        .Enable(Enable), 
        .A(A),
        .B(B), 
        .C(C),
        .done(done));


endmodule   //End of testbench.

Simulation Results:

The design was simulated successfully using Modelsim SE 10.4a version. Screenshot of the simulation waveform is shown below:

Please let me know if you are unable to get the code to work or if it's not synthesisable. Good luck with your projects.

Thursday, November 2, 2017

A Verilog Function for finding SQUARE ROOT

UPDATE: A CLOCKED SQUARE ROOT CALCULATOR IN VERILOG IS AVAILABLE HERE.

In Verilog, there is no built-in operator to find the square root of a number. There are algorithms for doing this, but you have to write the code from scratch.

Here, I want to share a Verilog function for finding the square root of a binary number. The function is based on "Non-Restoring Square Root algorithm". You can learn more about the algorithm from this paper. The function takes a 32 bit input number and returns a 16 bit square root. The block diagram of the algorithm is given below:

     Here D is the input number. R is the remainder of the operation for non-perfect squares. Q contains the square root of 'D'.

The Verilog function along with the testbench code is given below:

module testbench;

reg [15:0] sqr;

//Verilog function to find square root of a 32 bit number.
//The output is 16 bit.
function [15:0] sqrt;
  input [31:0] num;  //declare input
  //intermediate signals.
  reg [31:0] a;
  reg [15:0] q;
  reg [17:0] left,right,r;
  integer i;
begin
  //initialize all the variables.
  a = num;
  q = 0;
  i = 0;
  left = 0;   //input to adder/sub
  right = 0;  //input to adder/sub
  r = 0;  //remainder
  //run the calculations for 16 iterations.
  for(i=0;i<16;i=i+1) begin
    right = {q,r[17],1'b1};
    left = {r[15:0],a[31:30]};
    a = {a[29:0],2'b00};   //left shift by 2 bits.
    if (r[17] == 1) //add if r is negative
      r = left + right;
    else   //subtract if r is positive
      r = left - right;
    q = {q[14:0],!r[17]};
  end
  sqrt = q;   //final assignment of output.
end
endfunction //end of Function

//simulation-Apply inputs.
initial begin
  sqr = sqrt(32'd4000000);   #100;
  sqr = sqrt(32'd96100);  #100;
  sqr = sqrt(32'd25); #100;
  sqr = sqrt(32'd100000000);  #100;
  sqr = sqrt(32'd33); #100;
  sqr = sqrt(32'd3300);   #100;
  sqr = sqrt(32'd330000); #100;
  sqr = sqrt(32'd3300000000); #100;
end

endmodule

Simulation waveform:

The code was synthesised and simulated using Xilinx ISE 14.6.

From the waveform, you can see that for perfect squares we get the exact square root as the result. For other inputs, only the integer part is returned, so the result is low by less than 1.

One way to increase the precision is to append an even number of zeros to the right of the input.

Notice the last 4 inputs in the testbench.

The actual square root of 33 is 5.7445, but we get 5.

For the input 3300, we get 57 as the output. Adding 2 zeros gives one fractional digit in the output.

For the input 330000, we get 574 as the output. Adding 4 zeros gives two fractional digits in the output.

For the input 3300000000, we get 57445 as the output. Adding 8 zeros gives four fractional digits in the output.
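The reason this works can be written in one line. Assuming n is the original input and k is the number of fractional decimal digits wanted (both symbols are introduced here for illustration, they are not from the paper), appending 2k zeros multiplies the input by 10^(2k), and the integer square root then carries k extra digits:

```latex
\left\lfloor \sqrt{n \cdot 10^{2k}} \right\rfloor
  = \left\lfloor 10^{k} \sqrt{n} \right\rfloor,
\qquad \text{e.g. } n = 33,\ k = 2:\quad
\left\lfloor \sqrt{330000} \right\rfloor
  = \left\lfloor 574.456\ldots \right\rfloor = 574
```

Keep in mind that the scaled input still has to fit in the 32 bit input of the function, which limits how many zeros can be added.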
