How to build a document scanner with OpenCV ?

Created	@September 26, 2023 9:25 PM
Tags
URL	https://dvic.devinci.fr/en/resource/tutorial/scanner-opencv/

axel_thevenotmatrix-transformation-3dbillard-interactifinnovation

Designing a manual document scanner can be done in a very simple way using python and OpenCV.

We need to use the numpy and OpenCV libraries. If they are not already setup, just run the following lines at the command prompt (powershell, bash...).

$ pip install numpy
$ pip install opencv-python

Step 1: Create a trapezoid to define the transformation

First of all, a trapezoid has to be created, whose extremities will be moved in the image to surround the document to be scanned.

import numpy as np
 
class Trapezoid:

    def __init__(self, img_size, r=5, color=(143, 149, 47),
                bottom_color=(196, 196, 196), border_color=(255, 0, 135)):
        # Initialize the contours
        self.contours = np.array([[[0 + r, 0 + r]],
                               [[0 + r, img_size[1] - r]],
                               [[img_size[0] - r, img_size[1] - r]],
                               [[img_size[0] - r, 0 + r]]])

        # Initialize the radius of the borders
        self.r = r
        # Initialize the colors of the trapezoid
        self.color = color
        self.bottom_color = bottom_color
        self.border_color = border_color

    def get_border_index(self, coord):
        # A border is return if the coordinates are in its radius
        for i, b in enumerate(self.contours[:, 0, :]):
            dist = sum([(b[i] - x) ** 2 for i, x in enumerate(coord)]) ** 0.5
            if  dist < self.r:
                return i
        # If no border, return None
        return None

    def set_border(self, border_index, coord):
        self.contours[border_index, 0, :] = coord

The trapeze is in the form of an object. It is defined by its corners, which therefore define its contour. We then define a circle radius to display the corners of the trapezoid, which we will be able to be dragged with the mouse in the interface. Finally, colors are defined. The bottom_color will be used to visualize what is the bottom of the document in the image to differentiate it from the other edges.

Step 2 : Create a Scanner class

We can now begin the design of the scanner, which will be completed during this tutorial.This class will allow us to interface with OpenCV. We will be able to drag and drop the corners of the trapezoid in order to delimit the document to be scanned.

import cv2
import numpy as np

class Scanner():

    def __init__(self, input_path, output_path):
        self.input = cv2.imread(input_path)
        self.output_path = output_path
        # get the shape and size of the input
        self.shape = self.input.shape[:-1]
        self.size = tuple(list(self.shape)[::-1])

        # create a trapezoid to drag and drop and its perspective matrix
        self.M = None
        self.trapezoid = Trapezoid(self.size,
                                r=min(self.shape) // 100 + 2,
                                color=(153, 153, 153),
                                border_color=(255, 0, 136),
                                bottom_color=(143, 149, 47))

        # Initialize the opencv window
        cv2.namedWindow('Rendering', cv2.WINDOW_NORMAL)
        cv2.setMouseCallback('Rendering', self.drag_and_drop_border)

        # to remember wich border is dragged if exists
        self.border_dragged = None

So we create the class constructor. We import the image, the backup path. Note that the variables self.shape and self.size are the same but with their values in opposite directions. Indeed OpenCV treats images with a size format (width, height) while numpy treats them with a format (height, width).

A trapezoid and a self.M. perspective matrix are then instantiated. This matrix will allow the image to be distorted for scanning the document.

Finally we build an OpenCV window and a variable to remember the corner entered for the drag and drop interface. cv2.setMouseCallback allows to call a function at each mouse event on the window. The function self.drag_and_drop_border must therefore be defined.

Step 3 : Drag and Drop the trapezoid in the image

Then it is time for drag and drop. First we start by displaying the trapeze on the image. Then we will create a function that will be called at each mouse event.

There is nothing important to note in the function to display the trapezoid except that the bottom side of the document is marked for visual reference.

To allow drag and drop of the corners of this trapezoid, each time the left mouse button is pressed, we will check if these coordinates are within the circle defining a corner. We can then move it with the mouse until we release the click.

    def draw_trapezoid(self, img):
        # draw the contours of the trapezoid
        cv2.drawContours(img, [self.trapezoid.contours], -1,
                       self.trapezoid.color, self.trapezoid.r // 3)
        # draw its bottom
        cv2.drawContours(img, [self.trapezoid.contours[1:3]], -1,
                       self.trapezoid.bottom_color, self.trapezoid.r // 3)
        # Draw the border of the trapezoid as circles
        for x, y in self.trapezoid.contours[:, 0, :]:
            cv2.circle(img, (x, y), self.trapezoid.r,
                      self.trapezoid.border_color, cv2.FILLED)
        return img

    def drag_and_drop_border(self, event, x, y, flags, param):
        # If the left click is pressed, get the border to drag
        if event == cv2.EVENT_LBUTTONDOWN:
            # Get the selected border if exists
            self.border_dragged = self.trapezoid.get_border_index((x, y))

        # If the mouse is moving while dragging a border, set its new positionAxel THEVENOT
        elif event == cv2.EVENT_MOUSEMOVE and self.border_dragged is not None:
                self.trapezoid.set_border(self.border_dragged, (x, y))

        # If the left click is released
        elif event == cv2.EVENT_LBUTTONUP:
            # Remove from memory the selected border
            self.border_dragged = None

Etape 4 : Matrice de perspective

Now we have to create the perspective matrix, which allows us to switch from the coordinates of the input image to the coordinates of the output image. If you want to know more about transformation matrices, I invite you to visit my tutorial on this subject.

With OpenCV, it's simpler. We just need to know the coordinates of the corners of the source image (here, the trapezoid) and the coordinates of the destination image (the output image in the format of the input image).

    def actualize_perspective_matrices(self):
        # get the source points (trapezoid)
        src_pts = self.trapezoid.contours[:, 0].astype(np.float32)

        # set the destination points to have the perspective output image
        h, w = self.shape
        dst_pts = np.array([[0, 0],
                          [0, h - 1],
                          [w - 1, h - 1],
                          [w - 1, 0]], dtype="float32")
 
        # compute the perspective transform matrices
        self.M = cv2.getPerspectiveTransform(src_pts, dst_pts)

The interface can now be activated in real time by updating the perspective matrix at each step.

    def run(self):
        while True:
            self.actualize_perspective_matrices()

            # get the output image according to the perspective transformation
            img_output = cv2.warpPerspective(self.input, self.M, self.size)

            # draw current state of the trapezoid
            img_input = self.draw_trapezoid(self.input.copy())
            # Display until the 'Enter' key is pressed
            cv2.imshow('Rendering', np.hstack((img_input, img_output)))
            if cv2.waitKey(1) & 0xFF == 13:
                break
 
        # Save the image and exit the process
        cv2.imwrite(self.output_path, img_output)
        cv2.destroyAllWindows()

Step Bonus : Get a better output

Because it is free, you might as well improve the image quality a little. Just pass the output image through this function in the loop of the run() function.

    def bonus(self, img):
        """Make the output better"""
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        value = hsv[..., 2]
        value = (value - np.min(value)) / (np.max(value) - np.min(value))
        hsv[..., 2] = (value * 255).astype(np.uint8)
        return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

Step 5 : Run the program

We can add these last lines and run the script from the bash.

import os
import argparse 

# Argument parser
ap = argparse.ArgumentParser()
ap.add_argument('-i', '--input', type=str,
                help="Input image")
ap.add_argument('-o', '--output', type=str, default=None,
                help='Output image : default {Input image + "_scanned"}')
args = ap.parse_args()

if __name__ == '__main__':

    if args.input is None:
        print('No input image')
    elif not os.path.exists(args.input):
        print('The input image path does not exists')
    else:
        if args.output is None:
            path, format = args.input.split('.')
            path += '_scanned'
            args.output = f'{path}.{format}'

    scanner = Scanner(args.input, args.output)
    scanner.run()

Now all that remains is to test !

$ python scanner.py -i picture.jpg