In this blog post I would like to go through a simple technique that could be used for detecting documents in an image. We will be using EmguCV for the image processing part. It is a .NET wrapper for OpenCV.

Prerequisites

  • .NET Framework
  • EmguCV (I am using version 3.1)
  • A camera (you can use an external camera or even develop on mobile devices using Xamarin) or a static image of a document or sheet of paper

Intro

If you have ever wondered how applications such as Office Lens or Scanbot work, then you are in a good place to learn the basics!

There are multiple ways to approach document detection. In this blog post I will focus on document detection using the Canny Edge Detector, which was developed by an Australian computer scientist called John F. Canny. Another possible approach is to use the Hough transform, which is nicely explained in Dropbox's blog post Fast and Accurate Document Detection for Scanning. They also use machine learning, so if you are interested in that, give the Dropbox blog post a try.

Overview

As I mentioned, we will use Canny Edge Detection to find a document's contours in an image. Canny Edge Detection extracts or "highlights" important structural information from objects in the image. In the following images you can see a raw image and the same image with Canny Edge Detection applied. We want to achieve something similar for our document detector.

Images: Valve original (1).PNG (raw) and Valve monochrome canny (6).PNG (after Canny Edge Detection)
By Simpsons contributor at English Wikipedia, CC BY-SA 3.0

There are 3 steps required for Canny Edge Detection to work correctly and smoothly:

  1. Convert image to Grayscale
  2. Apply Gaussian Blur
  3. Apply Canny algorithm

After applying Canny Edge Detection to the image, we will find the contour that is most likely to be the document's contour and highlight it.

Detection process

In the following steps I will show you how to implement the detection in C#.

Converting image to Grayscale

The Canny algorithm requires the input to be Grayscale, so we have to start by converting our image to Grayscale. First of all we have to load our image from a file.

var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg");

Now there are two ways to convert the Bgr image into a Gray image. You can either use

var grayScaleImage = image.Convert<Gray, byte>();

or

using (var grayScaleImage = new UMat())
    CvInvoke.CvtColor(image, grayScaleImage, typeof(Bgr), typeof(Gray));

So we will end up with something like

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
{
    var grayScaleImage = image.Convert<Gray, byte>();
}

which loads an image from "C:/Projects/DocumentDetection/document.jpg" and converts it into Grayscale.
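To give a feeling for what the conversion does to a single pixel, here is a minimal sketch in plain C# without EmguCV. It uses the weighted sum that OpenCV documents for its BGR to gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The names GrayscaleSketch and ToGray are made up for this illustration and are not part of EmguCV.

```csharp
using System;

public static class GrayscaleSketch
{
    // Hypothetical helper: converts one pixel to a gray value using the
    // weighted sum Y = 0.299 R + 0.587 G + 0.114 B.
    // The parameter order follows the B, G, R channel order of Image<Bgr, byte>.
    public static byte ToGray(byte blue, byte green, byte red)
    {
        double luma = 0.299 * red + 0.587 * green + 0.114 * blue;

        // Round to the nearest integer and clamp to the byte range.
        return (byte)Math.Min(255, Math.Max(0, Math.Round(luma)));
    }
}
```

Green contributes the most and blue the least, which is why a pure green pixel ends up much brighter in the Grayscale image than a pure blue one.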

Applying Gaussian Blur

After the conversion into Grayscale we have to apply Gaussian Blur to smooth the image and remove noise that would make the edge detection worse.

Similar to the color conversion, there are two ways to blur the image. The first is Image.SmoothGaussian(int kernelWidth, int kernelHeight, double sigma1, double sigma2), where kernelWidth and kernelHeight are the width and the height of the Gaussian kernel and sigma1 and sigma2 are its standard deviations. I found that Image.SmoothGaussian(5, 5, 0, 0) uses quite good values for learning purposes

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
{
    var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0);
}

or we can use CvInvoke.GaussianBlur(IInputArray src, IOutputArray dst, Size ksize, double sigmaX) with the same values

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
    CvInvoke.GaussianBlur(grayScaleImage, grayScaleImage, new Size(5,5), 0);

Now we have a blurred Grayscale image.
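To make the kernel size and sigma parameters a bit more concrete, here is a rough sketch in plain C# that builds a normalised 1D Gaussian kernel. When sigma is 0 or negative, it derives sigma from the kernel size with the formula given in the OpenCV documentation, sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8, which gives 1.1 for a kernel of size 5. The names GaussianKernelSketch and Create are hypothetical, and as far as I know OpenCV may use precomputed coefficients for small kernels, so the exact numbers can differ from what the library applies.

```csharp
using System;

public static class GaussianKernelSketch
{
    // Hypothetical helper: builds a normalised 1D Gaussian kernel of odd length.
    // A non-positive sigma is derived from the kernel size, following the
    // formula in the OpenCV documentation.
    public static double[] Create(int size, double sigma)
    {
        if (size <= 0 || size % 2 == 0)
            throw new ArgumentException("Kernel size must be positive and odd.", nameof(size));

        if (sigma <= 0)
            sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8;

        var kernel = new double[size];
        int center = size / 2;
        double sum = 0;

        // Sample the Gaussian at integer offsets from the kernel center.
        for (int i = 0; i < size; i++)
        {
            double x = i - center;
            kernel[i] = Math.Exp(-(x * x) / (2 * sigma * sigma));
            sum += kernel[i];
        }

        // Normalise so the weights add up to 1 and the blur keeps overall brightness.
        for (int i = 0; i < size; i++)
            kernel[i] /= sum;

        return kernel;
    }
}
```

Calling GaussianKernelSketch.Create(5, 0) gives five weights that peak in the middle and fall off symmetrically. A 5 x 5 blur is equivalent to applying such a 1D kernel horizontally and then vertically.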

Applying Canny algorithm

The Canny algorithm is the last algorithm we will use that modifies the image. We will use CvInvoke.Canny(IInputArray image, IOutputArray edges, double threshold1, double threshold2). The two thresholds are used for the hysteresis procedure. You can read more about that at Feature Detection - Canny.

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
    CvInvoke.Canny(blurredImage, cannyImage, 50, 150);

So now we have a Canny image in which we can look for contours.
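If the role of the two thresholds is not obvious, the following sketch in plain C# shows the idea behind the hysteresis step on a small grid of gradient magnitudes. Pixels at or above the high threshold are accepted as edges, and pixels between the two thresholds are kept only when they are connected to an accepted pixel. This is a simplified illustration and not OpenCV's actual implementation. The names HysteresisSketch and Apply are hypothetical, and the handling of values exactly on a threshold is my own choice.

```csharp
using System.Collections.Generic;

public static class HysteresisSketch
{
    // Hypothetical helper: returns a mask that is true for pixels kept as edges.
    public static bool[,] Apply(double[,] magnitude, double low, double high)
    {
        int rows = magnitude.GetLength(0);
        int cols = magnitude.GetLength(1);
        var edges = new bool[rows, cols];
        var pending = new Stack<(int Row, int Col)>();

        // Strong pixels (at or above the high threshold) are always edges.
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (magnitude[r, c] >= high)
                {
                    edges[r, c] = true;
                    pending.Push((r, c));
                }

        // Grow from the strong pixels into 8-connected neighbours
        // that are at or above the low threshold.
        while (pending.Count > 0)
        {
            var (r, c) = pending.Pop();
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                {
                    int nr = r + dr, nc = c + dc;
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (edges[nr, nc] || magnitude[nr, nc] < low) continue;
                    edges[nr, nc] = true;
                    pending.Push((nr, nc));
                }
        }

        return edges;
    }
}
```

With thresholds 50 and 150, a weak response of 100 survives only when it touches a chain that leads to a response of at least 150. That is how isolated noise gets dropped while faint parts of a real edge are kept.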

Finding largest contours

Currently we have the following code

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
    CvInvoke.Canny(blurredImage, cannyImage, 50, 150);

In cannyImage we have the original image after applying the algorithms mentioned above. In this image we have to find contours and return only those that are likely to be the document's contour. To find contours we will use CvInvoke.FindContours(IInputOutputArray image, IOutputArray contours, IOutputArray hierarchy, RetrType mode, ChainApproxMethod method).

Our code will look like this

using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
{
    CvInvoke.Canny(blurredImage, cannyImage, 50, 150);
    using (var contours = new VectorOfVectorOfPoint())
        CvInvoke.FindContours(cannyImage, contours, null, RetrType.Tree, ChainApproxMethod.ChainApproxSimple);
}   

Now the contours variable holds all the contours found in the image. You can implement a method that keeps only the contours whose area exceeds some minimum and returns the top 5 with the largest areas. I will call this method RetrieveTopContours(...). Its exact implementation depends on you and on your images.
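As a starting point for RetrieveTopContours(...), here is a minimal sketch of the selection logic. It is only one possible implementation, not the one used in this post. It assumes the contours have already been copied into System.Drawing.Point arrays, computes each area with the shoelace formula, and takes minArea and count as parameters I made up for the illustration.

```csharp
using System;
using System.Drawing;
using System.Linq;

public static class ContourSelectionSketch
{
    // Area of a closed polygon via the shoelace formula.
    public static double Area(Point[] contour)
    {
        long twiceArea = 0;
        for (int i = 0; i < contour.Length; i++)
        {
            Point current = contour[i];
            Point next = contour[(i + 1) % contour.Length];
            twiceArea += (long)current.X * next.Y - (long)next.X * current.Y;
        }
        return Math.Abs(twiceArea) / 2.0;
    }

    // Hypothetical sketch: keeps contours with at least minArea and
    // returns up to `count` of them, largest area first.
    public static Point[][] RetrieveTopContours(Point[][] contours, double minArea, int count = 5)
    {
        return contours
            .Select(c => new { Contour = c, Area = Area(c) })
            .Where(x => x.Area >= minArea)
            .OrderByDescending(x => x.Area)
            .Take(count)
            .Select(x => x.Contour)
            .ToArray();
    }
}
```

With EmguCV you could get the point arrays by calling ToArray() on each contours[i], or skip the manual area calculation and use CvInvoke.ContourArea instead. A selected Point[] can be wrapped again with new VectorOfPoint(points) before passing it to other CvInvoke methods.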

Finding the most probable contour - document's contour

Now we have the top 5 contours with the largest areas. From these contours we have to select the one that is most likely to be the document's contour.

The following code does this:

  1. For each contour in contours (VectorOfPoint[])
  2. Calculate the contour's perimeter
  3. Approximate a polygonal curve with the specified precision
  4. If contour exists AND contour has 4 corners AND contour is convex then return contour

foreach (var contourVector in contours)
{
    var contour = new VectorOfPoint();
    var peri = CvInvoke.ArcLength(contourVector, true);
    CvInvoke.ApproxPolyDP(contourVector, contour, 0.1 * peri, true);
    if (contour != null && contour.ToArray().Length == 4 && CvInvoke.IsContourConvex(contour))
        return contour; // returned undisposed, so the caller owns it
    contour.Dispose(); // not a match, release it
}
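To show what the perimeter and the convexity test mean geometrically, here is a small sketch in plain C# that works on System.Drawing.Point arrays. It is not how OpenCV implements ArcLength or IsContourConvex. The names QuadCheckSketch, Perimeter and IsConvex are hypothetical, and the convexity test assumes a polygon that does not intersect itself. For a document that appears as a 400 x 600 pixel rectangle, the perimeter is 2000, so the tolerance 0.1 * peri used above would be 200 pixels.

```csharp
using System;
using System.Drawing;

public static class QuadCheckSketch
{
    // Length of the closed polygon, including the edge from the last point to the first.
    public static double Perimeter(Point[] polygon)
    {
        double total = 0;
        for (int i = 0; i < polygon.Length; i++)
        {
            Point a = polygon[i];
            Point b = polygon[(i + 1) % polygon.Length];
            double dx = b.X - a.X, dy = b.Y - a.Y;
            total += Math.Sqrt(dx * dx + dy * dy);
        }
        return total;
    }

    // A simple polygon is convex when it turns in the same direction at every corner,
    // i.e. the cross products of consecutive edges never change sign.
    public static bool IsConvex(Point[] polygon)
    {
        if (polygon.Length < 3) return false;

        int sign = 0;
        for (int i = 0; i < polygon.Length; i++)
        {
            Point a = polygon[i];
            Point b = polygon[(i + 1) % polygon.Length];
            Point c = polygon[(i + 2) % polygon.Length];
            long cross = (long)(b.X - a.X) * (c.Y - b.Y) - (long)(b.Y - a.Y) * (c.X - b.X);

            if (cross == 0) continue; // collinear points do not decide the direction
            int current = cross > 0 ? 1 : -1;
            if (sign == 0) sign = current;
            else if (current != sign) return false;
        }
        return sign != 0;
    }
}
```

A sheet of paper seen from any angle projects to a convex quadrilateral, which is why a four-cornered but dented shape can be rejected as a document candidate.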

Highlighting contour

You can draw the contour using CvInvoke.DrawContours or using platform-specific APIs. I have used UIBezierPath on iOS.

Result

I tried document detection on iOS using Xamarin and EmguCV and ended up with the following result
