Extracting Sudokus from images using scikit-image

written by Henrik Blidh on 2016-02-10

I recently uploaded my old Sudoku solver to GitHub, freshened it up and released it on PyPI so that it can be installed with pip. It works well and I am quite content with it, except for the fact that you have to enter the Sudoku by hand...

Thus, I decided to make an application that takes an image and extracts a Sudoku from it: SudokuExtract. Since this is an image analysis task, I need a Python package for it. I have used OpenCV extensively, both from Python and C++, but I never really liked the Python bindings or the ways to install them. I prefer pip installations, especially if I want to deploy the solution on e.g. Heroku.

I decided to use scikit-image instead, since it ties together nicely with NumPy, SciPy, Pillow, scikit-learn and other packages that I use frequently.

I found a blog post doing the very same thing using OpenCV. Let's see if we can use something like that approach.

Original Sudoku

Using the same image as in that blog post, I follow its lead and do some filtering, adaptive thresholding and binary filling, although I use scikit-image and SciPy instead of OpenCV:

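import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian_filter, threshold_adaptive

# img is assumed to be the grayscale photo as a 2D float array in [0, 1]
# Smooth away noise, then threshold each pixel against its neighbourhood
bimg = gaussian_filter(img, sigma=1.0)
bimg = threshold_adaptive(bimg, 20, offset=2 / 255.0)
# Invert the boolean mask so that the dark grid lines become foreground
bimg = ~bimg
# Fill enclosed areas so that the Sudoku grid becomes one solid blob
bimg = ndi.binary_fill_holes(bimg)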

Binary Sudoku Blobs
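If you are on a newer scikit-image release, the two filter functions used above go by other names. The following is only a sketch of the same steps under that assumption: it uses `gaussian` and `threshold_local`, and the helper name `binarize`, the odd block size of 21 and the synthetic test page are my own choices for illustration, not part of SudokuExtract. Note that `threshold_local` returns a threshold image rather than a mask, so the comparison has to be done explicitly.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian, threshold_local


def binarize(img, block_size=21, offset=2 / 255.0):
    """Return a boolean mask in which dark, closed shapes are filled blobs.

    Hypothetical helper; img is assumed to be a 2D float array in [0, 1].
    """
    smoothed = gaussian(img, sigma=1.0)
    # Per-pixel threshold computed from the local neighbourhood
    local_thresh = threshold_local(smoothed, block_size, offset=offset)
    # Pixels that are not brighter than their local threshold count as dark
    dark = ~(smoothed > local_thresh)
    # Fill everything enclosed by dark pixels
    return ndi.binary_fill_holes(dark)


# Synthetic stand-in for a photo: a white page with a black square frame
page = np.ones((60, 60))
page[10:12, 10:50] = 0.0  # top edge
page[48:50, 10:50] = 0.0  # bottom edge
page[10:50, 10:12] = 0.0  # left edge
page[10:50, 48:50] = 0.0  # right edge

mask = binarize(page)
```

On this toy page the frame and everything inside it ends up as foreground, while the margin outside the frame stays background, which is the same effect the real image shows.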

Nice. Next, I use scikit-image's labeling and region properties functions to establish which blobs are present:

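from operator import attrgetter
from skimage.measure import label, regionprops

# Label connected blobs; scikit-image releases before 0.12 label the
# background -1, so shift by one to make it 0, which regionprops ignores
label_image = label(bimg, background=False)
label_image += 1

regions = regionprops(label_image)
regions.sort(key=attrgetter('area'), reverse=True)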

The regions are now sorted by area, from largest to smallest. Given that the Sudoku is probably the most prominent item in the picture, it is likely that the first blob is the desired one.
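To make the "largest blob first" idea concrete, here is a small sketch on a synthetic mask. It assumes a newer scikit-image release in which `label` already gives the background the label 0, so no shifting is needed; the helper name `largest_region` and the toy mask are made up for this example.

```python
from operator import attrgetter

import numpy as np
from skimage.measure import label, regionprops


def largest_region(mask):
    """Return the regionprops entry of the largest blob, or None if empty.

    Hypothetical helper; assumes label() marks background pixels as 0.
    """
    regions = regionprops(label(mask))
    if not regions:
        return None
    return max(regions, key=attrgetter('area'))


# Toy mask with one big blob (the "Sudoku") and one small blob (noise)
mask = np.zeros((40, 40), dtype=bool)
mask[5:30, 5:30] = True
mask[33:37, 33:37] = True

region = largest_region(mask)
# bbox is (min_row, min_col, max_row, max_col), with exclusive max values
min_row, min_col, max_row, max_col = region.bbox
crop = mask[min_row:max_row, min_col:max_col]
```

The bounding box of the chosen region gives a crop that can be handed on to whatever step comes next. Using `max` instead of sorting is enough when only the biggest candidate is of interest; sorting, as above, is handier if you want to fall back to the second or third blob.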

In the next post we will go on with trying to "straighten out" the blobs detected and see if they actually contain a Sudoku!