Tuesday, September 24, 2013

A Sudoku Image Recognition Program                                                


Newspaper image capture
It is amazing to me, that the eye can recognize objects so quickly and apparently so easily. I suspect that a significant amount of parallel processing occurs locally in the eye that helps with edge and surface detection. The conscious brain is usually not even aware of what must be the huge amount of computing to make sense of the morning newspaper. I get a glimpse of the enormity of the problem when I open one of those multi language instruction booklets and it takes me a few moments to realize that I’m trying to read German.

A computer is far worse off because the only thing it “knows” is what is going on at a single given pixel. In one image there are a lot of pixels (640x480=307,200) and as the “attention” moves to the next pixel, it forgets what it had just seen. I write algorithms for computers as if I were making instructions for someone with extreme tunnel vision who has little ability to make long term memories.

Gray-scale conversion
But first I must capture a USB camera bitmaped image in the computer. Some kind soul, made this relatively easy in VB6, by writing a short module that utilizes the SendMessage command of the USER32 API. Download at: USBCammod.bas

Then by using simple commands, the USB cam is turned on and images are continuously sent to the windows clipboard. The clipboard data is constantly copied to a VB6 picturebox and you see a live video image in the form’s picturebox. Still images are captured from the clipboard in the same way.
  
mCapHwnd = capCreateCaptureWindow("WebcamCapture", 0, 0, 0, 640, 480, H, 0)
 DoEvents
 SendMessage mCapHwnd, CONNECT, 0, 0
 CamRunning = True

 Do While CamRunning
     SendMessage mCapHwnd, GET_FRAME, 0, 0
     SendMessage mCapHwnd, COPY, 0, 0
     PIC.Picture = Clipboard.GetData
     DoEvents
 Loop

Monochromatic image
A bitmapped color image can be converted to a gray-scale image by averaging each RGB pixel value and then writing the average back into each of the RGB pixel locations. A RGB color value of (10, 9, 2) is converted to gray- scale value of (7, 7, 7). Gray-scale images can be converted into two toned (monochromatic) images by comparing to a cutoff value. Gray-scale values above the cutoff produce white points and values below the cutoff are black points.

Background cutoff values
The trick of making a decent monochrome image is to find where  to set the cutoff. The cutoff needs to be set to unique values in different areas of an image. I used an algorithm that looked at local blocks of 96x96 pixels that found the gray-scale distribution. The highest distribution peek of a newsprint image is always the white background color. I defined the local cutoff value to be the first low (1/4 peek value) distribution area on the dark side of the peek. So now, the gray-scale values of the image are graded against the local cutoff and a strong monochromatic image is produced. I also copy these monochromatic values into a 640x480 byte array because VB6 can do calculations on array data faster than it can draw or read pixel values from an image.

So starting with a monochromatic image of a newspaper, I need to identify the Sudoku grid and determine the beginning values. Here is the algorithm that I used:

Detection of horizontal & vertical lines
    1.    Look for relatively horizontal and vertical lines in the image and overwrite the monochromatic image with these red lines. It  is relatively easy to look at the points along possible lines and grade how many black points are intercepted. If the number of black points passes a threshold, then draw the line.


Cleaned up image
    2.    Cleanup the image by removing all red pixels that had not overwritten monochromatic black.






Flood-fill with yellow
    3.    Starting from the image edges, flood-fill all white or black pixels with the color yellow. The yellow flood does not cross the solid red lined boarder of the Sudoku grid and leaves it’s interior cells white with black numbers.


    4.    Record the coordinates of the four corners of the Sudoku grid by detecting the first white pixels found, while working diagonally inward from the image corners. I had to made a test that eliminated corner detection false alarms (small islands of white in the sea of yellow).

Corner detection & parsing
    5.    Using the four Sudoku corner coordinates, calculate the coordinates of all 81 cell corners (marked by blue dots).

    6.    Parse each Sudoku cell by grading the number of pixels that match an idealized cartoon of the numbers 1 to 9 and a blank. The cartoon image corners are stretched to match up with each of the Sudoku cell corners. The cartoon is then jiggled around a bit, left & right, up & down, to find the best fit to the monochrome image. The highest scoring cartoon should be the correct cell value. Draw in the cell value (overwrites in red).

Puzzle solved
    7.    Solve the puzzle by repeatedly applying the LookForSingles and the LookForPairs algorithms. Draw in the cell value.

    8.    Write the solution and cell coordinates to a text file stored on the hard drive. (To be used in a future project)



The program is long, gangly and uses 50% of the CPU power for 37 seconds to visual recognize a Sudoku puzzle and parse the starting numbers. The puzzle solving part  takes less than a second. I include here, a captured screen video of the program running. The video capture software heavily uses the CPU and slows the Sudoku recognition considerably. In the video, it takes 108 seconds to identify and solve the puzzle.










No comments:

Post a Comment