?

Log in

No account? Create an account
Algorithm - The Mad Schemes of Dr. Tectonic [entries|archive|friends|userinfo]
Beemer

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Algorithm [Jun. 27th, 2012|10:41 pm]
Beemer
Okay, I started this more than a week ago and got stalled-out partway through and haven't posted about all kinds of other interesting things since then, so I am just going to finish it off to get it out of my brain.

I was working on a problem for a couple days because I had an interesting idea for a solution and was compelled to try it out. Like, even though it wasn't all that important, I just HAD TO KNOW whether it would work.

So here's the problem: you've got a region that's defined with a grid. You want to draw a line around the edge of the region. How do you find the points that define the boundary of your region?

Which is to say, you've got a (square) grid of locations, and the grid cells that are a part of the region have a value of 1, and all the other grid cells have a value of 0. You know the coordinates for the center of each grid cell. You can assume the region is all in one piece and has no holes in it. You need the coordinates of points all along the edge of the region, and they need to be sorted in order, such that if you draw line segments from one to the next, you've outlined the region. Even better if you can find the minimal set of points to define the region. Best of all is if you can also figure out a way to get rid of stair-steps along diagonals -- but if you do that, it has to be in such a way that two adjacent regions won't overlap.

(And for anybody doing a tech hire, I think this might make a good interview question.)


So obviously, since we know the coordinates of the center, we can get the corners of any grid cell by adding and subtracting half the grid spacing in the appropriate combinations. But we only want the corners for grid cells along the edge of the region.

Now, you could write some code to check each grid cell against its neighbors and only calculate those boundary corner coordinates, but you have to do it four times over with a slightly different case for each direction, and dealing with the edges of the grid is a pain, and it just becomes very messy very quickly.

The first insight I had was that you can skip over all that nonsense: calculate the coordinates for all four corners of each cell in the region, and put them in a list. Now count how many times each coordinate appears in the list. If it's four, that's a point on the interior of the region, where four cells come together: throw it out. When you're done, the list only has boundary points left in it. We're halfway there! Now we just have to sort the list.

The first thing I tried was this algorithm I found on stackoverflow. The idea is that you sort your list using a comparison that checks whether one point is to the left or right of the other relative to the center of the polygon. (Using the angle from 12 o'clock to the center to the point, in other words.) Now, I don't know whether the algorithm is buggy, or whether I violated some unstated assumption, or whether I made an error in converting it to perl (which wants a different flavor comparison function), but it didn't work well at all. It got all messed-up around concavities, and in a few cases would list the points in a way that wrapped multiple times around the region. So THAT's no good.

But then I had a second insight and realized, a-ha! These are corner points, and the grid is square. so neighboring corners must always be one unit away from one another! So I can start with any random point and search for another point that's 1 unit away. When you find it, put the first point into the sorted list and start looking for a neighbor of the new point. Repeat until everything is sorted. If you ever don't find a neighbor, just start over with the first remaining point in the unsorted list. Worst case, you'll have some rearranging/correction to do, but it'll mostly be long sequential segments, so it won't be too bad.

This worked a lot better. The only places it encountered major difficulties were on narrow necks and inlets only one grid-cell wide.

And now an aside for the third major insight I had somewhere in here, which is something that makes this a manageable problem. Coordinates are pairs of numbers, and in most languages that's kind of a pain to deal with. If you store them in two lists, one for X and one for Y, you have to keep the lists synched, and you can't easily sort them, and treating them as 2-element lists gets into list-of-list territory the syntax gets hairy right quick. Making a custom object class is kind of overkill and may be major hassle. So what do you do? Well, I'm getting my coordinates from an external file, so I'm reading in pairs of numbers separated by a tab, splitting the string into X and Y, and then fighting with the question of how to store them. And then I realized: I don't have to split them at all! I can just store that string, "X-tab-Y", as the primary representation of the coordinate pair, and only bother extracting the X and Y components when I actually need to do math like checking the distance between two points. No need to recombine X and Y for output, either. This made things WAY simpler.

So this is working decently, and I fixed up a couple test cases by hand and it wasn't too bad. But I still have the stairstep jaggies along the diagonals and a bunch of redundant points along the straightaways and a non-trivial amount of correction to do by hand. It's a functional solution, but not yet an elegant one.

And thinking about the diagonals, I realized: maybe instead of corners, I actually want to use edges. That is, instead of adding (1,1),(-1,-1),etc. to the center coordinates, add (1,0),(0,1) etc., and throw out anything that you have *2* copies of rather than 4. Well, that's a beautiful solution for eliminating the stairsteps, but still gets confused in the narrow places.

But then, Ah-HA! I realized you can use the corners and edges together! Because each corner has exactly two edge points that are a distance 1/2 away, and likewise with each edge point. So now you can unambiguously sort the points into the correct order. Victory!

And I tried it and it worked PERFECTLY. Even in the case where the region was in three separate pieces! (Well, as well as it possibly could have.)

Of course, now you've got an excess of points, and it would be nice to thin them down to just the minimal necessary set. So first you throw out all the corner points (which is easy if you kept separate lists of edges and points back at the beginning). Then go through the remaining list and look at each point's neighbors. Calculate the relative direction vector from the left-hand neighbor to the current point, and from the current point to the right-hand neighbor, by subtracting them. If they're different, add the current point to the list of final points; otherwise, don't.

And done! Once you've figured out how it all fits together, you can go back through and clean things up, and with judicious use of an apply-to-all function like map(), it even looks kinda elegant.

I was gonna post my resulting code, but I don't have access to it at the moment and I want to be done with this, so that'll have to do.
LinkReply

Comments:
(Deleted comment)
[User Picture]From: dr_tectonic
2012-06-28 05:53 am (UTC)
Ha, yes. Boundary polygon, not bounding box. Makes it a very different problem.

Though they're not completely arbitrary regions; I think the algorithm would break if you're actually working on the globe (as I am) and your region touches a pole or spans the international date line (not cases I have to deal with, thankfully).
(Reply) (Parent) (Thread)
[User Picture]From: dr_tectonic
2012-06-28 05:56 am (UTC)
If you were using this to see how somebody thinks, the bounding box version of the question might be an interesting lead-in, just to see how quickly they can articulate the solution...
(Reply) (Parent) (Thread)
[User Picture]From: detailbear
2012-06-29 05:47 am (UTC)
All I can think of is Minesweeper and Game of Life. So would this pseudocode work? (Please excuse the bad form. It's been a while.)

On the initial grid, do a raster scan and on each grid cell with value 1, compare north, east, west and south neighbours. If any neighbour has a value of zero, mark as an edge cell by recording adjoining zero cells appear on the N/E/S/W side. This gives an outline of the region.

Rasterwise, find the first edge cell (will be marked WN, WNE or WNES). Left edge is FirstPoint. Top edge is TemporaryEndPoint. Add FirstPoint to vertex list. Direction-of-travel := EAST. Next cell is East.

While TemporaryEndPoint != FirstPoint

If Direction-of-travel = EAST
..top edge is TemporaryEndPoint.
..While cell marked N
....top edge is TemporaryEndPoint.
....Next cell to East.
..End While
..If cell marked NE
....add top edge to vertex list.
....right edge is TemporaryEndPoint.
....Direction-of-travel := SOUTH.
....Next cell to South.
..Elseif cell not edge
....add TemporaryEndPoint to vertex list.
....Direction-of-travel := NORTH.
....Next cell to North.
..Endif

Elseif Direction-of-travel = NORTH
..left edge is TemporaryEndPoint.
..While cell marked W
....left edge is TemporaryEndPoint.
....Next cell to North.
..End While
..If cell marked WN
....add left edge to vertex list.
....Top edge is TemporaryEndPoint.
....Direction-of-travel := EAST.
....Next cell to East.
..Elseif cell not an edge
....add TemporaryEndPoint to vertex list.
....Direction-of-travel := West.
....Next cell to West.
..Endif

Elseif Direction-of-travel = WEST
..bottom edge is TemporaryEndPoint.
..While cell marked S
....bottom edge is TemporaryEndPoint.
....Next cell to West.
..EndWhile
..If cell marked SW
....add bottom edge to vertex list
....left edge is TemporaryEndPoint.
....Direction-of-travel := NORTH.
....Next cell to North.
..Elseif cell not edge
....add TemporaryEndPoint to vertex list.
....Direction-of-travel := SOUTH.
....Next cell to South.
..Endif

Elseif Direction-of-travel = SOUTH
..right edge is TemporaryEndPoint.
..While cell marked E
....right edge is TemporaryEndPoint.
....Check cell to North.
..EndWhile
..If cell marked ES
....add right edge to vertex list
....Bottom edge is TemporaryEndPoint
....Direction-of-travel := WEST.
....Next cell is West
..ElseIf cell not an edge
....add TemporaryEndPoint to vertex list.
....Direction-of-travel := West
....Next cell to West.
..EndIf
EndWhile
Done

"Next cell to West" would translate to X := X-1 if that is not obvious. "Top edge" is coordinates of center less 0.5 on the X value.
(Reply) (Thread)
[User Picture]From: dr_tectonic
2012-10-16 10:27 pm (UTC)
I've had this post open in a tab for four months now, intending to respond to your comment, and today I was revisiting the code and was reminded of it!

I think this probably would work. It's a straightforward and sensible approach, and it seems conceptually correct.

There are two wrinkles about this approach that would lead me in a different direction. The first is that scanning and marking the grid with NSEW values to indicate the neighbors is easy conceptually, but in practice it's kind of a pain. The data structures needed for the four neighbor flags are a little messy to set up, and raster scanning neighbors on a grid requires a whole bunch of special-case code deal with the edges, which is annoying. The second is that I find it hard to hold code in my mind when it's got a whole bunch of repeated very similar if-statements in it. I always make a typo somewhere when I'm writing it up, and then have a devil of a time tracking down the bug. I'd want to consolidate the similar chunks of code by switching from absolute directions to a relative coordinate frame (forward/left/right), but then you have to figure out a clever way to go back and forth between the relative positioning and the annotated grid and it makes my brain itch.

None of which is to say that yours is not an equally good solution; if it works, it's a good solution, and I think yours would work. It's just a different style of coding than what I find aesthetically pleasing.
(Reply) (Parent) (Thread)
[User Picture]From: detailbear
2012-10-19 12:13 am (UTC)
Thank you. Some day when my brain returns I'll read that in detail. In the mean time, please send frost to kill ragweed.
(Reply) (Thread)