||[Jun. 27th, 2012|10:41 pm]
Okay, I started this more than a week ago and got stalled-out partway through and haven't posted about all kinds of other interesting things since then, so I am just going to finish it off to get it out of my brain.|
I was working on a problem for a couple days because I had an interesting idea for a solution and was compelled to try it out. Like, even though it wasn't all that important, I just HAD TO KNOW whether it would work.
So here's the problem: you've got a region that's defined with a grid. You want to draw a line around the edge of the region. How do you find the points that define the boundary of your region?
Which is to say, you've got a (square) grid of locations, and the grid cells that are a part of the region have a value of 1, and all the other grid cells have a value of 0. You know the coordinates for the center of each grid cell. You can assume the region is all in one piece and has no holes in it. You need the coordinates of points all along the edge of the region, and they need to be sorted in order, such that if you draw line segments from one to the next, you've outlined the region. Even better if you can find the minimal set of points to define the region. Best of all is if you can also figure out a way to get rid of stair-steps along diagonals -- but if you do that, it has to be in such a way that two adjacent regions won't overlap.
(And for anybody doing a tech hire, I think this might make a good interview question.)
So obviously, since we know the coordinates of the center, we can get the corners of any grid cell by adding and subtracting half the grid spacing in the appropriate combinations. But we only want the corners for grid cells along the edge of the region.
Now, you could write some code to check each grid cell against its neighbors and only calculate those boundary corner coordinates, but you have to do it four times over with a slightly different case for each direction, and dealing with the edges of the grid is a pain, and it just becomes very messy very quickly.
The first insight I had was that you can skip over all that nonsense: calculate the coordinates for all four corners of each cell in the region, and put them in a list. Now count how many times each coordinate appears in the list. If it's four, that's a point on the interior of the region, where four cells come together: throw it out. When you're done, the list only has boundary points left in it. We're halfway there! Now we just have to sort the list.
The first thing I tried was this algorithm I found on stackoverflow. The idea is that you sort your list using a comparison that checks whether one point is to the left or right of the other relative to the center of the polygon. (Using the angle from 12 o'clock to the center to the point, in other words.) Now, I don't know whether the algorithm is buggy, or whether I violated some unstated assumption, or whether I made an error in converting it to perl (which wants a different flavor comparison function), but it didn't work well at all. It got all messed-up around concavities, and in a few cases would list the points in a way that wrapped multiple times around the region. So THAT's no good.
But then I had a second insight and realized, a-ha! These are corner points, and the grid is square. so neighboring corners must always be one unit away from one another! So I can start with any random point and search for another point that's 1 unit away. When you find it, put the first point into the sorted list and start looking for a neighbor of the new point. Repeat until everything is sorted. If you ever don't find a neighbor, just start over with the first remaining point in the unsorted list. Worst case, you'll have some rearranging/correction to do, but it'll mostly be long sequential segments, so it won't be too bad.
This worked a lot better. The only places it encountered major difficulties were on narrow necks and inlets only one grid-cell wide.
And now an aside for the third major insight I had somewhere in here, which is something that makes this a manageable problem. Coordinates are pairs of numbers, and in most languages that's kind of a pain to deal with. If you store them in two lists, one for X and one for Y, you have to keep the lists synched, and you can't easily sort them, and treating them as 2-element lists gets into list-of-list territory the syntax gets hairy right quick. Making a custom object class is kind of overkill and may be major hassle. So what do you do? Well, I'm getting my coordinates from an external file, so I'm reading in pairs of numbers separated by a tab, splitting the string into X and Y, and then fighting with the question of how to store them. And then I realized: I don't have to split them at all! I can just store that string, "X-tab-Y", as the primary representation of the coordinate pair, and only bother extracting the X and Y components when I actually need to do math like checking the distance between two points. No need to recombine X and Y for output, either. This made things WAY simpler.
So this is working decently, and I fixed up a couple test cases by hand and it wasn't too bad. But I still have the stairstep jaggies along the diagonals and a bunch of redundant points along the straightaways and a non-trivial amount of correction to do by hand. It's a functional solution, but not yet an elegant one.
And thinking about the diagonals, I realized: maybe instead of corners, I actually want to use edges. That is, instead of adding (1,1),(-1,-1),etc. to the center coordinates, add (1,0),(0,1) etc., and throw out anything that you have *2* copies of rather than 4. Well, that's a beautiful solution for eliminating the stairsteps, but still gets confused in the narrow places.
But then, Ah-HA! I realized you can use the corners and edges together! Because each corner has exactly two edge points that are a distance 1/2 away, and likewise with each edge point. So now you can unambiguously sort the points into the correct order. Victory!
And I tried it and it worked PERFECTLY. Even in the case where the region was in three separate pieces! (Well, as well as it possibly could have.)
Of course, now you've got an excess of points, and it would be nice to thin them down to just the minimal necessary set. So first you throw out all the corner points (which is easy if you kept separate lists of edges and points back at the beginning). Then go through the remaining list and look at each point's neighbors. Calculate the relative direction vector from the left-hand neighbor to the current point, and from the current point to the right-hand neighbor, by subtracting them. If they're different, add the current point to the list of final points; otherwise, don't.
And done! Once you've figured out how it all fits together, you can go back through and clean things up, and with judicious use of an apply-to-all function like map(), it even looks kinda elegant.
I was gonna post my resulting code, but I don't have access to it at the moment and I want to be done with this, so that'll have to do.