Shifting Bits and iBeacon

Note 12/29/2017: This article was written several years ago and is in draft form; it was never published due to concerns over IP protection. Since those concerns are no longer relevant and the topic no longer fresh in my mind, I’m publishing it as-is for archival purposes.

Recently I’ve been tasked here at The Tap Lab with investigating how to incorporate Apple’s iBeacon technology into our games. Our game Tiny Tycoons is a location-based empire builder involving real world shops and other businesses, so it seemed only natural that a protocol designed primarily to enable retail outlets to detect when their customers are in close proximity might yield some interesting gameplay options. While experimenting I found myself using bit shifting, a low-level operation most programmers are familiar with but rarely if ever use unless doing systems programming. It’s not a particularly difficult concept to grasp, but having never actually done it before I found myself revisiting some basic principles.

iBeacon is a protocol developed by Apple that defines a simple way for devices called “beacons” to broadcast their location and identity over Bluetooth Low Energy (BLE). BLE has been included in most iOS devices (as well as many non-Apple products) for a couple of years now, but only recently, with the release of iOS 7, has Apple enabled iBeacon functionality. BLE is a version of Bluetooth that uses very little energy, allowing devices to broadcast or listen for a signal for years on a single battery. The strength of the signal can be used to determine how close the beacon is to a listening device, such as a smartphone, with finer granularity than GPS (at ranges under 150 feet).

The canonical use case is that retailers would place beacons throughout their store next to certain products. When customers running the store’s app on their phone come within a certain proximity of the beacon/product, the app would display information related to that product or special deals. Another popular example is museum exhibits.

BLE allows for all sorts of other communication between devices, but iBeacon is just a simple abstraction built on top of it. Beacons simply broadcast a UUID along with a 16-bit major and a 16-bit minor value, and applications listen for a particular UUID and interpret the major/minor values in whatever way they see fit. For example, the Museum of Science might have an app that listens for their particular UUID, with the major value representing which section the visitor is in and the minor which exhibit they are looking at. The museum’s app has a database mapping these values to information about the exhibits, or perhaps it makes a call to a remote API that sends the information down. Beacons are placed throughout the museum and set to broadcast the museum’s UUID and the major/minor values corresponding to each particular exhibit. Visitors are encouraged to download the app to their phone at the front desk.
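
To make that flow concrete, here’s a rough, purely hypothetical sketch of the listening side using Core Location (the method names, placeholder UUID, and lookup logic are mine, not taken from any real museum app):

#import <CoreLocation/CoreLocation.h>

- (void)startListeningForMuseumBeacons {
    // Assumes self.locationManager is a CLLocationManager with this object as its delegate
    NSUUID *museumUUID = [[NSUUID alloc] initWithUUIDString:@"<the museum's UUID>"];
    CLBeaconRegion *region = [[CLBeaconRegion alloc] initWithProximityUUID:museumUUID
                                                                identifier:@"museum"];
    [self.locationManager startRangingBeaconsInRegion:region];
}

- (void)locationManager:(CLLocationManager *)manager
        didRangeBeacons:(NSArray *)beacons
               inRegion:(CLBeaconRegion *)region {
    for (CLBeacon *beacon in beacons) {
        // Major = museum section, minor = specific exhibit; the app would look
        // these up in its local database or via a remote API.
        NSLog(@"Section %@, exhibit %@", beacon.major, beacon.minor);
    }
}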

As you can see, the “official” use cases are quite narrow. We wanted to see how this technology could benefit mobile gaming. The most obvious way to integrate this functionality into Tiny Tycoons was to alert the user when they walked by a business that was (virtually) owned by someone within the game. We’d tell them who owned the property and how much it was worth, and encourage them to make an offer on it in-game. If they were physically visiting a place in real life, there’s a good chance they have some interest in the location. At the very least, it would be a great way to bridge the virtual and real world in a way we hoped would excite players.

We ran into a few problems with this. The first we were prepared for: iBeacon would only work for businesses that actually had beacons installed at their location, which at the time meant (and, as far as I know, still means) just Apple Stores. Being a new technology, no one had rolled it out yet, though there was plenty of buzz and we were sure many chains were actively looking into it. Even Apple’s rollout was a bit shaky, as evidenced by our exploratory trip to the Apple Store. But that was fine; we would implement support for it in our game, generate some positive press, and as more businesses adopted iBeacon our integration would pay off more and more.

The problem we did not foresee was not aligning with Apple’s vision for iBeacon, particularly with regard to third-party apps. In order for our game to detect beacons, we would have to listen for one or more specific UUIDs. What should we listen for? We wanted a solution we could get up and running with at least one business right away, but that could scale to potentially all businesses once the technology took off. Ideally, adding a business would not even require an explicit partnership, but would happen automatically when said business rolled out beacons in its stores. Unfortunately, Apple had not said much about how it intended the technology to be used outside of the aforementioned use cases, so choosing an approach that satisfied the above conditions proved difficult.

We thought about defining a Tiny Tycoons/Tap Lab UUID and listening for that, but that would require getting each business to set their beacons to broadcast our UUID. Even if beacon technology reached the point where it was possible to programmatically add unlimited UUIDs to a single beacon, it seemed unlikely we would be able to convince large companies like Best Buy or Starbucks to broadcast ours. Reaching the millions of small businesses across the world would be impossible. The best case scenario would be if eventually a centralized broker evolved (perhaps even Apple) that one could register their UUID with, and could then push said UUID through an API to the beacons of all member businesses. I doubt, however, that this was Apple’s intent, or if it is even possible or practical to broadcast that many different UUIDs on a single beacon. Regardless, such a service did not currently exist.

The second option would be to program our game to listen for a list of known UUIDs for certain businesses. The server could send down an updated list on each startup, and businesses could be added to the database as they adopted the iBeacon technology. Our research though seemed to indicate that Apple didn’t intend for these UUIDs to be publicly published and utilized by applications other than the first party. We also had severe doubts, again, about whether it was possible/practical for a single iOS device to listen for tens, hundreds, or even thousands of different UUIDs at the same time. It seemed unlikely that this was an approach Apple had in mind, and it was highly unlikely to scale.

The third option would be to petition Apple to define, or define ourselves, a universal UUID that businesses could broadcast that would allow third-party developers like us to build presence apps not tied to one particular business. If a business wanted to participate in the third-party iBeacon app ecosystem, they would simply broadcast the standardized UUID from their store and would instantly be recognizable by any past, present, or future app also listening for that UUID, games included. I still believe this is a pretty sweet idea, but I’m also skeptical this is part of Apple’s vision. Regardless, it is not immediately doable, and a small game company like ours does not have the clout to establish a global standard for retail technology.

In the end we went with the first option, but it was strictly for demonstration purposes; it required us to bring our own beacon into a business before it could be detected in game. No long-term practical solution presented itself, so we moved on from the feature, but we continued to look for ways to integrate iBeacon into our games.

Another important detail about Apple’s iBeacon tech is that any iOS device with BLE can itself act as a beacon, in addition to detecting them. Apple seems to have intended beacons to be static devices tied to a particular spatial region, and most commercial beacons so far appear to be simple, cheap custom pieces of hardware that can be discreetly affixed to various surfaces. The ability of devices to also act as beacons, however, enables all sorts of interesting use cases that were perhaps unintended: having a user’s device both broadcast and listen for other devices enables a sort of “who’s nearby” functionality for an app. The same thing could be accomplished with GPS, though with a bit less granularity and a lot less of the “neato” factor.

We decided to try implementing a feature that alerted players of Tiny Tycoons when other players were nearby, and brought up the nearby player’s (in-game) info in order to encourage friending, gifting, or bartering over properties. While the physical proximity mattered very little from a gameplay perspective (all of these things could be done with someone halfway around the world), we thought the idea that there were other players nearby would excite our users, and might even lead to some (hopefully positive) real-world interaction.

Because this feature did not involve outside businesses, we didn’t have to worry about multiple UUIDs and could just use our own. On the surface, implementing the feature seemed trivial: we have each player’s phone both broadcast and listen for our UUID, and when a beacon is detected we send a local notification that, when swiped, takes the player to the nearby player’s info screen in the game. Simple enough. The slightly tricky part is identifying the player who is nearby. Everyone is broadcasting the same UUID, so how do we differentiate players once detected?

Going back to the museum example, each beacon is capable of broadcasting two 16-bit integers, known as the major and minor values. That is the extent of the communication allowed before you need to break out of the iBeacon protocol and start using full-on BLE. This is not a lot of information, but it is enough to store an ID number for a particular store department or museum exhibit that can then be looked up in a database containing more detailed information.

Like most games backed by a server, Tiny Tycoons stores players in a database table keyed by an integer ID column. It seemed obvious that we would have each player’s device broadcast their player ID inside either the major or minor value, and then have other players’ devices make a call to the server with that ID to look up a nearby player’s info once detected. I’m sure you see where this is going. At the time of this writing, there are 400,000+ player rows in the database. A 16-bit unsigned integer (short) can store values ranging from 0 to 2^16 - 1 = 65,535. Oops. Neither the major nor the minor value alone is big enough to hold an ID for a game with more than that many players.

However, the major and minor values together contain enough bits to hold values ranging from 0 to 2^32 - 1, or roughly 4.3 billion. If we ever have more players than that, I’ll probably be too busy on my yacht to care about overflow. So there are theoretically enough bits of information available in the iBeacon protocol to broadcast a player ID. But there’s still the minor problem that the major/minor values are treated as two 16-bit integers rather than one 32-bit integer. I needed to figure out a way to split a player ID into two shorts when broadcasting, then combine them back into a 32-bit integer when picking up that broadcast on another device. This is where bit shifting comes in.

Like many programmers, I had never actually used bit shifting in any of my code. I was aware of its existence and had a general idea of what it did and what it was for. Every language I had ever learned had the << and >> operators. I had used basic bitwise OR operations for setting flags in the Windows API, and I knew there existed a fast way to swap variables without using a temp via XOR operations that I might be asked about during a tech interview some day. But being a practical person, I had never been presented with a problem that required bit shifting to get the job done. So I had to brush up a bit on my binary.

As a refresher, the number 6 stored in a 16-bit integer would look something like this: 00000000 00000110, with each digit starting from the right representing a power of 2. If you shift each digit to the left one (6 << 1), you get 12 (00000000 00001100). If you shift it the other way, you get 3 (6 >> 1). Any digits that fall off the end are lost, and any new digits introduced on the other side are set to 0 (there’s no “wrap-around”). So 6 >> 2 is 1 (00000000 00000001). That’s really all there is to it.
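
A quick sanity check of those examples in C (the variable names are just for illustration):

uint16_t x = 6;               // 00000000 00000110
uint16_t left1  = x << 1;     // 12: 00000000 00001100
uint16_t right1 = x >> 1;     //  3: 00000000 00000011
uint16_t right2 = x >> 2;     //  1: 00000000 00000001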

How does this apply to the problem at hand? Take a 32-bit player ID, say 127,721 (00000000 00000001 11110010 11101001). We need to somehow turn this into two 16-bit shorts in a reversible way. The most obvious solution is to chop the integer in half, and store the first two bytes (00000000 00000001 = 1) in the major value, and the last two bytes (11110010 11101001 = 62,185) in the minor. How do we do that though? This depends on your language, but one way available in Objective-C and most other languages is, of course, bit shifting!

Let’s start with the major value, the first two bytes. We need to chop off the last two bytes and end up with 0’s in the first two, so that we have the integer 00000000 00000000 00000000 00000001, which can be safely cast to a 16-bit short (casting a value that still has nonzero bits in its upper two bytes would simply throw those bits away, which is not what we want). What we’re doing here is shifting the entire integer 16 places to the right, which drops the original right-most 2 bytes off the edge and pads the new left-most 2 bytes with 0’s, as described above. So we are left with:

uint32_t playerId = … ;
uint16_t major = playerId >> 16; // top 16 bits of the player ID

Simple! Now how about the minor value? In this case, we just want to chop off the 2 left-most bytes, meaning we want to zero them out and then cast to a 16-bit short as before. We know that a good way to get rid of bits is to shift them off the end of the integer, so we could start by shifting the player ID 16 bits to the left, which would give us 11110010 11101001 00000000 00000000. At this point the relevant bytes are in the wrong spot on the left, so we shift them back 16 places to the right. The original 2 left-most bytes are now replaced with 0’s, just like we wanted.

uint16_t minor = (playerId << 16) >> 16; // bottom 16 bits of the player ID
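
Equivalently, you can mask off the upper two bytes with a bitwise AND, which some will find more readable; the result is the same:

uint16_t minor = playerId & 0xFFFF;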

We are now left with two 16-bit values that can be broadcast over a beacon. The real question, though, is whether this process can be reversed by a device that detects said beacon. Given two 16-bit shorts, how can they be combined back into a 32-bit player ID? The answer, again, is bit shifting. Let’s start by extracting both values from the beacon into 32-bit unsigned integers.

uint32_t major = [[beacon major] intValue];
uint32_t minor = [[beacon minor] intValue];

// Major: 00000000 00000000 00000000 00000001
// Minor: 00000000 00000000 11110010 11101001

The cast to 32-bit pads each value with 2 bytes worth of 0’s on the left. You might have noticed that, if we shift the right-most two bytes of the major value back to their original position on the left (reversing our original operation), we can simply add the two values together to get our original 32-bit player ID:

// Major: 00000000 00000001 00000000 00000000
// Minor: 00000000 00000000 11110010 11101001
// +    : 00000000 00000001 11110010 11101001 (127,721)

That’s exactly what we do:

uint32_t playerId32 = (major << 16) + minor;
NSNumber *playerId = [NSNumber numberWithUnsignedInt:playerId32];
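
For completeness, here’s a rough sketch (not our exact production code) of how the broadcasting side might plug the split ID into the iBeacon APIs. The UUID string and region identifier are placeholders, and self.peripheralManager is assumed to be a powered-on CBPeripheralManager:

#import <CoreLocation/CoreLocation.h>
#import <CoreBluetooth/CoreBluetooth.h>

- (void)startBroadcastingPlayerId:(uint32_t)playerId {
    NSUUID *appUUID = [[NSUUID alloc] initWithUUIDString:@"<our app's UUID>"];
    CLBeaconRegion *region =
        [[CLBeaconRegion alloc] initWithProximityUUID:appUUID
                                                major:(uint16_t)(playerId >> 16)
                                                minor:(uint16_t)(playerId & 0xFFFF)
                                           identifier:@"com.thetaplab.nearby"];
    // Build the advertisement dictionary; nil uses the device's default measured power
    NSDictionary *advertisement = [region peripheralDataWithMeasuredPower:nil];
    [self.peripheralManager startAdvertising:advertisement];
}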

And there you have it. Many of you might be rolling your eyes thinking, “this is elementary stuff.” Indeed it is, and perhaps I’m just a bit slow, but I did have to dedicate some time to working this all out, and I suspect I’m not the only one who isn’t regularly shifting bits around at their current coding gig. You may also have noticed that you can avoid doing any bit shifting at all and instead use multiplication and division (x << y = x * 2^y, and x >> y = x / 2^y with integer division). That’s true, but to even arrive at that solution you need to understand what’s going on at the binary level. Plus you may (depending, I imagine, on the language implementation) open yourself up to some wrap-around issues that bit shifting avoids.

As for the feature itself, the fact that this little exercise is necessary in the first place indicates that this is probably not what Apple had in mind for iBeacon. Given the limited bandwidth provided by two 16-bit integers, it seems clear the intent was for a small number of beacons to exist within any given domain. Even with this trick, a limit of 4.3 billion puts an absolute restriction on the features iBeacon can support (app developers can always resort, of course, to the underlying Bluetooth protocol). If you think 4.3 billion is sufficient, consider that the very reason we are running out of IPv4 addresses and being forced to adopt IPv6 is that the traditional 32 bits of an IPv4 address only allow for about 4.3 billion of them, and that’s just not enough for all the devices of our rapidly evolving world.

We will continue to experiment with iBeacon and see if it can unlock any interesting gameplay concepts. I’m very curious to see if it takes off as a technology standard and how it is used by other developers. Will they stick to the narrowly defined retail use cases, or will someone come up with a killer app that uses the technology in a way that Apple did not intend? Will Apple remain quiet about their plans for the technology, or will they step in and provide guidance/interference? Will a 3rd party app ecosystem evolve around iBeacon, and will Apple support it? All of these things remain to be seen, and I can’t wait to find out.

 

Image Occlusion Testing with Quartz/UIKit

Note 12/29/2017: This article was written several years ago and is in draft form; it was never published due to concerns over IP protection. Since those concerns are no longer relevant and the topic no longer fresh in my mind, I’m publishing it as-is for archival purposes.

Here at The Tap Lab we’re currently putting the finishing touches on our second game, Bigfoot Hunter. You should definitely check it out, but in short, the basic mechanic involves looking around an environment and snapping a photo of bigfoot when he runs out. Your score is based on how “good” the picture is, and one of the metrics by which you’re judged is whether the bigfoot in your photo is obstructed by trees, bushes, or other pieces of the environment. The environment consists of several “layers” at different depths, each composed of a variety of the aforementioned pieces. We currently use the standard iOS UI libraries (UIKit) to render the environment, so these layers are really just UIViews with UIImageViews as subviews. When a snapshot is taken, these layers are composited on top of each other to generate the image.

Determining how centered the bigfoot is and how fast he is moving (in order to score “blurriness”) is relatively straightforward and simply involves positional state data about the bigfoot. The final metric by which the player is scored, how much of the bigfoot is occluded by the environment, is a bit more difficult. The images themselves are rectangular, so bounding boxes are an obvious, simple, and performant solution. However, the images, especially things like trees with long branches, have large areas of transparency that make this approach far too inaccurate. The other solution that comes to mind immediately is looking at the image data itself and determining what percentage of bigfoot pixels are still visible after the composite image is rendered.

This approach is super-accurate but also resource intensive. Images take up a lot of memory on mobile devices, and anything that involves looping through pixel data on the CPU is incredibly slow. We were hoping to allow the player to take many rapid-fire shots and score them instantly in the game; spending a full second calculating occlusion was not an acceptable user experience. The details of the pixel-based algorithm are also somewhat unclear at first glance. I imagine most engineers, when presented with this problem, would believe it is doable and relatively straightforward, but might not be able to articulate an implementation right away. Given the original bigfoot image and the composite image, you could compare each pixel in the bigfoot’s bounding box to see if it is unchanged; but what if it was alpha blended? What if the environment is the same or a similar color?

We had an existing pixel-based solution in the game, but it was resource-intensive and used a somewhat roundabout method to determine which pixels remained visible. My task was to come up with a more performant algorithm. I have done pixel manipulation in the past (on Windows) but was not very familiar with iOS/Objective-C and even less familiar with Quartz and UIKit. So my implementation below may not be the best, but it seems to work. If you are using an actual game engine for your game, which I recommend you do, it probably already contains a much faster implementation all ready to go. But if you’re stuck using UIKit for whatever reason, I hope this will be of help.

First off, a lightning-speed refresher on computer graphics if you are rusty or unfamiliar. A traditional image (JPEG, PNG, etc.; collectively known as a “bitmap image”) is made up of pixels, each with a particular color. There are a variety of ways of representing color, but the most common is 32-bit RGBA: one byte each for red, green, blue, and alpha (each with a value between 0 and 255), totaling 4 bytes per pixel. A bitmap image is then represented in memory as a long array of bytes, with every 4 representing a pixel in the image, starting from the top left and going left to right, top to bottom.
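
In code terms, that layout means the pixel at coordinates (x, y) of an image lives at a predictable byte offset. A small sketch, where pixels, x, y, and width are hypothetical names:

// 4 bytes per pixel; rows stored top to bottom, pixels left to right within a row
size_t offset = (y * width + x) * 4;
UInt8 red   = pixels[offset];
UInt8 green = pixels[offset + 1];
UInt8 blue  = pixels[offset + 2];
UInt8 alpha = pixels[offset + 3];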

The alpha channel represents how transparent the pixel is, with 255 being opaque and 0 being fully transparent. When drawing images on top of each other, their colors are blended together based on the alpha value. This is how you get images in games (say, of trees) that are still rectangular but don’t have a white background; their opaque pixels are drawn over the background and obscure it, while their transparent areas just leave the existing background pixel as is. Parts of the image that are somewhere in the middle, like translucent glass, blend the background pixel with the image pixel into a new color representing both.
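
Concretely, with channel values normalized to the range 0 to 1 and an already-opaque background, the familiar “source over” blend works out to roughly the following (a sketch; real compositors also handle translucent destinations and premultiplied alpha):

// Blend a source pixel over an opaque destination pixel, channels in [0, 1]
float outR = srcR * srcA + dstR * (1.0f - srcA);
float outG = srcG * srcA + dstG * (1.0f - srcA);
float outB = srcB * srcA + dstB * (1.0f - srcA);
// srcA == 1 keeps only the source color; srcA == 0 leaves the destination untouched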

In order to do any drawing on most operating systems, you need a graphics context, which represents a drawing surface stored in a chunk of memory and the various state that accompanies it (dimensions, color format, etc). This context often but not always represents a part of the screen being displayed to the user; in our case, we will simply be performing drawing operations in memory and the user will never see it. In order to access the actual pixel data in that memory, we will be using a particular kind of context called a Bitmap Context.

The abstract version of the occlusion algorithm is as follows:

1. Create a context the size of the bigfoot image and draw the bigfoot image into it. Loop through each pixel and add up the alpha channel values. This represents the total area of the original bigfoot image that is visible when unobstructed. There are other ways to do this (for example, normalizing each pixel to a 1 or 0) but this is easiest for now.

2. Set the context’s blend mode to kCGBlendModeDestinationOut. This is one of many ways to blend an image on top of another based on their alpha values. This particular blend function has the effect of “image subtraction”; namely, anything rendered after that would normally occlude an existing pixel instead “erases” it by setting it to transparent. This has the effect of leaving only unoccluded pixels in the context remaining.

3. Render any foreground images into the context using the aforementioned blend mode, making sure to move them to mirror their position relative to the bigfoot image within the game. This will leave only visible bigfoot pixels left in the context.

4. Loop through the pixels in the context and add up the remaining visible alpha values as in Step 1. If the bigfoot image was only partially visible in the snapshot (and thus was “occluded” by the camera bounds), make sure to only count pixels in the visible region. We now have a visible vs total count which we can use to calculate the percentage of the image that is visible.

The above algorithm is accurate but slow, the main culprit being the looping through pixels in steps 1 and 4. Modern images are quite high-resolution, and even the tightest loop with that many iterations on the CPU is going to take some time (versus the massively parallel capabilities of the GPU). While there probably is a way to utilize the GPU for such an operation, it likely involves using OpenGL and possibly shaders, and if you’re using UIKit for your graphics you’re probably hoping to avoid refactoring to integrate OpenGL.

So how do we speed this up? Reducing the number of loop iterations, and thus the number of pixels, is a fairly obvious solution, and happens to be quite effective. If we simply scale the image down, we can reduce the number of pixel tests we have to do by orders of magnitude without significantly reducing the accuracy. In fact, since this particular graphics context will never be displayed to the user, we can actually scale the image down pretty dramatically without having to worry what it looks like.

Scaling could be accomplished by, say, only testing every 4th or 8th pixel, or even some sort of random sampling. But memory usage is also a concern, so instead of rendering the full-res images into the graphics context, we can kill two birds with one stone and draw only a scaled-down version of the image, saving space. Now you might ask: why use any extra memory at all? Why not just sample the source images in parallel and compare their alpha values? While there may be a way to accomplish this, from what I found it appears that the only way to access an image’s underlying pixel data is to draw it to a separate bitmap context.

Side note: My original implementation only copied the alpha channel into the context (grayscale), saving even more space since the color channels were irrelevant. However, this process of transforming the color space seems to be a rather intensive operation under the hood, and ended up degrading performance significantly. It also doesn’t lend itself well to outputting PNGs for debugging purposes, so I kept in the color channels.

So let’s get to the gritty details. Assume we have a function that takes as input two images, one representing the background (aka the bigfoot being occluded) and one representing the combined foreground (the things occluding the bigfoot). In practice the foreground could actually be an array of images that are drawn into the context as described in step 3; for the purposes of this article we’ll just use one. (We’re using CGImageRefs because most of the low-level graphics functions we’ll be using are in Quartz. To get a CGImageRef from a UIImage, simply call uiimage.CGImage.)

We also pass in CGRects representing the position and size of the two images so that we can determine where they are relative to each other. This may not be necessary for all purposes but if you are working on a game that has elements moving around the screen, images themselves do not encode any information about their location (they do, however, keep state about their height and width, which you can get from CGImageGetWidth/Height). We also take in the bounds of the screen itself in case the background image is partially off-screen. Finally, we pass in scaleFactor, which specifies how much we want to reduce the resolution of the images to increase performance, with 1 being no reduction. The higher this number, the faster but less accurate the algorithm.

- (double)occlusionPercentageForBackground:(CGImageRef)background foreground:(CGImageRef)foreground snapshotFrame:(CGRect)snapshotFrame backgroundFrame:(CGRect)backgroundFrame foregroundFrame:(CGRect)foregroundFrame scaleFactor:(float)scaleFactor {

First, we apply a scaling transform to all of the geometry involved, essentially transforming our entire world-space into a lower resolution to increase performance. Unlike some graphics libraries, Core Graphics applies the transform to the origin as well as the size of the frame, so all of the geometry maintains its relative positioning; no manual adjustment is necessary.

// Scale down
CGAffineTransform bitmapScale = CGAffineTransformMakeScale(1.0 / scaleFactor, 1.0 / scaleFactor);

snapshotFrame = CGRectApplyAffineTransform(snapshotFrame, bitmapScale);
foregroundFrame = CGRectApplyAffineTransform(foregroundFrame, bitmapScale);
backgroundFrame = CGRectApplyAffineTransform(backgroundFrame, bitmapScale);

In order to determine what percentage of the bigfoot (background) is visible, we have to know how much space he occupies unoccluded. Therefore we must render the entire bigfoot, even if not all of him is in the snapshot frame. Additionally, nothing outside of the bigfoot’s image dimensions matters; a tree may cover the entire length of the game world, but we only care about the branch that passes through the bigfoot’s frame. This essentially puts the bigfoot image at the center of our working space, where “working space” refers to the chunk of memory (context) we will be performing our bitmap operations in. It also means that our context needs to be no larger than the size of the bigfoot image.

With that in mind, we position the origin of the bigfoot frame to be (0,0) so that when we draw it, it will fit perfectly into the context. We must also transform the foreground and snapshot frames though so that when we draw them into the context later, they will still be positioned properly relative to the bigfoot. Most of the foreground, as we mentioned, won’t even fit into the context as it doesn’t pass through the bigfoot’s frame; that is okay, as it will just be ignored and won’t take up space or CPU cycles.

// Make everything relative to bigfoot, set bigfoot to 0, 0
snapshotFrame = CGRectOffset(snapshotFrame, -backgroundFrame.origin.x, -backgroundFrame.origin.y);
foregroundFrame = CGRectOffset(foregroundFrame, -backgroundFrame.origin.x, -backgroundFrame.origin.y);
backgroundFrame = CGRectMake(0, 0, (int)backgroundFrame.size.width, (int)backgroundFrame.size.height);

We now calculate the size of the bitmap context we’ll be drawing into and create it. As we just discussed, the context needs to be big enough to fit the bigfoot’s image but no more than that. Since each pixel takes up 4 bytes (red, green, blue, alpha), the size in bytes of each row of the context will be the width of the bigfoot image times 4, and the total size will be the size of a row times the height of the image.

int bytesPerRow = backgroundFrame.size.width * 4;
int totalBytes = bytesPerRow * backgroundFrame.size.height; 

// Create context
UInt8 *bitmapData = calloc(totalBytes, 1);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

// For more information, see docs for CGBitmapContextCreate()
CGContextRef bitmapContext = CGBitmapContextCreate(bitmapData, backgroundFrame.size.width, backgroundFrame.size.height, 8, bytesPerRow, colorSpace, (CGBitmapInfo)kCGImageAlphaPremultipliedLast);

CGColorSpaceRelease(colorSpace);

Next we draw the bigfoot image into the context. With the origin at 0,0 and the context sized to be the same dimensions, the bigfoot should fit perfectly. Remember that we’ve scaled down the frame representing the bigfoot (backgroundFrame); it is now smaller than the actual image contained in the background variable. This will cause Core Graphics to downsample the image as it draws it into the context.

Once the bigfoot is in the context, we need to add up the total amount that is “visible” (non-transparent) before the bigfoot is obstructed with foreground elements. There are a number of ways one could do this; the simplest is to just add up the alpha value of each pixel (remember, 0 alpha is transparent and thus not visible, 255 is fully opaque/visible). Recall also that the alpha channel is the fourth byte in each pixel, so when we are looping through the bytes in the bitmap context, we increment by 4 each time.

// Draw background and count total alpha
CGContextDrawImage(bitmapContext, backgroundFrame, background);

int totalAlpha = 0;
int visibleAlpha = 0;  

for(int i = 0; i < totalBytes; i+=4) {
    totalAlpha += bitmapData[i+3];
}

Now we need to draw the foreground elements into the context, on top of the bigfoot, positioned relative to the bigfoot as they are in the game. But first we need to set the blend mode so that an opaque pixel from a foreground image, rather than overwriting the underlying bigfoot pixel, simply “erases” it. This allows us to be sure that any remaining pixels represent visible portions of the bigfoot, and not some foreground element. The name for this blend mode is kCGBlendModeDestinationOut, which is really just a constant corresponding to the standard OpenGL blend factor pair {GL_ZERO, GL_ONE_MINUS_SRC_ALPHA}: ignore the source (foreground) pixel’s color, and multiply the destination (bigfoot) pixel by 1 minus the alpha of the source pixel, i.e. D*(1-Sa).

// Draw foreground over background with special blend mode
CGContextSetBlendMode(bitmapContext, kCGBlendModeDestinationOut);
CGContextDrawImage(bitmapContext, foregroundFrame, foreground);

All that remains now is to count up how much of the bigfoot remains visible in the context, using the same method as before (add up the alpha values). The one caveat here is that, while we needed to render the entire bigfoot to get the total potentially visible area, it is possible the entire bigfoot was not in the viewport (snapshotFrame) when the picture was taken. Thus we do not want to count pixels that lie outside the snapshot, even if they are not obstructed by the environment. This may not be an issue for all games, in which case you could skip this part. Keep in mind though that this is also a performance improvement, as we only need to loop through a subset of pixels in the context.

There are many ways to do this. We could perform another graphics operation on the context where we shift the pixels relative to the snapshot, pushing the non-visible pixels out of the buffer. We could also render a mask or use a similar cropping operation. All of these solutions, though, require another graphics op and, unless you create a new, smaller context (yet another op), still leave you looping through extra, empty pixels that fall outside the snapshot frame.

The better solution, in my opinion, is simply to restrict your pixel testing to only the portion of the context that represents the overlap between the bigfoot frame and the snapshot frame, which can be calculated like so:

// Count up visible background only in snapshot area
CGRect overlapFrame = CGRectIntersection(snapshotFrame, backgroundFrame);

// At this point, all of the CGRect's have origins in the bottom-left, but the image bytes start from the top-left. So
// we flip the Y-axis of the overlap frame (no need to do the background frame because it defines the origin)
overlapFrame.origin.y = backgroundFrame.size.height - overlapFrame.origin.y - overlapFrame.size.height;

As the comments describe, we also need to flip the Y-axis of our overlap frame due to the different origins of the context and CGRects.

Next, we loop through each pixel location in the overlap frame. The tricky part here is translating the (x,y) coords in the space of the overlap rect to an array index in the memory of the bitmap context. Normally, the equation is index = y * width + x, which makes sense if the memory represents the image from left to right, top to bottom. If you look at the offset calculation, you can see that we add the x- and y-offset of the overlap’s origin to the equation to account for the fact that the overlap frame does not necessarily align with the origin of the context’s frame. We then take this offset and multiply it by 4 (since each pixel is 4 bytes) and add 3 (since the alpha value is the fourth byte) to get the alpha value.

Note that because we scaled the frames earlier, we have already accounted for the fact that the context is a lower resolution than the images originally passed in. It also means that, depending on our scale factor, we are looping through considerably fewer pixels than we would be otherwise, without a significant loss of accuracy.

for(int y = 0; y < overlapFrame.size.height; y++) {
    for(int x = 0; x < overlapFrame.size.width; x++) {
        int offset = (y + overlapFrame.origin.y) * backgroundFrame.size.width + overlapFrame.origin.x + x;
        visibleAlpha += bitmapData[offset*4+3];
    }
}

At this point you might find it useful to save out the context to an image file for debugging purposes. The following snippet will save a copy to your photo album if you’re running on an iOS device. There are similar means of saving to a local file in OS X.

// DEBUG
BOOL TEST_IMAGE = YES;

if (TEST_IMAGE) {
    CGImageRef cgimage = CGBitmapContextCreateImage(bitmapContext);
    UIImage *image = [UIImage imageWithCGImage:cgimage];
    UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil);
    CGImageRelease(cgimage); 
}

Now that we have before and after alpha sums, we can return the percentage of the bigfoot that is visible, as well as clean up our allocated memory.

CGContextRelease(bitmapContext);
free(bitmapData);

return (float)visibleAlpha / (float)totalAlpha;
}

The result is a reasonably fast means of occlusion testing that can be optimized depending on the accuracy needed versus the speed.
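
For reference, a call site might look something like the following. The view names and the scale factor of 4 are purely illustrative, and the return value is the fraction of the bigfoot still visible, as described above:

// Hypothetical usage: bigfootImageView and treeImageView are UIImageViews in the scene
double visible = [self occlusionPercentageForBackground:bigfootImageView.image.CGImage
                                              foreground:treeImageView.image.CGImage
                                           snapshotFrame:self.view.bounds
                                         backgroundFrame:bigfootImageView.frame
                                         foregroundFrame:treeImageView.frame
                                             scaleFactor:4.0f];
NSLog(@"%.0f%% of the bigfoot is visible", visible * 100.0);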

 

Integrating Elastic Map-Reduce with Resque

Background

Like many companies, Shareaholic uses Hadoop/map-reduce to crunch large amounts of data offline at scheduled intervals. We currently use Amazon’s Elastic Map-Reduce (EMR) for this, which provides a scalable, on-demand Hadoop cluster built on top of EC2 and S3. It is a very convenient (though somewhat expensive) solution for engineers comfortable with AWS and who don’t want to build, maintain, and scale their own in-house cluster. S3 provides the implementation of HDFS where input and output data are stored, and one can simply kick off a job through the web interface, command-line tool (version with Ruby 1.9 support), or web API and specify the number and type of machines used.

We are also big fans of Resque, a job queueing framework for Ruby backed by Redis. We particularly like the monitoring capabilities provided by ResqueWeb, which allows you to see which jobs are running, which are queued, and most importantly which ones have failed and why. We use Cavalcade, written by a Shareaholic engineer, on top of Resque to provide the ability to schedule jobs via cron, usually on a nightly, weekly, or monthly basis.

When we first began using EMR, we wanted to be able to monitor the map-reduce jobs in the same place and fashion as we did our Resque jobs. So we developed a mini-framework for wrapping the launching and monitoring of EMR jobs inside of Resque jobs. It’s been working pretty well for us so far, and we thought we’d share it with the larger community. It is not the most elegant possible solution; namely, it uses Amazon’s command-line tool to interact with EMR rather than the web API, or rather than using the tool as a library (it is written in Ruby). We started with the command-line tool because we were familiar with it and it’s easy to understand and use, and it’s been working so well we haven’t bothered to go back and change it.


How to find page thumbnails

Recently at Shareaholic we’ve been working on a lot of products that require showing a preview of some piece of content, usually a blog post. This preview usually consists of a title and, most importantly, a thumbnail image. A primary example is the Recommendations product, which shows thumbnails of related posts at the bottom of a blog post. Finding the correct thumbnail image is no trivial task, and I’ve spent a good percentage of my working hours over the past few months developing the library that performs this task for Shareaholic’s various products. I’ve found that there is not a lot of information on this topic floating around, and that what one might assume to be the state of the art (i.e. Facebook’s thumbnail algorithm) is not all that sophisticated compared to what you can roll yourself. So with my employer’s express permission I would like to share what I’ve learned, in the hopes of saving you some time if you too are attempting to build your own thumbnail scraper.
