Serverless IoT Analytics with OpenWhisk Part 1 — Is It Slower?

Apr 19, 2017

As I explored the serverless world (the FaaS area), I always wondered how it compared to the “normal” way. Would it slow things down? Is it really more cost-effective? What would be the right scope of a “function”? Would it still perform if I used it for stateful processing?

This is the first part of a series in which I try to figure these things out.

IoT analytics fits the event-driven pattern pretty well. A device sends in an event to trigger some analytics processing which runs on a FaaS platform. We have already built Watson IoT Analytics, so it’s perfect for me to compare.

Watson IoT Analytics has a lot of horsepower in the backend, the “servers”, so of course, it scales pretty well, both in the number of users and in the event throughput it is capable of handling. I want to implement the same analytics flow with OpenWhisk, without setting up my own servers. The “serverless” way.

To follow along, some preparation work:

Have your Bluemix Watson IoT Platform, Message Hub, and OpenWhisk instances ready.
Connect a historical data storage extension on Watson IoT Platform to the Message Hub instance you want to use. For simplicity, I use a default topic all-devices for everything.
On Watson IoT Platform, create the device type iot and one device under it named thermal.

Let’s verify that everything is set up correctly. Use the Message Hub REST API to create a consumer instance, then use the Watson IoT Platform REST API to publish an event. The event should be forwarded to the configured Message Hub topic all-devices. Finally, use the Message Hub REST API to fetch the message back from the topic.

> curl -X POST \
  -H "Content-Type: application/vnd.kafka.v1+json" \
  -H "X-Auth-Token: <Message Hub api_key>" \
  --data '{"name": "my_consumer_instance", "format": "binary", "auto.offset.reset": "largest"}' \
https://kafka-rest-prod01.messagehub.services.us-south.bluemix.net/consumers/my_json_consumer

> curl -u "use-token-auth:<device auth token>" \
  -H "Content-Type: application/json" \
  --data-ascii "{\"temperature\":20}" \
https://<orgId>.messaging.internetofthings.ibmcloud.com/api/v0002/device/types/iot/devices/thermal/events/status

(A few seconds later ...)

> curl -X GET \
  -H "Accept: application/vnd.kafka.binary.v1+json" \
  -H "X-Auth-Token: <Message Hub api_key>" \
      https://kafka-rest-prod01.messagehub.services.us-south.bluemix.net/consumers/my_json_consumer/instances/my_consumer_instance/topics/all-devices
[{"key":"eyJvJ0=","value":"eyJIn0=","partition":0,"offset":4}]

> curl -X DELETE \
  -H "X-Auth-Token: <Message Hub api_key>" \
https://kafka-rest-prod01.messagehub.services.us-south.bluemix.net/consumers/my_json_consumer/instances/my_consumer_instance
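The key and value in the GET response above come back base64-encoded, because the consumer instance was created with the binary format. The values shown in the output appear abbreviated, so the sketch below uses its own illustrative record. It assumes Node.js and assumes the value is a JSON document; `decodeRecord` and `sample` are hypothetical names, not part of any Message Hub API.

```javascript
// Decode one record returned by the Kafka REST "binary" format.
// Key and value are base64 strings; the value is assumed to hold JSON.
function decodeRecord(record) {
  const decode = (b64) =>
    b64 == null ? null : Buffer.from(b64, 'base64').toString('utf8');
  return {
    partition: record.partition,
    offset: record.offset,
    key: decode(record.key),
    value: JSON.parse(decode(record.value)),
  };
}

// Illustrative record, not the abbreviated one from the output above.
const sample = {
  key: null,
  value: 'eyJ0ZW1wZXJhdHVyZSI6MjB9', // base64 of {"temperature":20}
  partition: 0,
  offset: 4,
};

console.log(decodeRecord(sample).value); // { temperature: 20 }
```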

Note that forwarding to Message Hub can take a few seconds because Watson IoT Platform batches the events. My experiments suggest the batch duration is around 6–7 seconds.

Now that our events can reach Message Hub, we can create an action on OpenWhisk to apply analytics rules. The action forwards an event to a webhook endpoint if its temperature exceeds some threshold value. This is exactly what Watson IoT Analytics does, at least in its most basic function.

The action code is super easy. Since OpenWhisk’s Message Hub feed does batch messages, we have to handle batches in our action. Once we have the code ready, the remaining work is to create an action with it and set it up with a trigger and a rule:

> wsk action create iotp-<orgId> ./iot-analytics.js
ok: created action iotp-<orgId>

> wsk trigger create iotp-<orgId>-trigger --feed /_/Bluemix_serverless-iot-kafka_Credentials-1/messageHubFeed \
  --param isJSONData true \
  --param topic all-devices
ok: invoked /_/Bluemix_serverless-iot-kafka_Credentials-1/messageHubFeed with id 1458846b661c45789007a833cc819621
(... omitted ...)
ok: created trigger iotp-<orgId>-trigger

> wsk rule create iotp-<orgId>-rule iotp-<orgId>-trigger iotp-<orgId>
ok: created rule iotp-<orgId>-rule
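The action code itself is not listed in this post, so here is a minimal sketch of what such an action could look like. It is not the original iot-analytics.js. It assumes the Message Hub feed delivers the batch as `params.messages`, with each message's parsed JSON in `value` (because of isJSONData true). The threshold, the webhook URL, and the `threshold` and `webhookUrl` parameters are hypothetical placeholders.

```javascript
// Hypothetical sketch of iot-analytics.js; not the original action code.
const https = require('https');
const { URL } = require('url');

// Assumed values for illustration only.
const DEFAULT_THRESHOLD = 25;
const DEFAULT_WEBHOOK_URL = 'https://example.invalid/webhook';

// Keep only the events whose temperature exceeds the threshold.
// Assumes each message looks like { value: { temperature: <number> } }.
function selectHotEvents(messages, threshold) {
  return (messages || [])
    .map((m) => m.value)
    .filter((v) => v && typeof v.temperature === 'number' && v.temperature > threshold);
}

// POST one JSON document and resolve with the HTTP status code.
function postJson(urlString, body) {
  return new Promise((resolve, reject) => {
    const url = new URL(urlString);
    const payload = JSON.stringify(body);
    const req = https.request({
      method: 'POST',
      hostname: url.hostname,
      path: url.pathname + url.search,
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload),
      },
    }, (res) => {
      res.resume(); // discard the response body
      res.on('end', () => resolve(res.statusCode));
    });
    req.on('error', reject);
    req.end(payload);
  });
}

// OpenWhisk entry point: handles a whole batch of messages per invocation.
function main(params) {
  const threshold = params.threshold !== undefined ? params.threshold : DEFAULT_THRESHOLD;
  const webhookUrl = params.webhookUrl || DEFAULT_WEBHOOK_URL;
  const hot = selectHotEvents(params.messages, threshold);
  return Promise.all(hot.map((event) => postJson(webhookUrl, event)))
    .then((statuses) => ({
      received: (params.messages || []).length,
      forwarded: statuses.length,
    }));
}
```

Returning a promise from `main` keeps the invocation alive until every webhook post in the batch has completed, which is why larger batches take longer per invocation.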

I use WaitHook (which is super cool!) as the webhook endpoint so I can easily verify the result. Send some events again and you should see them pop up on the WaitHook page.

Not too bad, right? A few steps and we have our serverless IoT Analytics!

Latency

So back to my first question: is there any performance penalty for using FaaS?

To understand the latency of this solution, I tested a single device sending events at different rates: from 1 event per second to 1 event per 50 ms[4]. The OpenWhisk action posts to WaitHook at the end. I also open a websocket connection to WaitHook to stream back the response in order to measure the real end-to-end response time.

The end-to-end response time (1 through 4 marked in the chart below) is around 5.9 sec (95%) and 3.6 sec (median). This does not seem fast at all. However, if we look at the OpenWhisk action’s latency, which is from 3 to 4 below, it is only 0.54 sec (95%) and 0.3 sec (median). This is actually quite good because it includes the webhook posting and the websocket communication back to my device, not just the action’s invocation. BTW, it’s 0.23 sec (95%) and 0.21 sec (median) from 1 to 2 below.

somewhat “scaled” to time spent on the edges …
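For anyone repeating the measurement, the median and 95% figures can be derived from raw response times with a small helper like the one below. It uses the nearest-rank definition of a percentile, which is an assumption, since the post does not say how its percentiles were computed. The sample values are synthetic and are not measurements from this experiment.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Synthetic response times in seconds, for illustration only.
const responseTimes = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.9, 1.2, 2.0, 4.0];

console.log('median:', percentile(responseTimes, 50));
console.log('95%:', percentile(responseTimes, 95));
```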

So most of the time is spent going from 2 to 3 (more precisely, in the first part, from Watson IoT to Message Hub). As I mentioned before, the Watson IoT Message Hub extension seems to use a long batch duration, probably 6–7 sec. Unfortunately, there is no way to tune that yet.

If this level of latency is acceptable for you, then it is perfect. But if you are looking for near real-time IoT analytics, this is clearly not going to meet your needs. There’s really no easy “serverless” way[1] unless Watson IoT somehow provides a shorter batch duration.
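A rough sanity check of the batching explanation: if events are flushed at a fixed interval T and an event arrives at a uniformly random moment within that interval (both are assumptions, not documented behavior), its waiting time is uniform on [0, T], so the q-quantile of the wait is simply q times T. For T between 6 and 7 seconds that gives a median wait of 3.0 to 3.5 sec and a 95% wait of 5.7 to 6.65 sec, the same ballpark as the measured end-to-end numbers once the much smaller other legs are added. Quantiles of separate legs do not strictly add up, so treat this as an order-of-magnitude check only.

```javascript
// Quantile of the waiting time for an event that arrives uniformly at random
// inside a flush interval of `intervalSec` seconds: W ~ Uniform(0, T).
function batchWaitQuantile(intervalSec, q) {
  if (q < 0 || q > 1) throw new RangeError('q must be in [0, 1]');
  return q * intervalSec;
}

// Candidate batch intervals taken from the 6-7 second estimate.
for (const interval of [6, 7]) {
  const median = batchWaitQuantile(interval, 0.5);
  const p95 = batchWaitQuantile(interval, 0.95);
  console.log(
    `T=${interval}s: median wait ${median.toFixed(2)}s, 95% wait ${p95.toFixed(2)}s`
  );
}
```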

To see how fast it can be, even though it is not really “serverless”, we can use the Watson IoT MQTT feed[2]. This feed allows OpenWhisk actions to consume device events directly from Watson IoT without the intermediate Message Hub. However, an intermediate feed provider needs to be deployed. The main difference is that there’s no long batch.

Now the end-to-end response time is just 0.56 sec (95%) and 0.35 sec (median)! That is much, much faster.

With this level of latency, I believe this is good enough for most IoT real-time analytics use cases. Of course, the downside is there’s an extra “server” to maintain/operate/scale.

again, “scaled” to time spent on the edges …

Scalability

Obviously, it scales without me doing any work. Each incoming event triggers an action invocation on OpenWhisk. The more events or the faster they come, the more action instances are launched by OpenWhisk in parallel. The upper bound is only the limit imposed by OpenWhisk.

Better yet, when there’s no event, no resource is used. No usage, no cost!

Cost

Speaking of cost: OpenWhisk charges by actual usage of time and memory. Both of my actions above generally take about 300 ms. Unless your action is much faster than 100 milliseconds (billing rounds up to the next 100 ms), there’s really no need to think too much[3]. For OpenWhisk, just make sure you choose the minimum memory your action actually requires.

One aspect is worth considering, though. The cost model is per action invocation. Compare the two methods I used above: the Message Hub one invokes my action roughly once every 6 seconds (because Watson IoT batches messages), while the MQTT one invokes my action once for each incoming message. Say we send 100 messages per second: the difference in cost is huge!
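To put rough numbers on that comparison, the sketch below counts billed milliseconds over one 6-second window at 100 messages per second. It assumes the only cost driver is billed duration rounded up to 100 ms, at the same memory setting for both actions. It takes about 300 ms per single-message invocation, and about 14 seconds for one 600-message batch, which is the processing time reported in the notes for the 10 ms case. No actual prices are assumed.

```javascript
// Round a duration up to the next 100 ms billing increment.
function billedMs(durationMs) {
  return Math.ceil(durationMs / 100) * 100;
}

// Total billed milliseconds for a number of invocations of equal duration.
function totalBilledMs(invocations, durationMs) {
  return invocations * billedMs(durationMs);
}

// Assumed workload: 100 messages per second over one 6-second window.
const messagesPerSecond = 100;
const windowSeconds = 6;
const messageCount = messagesPerSecond * windowSeconds; // 600 messages

// MQTT feed: one invocation of about 300 ms per message.
const perMessage = totalBilledMs(messageCount, 300);

// Message Hub feed: one invocation per window, about 14 s for 600 messages.
const perBatch = totalBilledMs(1, 14000);

console.log({ perMessage, perBatch, ratio: perMessage / perBatch });
```

Under these assumptions the per-message approach bills 180,000 ms against 14,000 ms for the batched one, roughly 13 times more, on top of 600 invocations instead of 1.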

This is obviously a tradeoff. I don’t want latency as high as 6–7 seconds, but I don’t need an invocation per message either (too expensive). Also, from the perspective of the action’s processing time, there’s not much difference between processing 1 or 10 messages at a time. So a better approach would be to enable batching for the MQTT feed.
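One way such batching could work inside a feed provider is a small micro-batcher that flushes when either a size limit or a time limit is reached, whichever comes first. This is only a sketch of the idea, not code from the MQTT feed. `MicroBatcher`, its options, and the 10-message and 500 ms limits are all hypothetical, and `onFlush` stands in for whatever fires the OpenWhisk trigger.

```javascript
// Collects items and hands them to `onFlush` as one array when either
// `maxSize` items are buffered or `maxWaitMs` has passed since the first one.
class MicroBatcher {
  constructor({ maxSize, maxWaitMs, onFlush }) {
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.onFlush = onFlush;
    this.buffer = [];
    this.timer = null;
  }

  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) {
      this.flush();
    } else if (this.timer === null) {
      // Start the clock when the first item of a new batch arrives.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.onFlush(batch);
  }
}

// Example: fire with up to 10 messages, or after 500 ms, whichever is first.
const fired = [];
const batcher = new MicroBatcher({
  maxSize: 10,
  maxWaitMs: 500,
  onFlush: (batch) => fired.push(batch), // stand-in for firing the trigger
});
for (let i = 0; i < 25; i++) batcher.add({ temperature: 20 + i });
batcher.flush(); // flush the remaining 5 instead of waiting for the timer
console.log(fired.map((b) => b.length)); // [ 10, 10, 5 ]
```

The time limit bounds the added latency (here at most 500 ms), while the size limit bounds how much work a single invocation has to do.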

Development Experience

As you can see, my OpenWhisk action code is so simple that I don’t have much to talk about. But considering how little I did and spent … I have to say I’m totally convinced this is the way to prototype.

One obvious and very enjoyable thing about developing on OpenWhisk: deployment is damn fast! I usually just issue wsk action update ... right after finishing my code change, then immediately start sending events. It ALWAYS works as I expect. I haven’t encountered any deployment problem yet.

One more thing about deployment. An action’s very first invocation takes longer than subsequent invocations. I tested with a bare-minimum action: the first invocation (after an update) usually takes 50 ms while subsequent ones take just 3 ms. I guess that’s “warming-up” time and OpenWhisk does some kind of “caching” of action instances. This is a great optimization opportunity.
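One way to take advantage of that reuse is to keep expensive setup in module-level variables, so it is paid once per container rather than once per invocation. The sketch below assumes that a warm container keeps the loaded module (and therefore its variables) between invocations. `createExpensiveResource` is a hypothetical stand-in for real setup work such as parsing rules or opening connections.

```javascript
// Module-level state survives between invocations that land on the same
// warm container, and starts fresh whenever a new container is created.
let invocationsInThisContainer = 0;
let expensiveResource = null;

// Hypothetical stand-in for costly setup work.
function createExpensiveResource() {
  return { createdAt: Date.now() };
}

// OpenWhisk entry point.
function main(params) {
  invocationsInThisContainer += 1;
  if (expensiveResource === null) {
    expensiveResource = createExpensiveResource(); // paid once per container
  }
  return {
    coldStart: invocationsInThisContainer === 1,
    invocationsInThisContainer,
  };
}
```

Logging `coldStart` from a real action is also a cheap way to see how often invocations actually hit a fresh container.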

As with Bluemix and Cloud Foundry, logging is not very convenient. But for this little experiment, the wsk activation poll command suffices.

Notes

[1] Well, there is one actually: Watson IoT Analytics. For the curious mind, the end-to-end response time when using Watson IoT Analytics itself is 1.93 sec (95%) and 1.43 sec (median). This is mainly because of the Spark Streaming micro-batch interval we use.
[2] The original version was outdated, so I updated it to work with current OpenWhisk. I also encountered a performance issue, which turned out to be caused by the rate limit of the free Cloudant instance. I added a simple cache to the feed so that it does not go out to Cloudant for a lookup on each message.
[3] If your action takes much less than 100 ms, then you should look for opportunities to aggregate multiple calls.
[4] For the Message Hub approach, I did later test 1 event per 25 ms and 1 event per 10 ms, which amounts to 240 and 600 events per batch per invocation. Understandably, each invocation takes longer to process since it now needs to wait for all 240/600 webhook posts to complete. However, the processing time increases only to 6–7 seconds for the 25 ms case and to about 14 seconds (though with much bigger variance) for the 10 ms case.
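The simple cache mentioned in the note about the updated MQTT feed could be as small as the sketch below. It is not the code that was added to the feed. `TtlCache`, the 60-second lifetime, and `lookupTrigger` are hypothetical, and the lookup function is a synchronous stand-in for a real Cloudant query.

```javascript
// Tiny time-based cache: remembers a looked-up value for `ttlMs` milliseconds.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  // Return the cached value for `key`, or call `load(key)` and cache the result.
  getOrLoad(key, load) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Count how often the (hypothetical) database lookup really runs.
let lookups = 0;
const lookupTrigger = (deviceId) => {
  lookups += 1;
  return `trigger-for-${deviceId}`;
};

const cache = new TtlCache(60000);
for (let i = 0; i < 100; i++) cache.getOrLoad('thermal', lookupTrigger);
console.log(lookups); // 1
```

With a real asynchronous lookup, the same structure works if `load` returns a promise and the promise itself is cached.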

Bryan's Reflective Path