Stuff your events!
Ignore everything you've ever been told about measurements and metadata being separate.
You get the most out of a system like Honeycomb by shoving all the measurements and all the metadata into the same event. Seriously, the following list of fields is too short for an image conversion event, despite being an awful pain to scroll past on a phone:
- name
- timestamp
- duration_ms
- trace.trace_id
- trace.parent_id
- trace.span_id
- service_name
- code.file
- code.function
- code.line
- code.module
- duration_ms_log10
- duration_ms_log2
- images.cache.check_ms
- images.cache.hit_bytes
- images.cache.hit_count
- images.cache.write_ms
- images.cold_count
- images.convert.duration_ms
- images.convert.in_bytes
- images.convert.in_count
- images.convert.open_ms
- images.convert.out_bytes
- images.convert.out_count
- images.convert.save_ms
- images.duration_ms
- images.failed
- images.fetch.duration_ms
- images.fetch.request_bytes
- images.fetch.request_count
- images.http.host
- images.http.method
- images.http.path
- images.http.status_code
- images.image-path
- images.sign_ms
- instance.host
- instance.nonce
- instance.uptime_ms
- instance.uptime_ms_log10
- instance.uptime_ms_log2
- language.build
- language.date
- language.version
- release.branch
- release.commit
- release.environment
- release.release_age_days
- release.release_time
The first three are the least you can possibly send. What happened, when, and how long did it take?
- name
- timestamp
- duration_ms
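As a rough sketch of that minimum, the three fields can be captured around any block of work. The `timed_event` helper and the `convert_image` event name are made up for illustration and are not part of any Honeycomb library; sending the event anywhere is left out.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def timed_event(name):
    """Yield a dict that ends up holding name, timestamp and duration_ms.

    Hypothetical helper for illustration only.
    """
    event = {
        "name": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    start = time.perf_counter()
    try:
        yield event
    finally:
        # Record the duration even if the wrapped work raised.
        event["duration_ms"] = (time.perf_counter() - start) * 1000.0


with timed_event("convert_image") as event:
    time.sleep(0.01)  # stand-in for the real work

print(event)
```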
The next four are necessary to stitch traces together from spans (tracing jargon for “events”) handled by multiple endpoints and services:
- trace.trace_id
- trace.parent_id
- trace.span_id
- service_name
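One way to picture how those four fields relate: every span in a trace shares the trace ID, and each child points back at its parent's span ID. The sketch below assumes random hex IDs of arbitrary length and invented service names (`web`, `images`); a real tracing library would normally generate and propagate these for you.

```python
import secrets


def new_span(service_name, parent=None):
    """Return the four stitching fields for a new span.

    A root span starts a new trace and has no parent; a child span reuses
    its parent's trace ID and points back at the parent's span ID.
    The ID lengths are an arbitrary choice for this sketch.
    """
    return {
        "trace.trace_id": parent["trace.trace_id"] if parent else secrets.token_hex(16),
        "trace.parent_id": parent["trace.span_id"] if parent else None,
        "trace.span_id": secrets.token_hex(8),
        "service_name": service_name,
    }


root = new_span("web")                   # hypothetical front-end service
child = new_span("images", parent=root)  # hypothetical downstream service
```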
After that, we get into metadata and measurements present in most of our events:
- the software release that handled the event
- the software's language and runtime
- the process or container running the software
- the particular line of code in the software
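Most of this shared metadata is fixed for the life of a process, so it can be gathered once and merged into every event. The following is a minimal sketch of that idea, not how any particular library does it: the environment variable names are assumptions, and mapping `language.build` and `language.date` onto Python's build number and build date is a guess at what those fields hold.

```python
import os
import platform
import socket
import time

_STARTED = time.monotonic()

# Gathered once at start-up; none of this changes while the process runs.
_build_number, _build_date = platform.python_build()
STATIC_FIELDS = {
    "language.version": platform.python_version(),
    "language.build": _build_number,
    "language.date": _build_date,
    "instance.host": socket.gethostname(),
    # Environment variable names below are assumptions for this sketch.
    "release.commit": os.environ.get("RELEASE_COMMIT", "unknown"),
    "release.branch": os.environ.get("RELEASE_BRANCH", "unknown"),
    "release.environment": os.environ.get("RELEASE_ENVIRONMENT", "unknown"),
}


def with_common_fields(event):
    """Return a copy of event with the shared metadata merged in."""
    merged = dict(STATIC_FIELDS)
    merged["instance.uptime_ms"] = (time.monotonic() - _STARTED) * 1000.0
    merged.update(event)  # event-specific values win on a name clash
    return merged
```

Calling `with_common_fields({"name": "convert_image"})` then yields one flat dict carrying both the event's own fields and the release, language and instance fields.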
Finally, we get to the meat: a couple dozen attributes specific to this event, including:
- how long it took to check the cache
- the count and size of what came out of the cache
- how long it took to fetch the data
- how much we got back
- the size of the data we converted
- how long that took
- the count and size of the results
- how long it took to save it to the cache
… and we're probably still under-doing it, but we have to ship sometime, right?
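To make that concrete, here is a hedged sketch of one request handler filling in a single wide event as it goes. The field names come from the event listing earlier, but which stage feeds which field is a guess, and the `stage` helper, the dict-based cache and the `fetch` and `convert` callables are all stand-ins for real components.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(event, field):
    """Add the elapsed milliseconds of the wrapped block to event[field]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = (time.perf_counter() - start) * 1000.0
        event[field] = event.get(field, 0.0) + elapsed


def convert_images(paths, cache, fetch, convert):
    """Handle one request and return (results, event).

    cache is a plain dict; fetch and convert are caller-supplied callables
    that return bytes. All three are hypothetical stand-ins.
    """
    event = {"name": "convert_images"}
    with stage(event, "images.duration_ms"):
        with stage(event, "images.cache.check_ms"):
            hits = {p: cache[p] for p in paths if p in cache}
        event["images.cache.hit_count"] = len(hits)
        event["images.cache.hit_bytes"] = sum(len(b) for b in hits.values())

        cold = [p for p in paths if p not in hits]
        event["images.cold_count"] = len(cold)

        with stage(event, "images.fetch.duration_ms"):
            fetched = {p: fetch(p) for p in cold}
        event["images.convert.in_count"] = len(fetched)
        event["images.convert.in_bytes"] = sum(len(b) for b in fetched.values())

        with stage(event, "images.convert.duration_ms"):
            converted = {p: convert(b) for p, b in fetched.items()}
        event["images.convert.out_count"] = len(converted)
        event["images.convert.out_bytes"] = sum(len(b) for b in converted.values())

        with stage(event, "images.cache.write_ms"):
            cache.update(converted)
    return {**hits, **converted}, event


# Usage with toy stand-ins: one cached image, one cold image.
cache = {"a.png": b"cached"}
results, event = convert_images(
    ["a.png", "b.png"],
    cache,
    fetch=lambda path: b"raw-bytes",
    convert=lambda data: data.upper(),
)
```

The point of the shape is that every measurement lands in the same dict, so a slow request can be broken down by cache, fetch and convert time without joining anything up afterwards.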
Honeycomb copes happily with dozens to hundreds of columns. If you know it or can measure it, seriously, stuff it in, especially anything that'll help you correlate events to:
- the user that caused the event
- the organisation to which that user belongs
- the infrastructure through which the event passed