Stuff your events!

Ignore everything you've ever been told about measurements and metadata being separate.

You get the most out of a system like Honeycomb by shoving all the measurements and all the metadata into the same event. Seriously, the following is too few fields for an image conversion event, despite being an awful pain to scroll past on a phone:

name
timestamp
duration_ms
trace.trace_id
trace.parent_id
trace.span_id
service_name
code.file
code.function
code.line
code.module
duration_ms_log10
duration_ms_log2
images.cache.check_ms
images.cache.hit_bytes
images.cache.hit_count
images.cache.write_ms
images.cold_count
images.convert.duration_ms
images.convert.in_bytes
images.convert.in_count
images.convert.open_ms
images.convert.out_bytes
images.convert.out_count
images.convert.save_ms
images.duration_ms
images.failed
images.fetch.duration_ms
images.fetch.request_bytes
images.fetch.request_count
images.http.host
images.http.method
images.http.path
images.http.status_code
images.image-path
images.sign_ms
instance.host
instance.nonce
instance.uptime_ms
instance.uptime_ms_log10
instance.uptime_ms_log2
language.build
language.date
language.version
release.branch
release.commit
release.environment
release.release_age_days
release.release_time

The first three are the least you can possibly send. What happened, when, and how long did it take?

name
timestamp
duration_ms
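
As a rough sketch of that bare minimum, assuming nothing beyond the Python standard library: the event is a plain dict, and minimal_event is a made-up helper for illustration, not part of any Honeycomb SDK.

```python
import time
from datetime import datetime, timezone


def minimal_event(name, work):
    """Run work() and return the three bare-minimum fields as a dict."""
    started_at = datetime.now(timezone.utc)  # when it happened
    t0 = time.perf_counter()
    work()
    duration_ms = (time.perf_counter() - t0) * 1000.0  # how long it took
    return {
        "name": name,  # what happened
        "timestamp": started_at.isoformat(),
        "duration_ms": duration_ms,
    }


# The lambda stands in for real work such as converting an image.
event = minimal_event("images.convert", lambda: sum(range(1000)))
```

Everything else in this post is a matter of adding more keys to that dict before it ships.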

The next four are necessary to stitch traces together from spans (trace-speak for “events”) handled by multiple endpoints and services:

trace.trace_id
trace.parent_id
trace.span_id
service_name
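
Here's one way those four fields could be filled in, as a sketch rather than a reference implementation. The helper name new_span_fields is hypothetical, and random hex IDs of 16 and 8 bytes are my assumption; a real tracing library generates and propagates these for you.

```python
import secrets


def new_span_fields(service_name, parent=None):
    """Return trace fields for a new span, optionally nested under `parent`."""
    if parent is None:
        trace_id = secrets.token_hex(16)  # a new trace starts here
        parent_id = None                  # a root span has no parent
    else:
        trace_id = parent["trace.trace_id"]  # stay in the caller's trace
        parent_id = parent["trace.span_id"]  # point back at the caller's span
    return {
        "trace.trace_id": trace_id,
        "trace.parent_id": parent_id,
        "trace.span_id": secrets.token_hex(8),
        "service_name": service_name,
    }


root = new_span_fields("web")
child = new_span_fields("images", parent=root)
```

The shared trace.trace_id groups the spans, and each trace.parent_id says where a span hangs in the tree, even when the parent ran in a different service.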

After that, we get into metadata and measurements present in most of our events:

the software release that handled the event
the software's language and runtime
the process or container running the software
the particular line of code in the software
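
Most of that can be worked out once at startup and merged into every event. A minimal sketch, assuming the release details arrive in environment variables; RELEASE_COMMIT and RELEASE_ENVIRONMENT are names I made up, so substitute whatever your deploy tooling sets.

```python
import os
import platform
import secrets
import socket
import time

_STARTED = time.monotonic()

# Computed once per process; none of these change between events.
STATIC_FIELDS = {
    "language.version": platform.python_version(),
    "instance.host": socket.gethostname(),
    "instance.nonce": secrets.token_hex(4),  # tells restarts on one host apart
    "release.commit": os.environ.get("RELEASE_COMMIT", "unknown"),
    "release.environment": os.environ.get("RELEASE_ENVIRONMENT", "dev"),
}


def with_common_fields(event):
    """Return a copy of `event` with the shared metadata merged in."""
    merged = dict(STATIC_FIELDS)
    merged["instance.uptime_ms"] = (time.monotonic() - _STARTED) * 1000.0
    merged.update(event)  # event-specific fields win on a name clash
    return merged


enriched = with_common_fields({"name": "images.convert", "duration_ms": 12.5})
```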

Finally, we get to the meat: a couple dozen attributes specific to this event, including:

how long it took to check the cache
the count and size of what came out of the cache
how long it took to fetch the data
how much we got back
the size of the data we converted
how long that took
the count and size of the results
how long it took to save it to the cache

… and we're probably still under-doing it, but we have to ship sometime, right?
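
To make that concrete, here's a sketch of a handler that piles those measurements into a single event as it goes. The timed helper, the convert_image function and the stand-in fetcher and converter are all hypothetical; only the field names come from the list at the top.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(event, field):
    """Store the duration of the with-block, in milliseconds, under `field`."""
    started = time.perf_counter()
    try:
        yield
    finally:
        event[field] = (time.perf_counter() - started) * 1000.0


def convert_image(path, cache, fetch, convert):
    """Hypothetical handler: every step adds its measurements to one event."""
    event = {"name": "images.convert", "images.image-path": path}

    with timed(event, "images.cache.check_ms"):
        cached = cache.get(path)
    event["images.cache.hit_count"] = 0 if cached is None else 1
    event["images.cache.hit_bytes"] = 0 if cached is None else len(cached)
    if cached is not None:
        return cached, event

    with timed(event, "images.fetch.duration_ms"):
        original = fetch(path)
    event["images.convert.in_count"] = 1
    event["images.convert.in_bytes"] = len(original)

    with timed(event, "images.convert.duration_ms"):
        converted = convert(original)
    event["images.convert.out_count"] = 1
    event["images.convert.out_bytes"] = len(converted)

    with timed(event, "images.cache.write_ms"):
        cache[path] = converted
    return converted, event


# Stand-ins for a real fetcher and converter, so the sketch runs on its own.
cache = {}
result, event = convert_image(
    "cat.png", cache, lambda p: b"x" * 2048, lambda data: data[:512]
)
```

One request, one event, and the cache, fetch and convert numbers all sit side by side in it.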

Honeycomb copes happily with dozens to hundreds of columns. If you know it or can measure it, seriously, stuff it in, especially anything that'll help you correlate events to:

the user that caused the event
the organisation to which that user belongs
the infrastructure through which the event passed
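
Here's a toy illustration of why those fields pay off. The field names user.id, org.id and infra.load_balancer are invented for the example, and the events are made-up data; in real life you'd run the equivalent group-by in Honeycomb rather than in Python.

```python
from collections import Counter

# Made-up wide events, trimmed to the fields that matter for this example.
events = [
    {"images.failed": True, "user.id": "u1", "org.id": "acme", "infra.load_balancer": "lb-1"},
    {"images.failed": False, "user.id": "u2", "org.id": "acme", "infra.load_balancer": "lb-2"},
    {"images.failed": True, "user.id": "u3", "org.id": "globex", "infra.load_balancer": "lb-1"},
    {"images.failed": True, "user.id": "u1", "org.id": "acme", "infra.load_balancer": "lb-1"},
]


def failures_by(events, field):
    """Count failed events grouped by the value of `field`."""
    return Counter(e[field] for e in events if e["images.failed"])


by_org = failures_by(events, "org.id")               # Counter({'acme': 2, 'globex': 1})
by_lb = failures_by(events, "infra.load_balancer")   # Counter({'lb-1': 3})
```

Because the user, organisation and infrastructure fields ride along on the same event as images.failed, one pass over the data shows that every failure went through lb-1. No joining against a separate metadata store required.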

Copyright:	© 2020 Garth Kidd
License:	CC-BY-NC-SA-4.0
Reading time:	2 minutes?