Stuff your events!
Ignore everything you've ever been told about measurements and metadata being separate.
You get the most out of a system like Honeycomb by shoving all the measurements and all the metadata into the same event. Seriously, the following list is too short for an image conversion event, despite being an awful pain to scroll past on a phone:
name
timestamp
duration_ms
trace.trace_id
trace.parent_id
trace.span_id
service_name
code.file
code.function
code.line
code.module
duration_ms_log10
duration_ms_log2
images.cache.check_ms
images.cache.hit_bytes
images.cache.hit_count
images.cache.write_ms
images.cold_count
images.convert.duration_ms
images.convert.in_bytes
images.convert.in_count
images.convert.open_ms
images.convert.out_bytes
images.convert.out_count
images.convert.save_ms
images.duration_ms
images.failed
images.fetch.duration_ms
images.fetch.request_bytes
images.fetch.request_count
images.http.host
images.http.method
images.http.path
images.http.status_code
images.image-path
images.sign_ms
instance.host
instance.nonce
instance.uptime_ms
instance.uptime_ms_log10
instance.uptime_ms_log2
language.build
language.date
language.version
release.branch
release.commit
release.environment
release.release_age_days
release.release_time
The first three are the least you can possibly send. What happened, when, and how long did it take?
name
timestamp
duration_ms
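As a rough sketch of that bare minimum in Python: the helper name `timed_event` and the ISO 8601 timestamp format are my assumptions here, not anything Honeycomb requires.

```python
import time
from datetime import datetime, timezone


def timed_event(name, work):
    """Run work() and return (result, event) carrying the three minimal fields.

    `timed_event` is a hypothetical helper for illustration only.
    """
    started = datetime.now(timezone.utc)   # when it happened
    t0 = time.perf_counter()               # monotonic clock for the duration
    result = work()
    duration_ms = (time.perf_counter() - t0) * 1000.0
    event = {
        "name": name,                       # what happened
        "timestamp": started.isoformat(),   # assumed format: ISO 8601, UTC
        "duration_ms": duration_ms,         # how long it took
    }
    return result, event
```

Everything else in this post is more keys in that same dictionary.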
The next four are necessary to stitch traces together from spans (tracing's word for “event”) handled by multiple endpoints and services:
trace.trace_id
trace.parent_id
trace.span_id
service_name
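Here is one way those four fields might be filled in, as a sketch. The `trace_fields` helper is hypothetical, and the ID sizes (16 random bytes for a trace, 8 for a span) are an assumption borrowed from the W3C Trace Context format rather than anything this post depends on.

```python
import secrets


def trace_fields(service_name, parent=None):
    """Return the four stitching fields for a new span.

    `parent` is the field dict of the calling span, or None for a root span.
    Hypothetical helper; the ID lengths are assumed, not required.
    """
    return {
        # Every span in a trace shares the root's trace ID.
        "trace.trace_id": parent["trace.trace_id"] if parent else secrets.token_hex(16),
        # The parent ID points at the span that caused this one.
        "trace.parent_id": parent["trace.span_id"] if parent else None,
        # Each span gets a fresh ID of its own.
        "trace.span_id": secrets.token_hex(8),
        "service_name": service_name,
    }
```

A root span has no parent; a child span copies the trace ID and records its caller's span ID as its parent, which is all the stitching needs.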
After that, we get into metadata and measurements present in most of our events:
- the software release that handled the event
- the software's language and runtime
- the process or container running the software
- the particular line of code in the software
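Most of these can be worked out once per process and merged into every event. A minimal sketch, assuming the release details arrive in environment variables: the `RELEASE_*` variable names and the `common_fields` helper are made up for illustration.

```python
import inspect
import os
import platform
import secrets
import socket
import time

_STARTED = time.monotonic()

# Computed once at start-up. The RELEASE_* variable names are assumptions.
STATIC_FIELDS = {
    "release.branch": os.environ.get("RELEASE_BRANCH", "unknown"),
    "release.commit": os.environ.get("RELEASE_COMMIT", "unknown"),
    "release.environment": os.environ.get("RELEASE_ENVIRONMENT", "unknown"),
    "language.version": platform.python_version(),
    "language.build": platform.python_build()[0],
    "language.date": platform.python_build()[1],
    "instance.host": socket.gethostname(),
    "instance.nonce": secrets.token_hex(4),  # distinguishes restarts on one host
}


def common_fields():
    """Return the shared fields plus the caller's code location."""
    caller = inspect.currentframe().f_back  # frame of whoever called us
    return {
        **STATIC_FIELDS,
        "instance.uptime_ms": (time.monotonic() - _STARTED) * 1000.0,
        "code.file": caller.f_code.co_filename,
        "code.function": caller.f_code.co_name,
        "code.line": caller.f_lineno,
        "code.module": caller.f_globals.get("__name__"),
    }
```

Calling `common_fields()` from inside a handler and merging the result into the event covers all four bullets above in one go.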
Finally, we get to the meat: a couple dozen attributes specific to this event, including:
- how long it took to check the cache
- the count and size of what came out of the cache
- how long it took to fetch the data
- how much we got back
- the size of the data we converted
- how long that took
- the count and size of the results
- how long it took to save it to the cache
… and we're probably still under-doing it, but we have to ship sometime, right?
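To make the list above concrete, here is a sketch of a handler that accumulates those measurements into one event as it goes. This is not our actual code: `convert_images`, `timed`, and the `cache`, `fetch` and `convert` arguments are hypothetical stand-ins, and the mapping from field names to meanings is my best guess.

```python
import time
from contextlib import contextmanager

COUNTERS = (
    "images.cache.hit_count", "images.cache.hit_bytes", "images.cold_count",
    "images.fetch.request_count", "images.fetch.request_bytes",
    "images.convert.in_count", "images.convert.in_bytes",
    "images.convert.out_count", "images.convert.out_bytes",
)


@contextmanager
def timed(event, key):
    """Add the elapsed milliseconds of the with-block to event[key]."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        event[key] = event.get(key, 0.0) + (time.perf_counter() - t0) * 1000.0


def convert_images(paths, cache, fetch, convert):
    """Hypothetical handler: one wide event for the whole request."""
    event = {"name": "images.convert", **dict.fromkeys(COUNTERS, 0)}
    for path in paths:
        with timed(event, "images.cache.check_ms"):
            hit = cache.get(path)
        if hit is not None:
            event["images.cache.hit_count"] += 1
            event["images.cache.hit_bytes"] += len(hit)
            continue
        event["images.cold_count"] += 1
        with timed(event, "images.fetch.duration_ms"):
            raw = fetch(path)
        event["images.fetch.request_count"] += 1
        event["images.fetch.request_bytes"] += len(raw)  # assumed: bytes received
        with timed(event, "images.convert.duration_ms"):
            out = convert(raw)
        event["images.convert.in_count"] += 1
        event["images.convert.in_bytes"] += len(raw)
        event["images.convert.out_count"] += 1
        event["images.convert.out_bytes"] += len(out)
        with timed(event, "images.cache.write_ms"):
            cache[path] = out
    return event
```

The point of the design is that nothing gets logged separately along the way: every phase adds to the same dictionary, and the dictionary is sent once at the end.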
Honeycomb copes happily with dozens to hundreds of columns. If you know it or can measure it, seriously, stuff it in, especially anything that'll help you correlate events to:
- the user that caused the event
- the organisation to which that user belongs
- the infrastructure through which the event passed