Monitorama Conference - Baltimore 2019

Dashboard Renaissance - Cory from Splunk
- Book: The Design of Everyday Things by Don Norman
- What’s the goal/purpose of this dashboard?
- What action should the viewer take?
- Put the most important things at the top left of the dashboard and the least important at the bottom right
- What to include
  - RED and USE techniques
  - prefer symptoms over causes
- Pre-attentive processing - position, angle/slope, size/length, volume, color/density
- What chart type
  - Heatmaps show outliers
  - Saturation is low accuracy
  - Bar charts - better for comparison of a few values
- Scales, Units, Norms, Labels
- https://www.usability.gov/what-and-why/visual-design.html - accessibility
Before, During, and After Chaos - Nora Jones @nora_js
- Different phases of Chaos Engineering
- Chaos Engineering
  - O’Reilly e-book
  - AWS re:Invent 2017 Nora Jones - Youtube video
- obscure hindsight bias
- Serial propensity effect -
- https://www.oreilly.com/library/view/velocity-conference-2017/9781491976265/video311370.html
Logs, Metrics, Endpoints - Bryan Liles - Tanzu Build, VMware
- Logging best practices
  - Best format: JSON. (Jsonnet?). slf4j for json format?
  - log to stdout
  - syslog - logs at scale
  - add context to your log messages
- Metrics
  - USE (utilization, saturation, errors),
  - RED (rate, errors, duration),
  - Four Golden Signals (latency, traffic, errors, saturation) - this may work better at scale
  - Best practices
    - OpenMetrics/Prometheus
    - Keep the number of metrics to a manageable size (maybe 100s, not 1000s)
- Endpoints
  - TCP, HTTP, Custom Response
  - Best practices - look at the deck
- Traces
  - Distributed tracing (OpenTracing, Jaeger)
  - traces are composed of spans
- OpenTelemetry - capture metrics and distributed traces
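The logging practices noted in this talk (JSON format, log to stdout, add context to each message) can be sketched with Python's standard `logging` module. This is only an illustration, not material from the deck: the `JsonFormatter` class, the `build_logger` helper, the `context` key, and the `checkout` service name are all assumptions made for the example.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Context passed as extra={"context": {...}} is attached to the record.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")  # hypothetical service name
    log.info("payment accepted", extra={"context": {"order_id": "A-123", "latency_ms": 42}})
```

Writing to stdout leaves shipping and retention to the platform, and one JSON object per line keeps the context fields queryable downstream.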
Jeff from Netflix
- Mantis - Open source streaming microservices monitoring solution
  - answers new questions that you forgot to log
  - low latency
  - cost-effective
Developing meaningful SLIs - Alex Hidalgo (Squarespace Engineering http://slidesgala.com)
- Service Level Indicator
- SLI (engineering) = User journey (product team) = KPI (business)
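One common way to express an SLI is as the ratio of good events to total events. The sketch below assumes that framing; the `Request` type, the "non-5xx counts as good" rule, and the latency threshold are illustrative choices, not definitions taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def availability_sli(requests):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        return None  # no traffic, so the SLI is undefined
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests, threshold_ms):
    """Fraction of requests served within `threshold_ms`."""
    if not requests:
        return None
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)
```

Which events count as "good" should follow the user journey being measured, which is why the engineering SLI and the product team's view line up.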
M.E.L.T. Level Up - Ron from NewRelic
- MELT
  - Metrics - Micrometer, Istio, Prometheus, OpenTelemetry, DropWizard
  - Events - alerts and deployments are examples of events
  - Logs - json:api, Fluent Bit, CloudWatch
  - Traces - Zipkin, OpenTelemetry, Istio
- Platform - Zipkin, OpenTelemetry, Istio, Micrometer, DropWizard
LightStep
- Tech paper - Dapper, distributed tracing paper
Stackdriver monitoring from Google
Designing Alerts to Direct Attention - Ryan Frantz
- https://www.pbs.org/newshour/science/3-brain-technologies-to-watch-in-2018
- Mental model of a system
Fitness Function-Driven Development - Rosemary Wang, ThoughtWorks
- Evolutionary Architecture
- Fitness function - a concept borrowed from genetic algorithms
  - An SLO can serve as a fitness function
- Benefits
  - KonMari method for former assumptions, tools and telemetry
  - Highlights gaps in process, tooling and telemetry
  - Opens discussions about tech debt
  - develop mutual learning context
- https://github.com/joatmon08/2019-monitorama
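The note that an SLO can act as a fitness function can be sketched as an automated check that passes or fails on measured data. This is an assumption-laden illustration rather than code from the linked repository: the `percentile` and `latency_fitness` helpers, the nearest-rank method, and the 300 ms target are all hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_fitness(latencies_ms, slo_ms=300.0, pct=99):
    """Fitness function: does the pct-th percentile latency meet the SLO target?"""
    observed = percentile(latencies_ms, pct)
    return {"observed_ms": observed, "slo_ms": slo_ms, "passed": observed <= slo_ms}


if __name__ == "__main__":
    samples = [100.0] * 99 + [500.0]  # made-up measurements
    print(latency_fitness(samples))
```

A check like this could run in a pipeline after a load test, so an architectural change that breaks the target fails fast instead of surfacing in production.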
Presentation - Pete Cheslock @petecheslock
- http://lusislog.blogspot.com/2011/06/why-monitoring-sucks.html
- https://mattturck.com/data2019/
- Logging best practices
  - Log in JSON format
  - put everything in the json
  - send it all to your Data Lake (georgehart.com)
- Presentation Material: https://pete.wtf/decks/MonitoramaBaltimore2019.pdf
How to interview
- Diversity
- Preparation - get trained on interviewing. be comfortable
- Read the job description. Find the right candidate for the job. This isn’t about you
- Read the CV. Don’t Google - beware of legal issues
- Pair up
- Present yourself professionally - don’t take phone calls
- Collaborate, don’t confront
Adopting a Product mindset for SRE and Observability teams
You can’t spell “monitoring” without “monoid” - Kevin from NewRelic
- Easy is not the same as simple - Rich Hickey’s talk - e.g., writing simple code is not easy
- What is a monoid? An algebraic structure with a single associative binary operation and an identity element
  - Temporal and dimensional aggregation
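A small sketch of why monoids matter for aggregation: if a summary type has an associative `combine` and an identity value, partial summaries can be merged in any grouping, across time windows or across dimensions such as hosts. The `Summary` type and `merge` helper below are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Aggregate of latency samples; forms a monoid under `combine`."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def combine(self, other):
        # Associative: the grouping of merges does not change the result.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )

    @classmethod
    def of(cls, value):
        """Summary of a single sample."""
        return cls(1, value, value)


EMPTY = Summary()  # identity element: combining with it changes nothing


def merge(summaries):
    """Fold any number of summaries into one, starting from the identity."""
    return reduce(Summary.combine, summaries, EMPTY)
```

A mean cannot be merged directly, but `count` and `total` can, so the mean is derived after merging per-minute summaries into an hour or per-host summaries into a service.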
Observability Graph - Homin Lee from Datadog
- Corollary to Conway’s Law: Observability follows your org chart.
- Gore’s hypothesis (Gore-Tex fabric)
  - a.k.a. prequel to Dunbar’s number
  - a.k.a. thing you heard about from The Tipping Point
- Graph with ontology is Knowledge graph
Observing Observability - Philip O’Toole from Google Cloud
- Why observability is difficult

Tools

BigPanda
Catchpoint.com
Circonus
Datadog
Elastic APM
Honeycomb.io
InfluxDB
Jaeger
Loggly
NewRelic.com
OpenCensus
OpenMetrics
OpenTracing
OpenTSDB
Prometheus
SignalFx
Site24x7.com
StackDriver from Google
Zipkin

Follow ups

Mantis - perhaps to visualize the service mapping and reliability of services
Define SLIs for services we depend on
Fitness Function-Driven Development
Evaluate
- http://play.honeycomb.io
Create a matrix of tools and their features (MELT, alerts, USP, SaaS, real-time, cost etc.)
GUTS (Grand Unified Telemetry System) overview videos
Learn about AIOps, Monoids
Convert DAP logs to JSON format - Benefits - Implications
https://slidesgala.com
https://vimeo.com/monitorama
https://speakerdeck.com/monitorama
What is ScyllaDB?
Statistics for Engineers - Heinrich Hartmann
If CMDB is static, what’s the alternative?
