What is the business problem you are trying to solve?
How does this type of service address the problem?
How do providers of this type of service differentiate themselves with respect to your business problem?
Resiliency
What are your resiliency requirements?
Availability
Consistency
Partition tolerance
Durability - load balancers and service registries have state
Security
Authentication
how does a client prove identity
how are credentials provisioned/stored
how are credentials delivered
how are credentials rotated
Authorization
what permission types are supported
are permissions grouped into roles
are roles customizable
how are roles assigned to actors
Regulatory Compliance
data residency/sovereignty - e.g., German companies want data stored only in Germany
encryption
Data at Rest
hardware level: disk encryption
encrypting data files in a DB
application level: encrypted password on a row level in DB
Data in Flight
Auditability
what happened
when did it happen
what actor caused it
where did it happen
why did it happen
Certification Checkboxes
SOX compliant
PCI
HIPAA
FedRAMP
NIST 800-53
FIPS 140-2, …
Economics
who is operating the service? public cloud, hybrid, private cloud
what is your expected rate of consumption
how is your rate of consumption projected to grow
how is the service priced/costed
is the equation cost effective relative to your consumption rate and growth rate?
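A rough way to check the cost question above is to project monthly spend under compound growth. This is a minimal sketch: the pricing model (a fixed fee plus a per-unit price) and every number in it are hypothetical assumptions, not any provider's actual pricing.

```python
def projected_costs(monthly_units, monthly_growth, unit_price, fixed_fee, months):
    """Return projected monthly costs, assuming consumption compounds each month."""
    costs = []
    units = monthly_units
    for _ in range(months):
        # Hypothetical pricing model: flat fee plus a price per unit consumed.
        costs.append(fixed_fee + units * unit_price)
        units *= 1 + monthly_growth
    return costs


# Hypothetical inputs: 1M requests/month, growing 10% per month.
costs = projected_costs(
    monthly_units=1_000_000,
    monthly_growth=0.10,
    unit_price=0.0002,
    fixed_fee=50.0,
    months=12,
)
print(f"month 1: {costs[0]:.2f}  month 12: {costs[-1]:.2f}  year: {sum(costs):.2f}")
```

Running the same projection with each provider's pricing gives comparable numbers for the scorecard.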
Scalability
do you need to scale?
how is your load/volume expected to grow? Don’t go for some fancy auto-scaling mechanism if you don’t need it.
is your load/volume bursty?
is your load/volume unpredictable?
does the service support scaling according to these needs?
Provider ‘Lock-In’
is there a sensible way to leverage multiple providers
is the service supported by open/de facto standards
is there a meaningful abstraction layer
are you subject to “data gravity”? Amazon provides a truck to migrate your data from your data center to AWS. Now what if you wanted to move out of AWS? Will that vendor provide a truck?
Available Tooling
how good is the documentation
does the service have a well-designed API
are client libraries available for your language(s) of choice?
Does your app framework of choice support the service?
is good management tooling available?
is there a management API?
is there automation tooling available for management?
Undifferentiated heavy lifting
what gaps do you need to close that EVERYONE has to close?
what will it cost you to close those gaps?
what will it cost you to keep them closed?
are there ecosystem partners in the business of closing these gaps?
is the provider working on closing these gaps?
Differentiating Features
(this is where we often start, but it is the least important criterion. Don’t chase the latest shiny thing)
there’s a lot of parity out there
but a differentiating feature could be the decision maker (AWS, Azure, Google - e.g., Google has its own fiber for networking)
Scorecard
summarized view of why you made a particular choice
you want an organized way to demonstrate why you made your decision
KISS
No binary (yes/no) scores
Simple ranges: 1-3 or 1-5
Add weights for prioritization
Call out subcategories when valuable
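The scorecard rules above (simple ranges, weights for prioritization) can be sketched as a weighted average. The criteria names come from these notes; the providers, scores, and weights are made-up illustrations, and weighted_score is a hypothetical helper rather than part of any tool.

```python
def weighted_score(scores, weights, scale_max=5):
    """Return the weighted average of scores given on a 1..scale_max scale."""
    for criterion, score in scores.items():
        if not 1 <= score <= scale_max:
            raise ValueError(f"{criterion}: score {score} is outside 1..{scale_max}")
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight


# Hypothetical weights: higher means more important to this business problem.
weights = {"Resiliency": 3, "Security": 3, "Economics": 2, "Lock-in": 1}

# Hypothetical 1-5 scores per provider.
providers = {
    "Provider A": {"Resiliency": 4, "Security": 5, "Economics": 2, "Lock-in": 3},
    "Provider B": {"Resiliency": 3, "Security": 3, "Economics": 5, "Lock-in": 4},
}

ranking = sorted(
    providers,
    key=lambda name: weighted_score(providers[name], weights),
    reverse=True,
)
for name in ranking:
    print(f"{name}: {weighted_score(providers[name], weights):.2f}")
```

Keeping the weights in one place makes the prioritization explicit and easy to defend when someone asks why a choice was made.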
It’s not continuous delivery if you can’t deploy right now.
The job of a pipeline is to avoid release candidates
Continuous deployment is not continuous delivery. Deployment is installing in prod. Delivery/release is when the business approves toggling the flag and users start to see the changes.
test before you commit http://thoughtworks.github.io/talisman
have you included private keys? Authentication tokens?
Static Application Security Testing (SAST) - like FindBugs?
Dynamic Application Security Testing (DAST)
Performance Testing
Load testing - simplest form of performance testing.
Stress testing - to understand the upper limits of capacity within the system
Soak testing (or endurance testing) - to determine whether the system can sustain the continuous expected load
Spike testing
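The "test before you commit" check noted above (private keys, authentication tokens) can be illustrated with a few regular expressions. This is only a sketch of the idea, not how Talisman works internally; the patterns are simple heuristics chosen for illustration and will miss many real secrets.

```python
import re

# Illustrative heuristics only; a real scanner uses far more checks.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "AWS access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hard-coded token": re.compile(
        r"(?i)(?:token|secret|api_key)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, label) pairs for lines that look like secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = (
    'name = "demo"\n'
    'api_token = "abcdefghijklmnop1234"\n'
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
for lineno, label in find_secrets(sample):
    print(f"line {lineno}: possible {label}")
```

A pre-commit hook could run a check like this over staged files and refuse the commit when anything is found.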
Build pipeline
Run as much as possible in parallel
Managing Risk
Deployment patterns
Canary release
Dark launching - e.g., Facebook Messenger was trial-run selectively
http://githubengineering.com/move-fast
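One common way to implement a canary release is to route a fixed percentage of users to the new version by hashing a stable user id. The sketch below assumes that approach; in_canary and the salt value are hypothetical names, not taken from any of the tools mentioned above.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-42") -> bool:
    """Return True if this user falls into the canary group.

    The same user always lands in the same bucket, so raising `percent`
    only adds users to the canary and never removes any.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent


# Hypothetical rollout: start at 5%, then widen to 50%.
users = [f"user-{i}" for i in range(1000)]
for percent in (5, 50):
    count = sum(in_canary(u, percent) for u in users)
    print(f"{percent}% rollout -> {count} of {len(users)} users on the new version")
```

Changing the salt per release reshuffles the buckets, so the same users are not always the first to see every change.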
Feedback loops
Create useful logging for everything
Run (some of) your tests against production
Ensure alerts are useful
Optimized for Recovery
MTBF (Mean time between failures)
MTTR (Mean time to repair)
State of DevOps Report
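MTBF and MTTR can be computed from an incident log. The sketch below uses the common definitions (MTTR = total downtime / number of incidents, MTBF = total uptime / number of incidents); the incident data is invented for illustration and recovery_metrics is a hypothetical helper.

```python
from datetime import datetime, timedelta


def recovery_metrics(incidents, period_start, period_end):
    """Return (mtbf, mttr, availability) for a list of (failed_at, restored_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is needed to compute MTBF/MTTR")
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = (period_end - period_start) - downtime
    mttr = downtime / len(incidents)  # mean time to repair
    mtbf = uptime / len(incidents)    # mean time between failures
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, availability


# Hypothetical 30-day window with two outages (1 hour and 3 hours).
incidents = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 11, 0)),
    (datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 5, 0)),
]
mtbf, mttr, availability = recovery_metrics(
    incidents, datetime(2024, 1, 1), datetime(2024, 1, 31)
)
print(f"MTBF: {mtbf}  MTTR: {mttr}  availability: {availability:.4%}")
```

Optimizing for recovery means driving MTTR down, which raises availability even when the number of failures stays the same.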
Knight Capital
Deployed untested software to prod that still contained an obsolete function. The incident happened because a technician forgot to copy the new Retail Liquidity Program code to one of the servers. (Lack of automated testing cost the company $440,000,000.)
8. Containers are low-level technology. Only the infrastructure provider should care about them, not the app developer
9. Automation (CI/CD/DevOps) and cultural changes are key to success
10. Cloud-native middleware microservices leverage various technologies, open source frameworks, and infrastructure components such as containers or messaging