Monthly Archives: May 2022

Book Notes: Thinking in Systems

Thinking in Systems: A Primer by Donella H. Meadows, Diana Wright
My rating: 4 of 5 stars

Summary of Systems Principles

Systems

  • A system is more than the sum of its parts.
  • Many of the interconnections in systems operate through the flow of information.
  • The least obvious part of the system, its function or purpose, is often the most crucial determinant of the system’s behaviour.
  • System structure is the source of system behaviour. System behaviour reveals itself as a series of events over time.

Stocks, Flows, and Dynamic Equilibrium

  • A stock is the memory of the history of changing flows within the system.
  • If the sum of inflows exceeds the sum of outflows, the stock level will rise.
  • If the sum of outflows exceeds the sum of inflows, the stock level will fall.
  • If the sum of outflows equals the sum of inflows, the stock level will not change – it will be held in dynamic equilibrium.
  • A stock can be increased by decreasing its outflow rate as well as by increasing its inflow rate.
  • Stocks act as delays or buffers or shock absorbers in systems. Stocks allow inflows and outflows to be de-coupled and independent.
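
The stock and flow rules above can be sketched in a few lines of Python. This is only an illustration: the bathtub numbers and the `simulate_stock` helper are made up for these notes and do not come from the book.

```python
def simulate_stock(initial, inflows, outflows):
    """Return the stock level after each time step.

    `inflows` and `outflows` are equal-length sequences of per-step amounts.
    """
    levels = [initial]
    for inflow, outflow in zip(inflows, outflows):
        # The stock only changes through its flows: it accumulates the net flow.
        levels.append(levels[-1] + inflow - outflow)
    return levels


# Hypothetical bathtub: 50 litres to start, tap and drain in litres per minute.
rising = simulate_stock(50, inflows=[5] * 4, outflows=[3] * 4)
steady = simulate_stock(50, inflows=[5] * 4, outflows=[5] * 4)
# Same inflow as `steady`, but a smaller outflow: the stock rises anyway.
throttled = simulate_stock(50, inflows=[5] * 4, outflows=[2] * 4)

print(rising)     # [50, 52, 54, 56, 58]
print(steady)     # [50, 50, 50, 50, 50]
print(throttled)  # [50, 53, 56, 59, 62]
```

The `steady` run is the dynamic equilibrium case: water keeps flowing through the tub, but the level does not move.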

Feedback Loops

  • A feedback loop is a closed chain of causal connections from a stock, through a set of decisions or rules or physical laws or actions that are dependent on the level of the stock, and back again through a flow to change the stock.
  • Balancing feedback loops are equilibrating or goal-seeking structures in systems and are both sources of stability and sources of resistance to change.
  • Reinforcing feedback loops are self-enhancing, leading to exponential growth or to runaway collapses over time.
  • The information delivered by a feedback loop, even nonphysical feedback, can affect only future behaviour; it can’t deliver a signal fast enough to correct behaviour that drove the current feedback.
  • A stock-maintaining balancing feedback loop must have its goal set appropriately to compensate for draining or inflowing processes that affect that stock. Otherwise, the feedback process will fall short of or exceed the target for the stock.
  • Systems with similar feedback structures produce similar dynamic behaviours.
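
The point about setting the goal to compensate for a drain is easier to see with numbers. Below is a rough sketch of a heated room that also leaks heat to the outside. The rates, temperatures, and function names are assumptions chosen for illustration.

```python
def run_thermostat(goal, outside, heat_rate=0.5, leak_rate=0.1,
                   start=10.0, steps=200):
    """Simulate a room with two balancing loops and return the final temperature."""
    temp = start
    for _ in range(steps):
        heating = heat_rate * (goal - temp)     # loop pulling toward the goal
        leaking = leak_rate * (temp - outside)  # loop pulling toward outside
        temp += heating - leaking
    return temp


def compensated_goal(target, outside, heat_rate=0.5, leak_rate=0.1):
    """Goal setting at which the heating exactly offsets the leak at `target`."""
    return target + leak_rate * (target - outside) / heat_rate


settled = run_thermostat(goal=18, outside=10)
adjusted = run_thermostat(goal=compensated_goal(18, outside=10), outside=10)

print(round(settled, 2))   # 16.67: falls short of the 18 degree goal
print(round(adjusted, 2))  # 18.0: the goal was set to 19.6 to compensate
```

With the leak running, the room settles below the thermostat setting. Setting the goal higher than the temperature you actually want closes the gap.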

Shifting Dominance, Delays, and Oscillations

  • Complex behaviours of systems often arise as the relative strengths of feedback loops shift, causing first one loop and then another to dominate behaviour.
  • A delay in a balancing feedback loop makes a system likely to oscillate.
  • Changing the length of a delay may make a large change in the behaviour of a system.
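
A small simulation shows how a delay turns a smooth goal-seeking loop into an oscillating one. The model below is a deliberately simplified sketch: the gain, the goal, and the `simulate_inventory` name are assumptions, not the book's own model.

```python
def simulate_inventory(goal, delay, gain=0.3, start=0.0, steps=40):
    """Balancing loop that corrects toward `goal` using stale information."""
    levels = [start]
    for t in range(steps):
        # The decision-maker sees the stock as it was `delay` steps ago.
        perceived = levels[max(0, t - delay)]
        levels.append(levels[-1] + gain * (goal - perceived))
    return levels


prompt = simulate_inventory(goal=100, delay=0)
delayed = simulate_inventory(goal=100, delay=3)

print(round(max(prompt)))   # 100: approaches the goal without overshooting
print(round(max(delayed)))  # 156: overshoots, then swings back below the goal
```

The only difference between the two runs is the length of the delay, yet one approaches the goal smoothly and the other overshoots by more than half.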

Scenarios and Testing Models

  • System dynamics models explore possible futures and ask “what if” questions.
  • Model utility depends not on whether its driving scenarios are realistic (since no one can know that for sure), but on whether it responds with a realistic pattern of behaviour.

Constraints on Systems

  • In physical, exponentially growing systems, there must be at least one reinforcing loop driving the growth and at least one balancing loop constraining the growth, because no system can grow forever in a finite environment.
  • Nonrenewable resources are stock-limited.
  • Renewable resources are flow-limited.

Resilience, Self-Organization, and Hierarchy

  • There are always limits to resilience.
  • Systems need to be managed not only for productivity or stability, they also need to be managed for resilience.
  • Systems often have the property of self-organization: the ability to structure themselves, to create new structure, to learn, diversify, and complexify.
  • Hierarchical systems evolve from the bottom up. The purpose of the upper layers of the hierarchy is to serve the purposes of the lower layers.

Source of System Surprises

  • Many relationships in systems are nonlinear.
  • There are no separate systems. The world is a continuum. Where to draw a boundary around a system depends on the purpose of the discussion.
  • At any given time, the input that is most important to a system is the one that is most limiting.
  • Any physical entity with multiple inputs and outputs is surrounded by layers of limits.
  • There always will be limits to growth.
  • A quantity growing exponentially toward a limit reaches that limit in a surprisingly short time.
  • When there are long delays in feedback loops, some sort of foresight is essential.
  • The bounded rationality of each actor in a system may not lead to decisions that further the welfare of the system as a whole.
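
The claim that exponential growth reaches a limit surprisingly quickly is easy to check with arithmetic. Here is a sketch with a quantity that doubles every step. The limit of 2**30 units and the helper name are arbitrary choices for illustration.

```python
def steps_until_limit(start, growth_factor, limit):
    """Count growth steps until `start` has grown to at least `limit`."""
    steps = 0
    level = start
    while level < limit:
        level *= growth_factor
        steps += 1
    return steps


LIMIT = 2 ** 30  # hypothetical carrying capacity

print(steps_until_limit(1, 2, LIMIT))         # 30 steps to fill the limit
print(steps_until_limit(1, 2, LIMIT // 2))    # 29: half full one step earlier
print(steps_until_limit(1, 2, LIMIT // 100))  # 24: still under 2% with 6 steps left
```

For most of the run the quantity looks negligible compared with the limit. Nearly all of the headroom disappears in the last few doublings.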

Mindsets and Models

  • Everything we think we know about the world is a model.
  • Our models do have a strong congruence with the world.
  • Our models fall far short of representing the real world fully.

Springing the System Traps

Policy Resistance

Trap: When various actors try to pull a system state toward various goals, the result can be policy resistance. Any new policy, especially if it’s effective, just pulls the system state farther from the goals of other actors and produces additional resistance, with a result that no one likes, but that everyone expends considerable effort in maintaining.

The Way Out: Let go. Bring in all the actors and use the energy formerly expended on resistance to seek out mutually satisfactory ways for all goals to be realized or redefinitions of larger and more important goals that everyone can pull toward together.

The Tragedy of the Commons

Trap: When there is a commonly shared resource, every user benefits directly from its use, but shares the costs of its abuse with everyone else. Therefore, there is very weak feedback from the condition of the resource to the decisions of the resource users. The consequence is overuse of the resource, eroding it until it becomes unavailable to anyone.

The Way Out: Educate and exhort the users, so they understand the consequences of abusing the resource. And also restore or strengthen the missing feedback link, either by privatizing the resource so each user feels the direct consequences of its abuse or (since many resources cannot be privatized) by regulating the access of all users to the resource.

Drift to Low Performance

Trap: Allowing performance standards to be influenced by past performance, especially if there is a negative bias in perceiving past performance, sets up a reinforcing feedback loop of eroding goals that sets a system drifting toward low performance.

The Way Out: Keep performance standards absolute. Even better, let standards be enhanced by the best actual performances instead of being discouraged by the worst. Set up a drift toward high performance!

Escalation

Trap: When the state of one stock is determined by trying to surpass the state of another stock, and vice versa, then there is a reinforcing feedback loop carrying the system into an arms race, a wealth race, a smear campaign, escalating loudness, escalating violence. The escalation is exponential and can lead to extremes surprisingly quickly. If nothing is done, the spiral will be stopped by someone’s collapse because exponential growth cannot go on forever.

The Way Out: The best way out of this trap is to avoid getting in it. If caught in an escalating system, one can refuse to compete (unilaterally disarm), thereby interrupting the reinforcing loop. Or one can negotiate a new system with balancing loops to control the escalation.

Success to the Successful

Trap: If the winners of a competition are systematically rewarded with the means to win again, a reinforcing feedback loop is created by which, if it is allowed to proceed uninhibited, the winners eventually take all, while the losers are eliminated.

The Way Out: Diversification, which allows those who are losing the competition to get out of that game and start another one; strict limitation on the fraction of the pie any one winner may win (antitrust laws); policies that level the playing field, removing some of the advantage of the strongest players or increasing the advantage of the weakest; policies that devise rewards for success that do not bias the next round of competition.

Shifting the Burden to the Intervenor

Trap: Shifting the burden, dependence, and addiction arise when a solution to a systemic problem reduces (or disguises) the symptoms, but does nothing to solve the underlying problem. Whether it is a substance that dulls one’s perception or a policy that hides the underlying trouble, the drug of choice interferes with the actions that could solve the real problem.

If the intervention designed to correct the problem causes the self-maintaining capacity of the original system to atrophy or erode, then a destructive reinforcing feedback loop is set in motion. The system deteriorates; more and more of the solution is then required. The system will become more and more dependent on the intervention and less and less able to maintain its own desired state.

The Way Out: Again, the best way out of this trap is to avoid getting in. Beware of symptom-relieving or signal-denying policies or practices that don’t really address the problem. Take the focus off short-term relief and put it on long-term restructuring.

If you are the intervenor, work in such a way as to restore or enhance the system’s own ability to solve its problems, then remove yourself.

If you are the one with an unsupportable dependency, build your system’s own capabilities back up before removing the intervention. Do it right away. The longer you wait, the harder the withdrawal process will be.

Rule Beating

Trap: Rules to govern a system can lead to rule beating: perverse behaviour that gives the appearance of obeying the rules or achieving the goals, but that actually distorts the system.

The Way Out: Design, or redesign, rules to release creativity not in the direction of beating the rules, but in the direction of achieving the purpose of the rules.

Seeking the Wrong Goal

Trap: System behavior is particularly sensitive to the goals of feedback loops. If the goals (the indicators of satisfaction of the rules) are defined inaccurately or incompletely, the system may obediently work to produce a result that is not really intended or wanted.

The Way Out: Specify indicators and goals that reflect the real welfare of the system. Be especially careful not to confuse effort with result or you will end up with a system that is producing effort, not result.

Places to Intervene in a System (in increasing order of effectiveness)

  1. Numbers: Constants and parameters such as subsidies, taxes, and standards
  2. Buffers: The sizes of stabilizing stocks relative to their flows
  3. Stock-and-Flow Structures: Physical systems and their nodes of intersection
  4. Delays: The lengths of time relative to the rates of system changes
  5. Balancing Feedback Loops: The strength of the feedbacks relative to the impacts they are trying to correct
  6. Reinforcing Feedback Loops: The strength of the gain of driving loops
  7. Information Flows: The structure of who does and does not have access to information
  8. Rules: Incentives, punishments, constraints
  9. Self-Organization: The power to add, change, or evolve system structure
  10. Goals: The purpose of the system
  11. Paradigms: The mind-set out of which the system (its goals, structure, rules, delays, parameters) arises
  12. Transcending Paradigms

Guidelines for Living in a World of Systems

  1. Get the beat of the system.
  2. Expose your mental models to the light of day.
  3. Honour, respect, and distribute information.
  4. Use language with care and enrich it with systems concepts.
  5. Pay attention to what is important, not just what is quantifiable.
  6. Make feedback policies for feedback systems.
  7. Go for the good of the whole.
  8. Listen to the wisdom of the system.
  9. Locate responsibility within the system.
  10. Stay humble, stay a learner.
  11. Celebrate complexity.
  12. Expand time horizons.
  13. Defy the disciplines.
  14. Expand the boundary of caring.
  15. Don’t erode the goal of goodness.

Book Notes: Software Engineering at Google

Software Engineering at Google: Lessons Learned from Programming Over Time by Titus Winters, Tom Manshreck, Hyrum Wright
My rating: 4 of 5 stars

What is Software Engineering?

  • “Software engineering” differs from “programming” in dimensionality: programming is about producing code. Software engineering extends that to include the maintenance of that code for its useful life span.
  • There is a factor of at least 100,000 times between the life spans of short-lived code and long-lived code. It is silly to assume that the same best practices apply universally on both ends of that spectrum.
  • Software is sustainable when, for the expected life span of the code, we are capable of responding to changes in dependencies, technology, or product requirements. We may choose to not change things, but we need to be capable.
  • Hyrum’s Law: with a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
  • Every task your organization has to do repeatedly should be scalable (linear or better) in terms of human input. Policies are a wonderful tool for making processes scalable.
  • Process inefficiencies and other software-development tasks tend to scale up slowly. Be careful about boiled-frog problems.
  • Expertise pays off particularly well when combined with economies of scale. 
  • “Because I said so” is a terrible reason to do things.
  • Being data driven is a good start, but in reality, most decisions are based on a mix of data, assumption, precedent, and argument. It’s best when objective data makes up the majority of those inputs, but it can rarely be all of them.
  • Being data driven over time implies the need to change directions when the data changes (or when assumptions are dispelled). Mistakes or revised plans are inevitable.
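
Hyrum’s Law from the list above is easiest to appreciate with a tiny example. In this sketch the function names, the data, and the ordering behaviour are all hypothetical. Both versions honour the same documented contract, yet a caller that relied on an unpromised detail breaks.

```python
def active_user_ids_v1(last_seen):
    """Contract: return the ids of active users. No ordering is promised."""
    return sorted(last_seen)  # ascending ids, purely an implementation accident


def active_user_ids_v2(last_seen):
    """Same contract, new implementation: most recently seen first."""
    return sorted(last_seen, key=last_seen.get, reverse=True)


def lowest_id(user_ids):
    # A caller that silently depends on the unpromised ordering of v1.
    return user_ids[0]


last_seen = {3: 50, 1: 10, 2: 99}  # user id -> timestamp of last activity

print(lowest_id(active_user_ids_v1(last_seen)))  # 1, correct by accident
print(lowest_id(active_user_ids_v2(last_seen)))  # 2, the caller is now wrong
```

Nothing in the contract changed between the two versions. Only an observable behaviour did, and that was enough to break a consumer.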

Culture

How to Work Well on Teams?

  • Be aware of the trade-offs of working in isolation.
  • Acknowledge the amount of time that you and your team spend communicating and in interpersonal conflict. A small investment in understanding personalities and working styles of yourself and others can go a long way toward improving productivity. 
  • If you want to work effectively with a team or a large organization, be aware of your preferred working style and that of others.

Knowledge Sharing

  • Psychological safety is the foundation for fostering a knowledge-sharing environment.
  • Start small: ask questions and write things down.
  • Make it easy for people to get the help they need from both human experts and documented references.
  • At a systemic level, encourage and reward those who take time to teach and broaden their expertise beyond just themselves, their team, or their organization.
  • There is no silver bullet: empowering a knowledge-sharing culture requires a combination of multiple strategies, and the exact mix that works best for your organization will likely change over time.

Engineering for Equity

  • Bias is the default.
  • Diversity is necessary to design properly for a comprehensive user base.
  • Inclusivity is critical not just to improving the hiring pipeline for underrepresented groups, but to providing a truly supportive work environment for all people.
  • Product velocity must be evaluated against providing a product that is truly useful to all users. It’s better to slow down than to release a product that might cause harm to some users.

How to Lead a Team

  • Don’t “manage” in the traditional sense; focus on leadership, influence, and serving your team.
  • Delegate where possible; don’t DIY (Do It Yourself).
  • Pay particular attention to the focus, direction, and velocity of your team.

Leading at Scale

  • Always Be Deciding: Ambiguous problems have no magic answer; they’re all about finding the right trade-offs of the moment, and iterating.
  • Always Be Leaving: Your job, as a leader, is to build an organization that automatically solves a class of ambiguous problems over time without you needing to be present.
  • Always Be Scaling: Success generates more responsibility over time, and you must proactively manage the scaling of this work in order to protect your scarce resources of personal time, attention, and energy.

Measuring Engineering Productivity

  • Before measuring productivity, ask whether the result is actionable, regardless of whether the result is positive or negative. If you can’t do anything with the result, it is likely not worth measuring.
  • Select meaningful metrics using the GSM framework. A good metric is a reasonable proxy to the signal you’re trying to measure, and it is traceable back to your original goals.
  • Select metrics that cover all parts of productivity (QUANTS). By doing this, you ensure that you aren’t improving one aspect of productivity (like developer velocity) at the cost of another (like code quality).
  • Qualitative metrics are metrics, too! Consider having a survey mechanism for tracking longitudinal metrics about engineers’ beliefs. Qualitative metrics should also align with the quantitative metrics; if they do not, it is likely the quantitative metrics that are incorrect.
  • Aim to create recommendations that are built into the developer workflow and incentive structures. Even though it is sometimes necessary to recommend additional training or documentation, change is more likely to occur if it is built into the developer’s daily habits.

Processes

Style Guide and Rules

  • Rules and guidance should aim to support resilience to time and scaling.
  • Know the data so that rules can be adjusted.
  • Not everything should be a rule.
  • Consistency is key. Automate enforcement when possible.

Code Review

  • Code review has many benefits, including ensuring code correctness, comprehension, and consistency across a codebase.
  • Always check your assumptions through someone else; optimize for the reader.
  • Provide the opportunity for critical feedback while remaining professional.
  • Code review is important for knowledge sharing throughout an organization.
  • Automation is critical for scaling the process.
  • The code review itself provides a historical record.

Documentation

  • Documentation is hugely important over time and scale.
  • Documentation changes should leverage the existing developer workflow.
  • Keep documents focused on one purpose.
  • Write for your audience, not yourself.

Testing Overview

  • Automated testing is foundational to enabling software to change.
  • For tests to scale, they must be automated.
  • A balanced test suite is necessary for maintaining healthy test coverage.
  • “If you liked it, you should have put a test on it.” 
  • Changing the testing culture in organizations takes time.

Unit Testing

  • Strive for unchanging tests.
  • Test via public APIs.
  • Test state, not interactions.
  • Make your tests complete and concise.
  • Test behaviors, not methods.
  • Structure tests to emphasize behaviors.
  • Name tests after the behavior being tested.
  • Don’t put logic in tests.
  • Write clear failure messages.
  • Follow DAMP (Descriptive And Meaningful Phrases) over DRY (Don’t Repeat Yourself) when sharing code for tests.
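
Several of the unit testing points can be shown in one short test class. This is a sketch using Python’s standard `unittest` module. The `Cart` class is a made-up system under test, not an example from the book.

```python
import unittest


class Cart:
    """Hypothetical system under test."""

    def __init__(self):
        self._items = {}

    def add(self, name, price, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        _, old_quantity = self._items.get(name, (price, 0))
        self._items[name] = (price, old_quantity + quantity)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


class CartTest(unittest.TestCase):
    # Each test is named after a behaviour, not after a method.
    def test_adding_same_item_twice_accumulates_quantity(self):
        cart = Cart()
        cart.add("pen", price=2, quantity=1)
        cart.add("pen", price=2, quantity=3)
        # Assert on resulting state through the public API, with a clear message.
        self.assertEqual(cart.total(), 8,
                         "four pens at 2 each should total 8")

    def test_rejects_non_positive_quantity(self):
        cart = Cart()
        with self.assertRaises(ValueError):
            cart.add("pen", price=2, quantity=0)
```

The tests never touch `_items`, so the internal representation can change without any test needing to change. Saved to a file, they can be run with `python -m unittest`.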

Test Doubles

  • A real implementation should be preferred over a test double.
  • A fake is often the ideal solution if a real implementation can’t be used in a test.
  • Overuse of stubbing leads to tests that are unclear and brittle.
  • Interaction testing should be avoided when possible: it leads to tests that are brittle because it exposes implementation details of the system under test.
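
Here is a minimal sketch of what a fake looks like in practice. The `InMemoryUserStore` and `Greeter` classes are hypothetical. The fake stands in for a real store (say, a database client) that could not be used in a test.

```python
class InMemoryUserStore:
    """A fake: a lightweight but working implementation of the store interface."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def load(self, user_id):
        return self._users.get(user_id)


class Greeter:
    """Hypothetical code under test that depends on a user store."""

    def __init__(self, store):
        self._store = store

    def register(self, user_id, name):
        self._store.save(user_id, name.strip())

    def greet(self, user_id):
        name = self._store.load(user_id)
        return f"Hello, {name}!" if name else "Hello, stranger!"


def test_greets_registered_user_by_trimmed_name():
    greeter = Greeter(InMemoryUserStore())
    greeter.register(1, "  Ada ")
    # State-based check: no assertions about which store methods were called.
    assert greeter.greet(1) == "Hello, Ada!"


def test_greets_unknown_user_as_stranger():
    assert Greeter(InMemoryUserStore()).greet(42) == "Hello, stranger!"
```

Because the tests check the observable result rather than the calls made to the store, `Greeter` can change how it talks to the store without breaking them.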

Larger Testing

  • Larger tests cover things unit tests cannot.
  • Large tests are composed of a System Under Test, Data, Action, and Verification.
  • A good design includes a test strategy that identifies risks and larger tests that mitigate them.
  • Extra effort must be made with larger tests to keep them from creating friction in the developer workflow.

Deprecation

  • Software systems have continuing maintenance costs that should be weighed against the costs of removing them.
  • Removing things is often more difficult than building them to begin with because existing users are often using the system beyond its original design.
  • Evolving a system in place is usually cheaper than replacing it with a new one, when turndown costs are included.
  • It is difficult to honestly evaluate the costs involved in deciding whether to deprecate: aside from the direct maintenance costs involved in keeping the old system around, there are ecosystem costs involved in having multiple similar systems to choose between and that might need to interoperate. The old system might implicitly be a drag on feature development for the new. These ecosystem costs are diffuse and difficult to measure. Deprecation and removal costs are often similarly diffuse.

Tools

Version Control and Branch Management

  • Use version control for any software development project larger than “toy project with only one developer that will never be updated.”
  • There’s an inherent scaling problem when there are choices in “which version of this should I depend upon?”
  • One-Version Rules are surprisingly important for organizational efficiency. Removing choices in where to commit or what to depend upon can result in significant simplification.
  • In some languages, you might be able to spend some effort to dodge this with technical approaches like shading, separate compilation, linker hiding, and so on. The work to get those approaches working is entirely lost labor: your software engineers aren’t producing anything, they’re just working around technical debts.
  • Previous research (DORA/State of DevOps/Accelerate) has shown that trunk-based development is a predictive factor in high-performing development organizations. Long-lived dev branches are not a good default plan.
  • Use whatever version control system makes sense for you. If your organization wants to prioritize separate repositories for separate projects, it’s still probably wise for interrepository dependencies to be unpinned/“at head”/“trunk based.” There are an increasing number of VCS and build system facilities that allow you to have both small, fine-grained repositories as well as a consistent “virtual” head/trunk notion for the whole organization.

Code Search

  • Helping your developers understand code can be a big boost to engineering productivity. At Google, the key tool for this is Code Search.
  • Code Search has additional value as a basis for other tools and as a central, standard place that all documentation and developer tools link to.
  • The huge size of the Google codebase made a custom tool (as opposed to, for example, grep or an IDE’s indexing) necessary.
  • As an interactive tool, Code Search must be fast, allowing a “question and answer” workflow. It is expected to have low latency in every respect: search, browsing, and indexing.
  • It will be widely used only if it is trusted, and will be trusted only if it indexes all code, gives all results, and gives the desired results first. However, earlier, less powerful, versions were both useful and used, as long as their limits were understood.

Build Systems and Build Philosophy

  • A fully featured build system is necessary to keep developers productive as an organization scales.
  • Power and flexibility come at a cost. Restricting the build system appropriately makes it easier on developers.
  • Build systems organized around artifacts tend to scale better and be more reliable than build systems organized around tasks.
  • When defining artifacts and dependencies, it’s better to aim for fine-grained modules. Fine-grained modules are better able to take advantage of parallelism and incremental builds.
  • External dependencies should be versioned explicitly under source control. Relying on “latest” versions is a recipe for disaster and unreproducible builds.
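
To make the fine-grained modules point concrete, here is a toy dependency graph. It is only a sketch: the target names are invented, and real artifact-based build systems do far more (hashing inputs, caching, remote execution). It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical targets mapped to the targets they depend on.
DEPS = {
    "app": {"ui", "core"},
    "ui": {"core"},
    "core": {"logging"},
    "logging": set(),
    "docs": set(),
}


def build_order(deps):
    """One valid order in which every target is built after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(changed, deps):
    """Targets to rebuild when `changed` changes: itself plus all dependents."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for target, needs in deps.items():
            if target not in affected and needs & affected:
                affected.add(target)
                grew = True
    return affected


print(build_order(DEPS))
print(sorted(affected_by("ui", DEPS)))  # ['app', 'ui']
```

Because the graph is declared explicitly, a change to `ui` triggers a rebuild of only `ui` and `app`. The finer the modules, the smaller that affected set tends to be, and independent targets such as `docs` can be built in parallel.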

Critique: Google’s Code Review Tool

  • Trust and communication are core to the code review process. A tool can enhance the experience, but it can’t replace them.
  • Tight integration with other tools is key to a great code review experience.
  • Small workflow optimizations, like the addition of an explicit “attention set,” can increase clarity and reduce friction substantially.

Static Analysis

  • Focus on developer happiness. We have invested considerable effort in building feedback channels between analysis users and analysis writers in our tools, and aggressively tune analyses to reduce the number of false positives.
  • Make static analysis part of the core developer workflow. The main integration point for static analysis at Google is through code review, where analysis tools provide fixes and involve reviewers. However, we also integrate analyses at additional points (via compiler checks, gating code commits, in IDEs, and when browsing code).
  • Empower users to contribute. We can scale the work we do building and maintaining analysis tools and platforms by leveraging the expertise of domain experts. Developers are continuously adding new analyses and checks that make their lives easier and our codebase better.

Dependency Management

  • Prefer source control problems to dependency management problems: if you can get more code from your organization to have better transparency and coordination, those are important simplifications.
  • Adding a dependency isn’t free for a software engineering project, and the complexity in establishing an “ongoing” trust relationship is challenging. Importing dependencies into your organization needs to be done carefully, with an understanding of the ongoing support costs.
  • A dependency is a contract: there is a give and take, and both providers and consumers have some rights and responsibilities in that contract. Providers should be clear about what they are trying to promise over time.
  • SemVer is a lossy-compression shorthand estimate for “How risky does a human think this change is?” SemVer with a SAT-solver in a package manager takes those estimates and escalates them to function as absolutes. This can result in either overconstraint (dependency hell) or underconstraint (versions that should work together that don’t).
  • By comparison, testing and CI provide actual evidence of whether a new set of versions work together.
  • Minimum-version update strategies in SemVer/package management are higher fidelity. This still relies on humans being able to assess incremental version risk accurately, but distinctly improves the chance that the link between API provider and consumer has been tested by an expert.
  • Unit testing, CI, and (cheap) compute resources have the potential to change our understanding and approach to dependency management. That phase-change requires a fundamental change in how the industry considers the problem of dependency management, and the responsibilities of providers and consumers both.
  • Providing a dependency isn’t free: “throw it over the wall and forget” can cost you reputation and become a challenge for compatibility. Supporting it with stability can limit your choices and pessimize internal usage. Supporting without stability can cost goodwill or expose you to risk of important external groups depending on something via Hyrum’s Law and messing up your “no stability” plan.
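
The minimum-version idea mentioned above can be sketched very roughly as follows. This is a simplification made for these notes: it ignores transitive requirements and major-version incompatibilities, and the package names are hypothetical. The rule is to pick the highest of the declared minimums for each package, rather than the newest release available.

```python
def parse(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_minimum_versions(requirements):
    """Pick, per package, the highest of the minimum versions requested.

    `requirements` is an iterable of (package, minimum_version) pairs.
    """
    selected = {}
    for package, minimum in requirements:
        current = selected.get(package)
        if current is None or parse(minimum) > parse(current):
            selected[package] = minimum
    return selected


requirements = [
    ("libfoo", "1.2.0"),   # needed by one consumer
    ("libfoo", "1.10.0"),  # needed by another consumer
    ("libbar", "2.0.1"),
]

print(select_minimum_versions(requirements))
# {'libfoo': '1.10.0', 'libbar': '2.0.1'}
```

Every selected version is one that some consumer explicitly asked for, which is why it is more likely that somebody has actually tested against it.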

Large-Scale Changes

  • An LSC process makes it possible to rethink the immutability of certain technical decisions.
  • Traditional models of refactoring break at large scales.
  • Making LSCs means making a habit of making LSCs.

Continuous Integration

  • A CI system decides what tests to use, and when.
  • CI systems become progressively more necessary as your codebase ages and grows in scale.
  • CI should optimize quicker, more reliable tests on presubmit and slower, less deterministic tests on post-submit.
  • Accessible, actionable feedback allows a CI system to become more efficient.

Continuous Delivery

  • Velocity is a team sport: The optimal workflow for a large team that develops code collaboratively requires modularity of architecture and near-continuous integration.
  • Evaluate changes in isolation: Flag guard any features to be able to isolate problems early.
  • Make reality your benchmark: Use a staged rollout to address device diversity and the breadth of the userbase. Release qualification in a synthetic environment that isn’t similar to the production environment can lead to late surprises.
  • Ship only what gets used: Monitor the cost and value of any feature in the wild to know whether it’s still relevant and delivering sufficient user value.
  • Shift left: Enable faster, more data-driven decision making earlier on all changes through CI and continuous deployment.
  • Faster is safer: Ship early and often and in small batches to reduce the risk of each release and to minimize time to market.
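
Flag guarding and staged rollout are often combined. The sketch below shows one common way to do a percentage rollout by hashing the user id into a stable bucket. The flag name, the bucketing scheme, and `in_rollout` are assumptions for illustration, not Google’s implementation.

```python
import hashlib


def in_rollout(flag_name, user_id, percent):
    """Return True if `user_id` falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def checkout_page(user_id, rollout_percent):
    # Flag guard: the new code path ships dark and is enabled gradually.
    if in_rollout("new_checkout", user_id, rollout_percent):
        return "new checkout"
    return "old checkout"


enabled = sum(in_rollout("new_checkout", uid, 10) for uid in range(10_000))
print(f"{enabled} of 10000 hypothetical users see the feature at 10%")
```

Because the bucket depends only on the flag name and the user id, a user who gets the feature at 10% keeps it when the rollout widens to 50%. Setting the percentage back to 0 switches the feature off without a new release.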

Compute as a Service

  • Scale requires a common infrastructure for running workloads in production.
  • A compute solution can provide a standardized, stable abstraction and environment for software.
  • Software needs to be adapted to a distributed, managed compute environment.
  • The compute solution for an organization should be chosen thoughtfully to provide appropriate levels of abstraction.