Sections in this Page:

Cache
Execution Cache
Access Cache

Cache

A cache is the result of a process, saved to avoid re-executing that process unless needed. Typically caches are checked against the process inputs to decide whether to re-execute.

There are several reasons to cache a file:

  • Execution is expensive and needs to be avoided if possible.

  • Dependencies are burdensome to manage. When upstream processes are changing constantly, it is useful to capture an output once and keep working against it.

  • Access is expensive, for example when the input lives in a remote location and network latency makes every fetch costly. In this case, the input can be cached locally.

Execution Cache

This is how we draw execution or evaluation caches.

[Diagram: cache]

This is interpreted as:

  • check the timestamps of the Cache against the Input, and if the Cache is not out of date,

  • optimize the execution path inside Process in some way, perhaps by eliding some or all of the Process (sketched in code below).
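
As a concrete illustration, here is a minimal Python sketch of that check, assuming file-based inputs and caches compared by modification time; the function names and paths are illustrative, not part of any particular pipeline API.

    import os

    def cache_is_fresh(cache_path, input_path):
        """The cache is usable if it exists and is at least as new as the input."""
        if not os.path.exists(cache_path):
            return False
        return os.path.getmtime(cache_path) >= os.path.getmtime(input_path)

    def evaluate(process, input_path, cache_path):
        """Run the process only when the cache is out of date; otherwise elide it."""
        if cache_is_fresh(cache_path, input_path):
            return cache_path            # Process elided: reuse the existing cache
        process(input_path, cache_path)  # expensive evaluation; writes the cache
        return cache_path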

Motivation

Here's a way to think of this:

[Diagram: cache_pre]

In this diagram there is an Intermediate File between Process1 and Process2. If we check the Intermediate File's timestamp against the Input File's, we can potentially elide Process1.


Or, to draw it another way, we can combine Process1 and Process2 into a single Composite Process, and move the Intermediate File down below.

[Diagram: cache_large]

Note: This is an example of how we can think of many of the patterns as graph transformations. Any time we find a file between two processes, we have an opportunity to cache.
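
Rendered as code, the Composite Process might look like the following Python sketch; it reuses the same modification-time test, and process1, process2 and the file paths are hypothetical placeholders for whatever the two processes actually do.

    import os

    def is_stale(output_path, input_path):
        """An output is stale if it is missing or older than its input."""
        return (not os.path.exists(output_path)
                or os.path.getmtime(output_path) < os.path.getmtime(input_path))

    def composite_process(process1, process2, input_file, intermediate_file, output_file):
        """Run Process1 then Process2, eliding Process1 when the Intermediate File is still fresh."""
        if is_stale(intermediate_file, input_file):
            process1(input_file, intermediate_file)  # re-run only when the cache is out of date
        process2(intermediate_file, output_file)
        return output_file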

Access Cache

coming soon