The model of declarative eventing allows for listening to very specific events and then triggering specific actions. This model simplifies the developer experience, as well as optimizing the system by reducing network traffic.

AWS S3 bucket trigger

In looking AWS to explain changes in S3 can trigger Lambda functions, I found that the AWS product docs focus on the GUI configuration experience. This probably makes it easy for new folks to write a specific Lambda function; however, it a little harder to see the system patterns before gaining a lot of hands-on experience.

The trigger-action association can be seen more clearly in a Terraform configuration. Under the hood, Teraform must be using AWS APIs for setting up the trigger). The configuration below specifies that whenever a json file is uploaded to a specific bucket with the path prefix “content-packages” then a specific Lambda function will be executed:

resource "aws_s3_bucket_notification" "bucket_terraform_notification" {
    bucket = "${aws_s3_bucket.terraform_bucket.id}"
    lambda_function {
        lambda_function_arn = "${aws_lambda_function.terraform_func.arn}"
        events = ["s3:ObjectCreated:*"]
        filter_prefix = "content-packages/"
        filter_suffix = ".json"
    }
}

— via justinsoliz’ github gist

Google Cloud events

To illustrate an alternate developer experience, the examples below are shown with Firebase JavaScript SDK for Google Cloud Functions, which is idiomatic for JavaScript developers using the Fluent API style, popularized by jQuery. The same functionality is available via command line options using gcloud, the Google Cloud CLI.

** Cloud Storage trigger**

Below is an example of specifying a trigger for a change to a Google Cloud Storage object in a specific bucket:

exports.generateThumbnail = functions.storage.bucket('my-bucket').object().onChange((event) => {
  // ...
});

Cloud Firestore trigger

This approach to filtering events at their source is very powerful when applied to database operations, where a developer can listen to a specific database path, such as with Cloud Firestore events:

exports.createProduct = functions.firestore
  .document('products/{productId}')
  .onCreate(event => {
    // Get an object representing the document
    // e.g. {'name': 'Wooden Doll', 'description': '...}
    var newValue = event.data.data();

    // access a particular field as you would any JS property
    var name = newValue.name;

    // perform desired operations ...
});

Martin Fowler’s talk “The Many Meanings of Event-Driven Architecture” at GOTO2017 provides a good overview of different patterns that all are described as “event-driven” systems. At the end of the talk, he references to an earlier event-driven article, which offers a good prose description of these different patterns that folks are calling event-driven programming. In this talk, he covers specific examples that illustrate the patterns, grounding them in specific applications.

Event Notification

Person -> CRM -> Insurance Quoting -> CommunicationsFor example: address changed

Scenario: CRM system stores information about people. An insurance quoting system generates insurance rates based on demographics and address. When someone’s address changes, we need to calculate a new value for the cost of insurance.

We often don’t want these systems to be coupled, instead we want a reversal of dependencies. This patterns is used in relatively large scale systems, and also a long-established client-side pattern to separate GUIs and the rest of your code.

The change becomes a first class notion. We bundle the notification + data about the change.

Events OR Commands
* Commands enforce the coupling, it’s very different from an event, it conveys intent
* Naming makes all the diffrence

Additional property → easy to add systems without modifying the original system

Martin notes “the dark side of event notification” where your system quickly becomes hard to reason about because there is not statement of overall behavior.

Event-Carried State Transfer

Named in contrast to REST (Representational State Transfer), the event carries ALL of the data needed about the event, which completely de-couples the target system from the system that originates the event.

Of course, this introduces data replication and eventual consistency (which can be good for some use cases); however, this is a less common pattern since this lack of consistency can actually make the system more complex.

Event Sourcing

This is one of my favorite patterns which Martin explains nicely in the talk with two examples:

  • Version control is an event source system for code.
  • Accounting ledgers track every credit or debit, which are the source records (events), and the balance is calculated from those records.

Benefits

  • auditing: natural part of the system
  • debugging: easy to replay a subset of events locally
  • historic state: time travel!
  • alternative state: branching, correcting errors
  • memory image: application state can be volatile (since persistence is achieved with event log, processing can happen quickly in memory based on recent events that can quickly regenerate state based on recent snapshots)

Drawbacks

  • unfamiliar
  • external systems: everything needs to be an event
  • event schema: what happens when your data types change?
  • identifiers: how to create identifiers to reliably replay events

Common (Related) Challenges

  • **asynchronous processing** can be hard to reason about. This isn’t required for an event sourcing system, yet it is easy to add and needed in most server-side systems. Useful to remember that this is distinct from the core event sourcing pattern.
  • **versioning** is another option that is quite useful, yet also adds complexity. Greg Young’s advice: don’t have any business logic between the event and the internal representation of a record.

Martin talks about the difference between input event (the intention) and the output event (the effect). In deciding what to store think about how we would fix a bug. The key thing is to be clear about what you are storing, and probably most of the time you want to store both.

CQRS

Coined by Greg Young, Command Query Responsibility Segregation, is where your write model is different from your read model. Two software components: one for updating the current model (the command component), and one for reading the state (the query component).

Martin suggests that we need to be wary of this approach. A good pattern when used appropriately (which you could say of any model). But isn’t Event Sourcing just a special case of this? Maybe the special case is what provides a structure that make it easier to reason about.

Ghostscript lets you do all sorts of PDF and PostScript transformations. It’s got a command-line tool which is great for page-level operations.

Installation on a mac is pretty easy with homebrew:

brew install ghostscript

The syntax for the commands not very memorable, but easy once you know it. To get a PNG from a PDF:

gs -sDEVICE=pngalpha -o output.png input.pdf

We can also use the API to call the engine from C code. Note: to use this from a program we need to publish our source code (or purchase a license from the nice folks who create and maintain Ghostscript), which seems fair to me. See license details.

To do the exact same thing as above, I created a small program based on the example in the API doc, which I’ve posted on github.

What I really want to do is to replace text in a PDF (using one PDF as a template to create another). It seems like all the code I need is available in the Ghostscript library, but maybe not exposed in usable form:

  • Projects Seeking Developers Driver Architecture was the only place in the docs that I learned that we can’t add a driver without modifying the source code: “Currently, drivers must be linked into the executable.” Might be nice for these to be filed as bugs so interested developers might discuss options here. Of course, not sure that making a driver is a good solution to my problem at all.
  • There’s an option -dFILTERTEXT that removes all text from a PDF that I thought might provide a clue. I found the implementation in gdevoflt.c with a comment that it was derived from gdevflp.c.
  • gdevflp: This device is the first ‘subclassing’ device; the intention of subclassing is to allow us to develop a ‘chain’ or ‘pipeline’ of devices, each of which can process some aspect of the graphics methods before passing them on to the next device in the chain.

So, this appears to require diving into the inner-workings of Ghostscript, yet the code seems to be structured so that it is easily modifiable for exactly this kind of thing. It seems like it would be possible to add a filter that modifies text, rather than just deleting it, as long as the layout is unaffected. This implies setting up for building and debugging GhostScript from source and the potential investment of staying current with the codebase, which might not work for my very intermittent attention.