AWS Simple Workflow Service (SWF) – Part 1

Programming Applications in Java with the Flow Framework for AWS’ Simple Workflow Service (SWF). In life, we often follow steps to accomplish something: for example, trying to figure out why a lamp does not work. This is an easy problem at first, but it gets more complicated when other people are involved. First, you need people qualified to do the work. Imagine a janitor in the above diagram. You communicate the problem that the lamp is out, you have access to them and they are qualified to fix it, and then they begin the steps necessary to do so.

Now imagine this example with a group of janitors. Who does what and when? How does Joe know Sally already checked that it is plugged in? It gets very complicated, very fast.

Now imagine trying to automate even this simple issue with a computer. Pieces have to communicate steps back and forth to know if the problem is solved, whether to move on to the next step, and what that next step is. (Ok, it is plugged in, what now?) They have to know who did what, and what they should do next. Sound exhausting? It is!

In steps Amazon Simple Workflow Service to the rescue! It lets you focus on the steps themselves (in this case the colored steps above), without having to manage the communication between the pieces doing the work, or how many pieces are needed and for how long. So we have 1000 lamps out? No problem, Amazon just “boots” up a few more janitors until it’s done (with each janitor knowing who checked what!), and then “turns the janitors off” when the work is completed. We build the steps, Amazon does the rest, automatically.



Why Workflows

In general terms, a workflow process depicts sequences of operations (called steps) needed for achieving an end result. These steps represent work done by Agents, like persons (or groups of persons) or computers.

We don’t often perceive these processes as they happen, but they are fairly common and intuitive: For instance, right now I’m an agent in one process – Writing – in a Workflow Process for Content Production. Once I finish this article, I’ll mail my editor, who will coordinate with other Agents (SEO, Copy Editing, Technical Review, Formatting), and the end result will be an article.

However, the whole process carries tacit States and Metadata: Articles are going to be revised several times, and communication must be carried out for validation, with corrections made if needed. Because these processes are so intuitive, we often don’t even notice that we’re working as agents in several Workflow Processes at any given time.

In the example above, the Workflow Engine, which tracks and delegates its execution, is the Editor, and it’s a single point of failure. This tacit Workflow Process can be made explicit, even using industry-wide notations, such as Business Process Modeling Notation (BPMN).

Documenting a process helps turn it into an Organizational Standard. At this point, Workflow Instances could be distributed among a team of Editors, spreading the risk of relying on a single person across a team. With feedback, this process could be improved, and further requirements (trust me, they often creep up – that is what governments are made for) and other steps (and agents) can be brought into it.

(If you’re a Manager and you’ve made it this far, I hope you are convinced about using Simple Workflow Service. If not, stay tuned for the next part, “Sample Cases for SWF Usage”.)

How AWS Simple Workflow (SWF) Service works

In rough terms, your Workflow Processes are defined as Workflow Types, and they are registered on Simple Workflow Service in named registries called Domains. Individual steps are registered as Activity Types.

Work to be done is distributed among Task Lists, which are dynamic queues that are long-polled by Agents called Workers. There are two kinds of Workers: Deciders (for Workflow Types) and Activity Workers (for Activity Types). Accordingly, there are two kinds of Task Lists: Decision Task Lists, polled by Deciders, and Activity Task Lists, polled by Activity Workers.
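As a rough illustration of the idea of named, dynamically created queues with two separate kinds of Task Lists, here is a minimal in-memory sketch. It is plain Java and does not use the AWS SDK; the class `TaskListsSketch`, its methods and the task strings are hypothetical names made up for this example.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Toy in-memory stand-in for a Domain's Task Lists; not the AWS SDK. */
final class TaskListsSketch {
    enum Kind { DECISION, ACTIVITY }

    // "Dynamic" queues: a Task List comes into existence the first time its name is used.
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    private Queue<String> list(Kind kind, String name) {
        // Assumption of this sketch: the kind is part of the key, so a Decision Task List
        // and an Activity Task List with the same name never share tasks.
        return queues.computeIfAbsent(kind + ":" + name, k -> new ConcurrentLinkedQueue<>());
    }

    void schedule(Kind kind, String taskList, String task) {
        list(kind, taskList).add(task);
    }

    /** Returns the next task, or null when the Task List is empty. */
    String poll(Kind kind, String taskList) {
        return list(kind, taskList).poll();
    }

    public static void main(String[] args) {
        TaskListsSketch swf = new TaskListsSketch();
        swf.schedule(Kind.DECISION, "lamp-repair", "decide next step for lamp-1");
        swf.schedule(Kind.ACTIVITY, "lamp-repair", "check plug for lamp-1");
        System.out.println(swf.poll(Kind.DECISION, "lamp-repair")); // seen by a Decider
        System.out.println(swf.poll(Kind.ACTIVITY, "lamp-repair")); // seen by an Activity Worker
    }
}
```

In the real service the queues live on Amazon's side and Workers reach them over HTTP; the sketch only shows the routing idea.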

When a system starts a Workflow Type (using the StartWorkflowExecution API Call), it creates a Workflow Execution with a Unique Id, along with its state (an arbitrary payload, but generally JSON), a list of Events, and an (initially empty) List of Performed Activities.

The Workflow Execution is sent to a Decision Task List, and a Decider Worker picks it up (via the PollForDecisionTask API Call) and submits its results back (RespondDecisionTaskCompleted). In that response, the Decider submits a list of Decisions, which are the individual further actions that must be taken, like Scheduling Activity Tasks, Registering Custom Events, Managing Workflow Execution Status (Canceling, Failing), Starting Timers, and Sending Signals.

The scheduled Activity Tasks are fed into Activity Task Lists and, similarly, they’re picked up by Activity Workers, which submit their results back to Simple Workflow Service.

Each Activity Worker Result is merged with the previous Activity Worker Results for the same Workflow Execution, and the Execution is sent again to the Decision Task List. This repeats until the Workflow Execution completes (successfully or not).
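The decide-then-work cycle described in the last few paragraphs can be sketched in a few lines of plain Java. This is only a toy model of the lamp example, not the Flow Framework or the AWS SDK: the class `LampWorkflowSketch`, the activity names `CheckPlug` and `ReplaceBulb`, and the use of strings for decisions are all assumptions made for illustration (the string prefixes merely echo the names of SWF decision types).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy decider/worker cycle for the lamp example; plain Java, not the AWS SDK. */
final class LampWorkflowSketch {
    static final String SCHEDULE = "ScheduleActivityTask:";
    static final String COMPLETE = "CompleteWorkflowExecution";

    /** The Decider: looks only at what has been done so far and returns the next Decisions. */
    static List<String> decide(List<String> completedActivities) {
        if (!completedActivities.contains("CheckPlug")) {
            return List.of(SCHEDULE + "CheckPlug");
        }
        if (!completedActivities.contains("ReplaceBulb")) {
            return List.of(SCHEDULE + "ReplaceBulb");
        }
        return List.of(COMPLETE);
    }

    /** Stand-in for an Activity Worker: "performs" the activity and reports its name as the result. */
    static String perform(String activity) {
        return activity;
    }

    /** Alternates decision and activity tasks until the Decider closes the execution. */
    static List<String> run() {
        List<String> history = new ArrayList<>();
        while (true) {
            for (String decision : decide(history)) {
                if (decision.equals(COMPLETE)) {
                    return history;
                }
                // The worker's result is merged into the history, then the Decider runs again.
                history.add(perform(decision.substring(SCHEDULE.length())));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the Decider keeps no state of its own: everything it needs is in the history it is handed, which is why any Decider Worker can pick up the next Decision Task.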

Workers run as Threads, and their basic outline is to poll their Task Lists, adhering to a long-polling protocol: Connections are kept open for up to 60 seconds or until a task arrives, and are recycled if they time out.
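The shape of such a polling loop can be sketched with a local `BlockingQueue` standing in for the remote Task List. This is a simplified model, not the SDK's polling code: `LongPollSketch`, `pollOnce` and `runWorker` are hypothetical names, and the loop is bounded by `maxPolls` only so that the example terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Toy long-polling worker loop over a local queue; not the AWS SDK. */
final class LongPollSketch {

    /** One long poll: waits up to timeoutMillis for a task, returns null if none arrived. */
    static String pollOnce(BlockingQueue<String> taskList, long timeoutMillis) {
        try {
            return taskList.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Polls up to maxPolls times and returns how many tasks were handled. */
    static int runWorker(BlockingQueue<String> taskList, long timeoutMillis,
                         int maxPolls, Consumer<String> handler) {
        int handled = 0;
        for (int i = 0; i < maxPolls && !Thread.currentThread().isInterrupted(); i++) {
            String task = pollOnce(taskList, timeoutMillis);
            if (task == null) {
                continue; // the poll timed out empty: simply open a new one
            }
            handler.accept(task);
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> taskList = new LinkedBlockingQueue<>();
        taskList.add("check plug for lamp-1");
        // 60_000 ms mirrors the 60-second window; a single poll keeps the demo short.
        runWorker(taskList, 60_000, 1, System.out::println);
    }
}
```

A real Worker would loop indefinitely on its own thread; an empty poll is not an error, just a cue to poll again.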

Workflow Types also keep other interesting properties besides the basic Description and Date of Registration, like their Status (Registered/Deprecated), the Task List they are bound to by default and, especially, Timeout values: one for the Completion of the whole Execution (“Execution Start to Close Timeout”) and a Default Timeout Value for an Individual Task Completion (“Task Start to Close Timeout”). They also carry a Child Policy, which sets how Child Workflow Instances are dealt with when the Parent Workflow Instance is Terminated.
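To make these properties concrete, here is a small data-model sketch of a Workflow Type inside a Domain. It is an illustration in plain Java, not the SDK's classes: `DomainRegistrySketch` and its members are hypothetical. The three Child Policy names follow my understanding of the values the SWF API uses, and the rule that a Deprecated type cannot start new executions is likewise my reading of the service's behaviour rather than something stated above.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

/** Toy model of Workflow Types registered in a Domain; not the AWS SDK. */
final class DomainRegistrySketch {
    enum Status { REGISTERED, DEPRECATED }

    // Assumed to mirror the SWF child policies: what happens to children of a terminated parent.
    enum ChildPolicy { TERMINATE, REQUEST_CANCEL, ABANDON }

    static final class WorkflowType {
        final String name;
        final String defaultTaskList;
        final Duration executionStartToClose; // limit for the whole execution
        final Duration taskStartToClose;      // default limit for an individual task
        final ChildPolicy childPolicy;
        Status status = Status.REGISTERED;

        WorkflowType(String name, String defaultTaskList, Duration executionStartToClose,
                     Duration taskStartToClose, ChildPolicy childPolicy) {
            this.name = name;
            this.defaultTaskList = defaultTaskList;
            this.executionStartToClose = executionStartToClose;
            this.taskStartToClose = taskStartToClose;
            this.childPolicy = childPolicy;
        }
    }

    private final Map<String, WorkflowType> workflowTypes = new HashMap<>();

    void register(WorkflowType type) {
        if (workflowTypes.putIfAbsent(type.name, type) != null) {
            throw new IllegalStateException("already registered: " + type.name);
        }
    }

    void deprecate(String name) {
        WorkflowType type = workflowTypes.get(name);
        if (type == null) {
            throw new IllegalArgumentException("unknown type: " + name);
        }
        type.status = Status.DEPRECATED;
    }

    /** In this sketch, only known, non-deprecated types may start new executions. */
    boolean canStart(String name) {
        WorkflowType type = workflowTypes.get(name);
        return type != null && type.status == Status.REGISTERED;
    }
}
```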

In a Domain, you also declare Activity Types, which are very similar in structure to Workflow Types, but with different kinds of timeouts, such as:

  • Task Scheduling to its Start
  • Task Scheduling to its Close
  • Task Start to Close Timeout
  • Task Heartbeat Timeout
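One way to see how the four timeouts relate is to compute, for a given moment, which of them (if any) an Activity Task has exceeded. The sketch below is a plain-Java illustration under my own assumptions, not the service's actual logic: `ActivityTimeoutSketch` and its members are hypothetical names, and the order in which the limits are checked is a choice made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Toy check of the four Activity Task timeouts; not the AWS SDK. */
final class ActivityTimeoutSketch {
    enum TimeoutType { SCHEDULE_TO_START, SCHEDULE_TO_CLOSE, START_TO_CLOSE, HEARTBEAT }

    static final class Limits {
        final Duration scheduleToStart, scheduleToClose, startToClose, heartbeat;

        Limits(Duration scheduleToStart, Duration scheduleToClose,
               Duration startToClose, Duration heartbeat) {
            this.scheduleToStart = scheduleToStart;
            this.scheduleToClose = scheduleToClose;
            this.startToClose = startToClose;
            this.heartbeat = heartbeat;
        }
    }

    /**
     * Returns the timeout that has been exceeded at 'now', if any.
     * 'started' is null while no worker has picked the task up;
     * 'lastHeartbeat' is null until the worker reports its first heartbeat.
     */
    static Optional<TimeoutType> expired(Limits limits, Instant scheduled, Instant started,
                                         Instant lastHeartbeat, Instant now) {
        if (exceeds(scheduled, now, limits.scheduleToClose)) {
            return Optional.of(TimeoutType.SCHEDULE_TO_CLOSE); // overall budget for the task
        }
        if (started == null) {
            if (exceeds(scheduled, now, limits.scheduleToStart)) {
                return Optional.of(TimeoutType.SCHEDULE_TO_START); // waited too long in the list
            }
            return Optional.empty();
        }
        if (exceeds(started, now, limits.startToClose)) {
            return Optional.of(TimeoutType.START_TO_CLOSE); // worker took too long
        }
        Instant lastSignOfLife = lastHeartbeat != null ? lastHeartbeat : started;
        if (exceeds(lastSignOfLife, now, limits.heartbeat)) {
            return Optional.of(TimeoutType.HEARTBEAT); // worker went silent
        }
        return Optional.empty();
    }

    private static boolean exceeds(Instant from, Instant now, Duration limit) {
        return Duration.between(from, now).compareTo(limit) > 0;
    }
}
```

For example, with a 30-second scheduling-to-start limit, a task that is still sitting in its Task List 31 seconds after being scheduled would be reported as `SCHEDULE_TO_START`.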

Ok, we’re done with most of the theory. Stay tuned for the next post “Sample Use Cases for SWF Usage”.



About the Author

Aldrin Leal, Cloud Architect and Partner at ingenieux. Aldrin Leal works as an Architect and QA Consultant, especially for Cloud and Big Data cases. Besides his years in the trenches on projects in the Telecom, Aerospace, Government and Mining Segments, he has a passion for meeting new paradigms and figuring out ways to bring them into new and existing endeavours.

Lamp intro by Ryan Terwedo, Founder of cloudRIA, cloud tech for Investment Firms. cloudRIA designs and builds cloud-based infrastructure for RIA and other Financial / Investment firms, specializing in integrating 3rd-party web-based, internal proprietary and legacy desktop solutions in the browser with access on Macs and PCs. He also claims to have amassed a large sticker and star collection in elementary school by excelling at all non-color-related activities.

