Exercises in Restful Integration and Continuous Delivery
<H1>Moved to a New Address</H1>
This blog moved to a new address.<br />
<br />
<a href="http://www.koivunalho.org/blogs/exercises-in-integration-and-delivery">Exercises in Restful Integration and Continuous Delivery</a><br />
<br />
Welcome!
<DIV>
<H1>IO::Iron Policies - No Typing Errors to Iron.io Services!</H1>
<P>Policies are a way to limit the names of message queues, code packages,
caches and items (item keys)
to a predefined group of possible strings. This reduces the chance
of typos and can enforce an enterprise naming policy. The policies are loaded from
a JSON file which is specified either when creating an
IO::Iron::Iron*::Client object, or
in the config file .iron.json (or equivalent).
<H2>Policies in Config file</H2>
<P>Add the item <I>policies</I> to the config file. The value of the item is the
file name of the policies file.
<P>Example config file:
<pre style="width:1000px;overflow:auto">
{
"project_id":"51bdf5fb2267d84ced002c99",
"token":"-Q9OEHZPhdZtd0KHBzzdUJIqV_E",
"host":"cache-aws-us-east-1.iron.io",
"policies":"iron_policies.json"
}
</pre>
<h3>Policies file specified when creating the client</h3>
<pre style="width:1000px;overflow:auto">
my $policies_filename = '/etc/ironmq/global_policies.json';
my $client = IO::Iron::IronCache::Client->new('policies' => $policies_filename);
</pre>
<h3>Examples of Policies File and Explanation of Configuration</h3>
<P>The default policies (shown here as a Perl data structure):
<pre style="width:1000px;overflow:auto">
(
'definition' => {
'character_group' => {
},
'no_limitation' => 1, # There is an unlimited number of alternatives.
},
'queue' => { 'name' => [ '[:alnum:]{1,}' ], },
'cache' => {
'name' => [ '[:alnum:]{1,}' ],
'item_key' => [ '[:alnum:]{1,}' ]
},
'worker' => { 'name' => [ '[:alnum:]{1,}' ], },
);
</pre>
<P>The above would set an open policy for IronMQ, IronCache and IronWorker alike.
The structure is divided into four parts: <I>definition</I> for meta options, and the
<I>queue</I>, <I>cache</I> and <I>worker</I> parts for defining the variable strings
(queue/cache/worker names and item keys). The character group <I>alnum</I> covers
all ASCII alphabetic characters (both lower and upper case) and the digits 0-9.
<P>N.B. The option <I>no_limitation</I> controls the open/closed policy.
If <I>no_limitation</I> is set (1=set), the policy control is
turned off.
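<P>For comparison, the same open policy written out as a JSON policies file would look roughly like this (a sketch; the file name is whatever the config or the client constructor points to):
<pre style="width:1000px;overflow:auto">
{
"definition":{
"character_group":{
},
"no_limitation":1
},
"queue":{ "name":[ "[:alnum:]{1,}" ] },
"cache":{ "name":[ "[:alnum:]{1,}" ], "item_key":[ "[:alnum:]{1,}" ] },
"worker":{ "name":[ "[:alnum:]{1,}" ] }
}
</pre>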
<P>Another example of a policies file, this time restricting the possible names:
<pre style="width:1000px;overflow:auto">
{
"__comment1":"Use normal regexp. [:digit:] = number:0-9, [:alpha:] = alphabetic character, [:alnum:] = character or number.",
"__comment2":"Do not use end/begin limitators '^' and '\$'. They are added automatically.",
"__comment3":"Note that character groups are closed inside '[::]', not '[[:]]' as normal POSIX groups.",
'definition' => {
'character_group' => {
"[:lim_uchar:]":"ABC",
"[:low_digit:]":"0123"
},
},
"cache":{
"name":[
"cache_01_main",
"cache_[:alpha:]{1}[:digit:]{2}"
],
"item_key":[
"item.01_[:digit:]{2}",
"item.02_[:lim_uchar:]{1,2}"
]
}
}
</pre>
<P>This policies file sets policies for cache names and item keys. Both have two
templates. The template "cache_01_main" contains no wildcards: a template list
can also consist only of predefined names or keys. Sometimes this is
exactly the desired behaviour, especially with regard to cache and
message queue names.
<P>Items whose names begin with '__' are considered comments. Comments cannot be
inserted into lists, such as <I>character_group</I>.
<P>The <I>definition</I> part contains the list <I>character_group</I> for user-defined
groups. The following groups are predefined:
<dl>
<dt>[:alpha:]</dt>
<dd>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</dd>
<dt>[:alnum:]</dt>
<dd>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789</dd>
<dt>[:digit:]</dt>
<dd>0123456789</dd>
<dt>[:lower:]</dt>
<dd>abcdefghijklmnopqrstuvwxyz</dd>
<dt>[:upper:]</dt>
<dd>ABCDEFGHIJKLMNOPQRSTUVWXYZ</dd>
<dt>[:word:]</dt>
<dd>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_</dd>
</dl>
<P>All lower ASCII (7-bit) characters are allowed in names and in character
groups, except for the reserved characters (RFC 3986):
<B><pre>!$&'()*+,;=:/?#[]@</pre></B>
<P>A character group definition is closed inside characters '[::]',
not '[[:]]' as normal POSIX groups. Only the equivalents of the POSIX groups
mentioned above can be used; e.g. POSIX group <B>[[:graph:]]</B> is not available.
<P>When using the character groups in a name or key, only two forms are allowed:
<B>[:group:]{n}</B> and <B>[:group:]{n,m}</B>, where 'n' and 'm' are integers.
This limitation (not being able to use arbitrary regular expressions) is due to the
double function of the policy: a) it acts as a filter when creating
and naming new message queues, code packages, caches and cache items; b) it
can be used to list all possible names, for example when querying for
cache items.
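<P>A minimal sketch of how the policies are meant to be used from code (method and parameter names follow the IO::Iron documentation, but treat this as an outline; the exact error raised for a rejected name depends on the IO::Iron version):
<pre style="width:1000px;overflow:auto">
use IO::Iron::IronCache::Client;
use IO::Iron::IronCache::Item;

# Load the policies when creating the client.
my $client = IO::Iron::IronCache::Client->new(
    'policies' => 'iron_policies.json',
);

# "cache_A01" matches the template "cache_[:alpha:]{1}[:digit:]{2}".
my $cache = $client->create_cache( 'name' => 'cache_A01' );

# "item.01_42" matches the template "item.01_[:digit:]{2}".
my $item = IO::Iron::IronCache::Item->new( 'value' => 10 );
$cache->put( 'key' => 'item.01_42', 'item' => $item );

# A name with a typo, e.g. "cahce_A01", would be rejected by the client
# before any request is sent to the Iron.io service.
</pre>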
</DIV>
<DIV>
<H1>IO::Iron::Applications - Command line tools for Iron.io services</H1>
<P><a href="https://metacpan.org/release/IO-Iron-Applications">IO::Iron::Applications</a> is an
auxiliary package for <a href="https://metacpan.org/release/IO-Iron">IO::Iron</a>. IO::Iron contains
the library for using
the <a href="http://www.iron.io/">Iron.io</a> cloud services in Perl programs. IO::Iron::Applications
contains command line programs to operate those services.
<P>IO::Iron::Applications is my addition to the IO::Iron interface library package which I wrote
earlier.
The Iron.io WWW interface <a href="https://hud.iron.io/dashboard">hud.iron.io Dashboard</a> is great
but a bit slow to use when you only need to
quickly change some values in IronCache, send a message to IronMQ, or erase or empty a cache for debugging
purposes or other similar activities. With these command line utilities the same functions can be
performed quickly from a normal shell, without using a web browser.
<H2>Policies</H2>
<P>The programs make use of the IO::Iron policies feature, so wildcard characters can be used in
cache names, item keys, etc.
<P>For example, if iron_cache_policies.json contains:
<pre style="width:1000px;overflow:auto">
{
"definition":{
"character_group":{
"[:lim_uchar:]":"ABC",
"[:low_digit:]":"01"
}
},
"cache":{
"name":[
"cache_[:lim_uchar:]{1}0[:digit:]{1}"
],
"item_key":[
"item.02_[:lim_uchar:]{1,2}[:low_digit:]{1}"
]
}
}
</pre>
<P>then
<pre style="width:1000px;overflow:auto">
ironcache list items .* --cache cache_A01 --policies iron_cache_policies.json
</pre>
<p>would print out:
<pre style="width:1000px;overflow:auto">
Cache Item expires
cache_A01 item.02_A0 Key not exists.
cache_A01 item.02_A1 Key not exists.
cache_A01 item.02_AA0 Key not exists.
cache_A01 item.02_AA1 Key not exists.
cache_A01 item.02_AB0 Key not exists.
cache_A01 item.02_AB1 Key not exists.
cache_A01 item.02_AC0 Key not exists.
cache_A01 item.02_AC1 Key not exists.
cache_A01 item.02_B0 Key not exists.
cache_A01 item.02_B1 Key not exists.
cache_A01 item.02_BA0 Key not exists.
cache_A01 item.02_BA1 Key not exists.
cache_A01 item.02_BB0 Key not exists.
cache_A01 item.02_BB1 Key not exists.
cache_A01 item.02_BC0 Key not exists.
cache_A01 item.02_BC1 Key not exists.
cache_A01 item.02_C0 Key not exists.
cache_A01 item.02_C1 Key not exists.
cache_A01 item.02_CA0 Key not exists.
cache_A01 item.02_CA1 Key not exists.
cache_A01 item.02_CB0 Key not exists.
cache_A01 item.02_CB1 Key not exists.
cache_A01 item.02_CC0 Key not exists.
cache_A01 item.02_CC1 Key not exists.
</pre>
<p class="body_text">On the command line, all normal regular expression are allowed. E.g.
<pre>
item.02_A.{1}0
</pre>
<p class="body_text">would return
<pre style="width:1000px;overflow:auto">
Cache Item expires
cache_A01 item.02_AA0 Key not exists.
cache_A01 item.02_AB0 Key not exists.
cache_A01 item.02_AC0 Key not exists.
</pre>
<h2>The following command line programs are available:</h2>
<h3>ironcache</h3>
<dl>
<dt>clear: Clear a cache.</dt>
<dd>E.g. <span class="brush:bash;">ironcache clear cache_main</span></dd>
<dt>delete: Delete a cache.</dt>
<dd>E.g. <span class="brush:bash;">ironcache delete cache_main</span></dd>
<dt>delete: Delete item from cache.</dt>
<dd>E.g. <span class="brush:bash;">ironcache delete item item.01_AB1</span></dd>
<dt>get: Get item/items from cache/caches.</dt>
<dd>E.g. <span class="brush:bash;">ironcache get item item.02_A.{2} --cache cache_A01 --config
iron_cache.json --policies iron_cache_policies_test_01.json --warn</span></dd>
<dt>increment: Increment an item/items in cache/caches.</dt>
<dd>E.g. <span class="brush:bash;">ironcache increment item item.02_AC1,item.02_BC1 --cache cache_A01
--value 225</span></dd>
<dt>list: List caches or items in a cache/caches.</dt>
<dd>E.g. <span class="brush:bash;">ironcache list items .* --cache cache_A01</span></dd>
<dd>E.g. <span class="brush:bash;">ironcache list caches</span></dd>
<dt>put: Put or replace item/items to a cache/caches.</dt>
<dd>E.g. <span class="brush:bash;">ironcache put item item.02_CC1,item.02_CC2 --cache cache_A01 --
value 123</span></dd>
<dt>show: Show the properties of a cache/caches.</dt>
<dd>E.g. <span class="brush:bash;">ironcache show cache cache_A01</span></dd>
</dl>
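<p>A quick debugging session could then look like this (hypothetical item key and value; the config and policies files are picked up as described above):
<pre style="width:1000px;overflow:auto">
ironcache put item item.02_A1 --cache cache_A01 --value 42
ironcache get item item.02_A1 --cache cache_A01
ironcache clear cache_A01
</pre>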
</DIV>
<DIV>
<H1>Revision Control and Project Culture</H1>
<P>Version control, or simply repository control, is one of the most important parts of a software project. After all, it is in many cases used daily. No wonder, then, that version control is not only part of the project structure, but also part of its culture.</P>
<P>This blog entry is partly based on a report,
<A HREF="http://www.koivunalho.org/mikko/published/Rapport-svngit.pdf">Jämförelse: Subversion och Git</A>,
written for Init Ab, a consulting company headquartered in Stockholm.</P>
<H1>Centralized and Distributed Version Control</H1>
<P>A <I>repository</I> is the place where the source code of a program is kept.
Access to a repository is managed with
<I><A HREF="http://en.wikipedia.org/wiki/Revision_control_software">revision control software</A></I>.
This software maintains a monopoly on read and write access to
the repository.</P>
<P>Two recently popular programs in this area are <A HREF="http://subversion.apache.org">Subversion</A>
and <A HREF="http://git-scm.com">Git</A>.
They represent very different views on version control.</P>
<P>Subversion is the leading program among centralized version control software.
A centrally controlled repository is the "classic" way to
arrange control over source code. In this system every user
first copies the needed parts of the software to his or her
local disk and, when done with making changes to it, commits
the changed files to the central repository. For every operation,
access to the repository is required.</P>
<P>In decentralized (i.e. distributed) revision control software
there is no absolute central repository. Instead, a new user copies
the whole repository from any other existing user. The history of changes
is copied together with the current code. Every user
maintains a complete copy of the repository and therefore there is
also no need for centralized backups. In practice, it is customary
for a project to keep a "dummy user" account which is used for release testing,
<I>nightly builds</I> or linked to a <I><A HREF="http://en.wikipedia.org/wiki/Continuous_integration">continuous integration</A></I> system,
for example <A HREF="http://en.wikipedia.org/wiki/Hudson_(software)">Hudson</A>.</P>
<H1>Growing Popularity</H1>
<P>According to recent studies by the Eclipse Community Survey<SUP>1</SUP> and ITJobsWatch<SUP>2</SUP>, in the last few years Git has become as popular as Subversion also in the business world. Among Open Source hobbyist developers Git has already been popular for some time. However, as the statistics show us, Subversion hasn't actually been losing ground to Git. Subversion is the direct descendant of the once hugely popular <A HREF="http://en.wikipedia.org/wiki/Concurrent_Versions_System">CVS</A>, the Concurrent Versions System, and there is still a great number of enterprises who are running CVS and will only consider changing to Subversion.</P>
<TABLE><TR><TD>
<TABLE>
<TR><TD>Year</TD><TD>Git</TD><TD>Subversion</TD></TR>
<TR><TD>2009</TD><TD>2.4%</TD><TD>57.5%</TD></TR>
<TR><TD>2010</TD><TD>6.8%</TD><TD>58.3%</TD></TR>
<TR><TD>2011</TD><TD>12.8%</TD><TD>51.3%</TD></TR>
<TR><TD>2012</TD><TD>27.6%</TD><TD>46.0%</TD></TR>
<TR><TD>2013</TD><TD>36.3%</TD><TD>37.8%</TD></TR>
<CAPTION>Results of the Eclipse Community Survey
regarding SVN and Git usage.</CAPTION>
</TABLE>
</TD>
<TD WIDTH="50"></TD><!-- between -->
<TD>
<TABLE>
<TR><TD></TD><TD COLSPAN="2">Permanent positions</TD><TD COLSPAN="2">Rank</TD></TR>
<TR><TD>Year</TD><TD>Git</TD><TD>Subversion</TD><TD>Git</TD><TD>Subversion</TD></TR>
<TR><TD>2012</TD><TD>1167</TD><TD>3354</TD><TD>263</TD><TD>91</TD></TR>
<TR><TD>2013</TD><TD>2049</TD><TD>2836</TD><TD>157</TD><TD>107</TD></TR>
<TR><TD>2014</TD><TD>3605</TD><TD>3265</TD><TD>90</TD><TD>99</TD></TR>
<caption>ITJobsWatch: Git & Subversion.</caption>
</TABLE>
</TD></TR></TABLE>
<H1>(De)centralized Culture</H1>
<P>I will not concentrate on the technical side of revision control but rather on the cultural aspects that these two very different solutions foster.</P>
<P>Version control, or simply <I>repository control</I>, is one of the most important parts of handling a project or participating in one. After all, we use it daily. The program which we use to access the repository is one of our most often used tools. Therefore, when it feels like it refuses to co-operate with us, it immediately becomes a major irritation. So it must be simple, reliable and fast.</P>
<P>But more than a tool for programmers, version control is also a link between project leadership (maybe even middle-level management, depending on company structure) and developers and architects. It provides us with (inflexible?) boundaries to how we shape our work.</P>
<P>Ben Collins-Sussman, one of Subversion's designers, claims that decentralized version control works badly for teams which don't consist of equally competent people. He quotes some requests<SUP>3</SUP> he got when developing Subversion:
<BLOCKQUOTE><I>Can you guys please give Subversion on Google Code the ability to hide specific branches?</I><BR>
<I>Can you guys make it possible to create open source projects
that start out hidden to the world, then get revealed when they're ready?</I><BR>
<I>Hi, I want to rewrite all my code from scratch, can you please wipe all the history?</I>
</BLOCKQUOTE>
Developers are humans and they have a tendency to <I>want to work privately, in a cave, then
spring "perfect" code on their community, as if no mistakes had ever been made</I>. In a decentralized version control environment it can be too easy to "slip" into isolation, thinking that committing into your own repository has the same purpose as committing to the central repository. But this is not the case. The local copy of the repository is for the developer's hourly or daily use and for local backups; the central repository is "public", so the project manager and others can see where the developer is going. The project policy could be to commit every day before finishing work, and if the central repository is connected to a continuous integration system with unit tests, errors and bad solutions will be discovered earlier. Collins-Sussman quotes Google's culture and mantra: <I>don't run from failure - fail often, fail quickly, and learn</I>.</P>
<P>On the other hand, if the team is small and every developer is at about the same level, decentralized version control can foster <A HREF="http://en.wikipedia.org/wiki/Meritocracy">meritocracy</A> and a friendly competitive spirit. In a true decentralized version control environment (without a "centralized dummy user") changes are copied directly from one user to another, so trusting the other's code becomes a necessity.</P>
<P>A decentralized environment is not the only way to foster meritocracy, however. The <A HREF="http://www.apache.org">Apache Software Foundation</A> is also known for its meritocratic structure in open source projects. They use Subversion exclusively. Project participants are divided into three groups: users, who can make suggestions and bug reports; developers, who submit their code but cannot commit; and committers, who have write access to the repository. Anyone can become a user, and becoming a developer only requires checking out the freely available source code from the Subversion repository. The committers' group replenishes itself from the developers' group by common decision, selecting those whose submitted source code has the best quality. <A HREF="http://www.gnome.org/foundation/">The GNOME Foundation</A>, the Apache Software Foundation, the <A HREF="https://www.mozilla.org/">Mozilla Foundation</A>, and <A HREF="http://www.documentfoundation.org/">The Document Foundation</A> officially claim to be meritocracies.</P>
<P>Centralized version control favours a more structured organization, whereas decentralized version control can suit a self-forming or self-governing team, or a hobbyist group. On the other hand, the technical know-how required is somewhat higher, especially when using Git. Git is powerful but somewhat complicated to use, more error-prone (or gives that appearance) in daily usage than its main decentralized competitors <A HREF="http://bazaar.canonical.com">Bazaar</A> or <A HREF="http://mercurial.selenic.com">Mercurial</A>, not to mention centralized Subversion.</P>
<P>Naturally decentralized version control can suit a well structured organization or a company, as well, but it requires stricter guidelines and processes to guide its usage which in part may nullify its benefits.</P>
<H1>Conclusion</H1>
<P>The question of the team's and organization's culture is the most important one. As mentioned above, version control is a daily tool, and its users' culture will influence the way it is used; but also the opposite: the version control tool will influence its users by favouring certain workflows and usage patterns over others.</P>
</DIV>
<DIV>
<H1>References</H1>
1. <A HREF="http://www.slideshare.net/IanSkerrett/eclipse-survey-2013-report-final/14">Eclipse Community Survey Report 2013</A>, Retrieved 2014-06-13.<BR>
2. <A HREF="http://www.itjobswatch.co.uk">ItJobsWatch</A>, Retrieved 2014-06-13.<BR>
3. Brian W. Fitzpatrick and Ben Collins-Sussman, <I>Team Geek, A Software Developer's Guide to Working Well with Others</I>, 2012, First Edition, O'Reilly Media.<BR>
</DIV>
<h1>HtmlUnit - For Integration Testing and Webcrawling</h1>
To put it in just a few words: <i>HtmlUnit is a web browser without a window</i>.<br />
<br />
Intended for integration testing, <a href="http://htmlunit.sourceforge.net/">HtmlUnit</a> allows the user to programmatically manipulate a webpage at a high level, i.e. as if doing it with a normal web browser. The calling program can fill and submit forms, click on buttons, imagemaps and hyperlinks, or activate JavaScript-created objects. JavaScript, cookies and AJAX are supported. So are proxies and immediate redirection.<br />
<br />
<h2>GUI integration testing</h2><br />
This kind of testing is about as close to human testing as we can get with automated testing. Testing static webpages is always easy because the content only gets loaded once from the remote server, but nowadays webpages more often than not have <a href="http://en.wikipedia.org/wiki/DHTML">dynamic content</a>. Once the page is loaded, not only the outward appearance but also the content itself is changed with the help of JavaScript, CSS (Cascading Style Sheets), AJAX and Adobe Flash (although Flash - being a self-contained "applet" or video player - is outside the scope of HtmlUnit).<br />
<br />
With HtmlUnit the test program can "crawl" through the HTML code section by section confirming that content is correct. Or it can jump straight to a certain part identified by id or name tag. It can "hover" the mouse pointer (emulated, of course) over parts of text or a button on a form, or e.g. select an item from a select (list) button which is wired with JavaScript, and then confirm that the page or form content changes as planned.<br />
<br />
HtmlUnit does UI testing for webpages, or more precisely, integration testing of HTML elements and JavaScript.<br />
<br />
<h2>Webcrawling</h2><br />
Because HtmlUnit is a headless (i.e. windowless) web browser, it can also be used to programmatically browse websites and extract information. On many webpages JavaScript is intimately linked to the processing of forms, so that a form cannot be submitted properly without JavaScript's help. These kinds of pages are of course examples of poor webform design (separation of concerns is not complete; business logic is mixed with the program flow) - but ours being an imperfect world, even they must be accepted. And that's where HtmlUnit shows what it's made of.<br />
<br />
There are plenty of pages where the user only needs to log in through the front page, and the sought-after information is immediately available, or available via a simple form - like logging in to your telephone company's website just to see how much credit or network quota you still have left for the current month. Many simple hardware devices, such as home routers, only provide a Web interface, no SOAP or REST API. HtmlUnit to the rescue! Earlier it was impossible, or close to it, to get at this content programmatically.<br />
<div><br />
</div>Let's see an example in Java:<br />
<pre class="prettyprint" style="background-color: #eeeeee; font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace; font-size: 12px; max-width: 70em; overflow: auto; padding: 0.5em;"><span class="tag" style="color: #000088;"><pre span=""></pre>import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
final WebClient webClient = new WebClient(BrowserVersion.CHROME, proxyIP, proxyPort);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
</span></pre></div><div><br />
</div>We have imported some HtmlUnit element classes. We create a new WebClient instance by telling it which browser it should spoof and which server to use as a proxy. Both of these are optional. Sometimes an HTTP server or the client-side JavaScript changes the layout of the page depending on the requesting browser. We also enable redirection, JavaScript support and cookie support. Another way:<br />
<br />
<pre class="prettyprint" style="background-color: #eeeeee; font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace; font-size: 12px; max-width: 70em; overflow: auto; padding: 0.5em;"><span class="tag" style="color: #000088;"><pre span=""><span style="font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;"> final WebClient webClient = new WebClient();</span></pre></span></pre><br />
Let's continue. We want to find the submit button and input fields for userid and password. Once we get them, we can finish logging in by clicking the submit button and loading a new page in the bargain.<br />
<br />
<pre class="prettyprint" style="background-color: #eeeeee; font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace; font-size: 12px; max-width: 70em; overflow: auto; padding: 0.5em;"><span class="tag" style="color: #000088;">HtmlInput submitButton = null;
HtmlPage titlePage = null;
try {
titlePage = webClient.getPage(hostname);
} catch (IOException e) {
e.printStackTrace();
}
final List<htmlform> forms = titlePage.getForms();</htmlform>
// iterate through the list to find what we need.
<span class="Apple-tab-span" style="white-space: pre;"> </span>[...]
submitButton = loginForm.getInputByName("login");
final HtmlTextInput usernameTextField = loginForm.getInputByName("login_id");
final HtmlPasswordInput passwordTextField = loginForm.getInputByName("login_password");
usernameTextField.setValueAttribute(userId);
passwordTextField.setValueAttribute(password);
try {
entryPage = submitButton.click();
} catch (IOException e) {
e.printStackTrace();
}
List<htmlanchor> links = entryPage.getAnchors();</htmlanchor>
for (HtmlAnchor link : links) {
logger.debug("Entry Page link: " + link.asXml());
if (link.asXml().contains("create_new_entry.new")) {
linkToJobAdPage = link;
}
}</span></pre><br />
<div><br />
</div><h2>HtmlUnit for Perl</h2><br />
HtmlUnit is not a Java monopoly just because it was developed in Java. It's also available for other programming languages.<br />
<br />
<a href="http://celerity.rubyforge.org/">Celerity</a> is a JRuby wrapper around HtmlUnit – a headless Java browser with JavaScript support.<br />
<br />
<a href="https://metacpan.org/release/WWW-HtmlUnit">WWW::HtmlUnit</a> is the Perl equivalent, an Inline::Java based wrapper of the HtmlUnit v2.14 libraryAnonymoushttp://www.blogger.com/profile/00999479804876258493noreply@blogger.com0tag:blogger.com,1999:blog-4746082695756018742.post-18396628277484421382014-04-30T01:52:00.000+02:002014-05-31T23:04:55.005+02:00IO::Iron gets command line toolsNow that all the functions of <a href="http://www.iron.io/">Iron.io</a>'s IronMQ, IronCache and IronWorker services are turned into Perl client libraries, it is time to think about not only the programmer but also the tester and application supporter's needs. They require an easy and quick access to the services: command line tools.<br />
<br />
Perl has several possible frameworks for creating command line utilities: e.g. <a href="https://metacpan.org/release/CLI-Framework">CLI::Framework</a>, <a href="https://metacpan.org/release/App-Cmd">App::Cmd</a>, <a href="http://badgerpower.com/">Badger</a>, <a href="https://metacpan.org/release/CLI-Application">CLI::Application</a> and <a href="https://metacpan.org/release/CLI-Dispatch">CLI::Dispatch</a>. From these I picked App::Cmd mostly because of its decentralized nature.<br />
<br />
<h3>Command Line Tool Design for Continuous Integration</h3><br />
One of the principles of <a href="http://en.wikipedia.org/wiki/Continuous_Integration">Continuous Integration</a> states that "Everyone commits to the baseline every day". To make the programmers' load lighter, every change should be made in as much isolation as possible. Centrally located code which refers to individual parts of the system should always be generated automatically, so the programmer does not need to remember and bother to keep up to date any kind of central index / reference table / central documentation / user reference or any other kind of central keeping place for things. Automatic code generation not only ensures that the "keeping place" is always up to date but also avoids typing errors. In a Continuous Integration repository it also prevents, or at least limits, the possibility of merge conflicts in the same files.<br />
<br />
Equally important is to follow the practice of "Generate User Documentation from Program Code". App::Cmd is a good example of this. Actually, App::Cmd uses <a href="https://metacpan.org/release/Getopt-Long-Descriptive">Getopt::Long::Descriptive</a>, which is its own small system (or framework) for processing command line options and parameters. The options and parameters are defined in a meta language (a Perl hash data structure) and the Getopt::Long::Descriptive package uses this data structure to present the same options to the user when needed, e.g. when the user mistypes a parameter name.<br />
<br />
<blockquote class="tr_bq">my ($opt, $usage) = describe_options(<br />
'my-program %o <some-arg>',<br />
[ 'server|s=s', "the server to connect to", { required => 1 } ],<br />
[ 'port|p=i', "the port to connect to", { default => 79 } ],<br />
[],<br />
[ 'verbose|v', "print extra stuff" ],<br />
[ 'help', "print usage message and exit" ],<br />
);</blockquote>becomes on the text terminal:<br />
<br />
<blockquote class="tr_bq"> my-program [-psv] [long options...] <some-arg><br />
-s --server the server to connect to<br />
-p --port the port to connect to<br />
-v --verbose print extra stuff<br />
--help print usage message and exit</blockquote><br />
<h3>Loose Coupling at Runtime on Application Level</h3><br />
However, App::Cmd goes even further. When several distinctly different command functions are combined into one application, they are completely separated into individual files. This is a fine example of loose coupling inside one application. The commands do not know of each other's existence and neither do the programmers need to know of it. In IronMQ's case, the executable ironmq contains individual commands like 'add', 'delete', 'show' and 'list', but their existence is not documented permanently (statically) anywhere. One programmer works on one command and completes his or her work regardless of whether the other programmers have finished with the other commands. When the user executes ironmq, the App::Cmd framework discovers at runtime which commands and parameters are available.<br />
<br />
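A minimal sketch of what one such self-contained command file could look like (the package names here are hypothetical; the application class itself is a thin App::Cmd::Setup wrapper, noted in the comments):<br />
<pre style="width:1000px;overflow:auto">
# In lib/IronMQ/CLI.pm (the application class, one line of real code):
#     package IronMQ::CLI;
#     use App::Cmd::Setup -app;

# In lib/IronMQ/CLI/Command/list.pm - one file per command:
package IronMQ::CLI::Command::list;
use strict;
use warnings;
use IronMQ::CLI -command;      # registers this package as the 'list' command

sub abstract { 'list the message queues of a project' }

sub opt_spec {
    return (
        [ 'config=s',   'config file, e.g. iron.json' ],
        [ 'policies=s', 'policies file' ],
    );
}

sub execute {
    my ($self, $opt, $args) = @_;
    # Create an IO::Iron::IronMQ::Client here and print the queue names.
    return;
}

1;
</pre>
<br />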
This speeds up application development by making interconnected parts independent of each other. It speeds up Continuous Integration and shortens the time to deployment.
<h1>Dist::Zilla as a Continuous Delivery Tool</h1>
I just recently converted my <a href="https://metacpan.org/release/IO-Iron">IO::Iron</a> distribution to using <a href="http://dzil.org/index.html">Dist::Zilla</a> as a <a href="http://en.wikipedia.org/wiki/Release_engineering">release</a> and <a href="http://en.wikipedia.org/wiki/Build_automation">build automation</a> tool. Dist::Zilla is mainly targeted at people writing free software Perl packages for releasing into <a href="http://www.cpan.org/">CPAN</a> (the Perl free software archive) but, used properly, it can make releasing any software easier.<br />
<br />
<h3>
Before</h3>
<br />
When I started to build the IO::Iron distribution, I already knew of Dist::Zilla, but two things kept me from adopting it. Firstly, I considered it too difficult to learn for such a small project (which later grew), and secondly, I considered it bloated and suffering from featuritis. Instead, I went with the classic solution of using Module::Starter to begin, and continued with manually editing the Makefile.PL and every other file, including MANIFEST, README and Changes. I used my private Subversion repository. I uploaded to CPAN via the <a href="http://pause.perl.org/">CPAN Author page</a>.<br />
<br />
After I had forgotten to update the Changes file a few times, I started to reconsider Dist::Zilla. The more I read about it, e.g. Dave Rolsky's excellent blog entry <a href="http://blog.urth.org/2010/06/02/walking-through-a-real-distini/">Walking Through a Real dist.ini</a>, the more it seemed to make sense. About two weeks ago I decided to take the time required, a day or two, and go through the setting up of Dist::Zilla and converting IO::Iron.<br />
<br />
<h3>
After</h3>
<br />
It was worth the effort. Dist::Zilla does not replace the Makefile.PL, which is used when a user installs the distribution. Makefile.PL builds, tests and installs at the user's end. But Dist::Zilla prepares the distribution for uploading. It automates almost all the repeating steps involved in releasing: it determines prerequisites, manages version numbers and the Changes file, checks that the changes have been committed, and - above all - builds the Makefile.PL.<br />
<br />
Dist::Zilla streamlines the code-test-commit-release cycle and defines a workflow, thus raising release quality.<br />
<br />
<h3>
Inner and Outer Workings</h3>
<br />
Using Dist::Zilla is done with the command line tool <i>dzil</i>. It is very similar to <a href="http://en.wikipedia.org/wiki/Make_(software)">Make</a> in outward appearance. Dist::Zilla itself is actually a frame for defining workflow stages. All functionality is executed by plugins. Building a release is divided into stages or roles similar to what Makefile.PL uses: build, test, install, release, etc. The plugins are attached to separate stages. For example, gathering the distribution files and reading them into memory (from which they will later be written into a new build directory) is a stage, and the equivalent roles are FileGatherer and FileInjector. All required plugins which fill these roles will be executed at this stage, and a plugin can read an existing file from disk or create a file dynamically.<br />
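As a sketch of how small such a plugin can be (the package name and file content here are hypothetical; real plugins such as Dist::Zilla::Plugin::GenerateFile follow the same pattern):<br />
<pre style="width:1000px;overflow:auto">
package Dist::Zilla::Plugin::HelloFile;   # hypothetical example plugin

use Moose;
use Dist::Zilla::File::InMemory;
with 'Dist::Zilla::Role::FileGatherer';   # attach the plugin to the file gathering stage

# Called when Dist::Zilla gathers the files of the distribution.
sub gather_files {
    my ($self) = @_;
    $self->add_file(
        Dist::Zilla::File::InMemory->new(
            name    => 'hello.txt',
            content => "This file was generated at build time.\n",
        )
    );
    return;
}

__PACKAGE__->meta->make_immutable;
1;
</pre>
<br />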
<br />
<h3>
Creating Distributions</h3>
<br />
When creating a CPAN distribution, such as IO::Iron, whose <a href="https://github.com/mikkoi/io-iron">"distribution source code"</a> is now located publicly on GitHub, the last action (i.e. plugin) when executing <i>dzil release</i> is normally <a href="https://metacpan.org/pod/Dist::Zilla::Plugin::UploadToCPAN">"UploadToCPAN"</a>, but this can be changed by editing the <i>dist.ini</i> file. The CPAN distribution format is also convenient for code releases other than CPAN packages. Instead of uploading, the last action in the chain could be committing code to the repository, or making a direct installation.<br />
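The dist.ini itself can stay quite short. A minimal sketch (the plugin selection and the name/author values here are illustrative, not the actual IO::Iron configuration; Git::Check requires the Dist::Zilla::Plugin::Git distribution):<br />
<pre style="width:1000px;overflow:auto">
name             = My-Distribution
author           = Firstname Lastname &lt;author@example.com&gt;
license          = Perl_5
copyright_holder = Firstname Lastname

[@Basic]        ; GatherDir, MakeMaker, UploadToCPAN and other essentials
[AutoPrereqs]   ; determine prerequisites from the code
[NextRelease]   ; keep the Changes file up to date
[Git::Check]    ; refuse to release if there are uncommitted changes
</pre>
<br />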
<br />
<h3>
Continuous Delivery</h3>
<br />
Dist::Zilla's modular structure makes it adaptable to new situations, even to different programming languages. It is not limited to Perl, not even to programming. With a different set of plugins it could just as well serve as a blog authoring (automatic spell checking, abbreviation expanding, date/version managing) and uploading tool. It becomes a competitor to e.g. <a href="http://maven.apache.org">Maven</a>, which is best known in conjunction with Java (although it is more of a project management tool than a software authoring tool).<br />
<br />
In the field of continuous delivery and continuous integration, Dist::Zilla contributes by lightening programmers' burden of remembering often repeated actions, codifying the workflow and raising the quality of releases, especially when releasing often. The plugins are reusable pieces of code easily shared among developers. This in turn reduces the time and effort of rebuilding the build system when updating old projects or creating new ones.
<h1>Cloud Architectural Challenges</h1>
<h2>
Building applications for the cloud.</h2>
<br />
How is creating software for the cloud different from previous architectures? Not that much, really. There is a great deal of hype which makes it seem that cloud software architecture is a huge improvement compared to "traditional" software architecture - whatever that is: mainframe architecture, thin/fat-client architecture, server farm architecture, any other architecture...<br />
<br />
The differences between other architectural styles and cloud architecture arise not from the offerings of the cloud but rather from the unique challenges posed by operating in the cloud. There are two challenges, and they are mainly a question of uncertainty...<br />
<br />
<h3>
Challenge: Out of Sync</h3>
<br />
The cloud is - by its very nature - a distributed environment. A distributed application consists of multiple parts which communicate with each other but don't necessarily run in the same system. When an application has functionality which is used at different times or with different frequencies, or which can be run in parallel with other parts, it often makes sense to move it outside the normal execution process and perhaps even place it in a different system to be connected only when required.<br />
<br />
This kind of functionality might, for example, be the billing or archiving functions of a website. After making an order in a web-shop, the website will not wait until the customer's credit card is actually billed; the web-shop returns control to the user immediately and a different subsystem of the application handles the credit card billing. When it is finished, the user will get a confirmation email and the user's profile in the web-shop will be updated.<br />
<h4>
<br /></h4>
<h4>
Workers</h4>
<br />
The example above is very trivial. The subsystem handling the billing is a "worker", a unit which is activated only when the application requires its service. The unit might exist outside the application's system, maybe even on a different server. The important thing is that the application does not wait for it to return anything. It runs alone according to the parameters the application provides to it, and after running it closes itself down automatically without need for any interaction. Therefore, it runs out of sync with the main application.<br />
<h4>
<br /></h4>
<h4>
Being Out of Sync</h4>
<br />
When programs run inside one server and one operating system, the communication is instantaneous or - in practice - real time. But communication in the cloud is not in real time, sometimes not even stable. The connection might take a long time to establish, or it might break, or simply be slow. The vendor might have an unscheduled maintenance break or the service might have been relocated physically to a different server. In general, subsystems often use IP addressing to connect to each other. The two main architectural choices are RPC (remote procedure call) interface or REST (representational state transfer) interface. The main difference between these is less in the implementation and more in their philosophy.<br />
<br />
RPC is an interface for a tightly coupled application where the subsystems are in fact subroutines and the main program waits for the completion of the subroutine before continuing. REST, on the other hand, is an API which can function both synchronously and asynchronously. REST is best used when subsystems of the application are autonomous services which can - in principle - be offered to any application. REST encourages the design of the API into the form of a (public) service. REST is a stateless API, so no client context is stored on the server between requests.<br />
<br />
Because of the possible or (pessimistically) likely problems in connection between the application and its subsystems in the cloud, it is generally better to go for REST style architectural design. Properly implemented it provides a robust loosely coupled system fit for cloud.<br />
<h4>
<br /></h4>
<h4>
Messaging</h4>
<br />
Of course, the distributed parts (workers and others) need to communicate with each other. There are many ways to do it but generally the best is an out-of-sync way: a message queue. A message queue is an external application, "messaging middleware", to whose care the application gives a message and then "forgets" it. Another part of the application polls the message queue at preset intervals and reads the message when it is available. The message queue guarantees that a message will never get lost, but it doesn't know how quickly the other subsystem will read it or act upon it. It does not wait for a return message. It will wait, however, for a receipt from the reader so it knows the message was handled. When using a message queue to link subsystems together, an API (REST or other) is not necessary.<br />
<h3>
<br /></h3>
<h3>
Challenge: Unreliability</h3>
<br />
As mentioned above, the cloud is a volatile environment, also in the sense that vendor companies may come and go, new services are promoted and old ones canceled. Cloud architecture is also about preparing for the eventuality of migration to new services or platforms. It is the natural additional price to pay when seeking "affordable" cloud services, as most companies always do.<br />
<h4>
<br /></h4>
<h4>
Design and prepare for eventual platform or vendor change.</h4>
<br />
The platforms that cloud vendors and service providers offer include not only real or virtual servers but also "platforms" that are more like services, such as databases, messaging middle-ware, worker platforms, and of course varied special services like log collectors (IT operation oriented) or daily currency rate providers (business oriented). All purchased services, not to mention free services, have a tendency to change APIs or even disappear as time goes by.<br />
<br />
Cloud architecture has ways to prepare for this eventuality. Most of these are coding practices that can be enforced, for instance, if the implementation is done using a framework. Connections to external APIs can be isolated, the database connection abstracted into an ORM (object-relational mapper), and the message queue connection as well. Unfortunately this also means that the risk of complicating the implementation rises.<br />
<br />
Another isolation layer could be a proxy server for REST or RPC calls. A proxy server can provide additional security as it could also keep the remote services' passwords and other connection details hidden from the service users.<br />
<h3>
<br /></h3>
<h3>
Change is always pending</h3>
<br />
With cloud architecture, preparing for trouble and change is always paramount because in the cloud an application can have very little control over its environment. The cloud creates a new kind of approach to dealing with vendors: pay-as-you-go. If the costs of a vendor service are billed accurately according to the actual usage of resources (memory, CPU cycles, bandwidth, tech support requests, ...), this will prompt the design of applications better optimized for the cloud environment.
<h1>IronMQ - Message Queue in the cloud</h1>
Why didn't anybody think of it before!<br />
<br />
The advent of cloud services is breaking apart server-centered thinking: with the cloud - or in the cloud - all Internet services are close to each other. The trunk line connections even between separate clouds provide fast enough access speeds to actually start "picking" the services. PaaS (Platform as a Service) will give way to SaaS (Software as a Service), or maybe even "Service as a service". Selecting any Internet service will be possible if services are compatible enough and "close" enough.<br />
<br />
The cloud makes the services close enough, but that is not enough by itself. Distributed computing and application integration require a reliable way for the applications to talk to each other, preferably without synchronization, because in the real world (cloud/Internet) services and applications don't necessarily go at the same speed. One of the best ways to balance the sending and receiving is to use a message queue.<br />
<br />
Until now message queues have been limited to the inside of one server, with a few exceptions, such as IBM's WebSphere MQ. And even then, the interface to the message queues has been via linkable system libraries, which binds them to platforms or even specific programming languages. And of course the message queue must have an available node, port or other connection point accessible from within the server.<br />
<br />
<a href="http://iron.io/">Iron.io</a> has changed that! If cloud makes services available to all applications, then there should be a message queue inside cloud - but outside servers. IronMQ is that messages queue; and its API is in line with most cloud services because it is a REST compatible API.<br />
<br />
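A minimal sketch of talking to IronMQ from Perl with IO::Iron (method names follow the IO::Iron documentation and may differ between versions, so treat this as an outline rather than the definitive calling sequence):<br />
<pre style="width:1000px;overflow:auto">
use IO::Iron::IronMQ::Client;
use IO::Iron::IronMQ::Message;

# Credentials and project id are read from the config file (.iron.json or equivalent).
my $client = IO::Iron::IronMQ::Client->new();
my $queue  = $client->create_and_get_queue( 'name' => 'orders' );

# The web-shop front end just posts the order and returns control to the user.
$queue->post_messages( IO::Iron::IronMQ::Message->new( 'body' => '{"order_id":12345}' ) );

# A billing worker, possibly on another server, picks the order up later.
my ($message) = $queue->reserve_messages( 'n' => 1 );
# ... process the order, then delete/acknowledge the message via the queue object
#     so IronMQ knows it was handled ...
</pre>
<br />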
IronMQ is "Message queue as a service", the first of its kind. Customer may pay on a per-message basis which goes perfectly with the idea of Saas. For a hobbyist it's a heaven since the payments only start running after the first 10 million requests (REST calls).<br />
<br />
Iron.io uses OAuth for user authentication, and the access protocol is of course HTTPS. A REST interface for a message queue is not the big innovation here; there are other message queues which also provide a REST interface to supplement their normal socket interface or linkable library interface. What is an innovation is how well IronMQ interacts in the cloud/Internet environment: from a passive party (which a message queue by nature is) it turns into an active party via its "push queues". A push queue is a queue which "knows" who is going to read the messages. It simply means that the message is relayed to another HTTP (or HTTPS) endpoint. The subscriber does not need to keep polling the queue for new messages; it simply sets up an HTTP(S) server/reader and waits for the messages. Besides remote HTTP endpoints, messages can also be pushed to other queues or to IronWorker, Iron.io's worker system.<br />
<br />
IronMQ pushes the concept of push queues even further: it accepts messages pushed to it by the REST compatible method of Webhooks, user-defined HTTP callbacks. They are usually triggered by some event, such as pushing code to a repository or a comment being posted to a blog. When that event occurs the source site makes an HTTP request to the URI configured for the webhook.<br />
<br />
Iron.io has three cloud services: IronMQ, IronWorker and IronCache (a key-value store). All of them have a REST interface and excellent cloud usability. Cloud applications are often parts of integration systems. But the integration itself has been difficult because most integration tools, such as message queues, run "inside" servers and are good at providing "internal" services. Iron.io's services are "between" servers and they are accessible via the most widely used protocol, HTTP.<br />
<br />
<h1>Exercises in Restful Integration and Continuous Delivery</h1>
Having participated in several application integration projects, and seen both<br />
great success and horrible blunders, this blog is a web diary and collection<br />
of notes on things that I've seen work or fail; either tested by myself or simply witnessed working or failing.<br />
<br />
RestChess is an attempt to integrate cloud/Internet services, some of which are of old technology and some very new. My modest attempt is to "do things right", both in actual application integration and in delivery. <a href="http://en.wikipedia.org/wiki/Continuous_delivery">Continuous Delivery</a> may be a buzzword, but it is also a good goal.