Expression Pattern Language (eXPL)

Data Analysis

Data analysis is the process of taking raw data and working with it to produce useful information in a meaningful format. It may involve operations such as transformation, filtering, sorting and grouping. To be successful as a data-centric language, eXPL must provide the tools to perform data analysis. This section provides three examples of eXPL analysing data, the first two simple and the last more complex. The aim is to give some assurance that eXPL is a practical language when it comes to working with data. For instance, the data for the last example was taken from a spreadsheet in which empty cells indicate "no data available", so it shows how eXPL handles incomplete data sets.

Numbering, Filtering and Formatting

The AsiaTopTen application of tutorial5 selects the first 10 Asian mega cities from a list of 35 mega cities, numbers them from 1 to 10 and formats the population numbers according to the locale. This is what one record of the input data looks like:

7,"Beijing","China","Asia",21650000

The first value is a rank out of 35, followed by city, country, continent and population. This is the program:

axiom mega_city (Rank,Megacity,Country,Continent,Population)...;
integer count = 0;
template asia_top_ten (
rank = count ? Continent == "Asia" && count++ < 10,
city = Megacity,
country = Country,
population = Population.format);
query<axiom> asia_top_ten (mega_city : asia_top_ten);

These are the first two cities of the solution:

asia_top_ten(rank=1, city=Tokyo, country=Japan, population=37,900,000)
asia_top_ten(rank=2, city=Delhi, country=India, population=26,580,000)...

Here the variable "count" is declared outside the asia_top_ten template, so it is unaffected by backtracking. The count variable controls which Asian cities are selected and doubles as the source of the rank value. The first rank is 1 rather than 0, which shows that the selection condition on the right-hand side of the ? operator, including the post-increment (count++), is evaluated before the expression on the left-hand side.
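The select-then-number behaviour described above can be sketched in Python. This is only an illustration of the logic, not how eXPL evaluates the template. The three Asian rows reuse values quoted in this section, the source rank column is omitted, the non-Asian row is a hypothetical placeholder, and the fixed comma grouping is an assumption standing in for the locale-aware format function.

```python
# Sample rows in the order (megacity, country, continent, population).
# The three Asian rows use values quoted in this section; the Europe row is
# a hypothetical placeholder added only to exercise the continent filter.
ROWS = [
    ("Tokyo", "Japan", "Asia", 37900000),
    ("Delhi", "India", "Asia", 26580000),
    ("Example City", "Example Country", "Europe", 1000000),  # hypothetical
    ("Beijing", "China", "Asia", 21650000),
]


def asia_top_ten(rows, limit=10):
    """Select up to `limit` Asian cities and number them from 1."""
    count = 0  # kept outside the per-row work, like the eXPL "count" variable
    selected = []
    for megacity, country, continent, population in rows:
        # Selection comes first: test the continent, then the running count.
        if continent == "Asia" and count < limit:
            count += 1  # incremented before the rank is read, so ranks start at 1
            selected.append({
                "rank": count,
                "city": megacity,
                "country": country,
                # Fixed comma grouping: an assumption, not locale-aware.
                "population": f"{population:,}",
            })
    return selected


top = asia_top_ten(ROWS)
for row in top:
    print(row)
```

Keeping the counter outside the per-row work is what lets it serve both as the cut-off for selection and as the rank.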

Note that the population value produced by the built-in format function will vary according to locale, but eXPL provides a way to control which locale is used. This is covered in the section on scopes.

Grouping

Grouping is arranging items into categories. An approach that needs minimal code is a 2-step cascading query, with the first step selecting from a list of categories and the second step selecting the items of the selected category. This approach is used in the GroupedMegaCities application of tutorial5, which groups cities by continent. The second template also rearranges the order of items so that continent comes first, which helps convey the grouping concept.
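The 2-step cascade can be sketched in Python as a pair of nested loops. This is not the GroupedMegaCities code, which is not reproduced here. The category list, the function name and the Europe row are hypothetical, and the Asian rows reuse values quoted earlier in this section.

```python
# Step 1 input: the list of categories (hypothetical selection of continents).
CONTINENTS = ["Asia", "Europe"]

# Step 2 input: rows in the order (megacity, country, continent, population).
CITIES = [
    ("Tokyo", "Japan", "Asia", 37900000),
    ("Example City", "Example Country", "Europe", 1000000),  # hypothetical
    ("Delhi", "India", "Asia", 26580000),
    ("Beijing", "China", "Asia", 21650000),
]


def grouped_cities(continents, cities):
    """Return cities grouped by continent, with continent moved to the front."""
    grouped = []
    for continent in continents:  # first step: pick a category
        for megacity, country, city_continent, population in cities:
            if city_continent == continent:  # second step: items of that category
                # Reordered so the continent comes first in each result.
                grouped.append((continent, megacity, country, population))
    return grouped


groups = grouped_cities(CONTINENTS, CITIES)
for entry in groups:
    print(entry)
```

The outer loop fixes the category, so all items of one continent appear together regardless of their order in the input.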

Numerical Analysis

The eXPL language allows a flexible approach to dealing with data sets, traditionally the domain of the spreadsheet. The following example shows how a sizeable and incomplete table of values can be input into an eXPL script to perform an analysis and produce readable data.

The "Increased Agriculture" application of tutorial5 runs a "more_agriculture" query which produces a list of countries that have increased the area under agriculture by more than 1% over the twenty years between 1990 and 2010. If you look at the agriculture-land.xpl file, you will see data for 210 countries with percentage values for each year spanning a 50-year period. Some of the values are 'NaN', indicating that data is not available.

include "agriculture-land.xpl";
include "surface-land.xpl";
template agri_20y (
  double agri_change = Y2010 - Y1990,
  country ? agri_change > 1.0);
template surface_area_increase (
  country ? country == agri_20y.country,
  double surface_area =
    (agri_20y.agri_change)/100*surface_area_Km2);
query<axiom> more_agriculture(Data : agri_20y,
  surface_area : surface_area_increase);

The first 3 results are:

country=Albania, surface_area=986.1249999999999
country=Algeria, surface_area=25722.79200000004
country=American Samoa, surface_area=10.0
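The calculation can be sketched in Python to show one way missing values drop out of such an analysis. All records below are hypothetical, and the two data files are merged into a single record per country for brevity. In Python, any comparison with NaN is false, so countries with a missing year are not selected. Whether eXPL relies on the same mechanism is an assumption.

```python
import math

NAN = float("nan")

# Hypothetical records: (country, Y1990 %, Y2010 %, surface_area_Km2).
DATA = [
    ("Country A", 40.0, 42.0, 1000.0),  # +2.0 points: selected
    ("Country B", 30.0, 30.5, 2000.0),  # +0.5 points: below the threshold
    ("Country C", NAN, 55.0, 3000.0),   # 1990 value not available
]


def more_agriculture(data, threshold=1.0):
    """Return (country, extra agricultural area in km2) for qualifying rows."""
    results = []
    for country, y1990, y2010, surface_km2 in data:
        agri_change = y2010 - y1990  # NaN when either year is missing
        if agri_change > threshold:  # always False for NaN
            # Percentage change applied to the total surface area.
            results.append((country, agri_change / 100 * surface_km2))
    return results


increases = more_agriculture(DATA)
for country, area in increases:
    print(f"country={country}, surface_area={area}")
```

The results are left unrounded here, which is also why long fractional values can appear in output of this kind.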

To find more evaluation examples, have a look at Calculator.