Machine Learning

Enterprise cloud company ServiceNow is eyeing a future where organisations won't need to build out their own machine learning technologies, turning instead to service providers to offer the capability as-a-service.

In Australia for the ServiceNow Now Forum in Sydney on Wednesday, the company's global CMO Dan Rogers touted machine learning-as-a-service as the way forward for his company's customers.

"In Silicon Valley, there's a lot of buzz around machine learning, a lot of hype," he said.

"In fact, if you don't have artificial intelligence, natural language processing on your business card, you really don't exist."

Rogers said that ServiceNow, and many of the other cloud vendors, will look to provide machine learning-as-a-service, but this depends on the customer having a rich, contextual dataset on which to train the models.

"Machine learning needs a problem to fix, there are many vendors that just have the technology -- we think it's important to define exactly the problem that you're trying to fix and machine learning lends itself very well to ranking, to rating, to categorisation, and prediction."

ServiceNow recently surveyed 500 CIOs across 11 countries, finding Australia is leading the world on the implementation of automation and machine learning initiatives.

57 percent of Australian CIOs surveyed rated their organisation's use of machine learning as mature, based on assets deployed, employee skills, and the level of integration into business processes.

The global average, according to the ServiceNow report [PDF], was 38 percent.

Australia was followed by Germany, with 51 percent of those surveyed displaying maturity in the machine learning space, while the United States reported a 42 percent maturity, the United Kingdom 39 percent, and Singapore 32 percent.

"The research reveals that Australian companies are already implementing changes to organisational structures, processes, and training to accommodate digital labour, such as redefining job descriptions to focus on work with intelligent machines," The New Agenda for Transformative Leadership: Reimagine Business for Machine Learning report said.

32 percent of Australian CIOs reported that they are developing machine learning capabilities by building capacity within specialised internal teams, which was also flagged by ServiceNow as the highest globally.

In addition, 65 percent reported they had already made changes to IT structures to accommodate machine learning, with 57 percent saying they had initiated company-wide organisational changes.

Of those that identified barriers to successful integration of machine learning initiatives, 80 percent cited insufficient data quality; 78 percent said outdated processes substantially interfered with the success of the technology; 63 percent found regulatory complexity or uncertainty to be a hindering factor; and 59 percent reported that a lack of funding for technology and skills was limiting their ability to successfully roll out the technology.

The report also highlighted that a majority of Australian CIOs expect machine learning to deliver significant value to their organisation, with 93 percent believing it will increase the speed and accuracy of decisions; 70 percent think that machine learning will increase competitiveness; 65 percent feel machine learning will deliver "top-line" growth; and 52 percent expect it to deliver increased employee productivity.

"Advances in machine learning are already starting to have a meaningful impact on the workplace, enabling organisations and individuals to reach previously unseen levels of productivity and growth. It's fantastic to see Australian enterprises leading the way," ServiceNow Australia and New Zealand managing director David Oakley added.

Source: ZDNet


Summarized counts:

The Count functions simply tally the number of documents per unit of time. But what if the data you are using has a field whose value is already a summarized count? For example, in the following data, the events_per_min field represents a summarized number of occurrences of something (online purchases, in this case) that occurred in the last minute:

{
    "metrictype": "kpi",
    "@timestamp": "2016-02-12T23:11:09.000Z",
    "events_per_min": 22,
    "@version": "1",
    "type": "it_ops_kpi",
    "metricname": "online_purchases",
    "metricvalue": "22",
    "kpi_indicator": "online_purchases"
}

To get the ML job to recognize that the events_per_min field is the thing that needs to be tallied (and not the documents themselves), we need to set the summary_count_field_name directive (which, in the UI, can only be set in Advanced jobs):



After specifying events_per_min as summary_count_field_name, the appropriate detector configuration in this case simply employs the low_count function:


The results of running the job give exactly what we expect: detection of several periods when the customer's online purchases were lower than they should have been, including times when orders dropped completely to zero, as well as a partial loss of orders around midday on one day:
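The following Python sketch shows, under stated assumptions, the kind of job body this setup corresponds to and why the summary field matters: counting documents gives one per minute, while summing events_per_min gives the number of purchases. The 15m bucket span and the sample documents are hypothetical, and the field names follow the Elasticsearch anomaly detection job API as we understand it; check them against the documentation for your version.

```python
import json
from collections import defaultdict

# Hypothetical anomaly detection job body for the purchases KPI.
job = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed; pick a span that suits your data
        # Treat each document's events_per_min value as a pre-summarized count.
        "summary_count_field_name": "events_per_min",
        "detectors": [{"function": "low_count"}],
    },
    "data_description": {"time_field": "@timestamp"},
}

# Hypothetical one-per-minute KPI documents (timestamps reduced to minute offsets).
docs = [
    {"minute": 0, "events_per_min": 22},
    {"minute": 1, "events_per_min": 25},
    {"minute": 2, "events_per_min": 0},
]

def tally(docs, summary_field=None, bucket_minutes=15):
    """Per-bucket totals: 1 per document, or the summary field's value if given."""
    buckets = defaultdict(int)
    for doc in docs:
        bucket = doc["minute"] // bucket_minutes
        buckets[bucket] += doc[summary_field] if summary_field else 1
    return dict(buckets)

print(json.dumps(job, indent=2))
print(tally(docs))                    # {0: 3}  -> documents per bucket
print(tally(docs, "events_per_min"))  # {0: 47} -> purchases per bucket
```

Without the summary field, a minute with zero purchases still contributes one document, which is exactly the behavior that would hide a drop in orders.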


Splitting the counts

The Count functions can also split the analysis along a categorical field. This makes it easy to run many event rate analyses at once, using either the Multi Metric job wizard or the Advanced job wizard in the UI.

Some common use cases for this are as follows:

  • Finding an increase in error messages in a log by error ID or type
  • Finding a change in log volume by host; perhaps some configuration was changed
  • Determining whether certain products suddenly are selling better or worse than they used to

To accomplish this, the same mechanisms are used. For example, in a Multi Metric job, one can choose a categorical field to split the data while using a Count (event rate) function:


This results in the following, where only one of the many entities being modeled was determined to be unusual (the spike in the volume of requests for the airline AAL):


As you can see, this makes it easy to spot volume-based variations across a large number of unique values of a categorical field in the data. We can see at a glance which entities are unusual and which are not.
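To make the idea of splitting concrete, here is a minimal Python sketch that tallies a hypothetical request log into one event-rate series per airline. The airline field name, the 15-minute buckets, and the use of partition_field_name to express the split are assumptions for illustration; the real job also models each series statistically, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical detector: one count model per value of an assumed "airline" field.
detector = {"function": "count", "partition_field_name": "airline"}

# Hypothetical request log as (minute, airline) pairs.
events = [(0, "AAL"), (1, "ACA"), (3, "AAL"), (12, "ACA"), (14, "AAL"),
          (15, "AAL"), (16, "AAL"), (17, "AAL"), (18, "AAL"), (19, "ACA")]

def split_counts(events, bucket_minutes=15):
    """Count events per (airline, bucket), giving each airline its own event-rate series."""
    return Counter((airline, minute // bucket_minutes) for minute, airline in events)

counts = split_counts(events)
for airline in sorted({airline for _, airline in events}):
    series = [counts[(airline, bucket)] for bucket in range(2)]
    print(airline, series)  # AAL [3, 4], ACA [2, 1]
```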

Other counting functions

In addition to the functions described so far, there are several other counting functions that enable a broader set of use cases.

Non-zero count

The non-zero count functions (non_zero_count, low_non_zero_count, and high_non_zero_count) support count-based analysis while also modeling sparse data accurately: when data does not exist for a bucket, you may not want that treated as an explicit zero, but rather as null. For example, consider a dataset over time that looks like the following:


The non_zero_count functions interpret this data as the following, ignoring the zero-valued buckets:


The act of treating zeros as null can be useful in cases where the non-existence of measurements at regular intervals is expected. Some practical examples of this are as follows:

  • The number of airline tickets purchased per month by an individual
  • The number of times a server reboots in a day
  • The number of login attempts on a system per hour
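A minimal sketch of the difference, using a hypothetical sparse series of daily reboot counts: the count view keeps every empty bucket as a zero, while the non_zero_count view drops those buckets, so the typical value it learns is very different. This only mimics how the data is interpreted; it is not the Elastic modeling code.

```python
# Hypothetical daily reboot counts for one server (a naturally sparse series).
counts = [0, 0, 3, 0, 0, 0, 1, 0, 2, 0]

def as_count(series):
    """count-style view: every bucket, including empty ones, is an observation."""
    return list(series)

def as_non_zero_count(series):
    """non_zero_count-style view: empty buckets are treated as missing (null)."""
    return [value for value in series if value != 0]

for name, view in (("count", as_count(counts)), ("non_zero_count", as_non_zero_count(counts))):
    print(name, view, "mean =", sum(view) / len(view))
```

Under the count view the typical day looks like 0.6 reboots; under the non-zero view a day with reboots typically has 2, which is the more useful baseline when reboots are expected to be occasional.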

Distinct count

The distinct count functions (distinct_count, low_distinct_count, and high_distinct_count) measure the uniqueness (cardinality) of values for a particular field. There are many possible uses of these functions, particularly in the context of population analysis, to uncover entities that are logging an overly diverse set of field values. A classic example is looking for IP addresses that are engaged in port scanning, accessing an unusually large number of distinct destination port numbers on remote machines:

{
  "function" : "high_distinct_count",
  "field_name" : "dest_port",
  "over_field_name": "src_ip"
}


Notice that the src_ip field is defined as the over field, thus invoking population analysis and comparing the activity of source IPs against each other. An additional discussion on population analysis follows next.
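The following Python sketch computes what high_distinct_count measures here, the number of distinct dest_port values per src_ip, over hypothetical connection records. The "more than twice the population median" rule is a naive stand-in chosen for illustration; Elastic's population analysis uses a probabilistic model instead.

```python
from collections import defaultdict

# Hypothetical connection records as (src_ip, dest_port) pairs.
flows = [("10.0.0.5", 443), ("10.0.0.5", 443), ("10.0.0.7", 80), ("10.0.0.7", 443),
         ("10.0.0.9", 22), ("10.0.0.9", 23), ("10.0.0.9", 25), ("10.0.0.9", 80),
         ("10.0.0.9", 443), ("10.0.0.9", 8080)]

def distinct_ports_by_source(flows):
    """Number of distinct dest_port values seen for each src_ip."""
    ports = defaultdict(set)
    for src_ip, dest_port in flows:
        ports[src_ip].add(dest_port)
    return {src_ip: len(seen) for src_ip, seen in ports.items()}

cardinality = distinct_ports_by_source(flows)

# Naive population comparison: flag sources with more than twice the median cardinality.
values = sorted(cardinality.values())
median = values[len(values) // 2]
suspects = [src_ip for src_ip, n in cardinality.items() if n > 2 * median]
print(cardinality, suspects)
```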

If you found this article interesting, you can explore Machine Learning with the Elastic Stack to leverage Elastic Stack’s machine learning features to gain valuable insight from your data. Machine Learning with the Elastic Stack is a comprehensive overview of the embedded commercial features of anomaly detection and forecasting.


Exploring Count Functions in Elastic ML

Learn about count functions in this article by Rich Collier, a solutions architect at Elastic. Joining the Elastic team from the Prelert acquisition, Rich has over 20 years' experience as a solutions architect and pre-sales systems engineer for software, hardware, and service-based solutions.

Elastic ML jobs contain detectors, each of which is a combination of a function applied to some aspect of the data (for example, a field). The detectors we explore in this article simply count occurrences of things over time.

The three main functions to get familiar with are as follows:

  • Count: Counts the number of documents in the bucket resulting from a query of the raw data index
  • High Count: The same as Count, but will only flag an anomaly if the count is higher than expected
  • Low Count: The same as Count, but will only flag an anomaly if the count is lower than expected

You will see that ML has a variety of one-sided functions, which detect anomalies in only one direction. Additionally, it is important to know that these functions do not count a field, or even the existence of fields within a document; they merely count the documents.
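As a toy illustration of the one-sided functions, the sketch below reports a deviation only in the direction each function allows. The fixed tolerance band and the expected value of 40 are hypothetical simplifications; Elastic ML scores anomalies with a learned probabilistic model rather than a fixed band.

```python
def flag(actual, expected, tolerance, function="count"):
    """Toy detector: report a deviation only in the direction the function allows."""
    too_high = actual > expected + tolerance
    too_low = actual < expected - tolerance
    if function == "count":
        return too_high or too_low
    if function == "high_count":
        return too_high
    if function == "low_count":
        return too_low
    raise ValueError(f"unknown function: {function}")

# Hypothetical expected count of 40 per bucket, with a tolerance of 20.
for function in ("count", "high_count", "low_count"):
    print(function, flag(277, 40, 20, function), flag(0, 40, 20, function))
```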

To get a more intuitive feeling for what the Count function does, let's see what a standard (non-ML) Kibana visualization shows us for a particular dataset when that dataset is viewed with a Count aggregation on the Y-Axis and a 10-minute resolution of the Date Histogram aggregation on the X-Axis:


From the preceding screenshot, we can make a few observations:

  • This vertical bar visualization counts the number of documents in the index for each 10-minute bucket of time and displays the resulting view. We can see, for example, that at the 11:10 AM mark on February 9 there is a spike in documents/events that seems much higher than the typical rate (the points in time excluding the spike); in this case, the count is 277.
  • To automate the analysis of this data, we can use an ML job. A Single Metric Job is sufficient, since there is only one time series (a count of all docs in this index). Configuring the job will look like the following, after the initial steps of the Single Metric Job wizard are completed:


We can see that the Count aggregation function is used (although High Count would also have been appropriate), and the Bucket span is set to the same value (10m) that we used when we built our Kibana visualization. After running the job, the resulting anomaly is found:


Of course, the anomaly of 277 documents/events is exactly what we had hoped would be found, since this is exactly what we saw when we manually analyzed the data in the vertical bar visualization earlier.

Notice what happens, however, if the same data is analyzed with a 60m bucket span instead of a 10m one:


Note that because the rate spike was so short, it is diluted when the event count is aggregated over the span of an hour; the spike no longer looks unusual, so ML does not flag it as anomalous.
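The dilution effect can be sketched with a few lines of Python. The counts below are hypothetical: a steady 40 documents per 10-minute bucket with a single 277-document burst. Merging six 10-minute buckets into one 60-minute bucket shrinks the spike from almost 7 times the typical bucket to about 2 times; whether a model still flags it then depends on how variable the rest of the data is.

```python
# Hypothetical counts per 10-minute bucket over 4 hours (24 buckets), one short spike.
ten_min = [40] * 24
ten_min[13] = 277  # a short burst, like the one in the screenshot

def rebucket(counts, factor):
    """Merge consecutive buckets, e.g. factor=6 turns 10m buckets into 60m ones."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

def peak_to_median(counts):
    """How many times larger the biggest bucket is than the (upper) median bucket."""
    ordered = sorted(counts)
    return max(counts) / ordered[len(ordered) // 2]

hourly = rebucket(ten_min, 6)
print(hourly)
print(peak_to_median(ten_min), peak_to_median(hourly))
```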

As mentioned earlier, the one-sided functions of Low Count and High Count are especially useful when trying to find deviations only in one direction. Perhaps you only want to find a drop of orders on your e-commerce site (because a spike in orders would be good news!), or perhaps you only want to spot a spike in errors (because a drop in errors is a good thing too!).

Remember, the Count functions count documents, not fields. If you have a field that represents a summarized count of something, then that will need special treatment as described in the next section.


Frequent pattern tree growth

We will now grow the FP-tree row by row, using the transactions ordered by priority:

  • Row 1: Every FP-tree starts with a null node as its root. Let's draw the first row, (A, B, C, D), as a branch from the root, with the frequency of each item set to 1:
  • Row 2: It has {B, C, D}. A is missing, so we cannot merge it with the earlier branch, which starts with A. Hence, we have to create a separate branch from the root, as shown here:
  • Row 3: It has {A, D}. B and C are missing, but A matches the start of the first branch, so we can tie it to the earlier node. A encounters a repetition, so its frequency changes to 2, and D becomes a new child of A:
  • Row 4: It has {A, B}. We can tie it to the earlier node and traverse the existing path. A and B encounter a repetition, so their frequencies change to 3 and 2, respectively:
  • Row 5: It has {A, B, C}. Again, it can be tied to the earlier node, and A, B, and C see a repetition, so their frequencies change to 4, 3, and 2, respectively:


Now, let's count the frequency of each item in the final tree and compare it with the frequency table to ensure that the tree is correct:

  • A:4
  • B:4
  • C:3
  • D:3
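The row-by-row construction above can be sketched in Python as follows, assuming the transactions have already been reordered by priority (A, B, C, D). Each transaction walks down from the null root, reusing nodes for a shared prefix and creating a new branch where the prefix differs; the per-item totals reproduce the frequencies listed above.

```python
class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    root = Node(None)  # the null root every FP-tree starts with
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            # Reuse the child if this prefix already exists; otherwise branch off.
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root

def item_totals(node, totals=None):
    """Sum node counts per item across all branches."""
    if totals is None:
        totals = {}
    for child in node.children.values():
        totals[child.item] = totals.get(child.item, 0) + child.count
        item_totals(child, totals)
    return totals

def show(node, depth=0):
    """Print the tree, one indented 'item:count' line per node."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

rows = [("A", "B", "C", "D"), ("B", "C", "D"), ("A", "D"), ("A", "B"), ("A", "B", "C")]
tree = build_fp_tree(rows)
show(tree)
print(item_totals(tree))  # {'A': 4, 'B': 4, 'C': 3, 'D': 3}
```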

Now, we will go from bottom to top. We will find out the branches where D appears:

We can see that there are three branches where D appears:

  • BC: 1
  • ABC: 1
  • A: 1

These branches are termed the conditional pattern base for D. While building it, keep the following points in mind:

  • Even though we traverse from bottom to top, we write the branches in a top-to-bottom manner
  • D itself is not part of the branches
  • 1 represents the frequency of occurrence of D in each branch

Now, the conditional pattern base for D gives conditional frequencies of 2, 2, and 2 for A, B, and C, respectively. All are less than the minimum support (3). Hence, there can't be any conditional FP-tree for D.

Now, let's do it for C. C appears in the following branches:

The branches end up like this:

  • B:1
  • AB:2

This results in A:2 and B:3. So, only B meets the minimum support. Now, the conditional tree ends up like this:

Similarly, conditional pattern bases are found for the remaining items, which together yield the frequent itemsets.
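The conditional pattern bases can be sketched without building the tree, assuming the same priority-ordered transactions: each transaction inserts exactly its prefix path into the FP-tree, so the prefix in front of an item in each transaction is the branch leading to it. With a minimum support of 3, this gives the same results as the walkthrough: nothing survives for D, and only B:3 survives for C.

```python
from collections import Counter

# Transactions already reordered by priority (A, B, C, D).
rows = [("A", "B", "C", "D"), ("B", "C", "D"), ("A", "D"), ("A", "B"), ("A", "B", "C")]
MIN_SUPPORT = 3

def conditional_pattern_base(rows, item):
    """Prefix paths leading to `item`, with how many transactions share each path."""
    base = Counter()
    for row in rows:
        if item in row:
            prefix = row[:row.index(item)]
            if prefix:  # an item directly under the root has an empty prefix
                base[prefix] += 1
    return base

def conditional_frequencies(base):
    """Total count of each item across the paths in a conditional pattern base."""
    freq = Counter()
    for path, count in base.items():
        for item in path:
            freq[item] += count
    return freq

for item in ("D", "C"):
    base = conditional_pattern_base(rows, item)
    freq = conditional_frequencies(base)
    frequent = {i: c for i, c in freq.items() if c >= MIN_SUPPORT}
    print(item, dict(base), dict(freq), frequent)
```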

Let's see how it can be done in Python. We will be using a library called pyfpgrowth. Also, we shall create an itemset in the following section.

Importing the library

First, we import the library:

import pyfpgrowth

We build our transactions as follows:

transaction = [["bread", "butter", "cereal"],
               ["butter", "milk"],
               ["bread", "milk"],
               ["butter", "cereal", "milk"],
               ["egg", "bread"],
               ["egg", "butter"],
               ["cereal", "milk"],
               ["bread", "butter", "cereal", "egg"],
               ["cereal", "bread", "butter"]]

Next, we find the frequent patterns with find_frequent_patterns(), where transaction is the list of items bought in each transaction and 2 is the minimum support count:

patterns = pyfpgrowth.find_frequent_patterns(transaction, 2)

Finally, we have to define the confidence to get the rules. Rules are generated from the patterns, with 0.5 as the minimum confidence threshold. We store the rules in rules, a dictionary in which each key is an antecedent and each value holds the consequent and the confidence value:

rules = pyfpgrowth.generate_association_rules(patterns, 0.5)


We get the output as follows:

This is how we get the rules. FP-growth tends to have the edge over Apriori because it is faster and more efficient: it scans the dataset only twice and avoids generating candidate itemsets.
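To validate the pyfpgrowth results by hand, here is a brute-force sketch in plain Python over the same grocery transactions: it counts every itemset, keeps those with a support count of at least 2, and keeps rules with a confidence of at least 0.5. It is not FP-growth and does not mimic pyfpgrowth's output format, so compare individual support counts and confidence values rather than whole objects.

```python
from itertools import combinations

# Same transactions as above.
transaction = [["bread", "butter", "cereal"], ["butter", "milk"], ["bread", "milk"],
               ["butter", "cereal", "milk"], ["egg", "bread"], ["egg", "butter"],
               ["cereal", "milk"], ["bread", "butter", "cereal", "egg"],
               ["cereal", "bread", "butter"]]

def frequent_itemsets(transactions, min_count):
    """Brute force: count every itemset that occurs and keep those seen at least min_count times."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}

def association_rules(patterns, min_confidence):
    """Rules X -> Y with confidence = count(X and Y together) / count(X)."""
    found = {}
    for itemset, count in patterns.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                # Every subset of a frequent itemset is frequent, so patterns has the antecedent.
                confidence = count / patterns[antecedent]
                if confidence >= min_confidence:
                    found[(antecedent, consequent)] = confidence
    return found

patterns = frequent_itemsets(transaction, 2)
rules = association_rules(patterns, 0.5)
for (antecedent, consequent), confidence in sorted(rules.items()):
    print(antecedent, "->", consequent, round(confidence, 2))
```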

If you found this article interesting, you can explore Machine Learning Quick Reference as a hands-on reference guide for developing, training, and optimizing your machine learning models. Machine learning makes it possible to learn about the unknowns and gain hidden insights into your datasets by mastering many tools and techniques. Machine Learning Quick Reference guides you to do just that in a very compact manner.

Finding association rules

In order to find the association rules, we first have to search for all of the rules whose support and confidence exceed the threshold values. But the question arises: how do we find these? One possible way is brute force, which means listing all the possible association rules and calculating the support and confidence of each rule, then removing all the rules that fail the support and confidence thresholds.

Given there are $n$ items in the set $I$, the total number of possible association rules is $R = 3^n - 2^{n+1} + 1$.

If $X$ is a frequent itemset with $k$ elements, then there are $2^k - 2$ association rules that can be generated from it.
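Both counts can be checked by brute force for small sizes. In the sketch below, every rule over n items corresponds to assigning each item to the antecedent, the consequent, or neither, with both sides non-empty; and for a k-itemset, every non-empty proper subset can serve as the antecedent.

```python
from itertools import combinations, product

def count_rules_brute_force(n):
    """Count rules X -> Y over n items: each item goes to X (1), Y (2), or neither (0)."""
    return sum(1 for roles in product((0, 1, 2), repeat=n) if 1 in roles and 2 in roles)

def rules_from_itemset(k):
    """Count non-empty proper subsets of a k-itemset; each one is a possible antecedent."""
    return sum(1 for size in range(1, k) for _ in combinations(range(k), size))

for n in range(1, 7):
    assert count_rules_brute_force(n) == 3**n - 2**(n + 1) + 1
for k in range(1, 7):
    assert rules_from_itemset(k) == 2**k - 2

print(count_rules_brute_force(6))  # 3**6 - 2**7 + 1 = 602
```

The total grows roughly like $3^n$, which is why brute force becomes impractical long before $n$ reaches the size of a real product catalog.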

Let's see how to mine association rules in Python with the Apriori algorithm:

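import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Each row of the CSV is one transaction; the file has no header row.
data = pd.read_csv('association_mining.csv', header = None)

# Build one list of item names per transaction (7501 rows, up to 20 items each).
transactions = []
for row in data.values:
    # Empty cells are read as NaN; skip them so "nan" is not counted as an item.
    transactions.append([str(item) for item in row if pd.notna(item)])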

If we want an item to appear three times a day over seven days, the support will be 3 × 7 / 7501 ≈ 0.0028, where 7501 is the total number of transactions; we round this to 0.003. We will keep the confidence at 20% to begin with:

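from apyori import apriori

# apyori reads min_support, min_confidence, min_lift, and max_length;
# min_length is not one of its options and is ignored.
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)

# apriori() returns a generator, so collect the results in a list.
results = list(rules)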


We can view the output by displaying results from the preceding code:
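For reference, here is a minimal sketch of the standard support, confidence, and lift definitions that the min_support, min_confidence, and min_lift thresholds refer to, along with the support arithmetic from the text. We assume apyori uses these standard definitions; the toy baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the consequent appears in transactions that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support (>1 means positive association)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Support threshold from the text: 3 purchases a day for 7 days, out of 7501 transactions.
min_support = 3 * 7 / 7501
print(round(min_support, 4))  # about 0.0028, rounded up to 0.003 in the apriori call

# Hypothetical baskets.
toy = [["eggs", "milk"], ["eggs", "milk", "bread"], ["bread"], ["milk"]]
print(confidence(toy, ["eggs"], ["milk"]), lift(toy, ["eggs"], ["milk"]))
```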

Frequent pattern growth

Frequent pattern growth (FP-growth) is a frequent itemset generation technique (similar to Apriori). FP-growth builds a compact tree structure, the FP-tree, and uses it for frequent itemset mining and rule generation. It is faster than Apriori and scales better to large datasets.

Let's go through the steps of FP-Growth:

  1. Setting up the transactions: This step lists the items in each transaction, one transaction per row:

    Transaction   Items
    1             (B, C, D, A)
    2             (B, C, D)
    3             (D, A)
    4             (A, B)
    5             (A, C, B)

  2. Finding the frequency: Now, we have to find out the frequency of each item individually:

    Item   Frequency
    A      4
    B      4
    C      3
    D      3

Let's set the minimum threshold, or minimum support, at 50%:

  • Min Support = (5 * 50/100) = 2.5
  • Ceiling of minimum support = ⌈2.5⌉ = 3


  3. Prioritize the items by frequency: Since all the items have a frequency greater than or equal to the minimum support, all of them are kept. Based on their frequency, a priority or rank is assigned to each item:

    Item   Frequency   Priority
    A      4           1
    B      4           2
    C      3           3
    D      3           4

The order of the items is: A, B, C, and D (by frequency in descending order)

  4. Ordering the items by priority: Now the items in each transaction are reordered according to the priority assigned to them based on frequency. The order is A, B, C, and D:

    Items          Order by priority
    (B, C, D, A)   (A, B, C, D)
    (B, C, D)      (B, C, D)
    (D, A)         (A, D)
    (A, B)         (A, B)
    (A, C, B)      (A, B, C)
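Steps 2 to 4 can be sketched in Python as follows. Ties in frequency (A and B, C and D) are broken alphabetically, which is our assumption; the walkthrough does not say how ties are ordered. The result is the priority-ordered list of transactions that the FP-tree is built from.

```python
import math
from collections import Counter

# Step 1: the five transactions from the table above.
transactions = [("B", "C", "D", "A"), ("B", "C", "D"), ("D", "A"), ("A", "B"), ("A", "C", "B")]

# Step 2: frequency of each item across all transactions.
frequency = Counter(item for t in transactions for item in t)

# Minimum support of 50% of 5 transactions, rounded up to a whole count (2.5 -> 3).
min_support = math.ceil(len(transactions) * 50 / 100)

# Step 3: keep items meeting min_support and rank them by descending frequency.
# Ties are broken alphabetically (an assumption).
ranked = sorted((item for item, c in frequency.items() if c >= min_support),
                key=lambda item: (-frequency[item], item))
priority = {item: rank for rank, item in enumerate(ranked, start=1)}

# Step 4: reorder each transaction by priority, dropping any infrequent items.
ordered = [tuple(sorted((item for item in t if item in priority), key=priority.get))
           for t in transactions]

print(dict(frequency), min_support, priority)
for before, after in zip(transactions, ordered):
    print(before, "->", after)
```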



© copyright 2017 All Rights Reserved.

A Product of HunterTech Ventures