Tutorial

This tutorial describes how to use the n6sdk library to implement an n6-like REST API that provides access to your own network incident data source.

Setting up the development environment

Prerequisites

You need to have:

  • a Linux system with the bash shell and basic Unix-like OS tools such as mkdir, cat etc., plus your favorite text editor (other platforms and tools could also be used, but this tutorial assumes the ones mentioned here);
  • the Python 2.7 language interpreter installed (on Debian GNU/Linux it can be installed with the command: sudo apt-get install python2.7);
  • the git version control system installed (on Debian GNU/Linux it can be installed with the command: sudo apt-get install git);
  • the virtualenv tool installed (see: http://virtualenv.readthedocs.org/en/latest/installation.html; on Debian GNU/Linux it can be installed with the command: sudo apt-get install python-virtualenv);
  • Internet access.

Obtaining the n6sdk source code

We will start by creating the “workbench” directory for all our activities:

$ mkdir <the workbench directory>

(Of course, <the workbench directory> needs to be replaced with the actual name (absolute path) of the directory you want to create.)

Then, we need to clone the n6sdk source code repository:

$ cd <the workbench directory>
$ git clone https://github.com/CERT-Polska/n6sdk.git

Now, in the <the workbench directory>/n6sdk/ subdirectory we have the source code of the n6sdk library.

Installing and setting up the necessary stuff

Next, we will create and activate our Python virtual environment:

$ virtualenv dev-venv
$ source dev-venv/bin/activate

Then, we can install the n6sdk library:

$ cd n6sdk
$ python setup.py install

Then, we need to create our project:

$ cd ..
$ pcreate -s n6sdk Using_N6SDK

– where Using_N6SDK is the name of our new n6sdk-based project. Obviously, when creating your real project you will want to pick another name. Anyway, for the rest of this tutorial we will use Using_N6SDK as the project name (and, consequently, using_n6sdk as the “technical” package name, automatically derived from the given project name).

Now, we have the skeleton of our new project. You may want to customize some details in the newly created files, especially the version and description fields in Using_N6SDK/setup.py.

Then, we need to install our new project for development:

$ cd Using_N6SDK
$ python setup.py develop
$ cd ..

We can check whether everything up to now went well by running the Python interpreter...

$ python

...and trying to import some of the installed components:

>>> import n6sdk
>>> import n6sdk.data_spec.fields
>>> n6sdk.data_spec.fields.Field
<class 'n6sdk.data_spec.fields.Field'>
>>> import using_n6sdk
>>> exit()

Overview of data processing and architecture

When a client sends an HTTP request to the n6 REST API, the following data processing is performed on the server side:

  1. Receiving the HTTP request

    n6sdk uses the Pyramid library (see: http://docs.pylonsproject.org/projects/pyramid/en/1.5-branch/) to perform processing related to HTTP communication, request data (for example, extracting query parameters from the URL’s query string) and routing (deciding which function shall be invoked, and with what arguments, depending on the given URL). However, n6sdk adds its own wrappers and helpers to adjust some important aspects of that processing: n6sdk.pyramid_commons.DefaultStreamViewBase, n6sdk.pyramid_commons.HttpResource and n6sdk.pyramid_commons.ConfigHelper (see below: Gluing it together). These three classes can be customized by subclassing them and extending selected methods; however, that is beyond the scope of this tutorial.

  2. Authentication

    Authentication is performed using a mechanism provided by the Pyramid library: authentication policies. The simplest policy is implemented as the n6sdk.pyramid_commons.AnonymousAuthenticationPolicy class (it is a dummy policy: all clients are identified as "anonymous"); it can be replaced with a custom one (see below: Custom authentication policy).

    The result is an object containing authentication data.

  3. Cleaning query parameters provided by the client

    Here “cleaning” means: validation and adjustment (normalization) of the parameters (already extracted from the request’s URL).

    An instance of a data specification class (see below: Data specification class) is responsible for doing that.

    The result is a dictionary containing the cleaned query parameters.

  4. Retrieving result data from the data backend API

    The data backend API, responsible for interacting with the actual data storage, needs to be implemented as a class (see below: Implementing the data backend API).

    For a client request (see above: 1. Receiving the HTTP request), an appropriate method of the sole instance of this class is called with the authentication data (see above: 2. Authentication) and the cleaned client query parameters dictionary (see above: 3. Cleaning query parameters...) as call arguments.

    The result of the call is an iterator which yields dictionaries, each containing the data of one network incident.

  5. Cleaning the result data

    Each of the yielded dictionaries is cleaned. Here “cleaning” means: validation and adjustment (normalization) of the result data.

    An instance of a data specification class (see below: Data specification class) is responsible for doing that.

    The result is another iterator (which yields dictionaries, each containing cleaned data of one network incident).

  6. Rendering the HTTP response

    The yielded cleaned dictionaries are processed to produce consecutive fragments of the HTTP response which are successively sent to the client. The key component responsible for transforming the dictionaries into the response body is a renderer. Note that n6sdk renderers (being a custom n6sdk concept, different from Pyramid renderers) are able to process data in an iterator (“stream-like”) manner, so even if the resultant response body is huge it does not have to fit as a whole in the server’s memory.

    The n6sdk library provides two standard renderers: json (to render JSON-formatted responses) and sjson (to render responses in a format similar to JSON but more convenient for “stream-like” or “pipeline” data processing).

    Implementing and registering custom renderers is possible; however, that is beyond the scope of this tutorial.
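Steps 3 to 6 can be pictured as a chain of Python generators, so that only one incident at a time passes through the pipeline. The following is a minimal standalone sketch (Python 3), not n6sdk code: clean_params(), backend_iter(), clean_result() and render_lines() are hypothetical stand-ins for the data specification, the data backend API and a renderer, and the one-JSON-object-per-line output only imitates a stream-friendly format.

```python
import json


def clean_params(raw_params):
    # Step 3 (hypothetical): keep only legal keys, normalize each value.
    legal_keys = {"category"}
    return {key: [value.strip().lower() for value in values]
            for key, values in raw_params.items() if key in legal_keys}


def backend_iter(auth_data, params):
    # Step 4 (hypothetical backend): lazily yield one raw dict per incident.
    incidents = [
        {"id": "a1", "category": "bots", "count": 3},
        {"id": "a2", "category": "phish", "count": 1},
    ]
    wanted = params.get("category")
    for incident in incidents:
        if wanted is None or incident["category"] in wanted:
            yield incident


def clean_result(raw_result):
    # Step 5 (hypothetical): validate one result dict, return a new one.
    if "id" not in raw_result:
        raise ValueError("the 'id' item is required")
    return dict(raw_result)


def render_lines(results):
    # Step 6: emit the response body piece by piece, never as a whole.
    for result in results:
        yield json.dumps(result, sort_keys=True) + "\n"


def handle_request(auth_data, raw_params):
    params = clean_params(raw_params)
    cleaned = (clean_result(r) for r in backend_iter(auth_data, params))
    return render_lines(cleaned)


for chunk in handle_request({"user": "anonymous"}, {"category": [" BOTS "]}):
    print(chunk, end="")
```

Because every stage is a generator, nothing is fetched from the backend until the renderer asks for the next chunk.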

Data specification class

Basics

A data specification determines:

  • how query parameters (already extracted from the query string part of the URL of a client HTTP request) are cleaned (before being passed in to the data backend API) – that is:
    • what are the legal parameter names;
    • whether particular parameters are required or optional;
    • what are valid values of particular parameters (e.g.: a time.min value must be a valid ISO-8601-formatted date and time);
    • whether, for a particular parameter, there can be many alternative values or only one value (e.g.: time.min can have only one value, and ip can have multiple values);
    • how particular parameter values are normalized (e.g.: a time.min value is always transformed to a Python datetime.datetime object, converting any time zone information to UTC);
  • how result dictionaries (each containing data of one incident) yielded by the data backend API are cleaned (before being passed in to a response renderer) – that is:
    • what are the legal result keys;
    • whether particular items are required or optional;
    • what are valid types and values of particular items (e.g.: a time value must be either a datetime.datetime object or a string being a valid ISO-8601-formatted date and time);
    • how particular items are normalized (e.g.: a time value is always transformed to a Python datetime.datetime object, converting any time zone information to UTC).

The declarative way of defining a data specification is somewhat similar to the domain-specific languages known from ORMs (such as SQLAlchemy’s or Django’s): a data specification class (n6sdk.data_spec.DataSpec or some subclass of it) looks like an ORM “model” class, and particular query parameter and result item specifications (being instances of n6sdk.data_spec.fields.Field or of its subclasses) are declared similarly to ORM “fields” or “columns”.

For example, consider the following simple data specification class:

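import n6sdk.data_spec
from n6sdk.data_spec.fields import (
    AddressField,
    ASNField,
    CCField,
    DateTimeField,
    IntegerField,
    IPv4Field,
    IPv4NetField,
    UnicodeLimitedField,
)


class MyDataSpecFromScratch(n6sdk.data_spec.BaseDataSpec):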

    id = UnicodeLimitedField(
        in_params='optional',
        in_result='required',
        max_length=64,
    )

    time = DateTimeField(
        in_params=None,
        in_result='required',

        extra_params=dict(
            min=DateTimeField(           # `time.min`
                in_params='optional',
                single_param=True,
            ),
            max=DateTimeField(           # `time.max`
                in_params='optional',
                single_param=True,
            ),
            until=DateTimeField(         # `time.until`
                in_params='optional',
                single_param=True,
            ),
        ),
    )

    address = AddressField(
        in_params=None,
        in_result='optional',
    )

    ip = IPv4Field(
        in_params='optional',
        in_result=None,

        extra_params=dict(
            net=IPv4NetField(            # `ip.net`
                in_params='optional',
            ),
        ),
    )

    asn = ASNField(
        in_params='optional',
        in_result=None,
    )

    cc = CCField(
        in_params='optional',
        in_result=None,
    )

    count = IntegerField(
        in_params=None,
        in_result='optional',
        min_value=0,
        max_value=(2 ** 15 - 1),
    )

Note

In a real project you should inherit from DataSpec rather than from BaseDataSpec. See the following sections, especially Your first data specification class.

What we see in the above listing is that:

  1. id is a text field: its values are strings, not longer than 64 characters (as its declaration is an instance of n6sdk.data_spec.fields.UnicodeLimitedField created with the constructor argument max_length set to 64). It is optional as a query parameter and required (obligatory) as an item of a result dictionary.
  2. time is a date-and-time field (as its declaration is an instance of n6sdk.data_spec.fields.DateTimeField). It is not a legal query parameter, and it is required as an item of a result dictionary.
  3. time.min, time.max and time.until are date-and-time fields (as their declarations are instances of n6sdk.data_spec.fields.DateTimeField). They are optional as query parameters, and they are not legal items of a result dictionary. Unlike most other fields, these three fields do not allow multiple query parameter values to be specified (note the constructor argument single_param set to True).
  4. address is a field whose values are lists of dictionaries containing ip and optionally asn and cc (as the declaration of address is an instance of n6sdk.data_spec.fields.AddressField). It is not a legal query parameter, and it is optional as an item of a result dictionary.
  5. ip is an IPv4 address field (as its declaration is an instance of n6sdk.data_spec.fields.IPv4Field). It is optional as a query parameter and it is not a legal item of a result dictionary (note that in a result dictionary the address field contains the corresponding data).
  6. ip.net is an IPv4 network definition (as its declaration is an instance of n6sdk.data_spec.fields.IPv4NetField). It is optional as a query parameter and it is not a legal item of a result dictionary.
  7. asn is an autonomous system number (ASN) field (as its declaration is an instance of n6sdk.data_spec.fields.ASNField). It is optional as a query parameter and it is not a legal item of a result dictionary (note that in a result dictionary the address field contains the corresponding data).
  8. cc is a 2-letter country code field (as its declaration is an instance of n6sdk.data_spec.fields.CCField). It is optional as a query parameter and it is not a legal item of a result dictionary (note that in a result dictionary the address field contains the corresponding data).
  9. count is an integer field: its values are integer numbers, not less than 0 and not greater than 32767 (as the declaration of count is an instance of n6sdk.data_spec.fields.IntegerField created with the constructor arguments: min_value set to 0 and max_value set to 32767). It is not a legal query parameter, and it is optional as an item of a result dictionary.

To create your data specification class you will, most probably, want to inherit from n6sdk.data_spec.DataSpec. In its subclass you can:

  • add new field specifications as well as modify (extend), replace or remove (mask) field specifications defined in DataSpec;
  • extend the DataSpec’s cleaning methods.

(See comments in Using_N6SDK/using_n6sdk/data_spec.py as well as the following sections of this tutorial.)

You may also want to subclass n6sdk.data_spec.fields.Field (or any of its subclasses, such as UnicodeLimitedField, IPv4Field or IntegerField) to create new kinds of fields whose instances can be used as field specifications in your data specification class (see the relevant parts of the following sections of this tutorial).

Your first data specification class

Let us open the <the workbench directory>/Using_N6SDK/using_n6sdk/data_spec.py file with our favorite text editor and uncomment the following lines in it (within the body of the UsingN6sdkDataSpec class):

id = Ext(in_params='optional')

source = Ext(in_params='optional')

restriction = Ext(in_params='optional')

confidence = Ext(in_params='optional')

category = Ext(in_params='optional')

time = Ext(
    extra_params=Ext(
        min=Ext(in_params='optional'),    # search for >= than...
        max=Ext(in_params='optional'),    # search for <= than...
        until=Ext(in_params='optional'),  # search for <  than...
    ),
)

ip = Ext(
    in_params='optional',
)

url = Ext(
    in_params='optional',
)

Our UsingN6sdkDataSpec data specification class is a subclass of n6sdk.data_spec.DataSpec which, by default, has all query parameters disabled, so here we have enabled some of them by uncommenting these lines. (We can remove the rest of the commented-out lines.)

Note

Always make sure that your data specification class does not enable any query parameters that are not supported by your data backend API (see: Implementing the data backend API).

Apart from changing (extending) inherited field specifications, we can also add some new fields. For example, let us add, near the beginning of our data specification class definition, a new field specification: mac_address.

from n6sdk.data_spec import DataSpec, Ext
from n6sdk.data_spec.fields import UnicodeRegexField  # remember to add this line


class UsingN6sdkDataSpec(DataSpec):

    """
    The data specification class for the `Using_N6SDK` project.
    """

    mac_address = UnicodeRegexField(
        in_params='optional',  # *can* be in query params
        in_result='optional',  # *can* be in result data

        regex=r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$',
        error_msg_template=u'"{}" is not a valid MAC address',
    )

(Of course, we do not remove the lines uncommented earlier.)
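To see what the regex given to mac_address accepts, it can be exercised directly with Python’s re module. This is only a sketch of the pattern’s behavior under plain re.search() with no flags; whether UnicodeRegexField adds any flags or preprocessing is not assumed here. Note that, with plain re, lowercase hex digits are rejected, and that, because (?:[:-]|$) is repeated six times, a trailing separator (as in "00:1A:2B:3C:4D:5E:") is accepted as well.

```python
import re

# The pattern from the mac_address declaration above.
MAC_ADDRESS_REGEX = re.compile(r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$')


def looks_like_mac(value):
    return MAC_ADDRESS_REGEX.search(value) is not None


for candidate in [u"00:1A:2B:3C:4D:5E", u"00-1A-2B-3C-4D-5E",
                  u"00:1a:2b:3c:4d:5e", u"00:1A:2B:3C:4D",
                  u"00:1A:2B:3C:4D:5E:"]:
    print(repr(candidate), looks_like_mac(candidate))
```

If either quirk matters for your data, the pattern can be tightened accordingly.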

If we need to get rid of some fields inherited from DataSpec, we can simply set them to None:

class UsingN6sdkDataSpec(DataSpec):

    """
    The data specification class for the `Using_N6SDK` project.
    """

    action = None
    x509fp_sha1 = None

(Of course, we do not remove the lines uncommented and added earlier.)

See also

Please read the appropriate subsection of the next section to learn more about adding, modifying, replacing and getting rid of particular fields.

More on data specification

Note

This section of the tutorial does not need to be read from the beginning to the end. It is intended to be used as a guide to data specification and field specification classes, so please just check out the matter you are interested in.

Data specification’s cleaning methods

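The most important methods of any data specification (typically, an instance of n6sdk.data_spec.DataSpec or of its subclass) are:

  • clean_param_dict() – cleans the query parameters of one client request;
  • clean_result_dict() – cleans a single result dictionary.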

Normally, these methods are called automatically by the n6sdk machinery.

Each of these methods takes exactly one positional argument which is respectively:

  • for clean_param_dict() – a dictionary of query parameters (representing one client request); the dictionary maps field names (query parameter names) to lists of their raw values (lists – because, as it was said, for most fields there can be more than one query parameter value);
  • for clean_result_dict() – a single result dictionary (representing one network incident); the dictionary maps field names (result keys) to their raw values.

(Here “raw” is a synonym of “uncleaned”.)

Each of these methods also accepts the following optional keyword-only arguments:

  • ignored_keys – an iterable (e.g., a set or a list) of keys that will be completely ignored (i.e., the processed dictionary given as the positional argument will be treated as if it did not contain any of these keys; therefore, the resultant dictionary will not contain them either);
  • forbidden_keys – an iterable of keys that must not appear in the processed dictionary;
  • extra_required_keys – an iterable of keys that must appear in the processed dictionary;
  • discarded_keys – an iterable of keys that will be removed (discarded) after validation of the processed dictionary keys (but before cleaning the values).

If a raw value is not valid and cannot be cleaned (see below: Field specification’s cleaning methods), or any other data specification constraint is violated (including those specified with the forbidden_keys and extra_required_keys arguments mentioned above), an exception is raised: ParamKeyCleaningError or ParamValueCleaningError (for clean_param_dict()), or ResultKeyCleaningError or ResultValueCleaningError (for clean_result_dict()).

Otherwise, a new dictionary is returned (the input dictionary given as the positional argument is not modified). Regarding returned dictionaries:

  • a dictionary returned by clean_param_dict() maps field names (query parameter names) to lists of cleaned query parameter values;
  • a dictionary returned by clean_result_dict() (containing cleaned data of exactly one network incident) maps field names (result keys) to cleaned result values.
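The order in which the four keyword arguments take effect can be illustrated with a standalone Python 3 sketch. clean_dict_sketch() and the two exception classes are hypothetical, written only from the description above; they are not the n6sdk implementation, which handles many more details.

```python
class KeyCleaningErrorSketch(ValueError):
    """Hypothetical stand-in for the *KeyCleaningError exceptions."""


class ValueCleaningErrorSketch(ValueError):
    """Hypothetical stand-in for the *ValueCleaningError exceptions."""


def clean_dict_sketch(raw, legal_keys, required_keys, cleaners, *,
                      ignored_keys=(), forbidden_keys=(),
                      extra_required_keys=(), discarded_keys=()):
    # 1. Ignored keys: behave as if `raw` never contained them.
    ignored = set(ignored_keys)
    data = {key: value for key, value in raw.items() if key not in ignored}

    # 2. Key validation: forbidden, illegal and missing keys are errors.
    present = set(data)
    forbidden = present & set(forbidden_keys)
    illegal = present - set(legal_keys)
    missing = (set(required_keys) | set(extra_required_keys)) - present
    if forbidden or illegal or missing:
        raise KeyCleaningErrorSketch(
            "forbidden: {}, illegal: {}, missing: {}".format(
                sorted(forbidden), sorted(illegal), sorted(missing)))

    # 3. Discarded keys: validated above, dropped before value cleaning.
    for key in discarded_keys:
        data.pop(key, None)

    # 4. Value cleaning (`cleaners` must cover every remaining key);
    #    the result is a new dict and `raw` stays untouched.
    cleaned = {}
    for key, value in data.items():
        try:
            cleaned[key] = cleaners[key](value)
        except (TypeError, ValueError) as exc:
            raise ValueCleaningErrorSketch("{}: {}".format(key, exc))
    return cleaned


raw_params = {"category": [u" Bots "], "debug": [u"1"]}
print(clean_dict_sketch(
    raw_params,
    legal_keys={"category", "ip"},
    required_keys=(),
    cleaners={"category": lambda values: [v.strip().lower() for v in values]},
    ignored_keys={"debug"},
))
```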

Field specification’s cleaning methods

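The most important methods of any field (an instance of n6sdk.data_spec.fields.Field or of its subclass) are:

  • clean_param_value() – cleans a single query parameter value;
  • clean_result_value() – cleans a single result value.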

Each of these methods takes exactly one positional argument: a single uncleaned (raw) value.

Each of these methods returns a single value: a cleaned one.

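These methods are called by the data specification machinery in the following way:

  • clean_param_value() – called by clean_param_dict() for each raw value in each list of query parameter values;
  • clean_result_value() – called by clean_result_dict() for each raw value in a result dictionary.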
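Assuming that the two field methods are named clean_param_value() and clean_result_value(), the calling pattern can be sketched in standalone Python 3: parameters map each key to a list of raw values (one call per value), whereas a result dictionary maps each key to a single raw value (one call per key). LowerCaseTextFieldSketch and both helper functions are hypothetical and not part of n6sdk.

```python
class LowerCaseTextFieldSketch(object):
    """Hypothetical field class (not part of n6sdk) with the two methods."""

    def clean_param_value(self, value):
        # Param values are text taken from the URL's query string.
        return value.strip().lower()

    def clean_result_value(self, value):
        # Result values come from the backend, possibly as bytes.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        return value.strip().lower()


FIELDS = {"category": LowerCaseTextFieldSketch()}


def clean_params_with_fields(params):
    # Params map each key to a *list* of raw values: one call per value.
    return {key: [FIELDS[key].clean_param_value(v) for v in values]
            for key, values in params.items()}


def clean_result_with_fields(result):
    # A result dict maps each key to a single raw value: one call per key.
    return {key: FIELDS[key].clean_result_value(value)
            for key, value in result.items()}


print(clean_params_with_fields({"category": [u" Bots", u"PHISH "]}))
print(clean_result_with_fields({"category": b"Bots"}))
```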

Overview of the basic data specification classes

The n6sdk.data_spec.DataSpec and n6sdk.data_spec.AllSearchableDataSpec classes are two variants of a base class for your own data specification class.

Each of them defines all standard n6-like REST API fields – but:

  • DataSpec – has all query parameters disabled. This makes the class suitable for most n6sdk uses: in your subclass of DataSpec you will need to enable (typically, with a <field name> = Ext(in_params='optional') declaration) only those query parameters that your data backend supports.
  • AllSearchableDataSpec – has all query parameters enabled. This makes the class suitable for cases when your data backend supports all or most of the standard n6 query parameters. In your subclass of AllSearchableDataSpec you will need to disable (typically, with a <field name> = Ext(in_params=None) declaration) those query parameters that your data backend does not support.

The following list briefly describes all field specifications defined in these two classes.

  • basic event data fields:

    • id:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: required
      • field class: UnicodeLimitedField
      • specific field constructor arguments: max_length=64
      • param/result cleaning example:
        • raw value: "abcDEF... \xc5\x81"
        • cleaned value: u"abcDEF... \u0141"

      Unique incident identifier being an arbitrary text. Maximum length: 64 characters (after cleaning).
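The id cleaning example above (raw UTF-8 bytes becoming a Unicode string) and the 64-character limit can be mimicked as follows. clean_limited_text() is a hypothetical helper sketching the described behavior, not UnicodeLimitedField itself; note that the limit is checked on the decoded text, so it counts characters, not bytes.

```python
def clean_limited_text(value, max_length=64):
    """Hypothetical sketch of UnicodeLimitedField-like cleaning."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")  # raw byte strings assumed to be UTF-8
    if len(value) > max_length:  # counts characters of the decoded text
        raise ValueError("{} characters exceed the limit of {}".format(
            len(value), max_length))
    return value


cleaned = clean_limited_text(b"abcDEF... \xc5\x81")
print(cleaned == u"abcDEF... \u0141")  # True
```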

    • source:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: required
      • field class: SourceField
      • param/result cleaning example:
        • raw value: "some-org.some-type"
        • cleaned value: u"some-org.some-type"

      Incident data source identifier. Consists of two parts separated with a dot (.). Allowed characters (apart from the dot) are: ASCII lower-case letters, digits and hyphen (-). Maximum length: 32 characters (after cleaning).
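The source rules above translate into a regular expression plus a length check. The following is a sketch only: the pattern and clean_source() are hypothetical, and treating both parts as non-empty is an assumption, not a statement about SourceField’s exact implementation.

```python
import re

# Assumed pattern: two non-empty parts of [a-z0-9-], joined by one dot.
SOURCE_REGEX = re.compile(r'\A[a-z0-9-]+\.[a-z0-9-]+\Z')


def clean_source(value, max_length=32):
    """Hypothetical sketch of SourceField-like validation."""
    if isinstance(value, bytes):
        value = value.decode("ascii")
    if len(value) > max_length or not SOURCE_REGEX.match(value):
        raise ValueError("{!r} is not a valid source".format(value))
    return value


print(clean_source(b"some-org.some-type"))
```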

    • restriction:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: required
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.RESTRICTION_ENUMS
      • param/result cleaning example:
        • raw value: "public"
        • cleaned value: u"public"

      Data distribution restriction qualifier. One of: "public", "need-to-know" or "internal".

    • confidence:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: required
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.CONFIDENCE_ENUMS
      • param/result cleaning example:
        • raw value: "medium"
        • cleaned value: u"medium"

      Data confidence qualifier. One of: "high", "medium" or "low".
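restriction and confidence (like category and, among the black list fields, status) are enumeration fields. A minimal sketch of such cleaning follows. The value tuples are copied from the descriptions above, whereas real code should rely on the n6sdk.data_spec constants named there; clean_enum() is hypothetical, and case-sensitive matching is an assumption.

```python
# Values copied from the descriptions above; real code should use the
# n6sdk.data_spec.RESTRICTION_ENUMS / CONFIDENCE_ENUMS constants instead.
RESTRICTION_VALUES = (u"public", u"need-to-know", u"internal")
CONFIDENCE_VALUES = (u"high", u"medium", u"low")


def clean_enum(value, enum_values):
    """Hypothetical sketch of UnicodeEnumField-like cleaning."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if value not in enum_values:  # case-sensitive (an assumption)
        raise ValueError("{!r} is not one of: {}".format(
            value, ", ".join(enum_values)))
    return value


print(clean_enum(b"public", RESTRICTION_VALUES))
print(clean_enum(u"medium", CONFIDENCE_VALUES))
```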

    • category:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: required
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.CATEGORY_ENUMS
      • param/result cleaning example:
        • raw value: "bots"
        • cleaned value: u"bots"

      Incident category label (some examples: "bots", "phish", "scanning"...).

    • time:

      • in params: N/A
      • in result: required
      • field class: DateTimeField
      • result cleaning examples:
        • example synonymous raw values:
          • "2014-11-05T23:13:00.000000" or
          • "2014-11-06 01:13+02:00" or
          • datetime.datetime(2014, 11, 5, 23, 13, 0) or
          • datetime.datetime(2014, 11, 6, 1, 13, 0, 0, <tzinfo with UTC offset 2h>)
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      Incident occurrence time (not when-entered-into-the-database). Value cleaning includes conversion to UTC time.
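The four synonymous raw values above can be reduced to one naive UTC datetime roughly as follows. This is a sketch assuming Python 3.7+: datetime.fromisoformat() stands in for whatever ISO-8601 parser DateTimeField really uses (it accepts the formats in these examples, but not every ISO-8601 variant), and clean_datetime() is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def clean_datetime(value):
    """Return a naive datetime in UTC (hypothetical sketch, Python 3.7+)."""
    if isinstance(value, bytes):
        value = value.decode("ascii")
    if not isinstance(value, datetime):
        # fromisoformat() handles the formats used in the examples above.
        value = datetime.fromisoformat(value)
    if value.tzinfo is not None:
        # Aware datetime: convert to UTC, then drop the tzinfo.
        value = value.astimezone(timezone.utc).replace(tzinfo=None)
    return value  # naive datetimes are assumed to be in UTC already


plus_two_hours = timezone(timedelta(hours=2))
examples = [
    "2014-11-05T23:13:00.000000",
    "2014-11-06 01:13+02:00",
    datetime(2014, 11, 5, 23, 13, 0),
    datetime(2014, 11, 6, 1, 13, 0, 0, plus_two_hours),
]
print(all(clean_datetime(v) == datetime(2014, 11, 5, 23, 13) for v in examples))
```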

    • time.min:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • "2014-11-06T01:13+02:00" or
          • u"2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The earliest time the queried incidents occurred at. Value cleaning includes conversion to UTC time.

    • time.max:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-06T01:13+02:00" or
          • "2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The latest time the queried incidents occurred at. Value cleaning includes conversion to UTC time.

    • time.until:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-06T01:13+02:00" or
          • "2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The time before which the queried incidents occurred (i.e., exclusive; a handy replacement for time.max in some cases). Value cleaning includes conversion to UTC time.
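A data backend has to apply these three parameters with the right comparison operators: time.min and time.max are inclusive (>= and <=), time.until is exclusive (<). The following is a hypothetical backend-side sketch; a real backend would typically translate the same conditions into its database query. It also assumes that each of these single_param fields still appears in the cleaned parameter dictionary as a one-element list.

```python
from datetime import datetime


def time_matches(event_time, params):
    """Hypothetical backend-side check of the time.* query parameters."""
    # Cleaned params map names to lists; these single_param fields are
    # assumed to yield one-element lists.
    if "time.min" in params and event_time < params["time.min"][0]:
        return False  # time.min is inclusive (>=)
    if "time.max" in params and event_time > params["time.max"][0]:
        return False  # time.max is inclusive (<=)
    if "time.until" in params and event_time >= params["time.until"][0]:
        return False  # time.until is exclusive (<)
    return True


boundary = datetime(2014, 11, 5, 23, 13)
print(time_matches(boundary, {"time.max": [boundary]}))    # True
print(time_matches(boundary, {"time.until": [boundary]}))  # False
```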

  • address-related fields:

    • address:

      • in params: N/A
      • in result: optional
      • field class: ExtendedAddressField
      • result cleaning examples:
        • example synonymous raw values:
          • [{"ipv6": "::1"}, {"ip": "123.10.234.169", "asn": 999998}] or
          • [{u"ipv6": "::0001"}, {"ip": "123.10.234.169", u"asn": "999998"}] or
          • [{"ipv6": "0000:0000::0001"}, {u"ip": "123.10.234.169", u"asn": "15.16958"}]
        • cleaned value: [{u"ipv6": u"::1"}, {u"ip": u"123.10.234.169", u"asn": 999998}]

      Set of network addresses related to the returned incident (e.g., for malicious web sites: taken from DNS A or AAAA records; for sinkhole/scanning: communication source addresses) – in the form of a list of dictionaries, each containing:

      • obligatorily:

        • either "ip" (IPv4 address in quad-dotted decimal notation, cleaned using a subfield being an instance of IPv4Field)
        • or "ipv6" (IPv6 address in the standard text representation, cleaned using a subfield being an instance of IPv6Field)

        – but not both "ip" and "ipv6";

      • plus optionally – all or some of:

        • "asn" (autonomous system number in the form of a number or two numbers separated with a dot, cleaned using a subfield being an instance of ASNField),
        • "cc" (two-letter country code, cleaned using a subfield being an instance of CCField),
        • "dir" (the indicator of the address role in terms of the direction of the network flow in layers 3 or 4; one of: "src", "dst"; cleaned using a subfield being an instance of DirField),
        • "rdns" (the domain name from the PTR record of the in-addr.arpa domain associated with the IP address, without the trailing dot; cleaned using a subfield being an instance of DomainNameField).

      Note

      Cleaned IPv6 addresses are in the “condensed” form, in contrast to the “exploded” form used for param cleaning of ipv6 and ipv6.net.
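The structural rules for a single address item (exactly one of "ip" and "ipv6", plus optional extra keys) and the condensed IPv6 form can be sketched with Python’s standard ipaddress module. clean_address_item() is hypothetical and deliberately leaves asn, cc, dir and rdns uncleaned; in n6sdk those are handled by the subfields listed above.

```python
import ipaddress

ADDRESS_ITEM_KEYS = {"ip", "ipv6", "asn", "cc", "dir", "rdns"}


def clean_address_item(item):
    """Hypothetical sketch: check one `address` dict, condense its IPv6."""
    item = dict(item)  # work on a copy, leave the input untouched
    unknown = set(item) - ADDRESS_ITEM_KEYS
    if unknown:
        raise ValueError("unexpected keys: {}".format(sorted(unknown)))
    if ("ip" in item) == ("ipv6" in item):
        raise ValueError('exactly one of "ip" and "ipv6" is required')
    if "ip" in item:
        item["ip"] = str(ipaddress.IPv4Address(item["ip"]))
    else:
        # Result cleaning uses the condensed (compressed) IPv6 form.
        item["ipv6"] = ipaddress.IPv6Address(item["ipv6"]).compressed
    # asn, cc, dir and rdns would be cleaned by their own subfields (omitted).
    return item


print(clean_address_item({"ipv6": "0000:0000::0001"}))
print(clean_address_item({"ip": "123.10.234.169", "asn": 999998}))
```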

    • ip:

      IPv4 address (in quad-dotted decimal notation) related to the queried incidents.

    • ip.net:

      IPv4 network (in CIDR notation) containing IP addresses related to the queried incidents.

    • ipv6:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: N/A
      • field class: IPv6Field
      • param cleaning examples:
        • example synonymous raw values:
          • u"abcd::1" or
          • "ABCD::1" or
          • u"ABCD:0000:0000:0000:0000:0000:0000:0001" or
          • "abcd:0000:0000:0000:0000:0000:0000:0001"
        • cleaned value: u"abcd:0000:0000:0000:0000:0000:0000:0001"

      IPv6 address (in the standard text representation) related to the queried incidents.

      Note

      Cleaned values are in the “exploded” form – in contrast to the “condensed” form used for result cleaning of address.

    • ipv6.net:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: N/A
      • field class: IPv6NetField
      • param cleaning examples:
        • example synonymous raw values:
          • "abcd::1/128" or
          • u"ABCD::1/128" or
          • "ABCD:0000:0000:0000:0000:0000:0000:0001/128" or
          • u"abcd:0000:0000:0000:0000:0000:0000:0001/128"
        • cleaned value: (u"abcd:0000:0000:0000:0000:0000:0000:0001", 128)

      IPv6 network (in CIDR notation) containing IPv6 addresses related to the queried incidents.

      Note

      The address part of each cleaned value is in the “exploded” form – in contrast to the “condensed” form used for result cleaning of address.
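The exploded form used for the ipv6 and ipv6.net params corresponds to the exploded attribute of Python’s ipaddress.IPv6Address. The following is a hypothetical sketch reproducing the cleaned values shown above; it keeps the address part as given and does not zero any host bits, since whether IPv6NetField does so is not assumed here.

```python
import ipaddress


def clean_ipv6_param(value):
    """Hypothetical sketch: param cleaning keeps the exploded form."""
    return ipaddress.IPv6Address(value).exploded


def clean_ipv6_net_param(value):
    """Hypothetical sketch: "ADDR/PREFIX" -> (exploded address, prefix)."""
    address, sep, prefix = value.partition("/")
    if not sep or not prefix.isdigit() or int(prefix) > 128:
        raise ValueError("{!r} is not a valid IPv6 network".format(value))
    return clean_ipv6_param(address), int(prefix)


print(clean_ipv6_param(u"ABCD::1"))
print(clean_ipv6_net_param(u"abcd::1/128"))
```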

    • asn:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: N/A
      • field class: ASNField
      • param cleaning examples:
        • example synonymous raw values:
          • u"999998" or
          • u"15.16958"
        • cleaned value: 999998

      Autonomous system number of IP addresses related to the queried incidents; in the form of a number or two numbers separated with a dot (see the examples above).
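The dotted form is the “asdot” notation (RFC 5396): the part before the dot holds the high 16 bits and the part after it the low 16 bits, so "15.16958" means 15 × 65536 + 16958 = 999998. A hypothetical sketch of that conversion follows; rejecting values outside the 32-bit range is an assumption about ASNField, not a documented fact.

```python
def clean_asn(value):
    """Convert "N" or "HIGH.LOW" (asdot) notation to one integer (sketch)."""
    if isinstance(value, int):
        number = value
    else:
        text = value.decode("ascii") if isinstance(value, bytes) else value
        high, sep, low = text.partition(".")
        if sep:
            high, low = int(high), int(low)
            if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
                raise ValueError("{!r}: each part must fit in 16 bits".format(text))
            number = (high << 16) | low
        else:
            number = int(text)
    if not 0 <= number <= 0xFFFFFFFF:  # assumed 32-bit ASN range
        raise ValueError("ASN {} is out of range".format(number))
    return number


print(clean_asn(u"15.16958"))
```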

    • cc:

      Two-letter country code of the IP addresses related to the queried incidents.

  • fields related to black list events:

    • expires:

      • in params: N/A
      • in result: optional
      • field class: DateTimeField
      • result cleaning examples:
        • example synonymous raw values:
          • "2014-11-05T23:13:00.000000" or
          • "2014-11-06 01:13+02:00" or
          • datetime.datetime(2014, 11, 5, 23, 13, 0) or
          • datetime.datetime(2014, 11, 6, 1, 13, 0, 0, <tzinfo with UTC offset 2h>)
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      Black list item expiry time. Value cleaning includes conversion to UTC time.

    • active.min:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • "2014-11-05T23:13:00.000000" or
          • "2014-11-06 01:13+02:00"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The earliest expiry-or-occurrence time of the queried black list items. Value cleaning includes conversion to UTC time.

    • active.max:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-05T23:13:00.000000" or
          • u"2014-11-06 01:13+02:00"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The latest expiry-or-occurrence time of the queried black list items. Value cleaning includes conversion to UTC time.

    • active.until:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-06T01:13+02:00" or
          • "2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The time before which the queried black list items expired or occurred (i.e., exclusive; a handy replacement for active.max in some cases). Value cleaning includes conversion to UTC time.

    • replaces:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeLimitedField
      • specific field constructor arguments: max_length=64
      • param/result cleaning example:
        • raw value: "abcDEF"
        • cleaned value: u"abcDEF"

      id of the black list item replaced by the queried/returned one. Maximum length: 64 characters (after cleaning).

    • status:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.STATUS_ENUMS
      • param/result cleaning example:
        • raw value: "active"
        • cleaned value: u"active"

      Black list item status qualifier. One of: "active" (item currently in the list), "delisted" (item removed from the list), "expired" (item expired, so treated as removed by the n6 system) or "replaced" (e.g.: IP address changed for the same URL).

  • fields related to aggregated (high frequency) events:

    • count:

      • in params: N/A
      • in result: optional
      • field class: IntegerField
      • specific field constructor arguments: min_value=0, max_value=32767
      • result cleaning examples:
        • example synonymous raw values: 42 or 42.0 or "42"
        • cleaned value: 42

      Number of events represented by the returned incident data record. It must be a non-negative integer number not greater than 32767.

    • until:

      • in params: N/A
      • in result: optional
      • field class: DateTimeField
      • result cleaning examples:
        • example synonymous raw values:
          • "2014-11-05T23:13:00.000000" or
          • "2014-11-06 01:13+02:00" or
          • datetime.datetime(2014, 11, 5, 23, 13, 0) or
          • datetime.datetime(2014, 11, 6, 1, 13, 0, 0, <tzinfo with UTC offset 2h>)
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The occurrence time of the latest [newest] aggregated event represented by the returned incident data record (note: time is the occurrence time of the first [oldest] aggregated event). Value cleaning includes conversion to UTC time.

  • the rest of the standard n6 fields:

    • action:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeLimitedField
      • specific field constructor arguments: max_length=32
      • param/result cleaning example:
        • raw value: "Some Text"
        • cleaned value: u"Some Text"

      Action taken by malware (e.g. "redirect", "screen grab"...). Maximum length: 32 characters (after cleaning).

    • adip:

      • in params: N/A
      • in result: optional
      • field class: AnonymizedIPv4Field
      • result cleaning example:
        • raw value: "x.X.234.168"
        • cleaned value: u"x.x.234.168"

      Anonymized destination IPv4 address: in quad-dotted decimal notation, with one or more segments replaced with "x", for example: "x.168.0.1" or "x.x.x.1" (note: at least the leftmost segment must be replaced with "x").

    • dip:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: IPv4Field
      • param/result cleaning example:
        • raw value: "123.10.234.168"
        • cleaned value: u"123.10.234.168"

      Destination IPv4 address (for sinkhole, honeypot etc.; does not apply to malicious web sites) in quad-dotted decimal notation.

    • dport:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: PortField
      • param cleaning example:
        • raw value: "80"
        • cleaned value: 80
      • result cleaning examples:
        • example synonymous raw values: 80 or 80.0 or u"80"
        • cleaned value: 80

      TCP/UDP destination port (non-negative integer number, less than 65536).

    • email:

      E-mail address associated with the threat (e.g. source of spam, victim of a data leak).

    • fqdn:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: DomainNameField
      • param/result cleaning examples:
        • example synonymous raw values:
          • u"WWW.ŁÓDKA.ORG.EXAMPLE" or
          • "WWW.\xc5\x81\xc3\x93DKA.ORG.EXAMPLE" or
          • u"wwW.łódka.org.Example" or
          • "www.\xc5\x82\xc3\xb3dka.org.Example" or
          • u"www.xn--dka-fna80b.org.example" or
          • "www.xn--dka-fna80b.org.example"
        • cleaned value: u"www.xn--dka-fna80b.org.example"

      Fully qualified domain name related to the queried/returned incidents (e.g., for malicious web sites: from the site’s URL; for sinkhole/scanning: the domain used for communication). Maximum length: 255 characters (after cleaning).

      Note

      During cleaning, the IDNA encoding is applied (see: https://docs.python.org/2.7/library/codecs.html#module-encodings.idna and http://en.wikipedia.org/wiki/Internationalized_domain_name; see also the above examples), then all remaining upper-case letters are converted to lower-case.

    • fqdn.sub:

      Substring of fully qualified domain names related to the queried incidents. Maximum length: 255 characters (after cleaning).

      See also

      See the above fqdn description.

    • iban:

      International Bank Account Number associated with fraudulent activity.

    • injects:

      List of dictionaries containing data that describe a set of injects performed by banking trojans when a user loads a targeted website. (Exact structure of the dictionaries is dependent on malware family and not specified at this time.)

    • md5:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: MD5Field
      • param/result cleaning example:
        • raw value: "b555773768bc1a672947d7f41f9c247f"
        • cleaned value: u"b555773768bc1a672947d7f41f9c247f"

      MD5 hash of the binary file related to the (queried/returned) incident. In the form of a string of 32 hexadecimal digits.

    • modified:

      • in params: N/A
      • in result: optional
      • field class: DateTimeField
      • result cleaning examples:
        • example synonymous raw values:
          • "2014-11-05T23:13:00.000000" or
          • "2014-11-06 01:13+02:00" or
          • datetime.datetime(2014, 11, 5, 23, 13, 0) or
          • datetime.datetime(2014, 11, 6, 1, 13, 0, 0, <tzinfo with UTC offset 2h>)
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The time when the incident data was made available through the API or modified. Value cleaning includes conversion to UTC time.

    • modified.min:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • "2014-11-06T01:13+02:00" or
          • u"2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The earliest time the queried incidents were made available through the API or modified at. Value cleaning includes conversion to UTC time.

    • modified.max:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-06T01:13+02:00" or
          • "2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The latest time the queried incidents were made available through the API or modified at. Value cleaning includes conversion to UTC time.

    • modified.until:

      • in params: optional in AllSearchableDataSpec, None in DataSpec, marked as single_param in both
      • in result: N/A
      • field class: DateTimeField
      • param cleaning examples:
        • example synonymous raw values:
          • u"2014-11-06T01:13+02:00" or
          • "2014-11-05 23:13:00.000000"
        • cleaned value: datetime.datetime(2014, 11, 5, 23, 13, 0)

      The time the queried incidents were made available through the API or modified before (i.e., exclusive; a handy replacement for modified.max in some cases). Value cleaning includes conversion to UTC time.

    • name:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeLimitedField
      • specific field constructor arguments: max_length=255
      • param/result cleaning example:
        • raw value: "LoremIpsuM"
        • cleaned value: u"LoremIpsuM"

      Threat’s exact name, such as "virut", "Potential SSH Scan" or any other... Maximum length: 255 characters (after cleaning).

    • origin:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.ORIGIN_ENUMS
      • param/result cleaning example:
        • raw value: "honeypot"
        • cleaned value: u"honeypot"

      Incident origin label (some examples: "p2p-crawler", "sinkhole", "honeypot"...).

    • phone:

      Telephone number (national or international). Maximum length: 20 characters (after cleaning).

    • proto:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeEnumField
      • specific field constructor arguments: enum_values=n6sdk.data_spec.PROTO_ENUMS
      • param/result cleaning example:
        • raw value: "tcp"
        • cleaned value: u"tcp"

      Layer #4 protocol label – one of: "tcp", "udp", "icmp".

    • registrar:

      Name of the domain registrar. Maximum length: 100 characters (after cleaning).

    • sha1:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: SHA1Field
      • param/result cleaning example:
        • raw value: u"7362d67c4f32ba5cd9096dcefc81b28ca04465b1"
        • cleaned value: u"7362d67c4f32ba5cd9096dcefc81b28ca04465b1"

      SHA-1 hash of the binary file related to the (queried/returned) incident. In the form of a string of 40 hexadecimal digits.

    • sport:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: PortField
      • param cleaning example:
        • raw value: u"80"
        • cleaned value: 80
      • result cleaning examples:
        • example synonymous raw values: 80 or 80.0 or "80"
        • cleaned value: 80

      TCP/UDP source port (non-negative integer number, less than 65536).

    • target:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: UnicodeLimitedField
      • specific field constructor arguments: max_length=100
      • param/result cleaning example:
        • raw value: "LoremIpsuM"
        • cleaned value: u"LoremIpsuM"

      Name of phishing target (organization, brand etc.). Maximum length: 100 characters (after cleaning).

    • url:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: URLField
      • param/result cleaning examples:
        • example synonymous raw values:
          • "ftp://example.com/non-utf8-\xdd" or
          • u"ftp://example.com/non-utf8-\udcdd" or
          • "ftp://example.com/non-utf8-\xed\xb3\x9d"
        • cleaned value: u"ftp://example.com/non-utf8-\udcdd"

      URL related to the queried/returned incidents. Maximum length: 2048 characters (after cleaning).

      Note

      Cleaning involves decoding byte strings using the surrogateescape error handler backported from Python 3.x (see: n6sdk.encoding_helpers.provide_surrogateescape()).

    • url.sub:

      Substring of URLs related to the queried incidents. Maximum length: 2048 characters (after cleaning).

      See also

      See the above url description.

    • url_pattern:

      Wildcard pattern or regular expression triggering injects used by banking trojans. Maximum length: 255 characters (after cleaning).

    • username:

      Local identifier (login) of the affected user. Maximum length: 64 characters (after cleaning).

    • x509fp_sha1:

      • in params: optional in AllSearchableDataSpec, None in DataSpec
      • in result: optional
      • field class: SHA1Field
      • param/result cleaning example:
        • raw value: u"7362d67c4f32ba5cd9096dcefc81b28ca04465b1"
        • cleaned value: u"7362d67c4f32ba5cd9096dcefc81b28ca04465b1"

      SHA-1 fingerprint of an SSL certificate. In the form of a string of 40 hexadecimal digits.
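All the DateTimeField-based fields listed above (until, modified, and the *.min, *.max and *.until query parameters) follow the same normalization rule: a value carrying a UTC offset is converted to UTC, and the cleaned value is a naive datetime. The following is a minimal standalone sketch of that rule, not n6sdk's own code (n6sdk provides n6sdk.datetime_helpers.parse_iso_datetime_to_utc, which is used later in this tutorial). The function name clean_datetime is hypothetical, values without an offset are assumed to be UTC already (as the examples above suggest), and datetime.fromisoformat() requires Python 3.7 or newer.

```python
import datetime

UTC = datetime.timezone.utc


def clean_datetime(value):
    """Hypothetical helper: return a naive datetime expressed in UTC.

    `value` may be an ISO-8601-like string or a datetime instance;
    values without a UTC offset are assumed to be in UTC already.
    """
    if isinstance(value, datetime.datetime):
        dt = value
    else:
        # Python 3.7+: accepts e.g. "2014-11-06 01:13+02:00"
        # or "2014-11-05T23:13:00.000000"
        dt = datetime.datetime.fromisoformat(value)
    if dt.tzinfo is not None:
        # convert to UTC, then drop tzinfo to get a naive UTC datetime
        dt = dt.astimezone(UTC).replace(tzinfo=None)
    return dt


print(clean_datetime("2014-11-06 01:13+02:00"))  # 2014-11-05 23:13:00
```

With values normalized this way, a *.until bound becomes a plain exclusive < comparison against the stored timestamp, while a *.max bound is an inclusive <=.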

Note

Generally, byte strings (if any), when converted to Unicode strings, are – by default – decoded using the utf-8 encoding.
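The fqdn cleaning rule from the Note in the fqdn description (IDNA encoding, then lower-casing of the remaining upper-case letters), combined with the default utf-8 decoding of byte strings, can be approximated with Python's built-in idna codec. This is a sketch assuming that n6sdk's DomainNameField behaves like that codec; clean_domain_name and MAX_DOMAIN_LENGTH are hypothetical names, and the code targets Python 3.

```python
MAX_DOMAIN_LENGTH = 255


def clean_domain_name(value, encoding='utf-8'):
    """Approximate DomainNameField-style cleaning (hypothetical helper)."""
    if isinstance(value, bytes):
        # byte strings are decoded first (utf-8 by default)
        value = value.decode(encoding)
    # the 'idna' codec applies nameprep (which includes case folding) and
    # Punycode only to non-ASCII labels; pure-ASCII labels pass unchanged
    ascii_form = value.encode('idna').decode('ascii')
    # ...so upper-case letters left in ASCII labels are lower-cased here
    cleaned = ascii_form.lower()
    if len(cleaned) > MAX_DOMAIN_LENGTH:
        raise ValueError('domain name longer than {} characters'.format(
            MAX_DOMAIN_LENGTH))
    return cleaned


print(clean_domain_name(u"WWW.ŁÓDKA.ORG.EXAMPLE"))  # www.xn--dka-fna80b.org.example
```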

Adding, modifying, replacing and getting rid of particular fields...

As you already know, typically you create your own data specification class by subclassing n6sdk.data_spec.DataSpec or, alternatively, n6sdk.data_spec.AllSearchableDataSpec.

For variety’s sake, this time we will subclass AllSearchableDataSpec (it has all relevant fields marked as legal query parameters).

Let us prepare a temporary module for our experiments:

$ cd <the workbench directory>/Using_N6SDK/using_n6sdk
$ touch experimental_data_spec.py

Then, we can open the newly created file (experimental_data_spec.py) with our favorite text editor and place the following code in it:

from n6sdk.data_spec import AllSearchableDataSpec
from n6sdk.data_spec.fields import UnicodeEnumField

class ExperimentalDataSpec(AllSearchableDataSpec):

    weekday = UnicodeEnumField(
        in_result='optional',
        enum_values=(
            'Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday',
        ),
    )

We just made a new data specification class – very similar to AllSearchableDataSpec but with one additional field specification: weekday.

We could also modify (extend) within our subclass some of the field specifications inherited from AllSearchableDataSpec. For example:

from n6sdk.data_spec import (
    AllSearchableDataSpec,
    Ext,
)

class ExperimentalDataSpec(AllSearchableDataSpec):
    # ...

    id = Ext(
        # here: changing the `max_length` property
        # of the `id` field -- from 64 to 32
        max_length=32,
    )
    time = Ext(
        # here: enabling bare `time` as a query parameter
        # (in AllSearchableDataSpec, by default, the `time.min`,
        # `time.max`, `time.until` query params are enabled but
        # bare `time` is not)
        in_params='optional',

        # here: making `time.min` a required query parameter
        # (*required* -- that is: a client *must* specify it
        # or they will get HTTP-400)
        extra_params=Ext(
            min=Ext(in_params='required'),
        ),
    )

Please note how n6sdk.data_spec.Ext is used above to extend existing (inherited) field specifications (see also: the Your first data specification class section).

It is also possible to replace existing (inherited) field specifications with completely new definitions...

# ...
from n6sdk.data_spec.fields import MD5Field
# ...

class ExperimentalDataSpec(AllSearchableDataSpec):
    # ...
    id = MD5Field(
        in_params='optional',
        in_result='required',
    )
    # ...

...as well as to remove (mask) them:

# ...
class ExperimentalDataSpec(AllSearchableDataSpec):
    # ...
    count = None

You can also extend the clean_param_dict() and/or clean_result_dict() method:

import datetime

# ...

def _is_april_fools_day():
    now = datetime.datetime.utcnow()
    return now.month == 4 and now.day == 1


class ExperimentalDataSpec(AllSearchableDataSpec):

    def clean_param_dict(self, params, ignored_keys=(), **kwargs):
        if _is_april_fools_day():
            ignored_keys = set(ignored_keys) | {'joke'}
        return super(ExperimentalDataSpec, self).clean_param_dict(
            params,
            ignored_keys=ignored_keys,
            **kwargs)

    def clean_result_dict(self, result, **kwargs):
        if _is_april_fools_day():
            result['time'] = '1810-03-01T13:13'
        return super(ExperimentalDataSpec, self).clean_result_dict(
            result,
            **kwargs)

Note

Manipulating the optional keyword-only arguments of these methods (ignored_keys, forbidden_keys, extra_required_keys, discarded_keys – see above: Data specification’s cleaning methods) can be useful, for example, when you need to implement authentication-driven data anonymization or access rules based on param/result keys. In such a case, however, you may also need to add further keyword-only arguments (e.g. auth_data) to the signatures of these methods, and then extend the get_clean_param_dict_kwargs() and/or get_clean_result_dict_kwargs() methods of your custom subclass of DefaultStreamViewBase. Generally, that matter is beyond the scope of this tutorial.

Standard field specification classes

The following list briefly describes all field classes defined in the n6sdk.data_spec.fields module:

  • Field:

    The top-level base class for field specifications.

  • DateTimeField:

    For date-and-time (timestamp) values, automatically normalized to UTC.

  • UnicodeField:

    • base classes: Field
    • most useful constructor arguments or subclass attributes:
      • encoding (default: "utf-8")
      • decode_error_handling (default: "strict")
      • disallow_empty (default: False)
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"Some text value. Zażółć gęślą jaźń."

    For arbitrary text data.

  • HexDigestField:

    • base classes: UnicodeField
    • obligatory constructor arguments or subclass attributes:
      • num_of_characters (exact number of characters)
      • hash_algo_descr (hash algorithm label, such as "MD5" or "SHA256"...)
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode

    For hexadecimal digests (hashes), such as MD5, SHA256 or any other...

  • MD5Field:

    • base classes: HexDigestField
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"b555773768bc1a672947d7f41f9c247f"

    For hexadecimal MD5 digests (hashes).

  • SHA1Field:

    • base classes: HexDigestField
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"7362d67c4f32ba5cd9096dcefc81b28ca04465b1"

    For hexadecimal SHA-1 digests (hashes).

  • UnicodeEnumField:

    • base classes: UnicodeField
    • obligatory constructor arguments or subclass attributes:
      • enum_values (a sequence or set of strings)
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"Some selected text value"

    For text data limited to a finite set of possible values.

  • UnicodeLimitedField:

    • base classes: UnicodeField
    • obligatory constructor arguments or subclass attributes:
      • max_length (maximum number of characters)
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"Some not-too-long text value"

    For text data with limited length.

  • UnicodeRegexField:

    • base classes: UnicodeField
    • obligatory constructor arguments or subclass attributes:
      • regex (regular expression – as a string or compiled regular expression object)
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"Some matching text value"

    For text data limited by the specified regular expression.

  • SourceField:

    For dot-separated source specifications, such as organization.type.

  • IPv4Field:

    For IPv4 addresses (in decimal dotted-quad notation).

  • IPv6Field:

    • base classes: UnicodeField
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned values:
      • cleaned param value: u"abcd:0000:0000:0000:0000:0000:0000:0001" [note the “exploded” form]
      • cleaned result value: u"abcd::1" [note the “condensed” form]

    For IPv6 addresses (in the standard text representation).

  • AnonymizedIPv4Field:

    For anonymized IPv4 addresses (in decimal dotted-quad notation, with the leftmost octet – and possibly any other octets – replaced with "x").

  • IPv4NetField:

    • base classes: UnicodeLimitedField, UnicodeRegexField
    • raw (uncleaned) result value type: str/unicode or 2-tuple: (<str/unicode>, <int>)
    • cleaned value types:
      • of cleaned param values: 2-tuple: (<unicode>, <int>)
      • of cleaned result values: unicode
    • example cleaned values:
      • cleaned param value: (u"123.10.0.0", 16)
      • cleaned result value: u"123.10.0.0/16"

    For IPv4 network specifications (in CIDR notation).

  • IPv6NetField:

    • base classes: UnicodeField
    • raw (uncleaned) result value type: str/unicode or 2-tuple: (<str/unicode>, <int>)
    • cleaned value types:
      • of cleaned param values: 2-tuple: (<unicode>, <int>)
      • of cleaned result values: unicode
    • example cleaned values:
      • cleaned param value: (u"abcd:0000:0000:0000:0000:0000:0000:0001", 128) [note the “exploded” form of the address part]
      • cleaned result value: u"abcd::1/128" [note the “condensed” form of the address part]

    For IPv6 network specifications (in CIDR notation).

  • CCField:

    For 2-letter country codes.

  • URLSubstringField:

    • base classes: UnicodeLimitedField
    • most useful constructor arguments or subclass attributes:
      • decode_error_handling (default: 'surrogateescape')
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"/xyz.example.c"

    For substrings of URLs.

  • URLField:

    • base classes: URLSubstringField
    • most useful constructor arguments or subclass attributes:
      • decode_error_handling (default: 'surrogateescape')
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"http://xyz.example.com/path?query=foo#bar"

    For URLs.

  • DomainNameSubstringField:

    • base classes: UnicodeLimitedField
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • example cleaned value: u"xample.or"

    For substrings of domain names, automatically IDNA-encoded and lower-cased.

  • DomainNameField:

    For domain names, automatically IDNA-encoded and lower-cased.

  • EmailSimplifiedField:

    For e-mail addresses (validation is rather rough).

  • IBANSimplifiedField:

    For International Bank Account Numbers.

  • IntegerField:

    • base classes: Field
    • most useful constructor arguments or subclass attributes:
      • min_value (optional minimum value)
      • max_value (optional maximum value)
    • raw (uncleaned) result value type: str/unicode or an integer number of any numeric type
    • cleaned value type: int or (for bigger numbers) long
    • example cleaned value: 42

    For integer numbers (optionally with minimum/maximum limits defined).

  • ASNField:

    • base classes: IntegerField
    • raw (uncleaned) result value type: str/unicode or int/long
    • cleaned value type: int or (possibly, for bigger numbers) long
    • example cleaned value: 123456789

    For autonomous system numbers, such as 12345 or 123456789, or 12345.65432.

  • PortField:

    • base classes: IntegerField
    • raw (uncleaned) result value type: str/unicode or an integer number of any numeric type
    • cleaned value type: int
    • example cleaned value: 12345

    For TCP/UDP port numbers.

  • ResultListFieldMixin:

    • base classes: Field
    • most useful constructor arguments or subclass attributes:
      • allow_empty (default: False which means that an empty sequence causes a cleaning error)

    A mix-in class for fields whose result values are supposed to be sequences of values rather than single values. Its clean_result_value() checks that its argument is a non-string sequence (a list, a tuple or any other collections.Sequence that is neither str nor unicode) and performs result cleaning (as defined in a superclass) for each of its items.

    See also

    See the ListOfDictsField description below.

  • DictResultField:

    • base classes: Field
    • most useful constructor arguments or subclass attributes:
      • key_to_subfield_factory (None or a dictionary that maps subfield names to field classes or field factory functions)
    • raw (uncleaned) result value type: collections.Mapping
    • cleaned value type: dict

    A base class for fields whose result values are supposed to be dictionaries (their structure can be constrained by specifying the key_to_subfield_factory property, described above).

    Note

    This is a result-only field class, i.e. its clean_param_value() raises TypeError.

    See also

    See the ListOfDictsField description below.

  • AddressField:

    For lists of dictionaries – each containing "ip" and optionally "cc" and/or "asn".

  • DirField:

    • base classes: UnicodeEnumField
    • raw (uncleaned) result value type: str or unicode
    • cleaned value type: unicode
    • the only possible cleaned values: u"src" or u"dst"

    For dir values in items cleaned by ExtendedAddressField instances (dir marks the role of the address in terms of the direction of the network flow in layers 3 or 4).
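The “exploded” versus “condensed” forms mentioned for IPv6Field and IPv6NetField can be reproduced with the standard ipaddress module (Python 3; for Python 2.7 a backport of the same name is available on PyPI). The helper names below are hypothetical and the sketch is not n6sdk's implementation; it only shows the param/result asymmetry: cleaned params use the exploded address (and a 2-tuple for networks), cleaned results use the condensed form (and a single "address/prefix" string for networks).

```python
import ipaddress


def clean_ipv6_param(value):
    """Param-style cleaning: the "exploded" (full, zero-padded) form."""
    return ipaddress.IPv6Address(value).exploded


def clean_ipv6_result(value):
    """Result-style cleaning: the "condensed" (compressed) form."""
    return ipaddress.IPv6Address(value).compressed


def _split_ipv6_net(value):
    # accept either "address/prefix" or an (address, prefix) 2-tuple
    if isinstance(value, tuple):
        address, prefix_len = value
    else:
        address, sep, prefix = value.partition('/')
        if not sep:
            raise ValueError('{!r}: missing "/<prefix length>"'.format(value))
        prefix_len = int(prefix)
    if not 0 <= prefix_len <= 128:
        raise ValueError('{!r}: prefix length out of range'.format(value))
    return address, prefix_len


def clean_ipv6_net_param(value):
    address, prefix_len = _split_ipv6_net(value)
    return clean_ipv6_param(address), prefix_len


def clean_ipv6_net_result(value):
    address, prefix_len = _split_ipv6_net(value)
    return '{}/{}'.format(clean_ipv6_result(address), prefix_len)


print(clean_ipv6_net_param("abcd::1/128"))
print(clean_ipv6_net_result(("abcd:0000:0000:0000:0000:0000:0000:0001", 128)))
```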

Note

Generally –

  • constructor arguments, when specified, must be provided as keyword arguments;
  • “constructor argument or a subclass attribute” means that a certain field property can be specified in two alternative ways: either when creating a field instance (using a keyword argument for the constructor) or when subclassing the field class (using an attribute of the subclass; see below: Custom field specification classes);
  • raw (uncleaned) parameter value type is always str/unicode;
  • all these classes are cooperative-inheritance-friendly (i.e., super() in subclasses’ clean_param_value() and clean_result_value() will work properly, also with multiple inheritance).
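The cooperative-inheritance point, together with the ResultListFieldMixin description above, can be illustrated with a self-contained toy example. The Toy* classes are hypothetical stand-ins written for this sketch (they are not n6sdk classes): the mixin validates the outer sequence and then passes each item to the next class in the MRO via super(), so combining it with an integer field yields a list-of-integers field.

```python
class ToyField(object):
    """Hypothetical base class: the end of the cooperative super() chain."""

    def __init__(self, **kwargs):
        # constructor arguments override class (or subclass) attributes
        for name, value in kwargs.items():
            setattr(self, name, value)

    def clean_result_value(self, value):
        return value


class ToyIntegerField(ToyField):

    min_value = None
    max_value = None

    def clean_result_value(self, value):
        value = super(ToyIntegerField, self).clean_result_value(value)
        if isinstance(value, float) and not value.is_integer():
            raise ValueError('{!r} is not an integer number'.format(value))
        value = int(value)  # accepts e.g. 42, 42.0 or "42"
        if self.min_value is not None and value < self.min_value:
            raise ValueError('{} < {}'.format(value, self.min_value))
        if self.max_value is not None and value > self.max_value:
            raise ValueError('{} > {}'.format(value, self.max_value))
        return value


class ToyResultListFieldMixin(ToyField):

    allow_empty = False

    def clean_result_value(self, value):
        # simplified stand-in for the "non-string sequence" check
        if not isinstance(value, (list, tuple)):
            raise TypeError('{!r} is not a list or tuple'.format(value))
        if not value and not self.allow_empty:
            raise ValueError('empty sequence is not allowed')
        # delegate each item to the *next* class in the MRO
        item_cleaner = super(ToyResultListFieldMixin, self).clean_result_value
        return [item_cleaner(item) for item in value]


class ToyIntegerListField(ToyResultListFieldMixin, ToyIntegerField):
    """MRO: this class, the mixin, ToyIntegerField, ToyField, object."""


field = ToyIntegerListField(min_value=0, max_value=32767)
print(field.clean_result_value([42, 42.0, "42"]))  # [42, 42, 42]
```

Because the mixin comes first in the base-class list, its clean_result_value() runs first; with the bases swapped, ToyIntegerField.clean_result_value() would run first and end up calling int() on the whole list, which fails.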

Custom field specification classes

You may want to subclass any of the n6sdk field classes (described above, in Standard field specification classes).

Consider the beginning of our <the workbench directory>/Using_N6SDK/using_n6sdk/data_spec.py file:

from n6sdk.data_spec import DataSpec, Ext
from n6sdk.data_spec.fields import UnicodeRegexField


class UsingN6sdkDataSpec(DataSpec):

    """
    The data specification class for the `Using_N6SDK` project.
    """

    mac_address = UnicodeRegexField(
        in_params='optional',  # *can* be in query params
        in_result='optional',  # *can* be in result data

        regex=r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$',
        error_msg_template=u'"{}" is not a valid MAC address',
    )

It can be rewritten in a more self-documenting and code-reusability-friendly way:

from n6sdk.data_spec import DataSpec, Ext
from n6sdk.data_spec.fields import UnicodeRegexField


class MacAddressField(UnicodeRegexField):

    regex = r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$'
    error_msg_template = u'"{}" is not a valid MAC address'


class UsingN6sdkDataSpec(DataSpec):

    """
    The data specification class for the `Using_N6SDK` project.
    """

    mac_address = MacAddressField(
        in_params='optional',  # *can* be in query params
        in_result='optional',  # *can* be in result data
    )
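Because MacAddressField only sets regex and error_msg_template, what it accepts is decided entirely by the regular expression. A quick standalone check with Python's re module (independent of n6sdk; looks_like_mac_address is a hypothetical helper) shows that the pattern, as written, accepts only upper-case hex digits and does not require consistent separators; if lower-case addresses should be valid too, the character class would need to be widened (e.g. to [0-9A-Fa-f]).

```python
import re

# The same pattern as the one used by MacAddressField above
MAC_ADDRESS_REGEX = re.compile(r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$')


def looks_like_mac_address(value):
    """Return True if `value` matches the MacAddressField pattern."""
    return MAC_ADDRESS_REGEX.search(value) is not None


# expected: True, True, True, False, False
for candidate in ('00:11:22:33:44:55', '00-1A-2B-3C-4D-5E',
                  '00:11-22:33-44:55', '00:1a:2b:3c:4d:5e',
                  '00:11:22:33:44'):
    print(candidate, looks_like_mac_address(candidate))
```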

Another technique – extending the value cleaning methods (see above: Field specification’s cleaning methods) – offers more possibilities. For example, we could create an integer number field that accepts parameter values with such suffixes as "m" (meters), "kg" (kilograms) and "s" (seconds), ignoring the suffixes:

from n6sdk.data_spec.fields import IntegerField

class SuffixedIntegerField(IntegerField):

    # the `legal_suffixes` class attribute we create here
    # can be overridden with a `legal_suffixes` constructor
    # argument or a `legal_suffixes` subclass attribute
    legal_suffixes = 'm', 'kg', 's'

    def clean_param_value(self, value):
        """
        >>> SuffixedIntegerField().clean_param_value('123 kg')
        123
        """
        value = value.strip()
        for suffix in self.legal_suffixes:
            if value.endswith(suffix):
                value = value[:(-len(suffix))]
                break
        value = super(SuffixedIntegerField,
                      self).clean_param_value(value)
        return value

If – in your implementation of clean_param_value() or clean_result_value() – you need to raise a cleaning error (to signal that a value is invalid and cannot be cleaned), just raise an instance of the standard Exception class (or of any of its subclasses); it can (but does not have to) be n6sdk.exceptions.FieldValueError.
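SuffixedIntegerField depends on n6sdk, so here is a standalone variant for experimenting. StubIntegerField is a hypothetical, heavily simplified stand-in for n6sdk's IntegerField (it only converts with int() and checks the optional limits) that also mimics the “constructor argument or subclass attribute” convention; its cleaning errors are plain ValueError instances, which is all the rule above requires.

```python
class StubIntegerField(object):
    """Hypothetical stand-in for n6sdk's IntegerField (NOT the real class)."""

    min_value = None
    max_value = None

    def __init__(self, **kwargs):
        # constructor arguments override class (or subclass) attributes
        for name, value in kwargs.items():
            setattr(self, name, value)

    def clean_param_value(self, value):
        value = int(value)  # raises ValueError for non-numeric text
        if self.min_value is not None and value < self.min_value:
            raise ValueError('{} is less than {}'.format(value, self.min_value))
        if self.max_value is not None and value > self.max_value:
            raise ValueError('{} is greater than {}'.format(value, self.max_value))
        return value


class SuffixedIntegerField(StubIntegerField):

    legal_suffixes = 'm', 'kg', 's'

    def clean_param_value(self, value):
        value = value.strip()
        for suffix in self.legal_suffixes:
            if value.endswith(suffix):
                value = value[:-len(suffix)]
                break
        # any remaining garbage makes int() (in the base class) raise ValueError
        return super(SuffixedIntegerField, self).clean_param_value(value)


print(SuffixedIntegerField().clean_param_value('123 kg'))  # 123
```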

When subclassing n6sdk field classes, please do not be afraid to look into the source code of the n6sdk.data_spec.fields module.

Implementing the data backend API

The interface

Network incident data can be stored in various ways: in text files, in an SQL database, in some distributed storage such as Hadoop, etc. How to obtain data from any particular backend is beyond the scope of this document. What we are concerned with here is the API that the n6sdk machinery uses to get the data.

Therefore, for the purposes of this tutorial, we will assume that our network incident data is stored in the simplest possible way: in one file, in the JSON format. You will have to replace any implementation details related to this particular way of keeping data and querying for data with an implementation appropriate for the data store you use (file reads, SQL queries or whatever is needed for the particular storage backend) – see the next section: Guidelines for the real implementation.

First, we will create the example JSON data file:

$ cat << EOF > /tmp/our-data.json
     [
       {
         "id": "1",
         "address": [
           {
             "ip": "11.22.33.44"
           },
           {
             "asn": 12345,
             "cc": "US",
             "ip": "123.124.125.126"
           }
         ],
         "category": "phish",
         "confidence": "low",
         "mac_address": "00:11:22:33:44:55",
         "restriction": "public",
         "source": "test.first",
         "time": "2015-04-01 10:00:00",
         "url": "http://example.com/?spam=ham"
       },
       {
         "id": "2",
         "adip": "x.2.3.4",
         "category": "server-exploit",
         "confidence": "medium",
         "restriction": "need-to-know",
         "source": "test.first",
         "time": "2015-04-01 23:59:59"
       },
       {
         "id": "3",
         "address": [
           {
             "ip": "11.22.33.44"
           },
           {
             "asn": 87654321,
             "cc": "PL",
             "ip": "111.122.133.144"
           }
         ],
         "category": "server-exploit",
         "confidence": "high",
         "restriction": "public",
         "source": "test.second",
         "time": "2015-04-01 23:59:59",
         "url": "http://example.com/?spam=ham"
       }
     ]
EOF

Then, we need to open the file <the workbench directory>/Using_N6SDK/using_n6sdk/data_backend_api.py with our favorite text editor and modify it so that it contains the following code (however, it is recommended not to remove the comments and docstrings the file already contains – as they can be valuable hints for future code maintainers):

import json

from n6sdk.class_helpers import singleton
from n6sdk.datetime_helpers import parse_iso_datetime_to_utc
from n6sdk.exceptions import AuthorizationError


@singleton
class DataBackendAPI(object):

    def __init__(self, settings):
        ## [...existing docstring + comments...]
        # Implementation for our example JSON-file-based "storage":
        with open(settings['json_data_file_path']) as f:
            self.data = json.load(f)

    ## [...existing comments...]

    def generate_incidents(self, auth_data, params):
        ## [...existing docstring + comments...]
        # This is a naive implementation for our example
        # JSON-file-based "storage" (some efficient database
        # query needs to be performed instead, in case of any
        # real-world implementation...):
        for incident in self.data:
            for key, value_list in params.items():
                if key == 'ip':
                    address_seq = incident.get('address', [])
                    if not any(addr.get(key) in value_list
                               for addr in address_seq):
                        break   # incident does not match the query params
                elif key in ('time.min', 'time.max', 'time.until'):
                    [param_val] = value_list  # must be exactly one value
                    db_val = parse_iso_datetime_to_utc(incident['time'])
                    if not ((key == 'time.min' and db_val >= param_val) or
                            (key == 'time.max' and db_val <= param_val) or
                            (key == 'time.until' and db_val < param_val)):
                        break   # incident does not match the query params
                elif incident.get(key) not in value_list:
                    break       # incident does not match the query params
            else:
                # (the inner for loop has not been broken)
                yield incident  # incident *matches* the query params

What is important:

  1. The constructor of the class is supposed to be called exactly once per application run. The constructor must take exactly one argument:

    • settings – a dictionary containing settings from the *.ini file (e.g., development.ini or production.ini).
  2. The class can have one or more data query methods, with arbitrary names (in the above example there is only one: generate_incidents(); to learn how URLs are mapped to particular data query method names – see below: Gluing it together).

    Each data query method must take two positional arguments:

    • auth_data – authentication data, relevant only if you need to implement in your data query methods some kind of authorization based on the authentication data; its type and format depend on the authentication policy you use (see below: Custom authentication policy);
    • params – a dictionary containing cleaned (validated and normalized with clean_param_dict()) client query parameters; the dictionary maps parameter names (strings) to lists of parameter values (see above: Data specification class).
  3. Each data query method must be a generator (see: https://docs.python.org/2/glossary.html#term-generator) or any other callable that returns an iterator (see: https://docs.python.org/2/glossary.html#term-iterator). Each of the generated items should be a dictionary containing the data of one network incident (the n6sdk machinery will use it as the argument for the clean_result_dict() data specification method).
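The three rules above can be checked against a minimal, self-contained sketch. It deliberately avoids importing n6sdk so that it runs on its own; the json_data setting (incidents serialized as a JSON string), the authorization rule and the local AuthorizationError stand-in are all hypothetical. Real code would raise n6sdk.exceptions.AuthorizationError and would normally read a file path or database settings instead.

```python
import json


class AuthorizationError(Exception):
    """Local stand-in for n6sdk.exceptions.AuthorizationError (assumption:
    defined here only so that this sketch runs without n6sdk installed)."""


class InMemoryDataBackendAPI(object):

    def __init__(self, settings):
        # Rule 1: exactly one argument, the [app:main] settings dict.
        # 'json_data' is a hypothetical setting holding a JSON list.
        self.data = json.loads(settings['json_data'])

    def generate_incidents(self, auth_data, params):
        # Rules 2 and 3: two positional arguments; being a generator
        # function, the method returns an iterator of incident dicts.
        anonymous = auth_data is None
        # Hypothetical rule: anonymous clients may not explicitly ask
        # for non-public incidents...
        if anonymous and any(value != 'public'
                             for value in params.get('restriction', [])):
            raise AuthorizationError('only public incidents are available')
        for incident in self.data:
            # ...and never receive them implicitly either.
            if anonymous and incident.get('restriction') != 'public':
                continue
            # `params` maps each name to a *list* of accepted values.
            if all(incident.get(key) in values
                   for key, values in params.items()):
                yield incident
```

Because generate_incidents() is a generator function, the AuthorizationError is raised when the caller starts iterating over the result, not at the moment the method is called.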

Guidelines for the real implementation

Typically, the following activities are performed in the __init__() method of the data backend API class:

  1. Get the storage backend settings from the settings dictionary (appropriate items should have been placed in the [app:main] section of the *.ini file – see below: Gluing it together).
  2. Configure the storage backend (for example, create the database connection).
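The two steps above might look as follows for a storage backend reachable through Python's standard sqlite3 module. This is only a sketch: sqlite_db_path and sqlite_timeout are hypothetical setting names that you would add to the [app:main] section yourself. Values read from the *.ini file arrive as strings, so numeric options need an explicit conversion. With a multi-threaded server you would also need to consider how a connection is shared between threads (for sqlite3, see its check_same_thread argument).

```python
import sqlite3


class SQLiteDataBackendAPI(object):

    def __init__(self, settings):
        # Step 1: get the storage backend settings (hypothetical keys).
        db_path = settings['sqlite_db_path']
        # Values read from the *.ini file are strings: convert them.
        self.timeout = float(settings.get('sqlite_timeout', '5'))
        # Step 2: configure the storage backend, i.e. open the connection.
        self.connection = sqlite3.connect(db_path, timeout=self.timeout)
```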

Typically, the following activities are performed in a data query method of the data backend API class:

  1. If needed: do any authorization checks based on the auth_data and params arguments; raise n6sdk.exceptions.AuthorizationError on failure.

  2. Translate the contents of the params argument to some storage-specific queries. (Obviously, when doing the translation you may need, for example, to map params keys to some storage-specific keys...).

    Note

    If the data specification includes dotted “extra params” (such as time.min, time.max, time.until, fqdn.sub, ip.net etc.) their semantics should be implemented carefully.

  3. If needed: perform any necessary storage-specific maintenance activity (e.g., renew the database connection).

  4. Perform a storage-specific query (or queries).

    If you want to limit the number of results, raise n6sdk.exceptions.TooMuchDataError when the limit is exceeded.

  5. Translate the results of the storage-specific query (queries) to result dictionaries and yield each of these dictionaries (each of them should be a dictionary ready to be passed to the clean_result_dict() method of data specification).

    (Obviously, when doing the translation, you may need, for example, to map some storage-specific keys to the result keys accepted by the clean_result_dict() method of your data specification class...)

    If there are no results – just do not yield any items (the caller will obtain an empty iterator).
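The steps above can be sketched for an SQL database, again using the standard sqlite3 module. Everything here is an assumption made for illustration: the incident table layout, the PARAM_TO_SQL mapping, times stored as UTC text, and the TooMuchData stand-in (real code would raise n6sdk.exceptions.TooMuchDataError). The sketch keeps the semantics of the dotted time params used in the JSON-based example: time.min and time.max are inclusive bounds, time.until is exclusive.

```python
import sqlite3
from datetime import datetime

# Hypothetical mapping: cleaned param name -> (storage column, SQL operator).
PARAM_TO_SQL = {
    'category': ('category', 'IN'),
    'source': ('source', 'IN'),
    'time.min': ('time', '>='),   # inclusive lower bound
    'time.max': ('time', '<='),   # inclusive upper bound
    'time.until': ('time', '<'),  # exclusive upper bound
}


class TooMuchData(Exception):
    """Local stand-in for n6sdk.exceptions.TooMuchDataError."""


def to_storage_value(value):
    # Times are assumed to be stored as 'YYYY-MM-DD HH:MM:SS' text (UTC),
    # which sorts chronologically when compared as strings.
    if isinstance(value, datetime):
        return value.strftime('%Y-%m-%d %H:%M:%S')
    return value


def build_query(params):
    """Step 2: translate cleaned params into SQL text + bound arguments."""
    conditions, args = [], []
    for key, values in sorted(params.items()):
        column, operator = PARAM_TO_SQL[key]
        if operator == 'IN':
            placeholders = ', '.join('?' * len(values))
            conditions.append('{0} IN ({1})'.format(column, placeholders))
            args.extend(to_storage_value(v) for v in values)
        else:
            [value] = values  # range params carry exactly one value
            conditions.append('{0} {1} ?'.format(column, operator))
            args.append(to_storage_value(value))
    where = ' AND '.join(conditions) if conditions else '1'
    sql = 'SELECT id, category, source, time FROM incident WHERE ' + where
    return sql, args


def query_incidents(connection, params, max_results=1000):
    """Steps 4 and 5: run the query and yield result dicts."""
    sql, args = build_query(params)
    # Ask for one row more than allowed, to detect an exceeded limit.
    rows = connection.execute(sql + ' LIMIT ?',
                              args + [max_results + 1]).fetchall()
    if len(rows) > max_results:
        raise TooMuchData('more than {0} results'.format(max_results))
    for incident_id, category, source, time in rows:
        # Map storage columns to result keys (identical names here).
        yield {'id': incident_id, 'category': category,
               'source': source, 'time': time}
```

Client values are passed as bound parameters (the ? placeholders) rather than pasted into the SQL text, so they cannot alter the query structure. A data query method could then simply return query_incidents(self.connection, params).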

In case of an internal error, do not be afraid to raise an exception: any instance of Exception (or of a subclass of it) will be handled automatically by the n6sdk machinery. It will be logged (including the traceback) using the n6sdk.pyramid_commons logger and transformed into pyramid.httpexceptions.HTTPServerError, which will break the generation of the HTTP response body. Note, however, that there will be no HTTP-500 response, because it is not possible to send such an “error response” once some parts of the body of the “data response” have already been sent out.
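Why there can be no HTTP-500 response is easiest to see in a deliberately simplified model of a streaming response. This is not n6sdk's actual rendering code, and all names below are hypothetical: the body is produced lazily from the data query generator, so by the time the generator fails, the response headers and the first chunks have already gone out.

```python
import json


def render_json_stream(incidents):
    """Simplified streaming renderer: yields the body piece by piece."""
    yield '[\n'
    for i, incident in enumerate(incidents):
        yield (',\n' if i else '') + json.dumps(incident, sort_keys=True)
    yield '\n]'


def failing_backend():
    """A data query generator that hits an internal error midway."""
    yield {'id': '1'}
    raise RuntimeError('database connection lost')


def send_body(chunks):
    """Collect what a server would already have written out before an
    error interrupts the body; returns (written_chunks, exception)."""
    written = []
    try:
        for chunk in chunks:
            written.append(chunk)  # in a real server: already on the wire
    except Exception as exc:
        return written, exc
    return written, None
```

With failing_backend(), send_body() returns only the opening bracket and the first incident: the client receives a truncated body rather than an error status.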

It is recommended to decorate your data backend API class with the n6sdk.class_helpers.singleton() decorator (it ensures that the class is instantiated only once; any attempt to repeat that causes TypeError).
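The behaviour described for singleton() can be illustrated with a hypothetical re-implementation. This is not the library's actual source; use n6sdk.class_helpers.singleton in real code.

```python
def singleton(cls):
    """Allow at most one successful instantiation of the decorated class."""
    original_init = cls.__init__
    state = {'instantiated': False}  # dict: works without `nonlocal` (Py2)

    def __init__(self, *args, **kwargs):
        if state['instantiated']:
            raise TypeError(
                '{0} can be instantiated only once'.format(cls.__name__))
        original_init(self, *args, **kwargs)
        # Mark only after the original constructor has succeeded.
        state['instantiated'] = True

    cls.__init__ = __init__
    return cls


@singleton
class DataBackendAPI(object):
    def __init__(self, settings):
        self.settings = settings
```

In this sketch the flag is set only after the original constructor succeeds, so a failed start-up (e.g. a missing settings key) does not block a later attempt; whether n6sdk's decorator behaves the same way in that corner case is not stated here.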

Custom authentication policy

A description of the concept of Pyramid authentication policies is beyond the scope of this tutorial. Unless you need something more sophisticated than the dummy AnonymousAuthenticationPolicy, you can skip to the next chapter of this tutorial.

Otherwise, please read the appropriate portion and example from the documentation of the Pyramid library: http://docs.pylonsproject.org/projects/pyramid/en/1.5-branch/narr/security.html#creating-your-own-authentication-policy (you may also want to search the Pyramid documentation for the term authentication policy) as well as the following paragraphs.

The n6sdk library requires that the authentication policy class has an additional static method (decorated with staticmethod()), get_auth_data(), that takes exactly one positional argument: a Pyramid request object. The method is expected to return a value that is not None in case of authentication success, and None otherwise. Apart from this simple rule, there are no constraints on what exactly the return value should be: the implementer decides. The return value will be available as the auth_data attribute of the Pyramid request, and it will also be passed to data backend API methods as the auth_data argument.

Typically, the authenticated_userid() method implementation makes use of the request’s auth_data attribute (being the return value of get_auth_data()), and the get_auth_data() implementation makes some use of the request’s unauthenticated_userid attribute (being the return value of the unauthenticated_userid() policy method). This is possible because get_auth_data() is called (by the Pyramid machinery) after the unauthenticated_userid() method and before the authenticated_userid() method.

The n6sdk library provides n6sdk.pyramid_commons.BaseAuthenticationPolicy – an authentication policy base class that makes it easier to implement your own authentication policies. Please consult its source code.
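The call order described above can be sketched with a hypothetical API-key policy. The X-Api-Key header and the API_KEYS table are assumptions made for illustration; the class is standalone so that it runs without Pyramid, and a complete Pyramid policy must also implement effective_principals(), remember() and forget().

```python
class ApiKeyAuthenticationPolicy(object):
    """Sketch of a policy providing get_auth_data().

    Assumptions (hypothetical): clients send an `X-Api-Key` header and
    API_KEYS maps valid keys to organization ids.
    """

    API_KEYS = {'example-secret-key': 'example.org'}

    def unauthenticated_userid(self, request):
        # Claimed (not yet verified) identity: the raw API key.
        return request.headers.get('X-Api-Key')

    @staticmethod
    def get_auth_data(request):
        # Called after unauthenticated_userid(), so its result is
        # already available as request.unauthenticated_userid.
        org_id = ApiKeyAuthenticationPolicy.API_KEYS.get(
            request.unauthenticated_userid)
        if org_id is None:
            return None            # authentication failure
        return {'org_id': org_id}  # becomes request.auth_data

    def authenticated_userid(self, request):
        # Called after get_auth_data(): reuse its result.
        auth_data = request.auth_data
        if auth_data is None:
            return None
        return auth_data['org_id']
```

An instance of such a policy would be passed to ConfigHelper as the authentication_policy argument, in place of AnonymousAuthenticationPolicy().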

Gluing it together

We can inspect the __init__.py file of our application (<the workbench directory>/Using_N6SDK/using_n6sdk/__init__.py) with our favorite text editor. It contains a lot of useful comments that suggest how to customize the code; with those comments omitted, the actual Python code is:

from n6sdk.pyramid_commons import (
    AnonymousAuthenticationPolicy,
    ConfigHelper,
    HttpResource,
)

from .data_spec import UsingN6sdkDataSpec
from .data_backend_api import DataBackendAPI


# (this is how we map URLs to particular data query methods...)
RESOURCES = [
    HttpResource(
        resource_id='/incidents',
        url_pattern='/incidents.{renderer}',
        renderers=('json', 'sjson'),
        data_spec=UsingN6sdkDataSpec(),
        data_backend_api_method='generate_incidents',
    ),
]


def main(global_config, **settings):
    helper = ConfigHelper(
        settings=settings,
        data_backend_api_class=DataBackendAPI,
        authentication_policy=AnonymousAuthenticationPolicy(),
        resources=RESOURCES,
    )
    return helper.make_wsgi_app()

(Given the explanations in the previous parts of this tutorial, this boilerplate code should be fairly self-explanatory. If it is not, please consult the comments in the actual <the workbench directory>/Using_N6SDK/using_n6sdk/__init__.py file.)

Now, yet another important step needs to be completed: customization of the settings in the <the workbench directory>/Using_N6SDK/*.ini files: development.ini and production.ini – to match the environment, database configuration (if any) etc.

Warning

You should not place any sensitive settings (such as real database passwords) in these files, as they are still just configuration templates (which you will want, for example, to add to your version control system) and not real configuration files for production.

In case of our naive JSON-file-based data backend implementation (see above: The interface) we need to add the following line in the [app:main] section of each of the two settings files (development.ini and production.ini):

json_data_file_path = /tmp/our-data.json

Finally, let us run the application (still in the development environment):

$ cd <the workbench directory>
$ source dev-venv/bin/activate   # ensuring the virtualenv is active
$ pserve Using_N6SDK/development.ini

Our application should now be running. Try visiting the following URLs (with any web browser or, for example, with the wget command-line tool):

  • http://127.0.0.1:6543/incidents.json
  • http://127.0.0.1:6543/incidents.json?ip=11.22.33.44
  • http://127.0.0.1:6543/incidents.json?ip=11.22.33.44&time.min=2015-04-01T23:00:00
  • http://127.0.0.1:6543/incidents.json?category=phish
  • http://127.0.0.1:6543/incidents.json?category=server-exploit
  • http://127.0.0.1:6543/incidents.json?category=server-exploit&ip=11.22.33.44
  • http://127.0.0.1:6543/incidents.json?category=bots&category=server-exploit
  • http://127.0.0.1:6543/incidents.json?category=bots,dos-attacker,phish,server-exploit
  • http://127.0.0.1:6543/incidents.sjson?mac_address=00:11:22:33:44:55
  • http://127.0.0.1:6543/incidents.sjson?source=test.first
  • http://127.0.0.1:6543/incidents.sjson?source=test.second
  • http://127.0.0.1:6543/incidents.sjson?source=some.non-existent
  • http://127.0.0.1:6543/incidents.sjson?source=some.non-existent&source=test.second
  • http://127.0.0.1:6543/incidents.sjson?time.min=2015-04-01T23:00
  • http://127.0.0.1:6543/incidents.sjson?time.max=2015-04-01T23:59:59&confidence=medium,low
  • http://127.0.0.1:6543/incidents.sjson?time.until=2015-04-01T23:59:59

...as well as those causing (expected) errors:

  • http://127.0.0.1:6543/incidents
  • http://127.0.0.1:6543/incidents.jsonnn
  • http://127.0.0.1:6543/incidents.json?some-illegal-key=1&another-one=foo
  • http://127.0.0.1:6543/incidents.json?category=wrong
  • http://127.0.0.1:6543/incidents.json?category=bots,wrong
  • http://127.0.0.1:6543/incidents.json?category=bots&category=wrong
  • http://127.0.0.1:6543/incidents.json?ip=11.22.33.44.55
  • http://127.0.0.1:6543/incidents.sjson?mac_address=00:11:123456:33:44:55
  • http://127.0.0.1:6543/incidents.sjson?time.min=2015-04-01T23:00,2015-04-01T23:30
  • http://127.0.0.1:6543/incidents.sjson?time.min=2015-04-01T23:00&time.min=2015-04-01T23:30
  • http://127.0.0.1:6543/incidents.sjson?time.min=blablabla
  • http://127.0.0.1:6543/incidents.sjson?time.max=blablabla&ip=11.22.33.444
  • http://127.0.0.1:6543/incidents.sjson?time.until=2015-04-01T23:59:59&ip=11.22.33.444
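Instead of visiting the URLs one by one, they can be checked with a small script. Below is a sketch using only the standard library; build_url, get_status and smoke_test are hypothetical helper names, and 127.0.0.1:6543 is the address used above. Query parameters are given as (name, value) pairs so that a name may be repeated, and urlencode() percent-encodes characters such as ':' (the server decodes them again).

```python
try:  # Python 3
    from urllib.error import HTTPError, URLError
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urllib2 import HTTPError, URLError, urlopen

BASE_URL = 'http://127.0.0.1:6543'


def build_url(path, query=()):
    """`query` is a sequence of (name, value) pairs, so a name may repeat
    (e.g. category=bots&category=server-exploit)."""
    url = BASE_URL + path
    query = list(query)
    if query:
        url += '?' + urlencode(query)
    return url


def get_status(url):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        response = urlopen(url)
    except HTTPError as exc:   # error statuses (4xx/5xx) end up here
        return exc.code
    except URLError:           # e.g. connection refused
        return None
    try:
        return response.getcode()
    finally:
        response.close()


def smoke_test(urls):
    for url in urls:
        print('{0}  {1}'.format(get_status(url), url))
```

For example, while pserve is running, smoke_test([build_url('/incidents.json', [('ip', '11.22.33.44')]), build_url('/incidents.json', [('category', 'wrong')])]) prints one status code per URL: 200 for URLs from the first list, an error status for those from the second.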

Now, it may be a good idea to try the helper script for automated basic API testing.

Installation for production (using Apache server)

Warning

The content of this chapter is intended to be a brief and rough explanation of how you can glue the relevant configuration pieces together. It is not intended to be used as a step-by-step recipe for a secure production configuration. The final configuration (including, but not limited to, file access permissions) should always be carefully reviewed by an experienced system administrator BEFORE it is deployed on a publicly available server.

Prerequisites are similar to those concerning the development environment, listed near the beginning of this tutorial. The same applies to the way of obtaining the source code of n6sdk.

See also

See the sections: Prerequisites and Obtaining the n6sdk source code.

The Debian GNU/Linux operating system, version 7.9 or newer, is recommended for following the guides presented below. An additional prerequisite is that the Apache2 HTTP server is installed and configured together with mod_wsgi (the apache2 and libapache2-mod-wsgi Debian packages).

First, we will create a directory structure and a virtualenv for our server, e.g. under /opt:

$ sudo mkdir /opt/myn6-srv
$ cd /opt/myn6-srv
$ sudo virtualenv prod-venv
$ sudo chown -R "$USER" prod-venv
$ source prod-venv/bin/activate

Then, let us install the necessary packages:

$ cd <the workbench directory>/n6sdk
$ python setup.py install
$ cd <the workbench directory>/Using_N6SDK
$ python setup.py install

(Of course, <the workbench directory>/n6sdk needs to be replaced with the actual name (absolute path) of the directory containing the source code of the n6sdk library; and <the workbench directory>/Using_N6SDK needs to be replaced with the actual name (absolute path) of the directory containing the source code of our n6sdk-based project.)

Now, we will copy the template of the configuration file for production:

$ cd /opt/myn6-srv
$ sudo cp <the workbench directory>/Using_N6SDK/production.ini ./

For security’s sake, let us restrict access to the production.ini file before we place any real passwords and other sensitive settings in it:

$ sudo chown root ./production.ini
$ sudo chmod 600 ./production.ini

We need to ensure that Apache’s user group has read-only access to the file. On Debian GNU/Linux it can be done by executing:

$ sudo chgrp www-data ./production.ini
$ sudo chmod g+r ./production.ini

You may want to customize the settings that the file contains, especially to match your production environment, database configuration etc. Just edit the /opt/myn6-srv/production.ini file.

Then, we will create the WSGI script:

$ cat << EOF > prod-venv/myn6-app.wsgi
from pyramid.paster import get_app, setup_logging
ini_path = '/opt/myn6-srv/production.ini'
setup_logging(ini_path)
application = get_app(ini_path, 'main')
EOF
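Before wiring the script into Apache, it can be useful to check that it loads at all. Below is a sketch using only the standard library: it executes the script with runpy, picks up the application object it defines (the callable name mod_wsgi looks for by default), and optionally serves it with the wsgiref reference server. The helper names, host and port are hypothetical, and wsgiref is suitable for such checks only, not for production.

```python
import runpy
from wsgiref.simple_server import make_server

WSGI_SCRIPT = '/opt/myn6-srv/prod-venv/myn6-app.wsgi'


def load_wsgi_application(script_path):
    """Execute the WSGI script and return the `application` it defines."""
    return runpy.run_path(script_path)['application']


def serve_locally(script_path=WSGI_SCRIPT, host='127.0.0.1', port=8080):
    """Serve the application with the stdlib reference server (checks only)."""
    server = make_server(host, port, load_wsgi_application(script_path))
    server.serve_forever()
```

Run it with the virtualenv's interpreter (/opt/myn6-srv/prod-venv/bin/python), so that Pyramid and our project are importable, and as a user allowed to read production.ini (given the permissions set above: root or a member of the www-data group); then visit http://127.0.0.1:8080/incidents.json.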

...and provide the directory for the egg cache:

$ sudo mkdir /opt/myn6-srv/.python-eggs

We need to ensure that Apache’s user has write access to it. On Debian GNU/Linux it can be done by executing:

$ sudo chown www-data /opt/myn6-srv/.python-eggs

Now, we need to adjust the Apache configuration. On Debian GNU/Linux it can be done by executing:

$ cat << EOF > prod-venv/myn6.apache
<VirtualHost *:80>
  # Only one Python sub-interpreter should be used
  # (multiple ones do not cooperate well with C extensions).
  WSGIApplicationGroup %{GLOBAL}

  # Remove the following line if you use native Apache authorization.
  WSGIPassAuthorization On

  WSGIDaemonProcess myn6_srv \\
    python-path=/opt/myn6-srv/prod-venv/lib/python2.7/site-packages \\
    python-eggs=/opt/myn6-srv/.python-eggs
  WSGIScriptAlias /myn6 /opt/myn6-srv/prod-venv/myn6-app.wsgi

  <Directory /opt/myn6-srv/prod-venv>
    WSGIProcessGroup myn6_srv
    Order allow,deny
    Allow from all
  </Directory>

  # Logging of errors and other events:
  ErrorLog \${APACHE_LOG_DIR}/error.log
  # Possible values for the LogLevel directive include:
  # debug, info, notice, warn, error, crit, alert, emerg.
  LogLevel warn

  # Logging of client requests:
  CustomLog \${APACHE_LOG_DIR}/access.log combined

  # It is recommended to uncomment and adjust the following line.
  #ServerAdmin webmaster@yourserver.example.com
</VirtualHost>
EOF
$ sudo chmod 640 prod-venv/myn6.apache
$ sudo chown root:root prod-venv/myn6.apache
$ sudo mv prod-venv/myn6.apache /etc/apache2/sites-available/myn6
$ cd /etc/apache2/sites-enabled
$ sudo ln -s ../sites-available/myn6 001-myn6

You may want or need to adjust the contents of the newly created file (/etc/apache2/sites-available/myn6) – especially regarding the following directives (see the comments accompanying them in the file):

  • WSGIPassAuthorization,
  • ErrorLog and LogLevel,
  • CustomLog,
  • ServerAdmin.


If we have the default Apache configuration on Debian, we need to disable the default site by removing its symbolic link (we are still in the /etc/apache2/sites-enabled directory, which is owned by root):

$ sudo rm 000-default

Finally, let us restart the Apache daemon. On Debian GNU/Linux it can be done by executing:

$ sudo service apache2 restart

Our application should now be running. Try visiting the following URL (with any web browser or, for example, with the wget command-line tool):

http://<your apache server address>/myn6/incidents.json

(Of course, <your apache server address> needs to be replaced with the actual host address of your Apache server, for example 127.0.0.1 or localhost.)