This tutorial describes how to use the n6sdk library to implement an n6-like REST API that provides access to your own network incident data source.
You need to have:
We will start with creating a directory for our example project:
$ mkdir <the main directory of the project>
(Of course, <the main directory of the project> needs to be replaced with the actual name (absolute path) of the directory you want to create.)
Now, we need to populate the directory with necessary files. We can use some of the example files accompanying the n6sdk library:
$ cd <the main N6SDK source directory>/examples/BasicExample
$ cp -a setup.py development.ini production.ini MANIFEST.in \
<the main directory of the project>/
(Of course, <the main N6SDK source directory> needs to be replaced with the actual name (absolute path) of the directory containing the source code of the n6sdk library.)
The files should be customized. At the absolute minimum, we need to replace the basic_example text with the actual name of our project’s package. In this tutorial it will be using_n6sdk (you can, of course, pick another name):
$ cd <the main directory of the project>
$ sed -i -r -e 's/basic_example/using_n6sdk/g' \
setup.py development.ini production.ini MANIFEST.in
(You may also want to customize other details in these files, especially the version field in the setup.py file.)
We also need to create the actual Python package (as mentioned earlier, in this tutorial its name will be using_n6sdk):
$ mkdir using_n6sdk
$ touch using_n6sdk/__init__.py
Now, we will create and activate our Python virtual environment:
$ virtualenv dev-venv
$ source dev-venv/bin/activate
Then, we can install the n6sdk library:
$ cd <the main N6SDK source directory>
$ python setup.py install
...as well as our new Python package – for development (please note that for this package the setup command will be develop, not install):
$ cd <the main directory of the project>
$ python setup.py develop
We can check whether everything up to now went well by running the Python interpreter...
$ python
...and trying to import some of the installed components:
>>> import n6sdk
>>> import n6sdk.data_spec.fields
>>> n6sdk.data_spec.fields.Field
<class 'n6sdk.data_spec.fields.Field'>
>>> import using_n6sdk
>>> exit()
When a client sends an HTTP request to the n6 REST API, the following data processing is performed on the server side:
Receiving the HTTP request
n6sdk uses the Pyramid library (see: http://docs.pylonsproject.org/en/latest/docs/pyramid.html) to handle HTTP communication and request routing (in particular, deciding which function shall be invoked, and with which parameters, depending on the given URL). However, n6sdk provides its own wrappers and helpers used to configure some important factors: n6sdk.pyramid_commons.DefaultStreamViewBase, n6sdk.pyramid_commons.HttpResource and n6sdk.pyramid_commons.ConfigHelper (see below: Gluing it together). These three classes can be customized by subclassing them and extending selected methods; however, that is beyond the scope of this tutorial.
Authentication
Authentication is performed using a mechanism provided by the Pyramid library: authentication policies. The simplest policy is implemented as the n6sdk.pyramid_commons.AnonymousAuthenticationPolicy class (it is a dummy policy: all clients are identified as "anonymous"); it can be replaced with a custom one (see below: Custom authentication policy).
Cleaning query parameters provided by the client
Here “cleaning” means: validation and adjustment (normalization) of the parameters. An instance of a data specification class (see below: Data specification class) is responsible for doing that.
Retrieving result data from the data backend API
The data backend API, responsible for interacting with the actual data storage, needs to be implemented as a class (see below: Implementing the data backend API).
For a client request, an appropriate method of this class is called with authentication data (see above: 2. Optional authentication) and cleaned client query parameters (see above: 3. Cleaning query parameters...) as call arguments. The result of the call is an iterator which yields dictionaries, each containing data of one network incident.
Cleaning the result data
Each of the yielded dictionaries is cleaned. Here “cleaning” means: validation and adjustment (normalization) of the result data. An instance of a data specification class (see below: Data specification class) is responsible for doing that.
The result is another iterator.
Rendering the HTTP response
The yielded cleaned dictionaries are processed to produce consecutive fragments of the HTTP response which are successively sent to the client. The key component responsible for transforming the dictionaries into the response body is a renderer. Note that n6sdk renderers (being a custom n6sdk concept, different from Pyramid renderers) are able to process data in an iterator (“stream-like”) manner, so even if the resultant response body is huge it does not have to fit as a whole in the server’s memory.
The n6sdk library provides two standard renderers: json (to render JSON-formatted responses) and sjson (to render responses in a format similar to JSON but more convenient for “stream-like” or “pipeline” data processing).
Implementing and registering custom renderers is possible, however it is beyond the scope of this tutorial.
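The exact output of the standard json and sjson renderers is defined by n6sdk itself; the following self-contained sketch (the render_stream() function and its line-per-record output format are our own illustrative assumptions, not the actual sjson format) merely shows the "stream-like" idea: each cleaned dictionary is turned into a response-body fragment as soon as it becomes available.

```python
import json

def render_stream(clean_dicts):
    # Illustrative only: yield one response-body fragment per cleaned
    # incident dictionary, so the whole response body never has to
    # reside in the server's memory at once.
    for clean_dict in clean_dicts:
        yield json.dumps(clean_dict, sort_keys=True) + '\n'

fragments = render_stream(iter([{'id': u'1'}, {'id': u'2'}]))
first_fragment = next(fragments)  # only one incident rendered so far
```

Because render_stream() is a generator, consuming its output fragment by fragment keeps memory usage constant even for huge result sets.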
A data specification determines:
The declarative way of defining a data specification is somewhat similar to the domain-specific languages known from ORMs (such as SQLAlchemy's or Django's): a data specification class (n6sdk.data_spec.DataSpec or a subclass of it) looks like an ORM "model" class, and particular query parameter and result item specifications (instances of n6sdk.data_spec.fields.Field or of its subclasses) are declared similarly to ORM "fields" or "columns".
For example, consider the following simplified and shortened version of the n6sdk.data_spec.DataSpec source code:
class DataSpec(BaseDataSpec):

    id = UnicodeLimitedField(
        in_params='optional',
        in_result='required',
        max_length=64,
    )

    time = DateTimeField(
        in_params=None,
        in_result='required',
        extra_params=dict(
            min=DateTimeField(           # `time.min`
                in_params='optional',
                single_param=True,
            ),
            max=DateTimeField(           # `time.max`
                in_params='optional',
                single_param=True,
            ),
        ),
    )

    address = AddressField(
        in_params=None,
        in_result='optional',
    )

    ip = IPv4Field(
        in_params='optional',
        in_result=None,
        extra_params=dict(
            net=IPv4NetField(            # `ip.net`
                in_params='optional',
            ),
        ),
    )

    asn = ASNField(
        in_params='optional',
        in_result=None,
    )

    cc = CCField(
        in_params='optional',
        in_result=None,
    )

    count = IntegerField(
        in_params=None,
        in_result='optional',
        min_value=0,
        max_value=(2 ** 15 - 1),
    )

    ### ...other field specifications...
What do we see above:
You may want to create your own custom data specification class by subclassing n6sdk.data_spec.DataSpec; in your subclass you can:
(See the following sections.)
You may also want to subclass n6sdk.data_spec.fields.Field (or any of its subclasses, such as UnicodeLimitedField, IPv4Field or IntegerField) to create new kinds of fields whose instances can be used as field specifications in your custom data specification class (see below: Custom field classes).
The most important methods of any data specification (an instance of n6sdk.data_spec.DataSpec or of its subclass) are:
Typically, these methods are called automatically by the n6sdk machinery.
Each of these methods takes exactly one positional argument which is respectively:
Each of these methods also accepts the following optional keyword-only arguments:
Each of these methods returns a new dictionary (in other words, the input dictionary given as the positional argument is not modified). Regarding returned dictionaries:
The most important methods of any field (an instance of n6sdk.data_spec.fields.Field or of its subclass) are:
Each of these methods takes exactly one positional argument: a single uncleaned value.
Each of these methods returns a single value: a cleaned one.
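As an illustration of that contract, here is a self-contained stand-in (PortFieldSketch is our own hypothetical class, not part of n6sdk): each cleaning method takes a single uncleaned value and returns a single cleaned one, raising an exception for invalid input.

```python
class PortFieldSketch(object):

    def clean_param_value(self, value):
        # query parameter values always arrive as strings
        return self._clean(value)

    def clean_result_value(self, value):
        # result values may already be of the target type
        return self._clean(value)

    def _clean(self, value):
        port = int(value)
        if not 0 <= port < 65536:
            raise ValueError('{0!r} is not a valid port number'
                             .format(value))
        return port

field = PortFieldSketch()
cleaned = field.clean_param_value('8080')   # the cleaned value: an int
```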
These methods are called by the data specification machinery in the following way:
For each actual value extracted from a query parameter's raw value taken from the dictionary passed as the argument (a raw value can consist of several comma-separated actual values), the data specification's clean_param_dict() method (described above in the Data specification cleaning methods section) calls the clean_param_value() method of the appropriate field.
If the field’s method raises (or propagates) an exception being an instance/subclass of Exception (i.e., practically any exception, excluding KeyboardInterrupt, SystemExit and a few others), the data specification’s method clean_param_dict() catches it (and possibly similar exceptions from other fields) and then raises ParamValueCleaningError.
Note
If the exception raised (or propagated) by the field's method is FieldValueError (or any other exception derived from _ErrorWithPublicMessageMixin), its public_message will be included in the ParamValueCleaningError's public_message.
Similarly, for each value from the dictionary passed as the argument, the data specification's clean_result_dict() method (described above in the Data specification cleaning methods section) calls the clean_result_value() method of the appropriate field.
If the field’s method raises (or propagates) an exception being an instance/subclass of Exception (i.e., practically any exception, excluding KeyboardInterrupt, SystemExit and a few others), the data specification’s method clean_result_dict() catches it (and possibly similar exceptions from other fields) and then raises ResultValueCleaningError.
Note
Unlike ParamValueCleaningError raised by clean_param_dict(), the ResultValueCleaningError exception raised by clean_result_dict() in reaction to exception(s) from clean_result_value() does not include in its public_message any information from the underlying exception(s); instead, ResultValueCleaningError's public_message is set to the safe default: u"Internal error.".
The rationale for this behaviour is that any exceptions related to result cleaning are strictly internal (contrary to query parameter cleaning).
Thanks to this behaviour, much of the field classes' code related to parameter value cleaning can also be used for result value cleaning without concern about disclosing sensitive details in the public_message of ResultValueCleaningError.
Warning
For the sake of security, when extending n6sdk.data_spec.BaseDataSpec.clean_result_dict() ensure that your implementation behaves in the same way as described in this note.
The n6sdk.data_spec.DataSpec class is a ready-to-use data specification class that performs cleaning of all standard n6-like REST API query parameters and result items.
The following list describes briefly all field specifications defined by the class:
id:
Unique incident identifier being an arbitrary text. Maximum length: 64 characters (after cleaning).
source:
Incident data source identifier. Consists of two parts separated with a dot (.). Allowed characters (apart from the dot) are: ASCII lower-case letters, digits and hyphen (-). Maximum length: 32 characters (after cleaning).
restriction:
Data distribution restriction qualifier. One of: "public", "need-to-know" or "internal".
confidence:
Data confidence qualifier. One of: "high", "medium" or "low".
category:
Incident category label (some examples: "bots", "phish", "scanning"...).
time:
Incident occurrence time (not when-entered-into-the-database). Value cleaning includes conversion to UTC time.
time.min:
The earliest time the queried incidents occurred at. Value cleaning includes conversion to UTC time.
time.max:
The latest time the queried incidents occurred at. Value cleaning includes conversion to UTC time.
origin:
Incident origin label (some examples: "p2p-crawler", "sinkhole", "honeypot"...).
name:
Threat’s exact name, such as "virut", "Potential SSH Scan" or any other... Maximum length: 255 characters (after cleaning).
target:
Name of phishing target (organization, brand etc.). Maximum length: 100 characters (after cleaning).
address:
Set of network addresses related to the returned incident (e.g., for malicious web sites: taken from DNS A records; for sinkhole/scanning: communication source addresses) – in the form of a list of dictionaries, each containing "ip" (IPv4 address in quad-dotted decimal notation, cleaned using a subfield being an instance of IPv4Field), and optionally: "asn" (autonomous system number in the form of a number or two numbers separated with a dot, cleaned using a subfield being an instance of ASNField) and/or "cc" (two-letter country code, cleaned using a subfield being an instance of CCField).
ip:
IPv4 address (in quad-dotted decimal notation) related to the queried incidents.
ip.net:
IPv4 network (in CIDR notation) containing IP addresses related to the queried incidents.
asn:
Autonomous system number of IP addresses related to the queried incidents; in the form of a number or two numbers separated with a dot (see the examples above).
cc:
Two-letter country code related to IP addresses related to the queried incidents.
url:
URL related to the queried/returned incidents. Maximum length: 2048 characters (after cleaning).
Note
Cleaning involves decoding byte strings using the surrogateescape error handler backported from Python 3.x (see: n6sdk.encoding_helpers.provide_surrogateescape()).
url.sub:
Substring of URLs related to the queried incidents. Maximum length: 2048 characters (after cleaning).
See also
The above url description.
fqdn:
Fully qualified domain name related to the queried/returned incidents (e.g., for malicious web sites: from the site’s URL; for sinkhole/scanning: the domain used for communication). Maximum length: 255 characters (after cleaning).
Note
During cleaning, the IDNA encoding is applied (see: https://docs.python.org/2.7/library/codecs.html#module-encodings.idna and http://en.wikipedia.org/wiki/Internationalized_domain_name; see also the above examples), then all remaining upper-case letters are converted to lower-case.
fqdn.sub:
Substring of fully qualified domain names related to the queried incidents. Maximum length: 255 characters (after cleaning).
See also
The above fqdn description.
proto:
Layer #4 protocol label – one of: "tcp", "udp", "icmp".
sport:
TCP/UDP source port (non-negative integer number, less than 65536).
dport:
TCP/UDP destination port (non-negative integer number, less than 65536).
dip:
Destination IPv4 address (for sinkhole, honeypot etc.; does not apply to malicious web sites) in quad-dotted decimal notation.
adip:
Anonymized destination IPv4 address: in quad-dotted decimal notation, with one or more segments replaced with "x", for example: "x.168.0.1" or "x.x.x.1" (note: at least the leftmost segment must be replaced with "x").
md5:
MD5 hash of the binary file related to the (queried/returned) incident. In the form of a string of 32 hexadecimal digits.
sha1:
SHA1 hash of the binary file related to the (queried/returned) incident. In the form of a string of 40 hexadecimal digits.
expires:
Black list item expiry time. Value cleaning includes conversion to UTC time.
active.min:
The earliest expiry-or-occurrence time of the queried black list items. Value cleaning includes conversion to UTC time.
active.max:
The latest expiry-or-occurrence time of the queried black list items. Value cleaning includes conversion to UTC time.
status:
Black list item status qualifier. One of: "active" (item currently in the list), "delisted" (item removed from the list), "expired" (item expired, so treated as removed by the n6 system) or "replaced" (e.g.: IP address changed for the same URL).
replaces:
id of the black list item replaced by the queried/returned one.
until:
For aggregated events: the occurrence time of the latest [newest] aggregated event represented by the returned incident data record (note: time is the occurrence time of the first [oldest] aggregated event). Value cleaning includes conversion to UTC time.
count:
For aggregated events: number of events represented by the returned incident data record. It must be a positive integer number not greater than 32767.
Note
Generally, byte strings (if any), when converted to Unicode strings, are by default decoded using the utf-8 encoding.
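For instance, the adip format described above (quad-dotted notation with at least the leftmost segment anonymized) could be checked with a regular expression such as the following one. This is a hypothetical sketch of ours, not the actual validation logic used by n6sdk:

```python
import re

ADIP_REGEX = re.compile(
    r'\A'
    r'x'                              # the leftmost segment: always "x"
    r'(?:\.(?:x|25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}'
    r'\Z')                            # each other segment: "x" or an octet

def looks_like_adip(value):
    return ADIP_REGEX.match(value) is not None
```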
You can create your own data specification class by subclassing n6sdk.data_spec.DataSpec.
Let us prepare a separate module for our custom data specification:
$ cd <the main directory of the project>/using_n6sdk
$ touch data_spec.py
Then, we can open the newly created file (data_spec.py) with our favorite text editor and place the following code in it:
from n6sdk.data_spec import DataSpec
from n6sdk.data_spec.fields import UnicodeRegexField


class CustomDataSpec(DataSpec):
    mac_address = UnicodeRegexField(
        in_params='optional',     # *can* be in query params
        in_result='optional',     # *can* be in result data

        regex=r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$',
        error_msg_template=u'"{}" is not a valid MAC address',
    )
We just made a new data specification class – very similar to DataSpec but with one additional field specification: mac_address.
We could also modify (extend) within our subclass some of the field specifications inherited from DataSpec. For example:
from n6sdk.data_spec import (
    DataSpec,
    Ext,
)


class CustomDataSpec(DataSpec):
    # ...

    id = Ext(
        # here: changing the `max_length` property
        # of the `id` field -- from 64 to 32
        max_length=32,
    )
    time = Ext(
        # here: enabling bare `time` also for queries
        # (by default `time.min` and `time.max` query
        # params are allowed but bare `time` is not)
        in_params='optional',

        # here: making `time.max` a required (obligatory,
        # not optional) query parameter
        extra_params=Ext(
            max=Ext(in_params='required'),
        ),
    )
Please note how n6sdk.data_spec.Ext is used above to extend existing (inherited) field specifications.
It is also possible to replace existing (inherited) field specifications with completely new definitions...
# ...
from n6sdk.data_spec.fields import MD5Field
# ...


class CustomDataSpec(DataSpec):
    # ...

    id = MD5Field(
        in_params='optional',
        in_result='required',
    )

    # ...
...as well as to remove (mask) them:
# ...


class CustomDataSpec(DataSpec):
    # ...

    count = None
You can also extend the clean_param_dict() and clean_result_dict() methods:
import datetime
# ...


def _is_april_fools_day():
    now = datetime.datetime.utcnow()
    return now.month == 4 and now.day == 1


class CustomDataSpec(DataSpec):

    def clean_param_dict(self, params, ignored_keys=(), **kwargs):
        if _is_april_fools_day():
            ignored_keys = set(ignored_keys) | {'joke'}
        return super(CustomDataSpec, self).clean_param_dict(
            params,
            ignored_keys=ignored_keys,
            **kwargs)

    def clean_result_dict(self, result, **kwargs):
        if _is_april_fools_day():
            result['time'] = '1810-03-01T13:13'
        return super(CustomDataSpec, self).clean_result_dict(
            result,
            **kwargs)
Note
Manipulating the optional keyword-only arguments (ignored_keys, forbidden_keys, extra_required_keys, discarded_keys – see above: Data specification cleaning methods) of these methods can be useful, for example, when you need to implement some authentication-driven data anonymization or param/result-key-focused access rules (however, in such a case you may also need to add some additional keyword-only arguments to the signatures of these methods, e.g. auth_data; then you will also need to extend the get_clean_param_dict_kwargs() and/or get_clean_result_dict_kwargs() methods of your custom subclass of DefaultStreamViewBase; generally that matter is beyond the scope of this tutorial).
The following list briefly describes all field classes defined in the n6sdk.data_spec.fields module:
The top-level base class for field specifications.
For date-and-time (timestamp) values, automatically normalized to UTC.
For arbitrary text data.
For hexadecimal digests (hashes), such as MD5, SHA256 or any other...
For hexadecimal MD5 digests (hashes).
For hexadecimal SHA1 digests (hashes).
For text data limited to a finite set of possible values.
For text data with limited length.
For text data limited by the specified regular expression.
For dot-separated source specifications, such as organization.type.
For IPv4 addresses (in decimal dotted-quad notation).
For anonymized IPv4 addresses (in decimal dotted-quad notation, with the leftmost octet – and possibly any other octets – replaced with "x").
For IPv4 network specifications (in CIDR notation).
For 2-letter country codes.
For substrings of URLs.
For URLs.
For substrings of domain names, automatically IDNA-encoded and lower-cased.
For domain names, automatically IDNA-encoded and lower-cased.
For integer numbers (optionally with minimum/maximum limits defined).
For autonomous system numbers, such as 12345, 123456789 or 12345.65432.
For TCP/UDP port numbers.
A mix-in class for fields whose result values are supposed to be a sequence of values and not single values. Its clean_result_value() checks that its argument is a non-string sequence (list or tuple, or any other collections.Sequence not being str or unicode) and performs result cleaning (as defined in a superclass) for each item of it. See: the AddressField description below.
A base class for fields whose result values are supposed to be dictionaries (whose fixed structure is defined by the key_to_subfield_factory and required_keys properties, described above).
Note
This is a result-only field class, i.e. its clean_param_value() raises NotImplementedError.
For lists of dictionaries containing "ip" and optionally "cc" and/or "asn".
Note
Generally –
See also
The n6sdk.data_spec.DataSpec overview section above.
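The cooperative behaviour of the result-sequence mix-in described above can be sketched as follows. These are self-contained stand-ins of ours, not the actual n6sdk classes: the mix-in's clean_result_value() checks that its argument is a non-string sequence and delegates the cleaning of each item to the superclass's method.

```python
class ResultListFieldMixinSketch(object):
    # clean_result_value() accepts only a non-string sequence and
    # cleans each of its items using the superclass's method
    def clean_result_value(self, value):
        if (isinstance(value, (str, bytes))
                or not isinstance(value, (list, tuple))):
            raise TypeError('a non-string sequence is required')
        clean_item = super(ResultListFieldMixinSketch,
                           self).clean_result_value
        return [clean_item(item) for item in value]

class IntFieldSketch(object):
    def clean_result_value(self, value):
        return int(value)

class ListOfIntsFieldSketch(ResultListFieldMixinSketch, IntFieldSketch):
    pass

field = ListOfIntsFieldSketch()
cleaned = field.clean_result_value(['1', '2', 3])
```

Note that the mix-in must precede the concrete field class in the bases list, so that its clean_result_value() runs first and the super() call reaches the field's own cleaning logic.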
You may want to subclass any of the n6sdk field classes (described above in the Standard n6sdk field classes section):
Please, consider the example from one of the previous sections:
from n6sdk.data_spec import DataSpec
from n6sdk.data_spec.fields import UnicodeRegexField


class CustomDataSpec(DataSpec):
    mac_address = UnicodeRegexField(
        in_params='optional',     # *can* be in query params
        in_result='optional',     # *can* be in result data

        regex=r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$',
        error_msg_template=u'"{}" is not a valid MAC address',
    )
It can be rewritten in a more self-documenting and code-reusability-friendly way:
from n6sdk.data_spec import DataSpec
from n6sdk.data_spec.fields import UnicodeRegexField


class MacAddressField(UnicodeRegexField):
    regex = r'^(?:[0-9A-F]{2}(?:[:-]|$)){6}$'
    error_msg_template = u'"{}" is not a valid MAC address'


class CustomDataSpec(DataSpec):
    mac_address = MacAddressField(
        in_params='optional',     # *can* be in query params
        in_result='optional',     # *can* be in result data
    )
Let us save the above code replacing the contents of the data_spec.py file we created earlier (see: Subclassing n6sdk.data_spec.DataSpec).
Another technique – extending the value cleaning methods (see above: Field cleaning methods) – offers more possibilities. Let us try to create an integer number field that accepts parameter values with such suffixes as "m" (meters), "kg" (kilograms) and "s" (seconds), ignoring the suffixes:
from n6sdk.data_spec.fields import IntegerField


class SuffixedIntegerField(IntegerField):

    # the `legal_suffixes` class attribute we create here
    # can be overridden with a `legal_suffixes` constructor
    # argument or a `legal_suffixes` subclass attribute
    legal_suffixes = 'm', 'kg', 's'

    def clean_param_value(self, value):
        """
        >>> SuffixedIntegerField().clean_param_value('123 kg')
        123
        """
        value = value.strip()
        for suffix in self.legal_suffixes:
            if value.endswith(suffix):
                value = value[:(-len(suffix))]
                break
        value = super(SuffixedIntegerField,
                      self).clean_param_value(value)
        return value
If – in your implementation of clean_param_value() or clean_result_value() – you need to raise a cleaning error (to signal that a value is invalid and cannot be cleaned) just raise any exception being an instance/subclass of standard Python Exception; it can (but does not have to) be n6sdk.exceptions.FieldValueError.
When subclassing n6sdk field classes, please do not be afraid to look into the source code of the n6sdk.data_spec.fields module.
The network incident data can be stored in various ways: in text files, in an SQL database, in some distributed storage such as Hadoop, etc. Implementing the retrieval of data from any such backend is beyond the scope of this document. What does concern us here is the API that the n6sdk machinery needs to use to get the data.
Therefore, for the purposes of this tutorial, we will assume that our network incident data is stored in the simplest possible way: in one file in the JSON format. You will have to replace any implementation details related to this particular way of keeping and querying for data with an implementation appropriate for the data store you use (file reads, SQL queries or whatever is needed for the particular storage backend) – see the next section: Guidelines for the real implementation.
First, we will create the example JSON data file:
$ cat << EOF > /tmp/our-data.json
[
  {
    "id": "1",
    "address": [
      {
        "ip": "11.22.33.44"
      },
      {
        "asn": 12345,
        "cc": "US",
        "ip": "123.124.125.126"
      }
    ],
    "category": "phish",
    "confidence": "low",
    "mac_address": "00:11:22:33:44:55",
    "restriction": "public",
    "source": "test.first",
    "time": "2014-04-01 10:00:00",
    "url": "http://example.com/?spam=ham"
  },
  {
    "id": "2",
    "adip": "x.2.3.4",
    "category": "server-exploit",
    "confidence": "medium",
    "restriction": "need-to-know",
    "source": "test.first",
    "time": "2014-04-01 23:59:59"
  },
  {
    "id": "3",
    "address": [
      {
        "ip": "11.22.33.44"
      },
      {
        "asn": 87654321,
        "cc": "PL",
        "ip": "111.122.133.144"
      }
    ],
    "category": "server-exploit",
    "confidence": "high",
    "restriction": "public",
    "source": "test.second",
    "time": "2014-04-01 23:59:59",
    "url": "http://example.com/?spam=ham"
  }
]
EOF
Then, we need to create the Python module for our data backend API class:
$ cd <the main directory of the project>/using_n6sdk
$ touch data_backend_api.py
Now we can open the newly created file (data_backend_api.py) with our favorite text editor and place the following code in it:
import json

from n6sdk.class_helpers import singleton
from n6sdk.exceptions import AuthorizationError


@singleton
class DataBackendAPI(object):

    def __init__(self, settings):
        # STORAGE-SPECIFIC IMPLEMENTATION DETAILS:
        # (for our example JSON-file-based storage...)
        with open(settings['json_data_file_path']) as f:
            self.data = json.load(f)

    # one or more data query methods (they can have any names):

    def generate_incidents(self, auth_data, params):
        # STORAGE-SPECIFIC IMPLEMENTATION DETAILS:
        # (this is a naive implementation; in a real one some
        # efficient database query needs to be performed here...)
        for incident in self.data:
            for key, value_list in params.items():
                if key in ('ip', 'asn', 'cc'):
                    address_seq = incident.get('address', [])
                    if not any(addr.get(key) in value_list
                               for addr in address_seq):
                        break  # incident does not match the query params
                # WARNING: *.min/*.max/*.sub/ip.net queries are
                # not supported by this simplified implementation
                elif incident.get(key) not in value_list:
                    break  # incident does not match the query params
            else:
                yield incident  # incident matches the query params
What is important:
The constructor of the class is supposed to be called exactly once per application run. The constructor must take exactly one argument:
The class can have one or more data query methods, with arbitrary names (in the above example there is only one: generate_incidents(); to learn how URLs are mapped to particular data query method names – see below: Gluing it together).
Each data query method must take two positional arguments:
Each data query method must be a generator (see: https://docs.python.org/2/glossary.html#term-generator) or any other callable provided that it returns an iterator (see: https://docs.python.org/2/glossary.html#term-iterator). Each of the generated items should be a dictionary containing the data of one network incident (the n6sdk machinery will use it as the argument for the clean_result_dict() data specification method).
Typically, the following activities are performed in the __init__() method of the data backend API class:
Typically, the following activities are performed in a data query method of the data backend API class:
If needed: do any authorization checks based on the auth_data and params arguments; raise n6sdk.exceptions.AuthorizationError on failure.
Translate the contents of the params argument to some storage-specific queries. (Obviously, when doing the translation you may need, for example, to map params keys to some storage-specific keys...).
Note
If the data specification includes dotted “extra params” (such as time.min, time.max, fqdn.sub, ip.net etc.) their semantics should be implemented carefully.
If needed: perform a necessary storage-specific maintenance activity (e.g., re-new a database connection if necessary).
Perform a storage-specific query (or queries).
Sometimes you may want to limit the number of allowed results – then, if the limit is exceeded you raise n6sdk.exceptions.TooMuchDataError.
Translate the results of the storage-specific query (queries) to result dictionaries and yield each of these dictionaries (each of them should be a dictionary ready to be passed to the clean_result_dict() method defined in your data specification class).
(Obviously, when doing the translation you may need, for example, to map some storage-specific keys to the result keys accepted by the clean_result_dict() method of your data specification class...)
If there are no results – just do not yield any items (the caller will obtain an empty iterator).
In case of any internal error, raise n6sdk.exceptions.DataAPIError. If it is caused by another exception (that you have caught), it may be a good idea to instantiate DataAPIError with the result of a traceback.format_exc() call as the argument (for debugging purposes).
It is recommended to decorate your data backend API class with the n6sdk.class_helpers.singleton() decorator (as shown in the example in the The interface section).
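For illustration, the translation of the params argument to storage-specific queries mentioned above might, for an SQL-based storage, look more or less like this. The key-to-column mapping and the helper function are hypothetical examples of ours, and the '%s' placeholder style assumes a DB-API driver such as psycopg2 or MySQLdb:

```python
# hypothetical mapping of cleaned param keys to storage-specific columns
PARAM_KEY_TO_COLUMN = {
    'source': 'source_id',
    'category': 'category_label',
}

def params_to_sql_condition(params):
    # translate cleaned params into a WHERE-clause fragment plus the
    # values for a parametrized query (never interpolate them directly!)
    conditions = []
    values = []
    for key, value_list in sorted(params.items()):
        column = PARAM_KEY_TO_COLUMN.get(key, key)
        placeholders = ', '.join(['%s'] * len(value_list))
        conditions.append('{0} IN ({1})'.format(column, placeholders))
        values.extend(value_list)
    return ' AND '.join(conditions), values

condition, values = params_to_sql_condition({
    'category': ['phish'],
    'source': ['test.first', 'test.second'],
})
```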
A description of the concept of Pyramid authentication policies is beyond the scope of this tutorial. Please read the appropriate paragraph and example from the documentation of the Pyramid library: http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/security.html#creating-your-own-authentication-policy (you may also want to search the Pyramid documentation for the term authentication policy).
The n6sdk library requires that the authentication policy class has the additional static method (decorated with staticmethod()) get_auth_data() that takes exactly one positional argument: a Pyramid request object. The method is expected to return a value that is not None in case of authentication success, and None otherwise. Apart from this simple rule there are no constraints on what exactly the return value should be; the implementer decides about that. The return value will be available as the auth_data attribute of the Pyramid request object and is also passed to data backend API methods as the auth_data argument.
Typically, the authenticated_userid() method implementation makes use of the auth_data request attribute (being the return value of get_auth_data()), and the get_auth_data() implementation makes some use of the unauthenticated_userid request attribute (being the return value of the unauthenticated_userid() policy method). This is possible because get_auth_data() is called (by the Pyramid machinery) after the unauthenticated_userid() method and before the authenticated_userid() method.
The n6sdk library provides n6sdk.pyramid_commons.BaseAuthenticationPolicy – an authentication policy base class that makes it easier to implement your own authentication policies. Please consult its source code.
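For example, the required static method could be implemented along these lines. This is a self-contained sketch: the X-API-Key header, the key store and the fake request object used below are our own hypothetical assumptions, and a real policy would typically also subclass BaseAuthenticationPolicy and implement the remaining policy methods:

```python
class APIKeyPolicySketch(object):

    # hypothetical mapping of API keys to organization identifiers
    VALID_API_KEYS = {'secret-key-1': 'org1'}

    @staticmethod
    def get_auth_data(request):
        # exactly one positional argument: a Pyramid request object
        api_key = request.headers.get('X-API-Key')
        org_id = APIKeyPolicySketch.VALID_API_KEYS.get(api_key)
        if org_id is not None:
            return {'org_id': org_id}   # any non-None value: auth success
        return None                     # None: authentication failure
```

Whatever non-None value get_auth_data() returns here would later show up as request.auth_data and as the auth_data argument of the data backend API methods.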
We will open the __init__.py file of our application (<the main directory of the project>/using_n6sdk/__init__.py) with our favorite text editor and place the following code in it:
from n6sdk.pyramid_commons import (
    AnonymousAuthenticationPolicy,
    ConfigHelper,
    HttpResource,
)

from using_n6sdk.data_backend_api import DataBackendAPI
from using_n6sdk.data_spec import CustomDataSpec


custom_data_spec = CustomDataSpec()

RESOURCES = [
    HttpResource(
        resource_id='/incidents',
        url_pattern='/incidents.{renderer}',
        renderers=('json', 'sjson'),

        # an *instance* of our data specification class
        data_spec=custom_data_spec,

        # the *name* of a DataBackendAPI's data query method
        data_backend_api_method='generate_incidents',
    ),
]


def main(global_config, **settings):
    helper = ConfigHelper(
        # a dict of settings from the *.ini file
        settings=settings,

        # a data backend API *class*
        data_backend_api_class=DataBackendAPI,

        # an *instance* of an authentication policy class
        authentication_policy=AnonymousAuthenticationPolicy(),

        # the list of HTTP resources defined above
        resources=RESOURCES,
    )
    return helper.make_wsgi_app()
You may also need to customize the settings in the <the main directory of the project>/*.ini files (development.ini and production.ini), to match your environment, database configuration (if any) etc.
In the case of our naive JSON-file-based data backend implementation (see above: The interface) we need to add the following line in the [app:main] section of each of these two files:
json_data_file_path = /tmp/our-data.json
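For quick testing, such a file can be created with a short script like the one below. The record shown is only illustrative: the field names are modelled on the standard n6 data specification, but the exact set of fields and their values must match your own data specification and backend implementation:

```python
# Write a sample /tmp/our-data.json file for the naive JSON-file-based
# data backend sketched earlier in this tutorial (illustrative data).
import json

SAMPLE_INCIDENTS = [
    {
        "id": "1",
        "source": "test.first",
        "restriction": "public",
        "confidence": "low",
        "category": "bots",
        "time": "2015-04-01 10:00:00",
        "address": [{"ip": "11.22.33.44"}],
    },
]

with open('/tmp/our-data.json', 'w') as f:
    json.dump(SAMPLE_INCIDENTS, f, indent=2)
```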
Finally, let us run the application (still in the development environment):
$ cd <the main directory of the project>
$ source dev-venv/bin/activate # ensuring the virtualenv is active
$ pserve development.ini
Our application should now be up and running. Try visiting the following URLs (with any web browser or, for example, with the wget command-line tool):
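For instance, assuming the server listens on port 6543 (the usual default in a Pyramid development.ini; check the [server:main] section of yours), the URLs might look like the following. The ip and category query parameters shown here are only examples and must correspond to query keys actually enabled in your data specification:

```
http://127.0.0.1:6543/incidents.json
http://127.0.0.1:6543/incidents.sjson
http://127.0.0.1:6543/incidents.json?category=bots
http://127.0.0.1:6543/incidents.sjson?ip=11.22.33.44
```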
...as well as those causing (expected) errors:
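For example, hypothetical requests like the following (again assuming the server listens on port 6543; check your development.ini) should produce HTTP error responses, such as 404 Not Found for an unknown resource or renderer, or 400 Bad Request for an illegal query parameter:

```
http://127.0.0.1:6543/incidents.exe
http://127.0.0.1:6543/some-nonexistent-resource.json
http://127.0.0.1:6543/incidents.json?some-illegal-key=1
```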
Prerequisites are similar to those concerning the development environment, listed near the beginning of this tutorial (Setting up the development environment). The Debian GNU/Linux operating system, version 7.7 or newer, is recommended for following the guides presented below. An additional prerequisite is that the Apache2 HTTP server is installed and configured together with mod_wsgi (the apache2 and libapache2-mod-wsgi Debian packages).
First, we will create a directory structure and a virtualenv for our server, e.g. under /opt:
$ sudo mkdir /opt/myn6-srv
$ cd /opt/myn6-srv
$ sudo virtualenv prod-venv
$ sudo chown -R $(echo $USER) prod-venv
$ source prod-venv/bin/activate
Then, let us install the necessary packages:
$ cd <the main N6SDK source directory>
$ python setup.py install
$ cd <the main directory of the project>
$ python setup.py install
(Of course, <the main n6sdk source directory> needs to be replaced with the actual name (absolute path) of the directory containing the source code of the n6sdk library; and <the main directory of the project> needs to be replaced with the actual name (absolute path) of the directory containing the source code of our n6sdk-based project.)
Now, we will copy the template of the configuration file for production:
$ cd /opt/myn6-srv
$ sudo cp <the main directory of the project>/production.ini ./
You may want to customize the settings it contains, especially to match your production environment, database configuration etc. Just edit the /opt/myn6-srv/production.ini file.
Then, we will create the WSGI script:
$ cat << EOF > prod-venv/myn6-app.wsgi
from pyramid.paster import get_app, setup_logging
ini_path = '/opt/myn6-srv/production.ini'
setup_logging(ini_path)
application = get_app(ini_path, 'main')
EOF
It is also a good idea to provide a Python egg cache:
$ sudo mkdir /opt/myn6-srv/.python-eggs
We need to ensure that the Apache’s user has write access to it. On Debian GNU/Linux it can be done by executing:
$ sudo chown www-data /opt/myn6-srv/.python-eggs
Now, we need to adjust the Apache configuration. On Debian GNU/Linux it can be done by executing:
$ cat << EOF > prod-venv/myn6.apache
<VirtualHost *:80>
  # Only one Python sub-interpreter should be used
  # (multiple ones do not cooperate well with C extensions).
  WSGIApplicationGroup %{GLOBAL}

  # Remove the following line if you use native Apache authorisation.
  WSGIPassAuthorization On

  WSGIDaemonProcess myn6_srv \\
    python-path=/opt/myn6-srv/prod-venv/lib/python2.7/site-packages \\
    python-eggs=/opt/myn6-srv/.python-eggs
  WSGIScriptAlias /myn6 /opt/myn6-srv/prod-venv/myn6-app.wsgi

  <Directory /opt/myn6-srv/prod-venv>
    WSGIProcessGroup myn6_srv
    Order allow,deny
    Allow from all
  </Directory>

  # Logging of errors and other events:
  ErrorLog \${APACHE_LOG_DIR}/error.log

  # Possible values for the LogLevel directive include:
  # debug, info, notice, warn, error, crit, alert, emerg.
  LogLevel warn

  # Logging of client requests:
  CustomLog \${APACHE_LOG_DIR}/access.log combined

  # It is recommended to uncomment and adjust the following line.
  #ServerAdmin webmaster@yourserver.example.com
</VirtualHost>
EOF
$ sudo mv prod-venv/myn6.apache /etc/apache2/sites-available/myn6
$ sudo chown root:root /etc/apache2/sites-available/myn6
$ sudo chmod 644 /etc/apache2/sites-available/myn6
$ cd /etc/apache2/sites-enabled
$ sudo ln -s ../sites-available/myn6 001-myn6
You may want or need to adjust the contents of the newly created file (/etc/apache2/sites-available/myn6) – especially regarding the following directives (see the comments accompanying them in the file):
If we have the default Apache configuration on Debian, we need to disable the default site by removing the symbolic link:
$ sudo rm 000-default
Finally, let us restart the Apache daemon. On Debian GNU/Linux it can be done by executing:
$ sudo service apache2 restart
Our application should now be up and running. Try visiting the following URL (with any web browser or, for example, with the wget command-line tool):
http://<your apache server address>/myn6/incidents.json
(Of course, <your apache server address> needs to be replaced with the actual host address of your Apache server, for example 127.0.0.1 or localhost.)