These guidelines document best practices for creating and using Jupyter notebooks. If you need technical documentation on how to set up your environment, read the MISP Playbook technical documentation.
A Jupyter notebook is an open-source interactive environment where users can write and execute computer code, observe the results of the code execution and combine the code with text (in Markdown) and visualisations. In essence it allows you to bundle code and documentation in one place. The computer code gets executed by what’s called a kernel and for our playbooks we will be using Python. The editing of notebooks is mostly done via a web interface and notebooks are stored in a JSON format (for example MISP playbook).
Jupyter notebooks come in different flavours. Whereas Jupyter Notebook is the classic, traditional notebook interface, JupyterLab is the latest web-based environment for notebooks. Contrary to the classic interface, the JupyterLab interface has a simple file browser that makes navigating between different notebooks easier, and it also comes with a visual debugger that allows you to set breakpoints and inspect variables. In general, notebooks can be run in the classic Jupyter Notebook interface as well as in the JupyterLab interface. The MISP playbooks are developed and tested in JupyterLab but should also work in the classic Notebook interface.
There’s more information on Jupyter Notebook, JupyterLab and JupyterHub in the MISP Playbook technical documentation
PyMISP is a Python library used to interact with the MISP REST API. It allows you to set up automation and integration with other tools, and it also provides an easy-to-use interface to manipulate MISP data and configuration settings.
MISP playbooks address common use-cases encountered by SOCs, CSIRTs or CTI teams to detect, react and analyse intelligence received by MISP.
The MISP playbooks combine Jupyter notebooks and PyMISP to provide you
The MISP playbooks are stored and managed via a public GitHub repository MISP/misp-playbooks.
The MISP playbook structure is detailed in MISP playbook structure. Playbooks consist of
Always start a playbook with a self-explanatory title and add the title as a top header (in Markdown with one #).
To make your life simpler it’s advised to use notebook file names without spaces. Most modern operating systems have no problem with spaces in file names, but some of the supporting tools or scripts do not always take spaces into account. Instead of spaces you can use an underscore (_).
It is also important to (re-)name a blank notebook immediately, as JupyterLab gives all new notebooks the default name “Untitled.ipynb”.
The file names of the MISP playbooks in this repository all start with pb_ followed by the title of the playbook. Spaces in the title are replaced with an underscore.
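The naming convention above can be sketched as a small helper. This is only an illustration: the function name playbook_filename is hypothetical, and the sketch assumes that nothing other than whitespace needs replacing and that the original letter case is kept.

```python
import re


def playbook_filename(title: str, prefix: str = "pb_") -> str:
    """Build a notebook file name from a playbook title.

    Runs of whitespace become a single underscore and the .ipynb
    extension is appended. The letter case is left untouched.
    """
    slug = re.sub(r"\s+", "_", title.strip())
    return f"{prefix}{slug}.ipynb"


print(playbook_filename("Query for domain reputation"))
```

If your team prefers lower-case file names, apply .lower() to the title before building the name.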
Document the dependencies required to execute the playbook. Add them to the technical details of the playbook and make sure users can easily install the dependencies, for example by creating a requirements.txt file. You can create such a file with pip freeze from within the Python virtual environment.
When you add cells it’s a good idea to have them only do one task at a time. This makes debugging easier and helps with maintaining the playbook.
Use a unique identifier per execution step, for example “PR:1 stepname” or “EN:3 stepname”. This makes it easier to refer to the steps in other sections and also makes the general overview much cleaner.
If you expect your users to provide input by changing a variable then put these variables in separate cells. That way users who are less experienced with Python code do not get overwhelmed with code blocks and can easily spot the areas where they have to provide input.
Ensure that your code cells only do one thing at a time. It makes it easier for users of the playbook to follow the progress, and it makes development much easier.
When users execute code in a cell in a playbook they expect some form of output. Seeing output reassures users that the code was executed, even if the output indicates something is wrong. Make sure to print the result of your code execution per cell.
If you output results via Python (as code blocks) it’s advised to use coloured output. Differentiating between good and bad results with a green or red colour makes it easier for users to spot errors. You can add colour with ANSI escape codes, for example \033[91m (red) or \033[92m (green), in your output.
Next to colours it’s advised to include sufficient newlines (with \n) in your output. This makes it easier for your playbook users to read or interpret the results.
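A minimal sketch of coloured output with surrounding newlines could look as follows. The helper name format_result and the example messages are hypothetical. The reset code \033[0m is an addition to the two codes mentioned above; it returns the terminal to its default colour so later output is not tinted.

```python
# ANSI escape codes: bright red, bright green, and reset to default.
RED = "\033[91m"
GREEN = "\033[92m"
RESET = "\033[0m"


def format_result(message: str, ok: bool) -> str:
    """Wrap a message in green (success) or red (failure) with blank lines around it."""
    colour = GREEN if ok else RED
    return f"\n{colour}{message}{RESET}\n"


print(format_result("Connection to MISP established.", ok=True))
print(format_result("No matching events found.", ok=False))
```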
Include a table of contents at the start of your playbook.
In JupyterLab you can easily navigate in your notebook with the Table of Contents shortcut.
Do not store credentials in the playbook. In our playbooks we load the keys.py file from a vault directory. Obviously, as a user running the playbook you can still have a peek at the content of that file. But by storing the credentials outside the playbook you make it easier to share the playbook with other users (or organisations), without having to bother with removing credentials.
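One way to load credentials from a file outside the notebook is sketched below. It is not necessarily how the MISP playbooks do it: the function name load_keys, the relative path in the usage note and the variable names misp_url and misp_key are assumptions made for illustration.

```python
import importlib.util
from pathlib import Path


def load_keys(keys_path):
    """Load a Python file holding credentials and return it as a module object.

    Raises FileNotFoundError with a readable message when the file is missing,
    so users immediately see that the vault is not where the playbook expects it.
    """
    path = Path(keys_path)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    spec = importlib.util.spec_from_file_location("keys", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

In an initialisation cell you could then call keys = load_keys("../vault/keys.py") and refer to keys.misp_url and keys.misp_key. Avoid printing the key itself, otherwise it ends up in the cell output.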
In general there are two reasons for sharing a playbook with other users,
Whereas with the first option, using it as procedure, you do not want to include the output of the code execution, with the second option you definitely want to include that output.
Clearing the output is relatively easy in JupyterLab. Under the menu Edit, you can clear the output (Clear Output) of one cell, or you can clear the output of all cells (Clear All Outputs).
You can also clear the output of all cells by running a jupyter command from within the Python virtual environment that is used to execute the playbooks.
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace playbook.ipynb
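To make clear what "clearing output" means at file level, here is a minimal sketch that does the same on the notebook JSON with only the Python standard library. It assumes the nbformat 4 layout, where code cells carry an outputs list and an execution_count. The function names are hypothetical, and the nbconvert command remains the more robust route.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Remove outputs and execution counts from all code cells (nbformat 4 layout)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Rewrite a notebook file in place with all code cell outputs removed."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1, ensure_ascii=False)
        handle.write("\n")
```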
The playbooks are Jupyter notebooks. These notebooks are simple JSON files (see The Jupyter Notebook Format). This also means that you can easily store them in GitHub (or GitLab) and track changes between different versions (also see the development guideline on nbdime below).
It is advised to clear the output of the cells before you add a new playbook to GitHub. This prevents you from storing potentially confidential information in a code repository. Obviously, if you want to keep track of playbooks for reporting purposes you do want to keep this output.
An easy way to make sure cell output is cleaned is by using a git hook, as explained by Clearing Output Data in Jupyter Notebooks using Pre-commit Framework, or through other means such as Clearing Output Data in Jupyter Notebooks using a Bash Script Hook and Clearing Output Data in Jupyter Notebooks using Git Attributes.
When you start using a playbook it’s good practice to make a copy of the playbook and then use that copy. This allows you to keep an unchanged master version of the playbook. You can for example append the case reference or date information (in YYYYMMDD format) to the playbook name.
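Making such a working copy can be sketched in a few lines. The function name copy_playbook and its parameters are hypothetical. The sketch appends either a case reference or the date in YYYYMMDD format, and refuses to overwrite an existing copy.

```python
import shutil
from datetime import date
from pathlib import Path


def copy_playbook(master, reference=None, today=None):
    """Copy a master playbook to '<name>_<suffix>.ipynb' in the same directory.

    The suffix is the case reference when given, otherwise the date
    (today unless another date is passed) formatted as YYYYMMDD.
    """
    master = Path(master)
    suffix = reference if reference else (today or date.today()).strftime("%Y%m%d")
    target = master.with_name(f"{master.stem}_{suffix}{master.suffix}")
    if target.exists():
        raise FileExistsError(f"Working copy already exists: {target}")
    shutil.copy2(master, target)
    return target
```

For example, copy_playbook("pb_example.ipynb", reference="CASE-42") would create pb_example_CASE-42.ipynb next to the master file; both names here are placeholders.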
Jupyter Notebook and JupyterLab have an ever-growing list of extensions. Although they provide a lot of interesting features, it’s advised to always take a step back before including them in your playbook. An extension can make your playbook less portable, as users will have to install and enable the extension as well.
In almost all cases it is more beneficial to run your playbooks from a Python virtual environment. This allows you to control the installed versions and does not pollute your operating system installed libraries. Also see MISP Playbook technical documentation.
It is not uncommon to need the same code snippets in different playbooks. You can then modularise your code (store it in separate files) and import these modules in the initialisation phase of the playbook. The downside of this approach is that the playbooks become less portable, as you also have to include these modules when you share your playbooks.
For playbooks that are meant for use within your team or community it can make sense to modularise as much as possible and then distribute the modules (as separate files) together with the playbooks. If you plan to publish your playbooks to a larger audience, it makes sense to include all the necessary code in a single playbook. Note that you can collapse a cell in a notebook, which can already improve the readability.
If you do plan on modularising your code, it is advised to put these files in a separate directory. For example use this directory layout
notebooks/
    mynotebook.ipynb
notebooks/modules
    module1.py
    module2.py
vault/
    keys.py
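With a layout like this, the modules directory has to be on the Python import path before the playbook can import from it. A minimal sketch, assuming the notebook runs with notebooks/ as its working directory; the helper name add_module_dir is hypothetical and module1 is the placeholder name from the layout above.

```python
import sys
from pathlib import Path


def add_module_dir(directory="modules"):
    """Put a directory at the front of sys.path (once) and return its absolute path."""
    path = str(Path(directory).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# In the initialisation cell of the playbook:
# add_module_dir("modules")
# import module1
```

Resolving the path first keeps the import working even if a later cell changes the working directory.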
In a traditional Python script you more or less control the execution sequence. In a notebook, users can trigger code blocks (cells) at will, regardless of whether this code depends on earlier code having been executed.
Introducing error handling to check that required previous code has executed is going to clutter your playbooks very quickly so it’s advised not to do this. But be aware that the order of code execution in your playbook is controlled by the playbook user, not by your code logic.
Restarting the Python kernel often (which clears the state of all variables) and then rerunning all code cells in your playbook is a very effective way to test that your playbook works correctly.
Although it’s a good idea not to use extensions for portable playbooks, you can use extensions during development. One of the more useful extensions is snippets. This extension adds a drop-down menu to the IPython toolbar that allows easy insertion of code snippet cells into the current notebook.
When you develop the Python code cells it’s good practice to stick to the clean code principles, see How to Write Beautiful Python Code With PEP 8 and PEP 8 – Style Guide for Python Code.
Primitive line-based diff and merge tools do not handle the logical structure of notebook documents well. With nbdime you get content-aware diffing and merging of Jupyter notebooks. It is strongly suggested to install nbdime in your development environment and use it to track (code) changes between different versions of the playbooks.
If you want to test multiple versions of a Python code block then it’s easiest to duplicate a cell and set the original cell to Raw. This prevents the Python kernel from interpreting the statements in this cell. It’s also much easier than commenting out large blocks of code in existing cells.
To avoid having to start from scratch for each new playbook, it is advised to set up a skeleton playbook with all the necessary components, such as loading libraries and credentials. Then start further development from this skeleton.
JupyterLab has a built-in kernel debugger (1). This feature allows you to set a breakpoint (2) and inspect the code that is being executed in a cell. It also allows you to review the status of variables (3) and “walk” through the Python code step by step. This is a great way to spot errors in the Python code of your playbook.
It’s useful to display the line numbers in code cells when you develop the playbook. Enable the line numbers via View, Show Line Numbers.
From https://jupyterlab.readthedocs.io/en/stable/user/interface.html
The JupyterLab interface consists of a main work area containing tabs of documents and activities, a collapsible left sidebar, and a menu bar. The left sidebar contains a file browser, the list of running kernels and terminals, the command palette, the notebook cell tools inspector, and the tabs list.
The column that allows you to switch between tabs is called the Activity Bar in JupyterLab.
The sidebars can be collapsed or expanded by selecting “Show Left Sidebar” or “Show Right Sidebar” in the View menu or by clicking on the active sidebar tab.
You can start a playbook from scratch by opening a new notebook or by duplicating an existing notebook, preferably a ‘playbook skeleton’.
A new notebook can be started via File, New Notebook.
Duplicating an existing playbook (notebook) can be done by right-clicking on the skeleton and then choosing Duplicate. Do not forget to immediately change the file name to something meaningful.
You can change the cell type by first selecting the cell and then clicking the dropdown list at the top of the editor window. New cells are set by default to “code”.
You can Add, Copy, Paste, Duplicate, Move and Delete cells by using the control buttons at the top of the editor window, or by clicking in a cell and then using the control buttons on the right of the cell.
You can Run (‘play’) and Stop code with the control buttons at the top of the editor window.
There is also a button to Restart the Kernel and a button to restart the kernel and execute all cells at once (‘fast-forward’).
You can add attachments (such as images) to notebooks via Edit, Insert Image. Note that you cannot do this in JupyterLab; you have to use Jupyter Notebook for this.
In the Jupyter notebook file, attachments are stored in the cell under the key attachments. Attachments can increase the size of the notebook. If you want to remove the attachments then choose View, Cell Toolbar, Attachments.
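Because a notebook is plain JSON, you can also inspect or strip attachments with a few lines of Python. The sketch below assumes the nbformat 4 layout, in which a cell may carry an attachments dictionary keyed by file name; the function names are hypothetical.

```python
import json


def list_attachments(notebook: dict) -> dict:
    """Return a mapping of cell index to the attachment file names in that cell."""
    found = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        names = sorted(cell.get("attachments", {}))
        if names:
            found[index] = names
    return found


def strip_attachments(notebook: dict) -> int:
    """Delete the attachments key from every cell and return how many files were removed."""
    removed = 0
    for cell in notebook.get("cells", []):
        removed += len(cell.pop("attachments", {}))
    return removed


# Usage on a notebook file (the file name is a placeholder):
# with open("pb_example.ipynb", encoding="utf-8") as handle:
#     notebook = json.load(handle)
# print(list_attachments(notebook))
```

Note that stripping attachments this way leaves any Markdown image reference of the form attachment:filename pointing at nothing, so review the affected cells afterwards.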
It can be useful to focus on only one notebook. With the JupyterLab single document mode you remove the visual indications of the other notebooks, without actually having to close them.
You can activate this mode via View, Simple Interface.