# Python Development with Jupyter, GitLab CI and Tox

November 2020 · 4 minute read

In this post I want to give a brief overview of how I structure a Python package, load it in Jupyter notebooks, and set up continuous integration with GitLab and tox-conda. This is the setup I like to use for data science projects.

Most data scientists coming from R will probably ask themselves:

What is the way to structure a Python package?

and:

How can I get the load_all() behaviour from RStudio replicated in Jupyter to get access to my package in my notebooks?

While in R there is only one (admittedly sometimes limiting) way to structure a package, you have got quite a lot of options in the Python world.

Since data science workloads rely heavily on the conda packaging ecosystem (especially on Windows machines), I will focus on how to make everything play nicely with conda.

## Python Package

We begin by creating a new conda environment with the interpreter version we want to use, and then activating it:

    conda create --name <environment_name> python=3.8 -y
    conda activate <environment_name>


Now we need to set up our package structure. In R there is a function called package.skeleton() (or alternatively RStudio’s GUI) that creates a basic package with all necessary files. For Python there are different packages that provide similar functionality (such as cookiecutter), but I will quickly outline a minimal setup here:

    src/
        package_name/
            __init__.py
            python_file1.py
            ...
            python_fileN.py
    tests/
    notebooks/
    data/
    outputs/
    setup.py
    gitlab-ci.yml
    tox.ini
    config.yml
    CHANGELOG.md
    .gitignore
    .env
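
If you prefer not to create these files by hand, the layout above can be generated with a few lines of Python. This is only a minimal sketch: the helper name `create_skeleton` and the two constant lists are my own illustration, not part of any library, and the created files are empty placeholders.

```python
import tempfile
from pathlib import Path

# Directory and file names mirror the tree above; "{pkg}" is replaced
# by the package name. Both lists are illustrative, adjust them freely.
DIRECTORIES = ["src/{pkg}", "tests", "notebooks", "data", "outputs"]
FILES = [
    "src/{pkg}/__init__.py",
    "setup.py",
    "gitlab-ci.yml",  # named as in this post; GitLab looks for .gitlab-ci.yml by default
    "tox.ini",
    "config.yml",
    "CHANGELOG.md",
    ".gitignore",
    ".env",
]


def create_skeleton(root, package_name):
    """Create empty placeholder directories and files below `root`.

    Existing files are left untouched. Returns the list of file paths.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory.format(pkg=package_name)).mkdir(parents=True, exist_ok=True)

    created = []
    for name in FILES:
        path = root / name.format(pkg=package_name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of the current one.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_skeleton(tmp, "package_name"):
            print(path.relative_to(tmp))
```

In a real project you would call `create_skeleton(".", "<pkg-name>")` once from the repository root.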


Let’s take a look at three important files:

• setup.py
• gitlab-ci.yml
• tox.ini

### setup.py

A simple setup.py file sufficient for small to medium sized projects can look like this:

    from setuptools import find_packages, setup

    setup(
        name='<pkg-name>',
        version='0.10',
        description='<package-description-here>',
        author='<your-name>',
        author_email='<your-email>',
        # The package code lives under src/
        packages=find_packages("src"),
        package_dir={"": "src"},
        zip_safe=False,
        install_requires=[
            '<some-pkg>',
            # Note: the spaces around the @ are actually necessary!
            '<some-gitlab-package> @ git+ssh://git@<package-url-with-slashes>.git@master',
        ],
        # Optional development dependencies, installed via the [dev] extra
        extras_require={
            'dev': [
                'pytest',
                'mypy',
                'pylint',
                'coverage',
                'python-dotenv',
                'tox-conda',
                'ipykernel',
                'matplotlib',
                'plotnine',
                'seaborn',
            ]
        },
    )


For development purposes, you can install your package using:

    pip install -e ".[dev]"
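
After an editable install, the package should be imported straight from your working copy under src/, so code changes are picked up without reinstalling. A quick way to check this is to ask Python where it would import the package from. The helper `package_location` below is a hypothetical name of mine, a small sketch built on the standard library’s `importlib.util.find_spec`:

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level package would be imported from.

    Returns None if the package cannot be found or has no location on
    disk (for example built-in modules or namespace packages).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    # "package_name" is the placeholder used in the layout of this post.
    # With an editable install, this should point into your src/ folder.
    print(package_location("package_name"))
```

If the printed path points into site-packages instead of your repository, the package was most likely installed without the -e flag.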


### gitlab-ci.yml

At work, IT only provides us with a basic GitLab bash runner, so we need to be careful not to accidentally change our testing environment permanently.

As a workaround, we use tox-conda which, as the name suggests, makes tox play nicely with conda. All tests are run by tox inside dedicated conda environments.

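A very simple gitlab-ci.yml can look like this (note that by default GitLab expects the file to be named .gitlab-ci.yml, with a leading dot, in the repository root):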

    stages:
      # sample stages might include:
      - test
      - deploy

    variables:
      # declare any variables here
      WORKSPACE: "../${CI_PROJECT_NAME}"

    before_script:
      # commands listed here run before the script of each job
      - echo $CONDA_PREFIX

    after_script:
      # commands listed here run after each job has finished
      # Attention: if you do clean-up here and delete your junit output, GitLab cannot upload the file
      - echo "job finished"  # placeholder so that the section is not empty

    test job:
      stage: test
      script:
        - tox
      artifacts:
        when: always
        reports:
          junit: report.xml
        paths:
          - report.xml
        expire_in: 1 week

    deploy job:
      stage: deploy
      script: <deploy>

    last job:
      stage: .post
      script: <this will always be the last run job, so you can do for example clean-up here>



The next file we are going to cover is tox.ini, which helps us with running tests. Tox is a very convenient tool, as it takes care of automatically creating test environments (in our case conda environments, because we use tox-conda) and running whatever tests and checks we want in them. Let’s take a look:

### tox.ini

    [tox]
    envlist =
        {py37, py38}

    [testenv]
    passenv = *
    deps =
        pytest-sugar
        python-dotenv
    commands =
        pytest --junitxml=report.xml

    [testenv:black]
    deps =
        black
    commands =
        black --check .


Note that the passenv setting is necessary if you want to pass environment variables from GitLab to tox. You can test against multiple Python versions by listing them in envlist.

Don’t forget to exclude your .env file from version control by adding it to .gitignore, in case it contains API tokens or similar secrets.

You can also define environment variables in GitLab for use in your pipelines, and mask them so that their values do not show up in job logs.
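
Whether a variable comes from a local .env file or from GitLab, the code under test usually just reads it from the process environment. The sketch below shows one way to fail with a clear message when a variable did not make it through, for example because passenv was missing. The names `require_env`, `MissingEnvironmentVariable` and `EXAMPLE_API_TOKEN` are hypothetical, and only the standard library is used here; locally you would typically call python-dotenv’s `load_dotenv()` first so that the values from .env end up in `os.environ`.

```python
import os


class MissingEnvironmentVariable(RuntimeError):
    """Raised when a required environment variable is unset or empty."""


def require_env(name):
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise MissingEnvironmentVariable(
            f"Environment variable {name!r} is not set. "
            "Check your .env file, the GitLab CI/CD variables, "
            "and the passenv setting in tox.ini."
        )
    return value


if __name__ == "__main__":
    # EXAMPLE_API_TOKEN is a made-up variable name for illustration.
    os.environ.setdefault("EXAMPLE_API_TOKEN", "dummy-value")
    token = require_env("EXAMPLE_API_TOKEN")
    # Never print real secrets; report only that the lookup succeeded.
    print("token loaded:", bool(token))
```

Raising early keeps a missing token from surfacing later as an obscure authentication failure somewhere inside a test.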

## Jupyter Integration

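Since I already included the ipykernel package in my setup.py file, we do not need to install it separately. We only need to register the environment as a Jupyter kernel, which is done by running the following command inside the activated environment:

    python -m ipykernel install --user --name myenv --display-name "Python (myenv)"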
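
To confirm that a notebook really runs on the registered environment, you can inspect the interpreter from within a notebook cell. This is a small sketch using only the standard library; `describe_interpreter` is a name I made up for illustration.

```python
import sys


def describe_interpreter():
    """Collect basic facts about the interpreter running this code."""
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,          # root folder of the active environment
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the kernel was registered from the activated conda environment, both paths should point into that environment’s folder.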


Last but not least, I include the following code in my Jupyter notebook to mimic RStudio’s load_all() and to load some environment variables:

# Load autoreload module