Python

What is Python?

Python is a high-level, general-purpose programming language that is widely used in data science, machine learning, and web development. It has a large standard library and an active community that maintains third-party libraries and tools for a wide range of applications.

How to install Python?

There are many ways to install Python. We recommend using Python in a virtual environment to avoid conflicts with other Python installations on your system.

We recommend using pixi as a simple way to create and manage Python virtual environments.

You can manage the Python packages installed in the virtual environment using a pyproject.toml file. See the pyproject.toml example in this repository for how to declare Python packages. The steps below install pixi and then add package dependencies to the virtual environment.

First, install pixi using winget (Windows) or brew (macOS/Linux):

Platform	Commands
Windows	`winget install prefix-dev.pixi`
macOS	`brew install pixi`
Linux	`brew install pixi`

Add libraries to the virtual environment using pixi add ...:

> pixi add jupyterlab pandas matplotlib seaborn statsmodels causaldata --pypi

Coding Conventions

We highly recommend working in a virtual environment to manage Python dependencies. The pyproject.toml file is the preferred way to keep track of Python dependencies as well as project-specific Python conventions.

We recommend using Ruff to enforce linting and formatting rules. In most cases you can use the default linting and formatting rules provided by Ruff. However, you can customize the rules by modifying the [tool.ruff] section of the pyproject.toml file in the root of your project. For more about the configuration options, see the Ruff documentation.

If you are working in a virtual environment created in this repository, you automatically have access to Ruff through the just lint-py and just fmt-python commands, which lint and format your code.
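
As a rough illustration of what linting catches, the sketch below shows two patterns that Ruff's default rule set reports (F401, an unused import, and E711, a comparison to None with ==) as comments, followed by a version that avoids both. The function name describe is hypothetical and chosen only for this example.

```python
from typing import Optional

# Patterns that Ruff's default rules would flag:
#   import os               # F401: imported but unused
#   if value == None: ...   # E711: comparison to None, use "is None"


def describe(value: Optional[float]) -> str:
    """Return a short label for an optional numeric value."""
    if value is None:
        return "missing"
    return f"{value:.1f}"


print(describe(None))
print(describe(3.14159))
```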

For more inspiration, see the GitLab Data Team’s Python Guide and Google’s Python Style Guide.

Example Usage

Let’s load example World Bank data via Gapminder using the causaldata package.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as sm
from causaldata import gapminder

Load the Gapminder data as a pandas DataFrame:

df = gapminder.load_pandas().data

We can check the number of rows, the column types, and the non-null counts of the DataFrame using df.info():

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   country    1704 non-null   object 
 1   continent  1704 non-null   object 
 2   year       1704 non-null   int64  
 3   lifeExp    1704 non-null   float64
 4   pop        1704 non-null   int64  
 5   gdpPercap  1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
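
If only the dimensions are needed, the shape attribute returns them directly as a (rows, columns) tuple. A minimal sketch on a small hand-built DataFrame, assuming pandas is installed; the variable name sample is hypothetical and the values are the first Afghanistan rows of the Gapminder data:

```python
import pandas as pd

# A small stand-in for the Gapminder DataFrame (first five Afghanistan rows).
sample = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 5,
        "year": [1952, 1957, 1962, 1967, 1972],
        "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088],
    }
)

# shape is a (rows, columns) tuple; no method call is needed.
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```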

Let’s take a look at the first few rows of the DataFrame using df.head():

df.head()

	country	continent	year	lifeExp	pop	gdpPercap
0	Afghanistan	Asia	1952	28.801	8425333	779.445314
1	Afghanistan	Asia	1957	30.332	9240934	820.853030
2	Afghanistan	Asia	1962	31.997	10267083	853.100710
3	Afghanistan	Asia	1967	34.020	11537966	836.197138
4	Afghanistan	Asia	1972	36.088	13079460	739.981106

Take a look at the relationship between GDP per Capita and Life Expectancy:

sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent", data=df).set(
    xscale="log", ylabel="Life Expectancy", xlabel="GDP per Capita"
)

Separate the data by year, focusing on 1957 and 2007:

sns.relplot(
    data=df[df["year"].isin([1957, 2007])],
    x="gdpPercap",
    y="lifeExp",
    col="year",
    hue="continent",
    col_wrap=1,
    kind="scatter",
    palette="muted",
).set(xscale="log", ylabel="Life Expectancy", xlabel="GDP per Capita")
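
When selecting rows for particular years, boolean indexing and DataFrame.where behave differently: indexing drops the non-matching rows, while where keeps every row and fills the non-matching ones with NaN. A minimal sketch on a small hand-built DataFrame, assuming pandas is installed; the variable names are hypothetical and the years 1957 and 1967 are chosen only because they appear in the first rows of the data:

```python
import pandas as pd

# A small stand-in for the Gapminder DataFrame (first five Afghanistan rows).
sample = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 5,
        "year": [1952, 1957, 1962, 1967, 1972],
        "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088],
    }
)

mask = sample["year"].isin([1957, 1967])

subset = sample[mask]        # keeps only the matching rows
masked = sample.where(mask)  # keeps all rows; non-matching rows become NaN

print(len(subset), len(masked))
print(masked["year"].isna().sum())
```

Boolean indexing is usually the better fit for plotting, since the result contains only the rows of interest and the year column keeps its integer type.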
