C O N N E C T I N G - C H A I N S

When working with dataframes instead of excel, we gain a massive amount of power and capabilities given python is a programming language, and excel is a spreadsheet.

One of the things we lose, however, is the graphic interface. The ability to click a button and add a column. Below are a number of ways you can add new columns to a dataframe in Pandas.

Setting up our example

The first thing I am going to do is to create a sample dataframe for us to work with.

import numpy as np
import pandas as pd
# import sample data
data = pd.read_excel('./data/pandas_add_column.xlsx')

This will give us the following table (assuming we are viewing it in Jupyter Notebook):

Sample Order Data
Sample Order Data

I am also going to add the following data that I want to add to this table.

# create additional column
## as list
supplier_list = ['Whiskey', 'Xray', 'Yankee', 'Zulu']

## as dictionary
supplier_dictionary = {'28374':'Whiskey', '28394':'Yankee', '23928':'Zulu', '23305':'Xray'}

How do I add a column to a dataframe in Pandas?

Below are going to cover 7 ways to add new column to your dataframe.

1. Declaration

Here we are setting a column on the table data equal to the list or array of values we stored in that variable above. This is the most straight forward method.

# Method 1
## Declaring

data['Supplier'] = supplier_list

When doing this, it will add the column to the end of the table.

Adding a Column

2. .insert()

Insert allows us to control where the column is inserted. Each column has an index number starting with 0.

Inserting a Column

So if we want Supplier to come right after Customer, we need to choose an index position of 2. Order Number would be index 0, and Customer would be index 1, so we want to insert into index 2.


3. .assign()

When used without complex options, assign acts like declaring a column… by adding it to the end.

data = data.assign(Supplier = supplier_list)
Assigned Column

4. Dictionary

A dictionary in Python is different than a normal list of data. Dictionaries are made up of what are called “Key-Value Pairs”. Rather than a list of values, each value is actually associated with a key.

Looking above when we set up the example data, you can see I created a dictionary called supplier_dictionary. I specified not just a list of suppliers here, but I gave which supplier is mapped to each order. so 28394 was ordered from Yankee.

So far, when we add columns, the list is added in whichever order we happen to have the list. However, what if it matters which row the data is on. While there are many ways to do this, here is one way to add a column if you have a dictionary.

Note: Because of the way this works, we need to invert our dictionary first. That is, we need to turn the keys into values and the values into keys; in other words, we need to change {key:value} into {value:key}. Here is the code to do that:

inverted_dict = dict([[v,k] for k,v in supplier_dictionary.items()])

Now we can add or column. You will notice that this time the suppliers are in a different order. They are aligned to their Order Numbers. Using a dictionary will automatically look for any columns that have data that match your keys and align them for you. All you have to do is declare your column and set it equal to the inverted dictionary.

data["Supplier"] = inverted_dict
Added Column via Dictionary Assignment

5. .assign() with a function

I mentioned earlier you can use assign and give it some real power if you want. This syntax below is an example. While all I am doing here is taking the list and creating a new column, you can put any mathematical formula after the colon. This means you can change and transform the list before it gets added to the new column. I’ll give some transformation examples below after we go over the main application.

data = data.assign(Supplier =lambda x: supplier_list)
Assignment using a function

Now let’s use .assign again, only this time we will take another column in the dataframe and change it before creating our new column.

data = data.assign(Peso_total =lambda x: data['Order Total'] * 19.5)

6. Setting Default Value

Pandas relies on a great library called Numpy. Numpy uses “element-wise” calculations. That means that arrays are operated against arrays. Here is an example.

When you multiply green by yellow, each element in green is multiplied by its corresponding element in yellow. This makes operations very easy using Numpy. Pandas carries all of this since it includes Numpy.

When using “element-wise” operations, you can multiply an entire array (or list of items like [1, 2, 3]) by a single number, and that number gets applied to the entire array automatically.

However, this isn’t only for mathematical operations. We can do anything to an array, and it will do it to the entire array, “element-wise”. So we can define new columns like this as well, and in the process set defaults.

So let’s say that you had a dictionary of suppliers mapped to orders; but you only had some of them. In all other spaces you wanted it to say ‘none’. You could create a column that was set to ‘none’, and then overwrite using a dictionary where you had data. Below is how to create a column with a default value.

data['Supplier'] = "none"
Add a column with a default value

7. Declaring using formulas

Last, and the one I most commonly use, it to create a new column by declaring it based on other columns.

data['Peso_total'] = data['Order Total'] * 19.5