The Ultimate Pandas Cheatsheet

A comprehensive reference guide for working with Series and DataFrames in Pandas, the essential Python library for data manipulation and analysis.

Creating Series and DataFrames

Action	Code Example
Import the library	`import pandas as pd`
Create a Series from a list	`s = pd.Series([909976, 8615246, 2872086, 2273305])`
Create a Series with an index and name	`s = pd.Series([...], name="Population", index=["Stockholm", "London", ...])`
Create a DataFrame from a nested list	`df = pd.DataFrame([[909976, "Sweden"], [8615246, "United Kingdom"], ...])`
Create a DataFrame from a dictionary	`df = pd.DataFrame({"Population": [...], "State": [...]}, index=["Stockholm", ...])`
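The rows above can be combined into one short sketch. It reuses the four city populations and index labels from the examples. The "Italy" and "France" state labels for Rome and Paris are filled in here only to complete the illustration.

```python
import pandas as pd

# City labels and populations taken from the examples in the table above
cities = ["Stockholm", "London", "Rome", "Paris"]
populations = [909976, 8615246, 2872086, 2273305]

# A named Series indexed by city
s = pd.Series(populations, name="Population", index=cities)

# The same data as a DataFrame built from a dictionary (one key per column)
df = pd.DataFrame(
    {"Population": populations,
     "State": ["Sweden", "United Kingdom", "Italy", "France"]},
    index=cities,
)

# A nested list gives one inner list per row; columns default to 0, 1, ...
df_rows = pd.DataFrame([[909976, "Sweden"], [8615246, "United Kingdom"]])

print(s["London"])            # 8615246
print(df.shape)               # (4, 2)
print(list(df_rows.columns))  # [0, 1]
```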

Inspect Data

Action	Code Example
Get the index	`s.index`
Get the values (as a NumPy array)	`s.values`
Get the columns	`df.columns`
Show the first 5 rows	`df.head()`
Get a summary of the DataFrame	`df.info()`
Get the data types of each column	`df.dtypes`
Get descriptive statistics	`s.describe()`
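Here is a quick sketch of the inspection calls on the same four-city DataFrame. `s.to_numpy()` is shown next to `s.values` because both return the underlying NumPy data. Calling `describe()` on an integer Series returns float statistics.

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246, 2872086, 2273305],
     "State": ["Sweden", "United Kingdom", "Italy", "France"]},
    index=["Stockholm", "London", "Rome", "Paris"],
)
s = df["Population"]

print(s.index.tolist())     # ['Stockholm', 'London', 'Rome', 'Paris']
print(s.to_numpy())         # same data as s.values, as a NumPy array
print(df.columns.tolist())  # ['Population', 'State']
print(df.head(2))           # first two rows instead of the default five
df.info()                   # index, column dtypes, non-null counts, memory use

stats = s.describe()        # count, mean, std, min, quartiles, max
print(stats["count"], stats["max"])  # 4.0 8615246.0
```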

Selecting and Indexing Data

Action	Code Example
Select a column (returns a Series)	`df["Population"]` or `df.Population`
Select a row by label	`df.loc["Stockholm"]`
Select multiple rows by label	`df.loc[["Paris", "Rome"]]`
Select specific rows and a column	`df.loc[["Paris", "Rome"], "Population"]`
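The selectors above return different shapes depending on what you pass. A single label gives a Series, and a list of labels gives a DataFrame. The sketch below uses the same illustrative four-city DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246, 2872086, 2273305],
     "State": ["Sweden", "United Kingdom", "Italy", "France"]},
    index=["Stockholm", "London", "Rome", "Paris"],
)

col = df["Population"]                          # one column -> Series
row = df.loc["Stockholm"]                       # one row -> Series indexed by column names
rows = df.loc[["Paris", "Rome"]]                # list of row labels -> DataFrame
pops = df.loc[["Paris", "Rome"], "Population"]  # rows x one column -> Series

print(row["State"])   # Sweden
print(rows.shape)     # (2, 2)
print(pops.tolist())  # [2273305, 2872086]
```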

Data Manipulation and Cleaning

Action	Code Example
Set the index of a Series or DataFrame	`s.index = ["Stockholm", "London", "Rome", "Paris"]`
Set the column names of a DataFrame	`df.columns = ["Population", "State"]`
Apply a function to each element of a Series (here: parse "1,234" strings to int)	`df.Population.apply(lambda x: int(x.replace(",", "")))`
Set a column as the index	`df_pop2 = df_pop.set_index("City")`
Set a hierarchical (multi-level) index	`df_pop3 = df_pop.set_index(["State", "City"])`
Sort by the index	`df.sort_index(level=0)`
Sort by column values	`df.sort_values(["State", "NumericPopulation"], ascending=[False, True])`
Count unique values in a Series	`city_counts = df_pop.State.value_counts()`
Group by an index level and aggregate	`df_pop3.groupby(level="State").sum()`
Group by a column and aggregate	`df.groupby("State").sum()`
Create a pivot table	`pd.pivot_table(df, values='outdoor', index=['month'], columns=['hour'])`
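The cleaning and grouping rows above can be chained as in the following sketch. It does not use the real `european_cities.csv`. Instead, it builds a small hypothetical table with made-up city and state names ("A1", "A", ...) and string populations that contain thousands separators.

```python
import pandas as pd

# Hypothetical toy data: populations arrive as strings with thousands separators
df_pop = pd.DataFrame({
    "City": ["A1", "A2", "B1", "B2", "B3"],
    "State": ["A", "A", "B", "B", "B"],
    "Population": ["1,200", "800", "2,500", "300", "1,000"],
})

# Element-wise parse: "1,200" -> 1200
df_pop["NumericPopulation"] = df_pop.Population.apply(lambda x: int(x.replace(",", "")))

# Two-level index (State, City), sorted on the outer level
df_pop3 = df_pop.set_index(["State", "City"]).sort_index(level=0)

# Number of cities per state
city_counts = df_pop.State.value_counts()

# Total population per state, aggregating over the "State" index level
totals = df_pop3.groupby(level="State")["NumericPopulation"].sum()

# State descending, then population ascending within each state
ordered = df_pop.sort_values(["State", "NumericPopulation"], ascending=[False, True])

print(city_counts["B"])          # 3
print(totals["A"], totals["B"])  # 2000 3800
print(ordered.City.tolist())     # ['B2', 'B3', 'B1', 'A2', 'A1']
```

The sketch selects `NumericPopulation` before calling `.sum()`. This keeps the string `Population` column out of the aggregation, so the strings are not concatenated.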

Time Series

Action	Code Example
Create a date range	`pd.date_range("2015-1-1", periods=31, freq="D")`
Convert Unix timestamps to Datetime objects	`df.time = pd.to_datetime(df.time.values, unit="s")`
Localize and convert the timezone of a datetime column	`df.time.dt.tz_localize('UTC').dt.tz_convert('Europe/Stockholm')`
Select a time slice	`df2["2014-1-1":"2014-1-31"]`
Convert DatetimeIndex to PeriodIndex	`df.to_period("M")`
Downsample a time series	`df1_day = df1.resample("D").mean()`
Upsample with forward fill	`df1.resample("5min").ffill()`
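The time-series rows above fit together as in the following sketch. It uses a hypothetical sensor log with one reading every 10 minutes for two days, timestamped in Unix seconds. On a datetime *column*, the time-zone methods go through the `.dt` accessor. Without `.dt`, `Series.tz_localize` acts on the index instead. The sketch also reuses the `pivot_table` call from the manipulation table.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log: one reading every 10 minutes for two days,
# times stored as Unix seconds. 1388534400 is 2014-01-01 00:00:00 UTC.
df = pd.DataFrame({
    "time": 1388534400 + 600 * np.arange(288),
    "outdoor": np.linspace(-5.0, 5.0, 288),
})

# Unix seconds -> naive datetime64, then mark as UTC and convert to local time
df["time"] = pd.to_datetime(df["time"], unit="s")
df["time"] = df["time"].dt.tz_localize("UTC").dt.tz_convert("Europe/Stockholm")
df = df.set_index("time")

# Partial-string slice on the DatetimeIndex; both endpoints are included
noon = df.loc["2014-01-01 12:00":"2014-01-01 13:00"]

# Downsample to daily means (bins follow local midnight)
daily = df.resample("D").mean()

# Upsample to 5-minute steps, carrying each reading forward
fine = df.resample("5min").ffill()

# Month x hour table of mean outdoor values
df["month"] = df.index.month
df["hour"] = df.index.hour
table = pd.pivot_table(df, values="outdoor", index=["month"], columns=["hour"])

print(df.index[0])                                     # 2014-01-01 01:00:00+01:00
print(len(noon), len(daily), len(fine), table.shape)  # 7 3 575 (1, 24)
```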

Plotting

Action	Code Example
Create a plot from a Series or DataFrame	`s.plot(kind='bar', title='bar')`
Plot selected columns onto an existing Matplotlib Axes `ax`	`df_temp.plot(y=["outdoor", "indoor"], ax=ax)`
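The `ax=ax` argument above assumes that a Matplotlib Axes already exists. This sketch creates the axes first and passes one to each pandas plot call. It requires matplotlib. The `df_temp` values are hypothetical and only reuse the column names from the table.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit in notebooks
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series([909976, 8615246, 2872086, 2273305],
              index=["Stockholm", "London", "Rome", "Paris"])

# Hypothetical two-week temperature log using the column names from the table
days = pd.date_range("2014-01-01", periods=14, freq="D")
df_temp = pd.DataFrame({"outdoor": np.linspace(-3.0, 4.0, 14),
                        "indoor": np.full(14, 21.0)}, index=days)

# Create the axes first, then hand each one to a pandas plot call via ax=
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
s.plot(kind="bar", title="Population", ax=ax1)
df_temp.plot(y=["outdoor", "indoor"], ax=ax2)
fig.tight_layout()
# fig.savefig("cities.png") would write the figure to disk
```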

Data Input / Output (I/O)

Action	Code Example
Read data from a CSV file	`df_pop = pd.read_csv("european_cities.csv")`
Write a DataFrame to a CSV file	`df.to_csv("subset.csv")`
Write to an HDF5 file store (requires PyTables)	`store = pd.HDFStore('store.h5'); store["df1"] = df`
Read from an HDF5 file store	`df = store["df1"]`
Write to a partitioned Parquet dataset (creates a directory)	`df.to_parquet("data.parquet", partition_cols=["dt"])`
Read from a Parquet file	`df_new = pd.read_parquet("data.parquet")`
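The HDF5 rows need the optional PyTables package, and the Parquet rows need pyarrow or fastparquet. For that reason, this sketch covers only the CSV round trip, which runs with pandas alone. It writes to an in-memory buffer rather than a file. The index is saved as the first column, so it is read back with `index_col=0`. If you pass `index=False` to `to_csv`, the index is not written at all.

```python
import io
import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246, 2872086, 2273305],
     "State": ["Sweden", "United Kingdom", "Italy", "France"]},
    index=pd.Index(["Stockholm", "London", "Rome", "Paris"], name="City"),
)

# Write to an in-memory buffer; a file path works the same way
buf = io.StringIO()
df.to_csv(buf)                # the index is written as the first column by default
text = buf.getvalue()
print(text.splitlines()[0])   # City,Population,State

# Read back, turning the first column into the index again
df_back = pd.read_csv(io.StringIO(text), index_col=0)
print(df_back.equals(df))     # True
```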

Key Concepts Summary

Series vs DataFrame

Series: One-dimensional labeled array
DataFrame: Two-dimensional labeled data structure
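Selecting from a DataFrame shows the difference between the two structures directly. A single column label gives a one-dimensional Series, and a list of labels keeps the result two-dimensional. The data in this sketch is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Population": [909976, 8615246],
                   "State": ["Sweden", "United Kingdom"]},
                  index=["Stockholm", "London"])

col = df["Population"]      # a single label gives a 1-D Series
frame = df[["Population"]]  # a list of labels gives a 2-D DataFrame

print(type(col).__name__, col.ndim)      # Series 1
print(type(frame).__name__, frame.ndim)  # DataFrame 2
print(frame.shape)                       # (2, 1)
```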

Indexing Methods

.loc[]: Label-based indexing
.iloc[]: Position-based indexing
Boolean indexing: Conditional filtering
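The three indexing styles are compared below on the illustrative four-city DataFrame. One detail to watch is that label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246, 2872086, 2273305],
     "State": ["Sweden", "United Kingdom", "Italy", "France"]},
    index=["Stockholm", "London", "Rome", "Paris"],
)

by_label = df.loc["Rome", "Population"]  # row label, column label
by_position = df.iloc[2, 0]              # third row, first column (same cell)
print(by_label == by_position)           # True

# Label slices include the end label; position slices exclude the end position
print(len(df.loc["Stockholm":"Rome"]), len(df.iloc[0:2]))  # 3 2

# Boolean indexing: keep rows where the condition is True
big = df[df["Population"] > 2_500_000]
print(big.index.tolist())                # ['London', 'Rome']
```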

Data Types in Pandas

object: Text or mixed data
int64: Integer numbers
float64: Floating-point numbers
bool: True/False values
datetime64: Date/time values
category: Categorical data
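Most of these dtypes come from ordinary operations rather than explicit declarations, as this sketch shows. The `Updated` timestamp is a hypothetical placeholder used only to produce a datetime column. The exact name printed for the text columns can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Stockholm", "London", "Rome", "Paris"],
    "Population": [909976, 8615246, 2872086, 2273305],
    "State": ["Sweden", "United Kingdom", "Italy", "France"],
})

df["Millions"] = df["Population"] / 1_000_000  # int / int -> float64
df["Large"] = df["Population"] > 2_000_000     # comparison -> bool
df["State"] = df["State"].astype("category")   # repeated labels -> category
df["Updated"] = pd.to_datetime("2015-01-01")   # hypothetical timestamp -> datetime64

print(df.dtypes)
print(df["State"].cat.categories.tolist())  # ['France', 'Italy', 'Sweden', 'United Kingdom']
```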

Essential Import

import pandas as pd

This cheatsheet covers the core Pandas operations for creating, inspecting, selecting, transforming, plotting, and saving data in Python.

Updated: January 15, 2025
Author: Danial Pahlavan
Category: Data Science & Analysis