The Ultimate Pandas Cheatsheet
Comprehensive reference guide for working with Series and DataFrames in Pandas, the essential Python library for data manipulation and analysis.
Creating Series and DataFrames
| Action | Code Example |
| --- | --- |
| Import the library | `import pandas as pd` |
| Create a Series from a list | `s = pd.Series([909976, 8615246, 2872086, 2273305])` |
| Create a Series with an index and name | `s = pd.Series([...], name="Population", index=["Stockholm", "London", ...])` |
| Create a DataFrame from a nested list | `df = pd.DataFrame([[909976, "Sweden"], [8615246, "United Kingdom"], ...])` |
| Create a DataFrame from a dictionary | `df = pd.DataFrame({"Population": [...], "State": [...]}, index=["Stockholm", ...])` |
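The constructors above can be tried end to end with a short script. This is a minimal sketch: the population figures and city labels come from this cheatsheet, while pairing Rome with "Italy" and Paris with "France" in the State column is an assumption made to complete the example.

```python
import pandas as pd

# Values and labels as they appear elsewhere in this cheatsheet.
populations = [909976, 8615246, 2872086, 2273305]
cities = ["Stockholm", "London", "Rome", "Paris"]

# A Series is a single labeled column of values.
s = pd.Series(populations, name="Population", index=cities)

# A DataFrame built from a dictionary: each key becomes a column name.
# The last two State values are assumed for illustration.
df = pd.DataFrame(
    {
        "Population": populations,
        "State": ["Sweden", "United Kingdom", "Italy", "France"],
    },
    index=cities,
)

print(s["London"])  # 8615246
print(df.shape)     # (4, 2)
```

Passing `index=` gives the rows meaningful labels, which is what makes label-based selection with `.loc` possible later on.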
Inspect Data
| Action | Code Example |
| --- | --- |
| Get the index | `s.index` |
| Get the values (as a NumPy array) | `s.values` |
| Get the columns | `df.columns` |
| Show the first 5 rows | `df.head()` |
| Get a summary of the DataFrame | `df.info()` |
| Get the data types of each column | `df.dtypes` |
| Get descriptive statistics | `s.describe()` |
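As a rough illustration of the inspection calls above, the sketch below runs them on a two-row DataFrame built from values in this cheatsheet. The variable name `stats` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246], "State": ["Sweden", "United Kingdom"]},
    index=["Stockholm", "London"],
)

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column names
print(df.dtypes)            # one dtype per column
print(df.head())            # first rows (at most 5)
df.info()                   # prints a summary directly and returns None

# describe() returns a Series of summary statistics for a numeric column.
stats = df["Population"].describe()
print(stats["mean"])        # 4762611.0
```

Note that `df.info()` prints its report itself, so wrapping it in `print()` would only add a trailing `None`.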
Selecting and Indexing Data
| Action | Code Example |
| --- | --- |
| Select a column (returns a Series) | `df["Population"]` or `df.Population` |
| Select a row by label | `df.loc["Stockholm"]` |
| Select multiple rows by label | `df.loc[["Paris", "Rome"]]` |
| Select specific rows and a column | `df.loc[["Paris", "Rome"], "Population"]` |
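The selection patterns above return different object types depending on what is requested. The following sketch shows this on the four-city data used in this cheatsheet; the "Italy" and "France" State values are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Population": [909976, 8615246, 2872086, 2273305],
        "State": ["Sweden", "United Kingdom", "Italy", "France"],
    },
    index=["Stockholm", "London", "Rome", "Paris"],
)

col = df["Population"]                         # one column -> Series
row = df.loc["Stockholm"]                      # one row -> Series indexed by column name
rows = df.loc[["Paris", "Rome"]]               # list of labels -> DataFrame, in the order given
sub = df.loc[["Paris", "Rome"], "Population"]  # rows and one column -> Series

print(type(row).__name__, type(rows).__name__)  # Series DataFrame
print(sub.tolist())                             # [2273305, 2872086]
```

Attribute access such as `df.Population` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.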
Data Manipulation and Cleaning
| Action | Code Example |
| --- | --- |
| Set the index of a Series or DataFrame | `s.index = ["Stockholm", "London", "Rome", "Paris"]` |
| Set the column names of a DataFrame | `df.columns = ["Population", "State"]` |
| Apply a function to each value in a Series | `df.Population.apply(lambda x: int(x.replace(",", "")))` |
| Set a column as the index | `df_pop2 = df_pop.set_index("City")` |
| Set a hierarchical (multi-level) index | `df_pop3 = df_pop.set_index(["State", "City"])` |
| Sort by the index | `df.sort_index(level=0)` |
| Sort by column values | `df.sort_values(["State", "NumericPopulation"], ascending=[False, True])` |
| Count occurrences of each unique value in a Series | `city_counts = df_pop.State.value_counts()` |
| Group by an index level and aggregate | `df_pop3.groupby(level="State").sum()` |
| Group by a column and aggregate | `df.groupby("State").sum()` |
| Create a pivot table | `pd.pivot_table(df, values='outdoor', index=['month'], columns=['hour'])` |
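Several of the steps above chain naturally into one cleaning workflow. The sketch below assumes a small stand-in for `df_pop` with `City`, `State`, and comma-formatted `Population` columns; the frame is hypothetical and only mirrors the column names used in the table.

```python
import pandas as pd

# Hypothetical stand-in for df_pop: populations stored as comma-formatted text.
df_pop = pd.DataFrame(
    {
        "City": ["Stockholm", "London", "Rome", "Paris"],
        "State": ["Sweden", "United Kingdom", "Italy", "France"],
        "Population": ["909,976", "8,615,246", "2,872,086", "2,273,305"],
    }
)

# Strip the thousands separators and convert each value to int.
df_pop["NumericPopulation"] = df_pop["Population"].apply(
    lambda x: int(x.replace(",", ""))
)

# Sort by State descending, then by population ascending within each State.
ordered = df_pop.sort_values(
    ["State", "NumericPopulation"], ascending=[False, True]
)

# Count how many rows belong to each State.
city_counts = df_pop["State"].value_counts()

# Build a two-level index, sort it, and total the population per State.
df_pop3 = df_pop.set_index(["State", "City"]).sort_index(level=0)
totals = df_pop3.groupby(level="State")["NumericPopulation"].sum()

print(ordered["City"].tolist())  # ['London', 'Stockholm', 'Rome', 'Paris']
print(totals["Italy"])           # 2872086
```

Selecting the numeric column before calling `.sum()` is a deliberate choice here: in recent pandas versions, summing a group that still contains text columns concatenates the strings rather than skipping them. With only one city per State in this toy frame the totals equal the individual values; real data would have several rows per group.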
Time Series
| Action | Code Example |
| --- | --- |
| Create a date range | `pd.date_range("2015-1-1", periods=31, freq="D")` |
| Convert Unix timestamps to datetime objects | `df.time = pd.to_datetime(df.time.values, unit="s")` |
| Localize and convert the timezone of a datetime column | `df.time.dt.tz_localize('UTC').dt.tz_convert('Europe/Stockholm')` |
| Select a time slice (requires a DatetimeIndex) | `df2["2014-1-1":"2014-1-31"]` |
| Convert DatetimeIndex to PeriodIndex | `df.to_period("M")` |
| Downsample a time series | `df1_day = df1.resample("D").mean()` |
| Upsample with forward fill | `df1.resample("5min").ffill()` |
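The time series operations above depend on whether the datetimes live in a column or in the index. The sketch below walks through both; the timestamps and `outdoor` readings are made-up illustration values, and the column names are assumptions modeled on the table.

```python
import pandas as pd

# Hypothetical readings: Unix timestamps in seconds plus a temperature value.
df = pd.DataFrame(
    {
        "time": [1388530800, 1388534400, 1388617200, 1388620800],
        "outdoor": [4.0, 6.0, 1.0, 3.0],
    }
)

# For a column (Series), timezone methods are reached through the .dt accessor.
df["time"] = (
    pd.to_datetime(df["time"], unit="s")
    .dt.tz_localize("UTC")
    .dt.tz_convert("Europe/Stockholm")
)

# Resampling and date-string selection operate on a DatetimeIndex.
ts = df.set_index("time")

daily = ts.resample("D").mean()        # downsample: one averaged row per day
jan1 = ts.loc["2014-01-01"]            # partial-string selection of one day
filled = ts.resample("30min").ffill()  # upsample: carry the last value forward

# A standalone range of 31 daily timestamps.
days = pd.date_range("2015-1-1", periods=31, freq="D")

print(daily["outdoor"].tolist())  # [5.0, 2.0]
print(len(filled), len(days))     # 51 31
```

Calling `tz_localize` directly on a Series (without `.dt`) acts on the Series index rather than its values, which is why the accessor is needed for a datetime column.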
Plotting
| Action | Code Example |
| --- | --- |
| Create a plot from a Series or DataFrame | `s.plot(kind='bar', title='bar')` |
| Plot multiple columns from a DataFrame | `df_temp.plot(y=["outdoor", "indoor"], ax=ax)` |
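The `ax=ax` argument above assumes a Matplotlib axes object already exists. The sketch below shows one way to create it; it assumes Matplotlib is installed, and the `df_temp` values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical temperature readings with a daily DatetimeIndex.
df_temp = pd.DataFrame(
    {"outdoor": [4.0, 6.0, 1.0, 3.0], "indoor": [21.0, 22.0, 20.5, 21.5]},
    index=pd.date_range("2014-01-01", periods=4, freq="D"),
)

# Create the figure and axes first, then let pandas draw onto the axes.
fig, ax = plt.subplots(figsize=(8, 4))
df_temp.plot(y=["outdoor", "indoor"], ax=ax)
ax.set_ylabel("Temperature")

print(len(ax.get_lines()))  # 2
```

Passing an existing axes keeps control of layout in Matplotlib, so several DataFrames can share one figure. In an interactive session the backend line can be dropped and `plt.show()` or `fig.savefig(...)` used to view the result.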
Reading and Writing Data
| Action | Code Example |
| --- | --- |
| Read data from a CSV file | `df_pop = pd.read_csv("european_cities.csv")` |
| Write a DataFrame to a CSV file | `df.to_csv("subset.csv")` |
| Write to an HDF5 file store | `store = pd.HDFStore('store.h5'); store["df1"] = df` |
| Read from an HDF5 file store | `df = store["df1"]` |
| Write to a Parquet file with partitioning | `df.to_parquet("data.parquet", partition_cols=["dt"])` |
| Read from a Parquet file | `df_new = pd.read_parquet("data.parquet")` |
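A CSV round trip is the easiest of these to try, since it needs no optional dependencies (HDF5 and Parquet support rely on extra packages). The sketch below writes to a temporary directory; the index name `City` is an assumption chosen for the example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"Population": [909976, 8615246], "State": ["Sweden", "United Kingdom"]},
    index=pd.Index(["Stockholm", "London"], name="City"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "subset.csv")
    # to_csv writes the index as the first column by default.
    df.to_csv(path)
    # index_col restores that column as the index when reading back.
    df_back = pd.read_csv(path, index_col="City")

print(df_back)
```

Without `index_col`, the city names would come back as an ordinary column and the rows would get a fresh integer index, which is a common source of a stray unnamed column after repeated saves.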
Key Concepts Summary
Series vs DataFrame
- Series: One-dimensional labeled array
- DataFrame: Two-dimensional labeled data structure
Indexing Methods
- .loc[]: Label-based indexing
- .iloc[]: Position-based indexing
- Boolean indexing: Conditional filtering
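The three indexing styles can be compared side by side on one Series. This is a minimal sketch using the population values from this cheatsheet; the 2.5 million threshold is arbitrary.

```python
import pandas as pd

s = pd.Series(
    [909976, 8615246, 2872086, 2273305],
    index=["Stockholm", "London", "Rome", "Paris"],
    name="Population",
)

by_label = s.loc["Rome"]              # label-based lookup
by_position = s.iloc[2]               # position-based lookup (zero-indexed)

label_slice = s.loc["London":"Rome"]  # label slices include both endpoints
pos_slice = s.iloc[1:2]               # position slices exclude the stop

large = s[s > 2_500_000]              # boolean mask keeps rows where the condition is True

print(by_label == by_position)          # True
print(len(label_slice), len(pos_slice)) # 2 1
print(large.index.tolist())             # ['London', 'Rome']
```

The difference in slice endpoints is the detail most likely to cause off-by-one surprises when switching between `.loc` and `.iloc`.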
Data Types in Pandas
- object: Text or mixed data
- int64: Integer numbers
- float64: Floating-point numbers
- bool: True/False values
- datetime64: Date/time values
- category: Categorical data
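Columns often arrive as text and need converting to one of the types listed above. The sketch below uses a hypothetical frame with made-up values and column names to show the usual conversions.

```python
import pandas as pd

# Hypothetical raw data in which every column starts out as text.
df = pd.DataFrame(
    {
        "State": ["Sweden", "Italy", "Sweden"],
        "Population": ["909976", "2872086", "909976"],
        "Measured": ["2014-01-01", "2014-01-02", "2014-01-03"],
    }
)

df["Population"] = df["Population"].astype("int64")  # text -> integers
df["State"] = df["State"].astype("category")         # repeated labels -> categorical
df["Measured"] = pd.to_datetime(df["Measured"])      # text -> datetime64
df["Large"] = df["Population"] > 1_000_000           # comparison -> bool

print(df.dtypes)
```

Depending on the pandas version, unconverted text columns are reported as `object` or as a dedicated string dtype, so code that checks for text is more portable when it uses helpers such as `pd.api.types.is_string_dtype` rather than comparing against `object` directly.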
Essential Import
`import pandas as pd`
This comprehensive Pandas cheatsheet covers the essential operations for data manipulation and analysis in Python. Regular practice with these methods will greatly improve your data handling skills.
Updated: January 15, 2025
Author: Danial Pahlavan
Category: Data Science & Analysis