Temperature and Sunscreen Search Trends in California (2004–2024)

Temperature and Sunscreen Search Trends in California (2004–2024)#

In this notebook, we explore the relationship between temperature trends and public interest in sunscreen in California from 2004 to 2024.

We use varunayan to easily extract monthly average temperature data for California from the ERA5 climate reanalysis dataset.

We aim to:

Compare temperature data with Google Trends search queries for “sunscreen.”
Visualize time series of both temperature and search interest and identify potential lead-lag relationships.
Explore regional patterns in temperature using raw high-resolution climate data (latitude-longitude based), which was obtained with the processed data from varunayan.

This study can help identify when and where sunscreen demand may rise, aiding in health awareness and regional sales planning.

Step 1: Extract Monthly Temperature Data for California#

We use the varunayan package to download ERA5 temperature data for the California region using a GeoJSON boundary. The data spans from January 2004 to December 2024, with monthly frequency and 0.1° spatial resolution.

import varunayan

df = varunayan.era5ify_geojson(
    request_id="temp_california_2004_2024",
    variables=["2m_temperature"],
    start_date="2004-1-1",
    end_date="2024-12-31",
    json_file="https://gist.githubusercontent.com/JaggeryArray/50ad17645a290ee4445e1113609de5e4/raw/91e97ec6ae6654093490df87a9802e359bf303b1/california.geojson",
    frequency="monthly",
    resolution=0.1
)

============================================================
STARTING ERA5 SINGLE LEVEL PROCESSING
============================================================
Request ID: temp_california_2004_2024
Variables: ['2m_temperature']
Date Range: 2004-01-01 to 2024-12-31
Frequency: monthly
Resolution: 0.1°
GeoJSON File: C:\Users\ATHARV~1\AppData\Local\Temp\temp_california_2004_2024_temp_geojson.json


--- GeoJSON Mini Map ---

MINI MAP (-124.45°W to -114.14°E, 32.53°S to 42.00°N):
┌─────────────────────────────────────────┐
│·········································│
│·■■■■■■■■■■■■■■■■■·······················│
│·■■■■■■■■■■■■■■■■■·······················│
│·■■■■■■■■■■■■■■■■■·······················│
│··■■■■■■■■■■■■■■■■·······················│
│···■■■■■■■■■■■■■■■·······················│
│···■■■■■■■■■■■■■■■■······················│
│·····■■■■■■■■■■■■■■■■····················│
│·······■■■■■■■■■■■■■■■■■·················│
│········■■■■■■■■■■■■■■■■■■■··············│
│···········■■■■■■■■■■■■■■■■■■■···········│
│···········■■■■■■■■■■■■■■■■■■■■■·········│
│·············■■■■■■■■■■■■■■■■■■■■■■······│
│···············■■■■■■■■■■■■■■■■■■■■■■■···│
│···············■■■■■■■■■■■■■■■■■■■■■■■■··│
│·····················■■■■■■■■■■■■■■■■■■■·│
│··························■■■■■■■■■■■■■··│
│····························■■■■■■■■■■···│
│·········································│
└─────────────────────────────────────────┘
 ■ = Inside the shape
 · = Outside the shape



dbc527cd904c59dd9eb75ed248619a56.zip:   0%|          | 0.00/1.84M [00:00<?, ?B/s]



db162004fe55130472aa4b4b0a19e376.zip:   0%|          | 0.00/1.84M [00:00<?, ?B/s]



7427a1074f5224bada01bce9cd2bf2c1.zip:   0%|          | 0.00/1.04M [00:00<?, ?B/s]



Saving files to output directory: temp_california_2004_2024_output
  Saved final data to: temp_california_2004_2024_output\temp_california_2004_2024_monthly_data.csv
  Saved unique coordinates to: temp_california_2004_2024_output\temp_california_2004_2024_unique_latlongs.csv
  Saved raw data to: temp_california_2004_2024_output\temp_california_2004_2024_raw_data.csv

============================================================
PROCESSING COMPLETE
============================================================

RESULTS SUMMARY:
----------------------------------------
Variables processed: 1
Time period:         2004-01-01 to 2024-12-31
Final output shape:  (252, 3)
Total complete processing time: 91.41 seconds

First 5 rows of aggregated data:
          t2m  year  month
0  279.085327  2004      1
1  279.616730  2004      2
2  286.289825  2004      3
3  287.153900  2004      4
4  290.526306  2004      5

============================================================
ERA5 SINGLE LEVEL PROCESSING COMPLETED SUCCESSFULLY
============================================================

Step 2: Create a Unified Date Column#

We convert the year and month columns to a single datetime column for easier merging and plotting.

import pandas as pd

df['date'] = pd.to_datetime(df[['year', 'month']].assign(day=1))

Step 3: Load Google Trends Data for “Sunscreen”#

We import monthly Google Trends data that captures public search interest in the keyword “sunscreen” for California from 2004 onward. This data could be replaced by sales data of any product too.

df_trend = pd.read_csv("https://gist.githubusercontent.com/JaggeryArray/05a2ffb206ae6ada04f5187c15d77d45/raw/99b5dac0efed5c3fa56c0221e2352aa2e358e17f/multiTimeline_california_googletrends_sunscreen_2004-now.csv")

Step 4: Convert Month Strings to Datetime Format#

The Trends dataset has a Month column as a string. We convert it into a datetime object to align with the ERA5 data.

df_trend['date'] = pd.to_datetime(df_trend['Month'], format='%Y-%m')

Step 5: Merge ERA5 and Google Trends Data#

We perform an inner join on the date column to combine temperature and search interest into a single DataFrame.

df_merged = pd.merge(df, df_trend, on='date', how='inner')

def setup_matplotlib():
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        raise ImportError(
            "Matplotlib is not installed. Install it with: pip install matplotlib"
        )

    plt.rcParams["figure.dpi"] = 300
    plt.rcParams["savefig.dpi"] = 300
    plt.rcParams["font.family"] = "sans-serif"
    plt.rcParams["font.sans-serif"] = ["Arial"]
    plt.rcParams["axes.labelweight"] = "normal"

    plt.rcParams["mathtext.fontset"] = "custom"
    plt.rcParams["mathtext.rm"] = "Arial"
    plt.rcParams["mathtext.it"] = "Arial:italic"
    plt.rcParams["mathtext.bf"] = "Arial:bold"

Step 7: Visualize Temperature vs Sunscreen Search Trends#

We create a dual-axis plot:

Temperature (K) is shown on the left Y-axis.
Log-transformed sunscreen search interest is plotted on the right Y-axis.

This allows us to visually inspect whether warmer months coincide with increased sunscreen interest.

import numpy as np
import matplotlib.pyplot as plt

setup_matplotlib()

fig, ax1 = plt.subplots(figsize=(12, 5))

# Plot t2m on left y-axis
ax1.plot(df_merged['date'], df_merged['t2m'], color='tab:blue', marker='o', label='Temperature (t2m)')
ax1.set_ylabel('Temperature (K)', color='tab:blue')
ax1.tick_params(axis='y', labelcolor='tab:blue')

# Apply log transform to sunscreen
sunscreen_log = np.log(df_merged['sunscreen'] + 1)  # Add 1 to avoid log(0)

# Plot on right y-axis
ax2 = ax1.twinx()
ax2.plot(df_merged['date'], sunscreen_log, color='tab:orange', marker='s', label='Sunscreen (log-scaled)')
ax2.set_ylabel('Log of Sunscreen Trend', color='tab:orange')
ax2.tick_params(axis='y', labelcolor='tab:orange')

# Title and grid
plt.title('Temperature vs Google Trends for Sunscreen (Log-Transformed)')
ax1.set_xlabel('Date')
ax1.grid(True)
fig.tight_layout()

plt.show()

The above plot shows that there is close relation between the Google trend for the search “sunscreen” and temperature in California.

Step 8: Cross-Correlation Analysis (CCF)#

We standardize both variables and compute the cross-correlation function (CCF) over lags from -12 to +12 months. This helps identify if temperature changes lead or lag public interest in sunscreen.

import numpy as np

t2m = (df_merged['t2m'] - df_merged['t2m'].mean()) / df_merged['t2m'].std()
sunscreen = (df_merged['sunscreen'] - df_merged['sunscreen'].mean()) / df_merged['sunscreen'].std()

# Full cross-correlation
corr = np.correlate(sunscreen - sunscreen.mean(), t2m - t2m.mean(), mode='full')
lags = np.arange(-len(sunscreen)+1, len(sunscreen))
corr = corr / (len(sunscreen) * sunscreen.std() * t2m.std())  # Normalize

lag_limit = 12
mask = (lags >= -lag_limit) & (lags <= lag_limit)
lags_limited = lags[mask]
corr_limited = corr[mask]

# Plot
plt.figure(figsize=(10, 5))
plt.stem(lags_limited, corr_limited)
plt.xlabel('Lag (months)')
plt.ylabel('Cross-correlation')
plt.title('CCF: Sunscreen vs Temperature')
plt.axvline(0, color='gray', linestyle='--')
plt.grid(True)
plt.tight_layout()
plt.show()

The cross-correlation plot (CCF) shows that public interest in sunscreen tends to peak about one month before surface temperatures reach their maximum.
This suggests that people anticipate the heat and start searching or preparing for summer in advance (a month before the peak to be exact).

Step 9: Load Raw ERA5 Temperature Data (Hourly, Grid-based)#

We load the raw gridded data (hourly values at individual lat-long points) extracted from the temp_california_2004_2024_raw_data.csv file for spatial analysis.

raw_df = pd.read_csv("temp_california_2004_2024_output/temp_california_2004_2024_raw_data.csv")

Step 10: Filter June and July#

To study spatial temperature distribution during peak sunscreen usage months, we isolate June and July data from the full hourly dataset.

raw_df['date'] = pd.to_datetime(raw_df['date'])

# Filter June (6) and July (7)
raw_df_summer = raw_df[raw_df['date'].dt.month.isin([6, 7])]

Step 11: Calculate Mean Temperature at Each Location#

We group the summer data by latitude and longitude to compute the average temperature for June and July across all years.

raw_df_avg = raw_df_summer.groupby(['latitude', 'longitude'])['t2m'].mean().reset_index()

Step 12: Visualize Spatial Distribution of Summer Temperatures#

We create a heatmap of average June–July temperature (t2m) across California, revealing spatial patterns in heat intensity that may affect sunscreen demand.

import seaborn as sns
import matplotlib.pyplot as plt

setup_matplotlib()

# Pivot data for heatmap
heatmap_data = raw_df_avg.pivot(index='latitude', columns='longitude', values='t2m')

# Plot heatmap
plt.figure(figsize=(6, 5))
ax = sns.heatmap(
    heatmap_data,
    cmap='coolwarm',
    cbar_kws={'label': 'Avg Temperature (K)'}
)

# Format tick labels to 3 decimal places
ax.set_xticklabels([f"{float(label.get_text()):.3f}" for label in ax.get_xticklabels()])
ax.set_yticklabels([f"{float(label.get_text()):.3f}" for label in ax.get_yticklabels()])

plt.title('Average Temperature (t2m) for June-July Across Years')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()