Comparison of ERA5 and IMD Daily Temperature Extremes Across Maharashtra (2024)

Comparison of ERA5 and IMD Daily Temperature Extremes Across Maharashtra (2024)#

In this notebook, we analyze and compare daily temperature extremes patterns across Maharashtra using two key datasets:

ERA5: Climate reanalysis data from ECMWF
IMD: Gridded daily temperature extremes from the Indian Meteorological Department

We focus on:

Harmonizing ERA5 and IMD temperature data
Evaluating the correlation between ERA5 and IMD temperature extremes at each grid point
Visualizing the spatial pattern of Pearson correlations to assess agreement between datasets

We use:

varunayan for extracting ERA5 hourly minimum and maximum temperatures
imdlib for accessing IMD daily tmin and tmax data

This study provides insights into how well ERA5 captures regional temperature extremes in Maharashtra compared to IMD observations, which could be important for validating ERA5 use in climate research and studies for Maharashtra.

Step 1: Extract ERA5 Temperature Extremes Data for Maharashtra#

We use varunayan to download hourly minimum and maximum temperature (mn2t and mx2t) from the ERA5 climate reanalysis dataset for the Maharashtra region, covering the year 2024.

North: 21.5°, South: 15.5°, East: 80.5°, West: 72.5°
Resolution: 1°
Frequency: hourly
Units: Kelvin

We will be using the raw data acquired using varunayan for this analysis.

import varunayan

varunayan.era5ify_bbox(
    request_id='min_max_temp_maha_2020',
    variables=['maximum_2m_temperature_since_previous_post_processing', 'minimum_2m_temperature_since_previous_post_processing'],
    start_date='2020-1-1',
    end_date='2020-12-31',
    north=21.5,
    south=15.5,
    east=80.5,
    west=72.5,
    resolution=1,
)

============================================================
STARTING ERA5 SINGLE LEVEL PROCESSING
============================================================
Request ID: min_max_temp_maha_2020
Variables: ['maximum_2m_temperature_since_previous_post_processing', 'minimum_2m_temperature_since_previous_post_processing']
Date Range: 2020-01-01 to 2020-12-31
Frequency: hourly
Resolution: 1°

Saving files to output directory: min_max_temp_maha_2020_output
  Saved final data to: min_max_temp_maha_2020_output\min_max_temp_maha_2020_hourly_data.csv
  Saved unique coordinates to: min_max_temp_maha_2020_output\min_max_temp_maha_2020_unique_latlongs.csv
  Saved raw data to: min_max_temp_maha_2020_output\min_max_temp_maha_2020_raw_data.csv

============================================================
PROCESSING COMPLETE
============================================================

RESULTS SUMMARY:
----------------------------------------
Variables processed: 2
Time period:         2020-01-01 to 2020-12-31
Final output shape:  (8784, 7)
Total complete processing time: 2865.08 seconds

First 5 rows of aggregated data:
         mx2t        mn2t        date  year  month  day  hour
0  299.386475  286.384277  2020-01-01  2020      1    1     0
1  299.424072  286.346191  2020-01-01  2020      1    1     1
2  299.484619  286.520264  2020-01-01  2020      1    1     2
3  299.594971  286.739990  2020-01-01  2020      1    1     3
4  299.720703  288.402100  2020-01-01  2020      1    1     4

============================================================
ERA5 SINGLE LEVEL PROCESSING COMPLETED SUCCESSFULLY
============================================================

	mx2t	mn2t	date	year	month	day	hour
0	299.386475	286.384277	2020-01-01	2020	1	1	0
1	299.424072	286.346191	2020-01-01	2020	1	1	1
2	299.484619	286.520264	2020-01-01	2020	1	1	2
3	299.594971	286.739990	2020-01-01	2020	1	1	3
4	299.720703	288.402100	2020-01-01	2020	1	1	4
...	...	...	...	...	...	...	...
8779	301.058594	288.187988	2020-12-31	2020	12	31	19
8780	301.026367	287.927979	2020-12-31	2020	12	31	20
8781	300.906494	287.406494	2020-12-31	2020	12	31	21
8782	300.800781	286.872314	2020-12-31	2020	12	31	22
8783	300.804199	286.522705	2020-12-31	2020	12	31	23

8784 rows × 7 columns

Step 2: Process ERA5 Temperature Extremes Data#

We process the downloaded ERA5 dataset to compute daily temperature extremes in Celsius.

Key steps:#

Convert from Kelvin to Celsius
- Subtract 273.15 from mx2t and mn2t values
- Final unit: °C
Aggregate to Daily Extremes (Per Grid Point)
- Compute daily maximum (mx2t) and minimum (mn2t) temperatures for each latitude–longitude point

The result is a dataframe (daily_era5_data) with the columns:

latitude, longitude, date, mx2t and mn2t

import pandas as pd

df_raw_temp = pd.read_csv('min_max_temp_maha_2020_output/min_max_temp_maha_2020_raw_data.csv')

df_raw_temp['date'] = pd.to_datetime(df_raw_temp['date'])
df_raw_temp['mx2t'] = df_raw_temp['mx2t'] - 273.15
df_raw_temp['mn2t'] = df_raw_temp['mn2t'] - 273.15

df_daily_era = (
    df_raw_temp.groupby(["latitude", "longitude", "date"])
    .agg({
        "mx2t": "max",
        "mn2t": "min",
    })
    .reset_index()
)

Step 3: Extract IMD Gridded Temperature Data#

We use the imdlib package to download daily gridded IMD temperature extremes data over Maharashtra.

Variables: 'tmax' and 'tmin'
Grid resolution: 1°
Unit: °C

This dataset represents observed temperature extremes, derived from IMD’s high-quality station network and gridded for regional analysis. While the 1° resolution provides a broad spatial overview, it limits finer scale temperature analysis, a constraint in IMD’s open temperature dataset.

import imdlib as imd

start_yr = 2020
variable = 'tmax'
imd.get_data(variable, start_yr, fn_format='monthwise')

import numpy as np

data = imd.open_data(variable, start_yr, fn_format='monthwise')
ds = data.get_xarray()
df = ds.to_dataframe().reset_index()
df['tmax'] = df['tmax'].replace(-999.0, np.nan)

df_bbox_max = df[
    (df['lat'] >= 15) & (df['lat'] <= 22) &
    (df['lon'] >= 72) & (df['lon'] <= 81)
]

df_bbox_max.reset_index(drop=True, inplace=True)

df_bbox_max

Downloading: maxtemp for year 2020
Download Successful !!!

	time	lat	lon	tmax
0	2020-01-01	15.5	72.5	99.900002
1	2020-01-01	15.5	73.5	31.408098
2	2020-01-01	15.5	74.5	31.509253
3	2020-01-01	15.5	75.5	31.085457
4	2020-01-01	15.5	76.5	30.606255
...	...	...	...	...
23053	2020-12-31	21.5	76.5	28.515602
23054	2020-12-31	21.5	77.5	27.787882
23055	2020-12-31	21.5	78.5	26.746124
23056	2020-12-31	21.5	79.5	26.715961
23057	2020-12-31	21.5	80.5	27.381311

23058 rows × 4 columns

import imdlib as imd

start_yr = 2020
variable = 'tmin'
imd.get_data(variable, start_yr, fn_format='monthwise')

import numpy as np

data = imd.open_data(variable, start_yr, fn_format='monthwise')
ds = data.get_xarray()
df = ds.to_dataframe().reset_index()
df['tmin'] = df['tmin'].replace(-999.0, np.nan)

df_bbox_min = df[
    (df['lat'] >= 15) & (df['lat'] <= 22) &
    (df['lon'] >= 72) & (df['lon'] <= 81)
]

df_bbox_min.reset_index(drop=True, inplace=True)

df_bbox_min

Downloading: mintemp for year 2020
Download Successful !!!

	time	lat	lon	tmin
0	2020-01-01	15.5	72.5	99.900002
1	2020-01-01	15.5	73.5	17.982029
2	2020-01-01	15.5	74.5	18.658150
3	2020-01-01	15.5	75.5	17.838404
4	2020-01-01	15.5	76.5	18.578913
...	...	...	...	...
23053	2020-12-31	21.5	76.5	12.886724
23054	2020-12-31	21.5	77.5	12.003990
23055	2020-12-31	21.5	78.5	11.423929
23056	2020-12-31	21.5	79.5	10.512906
23057	2020-12-31	21.5	80.5	11.225183

23058 rows × 4 columns

Step 4: Merge ERA5 and IMD DataFrames and Compute Correlations#

In this step, we merge the three datasets:

ERA5 daily temperature extremes
IMD daily minimum temperature (tmin)
IMD daily maximum temperature (tmax)

The merge is performed on the shared columns: date, latitude, and longitude.

After merging, we compute Pearson correlation coefficients between ERA5 and IMD temperature extremes to assess their agreement at each grid point.

df_bbox_max = df_bbox_max.rename(columns={'time': 'date', 'lat': 'latitude', 'lon': 'longitude'})
df_bbox_min = df_bbox_min.rename(columns={'time': 'date', 'lat': 'latitude', 'lon': 'longitude'})

# First merge max and min
df_bbox = pd.merge(df_bbox_max, df_bbox_min, on=['latitude', 'longitude', 'date'])

# Then merge with ERA5 data
df_merged = pd.merge(df_bbox, df_daily_era, on=['latitude', 'longitude', 'date'])

df_clean = df_merged[
    (df_merged['tmax'] != 99.900002) & 
    (df_merged['tmin'] != 99.900002)
].copy()

from scipy.stats import pearsonr

def compute_corr(group):
    tmax_corr = pearsonr(group['tmax'], group['mx2t'])[0]
    tmin_corr = pearsonr(group['tmin'], group['mn2t'])[0]
    return pd.Series({'tmax_corr': tmax_corr, 'tmin_corr': tmin_corr})

correlations = df_clean.groupby(['latitude', 'longitude']).apply(compute_corr).reset_index()

C:\Users\Atharva Jagtap\AppData\Local\Temp\ipykernel_7744\3252918934.py:8: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  correlations = df_clean.groupby(['latitude', 'longitude']).apply(compute_corr).reset_index()

Step 5: Visualize Grid Point Correlations with Maharashtra Block Boundaries#

In this step, we create spatial visualizations of the Pearson correlation coefficients between ERA5 and IMD temperature extremes, computed at each grid point.

Plot Details:

Each grid point is color-coded based on the correlation value (using a colormap)
Separate plots may be generated for tmax and tmin correlations
The underlying map includes Maharashtra’s block-level administrative boundaries, overlaid using a GeoJSON file

This visualization helps assess the spatial consistency of ERA5 with IMD across different parts of Maharashtra and identifies regions with high or low agreement between datasets.

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

correlations[['tmax_corr', 'tmin_corr']] = correlations[['tmax_corr', 'tmin_corr']].fillna(0)

# Create GeoDataFrame
gdf = gpd.GeoDataFrame(
    correlations,
    geometry=gpd.points_from_xy(correlations['longitude'], correlations['latitude']),
    crs="EPSG:4326"
)

# Load Maharashtra blocks GeoJSON
maha = gpd.read_file("https://maharashtra-blocks-geojson.netlify.app/maharashtra_blocks.geojson")

# Shared color limits
vmin = min(gdf[['tmax_corr', 'tmin_corr']].min())
vmax = max(gdf[['tmax_corr', 'tmin_corr']].max())

# Plot
fig, ax = plt.subplots(1, 2, figsize=(14, 7), constrained_layout=True)

# Plot tmax_corr
maha.boundary.plot(ax=ax[0], color='black', linewidth=0.5)
sc1 = gdf.plot(
    ax=ax[0], column='tmax_corr', cmap='viridis', markersize=50, 
    vmin=vmin, vmax=vmax, legend=True, legend_kwds={'label': 'Pearson R'}
)
ax[0].set_title('tmax (from IMD) vs mx2t (from ERA5)')
ax[0].set_xlabel('Longitude')
ax[0].set_ylabel('Latitude')

# Plot tmin_corr
maha.boundary.plot(ax=ax[1], color='black', linewidth=0.5)
sc2 = gdf.plot(
    ax=ax[1], column='tmin_corr', cmap='viridis', markersize=50, 
    vmin=vmin, vmax=vmax, legend=True, legend_kwds={'label': 'Pearson R'}
)
ax[1].set_title('tmin (from IMD) vs mn2t (from ERA5)')
ax[1].set_xlabel('Longitude')
ax[1].set_ylabel('Latitude')

# Consistent aspect ratio
for a in ax:
    a.set_aspect('equal')

plt.show()

../_images/7ac23d986cc067a2e23697cbc43b663ac1188e22cee98d6607f11f9fe422166a.png

As we can see in the plot above, the correlation between the two datasets is strong over the mainland and weak over the Arabian Sea.

correlations.to_csv('correlation_maha_2020_tmax_tmin.csv', index=False)