varunayan.processing.data_aggregator module

varunayan.processing.data_aggregator.set_v_data_agg(verbosity: int) → None

varunayan.processing.data_aggregator.aggregate_by_frequency(df: DataFrame, frequency: str, keep_original_time: bool = False, dist_features: List[str] | None = None) → tuple[DataFrame, DataFrame]

Aggregate ERA5 data by the specified frequency for multiple points within a polygon.

The function first aggregates data spatially (across points) for each timestamp, then performs temporal aggregation based on the specified frequency.

If a ‘feature’ column is present, aggregation is performed separately for each feature.

Parameters:

df – DataFrame containing ERA5 data with latitude/longitude points and timestamps
frequency – One of ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’
keep_original_time – Whether to keep the original valid_time column (default: False)

Returns:

Tuple of (aggregated DataFrame, unique lat/lon DataFrame)
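
The two-stage order described above (spatial first, then temporal) can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the column names (valid_time, latitude, longitude, t2m) and the toy values are assumptions, and the mean is used as the reducer only for demonstration, since this section does not state which reducer is applied to each variable.

```python
import pandas as pd

# Toy ERA5-like frame: two grid points, two timestamps per day, two days.
# Column names and values are hypothetical, chosen only for illustration.
df = pd.DataFrame({
    "valid_time": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:00",
        "2024-01-01 12:00", "2024-01-01 12:00",
        "2024-01-02 00:00", "2024-01-02 00:00",
        "2024-01-02 12:00", "2024-01-02 12:00",
    ]),
    "latitude": [10.0, 10.25] * 4,
    "longitude": [76.0, 76.0] * 4,
    "t2m": [290.0, 292.0, 294.0, 296.0, 288.0, 290.0, 292.0, 294.0],
})

# Stage 1: spatial aggregation, one value per timestamp (mean across points).
spatial = df.groupby("valid_time", as_index=False)["t2m"].mean()

# Stage 2: temporal aggregation of the spatial result to daily frequency.
daily = (
    spatial.set_index("valid_time")["t2m"]
    .resample("D")
    .mean()
    .reset_index()
)

# Second element of the documented return tuple: the unique lat/lon pairs.
unique_points = (
    df[["latitude", "longitude"]].drop_duplicates().reset_index(drop=True)
)

print(daily)
print(unique_points)
```

If a feature column were present, adding it to the grouping keys in both stages would keep each feature's series separate, which matches the per-feature behaviour described above.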

varunayan.processing.data_aggregator.aggregate_pressure_levels(df: DataFrame, frequency: str = 'hourly', keep_original_time: bool = False, dist_features: List[str] | None = None) → tuple[DataFrame, DataFrame]

Aggregate ERA5 pressure level data by the specified frequency.

All variables are aggregated using mean values (both spatially and temporally).

If a ‘feature’ column is present, aggregation is performed separately for each feature.

Parameters:

df – DataFrame containing ERA5 pressure level data with columns:
- latitude, longitude: Spatial coordinates
- valid_time or time: Timestamps
- pressure_level: Pressure level in hPa
- feature (optional): Feature identifier
- Other columns: Meteorological variables
frequency – One of ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’ (default: ‘hourly’)
keep_original_time – Whether to keep the original time column (default: False)

Returns:

Tuple of (aggregated DataFrame, unique lat/lon DataFrame)
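
The mean-only aggregation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: the variable column t and the toy values are hypothetical, and keeping pressure_level as a grouping key in both stages is an assumption about how levels are kept apart.

```python
import pandas as pd

# Toy pressure-level frame: two grid points, two levels, two timestamps.
# The variable column "t" and all values are hypothetical.
df = pd.DataFrame({
    "valid_time": pd.to_datetime(
        ["2024-01-01 00:00"] * 4 + ["2024-01-01 12:00"] * 4
    ),
    "pressure_level": [850, 850, 500, 500] * 2,
    "latitude": [10.0, 10.25] * 4,
    "longitude": [76.0] * 8,
    "t": [280.0, 282.0, 250.0, 252.0, 284.0, 286.0, 254.0, 256.0],
})

# Spatial mean: average across grid points for each timestamp and level.
spatial = df.groupby(
    ["valid_time", "pressure_level"], as_index=False
)["t"].mean()

# Temporal mean: average the spatial result per level and calendar day.
daily = (
    spatial.groupby(
        ["pressure_level", pd.Grouper(key="valid_time", freq="D")]
    )["t"]
    .mean()
    .reset_index()
)

print(daily)
```

Grouping on pressure_level in both stages prevents values from different levels being averaged together, which would otherwise produce a physically meaningless number.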