varunayan.processing package
Submodules
Module contents
- varunayan.processing.aggregate_by_frequency(df: DataFrame, frequency: str, keep_original_time: bool = False, dist_features: List[str] | None = None) → tuple[DataFrame, DataFrame]
Aggregate ERA5 data by the specified frequency for multiple points within a polygon.
The function first aggregates data spatially (across points) for each timestamp, then performs temporal aggregation based on the specified frequency.
If a ‘feature’ column is present, aggregation is performed separately for each feature.
- Parameters:
df – DataFrame containing ERA5 data with latitude/longitude points and timestamps
frequency – One of ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’
keep_original_time – Whether to keep the original valid_time column (default: False)
- Returns:
Tuple of (aggregated DataFrame, unique lat/lon DataFrame)
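The two-stage order described above (spatial first, then temporal) can be illustrated with a minimal standard-library sketch. It is not the package's implementation: the helper names `period_key` and `two_stage_mean` are hypothetical, rows are plain tuples rather than a DataFrame, and the mean is assumed as the statistic at both stages, which this section does not confirm for `aggregate_by_frequency`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def period_key(ts: datetime, frequency: str):
    """Map a timestamp to the label of the period it falls in."""
    if frequency == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if frequency == "daily":
        return ts.date()
    if frequency == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    if frequency == "monthly":
        return (ts.year, ts.month)
    if frequency == "yearly":
        return ts.year
    raise ValueError(f"unsupported frequency: {frequency!r}")


def two_stage_mean(rows, frequency):
    """rows: iterable of (timestamp, latitude, longitude, value)."""
    # Stage 1: spatial mean across all points that share a timestamp.
    by_time = defaultdict(list)
    for ts, _lat, _lon, value in rows:
        by_time[ts].append(value)
    spatial = {ts: mean(vals) for ts, vals in by_time.items()}

    # Stage 2: temporal mean of the spatial means within each period.
    by_period = defaultdict(list)
    for ts, value in spatial.items():
        by_period[period_key(ts, frequency)].append(value)
    return {key: mean(vals) for key, vals in by_period.items()}


# Made-up sample values: two grid points, three timestamps.
rows = [
    (datetime(2024, 1, 1, 0), 10.0, 70.0, 280.0),
    (datetime(2024, 1, 1, 0), 10.25, 70.0, 282.0),
    (datetime(2024, 1, 1, 12), 10.0, 70.0, 290.0),
    (datetime(2024, 1, 1, 12), 10.25, 70.0, 292.0),
    (datetime(2024, 1, 2, 0), 10.0, 70.0, 284.0),
]
daily = two_stage_mean(rows, "daily")
```

Averaging spatially first gives every timestamp equal weight in the temporal step, even when the number of grid points differs between timestamps.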
- varunayan.processing.aggregate_pressure_levels(df: DataFrame, frequency: str = 'hourly', keep_original_time: bool = False, dist_features: List[str] | None = None) → tuple[DataFrame, DataFrame]
Aggregate ERA5 pressure level data by the specified frequency.
All variables are aggregated using mean values (both spatially and temporally).
If a ‘feature’ column is present, aggregation is performed separately for each feature.
- Parameters:
df – DataFrame containing ERA5 pressure level data with columns:
  - latitude, longitude: Spatial coordinates
  - valid_time or time: Timestamps
  - pressure_level: Pressure level in hPa
  - feature (optional): Feature identifier
  - Other columns: Meteorological variables
frequency – One of ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’
keep_original_time – Whether to keep the original time column
- Returns:
Tuple of (aggregated DataFrame, unique lat/lon DataFrame)
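As a rough illustration of mean aggregation per pressure level, the sketch below keeps the pressure level in the grouping key at both the spatial and the temporal stage. It uses only the standard library, the function name `level_daily_mean` is hypothetical, the frequency is fixed to daily for brevity, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def level_daily_mean(rows):
    """rows: iterable of (timestamp, pressure_level_hpa, lat, lon, value)."""
    # Spatial mean: one value per (timestamp, pressure level).
    per_time = defaultdict(list)
    for ts, level, _lat, _lon, value in rows:
        per_time[(ts, level)].append(value)

    # Temporal mean: one value per (day, pressure level).
    per_day = defaultdict(list)
    for (ts, level), vals in per_time.items():
        per_day[(ts.date(), level)].append(mean(vals))
    return {key: mean(vals) for key, vals in per_day.items()}


rows = [
    (datetime(2024, 1, 1, 0), 850, 10.0, 70.0, 280.0),
    (datetime(2024, 1, 1, 0), 850, 10.25, 70.0, 282.0),
    (datetime(2024, 1, 1, 6), 850, 10.0, 70.0, 286.0),
    (datetime(2024, 1, 1, 0), 500, 10.0, 70.0, 250.0),
]
result = level_daily_mean(rows)
```

Values from different pressure levels are never mixed, so each level yields its own time series.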
- varunayan.processing.filter_netcdf_by_shapefile(ds: Dataset, geojson_data: Dict[str, Any] | GeoDataFrame, dist_features: List[str] | None = None) → DataFrame
Filter a NetCDF dataset to only include grid points that fall within the GeoJSON polygon(s). Internally handles multi-feature GeoJSONs by matching points to each feature individually, then taking the union of all matched points.
- Parameters:
ds – xarray Dataset
geojson_data – Loaded GeoJSON (as dict or GeoDataFrame)
dist_features – List of attribute/property names in the GeoJSON to use as composite feature identifier
- Returns:
A pandas DataFrame with filtered points and feature identification.
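The per-feature matching described above can be sketched in pure Python with a ray-casting point-in-polygon test. This is an illustration under stated assumptions, not the package's code: `point_in_ring` and `tag_points` are hypothetical names, only the exterior ring of `Polygon` geometries is handled (no holes, no `MultiPolygon`), and joining property values with `_` to form the composite identifier is an assumed format.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test against one closed ring of (lon, lat) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed
        # by a ray running east from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def tag_points(points, geojson, dist_features):
    """Return (lat, lon, feature_id) for each grid point inside a feature."""
    matched = []
    for feature in geojson["features"]:
        props = feature["properties"]
        # Assumed identifier format: property values joined by "_".
        feature_id = "_".join(str(props[name]) for name in dist_features)
        exterior = feature["geometry"]["coordinates"][0]
        for lat, lon in points:
            if point_in_ring(lon, lat, exterior):
                matched.append((lat, lon, feature_id))
    return matched


# Two adjacent square features with made-up properties.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"state": "A", "district": "x"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"state": "A", "district": "y"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 0], [4, 0], [4, 2], [2, 2], [2, 0]]]}},
    ],
}
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0)]  # (lat, lon) pairs
matched = tag_points(points, geojson, ["state", "district"])
```

Points outside every feature, such as `(5.0, 5.0)` here, are dropped. The rest carry the identifier of the feature that contains them.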
- varunayan.processing.get_unique_coordinates_in_polygon(ds: Dataset, geojson_data: Dict[str, Any] | GeoDataFrame) → DataFrame
Alternative helper function that returns just the unique lat/lon pairs inside the polygon. This can be useful for other operations or caching coordinate filtering results.
- varunayan.processing.extract_download(zip_or_file_path: str, extract_dir: str | None = None) → List[str]
Extract a downloaded file. Handles both a single NetCDF (.nc) file and zip archives.
- Parameters:
zip_or_file_path – Path to the downloaded file
extract_dir – Directory to extract to (optional)
- Returns:
List of extracted file paths
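One plausible way to handle both input kinds is sketched below, using only the standard library. The function name `extract_sketch` is hypothetical, and the fallback to the archive's own directory when `extract_dir` is omitted is an assumption, since this section does not document the default.

```python
import os
import tempfile
import zipfile
from typing import List, Optional


def extract_sketch(path: str, extract_dir: Optional[str] = None) -> List[str]:
    """Return data file paths, unpacking the archive if path is a zip."""
    if not zipfile.is_zipfile(path):
        # A plain NetCDF file needs no extraction.
        return [path]
    # Assumption: default to the directory that holds the archive.
    target = extract_dir or os.path.dirname(os.path.abspath(path))
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target)
        # Skip directory entries; keep only file members.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return [os.path.join(target, name) for name in names]


# Build a throwaway archive to exercise both branches.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "download.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("data_0.nc", b"placeholder bytes, not real NetCDF")

extracted = extract_sketch(zip_path, os.path.join(workdir, "out"))
single = extract_sketch(extracted[0])
```

Checking the file content with `zipfile.is_zipfile` rather than the file extension means a mislabelled download is still handled correctly.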