API Reference
This section provides detailed documentation for all vayuayan classes and functions.
Core Classes
CPCBHistorical
- class vayuayan.CPCBHistorical
Bases:
object
Client for fetching historical Air Quality Index (AQI) data from CPCB.
- get_complete_list() -> Dict[str, Any]
Fetch the complete list of all India stations and cities.
- Returns:
Dictionary containing station and city data.
- Raises:
requests.RequestException – If the HTTP request fails.
json.JSONDecodeError – If response cannot be parsed as JSON.
- get_state_list() -> List[str]
Get list of states available for AQI data.
- Returns:
Sorted list of state names.
- get_city_list(state: str) -> List[str]
Get list of cities available in given state for AQI data.
- Parameters:
state – State name to get cities for.
- Returns:
Sorted list of city names in the state.
- get_station_list(city: str) -> List[Dict]
Get station list available in given city for AQI data.
- Parameters:
city – City name to get stations for.
- Returns:
Sorted list of station dictionaries.
- get_file_path(station_id: str, station_name: str, state: str, city: str, year: str, frequency: str, data_type: str) -> List[Dict[str, Any]]
Get file path containing data for given query parameters.
- Parameters:
station_id – Station ID.
station_name – Station name.
state – State name.
city – City name.
year – Year for data.
frequency – Data frequency (‘hourly’ or ‘daily’).
data_type – Type of data (‘cityLevel’ or ‘stationLevel’).
- Returns:
Dictionary containing file path data.
- Raises:
requests.RequestException – If the HTTP request fails.
- download_past_year_aqi_data_city_level(city: str, year: str, save_location: str) -> DataFrame
Download past AQI data for a specific city.
- Parameters:
city – City name.
year – Year for data.
save_location – Path to save the downloaded data.
- Returns:
DataFrame preview of the downloaded data.
- Raises:
Exception – If data is not found or download fails.
- download_past_year_aqi_data_station_level(station_id: str, year: str, save_location: str) -> DataFrame
Download past AQI data for a specific station.
- Parameters:
station_id – Station ID.
year – Year for data.
save_location – Path to save the downloaded data.
- Returns:
DataFrame preview of the downloaded data.
- Raises:
Exception – If station or data is not found.
CPCBLive
- class vayuayan.CPCBLive
Bases:
object
Client for fetching live air quality data from CPCB.
- get_system_location() -> Tuple[float, float]
Retrieve system’s geolocation using IP-based lookup.
- Returns:
Tuple of (latitude, longitude).
- Raises:
Exception – If geolocation lookup fails.
- get_nearest_station(coords: Tuple[float, float] | None = None) -> Tuple[str, str]
Get the nearest air quality monitoring station.
- Parameters:
coords – Optional tuple of (latitude, longitude). If None, uses IP geolocation.
- Returns:
Tuple of (station_id, station_name).
- Raises:
Exception – If no stations found or coordinates invalid.
- get_all_india() -> List[Dict[str, Any]]
Get all air quality monitoring stations in India.
- Returns:
List of station dictionaries.
- get_live_aqi_data_for_station(station_id: str, date_time: str) -> Dict[str, Any]
Get live air quality data for a specific station.
- Parameters:
station_id – Station ID.
date_time – Date and time in ‘YYYY-MM-DDTHH:00:00Z’ format.
- Returns:
Live air quality data dictionary.
- Raises:
ValueError – If parameters are invalid.
Exception – If request fails.
- get_live_aqi_data(station_id: str | None = None, coords: Tuple[float, float] | None = None, date: str | None = None, hour: int | None = None) -> Dict[str, Any]
Get live AQI data with flexible parameter options.
- Parameters:
station_id – Optional station ID. If not provided, uses nearest station.
coords – Optional (latitude, longitude) tuple.
date – Optional date in ‘YYYY-MM-DD’ format. Defaults to today.
hour – Optional hour (0-23). Defaults to current hour.
- Returns:
Processed live AQI data dictionary.
- Raises:
ValueError – If hour is invalid.
Exception – If data retrieval fails.
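The relationship between the date and hour parameters here and the ‘YYYY-MM-DDTHH:00:00Z’ string expected by get_live_aqi_data_for_station can be sketched as follows. This is a minimal illustration, not the library's internal code: build_date_time is a hypothetical helper, and how vayuayan actually combines the two values is an assumption.

```python
from datetime import date


def build_date_time(day, hour):
    """Combine a 'YYYY-MM-DD' string and an hour into 'YYYY-MM-DDTHH:00:00Z'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    parsed = date.fromisoformat(day)  # raises ValueError on a malformed date
    return f"{parsed.isoformat()}T{hour:02d}:00:00Z"


print(build_date_time("2024-01-05", 9))
```

Validating the hour before building the string mirrors the documented ValueError for an invalid hour.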
PM25Client
- class vayuayan.PM25Client(cache_dir: str = 'pm25_data')
Bases:
object
Client for processing PM2.5 satellite data from NetCDF files.
- __init__(cache_dir: str = 'pm25_data') -> None
Initialize the PM2.5 Client with data paths and AWS configuration.
- Parameters:
cache_dir – Directory to cache downloaded NetCDF files.
- get_netcdf_path(year: int, month: int | None = None) -> str
Get NetCDF file path for given year and optional month.
- Parameters:
year – Year for data.
month – Optional month (1-12). If None, returns annual data path.
- Returns:
Path to NetCDF file (cached locally).
- download_netcdf_if_needed(year: int, month: int | None = None, force_download: bool = False) -> str
Download NetCDF file from AWS if not already cached.
- Parameters:
year – Year for data.
month – Optional month (1-12). If None, downloads annual data.
force_download – Whether to re-download even if file exists.
- Returns:
Path to the downloaded NetCDF file.
- Raises:
requests.RequestException – If download fails.
IOError – If file cannot be written.
- get_pm25_stats(geojson_file: str, year: int, month: int | None = None, group_by: str | None = None) -> Dict[str, float] | DataFrame
Compute PM2.5 statistics inside a polygon region from GeoJSON.
This function automatically downloads the required NetCDF data from AWS if not cached locally.
- Parameters:
geojson_file – Path to GeoJSON file with polygon.
year – Year of the NetCDF data.
month – Optional month of the NetCDF data.
group_by – Optional column name(s) to group polygons by. Can be a single column (e.g., ‘state_name’) or comma-separated multiple columns (e.g., ‘state_name,district_name’). If None, aggregates entire polygon boundary. If specified, aggregates by unique combinations of values.
- Returns:
If group_by is None: dictionary with mean, std, min, and max PM2.5 values. If group_by is specified: DataFrame with statistics for each group.
- Raises:
FileNotFoundError – If GeoJSON file not found.
requests.RequestException – If NetCDF download fails.
ValueError – If group_by column not found in GeoJSON.
- get_pm25_stats_by_polygon(geojson_file: str, year: int, month: int | None = None, id_field: str | None = None) -> DataFrame
Compute PM2.5 statistics for each polygon in GeoJSON file.
- Parameters:
geojson_file – Path to GeoJSON file with polygons.
year – Year of the NetCDF data.
month – Optional month of the NetCDF data.
id_field – Optional field in GeoJSON properties to use as identifier.
- Returns:
DataFrame with statistics for each polygon.
- Raises:
FileNotFoundError – If NetCDF or GeoJSON file not found.
This function automatically downloads the required NetCDF data from AWS if not cached locally.
CPCBClient
- class vayuayan.CPCBClient(use_test_endpoint: bool = True)
Bases:
object
Main client for fetching CPCB air quality data.
- __init__(use_test_endpoint: bool = True) -> None
Initialize the CPCB Client.
- Parameters:
use_test_endpoint – Whether to use the test endpoint (unused, kept for compatibility).
- list_stations(as_dataframe: bool = False) -> List[Dict] | DataFrame
Get list of all available air quality monitoring stations.
- Parameters:
as_dataframe – Whether to return data as pandas DataFrame.
- Returns:
List of station dictionaries or DataFrame if as_dataframe=True.
- Raises:
CPCBError – If failed to fetch station data.
- download_raw_data(url: str | None = None, site_id: str | None = None, station_name: str | None = None, time_period: str | None = '15Min', year: str | None = None, output_dir: str = 'downloads', filename: str | None = None, return_dataframe: bool = False, verbose: bool = False) -> str | DataFrame | None
Download CSV file from CPCB data repository.
- Parameters:
url – Direct URL to download from. If provided, other parameters are ignored.
site_id – Station site ID (required if url not provided).
station_name – Station name (required if url not provided).
time_period – Time period for data (required if url not provided).
year – Year for data (required if url not provided).
output_dir – Directory to save downloaded file.
filename – Custom filename (optional, auto-generated if not provided).
return_dataframe – Whether to return pandas DataFrame instead of a file path.
verbose – Whether to print status messages.
- Returns:
Path to downloaded file, DataFrame, or None if download fails.
- Raises:
CPCBError – If required parameters are missing or download fails.
NetworkError – If network request fails.
Examples
>>> client = CPCBClient()
>>> # Download using direct URL
>>> path = client.download_raw_data(
...     url="https://airquality.cpcb.gov.in/.../Delhi_Punjabi_Bagh_2024.csv"
... )
>>> # Download using parameters
>>> df = client.download_raw_data(
...     site_id="DL001",
...     station_name="Punjabi_Bagh",
...     time_period="2024",
...     year="2024",
...     return_dataframe=True
... )
- get_nearest_station(lat: float, lon: float, return_distance: bool = False) -> str | Tuple[str, float]
Find the nearest station to given coordinates using optimized algorithms.
- Parameters:
lat – Target latitude.
lon – Target longitude.
return_distance – Whether to return distance along with station ID.
- Returns:
Station ID of nearest station, or tuple of (station_id, distance) if return_distance is True.
- Raises:
CPCBError – If failed to fetch station data or no stations available.
- get_k_nearest_stations(lat: float, lon: float, k: int = 5) -> List[Tuple[Dict, float]]
Find the k nearest stations to given coordinates.
- Parameters:
lat – Target latitude.
lon – Target longitude.
k – Number of nearest stations to return.
- Returns:
List of tuples [(station_info, distance), …] sorted by distance. Each station_info dict contains: id, name, latitude, longitude, live, avg, cityID, and stateID.
- Raises:
CPCBError – If failed to fetch station data or no stations available.
- get_nearest_station_within_radius(lat: float, lon: float, max_distance_km: float = 100) -> Tuple[str, float] | None
Find nearest station within a specified radius.
- Parameters:
lat – Target latitude.
lon – Target longitude.
max_distance_km – Maximum search radius in kilometers.
- Returns:
Tuple of (station_id, distance) or None if no station found within radius.
- Raises:
CPCBError – If failed to fetch station data.
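The nearest-station lookups above can be approximated with a short sketch. It assumes station dictionaries carrying the documented id, latitude, and longitude keys, and it uses a plain Haversine scan with heapq rather than whatever optimized algorithm CPCBClient uses internally. The function names and sample records below are hypothetical.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers; 6371 km is an assumed mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def k_nearest(stations, lat, lon, k=5):
    """Return [(station, distance_km), ...] for the k closest stations, nearest first."""
    scored = (
        (s, haversine_km(lat, lon, float(s["latitude"]), float(s["longitude"])))
        for s in stations
    )
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def nearest_within_radius(stations, lat, lon, max_distance_km=100):
    """Return (station_id, distance_km) for the closest station, or None if it is too far."""
    nearest = k_nearest(stations, lat, lon, k=1)
    if not nearest or nearest[0][1] > max_distance_km:
        return None
    station, distance = nearest[0]
    return station["id"], distance


# Hypothetical sample data, not real CPCB station records.
SAMPLE_STATIONS = [
    {"id": "site_a", "name": "Station A", "latitude": 28.60, "longitude": 77.20},
    {"id": "site_b", "name": "Station B", "latitude": 19.08, "longitude": 72.88},
    {"id": "site_c", "name": "Station C", "latitude": 28.70, "longitude": 77.10},
]

print([s["id"] for s, _ in k_nearest(SAMPLE_STATIONS, 28.61, 77.21, k=2)])
```

A linear scan like this is adequate for a few hundred stations; a spatial index only pays off for much larger sets.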
Utility Functions
Utility functions for CPCB data fetching and processing.
This module provides utilities for data cleaning, network requests, date parsing, station data conversion, and analysis functions.
- exception vayuayan.utils.DataProcessingError
Custom exception for data processing errors.
- vayuayan.utils.clean_station_name(station_name: str) -> str
Convert station name to clean underscore-separated format.
Rules applied:
1. Remove/replace special characters and punctuation
2. Replace spaces with underscores
3. Remove multiple consecutive underscores
4. Remove leading/trailing underscores
5. Handle common patterns like “City - Organization”
- Parameters:
station_name – Original station name string.
- Returns:
Cleaned station name with underscores.
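The cleaning rules can be approximated with a single regular expression. The sketch below is an assumption about one way to get the documented behavior, not the library's actual implementation; clean_name_sketch is a hypothetical name, and it treats every non-alphanumeric ASCII character as a separator.

```python
import re


def clean_name_sketch(station_name):
    """Collapse each run of non-alphanumeric characters into one underscore."""
    collapsed = re.sub(r"[^A-Za-z0-9]+", "_", station_name)
    # Drop any underscore left at either end by leading or trailing punctuation.
    return collapsed.strip("_")


print(clean_name_sketch("ITO, Delhi - DPCC"))
```

Because runs are collapsed in one pass, rules 1 through 3 happen together and only the trimming step remains.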
Examples
>>> clean_station_name("Dr. Karni Singh Shooting Range, Delhi - DPCC")
'Dr_Karni_Singh_Shooting_Range_Delhi_DPCC'
>>> clean_station_name("ITO, Delhi - DPCC")
'ITO_Delhi_DPCC'
- vayuayan.utils.sort_station_data(data: List[Dict]) -> List[Dict]
Sort station data by live status and city name.
For each city, stations are sorted by:
1. Live status (live stations first: True before False)
2. City name (alphabetically)
- Parameters:
data – List of cities with nested stations from CPCB API.
- Returns:
Sorted list with the same structure but ordered by live status and city name.
- vayuayan.utils.get_aqi_category(aqi_value: float) -> str
Convert AQI numeric value to category.
- Parameters:
aqi_value – Numeric AQI value.
- Returns:
AQI category string.
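For orientation, a value-to-category mapping of this kind might look like the sketch below. It assumes the bands of India's National Air Quality Index (0-50, 51-100, 101-200, 201-300, 301-400, above 400). The exact label strings and boundary handling used by get_aqi_category are not stated here, so both are assumptions, and aqi_category_sketch is a hypothetical name.

```python
# Inclusive upper bound of each band and its label (assumed label spellings).
AQI_BANDS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderately polluted"),
    (300, "Poor"),
    (400, "Very poor"),
]


def aqi_category_sketch(aqi_value):
    """Map a numeric AQI value to a category label."""
    if aqi_value < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi_value <= upper:
            return label
    return "Severe"  # everything above 400


print(aqi_category_sketch(150))
```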
- vayuayan.utils.stations_to_dataframe(data: List[Dict]) -> DataFrame
Convert nested station data to a flat DataFrame.
- Parameters:
data – List of cities with nested stations from CPCB API.
- Returns:
DataFrame with columns city_name, city_id, state_id, station_id, station_name, longitude, latitude, live, avg_aqi.
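The flattening step can be pictured with plain dictionaries. In this sketch the output column names come from the description above and the station keys (id, name, latitude, longitude, live, avg) from the station_info fields listed for get_k_nearest_stations. The city-level keys cityName and stationsInCity are hypothetical placeholders, since the shape of the nested CPCB response is not documented here.

```python
def flatten_stations(cities):
    """Return one row dict per station, using the documented column names."""
    rows = []
    for city in cities:
        for station in city.get("stationsInCity", []):  # hypothetical key
            rows.append({
                "city_name": city.get("cityName"),  # hypothetical key
                "city_id": city.get("cityID"),
                "state_id": city.get("stateID"),
                "station_id": station.get("id"),
                "station_name": station.get("name"),
                "longitude": station.get("longitude"),
                "latitude": station.get("latitude"),
                "live": station.get("live"),
                "avg_aqi": station.get("avg"),
            })
    return rows


# Hypothetical sample input, not a real CPCB response.
sample = [{
    "cityName": "City X", "cityID": "city_x", "stateID": "state_y",
    "stationsInCity": [
        {"id": "site_1", "name": "Station 1", "longitude": "77.2",
         "latitude": "28.6", "live": True, "avg": 120},
    ],
}]
print(flatten_stations(sample))
```

A list of row dictionaries like this can be handed directly to the pandas.DataFrame constructor.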
- vayuayan.utils.stations_to_city_summary(data: List[Dict]) -> DataFrame
Convert station data to city-level summary DataFrame.
- Parameters:
data – List of cities with nested stations.
- Returns:
DataFrame with city-level aggregated statistics.
- vayuayan.utils.stations_to_coordinates_dataframe(data: List[Dict]) -> DataFrame
Convert station data to DataFrame optimized for mapping.
- Parameters:
data – List of cities with nested stations.
- Returns:
DataFrame with geographic information and essential station details.
- vayuayan.utils.convert_station_data_to_dataframe(data: List[Dict], method: str = 'stations') -> DataFrame
Main conversion function with multiple output formats.
- Parameters:
data – List of cities with nested stations from CPCB API.
method – Conversion method (‘stations’, ‘city_summary’, ‘coordinates’).
- Returns:
Converted DataFrame based on specified method.
- Available methods:
‘stations’: Flat DataFrame with one row per station (default)
‘city_summary’: City-level summary with aggregated statistics
‘coordinates’: Optimized for mapping with geographic data
- vayuayan.utils.analyze_station_data(data: List[Dict]) -> Dict[str, Any]
Comprehensive analysis of station data.
- Parameters:
data – List of cities with nested stations.
- Returns:
Dictionary with analysis results.
- vayuayan.utils.safe_get(url: str, max_retries: int = 3, timeout: int = 10, verify_ssl: bool = True, allow_ssl_fallback: bool = False, verbose: bool = False) -> Response
Make HTTP GET request with retry logic.
- Parameters:
url – URL to fetch.
max_retries – Maximum retry attempts.
timeout – Request timeout in seconds.
verify_ssl – Whether to verify SSL certificates.
allow_ssl_fallback – Whether to allow fallback to unverified SSL if verification fails.
verbose – Whether to print status messages.
- Returns:
requests.Response object.
- Raises:
NetworkError – If request fails after all retries.
- vayuayan.utils.safe_post(url: str, headers: Dict[str, str], data: Dict[str, Any] | str | bytes, cookies: Dict[str, str] | None = None, max_retries: int = 3, backoff_factor: float = 0.3, timeout: int = 30, verify_ssl: bool = True, allow_ssl_fallback: bool = False, verbose: bool = False) -> Dict[str, Any]
Make robust POST request with retry logic and base64 decoding.
- Parameters:
url – URL to send POST request to.
headers – Request headers.
data – Request data (dict, string, or bytes).
cookies – Optional cookies dict.
max_retries – Maximum number of retry attempts.
backoff_factor – Backoff factor for exponential retry delay.
timeout – Request timeout in seconds.
verify_ssl – Whether to verify SSL certificates.
allow_ssl_fallback – Whether to allow fallback to unverified SSL if verification fails.
verbose – Whether to print status messages.
- Returns:
Parsed JSON response as dictionary.
- Raises:
NetworkError – If all retries failed or network issues.
DataProcessingError – If base64 decoding or JSON parsing fails.
ValueError – If input parameters are invalid.
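The interaction between max_retries and backoff_factor can be illustrated with a generic retry loop. This is a sketch under assumptions: it takes the delay before retry n to be backoff_factor * 2**n, which is one common reading of "exponential retry delay" and not necessarily the formula safe_post uses. call_with_retries is a hypothetical helper that wraps any callable, so it runs without network access.

```python
import time


def call_with_retries(func, max_retries=3, backoff_factor=0.3, sleep=time.sleep):
    """Call func() until it succeeds or max_retries attempts have failed."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < max_retries - 1:
                # Assumed schedule: 0.3 s, 0.6 s, 1.2 s, ... for the default factor.
                sleep(backoff_factor * (2 ** attempt))
    raise RuntimeError(f"all {max_retries} attempts failed") from last_error


attempts = {"count": 0}


def flaky_request():
    """Fail twice, then succeed, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky_request, sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the example fast and makes the delay schedule easy to inspect.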
- vayuayan.utils.url_encode(data_dict: Dict[str, Any]) -> str
Encode dictionary as base64 JSON string.
- Parameters:
data_dict – Dictionary to encode.
- Returns:
Base64 encoded JSON string.
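Encoding a dictionary as a base64 JSON string takes two standard-library calls. The sketch below assumes default json.dumps formatting and the standard base64 alphabet; whether url_encode uses the same separators or a URL-safe alphabet is not stated here. encode_payload and decode_payload are hypothetical names.

```python
import base64
import json


def encode_payload(data_dict):
    """Serialize a dict to JSON, then base64-encode it into an ASCII string."""
    raw = json.dumps(data_dict).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_payload(encoded):
    """Inverse of encode_payload, useful for checking a round trip."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


payload = {"station_id": "site_1", "date": "2024-01-01T10:00:00Z"}
print(decode_payload(encode_payload(payload)) == payload)
```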
- vayuayan.utils.time_to_isodate(timestamp: int) -> str
Convert timestamp to ISO date format.
- Parameters:
timestamp – Unix timestamp in milliseconds.
- Returns:
ISO formatted date string.
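A millisecond timestamp conversion can be sketched with the standard datetime module. The sketch assumes UTC and the default isoformat() output; the time zone and exact string layout returned by time_to_isodate are not specified here. millis_to_iso is a hypothetical name.

```python
from datetime import datetime, timezone


def millis_to_iso(timestamp_ms):
    """Convert a Unix timestamp in milliseconds to an ISO 8601 string in UTC."""
    seconds = timestamp_ms / 1000  # datetime expects seconds, not milliseconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(millis_to_iso(1704067200000))
```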
- vayuayan.utils.haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float
Calculate great circle distance between two points using Haversine formula.
More accurate than Euclidean distance for geographical coordinates.
- Parameters:
lat1 – Latitude of first point.
lon1 – Longitude of first point.
lat2 – Latitude of second point.
lon2 – Longitude of second point.
- Returns:
Distance in kilometers.
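For reference, the Haversine formula itself fits in a few lines. This is a generic sketch, not necessarily identical to the library's implementation: it assumes coordinates in decimal degrees and a mean Earth radius of 6371 km. haversine_km is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```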
- vayuayan.utils.euclidean_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float
Calculate simple Euclidean distance.
Faster but less accurate for long distances than haversine_distance.
- Parameters:
lat1 – Latitude of first point.
lon1 – Longitude of first point.
lat2 – Latitude of second point.
lon2 – Longitude of second point.
- Returns:
Euclidean distance (arbitrary units).
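A plain Euclidean distance on coordinates treats latitude and longitude as flat x and y values. The sketch below assumes that reading, which makes the "arbitrary units" effectively degrees. euclidean_degrees is a hypothetical name.

```python
import math


def euclidean_degrees(lat1, lon1, lat2, lon2):
    """Straight-line distance in coordinate space; the result is in degrees, not km."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


print(euclidean_degrees(0.0, 0.0, 3.0, 4.0))
```

Because a degree of longitude covers less ground at higher latitudes, this value is best used only to rank nearby candidates, not as a physical distance.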
- vayuayan.utils.parse_date(date_text: str) -> str | None
Parse various date formats to standardized format.
- Parameters:
date_text – Raw date text.
- Returns:
Standardized date in YYYY-MM-DD format, or None if parsing fails.
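A multi-format date parser of this kind is often written as a loop over candidate strptime patterns. The list of formats below is an assumption chosen for illustration; the formats parse_date actually accepts are not listed here. parse_date_sketch is a hypothetical name.

```python
from datetime import datetime

# Candidate input formats; which ones the library accepts is an assumption.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y", "%d %b %Y", "%d %B %Y")


def parse_date_sketch(date_text):
    """Return the date as 'YYYY-MM-DD', or None if no candidate format matches."""
    cleaned = date_text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    return None


print(parse_date_sketch("05/01/2024"))
```

Returning None instead of raising matches the documented behavior for unparseable input.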
Constants
Constants for CPCB data fetching.
This module contains URL endpoints, headers, timeouts, and other configuration constants used throughout the vayuayan package.
Exceptions
Custom exceptions for the CPCB data fetching package.
This module defines exception classes used throughout the vayuayan package to provide specific error handling for different types of failures.
- exception vayuayan.exceptions.CPCBError
Bases:
Exception
Base exception for all CPCB related errors.
This is the base class for all custom exceptions in the vayuayan package. All other exceptions inherit from this class.
- exception vayuayan.exceptions.NetworkError
Bases:
CPCBError
Raised when network requests fail.
This exception is raised when HTTP requests to CPCB services fail due to network issues, timeouts, or server errors.
- exception vayuayan.exceptions.DataParsingError
Bases:
CPCBError
Raised when data parsing fails.
This exception is raised when received data cannot be parsed or processed, such as malformed JSON, corrupt Excel files, or unexpected data formats.
- exception vayuayan.exceptions.DataProcessingError
Bases:
CPCBError
Raised when data processing operations fail.
This exception is raised when operations like base64 decoding, data transformation, or statistical calculations fail.
- exception vayuayan.exceptions.CityNotFoundError
Bases:
CPCBError
Raised when a city is not found in the CPCB database.
This exception is raised when a requested city or location is not available in the CPCB monitoring network.
- exception vayuayan.exceptions.StationNotFoundError
Bases:
CPCBError
Raised when a monitoring station is not found.
This exception is raised when a requested station ID or station name is not available in the CPCB monitoring network.
- exception vayuayan.exceptions.InvalidDataError
Bases:
CPCBError
Raised when received data is invalid or corrupted.
This exception is raised when data validation fails, such as missing required fields, invalid coordinates, or data outside expected ranges.
- exception vayuayan.exceptions.AuthenticationError
Bases:
CPCBError
Raised when authentication with CPCB services fails.
This exception is raised when API authentication fails or access is denied to CPCB services.
- exception vayuayan.exceptions.RateLimitError
Bases:
CPCBError
Raised when API rate limits are exceeded.
This exception is raised when too many requests are made to CPCB services in a short period of time.
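Because every exception above derives from CPCBError, handlers can be ordered from most specific to most general. The sketch below defines local stand-in classes that mirror the documented hierarchy so that it runs without the package installed; in real code these names would be imported from vayuayan.exceptions. describe_failure is a hypothetical helper.

```python
class CPCBError(Exception):
    """Stand-in for the documented base exception."""


class NetworkError(CPCBError):
    """Stand-in for the documented network failure exception."""


class StationNotFoundError(CPCBError):
    """Stand-in for the documented missing-station exception."""


def describe_failure(operation):
    """Run operation() and map exception types to short messages."""
    try:
        return operation()
    except StationNotFoundError as exc:
        return f"unknown station: {exc}"
    except NetworkError as exc:
        return f"network problem: {exc}"
    except CPCBError as exc:  # catches any other subclass as a fallback
        return f"other CPCB error: {exc}"


def failing_lookup():
    raise StationNotFoundError("site_999")


print(describe_failure(failing_lookup))
```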