MapCatalog#
- class MapCatalog(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: HealpixDataset, loading_config: HatsLoadingConfig | None = None)[source]#
LSDB DataFrame containing a continuous map.
- Attributes:
all_columns: Returns the names of all columns in the original Dataset.
columns: Returns the names of the columns available in the Dataset.
dtypes: Returns the pandas datatypes of the columns in the Dataset.
name: The name of the catalog.
nested_columns: The names of the columns of the catalog that are nested.
npartitions: Returns the number of partitions of the catalog.
original_schema: Returns the schema of the original Dataset.
partitions: Returns the partitions of the catalog.
Methods
aggregate_column_statistics([...]): Reads footer statistics in the parquet metadata and reports global min/max values.
box_search(ra, dec[, fine]): Filters the catalog by right ascension and declination ranges.
compute(): Computes the distributed Dask DataFrame into a pandas DataFrame.
cone_search(ra, dec, radius_arcsec[, fine]): Performs a cone search to filter the catalog.
estimate_size(): Estimates the size of the catalog.
get_healpix_pixels(): Gets all HEALPix pixels contained in the catalog.
get_ordered_healpix_pixels(): Gets all HEALPix pixels contained in the catalog, in breadth-first nested ordering.
get_partition(order, pixel): Gets the Dask partition for a given HEALPix pixel.
get_partition_index(order, pixel): Gets the index of the Dask partition for a given HEALPix pixel.
head([n]): Returns a few rows of initial data for previewing purposes.
map_partitions(func, *args[, meta, ...]): Applies a function to each partition in the catalog.
map_rows(func[, columns, row_container, ...]): Applies a function to each top-level row of the catalog.
moc_search(moc[, fine]): Finds all catalog points contained within a MOC (Multi-Order Coverage map).
nest_lists([base_columns, list_columns, name]): Creates a new catalog with a set of list columns packed into a nested column.
order_search([min_order, max_order]): Filters the catalog by HEALPix order.
per_pixel_statistics([use_default_columns, ...]): Reads footer statistics in the parquet metadata and reports min/max values for each data partition.
pixel_search(pixels): Finds all catalog pixels that overlap with the requested pixel set.
plot_coverage(**kwargs): Creates a visual map of the coverage of the catalog.
plot_pixels([projection]): Creates a visual map of the pixel density of the catalog.
plot_points(*[, ra_column, dec_column, ...]): Plots the points in the catalog as a scatter plot.
polygon_search(vertices[, fine]): Performs a polygonal search to filter the catalog.
prune_empty_partitions([persist]): Prunes the catalog of its empty partitions.
query(expr): Filters the catalog using a complex query expression.
random_sample([n, seed]): Returns a few randomly sampled rows, drawing from all partitions (unlike sample(), which draws from a single partition).
rename(columns): Renames catalog columns (not indices) using a dictionary or function mapping.
sample(partition_id[, n, seed]): Returns a few randomly sampled rows from a given partition.
search(search): Finds rows using a reusable search algorithm.
tail([n]): Returns a few rows of data from the end of the catalog for previewing purposes.
to_dask_dataframe(): Converts the dataset to a Dask DataFrame.
to_delayed([optimize_graph]): Gets a list of Dask Delayed objects, one per partition in the dataset.
to_hats(base_catalog_path, *[, ...]): Saves the catalog to disk in the HATS format.
write_catalog(base_catalog_path, *[, ...]): Saves the catalog to disk in the HATS format.
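The breadth-first nested ordering used by get_ordered_healpix_pixels can be pictured with plain (order, pixel) tuples. The sketch below is illustrative rather than LSDB's actual implementation; it relies only on the NESTED-scheme fact that a pixel p at order o covers the pixel range [p·4^d, (p+1)·4^d) at order o+d.

```python
# Illustrative sketch: breadth-first nested ordering of HEALPix pixels.
# Projecting every (order, pixel) pair to its first sub-pixel at the
# maximum order present, then sorting by that projected index, yields
# the breadth-first nested ordering. Not LSDB's real implementation.

def nested_sort_key(pixel, max_order):
    """Project an (order, pixel) tuple to its first sub-pixel at max_order."""
    order, pix = pixel
    return pix * 4 ** (max_order - order)

def ordered_pixels(pixels):
    """Sort a list of (order, pixel) tuples in breadth-first nested order."""
    max_order = max(order for order, _ in pixels)
    return sorted(pixels, key=lambda p: nested_sort_key(p, max_order))

# Order-0 pixel 0 covers order-1 pixels 0-3, so it sorts before
# the order-1 pixels 4 and 7, which it does not contain:
print(ordered_pixels([(1, 7), (0, 0), (1, 4)]))
# [(0, 0), (1, 4), (1, 7)]
```

The projection to a common order is what makes the ordering "breadth-first": coarse pixels interleave with the fine pixels they sit before on the sky, rather than all coarse pixels coming first.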
- __init__(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: HealpixDataset, loading_config: HatsLoadingConfig | None = None)#
Initialise a Catalog object.
Not intended for loading a catalog directly; use one of the lsdb.from_… or lsdb.open_… methods instead.
- Parameters:
- ddf: nd.NestedFrame
Dask NestedFrame with the source data of the catalog.
- ddf_pixel_map: DaskDFPixelMap
Dictionary mapping each HEALPix order and pixel to the corresponding partition index of ddf.
- hc_structure: HCHealpixDataset
Object with the HATS metadata of the catalog.
- loading_config: HatsLoadingConfig or None, default None
The configuration used to read the catalog from disk.
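The ddf_pixel_map parameter can be pictured as a dictionary from a HEALPix pixel to the index of the Dask partition holding that pixel's rows. In the sketch below, plain (order, pixel) tuples stand in for the hats HealpixPixel objects the real class uses, and the lookup helper is a hypothetical stand-in for how get_partition_index-style lookups consume the map.

```python
# Illustrative sketch of the ddf_pixel_map parameter: each HEALPix
# (order, pixel) pair maps to the index of the Dask partition that
# holds that pixel's rows. Plain tuples stand in for HealpixPixel.

ddf_pixel_map = {
    (1, 4): 0,   # partition 0 holds order-1 pixel 4
    (1, 7): 1,   # partition 1 holds order-1 pixel 7
    (2, 33): 2,  # partition 2 holds order-2 pixel 33
}

def lookup_partition_index(pixel_map, order, pixel):
    """Hypothetical helper: return the partition index for a pixel,
    raising if the pixel is not part of the catalog."""
    try:
        return pixel_map[(order, pixel)]
    except KeyError:
        raise ValueError(f"Pixel (order={order}, pixel={pixel}) is not in the catalog")

print(lookup_partition_index(ddf_pixel_map, 2, 33))
# 2
```

Because each HEALPix pixel owns exactly one partition, this one-to-one map is all the catalog needs to route a spatial filter (cone_search, box_search, pixel_search) to the relevant partitions.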
Attributes
hc_structure