Catalog

class Catalog(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)

LSDB Catalog to perform analysis of sky catalogs and efficient spatial operations.

Methods

`aggregate_column_statistics`([...])	Read footer statistics in parquet metadata, and report on global min/max values.
`box_search`(ra, dec[, fine])	Performs filtering according to right ascension and declination ranges.
`compute`()	Compute dask distributed dataframe to pandas dataframe
`concat`(other, *[, ignore_empty_margins])	Concatenate two catalogs by aligned HEALPix pixels.
`cone_search`(ra, dec, radius_arcsec[, fine])	Perform a cone search to filter the catalog.
`crossmatch`(other, *[, n_neighbors, ...])	Perform a cross-match between two catalogs
`crossmatch_nested`(other, *[, n_neighbors, ...])	Perform a cross-match between two catalogs, adding the result as a nested column
`estimate_size`()	Estimate size of catalog.
`get_healpix_pixels`()	Get all HEALPix pixels that are contained in the catalog
`get_ordered_healpix_pixels`()	Get all HEALPix pixels that are contained in the catalog, ordered by breadth-first nested ordering.
`get_partition`(order, pixel)	Get the dask partition for a given HEALPix pixel
`get_partition_index`(order, pixel)	Get the index of the dask partition for a given HEALPix pixel
`head`([n])	Returns a few rows of initial data for previewing purposes.
`id_search`(values[, index_catalogs, fine])	Query rows by column values.
`join`(other[, left_on, right_on, through, ...])	Perform a spatial join to another catalog
`join_nested`(other[, left_on, right_on, ...])	Perform a spatial join to another catalog by adding the other catalog as a nested column
`map_partitions`(func, *args[, meta, ...])	Applies a function to each partition in the catalog and respective margin.
`map_rows`(func[, columns, row_container, ...])	Takes a function and applies it to each top-level row of the Catalog.
`merge`(other[, how, on, left_on, right_on, ...])	Performs the merge of two catalog Dataframes
`merge_asof`(other[, direction, suffixes, ...])	Uses the pandas merge_asof function to merge two catalogs on their indices by distance of keys
`merge_map`(map_catalog, func, *args[, meta])	Applies a function to each pair of partitions in this catalog and the map catalog.
`moc_search`(moc[, fine])	Finds all catalog points that are contained within a moc.
`nest_lists`([base_columns, list_columns, name])	Creates a new catalog with a set of list columns packed into a nested column.
`order_search`([min_order, max_order])	Filter catalog by order of HEALPix.
`per_pixel_statistics`([use_default_columns, ...])	Read footer statistics in parquet metadata, and report on min/max values for each data partition.
`pixel_search`(pixels)	Finds all catalog pixels that overlap with the requested pixel set.
`plot_coverage`(**kwargs)	Create a visual map of the coverage of the catalog.
`plot_pixels`([projection])	Create a visual map of the pixel density of the catalog.
`plot_points`(*[, ra_column, dec_column, ...])	Plots the points in the catalog as a scatter plot
`polygon_search`(vertices[, fine])	Perform a polygonal search to filter the catalog.
`prune_empty_partitions`([persist])	Prunes the catalog of its empty partitions
`query`(expr)	Filters catalog and respective margin, if it exists, using a complex query expression
`random_sample`([n, seed])	Returns a few randomly sampled rows, like self.sample(), except that it samples across all partitions to collect the requested number of rows.
`rename`(columns)	Renames the columns (not indices) of the catalog and of its margin, if it exists, using a dictionary or function mapping.
`sample`(partition_id[, n, seed])	Returns a few randomly sampled rows from a given partition.
`search`(search)	Find rows by reusable search algorithm.
`tail`([n])	Returns a few rows of data from the end of the catalog for previewing purposes.
`to_dask_dataframe`()	Convert the dataset to a Dask DataFrame.
`to_delayed`([optimize_graph])	Get a list of Dask Delayed objects for each partition in the dataset
`to_hats`(base_catalog_path, *[, ...])	Save the catalog to disk in the HATS format.
`write_catalog`(base_catalog_path, *[, ...])	Save the catalog to disk in HATS format.
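To make the `cone_search`(ra, dec, radius_arcsec) entry above more concrete, here is a minimal plain-Python sketch of the geometric test that a cone search implies: keep the points whose angular separation from the centre is at most the radius. It is not LSDB's implementation and does not touch partitions or Dask. The helper names `angular_separation_arcsec` and `cone_filter`, and the row layout with `ra` and `dec` keys in degrees, are assumptions made for illustration.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in arcseconds.

    All inputs are in degrees. The haversine formula is used because it
    stays numerically stable for the small angles typical of cone searches,
    and it handles the RA wraparound at 0/360 degrees without special cases.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    separation_rad = 2.0 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(separation_rad) * 3600.0


def cone_filter(rows, ra, dec, radius_arcsec):
    """Keep the rows (dicts with 'ra' and 'dec' in degrees) inside the cone."""
    return [
        row
        for row in rows
        if angular_separation_arcsec(ra, dec, row["ra"], row["dec"]) <= radius_arcsec
    ]


# Hypothetical toy data, not taken from any real catalog.
rows = [
    {"id": 1, "ra": 10.0, "dec": 20.0},    # the cone centre itself
    {"id": 2, "ra": 10.0, "dec": 20.001},  # 0.001 degrees north of the centre
    {"id": 3, "ra": 10.0, "dec": 21.0},    # a full degree away
]
inside = cone_filter(rows, ra=10.0, dec=20.0, radius_arcsec=5.0)
```

With a 5 arcsecond radius, only the first two toy rows pass the test, since 0.001 degrees is 3.6 arcseconds.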

__init__(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)

Initialise a Catalog object.

Not to be used to load a catalog directly; use one of the lsdb.from_… or lsdb.open_… methods instead.

Parameters:

ddf: nd.NestedFrame: Dask Nested DataFrame with the source data of the catalog
ddf_pixel_map: DaskDFPixelMap: Dictionary mapping HEALPix order and pixel to partition index of ddf
hc_structure: HCHealpixDataset: Object with HATS metadata of the catalog
loading_config: HatsLoadingConfig or None, default None: The configuration used to read the catalog from disk
margin: MarginCatalog or None, default None: The margin catalog.
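The `ddf_pixel_map` parameter is described above as a dictionary from HEALPix order and pixel to a partition index of `ddf`. The sketch below shows that shape with plain Python objects, together with the kind of lookup that a method such as `get_partition_index`(order, pixel) would need to perform. The `Pixel` tuple is a hypothetical stand-in for `HealpixPixel`, the map contents are made up, and raising `ValueError` for a missing pixel is a choice made for this sketch, not confirmed library behaviour.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    """Hypothetical stand-in for HealpixPixel: a HEALPix order and pixel number."""

    order: int
    pixel: int


# Made-up pixel map: each key is one HEALPix pixel, each value is the
# position of the partition that holds that pixel's rows.
pixel_map: dict[Pixel, int] = {
    Pixel(0, 4): 0,
    Pixel(1, 20): 1,
    Pixel(1, 21): 2,
}


def partition_index(pixel_map, order, pixel):
    """Return the partition index for (order, pixel).

    Raises ValueError when the pixel has no partition in the map.
    """
    key = Pixel(order, pixel)
    if key not in pixel_map:
        raise ValueError(f"No partition for order {order}, pixel {pixel}")
    return pixel_map[key]


index = partition_index(pixel_map, order=1, pixel=21)
```

Because the key combines both order and pixel number, the same pixel number at two different orders maps to two distinct entries.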

Attributes

`all_columns`	Returns the names of all columns in the original Dataset.
`columns`	Returns the names of columns available in the Dataset
`dtypes`	Returns the pandas datatypes of the columns in the Dataset
`hc_collection`	hats.CatalogCollection object representing the structure and metadata of the HATS catalog, as well as links to affiliated tables like margins and indexes.
`iloc`	Returns the position-indexer for the catalog
`loc`	Returns the label-indexer for the catalog
`margin`	Link to a `MarginCatalog` object that represents the objects in other partitions that are within a specified radius of the border with this partition.
`name`	The name of the catalog
`nested_columns`	The names of the columns of the catalog that are nested.
`npartitions`	Returns the number of partitions of the catalog
`original_schema`	Returns the schema of the original Dataset
`partitions`	Returns the partitions of the catalog
`hc_structure`	hats.Catalog object representing (only) the structure and metadata of the HATS catalog
