Catalog

class Catalog(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)

LSDB Catalog to perform analysis of sky catalogs and efficient spatial operations.

Attributes:
all_columns

Returns the names of all columns in the original Dataset.

columns

Returns the names of columns available in the Dataset

dtypes

Returns the pandas datatypes of the columns in the Dataset

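hc_collection

hats.CatalogCollection object representing the structure and metadata of the HATS catalog, as well as links to affiliated tables like margins and indexes.

iloc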

Returns the position-indexer for the catalog

loc

Returns the label-indexer for the catalog

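margin

Link to a MarginCatalog object that represents the objects in other partitions that are within a specified radius of the border with this partition.

name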

The name of the catalog

nested_columns

The names of the columns of the catalog that are nested.

npartitions

Returns the number of partitions of the catalog

original_schema

Returns the schema of the original Dataset

partitions

Returns the partitions of the catalog

Methods

aggregate_column_statistics([...])

Read footer statistics in parquet metadata, and report on global min/max values.

box_search(ra, dec[, fine])

Performs filtering according to right ascension and declination ranges.

compute()

Computes the dask distributed dataframe into a pandas dataframe.

concat(other, *[, ignore_empty_margins])

Concatenate two catalogs by aligned HEALPix pixels.

cone_search(ra, dec, radius_arcsec[, fine])

Perform a cone search to filter the catalog.

crossmatch(other, *[, n_neighbors, ...])

Perform a cross-match between two catalogs

crossmatch_nested(other, *[, n_neighbors, ...])

Perform a cross-match between two catalogs, adding the result as a nested column

estimate_size()

Estimate size of catalog.

get_healpix_pixels()

Get all HEALPix pixels that are contained in the catalog

get_ordered_healpix_pixels()

Get all HEALPix pixels that are contained in the catalog, ordered by breadth-first nested ordering.

get_partition(order, pixel)

Get the dask partition for a given HEALPix pixel

get_partition_index(order, pixel)

Get the index of the dask partition for a given HEALPix pixel

head([n])

Returns a few rows of initial data for previewing purposes.

id_search(values[, index_catalogs, fine])

Query rows by column values.

join(other[, left_on, right_on, through, ...])

Perform a spatial join to another catalog

join_nested(other[, left_on, right_on, ...])

Perform a spatial join to another catalog by adding the other catalog as a nested column

map_partitions(func, *args[, meta, ...])

Applies a function to each partition in the catalog and respective margin.

map_rows(func[, columns, row_container, ...])

Takes a function and applies it to each top-level row of the Catalog.

merge(other[, how, on, left_on, right_on, ...])

Performs the merge of two catalog Dataframes

merge_asof(other[, direction, suffixes, ...])

Uses the pandas merge_asof function to merge two catalogs on their indices by distance of keys

merge_map(map_catalog, func, *args[, meta])

Applies a function to each pair of partitions in this catalog and the map catalog.

moc_search(moc[, fine])

Finds all catalog points that are contained within a MOC (multi-order coverage map).

nest_lists([base_columns, list_columns, name])

Creates a new catalog with a set of list columns packed into a nested column.

order_search([min_order, max_order])

Filter catalog by order of HEALPix.

per_pixel_statistics([use_default_columns, ...])

Read footer statistics in parquet metadata, and report on min/max values for each data partition.

pixel_search(pixels)

Finds all catalog pixels that overlap with the requested pixel set.

plot_coverage(**kwargs)

Create a visual map of the coverage of the catalog.

plot_pixels([projection])

Create a visual map of the pixel density of the catalog.

plot_points(*[, ra_column, dec_column, ...])

Plots the points in the catalog as a scatter plot

polygon_search(vertices[, fine])

Perform a polygonal search to filter the catalog.

prune_empty_partitions([persist])

Prunes the catalog of its empty partitions

query(expr)

Filters catalog and respective margin, if it exists, using a complex query expression

random_sample([n, seed])

Returns a few randomly sampled rows, like self.sample(), except that it samples from all partitions in order to return the requested number of rows.

rename(columns)

Renames catalog columns (not indices), and those of its margin if it exists, using a dictionary or function mapping.

sample(partition_id[, n, seed])

Returns a few randomly sampled rows from a given partition.

search(search)

Find rows by reusable search algorithm.

tail([n])

Returns a few rows of data from the end of the catalog for previewing purposes.

to_dask_dataframe()

Convert the dataset to a Dask DataFrame.

to_delayed([optimize_graph])

Get a list of Dask Delayed objects for each partition in the dataset

to_hats(base_catalog_path, *[, ...])

Save the catalog to disk in the HATS format.

write_catalog(base_catalog_path, *[, ...])

Save the catalog to disk in HATS format.
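The cone_search(ra, dec, radius_arcsec) entry above filters rows by angular distance from a sky position. As a rough illustration of that idea only, the sketch below computes a great-circle separation with the haversine formula and keeps the points inside the radius. It uses plain Python rather than LSDB, and the helper names angular_separation_arcsec and in_cone are hypothetical; the library's own implementation is not shown on this page and may differ.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine) between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    sep_rad = 2 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(sep_rad) * 3600.0


def in_cone(points, ra, dec, radius_arcsec):
    """Keep the (ra, dec) points that lie within radius_arcsec of the cone centre."""
    return [
        p for p in points
        if angular_separation_arcsec(ra, dec, p[0], p[1]) <= radius_arcsec
    ]


# Hypothetical sample positions in degrees: two close to the centre, one far away.
points = [(10.0, 0.0), (10.0, 0.001), (10.0, 1.0)]
selected = in_cone(points, ra=10.0, dec=0.0, radius_arcsec=5.0)
```

Because the formula works on the sine of half the right-ascension difference, positions on either side of the 0/360 degree boundary are handled without special casing.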

__init__(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)

Initialise a Catalog object.

Do not use this constructor to load a catalog directly; use one of the lsdb.from_… or lsdb.open_… methods instead.

Parameters:
ddf: nd.NestedFrame

Dask Nested DataFrame with the source data of the catalog

ddf_pixel_map: DaskDFPixelMap

Dictionary mapping HEALPix order and pixel to partition index of ddf

hc_structure: HCHealpixDataset

Object with hats metadata of the catalog

loading_config: HatsLoadingConfig or None, default None

The configuration used to read the catalog from disk

margin: MarginCatalog or None, default None

The margin catalog.
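To make the ddf_pixel_map parameter more concrete, here is a minimal sketch of a dictionary that maps a HEALPix order and pixel to a partition index, together with a lookup of the kind get_partition_index(order, pixel) describes. The Pixel tuple is a hypothetical stand-in for HealpixPixel, and the map contents and the partition_index helper are invented for illustration; none of this is LSDB's actual code.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    """Hypothetical stand-in for HealpixPixel: a HEALPix order and pixel number."""
    order: int
    pixel: int


# Hypothetical pixel map: each HEALPix pixel points at exactly one partition index.
pixel_map: dict[Pixel, int] = {
    Pixel(0, 4): 0,
    Pixel(1, 20): 1,
    Pixel(1, 21): 2,
}


def partition_index(pixel_map: dict[Pixel, int], order: int, pixel: int) -> int:
    """Look up the partition index for a pixel, failing loudly if it is absent."""
    key = Pixel(order, pixel)
    if key not in pixel_map:
        raise ValueError(f"Pixel at order {order}, pixel {pixel} is not in the map")
    return pixel_map[key]


index = partition_index(pixel_map, order=1, pixel=21)
```

Keying the dictionary on the (order, pixel) pair keeps the lookup constant-time and lets pixels of different orders coexist in one map.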


Attributes

all_columns

Returns the names of all columns in the original Dataset.

columns

Returns the names of columns available in the Dataset

dtypes

Returns the pandas datatypes of the columns in the Dataset

hc_collection

hats.CatalogCollection object representing the structure and metadata of the HATS catalog, as well as links to affiliated tables like margins and indexes.

iloc

Returns the position-indexer for the catalog

loc

Returns the label-indexer for the catalog

margin

Link to a MarginCatalog object that represents the objects in other partitions that are within a specified radius of the border with this partition.

name

The name of the catalog

nested_columns

The names of the columns of the catalog that are nested.

npartitions

Returns the number of partitions of the catalog

original_schema

Returns the schema of the original Dataset

partitions

Returns the partitions of the catalog

hc_structure

hats.Catalog object representing (only) the structure and metadata of the HATS catalog
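Several entries on this page, such as pixel_search and order_search, depend on how HEALPix pixels of different orders relate to each other. Assuming the NESTED numbering scheme, where each pixel splits into four children per order, two pixels overlap exactly when one is an ancestor of (or equal to) the other. The sketch below shows that check in plain Python; pixels_overlap and overlapping_pixels are hypothetical helper names, not LSDB or hats functions.

```python
def pixels_overlap(order_a, pixel_a, order_b, pixel_b):
    """Return True if two NESTED-scheme HEALPix pixels cover overlapping sky area."""
    # Arrange the pair so that (order_a, pixel_a) is the coarser pixel.
    if order_a > order_b:
        order_a, pixel_a, order_b, pixel_b = order_b, pixel_b, order_a, pixel_a
    # Each order adds two bits to the pixel number, so shifting them away
    # yields the ancestor of the finer pixel at the coarser order.
    return (pixel_b >> (2 * (order_b - order_a))) == pixel_a


def overlapping_pixels(catalog_pixels, requested_pixels):
    """Keep the catalog (order, pixel) pairs that overlap any requested pair."""
    return [
        cat for cat in catalog_pixels
        if any(pixels_overlap(*cat, *req) for req in requested_pixels)
    ]


# Hypothetical catalog partitions and a single requested order-0 pixel.
catalog_pixels = [(0, 4), (1, 20), (1, 21)]
matches = overlapping_pixels(catalog_pixels, [(0, 5)])
```

Here order-1 pixels 20 and 21 are both children of order-0 pixel 5, so they are kept, while order-0 pixel 4 is not.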