Catalog
- class Catalog(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)
LSDB Catalog for performing analysis of sky catalogs and efficient spatial operations.
- Attributes:
- all_columns: Returns the names of all columns in the original Dataset.
- columns: Returns the names of columns available in the Dataset.
- dtypes: Returns the pandas datatypes of the columns in the Dataset.
- hc_collection: hats.CatalogCollection object representing the structure and metadata of the HATS catalog, as well as links to affiliated tables like margins and indexes.
- hc_structure: hats.Catalog object representing (only) the structure and metadata of the HATS catalog.
- iloc: Returns the position-indexer for the catalog.
- loc: Returns the label-indexer for the catalog.
- margin: Link to a MarginCatalog object that represents the objects in other partitions that are within a specified radius of the border with this partition.
- name: The name of the catalog.
- nested_columns: The names of the columns of the catalog that are nested.
- npartitions: Returns the number of partitions of the catalog.
- original_schema: Returns the schema of the original Dataset.
- partitions: Returns the partitions of the catalog.
Methods
- aggregate_column_statistics([...]): Read footer statistics in parquet metadata, and report on global min/max values.
- box_search(ra, dec[, fine]): Performs filtering according to right ascension and declination ranges.
- compute(): Compute the Dask distributed DataFrame to a pandas DataFrame.
- concat(other, *[, ignore_empty_margins]): Concatenate two catalogs by aligned HEALPix pixels.
- cone_search(ra, dec, radius_arcsec[, fine]): Perform a cone search to filter the catalog.
- crossmatch(other, *[, n_neighbors, ...]): Perform a cross-match between two catalogs.
- crossmatch_nested(other, *[, n_neighbors, ...]): Perform a cross-match between two catalogs, adding the result as a nested column.
- Estimate size of catalog.
- get_healpix_pixels(): Get all HEALPix pixels that are contained in the catalog.
- get_ordered_healpix_pixels(): Get all HEALPix pixels that are contained in the catalog, ordered by breadth-first nested ordering.
- get_partition(order, pixel): Get the dask partition for a given HEALPix pixel.
- get_partition_index(order, pixel): Get the dask partition index for a given HEALPix pixel.
- head([n]): Returns a few rows of initial data for previewing purposes.
- id_search(values[, index_catalogs, fine]): Query rows by column values.
- join(other[, left_on, right_on, through, ...]): Perform a spatial join to another catalog.
- join_nested(other[, left_on, right_on, ...]): Perform a spatial join to another catalog by adding the other catalog as a nested column.
- map_partitions(func, *args[, meta, ...]): Applies a function to each partition in the catalog and respective margin.
- map_rows(func[, columns, row_container, ...]): Takes a function and applies it to each top-level row of the Catalog.
- merge(other[, how, on, left_on, right_on, ...]): Performs the merge of two catalog DataFrames.
- merge_asof(other[, direction, suffixes, ...]): Uses the pandas merge_asof function to merge two catalogs on their indices by distance of keys.
- merge_map(map_catalog, func, *args[, meta]): Applies a function to each pair of partitions in this catalog and the map catalog.
- moc_search(moc[, fine]): Finds all catalog points that are contained within a MOC.
- nest_lists([base_columns, list_columns, name]): Creates a new catalog with a set of list columns packed into a nested column.
- order_search([min_order, max_order]): Filter catalog by HEALPix order.
- per_pixel_statistics([use_default_columns, ...]): Read footer statistics in parquet metadata, and report on min/max values for each data partition.
- pixel_search(pixels): Finds all catalog pixels that overlap with the requested pixel set.
- plot_coverage(**kwargs): Create a visual map of the coverage of the catalog.
- plot_pixels([projection]): Create a visual map of the pixel density of the catalog.
- plot_points(*[, ra_column, dec_column, ...]): Plots the points in the catalog as a scatter plot.
- polygon_search(vertices[, fine]): Perform a polygonal search to filter the catalog.
- prune_empty_partitions([persist]): Prunes the catalog of its empty partitions.
- query(expr): Filters catalog, and respective margin if it exists, using a complex query expression.
- random_sample([n, seed]): Returns a few randomly sampled rows, like self.sample(), except that it samples across all partitions to fulfill the requested number of rows.
- rename(columns): Renames catalog columns (not indices), and those of its margin if it exists, using a dictionary or function mapping.
- sample(partition_id[, n, seed]): Returns a few randomly sampled rows from a given partition.
- search(search): Find rows by reusable search algorithm.
- tail([n]): Returns a few rows of data from the end of the catalog for previewing purposes.
- Convert the dataset to a Dask DataFrame.
- to_delayed([optimize_graph]): Get a list of Dask Delayed objects for each partition in the dataset.
- to_hats(base_catalog_path, *[, ...]): Save the catalog to disk in the HATS format.
- write_catalog(base_catalog_path, *[, ...]): Save the catalog to disk in HATS format.
- __init__(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: Catalog, *, loading_config: HatsLoadingConfig | None = None, margin: MarginCatalog | None = None)
Initialise a Catalog object.
Not to be used to load a catalog directly; use one of the lsdb.from_… or lsdb.open_… methods instead.
- Parameters:
- ddf: nd.NestedFrame
Dask Nested DataFrame with the source data of the catalog
- ddf_pixel_map: DaskDFPixelMap
Dictionary mapping HEALPix order and pixel to partition index of ddf
- hc_structure: HCHealpixDataset
Object with hats metadata of the catalog
- loading_config: HatsLoadingConfig or None, default None
The configuration used to read the catalog from disk
- margin: MarginCatalog or None, default None
The margin catalog.