MapCatalog#
- class MapCatalog(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: HealpixDataset, loading_config: HatsLoadingConfig | None = None)[source]#
LSDB DataFrame containing a continuous map.
- Attributes:
all_columns: Returns the names of all columns in the original Dataset.
columns: Returns the names of the columns available in the Dataset.
dtypes: Returns the pandas datatypes of the columns in the Dataset.
name: The name of the catalog.
nested_columns: The names of the columns of the catalog that are nested.
npartitions: Returns the number of partitions of the catalog.
original_schema: Returns the schema of the original Dataset.
partitions: Returns the partitions of the catalog.
Methods
aggregate_column_statistics([...]): Reads footer statistics in the parquet metadata and reports global min/max values.
box_search(ra, dec[, fine]): Filters the catalog by right ascension and declination ranges.
compute(): Computes the distributed Dask DataFrame into a pandas DataFrame.
cone_search(ra, dec, radius_arcsec[, fine]): Performs a cone search to filter the catalog.
estimate_size(): Estimates the size of the catalog.
get_healpix_pixels(): Gets all HEALPix pixels contained in the catalog.
get_ordered_healpix_pixels(): Gets all HEALPix pixels contained in the catalog, in breadth-first nested ordering.
get_partition(order, pixel): Gets the Dask partition for a given HEALPix pixel.
get_partition_index(order, pixel): Gets the index of the Dask partition for a given HEALPix pixel.
head([n]): Returns a few rows of initial data for previewing purposes.
map_partitions(func, *args[, meta, ...]): Applies a function to each partition in the catalog.
map_rows(func[, columns, row_container, ...]): Applies a function to each top-level row of the catalog.
moc_search(moc[, fine]): Finds all catalog points contained within a MOC (Multi-Order Coverage map).
nest_lists([base_columns, list_columns, name]): Creates a new catalog with a set of list columns packed into a nested column.
order_search([min_order, max_order]): Filters the catalog by HEALPix order.
per_pixel_statistics([use_default_columns, ...]): Reads footer statistics in the parquet metadata and reports min/max values for each data partition.
pixel_search(pixels): Finds all catalog pixels that overlap with the requested pixel set.
plot_coverage(**kwargs): Creates a visual map of the coverage of the catalog.
plot_pixels([projection]): Creates a visual map of the pixel density of the catalog.
plot_points(*[, ra_column, dec_column, ...]): Plots the points in the catalog as a scatter plot.
polygon_search(vertices[, fine]): Performs a polygonal search to filter the catalog.
prune_empty_partitions([persist]): Prunes the catalog of its empty partitions.
query(expr): Filters the catalog using a complex query expression.
random_sample([n, seed]): Returns a few randomly sampled rows, drawing from all partitions (unlike sample(), which draws from a single partition).
rename(columns): Renames catalog columns (not indices) using a dictionary or function mapping.
sample(partition_id[, n, seed]): Returns a few randomly sampled rows from a given partition.
search(search): Finds rows using a reusable search algorithm.
tail([n]): Returns a few rows of data from the end of the catalog for previewing purposes.
to_dask_dataframe(): Converts the dataset to a Dask DataFrame.
to_delayed([optimize_graph]): Gets a list of Dask Delayed objects, one per partition in the dataset.
to_hats(base_catalog_path, *[, ...]): Saves the catalog to disk in the HATS format.
write_catalog(base_catalog_path, *[, ...]): Saves the catalog to disk in the HATS format.
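The breadth-first nested ordering used by get_ordered_healpix_pixels can be pictured with plain (order, pixel) tuples. The sketch below is illustrative rather than LSDB's actual implementation; it relies only on the NESTED-scheme fact that a pixel p at order o covers the pixel range [p·4^d, (p+1)·4^d) at order o+d.

```python
# Illustrative sketch: breadth-first nested ordering of HEALPix pixels.
# Projecting every (order, pixel) pair to its first sub-pixel at the
# maximum order present, then sorting by that projected index, yields
# the breadth-first nested ordering. Not LSDB's real implementation.

def nested_sort_key(pixel, max_order):
    """Project an (order, pixel) tuple to its first sub-pixel at max_order."""
    order, pix = pixel
    return pix * 4 ** (max_order - order)

def ordered_pixels(pixels):
    """Sort a list of (order, pixel) tuples in breadth-first nested order."""
    max_order = max(order for order, _ in pixels)
    return sorted(pixels, key=lambda p: nested_sort_key(p, max_order))

# Order-0 pixel 0 covers order-1 pixels 0-3, so it sorts before
# the order-1 pixels 4 and 7, which it does not contain:
print(ordered_pixels([(1, 7), (0, 0), (1, 4)]))
# [(0, 0), (1, 4), (1, 7)]
```

The projection to a common order is what makes the ordering "breadth-first": coarse pixels interleave with the fine pixels they sit before on the sky, rather than all coarse pixels coming first.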
- __init__(ddf: NestedFrame, ddf_pixel_map: dict[HealpixPixel, int], hc_structure: HealpixDataset, loading_config: HatsLoadingConfig | None = None)#
Initialise a Catalog object.
Not intended for loading a catalog directly; use one of the lsdb.from_… or lsdb.open_… methods instead.
- Parameters:
- ddf: nd.NestedFrame
Dask NestedFrame with the source data of the catalog.
- ddf_pixel_map: DaskDFPixelMap
Dictionary mapping each HEALPix order and pixel to the corresponding partition index of ddf.
- hc_structure: HCHealpixDataset
Object with the HATS metadata of the catalog.
- loading_config: HatsLoadingConfig or None, default None
The configuration used to read the catalog from disk.
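The ddf_pixel_map parameter can be pictured as a dictionary from a HEALPix pixel to the index of the Dask partition holding that pixel's rows. In the sketch below, plain (order, pixel) tuples stand in for the hats HealpixPixel objects the real class uses, and the lookup helper is a hypothetical stand-in for how get_partition_index-style lookups consume the map.

```python
# Illustrative sketch of the ddf_pixel_map parameter: each HEALPix
# (order, pixel) pair maps to the index of the Dask partition that
# holds that pixel's rows. Plain tuples stand in for HealpixPixel.

ddf_pixel_map = {
    (1, 4): 0,   # partition 0 holds order-1 pixel 4
    (1, 7): 1,   # partition 1 holds order-1 pixel 7
    (2, 33): 2,  # partition 2 holds order-2 pixel 33
}

def lookup_partition_index(pixel_map, order, pixel):
    """Hypothetical helper: return the partition index for a pixel,
    raising if the pixel is not part of the catalog."""
    try:
        return pixel_map[(order, pixel)]
    except KeyError:
        raise ValueError(f"Pixel (order={order}, pixel={pixel}) is not in the catalog")

print(lookup_partition_index(ddf_pixel_map, 2, 33))
# 2
```

Because each HEALPix pixel owns exactly one partition, this one-to-one map is all the catalog needs to route a spatial filter (cone_search, box_search, pixel_search) to the relevant partitions.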
Attributes
hc_structure