crossmatch
- Catalog.crossmatch(other: Catalog, *, n_neighbors: int | None = None, radius_arcsec: float | None = None, min_radius_arcsec: float | None = None, algorithm: AbstractCrossmatchAlgorithm | None = None, output_catalog_name: str | None = None, require_right_margin: bool = False, how: str = 'inner', suffixes: tuple[str, str] | None = None, suffix_method: str | None = None, log_changes: bool = True) -> Catalog
Perform a cross-match between two catalogs.
The pixels from each catalog are aligned via a PixelAlignment, and cross-matching is performed on each pair of overlapping pixels. The resulting catalog will have partitions matching an inner pixel alignment - using pixels that have overlap in both input catalogs and taking the smallest of any overlapping pixels.
The resulting catalog will be partitioned using the left catalog’s ra and dec, and each row will keep the index of the corresponding row in the left catalog.
- Parameters:
- other : Catalog
The right catalog to cross-match against.
- n_neighbors : int, default 1
The number of neighbors to find for each point.
- radius_arcsec : float, default 1.0
The maximum distance in arcseconds; neighbors farther away than this are not added.
- min_radius_arcsec : float, default 0.0
The minimum distance in arcseconds; neighbors closer than this are not added.
- algorithm : AbstractCrossmatchAlgorithm | None, default KdTreeCrossmatch
The instance of an algorithm used to perform the crossmatch. If None, the default KdTreeCrossmatch algorithm is used. If specified, the algorithm is defined by subclassing AbstractCrossmatchAlgorithm.
- Default algorithm:
KdTreeCrossmatch: finds the k nearest neighbors using a k-d tree.
- Custom algorithm:
To specify a custom algorithm, write a class that subclasses AbstractCrossmatchAlgorithm and override either the crossmatch or the perform_crossmatch method.
The method should perform a crossmatch on two pandas DataFrames, one partition from each catalog. It should return two 1-D numpy arrays of equal length holding the indices of the matching rows from the left and right dataframes, plus a dataframe of the same length with any extra columns generated by the crossmatch algorithm.
The extra columns and their data types are declared in AbstractCrossmatchAlgorithm.extra_columns by means of an empty pandas dataframe. As an example, the KdTreeCrossmatch algorithm outputs a “_dist_arcsec” column with the distance between data points. Its extra_columns attribute is specified as follows:
pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.dtype("float64"))})
The crossmatch/perform_crossmatch methods will receive an instance of CrossmatchArgs which includes the partitions and respective pixel information:
- left_df: npd.NestedFrame
- right_df: npd.NestedFrame
- left_order: int
- left_pixel: int
- right_order: int
- right_pixel: int
- left_catalog_info: hc.catalog.TableProperties
- right_catalog_info: hc.catalog.TableProperties
- right_margin_catalog_info: hc.catalog.TableProperties
Include any algorithm-specific parameters in the initialization of your object. Validate these parameters by overriding AbstractCrossmatchAlgorithm.validate.
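The return contract described above (two index arrays plus an extra-columns dataframe) can be illustrated without LSDB itself. The following is a minimal sketch using only numpy and pandas, not the library's implementation: the function names `angular_separation_arcsec` and `brute_force_match` are hypothetical, the input frames are assumed to have "ra" and "dec" columns in degrees, the returned indices are assumed to be positional, and both radius bounds are assumed to be inclusive. It uses a brute-force search rather than a k-d tree.

```python
import numpy as np
import pandas as pd

# Empty frame declaring the extra column and its dtype, as in the
# extra_columns example above.
EXTRA_COLUMNS = pd.DataFrame({"_dist_arcsec": pd.Series(dtype=np.dtype("float64"))})


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Haversine great-circle separation in arcseconds (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))) * 3600.0


def brute_force_match(left_df, right_df, n_neighbors=1,
                      radius_arcsec=1.0, min_radius_arcsec=0.0):
    """Hypothetical stand-in for a perform_crossmatch body.

    Returns (left_idx, right_idx, extra): two equal-length 1-D arrays of
    positional row indices and a dataframe of extra columns.
    """
    left_ra, left_dec = left_df["ra"].to_numpy(), left_df["dec"].to_numpy()
    right_ra, right_dec = right_df["ra"].to_numpy(), right_df["dec"].to_numpy()
    left_idx, right_idx, dists = [], [], []
    for i in range(len(left_df)):
        d = angular_separation_arcsec(left_ra[i], left_dec[i], right_ra, right_dec)
        # Assumption: both bounds are inclusive.
        in_range = np.flatnonzero((d >= min_radius_arcsec) & (d <= radius_arcsec))
        # Keep at most n_neighbors candidates, nearest first.
        nearest = in_range[np.argsort(d[in_range], kind="stable")][:n_neighbors]
        left_idx.extend([i] * len(nearest))
        right_idx.extend(nearest.tolist())
        dists.extend(d[nearest].tolist())
    extra = pd.DataFrame({"_dist_arcsec": np.asarray(dists, dtype="float64")})
    return (np.asarray(left_idx, dtype=np.int64),
            np.asarray(right_idx, dtype=np.int64),
            extra)
```

In a real subclass, logic of this kind would read the two partitions from the CrossmatchArgs instance; how exactly that object is passed to the method is not shown here and should be checked against the AbstractCrossmatchAlgorithm source.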
- output_catalog_name : str, default {left_name}_x_{right_name}
The name of the resulting catalog.
- require_right_margin : bool, default False
If True, raises an error when the right margin is missing, since a missing margin could lead to incomplete crossmatches.
- how : str
How to handle the crossmatch of the two catalogs. One of {‘left’, ‘inner’}; defaults to ‘inner’.
- suffixes : tuple[str, str] or None
A pair of suffixes to be appended to the end of each column name when they are joined. By default, each catalog’s name is used as its suffix.
- suffix_method : str or None, default “all_columns”
Method to use to add suffixes to columns. Options are:
“overlapping_columns”: only add suffixes to columns that are present in both catalogs
“all_columns”: add suffixes to all columns from both catalogs
Warning
This default will change to “overlapping_columns” in a future release.
- log_changes : bool, default True
If True, logs an info message for each column that is being renamed. This only applies when suffix_method is ‘overlapping_columns’.
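The difference between the two suffix_method options can be shown with a small pure-Python sketch. This is an illustration of the behavior described above, not LSDB's code: the helper name `apply_suffixes` is hypothetical, and any special handling the library may apply to particular columns is ignored.

```python
def apply_suffixes(left_cols, right_cols, suffixes, suffix_method="all_columns"):
    """Return the renamed (left, right) column lists for a given suffix_method."""
    if suffix_method not in ("all_columns", "overlapping_columns"):
        raise ValueError(f"unknown suffix_method: {suffix_method!r}")
    left_suffix, right_suffix = suffixes
    # Columns that appear in both catalogs.
    overlap = set(left_cols) & set(right_cols)

    def rename(cols, suffix):
        if suffix_method == "all_columns":
            return [col + suffix for col in cols]
        # "overlapping_columns": only rename columns present in both catalogs.
        return [col + suffix if col in overlap else col for col in cols]

    return rename(left_cols, left_suffix), rename(right_cols, right_suffix)


left_cols = ["ra", "dec", "id"]
right_cols = ["ra", "dec", "mag"]

print(apply_suffixes(left_cols, right_cols, ("_left", "_right"), "all_columns"))
print(apply_suffixes(left_cols, right_cols, ("_left", "_right"), "overlapping_columns"))
```

With "overlapping_columns", the columns unique to one catalog ("id" and "mag" here) keep their original names, which is why that mode changes fewer column names than "all_columns".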
- Returns:
- Catalog
A Catalog with the data from the left and right catalogs merged with one row for each pair of neighbors found from cross-matching. The resulting table contains all columns from the left and right catalogs with their respective suffixes and, whenever specified, a set of extra columns generated by the crossmatch algorithm.
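To picture how matched pairs turn into output rows, and how the how parameter affects the result, here is a minimal pure-Python sketch. It assumes that ‘left’ behaves like a conventional left join, keeping unmatched left rows with missing values in the right-hand columns; that behavior is an assumption and is not stated in this page. The helper `join_matches` is hypothetical, and rows are represented as plain dictionaries whose keys are already suffixed so that they do not collide.

```python
def join_matches(left_rows, right_rows, matches, how="inner"):
    """Build one output row per matched (left_pos, right_pos) pair.

    With how="left" (assumed left-join semantics), left rows without any
    match are kept and their right-hand columns are filled with None.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"how must be 'inner' or 'left', got {how!r}")
    right_keys = list(right_rows[0]) if right_rows else []
    # Group the right-hand partners of each left row.
    partners_by_left = {}
    for left_pos, right_pos in matches:
        partners_by_left.setdefault(left_pos, []).append(right_pos)
    joined = []
    for left_pos, left_row in enumerate(left_rows):
        partners = partners_by_left.get(left_pos, [])
        for right_pos in partners:
            joined.append({**left_row, **right_rows[right_pos]})
        if not partners and how == "left":
            joined.append({**left_row, **{key: None for key in right_keys}})
    return joined


left_rows = [{"id_left": 1}, {"id_left": 2}]
right_rows = [{"id_right": 10}]
matches = [(0, 0)]  # left row 0 matched right row 0

print(join_matches(left_rows, right_rows, matches, how="inner"))
print(join_matches(left_rows, right_rows, matches, how="left"))
```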
- Raises:
- TypeError
If the other catalog is not of type Catalog.
- ValueError
If both an algorithm and keyword arguments for the default algorithm are specified. If the provided suffixes are not a tuple of two strings. If the right catalog has no margin and require_right_margin is True.
Examples
Crossmatch two small synthetic catalogs:
>>> import lsdb
>>> from lsdb.nested.datasets import generate_data
>>> nf = generate_data(1000, 5, seed=0, ra_range=(0.0, 300.0), dec_range=(-50.0, 50.0))
>>> df = nf.compute()[["ra", "dec", "id"]]
>>> left = lsdb.from_dataframe(df, catalog_name="left")
>>> right = lsdb.from_dataframe(df, catalog_name="right")
>>> xmatch = left.crossmatch(right, n_neighbors=1, radius_arcsec=1.0,
...     suffix_method="overlapping_columns", log_changes=False)
>>> xmatch.head()[
...     ["ra_left", "dec_left", "id_left", "_dist_arcsec"]
... ]
                      ra_left   dec_left  id_left  _dist_arcsec
_healpix_29
118362963675428450  52.696686  39.675892     8154           0.0
98504457942331510   89.913567  46.147079     3437           0.0
70433374600953220   40.528952  35.350965     8214           0.0
154968715224527848   17.57041    29.8936     9853           0.0
67780378363846894    45.08384   31.95611     8297           0.0