join
- Catalog.join(other: Catalog, left_on: str | None = None, right_on: str | None = None, through: AssociationCatalog | None = None, suffixes: tuple[str, str] | None = None, output_catalog_name: str | None = None, suffix_method: str | None = None, log_changes: bool = True) -> Catalog
Perform a spatial join to another catalog
Joins two catalogs together on a shared column value, merging rows where they match.
This is an inner join: only rows with matching join keys are returned (unmatched rows are dropped).
The operation only joins data from matching partitions, and does not join rows that have a matching column value but are in separate partitions in the sky. For a more general join, see the merge function.
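The partition restriction described above can be illustrated with a minimal pure-Python sketch. This is not LSDB's implementation: the helper `partitioned_inner_join` and the dict-of-lists layout (partition id mapped to a list of row dicts) are hypothetical stand-ins, chosen only to show why equal key values in different sky partitions produce no output row.

```python
from collections import defaultdict


def partitioned_inner_join(left, right, left_on, right_on):
    """Inner-join rows partition by partition (illustrative helper, not an LSDB API).

    `left` and `right` map a partition id (standing in for a sky pixel)
    to a list of row dicts. Only partitions present on both sides are
    compared, so equal keys in different partitions never match.
    """
    joined = []
    for pixel in sorted(left.keys() & right.keys()):
        # Index the right-hand rows of this partition by join key.
        by_key = defaultdict(list)
        for row in right[pixel]:
            by_key[row[right_on]].append(row)
        # Emit one merged row per matching (left, right) pair.
        for lrow in left[pixel]:
            for rrow in by_key.get(lrow[left_on], []):
                joined.append({**lrow, **rrow})
    return joined


left = {
    0: [{"id": 1, "ra_left": 10.0}, {"id": 2, "ra_left": 11.0}],
    1: [{"id": 3, "ra_left": 200.0}],
}
right = {
    0: [{"id_right": 1, "flag": True}],
    # id 3 exists on the right, but in a different partition than on the left.
    2: [{"id_right": 3, "flag": True}],
}

result = partitioned_inner_join(left, right, "id", "id_right")
print(result)
```

In this toy data, `id` 2 is dropped because it has no match (inner join), and `id` 3 is dropped because its only match sits in another partition.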
- Parameters:
- other : Catalog
The right catalog to join to
- left_on : str
The name of the column in the left catalog to join on
- right_on : str
The name of the column in the right catalog to join on
- through : AssociationCatalog
An association catalog that provides the alignment between pixels and individual rows.
- suffixes : tuple[str, str]
Suffixes to apply to the columns of each table
- output_catalog_name : str
The name of the resulting catalog to be stored in metadata
- suffix_method : str, default "all_columns"
Method to use to add suffixes to columns. Options are:
"overlapping_columns": only add suffixes to columns that are present in both catalogs
"all_columns": add suffixes to all columns from both catalogs
Warning
This default will change to "overlapping_columns" in a future release.
- log_changes : bool, default True
If True, logs an info message for each column that is being renamed. This only applies when suffix_method is "overlapping_columns".
- Returns:
- Catalog
A new catalog with the columns from each of the input catalogs with their respective suffixes added, and the rows merged on the specified columns.
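The difference between the two `suffix_method` options can be sketched in plain Python. The helper `apply_suffixes` below is hypothetical and only models the renaming rule as stated in the parameter description; LSDB's actual handling (for example of index or other special columns) may differ.

```python
def apply_suffixes(left_cols, right_cols, suffixes, suffix_method="all_columns"):
    """Return renamed (left, right) column lists (illustrative helper, not an LSDB API)."""
    left_suffix, right_suffix = suffixes
    if suffix_method == "all_columns":
        # Every column from both sides receives its side's suffix.
        to_rename = set(left_cols) | set(right_cols)
    elif suffix_method == "overlapping_columns":
        # Only names that appear on both sides receive a suffix.
        to_rename = set(left_cols) & set(right_cols)
    else:
        raise ValueError(f"unknown suffix_method: {suffix_method!r}")
    new_left = [c + left_suffix if c in to_rename else c for c in left_cols]
    new_right = [c + right_suffix if c in to_rename else c for c in right_cols]
    return new_left, new_right


left_cols = ["ra", "dec", "id"]
right_cols = ["ra", "dec", "id_right", "right_flag"]
suffixes = ("_left", "_right")

all_cols = apply_suffixes(left_cols, right_cols, suffixes, "all_columns")
overlap = apply_suffixes(left_cols, right_cols, suffixes, "overlapping_columns")
print(all_cols)
print(overlap)
```

Under this model, "all_columns" turns `id_right` into `id_right_right`, while "overlapping_columns" leaves non-colliding names such as `id` and `right_flag` untouched.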
Examples
Join two catalogs on a shared key within the same sky partitions:
>>> import lsdb
>>> from lsdb.nested.datasets import generate_data
>>> nf = generate_data(1000, 5, seed=0, ra_range=(0.0, 300.0), dec_range=(-50.0, 50.0))
>>> base = lsdb.from_dataframe(nf.compute()[["ra", "dec", "id"]])
>>> left = base.rename({"ra": "ra_left", "dec": "dec_left"})
>>> right = base.rename({"ra": "ra_right", "dec": "dec_right", "id": "id_right"}).map_partitions(
...     lambda df: df.assign(right_flag=True)
... )
>>> joined = left.join(right, left_on="id", right_on="id_right", suffix_method="overlapping_columns")
>>> joined.head()[
...     ["ra_left", "dec_left", "id", "right_flag"]
... ]
                      ra_left   dec_left    id  right_flag
_healpix_29
118362963675428450  52.696686  39.675892  8154        True
98504457942331510   89.913567  46.147079  3437        True
70433374600953220   40.528952  35.350965  8214        True
154968715224527848   17.57041    29.8936  9853        True
67780378363846894    45.08384   31.95611  8297        True