to_association

to_association(catalog: HealpixDataset, *, base_catalog_path: str | Path | UPath, catalog_name: str | None = None, primary_catalog_dir: str | Path | UPath | None = None, primary_column_association: str | None = None, primary_id_column: str | None = None, join_catalog_dir: str | Path | UPath | None = None, join_column_association: str | None = None, join_to_primary_id_column: str | None = None, join_id_column: str | None = None, separation_column: str | None = None, overwrite: bool = False, addl_hats_properties: dict | None = None, **kwargs)

Writes a crossmatching product to disk in HATS association table format. The output catalog consists of per-partition Parquet files and their associated metadata.

The column name arguments should reflect the column names on the corresponding primary and join OBJECT catalogs, so that the association table can be used to perform equijoins on the two sides and recreate the crossmatch.

Parameters:
catalog : HealpixDataset

A catalog to export

base_catalog_path : path-like

Location where the catalog is saved

catalog_name : str or None, default None

The name of the output catalog

primary_catalog_dir : path-like or None, default None

The path to the primary catalog

primary_column_association : str or None, default None

The column in the association catalog that matches the primary (left) side of the join

primary_id_column : str or None, default None

The id column in the primary catalog

join_catalog_dir : path-like or None, default None

The path to the join catalog

join_column_association : str or None, default None

The column in the association catalog that matches the joining (right) side of the join

join_to_primary_id_column : str or None, default None

The column in the join catalog that holds the primary catalog's id (e.g. a foreign key from a detection table to an object table)

join_id_column : str or None, default None

The id column in the join catalog

separation_column : str or None, default None

The name of the crossmatch separation column

overwrite : bool, default False

If True, an existing catalog at the output path is overwritten

addl_hats_properties : dict or None, default None

Additional key-value pairs to write to the catalog's HATS properties file

**kwargs

Arguments to pass to the Parquet write operations

Notes

To configure the appropriate column names, consider two tables that do not share an identifier space (e.g. two surveys), and how you could join them with an association table:

TABLE GAIA_SOURCE {
    DESIGNATION <primary key>
}

TABLE SDSS {
    SDSS_ID <primary key>
}

A SQL query to join them through an association table would look like:

SELECT g.DESIGNATION as gaia_id, s.SDSS_ID as sdss_id
FROM GAIA_SOURCE g
JOIN association_table a
    ON a.primary_id_column = g.DESIGNATION
JOIN SDSS s
    ON a.join_id_column = s.SDSS_ID
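To make the two-hop join concrete, here is a minimal plain-Python sketch of the same query, assuming toy rows for all three tables. The IDs (`"G1"`, `101`, and so on) and the helper name `join_through_association` are made up for illustration; this mimics the SQL semantics only and is not how the association catalog is actually read or joined by the library.

```python
# Toy rows standing in for the three tables; every ID here is made up.
gaia_source = [{"DESIGNATION": "G1"}, {"DESIGNATION": "G2"}]
sdss = [{"SDSS_ID": 101}, {"SDSS_ID": 102}, {"SDSS_ID": 103}]

# Hypothetical association rows, named after the columns in the SQL above:
# each row pairs one Gaia DESIGNATION with one SDSS_ID.
association_table = [
    {"primary_id_column": "G1", "join_id_column": 102},
    {"primary_id_column": "G2", "join_id_column": 101},
]


def join_through_association(primary, association, join):
    """Inner-join primary -> association -> join, like the SQL query above."""
    # Index both object tables by their primary keys (hash-join style);
    # this assumes the keys are unique, as primary keys are.
    primary_by_id = {row["DESIGNATION"]: row for row in primary}
    join_by_id = {row["SDSS_ID"]: row for row in join}
    result = []
    for a in association:
        g = primary_by_id.get(a["primary_id_column"])
        s = join_by_id.get(a["join_id_column"])
        # Inner-join semantics: drop association rows missing either side.
        if g is not None and s is not None:
            result.append({"gaia_id": g["DESIGNATION"], "sdss_id": s["SDSS_ID"]})
    return result


matches = join_through_association(gaia_source, association_table, sdss)
print(matches)
```

Note that SDSS row `103` never appears in the output: with no association row pointing at it, it has no counterpart in the other survey.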

Consider instead an object table, joining to a detection table:

TABLE OBJECT {
    ID <primary key>
}

TABLE DETECTION {
    DETECTION_ID <primary key>
    OBJECT_ID <foreign key>
}

And a SQL query to join them would look like:

SELECT o.ID as object_id, d.DETECTION_ID as detection_id
FROM OBJECT o
JOIN DETECTION d
    ON o.ID = d.OBJECT_ID
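For comparison, a minimal plain-Python sketch of this single join, assuming toy `OBJECT` and `DETECTION` rows with made-up IDs (the helper `join_objects_to_detections` is hypothetical). Because `DETECTION` already carries the object's id as a foreign key, no separate association rows are needed, and one object can match several detections.

```python
# Toy rows; all IDs are made up for illustration.
objects = [{"ID": 1}, {"ID": 2}]
detections = [
    {"DETECTION_ID": 10, "OBJECT_ID": 1},
    {"DETECTION_ID": 11, "OBJECT_ID": 1},
    {"DETECTION_ID": 12, "OBJECT_ID": 2},
]


def join_objects_to_detections(objects, detections):
    """Equijoin OBJECT.ID = DETECTION.OBJECT_ID, like the SQL query above."""
    object_by_id = {o["ID"]: o for o in objects}
    return [
        {"object_id": object_by_id[d["OBJECT_ID"]]["ID"], "detection_id": d["DETECTION_ID"]}
        for d in detections
        # Inner-join semantics: skip detections whose object is missing.
        if d["OBJECT_ID"] in object_by_id
    ]


pairs = join_objects_to_detections(objects, detections)
print(pairs)
```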

This distinction matters: there are three different column names, but only two meaningful identifiers, because `ID` and `OBJECT_ID` hold the same object identifier. For this example, the arguments for this method would be:

primary_id_column = "ID",
join_to_primary_id_column = "OBJECT_ID",
join_id_column = "DETECTION_ID",
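As a rough illustration of how those three arguments relate, the sketch below uses them to drive a generic equijoin and emit (object id, detection id) pairs, the two meaningful identifiers. The helper `association_pairs` and the toy rows are hypothetical; this only shows the column roles and is not the method's implementation.

```python
def association_pairs(primary_rows, join_rows, *, primary_id_column,
                      join_to_primary_id_column, join_id_column):
    """Return (primary id, join id) pairs for rows that satisfy the equijoin."""
    # primary_id_column and join_to_primary_id_column hold the same identifier,
    # so together they form the equijoin key.
    primary_ids = {row[primary_id_column] for row in primary_rows}
    return [
        (row[join_to_primary_id_column], row[join_id_column])
        for row in join_rows
        if row[join_to_primary_id_column] in primary_ids
    ]


# The argument values from the example above.
column_args = dict(
    primary_id_column="ID",
    join_to_primary_id_column="OBJECT_ID",
    join_id_column="DETECTION_ID",
)

# Toy rows; all IDs are made up for illustration.
objects = [{"ID": 1}, {"ID": 2}]
detections = [
    {"DETECTION_ID": 10, "OBJECT_ID": 1},
    {"DETECTION_ID": 11, "OBJECT_ID": 2},
    {"DETECTION_ID": 12, "OBJECT_ID": 3},  # no matching object, so it is dropped
]

assoc_pairs = association_pairs(objects, detections, **column_args)
print(assoc_pairs)  # [(1, 10), (2, 11)]
```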