to_association

Writes a crossmatching product to disk, in HATS association table format. The output catalog comprises partition parquet files and respective metadata.

The column name arguments should reflect the column names on the corresponding primary and join OBJECT catalogs, so that the association table can be used to perform equijoins on the two sides and recreate the crossmatch.

Parameters:

catalog (HealpixDataset): A catalog to export
base_catalog_path (path-like): Location where the catalog is saved
catalog_name (str or None, default None): The name of the output catalog
primary_catalog_dir (path-like or None, default None): The path to the primary catalog
primary_column_association (str or None, default None): The column in the association catalog that matches the primary (left) side of the join
primary_id_column (str or None, default None): The id column in the primary catalog
join_catalog_dir (path-like or None, default None): The path to the join catalog
join_column_association (str or None, default None): The column in the association catalog that matches the joining (right) side of the join
join_id_column (str or None, default None): The id column in the join catalog
separation_column (str or None, default None): The name of the crossmatch separation column
overwrite (bool, default False): If True, an existing catalog is overwritten
**kwargs: Arguments to pass to the parquet write operations

Notes

To configure the appropriate column names, consider two tables that do not share an identifier space (e.g. two surveys), and how you would join them through an association table:

TABLE GAIA_SOURCE {
    DESIGNATION <primary key>
}

TABLE SDSS {
    SDSS_ID <primary key>
}

And a SQL query to join them with an association table would look like:

SELECT g.DESIGNATION as gaia_id, s.SDSS_ID as sdss_id
FROM GAIA_SOURCE g
JOIN association_table a
    ON a.primary_id_column = g.DESIGNATION
JOIN SDSS s
    ON a.join_id_column = s.SDSS_ID
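
The query above can be tried out on toy data. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The rows (`gaia_a`, `101`, and so on) are invented for illustration, and the association table's column names simply mirror the placeholders used in the query; none of this reflects real survey data or the on-disk HATS layout.

```python
import sqlite3

# In-memory database with hypothetical rows, for illustration only.
gaia_sdss_conn = sqlite3.connect(":memory:")
gaia_sdss_conn.executescript("""
    CREATE TABLE GAIA_SOURCE (DESIGNATION TEXT PRIMARY KEY);
    CREATE TABLE SDSS (SDSS_ID INTEGER PRIMARY KEY);
    -- The association table holds one row per matched pair.
    CREATE TABLE association_table (
        primary_id_column TEXT,
        join_id_column INTEGER
    );
    INSERT INTO GAIA_SOURCE VALUES ('gaia_a'), ('gaia_b'), ('gaia_c');
    INSERT INTO SDSS VALUES (101), (102), (103);
    -- 'gaia_c' and SDSS 103 have no counterpart, so they get no row here.
    INSERT INTO association_table VALUES ('gaia_a', 102), ('gaia_b', 101);
""")

# Same join as in the text, with an ORDER BY added for a stable result.
gaia_sdss_pairs = gaia_sdss_conn.execute("""
    SELECT g.DESIGNATION AS gaia_id, s.SDSS_ID AS sdss_id
    FROM GAIA_SOURCE g
    JOIN association_table a
        ON a.primary_id_column = g.DESIGNATION
    JOIN SDSS s
        ON a.join_id_column = s.SDSS_ID
    ORDER BY gaia_id
""").fetchall()
gaia_sdss_conn.close()

print(gaia_sdss_pairs)
```

Only the pairs listed in the association table come back; unmatched rows on either side are dropped by the inner joins.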

Consider instead an object table, joining to a detection table:

TABLE OBJECT {
    ID <primary key>
}

TABLE DETECTION {
    DETECTION_ID <primary key>
    OBJECT_ID <foreign key>
}

And a SQL query to join them would look like:

SELECT o.ID as object_id, d.DETECTION_ID as detection_id
FROM OBJECT o
JOIN DETECTION d
    ON o.ID = d.OBJECT_ID
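
As a rough illustration, the object/detection join can be run the same way with Python's built-in sqlite3 module. The table contents below are hypothetical; the point is only that the foreign key on the detection table makes a separate association table unnecessary for this query.

```python
import sqlite3

# In-memory database with hypothetical rows, for illustration only.
object_detection_conn = sqlite3.connect(":memory:")
object_detection_conn.executescript("""
    CREATE TABLE OBJECT (ID INTEGER PRIMARY KEY);
    CREATE TABLE DETECTION (
        DETECTION_ID INTEGER PRIMARY KEY,
        OBJECT_ID INTEGER REFERENCES OBJECT (ID)
    );
    INSERT INTO OBJECT VALUES (1), (2);
    -- Object 1 has two detections, object 2 has one.
    INSERT INTO DETECTION VALUES (10, 1), (11, 1), (12, 2);
""")

# Same join as in the text, with an ORDER BY added for a stable result.
object_detection_pairs = object_detection_conn.execute("""
    SELECT o.ID AS object_id, d.DETECTION_ID AS detection_id
    FROM OBJECT o
    JOIN DETECTION d
        ON o.ID = d.OBJECT_ID
    ORDER BY object_id, detection_id
""").fetchall()
object_detection_conn.close()

print(object_detection_pairs)
```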

This distinction matters: there are three different column names, but only two meaningful identifiers. For this example, the arguments for this method would be as follows:

primary_id_column = "ID",
join_to_primary_id_column = "OBJECT_ID",
join_id_column = "DETECTION_ID",
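
The "three names, two identifiers" point can be checked on toy data. The sketch below uses plain Python dictionaries with invented rows: it confirms that `ID` and `OBJECT_ID` draw from the same identifier space, and shows that each detection row already yields an (object, detection) pair. It is an illustration of the idea only, not of how the library builds its association table.

```python
# Hypothetical rows, for illustration only.
objects = [{"ID": 1}, {"ID": 2}]
detections = [
    {"DETECTION_ID": 10, "OBJECT_ID": 1},
    {"DETECTION_ID": 11, "OBJECT_ID": 1},
    {"DETECTION_ID": 12, "OBJECT_ID": 2},
]

# "ID" and "OBJECT_ID" are two names for the same identifier:
# every foreign key value is also a primary key of an object.
object_ids = {row["ID"] for row in objects}
foreign_keys = {row["OBJECT_ID"] for row in detections}
shares_identifier_space = foreign_keys <= object_ids

# Each detection row pairs an object identifier with a detection identifier.
association_pairs = [
    (row["OBJECT_ID"], row["DETECTION_ID"]) for row in detections
]

print(shares_identifier_space)
print(association_pairs)
```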
