Cross-match ZTF BTS and NGC

Cross-match ZTF BTS and NGC#

In this notebook, we demonstrate how to cross-match Zwicky Transient Facility (ZTF) Bright Transient Survey (BTS) and New General Catalogue (NGC) using LSDB.

We are using the original survey datasets, as presented by the original survey providers, instead of catalogs that have already been converted to HATS format.

[1]:

# Install astroquery, comment this line if you already have it
%pip install --quiet astroquery

Note: you may need to restart the kernel to use updated packages.

[2]:

import lsdb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from astropy.coordinates import SkyCoord
from astropy.table import Table
from astroquery.vizier import Vizier
from dask.distributed import Client

1. Download data#

1.1. Download ZTF BTS and convert coordinates to degrees#

We download the ZTF BTS data via HTTP, but must first convert the coordinates into degrees before loading as LSDB catalogs.

Of course ZTF looks much deeper than NGC galaxies from 19th century, so we filter ZTF transients by redshift.

[3]:

%%time

ztf_bts = pd.read_csv(
    "http://sites.astro.caltech.edu/ztf/bts/explorer.php?format=csv",
    na_values="-",
)
ztf_bts = ztf_bts.query("redshift < 0.01")
coord = SkyCoord(ztf_bts["RA"], ztf_bts["Dec"], unit=("hourangle", "deg"))
ztf_bts["ra_deg"], ztf_bts["dec_deg"] = coord.ra.deg, coord.dec.deg
ztf_bts

CPU times: user 61.6 ms, sys: 18.5 ms, total: 80.1 ms
Wall time: 3.42 s

[3]:

	ZTFID	IAUID	RA	Dec	peakt	peakfilt	peakmag	peakabs	duration	rise	fade	type	redshift	b	A_V	ra_deg	dec_deg
228	ZTF18acbwaxk	SN2018hna	12:26:12.08	+58:18:50.8	503.03	r	13.9583	-16.13	61.205	48.696	12.509	SN II	0.00240	58.502394	0.031	186.550333	58.314111
232	ZTF18acbvgqw	SN2018htt	03:06:02.90	-15:36:41.7	441.75	g	13.9113	-19.07	25.333	10.316	15.017	SN Ia	0.00867	-56.742964	0.101	46.512083	-15.611583
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
9447	ZTF25aaugeio	SN2025mvn	13:13:27.85	+36:35:11.6	2850.78	r	15.3985	-15.11	>25.08	>15.06	>10.02	SN II	0.00291	79.453892	0.032	198.366042	36.586556
9477	ZTF25aauvber	SN2025ngs	15:35:16.68	+30:51:55.7	2847.77	r	15.5015	-16.73	>19.561	6.481	>13.08	SN II	0.00630	54.177709	0.078	233.819500	30.865472

108 rows × 17 columns

1.2. Download NGC with `astroquery`#

Please install astroquery first with pip install astroquery or conda install -c conda-forge astroquery.

[4]:

%%time

vizier = Vizier(row_limit=50_000)
tables = vizier.get_catalogs("VII/118/ngc2000")
ngc = tables[0].to_pandas()
coord = SkyCoord(ngc["RAB2000"], ngc["DEB2000"], unit=("hourangle", "deg"))
ngc["ra_deg"], ngc["dec_deg"] = coord.ra.deg, coord.dec.deg
ngc

CPU times: user 685 ms, sys: 15.9 ms, total: 701 ms
Wall time: 702 ms

[4]:

	Name	Type	RAB2000	DEB2000	Source	Const	l_size	size	mag	n_mag	Desc	ra_deg	dec_deg
0	I5370	Gx	00 00.1	+32 45	m	And		0.7	15.0	p	pB, S, R, stell N	0.025	32.750000
1	I5371		00 00.2	+32 49	d	And		NaN	NaN		F, vS, *15 att	0.050	32.816667
...	...	...	...	...	...	...	...	...	...	...	...	...	...
13224	7807	Gx	23 59.6	-18 51	c	Cet		1.0	NaN		eF, pS, iF	359.900	-18.850000
13225	I5369		23 59.8	+32 41	d	And		NaN	NaN		F, S, R, N	359.950	32.683333

13226 rows × 13 columns

2. Plan cross-match job#

LSDB is built upon Dask and can be used with Dask distributed cluster. In this cell we just plan computations and do not actually run them.

[5]:

%%time

matched_catalog = lsdb.crossmatch(
    ztf_bts,
    ngc,
    radius_arcsec=1200,
    ra_column="ra_deg",
    dec_column="dec_deg",
    suffixes=("_ztf", "_ngc"),
)
matched_catalog

CPU times: user 3.64 s, sys: 287 ms, total: 3.92 s
Wall time: 3.85 s

[5]:

lsdb Catalog _ztf_x__ngc:

	ZTFID_ztf	IAUID_ztf	RA_ztf	Dec_ztf	peakt_ztf	peakfilt_ztf	peakmag_ztf	peakabs_ztf	duration_ztf	rise_ztf	fade_ztf	type_ztf	redshift_ztf	b_ztf	A_V_ztf	ra_deg_ztf	dec_deg_ztf	Name_ngc	Type_ngc	RAB2000_ngc	DEB2000_ngc	Source_ngc	Const_ngc	l_size_ngc	size_ngc	mag_ngc	n_mag_ngc	Desc_ngc	ra_deg_ngc	dec_deg_ngc	_dist_arcsec
npartitions=10
Order: 0, Pixel: 0	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	double[pyarrow]	string[pyarrow]	double[pyarrow]	double[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	string[pyarrow]	float[pyarrow]	float[pyarrow]	string[pyarrow]	string[pyarrow]	double[pyarrow]	double[pyarrow]	double[pyarrow]
Order: 0, Pixel: 1	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 7, Pixel: 161010	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Order: 7, Pixel: 179663	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

31 out of 31 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

3. Run LSDB pipeline#

We create a dask client to run the crossmatch in parallel.

Additional Help

For additional information on dask client creation, please refer to the official Dask documentation and our Dask cluster configuration page for LSDB-specific tips. Note that dask also provides its own best practices, which may also be useful to consult.

[6]:

%%time

# Create default local cluster
with Client():
    matched_df = matched_catalog.compute()

# Let's output transient name, NGC name and angular distance between them
matched_df = matched_df[["IAUID_ztf", "Name_ngc", "_dist_arcsec", "RA_ztf", "Dec_ztf"]].sort_values(
    by=["_dist_arcsec"]
)
matched_df

CPU times: user 342 ms, sys: 93.9 ms, total: 436 ms
Wall time: 3.83 s

[6]:

	IAUID_ztf	Name_ngc	_dist_arcsec	RA_ztf	Dec_ztf
_healpix_29
1957242016251976064	SN2022xxf	3705	2.985601	11:30:05.93	+09:16:57.2
1827616204711960889	SN2024jlf	5690	6.309787	14:37:42.32	+02:17:04.1
...	...	...	...	...	...
190562289935449883	AT2020aabh	206	1081.99822	00:39:30.06	+40:31:00.8
184639843781841134	AT2022qpg	224	1193.179125	00:44:20.78	+41:23:10.8

84 rows × 5 columns

We may have some false matches here, because NGC is too shallow for this task. However, if we sort the table by the cross-match distance, we can see the first one is a supernova (SN2022xxf) in the nearby galaxy NGC 3705.

4. Make some plot#

This part is not related to LSDB and adopted from PanSTARRS image this tutorial.

Now let’s download host galaxy image from the PanSTARRS survey and plot it out (with SN location in the middle and marked with a “+”)

[7]:

def getimages(ra, dec, size=240, filters="grizy"):
    """Query ps1filenames.py service to get a list of images

    ra, dec = position in degrees
    size = image size in pixels (0.25 arcsec/pixel)
    filters = string with filters to include
    Returns a table with the results
    """

    service = "https://ps1images.stsci.edu/cgi-bin/ps1filenames.py"
    url = ("{service}?ra={ra}&dec={dec}&size={size}&format=fits" "&filters={filters}").format(**locals())
    table = Table.read(url, format="ascii")
    return table


def get_ps1_image(url, size=240):
    """
    size: pixel number for 0.25 arcsec/pixel
    """
    from PIL import Image
    import requests
    from io import BytesIO

    try:
        r = requests.get(url)
        im = Image.open(BytesIO(r.content))
    except:
        print("Can't get ps1 image")
        im = None
    return im

[8]:

c = SkyCoord(
    matched_df["RA_ztf"].to_numpy()[0], matched_df["Dec_ztf"].to_numpy()[0], unit=("hourangle", "deg")
)
ra = c.ra.degree
dec = c.dec.degree
oid = matched_df["IAUID_ztf"].to_numpy()[0]
table = getimages(ra, dec, size=1200, filters="grizy")
url = (
    "https://ps1images.stsci.edu/cgi-bin/fitscut.cgi?"
    "ra={}&dec={}&size=1200&format=jpg&red={}&green={}&blue={}"
).format(ra, dec, table["filename"][0], table["filename"][1], table["filename"][2])
im = get_ps1_image(url)
fig, ax = plt.subplots(figsize=(7, 3))
if im is not None:
    ax.imshow(im)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.scatter(np.average(plt.xlim()), np.average(plt.ylim()), marker="+", color="yellow")
    ax.set_title(oid)
    plt.show()

../../_images/tutorials_pre_executed_ztf_bts-ngc_15_0.png

About#

Authors: Konstantin Malanchev, Mi Dai

Last updated on: Oct 27, 2025

If you use lsdb for published research, please cite following instructions.