Convert Nested Lightcurves to Single Observations with explode


In this tutorial, we will use the explode method from nested-pandas to convert nested lightcurves into a table of single observations.

While nested-pandas provides a convenient and efficient way to work with and analyze nested data such as lightcurves, the exploded format (one row per observation) may be more familiar, and it fits traditional time-series analysis workflows and packages that expect flat tables.

As with other pandas and nested-pandas operations, explode can be applied to an lsdb catalog with the map_partitions method, which applies a function to each partition of the catalog in parallel.
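As a rough illustration of the target format, the sketch below uses plain pandas list columns as a stand-in for a nested column. The toy values and the names nested_like and flat are made up for illustration. nested-pandas uses its own nested column type rather than Python lists, so this only mirrors the shape of the result, not the library's implementation.

```python
import pandas as pd

# Toy stand-in for a nested catalog: one row per object, with list-valued
# columns holding that object's observations (all values are illustrative).
nested_like = pd.DataFrame(
    {
        "objectid": [101, 102],
        "filterid": [2, 2],
        "hmjd": [[58761.42, 58773.33, 58775.35], [58740.50]],
        "mag": [[20.93, 20.28, 20.71], [20.13]],
    }
)

# Exploding the list columns together yields one row per observation and
# repeats the object-level columns (objectid, filterid) on every row.
flat = nested_like.explode(["hmjd", "mag"], ignore_index=True)
print(flat)
```

The first object contributes three rows and the second contributes one, so the flat table has as many rows as there are observations.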

First, we’ll load the ZTF DR23 lightcurve catalog.

[1]:

import lsdb

[2]:

ztf = lsdb.open_catalog("s3://ipac-irsa-ztf/contributed/dr23/lc/hats")
ztf

[2]:

lsdb Catalog ZTF_DR23_Lightcurves:

	objectid	filterid	objra	objdec	lightcurve
npartitions=9933
Order: 4, Pixel: 0	int64[pyarrow]	int8[pyarrow]	float[pyarrow]	float[pyarrow]	nested<hmjd: [double], mag: [float], magerr: [...
Order: 4, Pixel: 1	...	...	...	...	...
...	...	...	...	...	...
Order: 5, Pixel: 12286	...	...	...	...	...
Order: 5, Pixel: 12287	...	...	...	...	...

5 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

[3]:

ztf.head()

[3]:

	objectid	filterid	objra	objdec	lightcurve
_healpix_29
184342612410390	1447212400010477	2	44.042023	1.264162	[{hmjd: 58761.42485, mag: 20.727491, magerr: 0.2088, clrcoeff: 0.124223, catflags: 0}; +0 rows]
189475338943485	1447212400010480	2	44.006325	1.263639	[{hmjd: 58761.42485, mag: 20.928879, magerr: 0.226445, clrcoeff: 0.124223, catflags: 0}; +13 rows]
171958220169309	1447212400010486	2	44.685963	1.265697	[{hmjd: 58740.4977, mag: 20.131371, magerr: 0.154903, clrcoeff: 0.126351, catflags: 0}; +0 rows]
194147107266604	1447212400010488	2	44.277557	1.263729	[{hmjd: 58356.39099, mag: 18.124929, magerr: 0.033738, clrcoeff: 0.124563, catflags: 0}; +61 rows]
170852867104507	1447212400010489	2	44.494118	1.264647	[{hmjd: 58787.27224, mag: 20.190992, magerr: 0.160638, clrcoeff: 0.13095, catflags: 0}; +2 rows]

5 rows × 5 columns

We can see that the lightcurve column contains nested data, which can be accessed through the features of nested-pandas. It is also possible to convert this into a flat structure in which each row corresponds to a single observation, as in traditional lightcurve tables. To do this, we define a function that applies the nested-pandas explode method to the lightcurve column of a single partition, and then use map_partitions to apply that function to every partition of the lsdb catalog.

[4]:

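def explode_lcs(partition):
    # Flatten the nested "lightcurve" column: one output row per observation,
    # with the object-level columns kept alongside each observation.
    return partition.explode("lightcurve")


# Apply the function to every partition of the catalog.
exploded_cat = ztf.map_partitions(explode_lcs)
exploded_cat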

[4]:

lsdb Catalog ZTF_DR23_Lightcurves:

	objectid	filterid	objra	objdec	hmjd	mag	magerr	clrcoeff	catflags
npartitions=9933
Order: 4, Pixel: 0	int64[pyarrow]	int8[pyarrow]	float[pyarrow]	float[pyarrow]	double[pyarrow]	float[pyarrow]	float[pyarrow]	float[pyarrow]	int32[pyarrow]
Order: 4, Pixel: 1	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...
Order: 5, Pixel: 12286	...	...	...	...	...	...	...	...	...
Order: 5, Pixel: 12287	...	...	...	...	...	...	...	...	...

9 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
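Exploding works partition by partition because each lightcurve lives entirely within one row, so flattening a row never needs data from another partition. The sketch below checks this on toy data, again using plain pandas list columns as a stand-in for the nested column. The names explode_obs, part_a, and part_b are hypothetical, and the plain Python loop only emulates what map_partitions does in parallel.

```python
import pandas as pd


def explode_obs(partition: pd.DataFrame) -> pd.DataFrame:
    """Flatten list-valued observation columns (toy stand-in for explode_lcs)."""
    return partition.explode(["hmjd", "mag"])


# Two hypothetical "partitions", each holding whole objects.
part_a = pd.DataFrame(
    {"objectid": [1], "hmjd": [[58761.4, 58773.3]], "mag": [[20.9, 20.3]]}
)
part_b = pd.DataFrame(
    {"objectid": [2], "hmjd": [[58740.5]], "mag": [[20.1]]}, index=[1]
)

# Explode each partition on its own, then stitch the results together.
per_partition = pd.concat([explode_obs(p) for p in (part_a, part_b)])

# Explode the combined table in one go for comparison.
whole_table = explode_obs(pd.concat([part_a, part_b]))

# Compare the two results column by column.
same = per_partition.reset_index(drop=True).to_dict("list") == whole_table.reset_index(
    drop=True
).to_dict("list")
print(same)
```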

[5]:

exploded_cat.head(10)

[5]:

	objectid	filterid	objra	objdec	hmjd	mag	magerr	clrcoeff	catflags
_healpix_29
184342612410390	1447212400010477	2	44.042023	1.264162	58761.42485	20.727491	0.2088	0.124223	0
189475338943485	1447212400010480	2	44.006325	1.263639	58761.42485	20.928879	0.226445	0.124223	0
189475338943485	1447212400010480	2	44.006325	1.263639	58761.42531	20.858665	0.220293	0.126322	0
189475338943485	1447212400010480	2	44.006325	1.263639	58773.33318	20.28091	0.169244	0.101843	0
189475338943485	1447212400010480	2	44.006325	1.263639	58775.35076	20.71442	0.207655	0.107146	0
189475338943485	1447212400010480	2	44.006325	1.263639	58777.43861	20.532925	0.191753	0.120748	0
189475338943485	1447212400010480	2	44.006325	1.263639	58812.29228	20.832573	0.218007	0.113882	0
189475338943485	1447212400010480	2	44.006325	1.263639	58861.17844	20.225426	0.163945	0.125741	0
189475338943485	1447212400010480	2	44.006325	1.263639	59061.45527	20.921619	0.225809	0.12306	0
189475338943485	1447212400010480	2	44.006325	1.263639	59090.5126	20.474043	0.186864	0.115983	32768

10 rows × 9 columns

The resulting catalog has a flat structure: each row represents a single observation from the original nested lightcurves, and the associated object-level columns (objectid, filterid, objra, objdec) are repeated on every row.
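Once the data is flat, ordinary pandas operations apply directly. The following is a minimal sketch of one such workflow, using a few rows copied from the exploded output above (a subset of columns). Treating catflags == 0 as "unflagged" is an assumption of this sketch, and the names obs, clean, and summary are illustrative only.

```python
import pandas as pd

# A few observations copied from the exploded output (subset of columns).
obs = pd.DataFrame(
    {
        "objectid": [
            1447212400010477,
            1447212400010480,
            1447212400010480,
            1447212400010480,
        ],
        "hmjd": [58761.42485, 58761.42485, 58761.42531, 59090.5126],
        "mag": [20.727491, 20.928879, 20.858665, 20.474043],
        "catflags": [0, 0, 0, 32768],
    }
)

# Assumption: keep only observations with no catalog flags set.
clean = obs[obs["catflags"] == 0]

# Summarise each object's remaining observations with a flat-table groupby.
summary = clean.groupby("objectid")["mag"].agg(["count", "mean"])
print(summary)
```

The same filter and groupby could be wrapped in a function and passed to map_partitions, in the same way as explode_lcs, to run over the full catalog.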
