Convert Nested Lightcurves to Single Observations with explode

In this tutorial, we will:

  • use the explode method from nested_pandas to convert nested lightcurves into a table of single observations

While nested-pandas provides a convenient and efficient way to analyze nested data such as lightcurves, the exploded format, with one row per observation, may be more familiar and fits traditional time-series analysis workflows and packages that expect flat tables.

As with other pandas and nested-pandas operations, explode can be applied to lsdb catalogs using the map_partitions method, which applies a function to each partition of the catalog in parallel.

First, we’ll load the ZTF DR23 lightcurve catalog.

[1]:
import lsdb
[2]:
ztf = lsdb.open_catalog("s3://ipac-irsa-ztf/contributed/dr23/lc/hats")
ztf
[2]:
lsdb Catalog ZTF_DR23_Lightcurves:
objectid filterid objra objdec lightcurve
npartitions=9933
Order: 4, Pixel: 0 int64[pyarrow] int8[pyarrow] float[pyarrow] float[pyarrow] nested<hmjd: [double], mag: [float], magerr: [...
Order: 4, Pixel: 1 ... ... ... ... ...
... ... ... ... ... ...
Order: 5, Pixel: 12286 ... ... ... ... ...
Order: 5, Pixel: 12287 ... ... ... ... ...
5 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
[3]:
ztf.head()
[3]:
  objectid filterid objra objdec lightcurve
184342612410390 1447212400010477 2 44.042023 1.264162
hmjd mag magerr clrcoeff catflags
58761.42485 20.727491 0.2088 0.124223 0
+0 rows ... ... ... ...
189475338943485 1447212400010480 2 44.006325 1.263639
hmjd mag magerr clrcoeff catflags
58761.42485 20.928879 0.226445 0.124223 0
+13 rows ... ... ... ...
171958220169309 1447212400010486 2 44.685963 1.265697
hmjd mag magerr clrcoeff catflags
58740.4977 20.131371 0.154903 0.126351 0
+0 rows ... ... ... ...
194147107266604 1447212400010488 2 44.277557 1.263729
hmjd mag magerr clrcoeff catflags
58356.39099 18.124929 0.033738 0.124563 0
+61 rows ... ... ... ...
170852867104507 1447212400010489 2 44.494118 1.264647
hmjd mag magerr clrcoeff catflags
58787.27224 20.190992 0.160638 0.13095 0
+2 rows ... ... ... ...
5 rows x 5 columns

We can see that the lightcurve column contains nested data, which can be accessed through the features of nested-pandas. It can also be converted into a flat structure in which each row corresponds to a single observation, as in traditional lightcurve tables. To do this, we define a function that applies the nested-pandas explode method to the lightcurve column of a single partition, then use map_partitions to apply that function to every partition of the lsdb catalog.

[4]:
def explode_lcs(partition):
    # Flatten the nested "lightcurve" column: one output row per observation
    return partition.explode("lightcurve")


exploded_cat = ztf.map_partitions(explode_lcs)
exploded_cat
[4]:
lsdb Catalog ZTF_DR23_Lightcurves:
objectid filterid objra objdec hmjd mag magerr clrcoeff catflags
npartitions=9933
Order: 4, Pixel: 0 int64[pyarrow] int8[pyarrow] float[pyarrow] float[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int32[pyarrow]
Order: 4, Pixel: 1 ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ...
Order: 5, Pixel: 12286 ... ... ... ... ... ... ... ... ...
Order: 5, Pixel: 12287 ... ... ... ... ... ... ... ... ...
9 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema
[5]:
exploded_cat.head(10)
[5]:
objectid filterid objra objdec hmjd mag magerr clrcoeff catflags
_healpix_29
184342612410390 1447212400010477 2 44.042023 1.264162 58761.42485 20.727491 0.2088 0.124223 0
189475338943485 1447212400010480 2 44.006325 1.263639 58761.42485 20.928879 0.226445 0.124223 0
189475338943485 1447212400010480 2 44.006325 1.263639 58761.42531 20.858665 0.220293 0.126322 0
189475338943485 1447212400010480 2 44.006325 1.263639 58773.33318 20.28091 0.169244 0.101843 0
189475338943485 1447212400010480 2 44.006325 1.263639 58775.35076 20.71442 0.207655 0.107146 0
189475338943485 1447212400010480 2 44.006325 1.263639 58777.43861 20.532925 0.191753 0.120748 0
189475338943485 1447212400010480 2 44.006325 1.263639 58812.29228 20.832573 0.218007 0.113882 0
189475338943485 1447212400010480 2 44.006325 1.263639 58861.17844 20.225426 0.163945 0.125741 0
189475338943485 1447212400010480 2 44.006325 1.263639 59061.45527 20.921619 0.225809 0.12306 0
189475338943485 1447212400010480 2 44.006325 1.263639 59090.5126 20.474043 0.186864 0.115983 32768

10 rows × 9 columns

We can see that the resulting catalog has a flat structure: each row represents a single observation from the original nested lightcurves, and the associated object-level columns (objectid, filterid, objra, objdec) are repeated on every row.
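
To see the same reshaping on a small scale without reading the catalog, the sketch below uses plain pandas, with list-valued columns standing in for the nested lightcurve column. The toy values and the names objects and flat are made up for illustration. nested-pandas works on a nested column rather than on separate list columns, so this only mirrors the shape of the result and is not how the catalog itself is stored.

```python
import pandas as pd

# Toy stand-in for one catalog partition: one row per object, with the
# per-observation values stored as equal-length lists. (Hypothetical data;
# the real catalog keeps these in a nested "lightcurve" column.)
objects = pd.DataFrame(
    {
        "objectid": [101, 102],
        "filterid": [2, 2],
        "hmjd": [[58761.4], [58761.4, 58773.3, 58775.4]],
        "mag": [[20.7], [20.9, 20.3, 20.7]],
    }
)

# Plain pandas explode over the list columns: each observation becomes its
# own row, and the object-level columns are repeated alongside it.
flat = objects.explode(["hmjd", "mag"], ignore_index=True)

print(flat)
```

The number of rows in flat equals the total number of observations across all objects, which is why the exploded catalog is much longer than the original one-row-per-object table.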