Convert Nested Lightcurves to Single Observations with explode
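In this tutorial, we will convert a catalog of nested lightcurves into a flat table with one row per observation, using explode.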
nested-pandas provides a convenient and efficient way to work with and analyze nested data such as lightcurves, but an exploded format, in which each row holds a single observation, may be more familiar for traditional time-series analysis workflows and for packages that expect flat tables.
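To make the row arithmetic concrete, here is a minimal, library-free sketch of what exploding does. It uses plain Python dictionaries as a stand-in for a nested table, so `explode_rows` is a hypothetical helper and not part of nested-pandas. The values are a truncated subset of the rows shown later in this tutorial.

```python
# Toy nested table: one row per object, with per-observation lists.
# Values are a truncated subset of the rows shown in this tutorial.
nested_rows = [
    {
        "objectid": 1447212400010477,
        "filterid": 2,
        "lightcurve": {"hmjd": [58761.42485], "mag": [20.727491]},
    },
    {
        "objectid": 1447212400010480,
        "filterid": 2,
        "lightcurve": {
            "hmjd": [58761.42485, 58761.42531, 58773.33318],
            "mag": [20.928879, 20.858665, 20.28091],
        },
    },
]


def explode_rows(rows, nested_key):
    """Return one flat dict per observation, repeating the object-level fields.

    Hypothetical helper for illustration only; not the nested-pandas API.
    """
    flat = []
    for row in rows:
        # Object-level fields are everything except the nested column.
        base = {k: v for k, v in row.items() if k != nested_key}
        nested = row[nested_key]
        # All nested sub-columns of one object have the same length.
        n_obs = len(next(iter(nested.values())))
        for i in range(n_obs):
            observation = {name: values[i] for name, values in nested.items()}
            flat.append({**base, **observation})
    return flat


flat_rows = explode_rows(nested_rows, "lightcurve")
for flat_row in flat_rows:
    print(flat_row)
```

Two objects with one and three observations become four flat rows, and each flat row carries its object's `objectid` and `filterid`.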
As with other pandas and nested-pandas operations, explode can be applied to lsdb catalogs through the map_partitions method, which applies a function to each partition of the catalog in parallel.
First, we’ll load the ZTF DR23 lightcurve catalog.
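The loading step might look like the sketch below. The catalog location is a placeholder (the real path or URL is not given here), `open_lightcurves` is a hypothetical wrapper, and the column list is taken from the preview that follows. It assumes a recent lsdb release in which `lsdb.open_catalog` accepts a `columns` argument.

```python
# Columns shown in this tutorial's catalog preview.
COLUMNS = ["objectid", "filterid", "objra", "objdec", "lightcurve"]


def open_lightcurves(catalog_path, columns=None):
    """Lazily open a catalog, loading only the requested columns.

    Hypothetical wrapper for illustration; `catalog_path` must point at the
    ZTF DR23 lightcurve catalog, whose location is not specified here.
    """
    # Imported inside the function so the sketch can be defined
    # even where lsdb is not installed.
    import lsdb

    return lsdb.open_catalog(catalog_path, columns=list(columns or COLUMNS))


# Placeholder path; replace it with the real catalog location.
# ztf_lc = open_lightcurves("path/to/ztf_dr23_lightcurves")
# ztf_lc          # displays the lazy schema without reading data
# ztf_lc.head(5)  # reads a few rows for a preview
```

Restricting the columns up front keeps the lazy schema small, which matches the "5 out of 13 available columns" note in the output.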
lsdb Catalog ZTF_DR23_Lightcurves:

| | objectid | filterid | objra | objdec | lightcurve |
| --- | --- | --- | --- | --- | --- |
| npartitions=9933 | | | | | |
| Order: 4, Pixel: 0 | int64[pyarrow] | int8[pyarrow] | float[pyarrow] | float[pyarrow] | nested<hmjd: [double], mag: [float], magerr: [... |
| Order: 4, Pixel: 1 | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... |
| Order: 5, Pixel: 12286 | ... | ... | ... | ... | ... |
| Order: 5, Pixel: 12287 | ... | ... | ... | ... | ... |

5 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

| | objectid | filterid | objra | objdec | lightcurve (first nested row shown) |
| --- | --- | --- | --- | --- | --- |
| 184342612410390 | 1447212400010477 | 2 | 44.042023 | 1.264162 | hmjd=58761.42485, mag=20.727491, magerr=0.2088, clrcoeff=0.124223, catflags=0 (+0 rows) |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | hmjd=58761.42485, mag=20.928879, magerr=0.226445, clrcoeff=0.124223, catflags=0 (+13 rows) |
| 171958220169309 | 1447212400010486 | 2 | 44.685963 | 1.265697 | hmjd=58740.4977, mag=20.131371, magerr=0.154903, clrcoeff=0.126351, catflags=0 (+0 rows) |
| 194147107266604 | 1447212400010488 | 2 | 44.277557 | 1.263729 | hmjd=58356.39099, mag=18.124929, magerr=0.033738, clrcoeff=0.124563, catflags=0 (+61 rows) |
| 170852867104507 | 1447212400010489 | 2 | 44.494118 | 1.264647 | hmjd=58787.27224, mag=20.190992, magerr=0.160638, clrcoeff=0.13095, catflags=0 (+2 rows) |

5 rows x 5 columns
We can see that the lightcurve column contains nested data, which can be accessed through the features of nested-pandas. It’s also possible to convert this into a flat structure where each row corresponds to a single observation, as in traditional lightcurve tables. To do this, we can use the explode method from nested-pandas, applying it to the lightcurve column of an lsdb catalog with map_partitions. We’ll need to define a function that applies explode to a single partition, and then map this function across all partitions.
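A minimal sketch of that pattern follows. The function names `explode_lightcurves` and `explode_catalog` are illustrative, and the code assumes that each partition is a nested-pandas frame whose `explode` method takes the nested column name as its first argument.

```python
def explode_lightcurves(partition):
    """Flatten one partition: one output row per observation in `lightcurve`.

    Assumes `partition` is a nested-pandas frame with a nested column
    named "lightcurve" and an `explode` method that accepts that name.
    """
    return partition.explode("lightcurve")


def explode_catalog(catalog):
    """Apply explode_lightcurves to every partition of an lsdb catalog.

    The result is still lazy; no data is read until it is computed
    or previewed.
    """
    return catalog.map_partitions(explode_lightcurves)


# Usage, assuming `ztf_lc` is the catalog loaded earlier:
# ztf_flat = explode_catalog(ztf_lc)
# ztf_flat           # displays the flat lazy schema
# ztf_flat.head(10)  # previews the first exploded rows
```

Keeping the per-partition logic in a named function rather than a lambda makes it easy to test on a single small frame before mapping it over the whole catalog.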
lsdb Catalog ZTF_DR23_Lightcurves:

| | objectid | filterid | objra | objdec | hmjd | mag | magerr | clrcoeff | catflags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| npartitions=9933 | | | | | | | | | |
| Order: 4, Pixel: 0 | int64[pyarrow] | int8[pyarrow] | float[pyarrow] | float[pyarrow] | double[pyarrow] | float[pyarrow] | float[pyarrow] | float[pyarrow] | int32[pyarrow] |
| Order: 4, Pixel: 1 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 5, Pixel: 12286 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 5, Pixel: 12287 | ... | ... | ... | ... | ... | ... | ... | ... | ... |

9 out of 13 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

| _healpix_29 | objectid | filterid | objra | objdec | hmjd | mag | magerr | clrcoeff | catflags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 184342612410390 | 1447212400010477 | 2 | 44.042023 | 1.264162 | 58761.42485 | 20.727491 | 0.2088 | 0.124223 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58761.42485 | 20.928879 | 0.226445 | 0.124223 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58761.42531 | 20.858665 | 0.220293 | 0.126322 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58773.33318 | 20.28091 | 0.169244 | 0.101843 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58775.35076 | 20.71442 | 0.207655 | 0.107146 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58777.43861 | 20.532925 | 0.191753 | 0.120748 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58812.29228 | 20.832573 | 0.218007 | 0.113882 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 58861.17844 | 20.225426 | 0.163945 | 0.125741 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 59061.45527 | 20.921619 | 0.225809 | 0.12306 | 0 |
| 189475338943485 | 1447212400010480 | 2 | 44.006325 | 1.263639 | 59090.5126 | 20.474043 | 0.186864 | 0.115983 | 32768 |

10 rows × 9 columns
We can see that the resulting catalog has a flat structure: each row represents a single observation from the original nested lightcurves, and the associated object-level columns (objectid, filterid, objra, objdec) are repeated on every row.