to_dask_dataframe

to_dask_dataframe#

Catalog.to_dask_dataframe() DataFrame#

Convert the dataset to a Dask DataFrame.

Returns:
dd.DataFrame

The Dask DataFrame representation of the dataset.

Notes

This method returns a Dask DataFrame. However, be aware that the underlying in-memory DataFrame for each partition is still a nested-pandas NestedFrame, rather than a pandas DataFrame.

Examples

>>> import lsdb
>>> catalog = lsdb.from_dataframe(pd.DataFrame({"ra":[0, 10], "dec":[5, 15],
...                                             "mag":[21, 22], "mag_err":[.1, .2]}))
>>> ddf = catalog.to_dask_dataframe()
>>> ddf
Dask DataFrame Structure:
                                 ra             dec             mag          mag_err
npartitions=1
1369094286720630784  int64[pyarrow]  int64[pyarrow]  int64[pyarrow]  double[pyarrow]
1441151880758558720             ...             ...             ...              ...
Dask Name: nestedframe, 3 expressions
Expr=Dask NestedFrame Structure:
                                 ra             dec             mag          mag_err
npartitions=1
1369094286720630784  int64[pyarrow]  int64[pyarrow]  int64[pyarrow]  double[pyarrow]
1441151880758558720             ...             ...             ...              ...
Dask Name: nestedframe, 3 expressions
Expr=MapPartitions(NestedFrame)