Accessing remote data

Accessing remote data#

If you’re accessing HATS catalogs on a local file system, a typical path string like "/path/to/catalogs" will be sufficient. This tutorial will help you get started if you need to access data over HTTP/S, cloud storage, or have some additional parameters for connecting to your data.

We use fsspec and universal_pathlib to create connections to remote data sources. Please refer to their documentation for a list of supported filesystems and any filesystem-specific parameters.

If you’re using PyPI/pip for package management, you can install ALL of the fsspec implementations, as well as some other nice-to-have dependencies with pip install 'lsdb[full]'.

Below, we provide some a basic workflow for accessing remote data, as well as filesystem-specific hints.

1. HTTP / HTTPS#

Firstly, make sure to install the fsspec http package:

pip install aiohttp

OR

conda install aiohttp
[1]:
from upath import UPath

test_path = UPath("https://data.lsdb.io/hats/gaia_dr3/gaia/")
test_path.exists()
[1]:
True
[2]:
import lsdb

cat = lsdb.open_catalog("https://data.lsdb.io/hats/gaia_dr3/gaia/")
cat
[2]:
lsdb Catalog gaia:
solution_id designation source_id random_index ref_epoch ra ra_error dec dec_error parallax parallax_error parallax_over_error pm pmra pmra_error pmdec pmdec_error ra_dec_corr ra_parallax_corr ra_pmra_corr ra_pmdec_corr dec_parallax_corr dec_pmra_corr dec_pmdec_corr parallax_pmra_corr parallax_pmdec_corr pmra_pmdec_corr astrometric_n_obs_al astrometric_n_obs_ac astrometric_n_good_obs_al astrometric_n_bad_obs_al astrometric_gof_al astrometric_chi2_al astrometric_excess_noise astrometric_excess_noise_sig astrometric_params_solved astrometric_primary_flag nu_eff_used_in_astrometry pseudocolour pseudocolour_error ra_pseudocolour_corr dec_pseudocolour_corr parallax_pseudocolour_corr pmra_pseudocolour_corr pmdec_pseudocolour_corr astrometric_matched_transits visibility_periods_used astrometric_sigma5d_max matched_transits new_matched_transits matched_transits_removed ipd_gof_harmonic_amplitude ipd_gof_harmonic_phase ipd_frac_multi_peak ipd_frac_odd_win ruwe scan_direction_strength_k1 scan_direction_strength_k2 scan_direction_strength_k3 scan_direction_strength_k4 scan_direction_mean_k1 scan_direction_mean_k2 scan_direction_mean_k3 scan_direction_mean_k4 duplicated_source phot_g_n_obs phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_flux_over_error phot_g_mean_mag phot_bp_n_obs phot_bp_mean_flux phot_bp_mean_flux_error phot_bp_mean_flux_over_error phot_bp_mean_mag phot_rp_n_obs phot_rp_mean_flux phot_rp_mean_flux_error phot_rp_mean_flux_over_error phot_rp_mean_mag phot_bp_rp_excess_factor phot_bp_n_contaminated_transits phot_bp_n_blended_transits phot_rp_n_contaminated_transits phot_rp_n_blended_transits phot_proc_mode bp_rp bp_g g_rp radial_velocity radial_velocity_error rv_method_used rv_nb_transits rv_nb_deblended_transits rv_visibility_periods_used rv_expected_sig_to_noise rv_renormalised_gof rv_chisq_pvalue rv_time_duration rv_amplitude_robust rv_template_teff rv_template_logg rv_template_fe_h rv_atm_param_origin vbroad vbroad_error vbroad_nb_transits grvs_mag grvs_mag_error grvs_mag_nb_transits rvs_spec_sig_to_noise phot_variable_flag l b ecl_lon ecl_lat in_qso_candidates in_galaxy_candidates non_single_star has_xp_continuous has_xp_sampled has_rvs has_epoch_photometry has_epoch_rv has_mcmc_gspphot has_mcmc_msc in_andromeda_survey classprob_dsc_combmod_quasar classprob_dsc_combmod_galaxy classprob_dsc_combmod_star teff_gspphot teff_gspphot_lower teff_gspphot_upper logg_gspphot logg_gspphot_lower logg_gspphot_upper mh_gspphot mh_gspphot_lower mh_gspphot_upper distance_gspphot distance_gspphot_lower distance_gspphot_upper azero_gspphot azero_gspphot_lower azero_gspphot_upper ag_gspphot ag_gspphot_lower ag_gspphot_upper ebpminrp_gspphot ebpminrp_gspphot_lower ebpminrp_gspphot_upper libname_gspphot
npartitions=2016
Order: 2, Pixel: 0 int64[pyarrow] string[pyarrow] int64[pyarrow] int64[pyarrow] double[pyarrow] double[pyarrow] float[pyarrow] double[pyarrow] float[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] double[pyarrow] float[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int8[pyarrow] bool[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] int16[pyarrow] float[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] float[pyarrow] float[pyarrow] int8[pyarrow] int8[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] bool[pyarrow] int16[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] double[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] int8[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int8[pyarrow] int16[pyarrow] int16[pyarrow] int16[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] float[pyarrow] float[pyarrow] int16[pyarrow] float[pyarrow] string[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] double[pyarrow] bool[pyarrow] bool[pyarrow] int16[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] bool[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] float[pyarrow] string[pyarrow]
Order: 2, Pixel: 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Order: 3, Pixel: 766 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Order: 3, Pixel: 767 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
152 out of 152 available columns in the catalog have been loaded lazily, meaning no data has been read, only the catalog schema

Occasionally, with HTTPS data, you may see issues with missing certificates. If you encounter a FileNotFoundError, but you’re pretty sure the file should be found:

  1. Check your network and server availability

  2. On Linux, be sure that openSSL and ca-certificates are in place

  3. On Mac, run /Applications/Python\ 3.*/Install\ Certificates.command

2. AWS#

Cloud storage often requires additional configuration, particularly if you need to authenticate to access the data.

The authentication is typically passed via storage_options, and a connection to AWS S3 might look something like the following:

s3_path = UPath("s3://bucket_name",
                protocol="s3",
                client_kwargs = {"endpoint_url": "http://0.0.0.0:000/"},
                anon= False)

About#

Authors: Melissa DeLucchi

Last updated on: April 4, 2025

If you use lsdb for published research, please cite following instructions.