Loading Generic Particle Data

Notebook

This example creates a fake in-memory particle dataset and then loads it as a yt dataset using the load_particles function.

Our "fake" dataset will be numpy arrays filled with normally distributed randoml particle positions and uniform particle masses. Since real data is often scaled, I arbitrarily multiply by 1e6 to show how to deal with scaled data.

In [1]:
import numpy as np

n_particles = 5000000

ppx, ppy, ppz = 1e6*np.random.normal(size=[3, n_particles])

ppm = np.ones(n_particles)

The load_particles function accepts a dictionary populated with particle data fields loaded in memory as numpy arrays or python lists:

In [2]:
data = {'particle_position_x': ppx,
        'particle_position_y': ppy,
        'particle_position_z': ppz,
        'particle_mass': ppm}

To hook up with yt's internal field system, the dictionary keys must be 'particle_position_x', 'particle_position_y', 'particle_position_z', and 'particle_mass', as well as any other particle field provided by one of the particle frontends.

The load_particles function transforms the data dictionary into an in-memory yt Dataset object, providing an interface for further analysis with yt. The example below illustrates how to load the data dictionary we created above.

In [3]:
import yt
from yt.units import parsec, Msun

bbox = 1.1*np.array([[min(ppx), max(ppx)], [min(ppy), max(ppy)], [min(ppz), max(ppz)]])

ds = yt.load_particles(data, length_unit=parsec, mass_unit=1e8*Msun, bbox=bbox)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_60944/651052214.py in <module>
      4 bbox = 1.1*np.array([[min(ppx), max(ppx)], [min(ppy), max(ppy)], [min(ppz), max(ppz)]])
      5 
----> 6 ds = yt.load_particles(data, length_unit=parsec, mass_unit=1e8*Msun, bbox=bbox)

/tmp/yt/yt/loaders.py in load_particles(data, length_unit, bbox, sim_time, mass_unit, time_unit, velocity_unit, magnetic_unit, periodicity, geometry, unit_system, data_source, default_species_fields)
    755     handler.cosmology_simulation = 0
    756 
--> 757     sds = StreamParticlesDataset(
    758         handler,
    759         geometry=geometry,

/tmp/yt/yt/frontends/stream/data_structures.py in __init__(self, stream_handler, storage_filename, geometry, unit_system, default_species_fields)
    451         default_species_fields=None,
    452     ):
--> 453         super().__init__(
    454             stream_handler,
    455             storage_filename=storage_filename,

/tmp/yt/yt/frontends/stream/data_structures.py in __init__(self, stream_handler, storage_filename, geometry, unit_system, default_species_fields)
    283 
    284         _cached_datasets[name] = self
--> 285         Dataset.__init__(
    286             self,
    287             name,

/tmp/yt/yt/data_objects/static_output.py in __init__(self, filename, dataset_type, file_style, units_override, unit_system, default_species_fields)
    227 
    228         self._parse_parameter_file()
--> 229         self.set_units()
    230         self.setup_cosmology()
    231         self._assign_unit_system(unit_system)

/tmp/yt/yt/data_objects/static_output.py in set_units(self)
   1195                 self.unit_registry.modify("a", 1 / (1 + self.current_redshift))
   1196 
-> 1197         self.set_code_units()
   1198 
   1199     def setup_cosmology(self):

/tmp/yt/yt/data_objects/static_output.py in set_code_units(self)
   1250 
   1251         # set attributes like ds.length_unit
-> 1252         self._set_code_unit_attributes()
   1253 
   1254         self.unit_registry.modify("code_length", self.length_unit)

/tmp/yt/yt/frontends/stream/data_structures.py in _set_code_unit_attributes(self)
    343                 uq = self.quan(unit[0], unit[1])
    344             else:
--> 345                 raise RuntimeError(f"{attr} ({unit}) is invalid.")
    346             setattr(self, attr, uq)
    347 

RuntimeError: length_unit (pc) is invalid.

The length_unit and mass_unit are the conversion from the units used in the data dictionary to CGS. I've arbitrarily chosen one parsec and 10^8 Msun for this example.

The n_ref parameter controls how many particle it takes to accumulate in an oct-tree cell to trigger refinement. Larger n_ref will decrease poisson noise at the cost of resolution in the octree.

Finally, the bbox parameter is a bounding box in the units of the dataset that contains all of the particles. This is used to set the size of the base octree block.

This new dataset acts like any other yt Dataset object, and can be used to create data objects and query for yt fields. This example shows how to access "deposit" fields:

In [4]:
ad = ds.all_data()

# This is generated with "cloud-in-cell" interpolation.
cic_density = ad["deposit", "all_cic"]

# These three are based on nearest-neighbor cell deposition
nn_density = ad["deposit", "all_density"]
nn_deposited_mass = ad["deposit", "all_mass"]
particle_count_per_cell = ad["deposit", "all_count"]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/2596495818.py in <module>
----> 1 ad = ds.all_data()
      2 
      3 # This is generated with "cloud-in-cell" interpolation.
      4 cic_density = ad["deposit", "all_cic"]
      5 

NameError: name 'ds' is not defined
In [5]:
ds.field_list
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/2509385119.py in <module>
----> 1 ds.field_list

NameError: name 'ds' is not defined
In [6]:
ds.derived_field_list
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/2669341139.py in <module>
----> 1 ds.derived_field_list

NameError: name 'ds' is not defined
In [7]:
slc = yt.SlicePlot(ds, 2, ('deposit', 'all_cic'))
slc.set_width((8, 'Mpc'))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/484415534.py in <module>
----> 1 slc = yt.SlicePlot(ds, 2, ('deposit', 'all_cic'))
      2 slc.set_width((8, 'Mpc'))

NameError: name 'ds' is not defined

Finally, one can specify multiple particle types in the data directory by setting the field names to be field tuples (the default field type for particles is "io") if one is not specified:

In [8]:
n_star_particles = 1000000
n_dm_particles = 2000000

ppxd, ppyd, ppzd = 1e6*np.random.normal(size=[3, n_dm_particles])
ppmd = np.ones(n_dm_particles)

ppxs, ppys, ppzs = 5e5*np.random.normal(size=[3, n_star_particles])
ppms = 0.1*np.ones(n_star_particles)

data2 = {('dm', 'particle_position_x'): ppxd,
         ('dm', 'particle_position_y'): ppyd,
         ('dm', 'particle_position_z'): ppzd,
         ('dm', 'particle_mass'): ppmd,
         ('star', 'particle_position_x'): ppxs,
         ('star', 'particle_position_y'): ppys,
         ('star', 'particle_position_z'): ppzs,
         ('star', 'particle_mass'): ppms}

ds2 = yt.load_particles(data2, length_unit=parsec, mass_unit=1e8*Msun, n_ref=256, bbox=bbox)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/3074900769.py in <module>
     17          ('star', 'particle_mass'): ppms}
     18 
---> 19 ds2 = yt.load_particles(data2, length_unit=parsec, mass_unit=1e8*Msun, n_ref=256, bbox=bbox)

TypeError: load_particles() got an unexpected keyword argument 'n_ref'

We now have separate "dm" and "star" particles, as well as their deposited fields:

In [9]:
slc = yt.SlicePlot(ds2, 2, [('deposit', 'dm_cic'), ('deposit', 'star_cic')])
slc.set_width((8, 'Mpc'))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60944/1049328343.py in <module>
----> 1 slc = yt.SlicePlot(ds2, 2, [('deposit', 'dm_cic'), ('deposit', 'star_cic')])
      2 slc.set_width((8, 'Mpc'))

NameError: name 'ds2' is not defined

(Loading_Generic_Particle_Data.ipynb; Loading_Generic_Particle_Data_evaluated.ipynb; Loading_Generic_Particle_Data.py)