Loading Generic Array Data

Even if your data is not strictly related to fields commonly used in astrophysical codes or your code is not supported yet, you can still feed it to yt to use its advanced visualization and analysis facilities. The only requirement is that your data can be represented as three-dimensional NumPy arrays with a consistent grid structure. What follows are some common examples of loading in generic array data that you may find useful.

Generic Unigrid Data

The simplest case is that of a single grid of data spanning the domain, with one or more fields. The data could be generated from a variety of sources; we'll just give three common examples:

Data generated "on-the-fly"

The most common example is that of data that is generated in memory from the currently running script or notebook.

In [1]:
import yt
import numpy as np

In this example, we'll just create a 3-D array of random floating-point data using NumPy:

In [2]:
arr = np.random.random(size=(64,64,64))

To load this data into yt, we need to associate it with a field. The data dictionary consists of one or more fields, each a tuple of a NumPy array and a unit string. Then we can call load_uniform_grid:

In [3]:
data = dict(density = (arr, "g/cm**3"))
bbox = np.array([[-1.5, 1.5], [-1.5, 1.5], [-1.5, 1.5]])
ds = yt.load_uniform_grid(data, arr.shape, length_unit="Mpc", bbox=bbox, nprocs=64)

load_uniform_grid takes the following arguments and optional keywords:

  • data : This is a dict of numpy arrays, where the keys are the field names
  • domain_dimensions : The domain dimensions of the unigrid
  • length_unit : The unit that corresponds to code_length, can be a string, tuple, or floating-point number
  • bbox : Size of computational domain in units of code_length
  • nprocs : If greater than 1, will create this number of subarrays out of data
  • sim_time : The simulation time in seconds
  • mass_unit : The unit that corresponds to code_mass, can be a string, tuple, or floating-point number
  • time_unit : The unit that corresponds to code_time, can be a string, tuple, or floating-point number
  • velocity_unit : The unit that corresponds to code_velocity
  • magnetic_unit : The unit that corresponds to code_magnetic, i.e. the internal units used to represent magnetic field strengths.
  • periodicity : A tuple of booleans that determines whether the data will be treated as periodic along each axis

This example creates a yt-native dataset ds that treats your array as a density field in a cubic domain 3 Mpc on a side, and simultaneously divides the domain into nprocs = 64 chunks so that you can take advantage of the underlying parallelism.
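To get a feel for what that chunking means, here is an illustrative numpy-only sketch (yt's actual internal decomposition may differ): splitting a 64³ array 4×4×4 yields sixty-four 16³ subarrays.

```python
import numpy as np

# Illustrative only: conceptually, nprocs=64 splits the 64^3 array
# 4x4x4 into sixty-four 16^3 subarrays.
arr = np.zeros((64, 64, 64))
chunks = [
    arr[i:i + 16, j:j + 16, k:k + 16]
    for i in range(0, 64, 16)
    for j in range(0, 64, 16)
    for k in range(0, 64, 16)
]
print(len(chunks), chunks[0].shape)  # 64 (16, 16, 16)
```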

The optional unit keyword arguments allow for the default units of the dataset to be set. They can be:

  • A string, e.g. length_unit="Mpc"
  • A tuple, e.g. mass_unit=(1.0e14, "Msun")
  • A floating-point value, e.g. time_unit=3.1557e13

In the last case, the unit is assumed to be cgs.
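As a quick sanity check on the floating-point form, a bare time_unit=3.1557e13 is interpreted in seconds (cgs), which works out to one megayear under the usual convention of 3.1557e7 seconds per year:

```python
# A bare float unit is taken to be cgs; for time_unit that means seconds.
# 3.1557e7 s is one year (assumed convention), so 3.1557e13 s is one megayear.
seconds_per_year = 3.1557e7
code_time_in_myr = 3.1557e13 / (seconds_per_year * 1.0e6)
print(code_time_in_myr)  # ~1.0: time_unit=3.1557e13 sets the code time unit to 1 Myr
```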

The resulting ds functions exactly like any other dataset yt can handle: it can be sliced, and we can show the grid boundaries:

In [4]:
slc = yt.SlicePlot(ds, "z", ("gas", "density"))
slc.set_cmap(("gas", "density"), "Blues")
slc.annotate_grids(cmap=None)
slc.show()

Particle fields are detected as one-dimensional arrays and are added alongside the three-dimensional grid fields in the same data dictionary:

In [5]:
posx_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
posy_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
posz_arr = np.random.uniform(low=-1.5, high=1.5, size=10000)
data = dict(density = (np.random.random(size=(64,64,64)), "Msun/kpc**3"),
            particle_position_x = (posx_arr, 'code_length'), 
            particle_position_y = (posy_arr, 'code_length'),
            particle_position_z = (posz_arr, 'code_length'))
bbox = np.array([[-1.5, 1.5], [-1.5, 1.5], [-1.5, 1.5]])
ds = yt.load_uniform_grid(data, data["density"][0].shape, length_unit=(1.0, "Mpc"), mass_unit=(1.0,"Msun"),
                          bbox=bbox, nprocs=4)

In this example only the particle position fields have been assigned. If no particle arrays are supplied, then the number of particles is assumed to be zero. Take a slice, and overlay particle positions:

In [6]:
slc = yt.SlicePlot(ds, "z", ("gas", "density"))
slc.set_cmap(("gas", "density"), "Blues")
slc.annotate_particles(0.25, p_size=12.0, col="Red")
slc.show()

HDF5 data

HDF5 is a convenient format to store data. If you have unigrid data stored in an HDF5 file, it is possible to load it into memory and then use load_uniform_grid to get it into yt:

In [7]:
from os.path import join
import h5py
from yt.config import ytcfg
data_dir = ytcfg.get('yt','test_data_dir')
from yt.utilities.physical_ratios import cm_per_kpc
f = h5py.File(join(data_dir, "UnigridData", "turb_vels.h5"), "r") # Read-only access to the file

The HDF5 file handle's keys correspond to the datasets stored in the file:

In [8]:
print (f.keys())
<KeysViewHDF5 ['Bx', 'By', 'Bz', 'Density', 'MagneticEnergy', 'Temperature', 'turb_x-velocity', 'turb_y-velocity', 'turb_z-velocity', 'x-velocity', 'y-velocity', 'z-velocity']>

We need to add some unit information. It may be stored in the file somewhere, or we may know it from another source. In this case, the units are simply cgs:

In [9]:
units = ["gauss","gauss","gauss", "g/cm**3", "erg/cm**3", "K", 
         "cm/s", "cm/s", "cm/s", "cm/s", "cm/s", "cm/s"]

We can iterate over the items in the file handle and the units to get the data into a dictionary, which we will then load:

In [10]:
data = {k: (np.asarray(v), u) for (k, v), u in zip(f.items(), units)}
bbox = np.array([[-0.5, 0.5], [-0.5, 0.5], [-0.5, 0.5]])
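Note that zipping f.items() against a bare list of units relies on the iteration order of the file's keys. A more robust pattern keys the units by dataset name; here is a sketch with hypothetical stand-in values rather than the real HDF5 handle:

```python
# Hypothetical stand-ins for illustration: with a real h5py.File handle you
# would iterate f.items() the same way, reading each array with np.asarray(v).
units_by_name = {"Bx": "gauss", "Density": "g/cm**3", "Temperature": "K"}
fake_file = {"Bx": [0.0], "Density": [1.0e-26], "Temperature": [1.0e4]}
data = {name: (arr, units_by_name[name]) for name, arr in fake_file.items()}
print(data["Temperature"])  # ([10000.0], 'K')
```

This way a reordering of the datasets in the file cannot silently attach the wrong units to a field.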
In [11]:
ds = yt.load_uniform_grid(data, data["Density"][0].shape, length_unit=250.*cm_per_kpc, bbox=bbox, nprocs=8,
                          periodicity=(False, False, False))

In this case, the data came from a simulation which was 250 kpc on a side. An example projection of two fields:

In [12]:
prj = yt.ProjectionPlot(ds, "z", [("stream", "z-velocity"), ("stream", "Temperature"), ("stream", "Bx")],
                        weight_field=("stream", "Density"))
prj.set_log(("stream", "z-velocity"), False)
prj.set_log(("stream", "Bx"), False)
prj.show()

Volume Rendering Loaded Data

Volume rendering requires defining a TransferFunction to map data to color and opacity and a camera to create a viewport and render the image.

In [13]:
# Find the min and max of the field
mi, ma = ds.all_data().quantities.extrema(("stream", "Temperature"))
# Reduce the dynamic range
mi = mi.value + 1.5e7
ma = ma.value - 0.81e7

Define the properties and size of the camera viewport:

In [14]:
# Choose a vector representing the viewing direction.
L = [0.5, 0.5, 0.5]
# Define the center of the camera to be the domain center
c = ds.domain_center
# Define the width of the image
W = 1.5*ds.domain_width[0]
# Define the number of pixels to render
Npixels = 512

Create the scene and volume source, define a transfer function, and attach the camera:

In [15]:
sc = yt.create_scene(ds, ("stream", "Temperature"))
dd = ds.all_data()

source = sc[0]

source.log_field = False

tf = yt.ColorTransferFunction((mi, ma), grey_opacity=False)
tf.map_to_colormap(mi, ma, scale=15.0, colormap='algae')

source.set_transfer_function(tf)

cam = sc.add_camera()
cam.width = W
cam.center = c
cam.normal_vector = L
cam.north_vector = [0, 0, 1]
In [16]:
sc.show(sigma_clip=4)

FITS image data

The FITS file format is a common astronomical format for 2-D images, but it can store three-dimensional data as well. The AstroPy project has modules for FITS reading and writing, which were incorporated from the PyFITS library.

In [17]:
import astropy.io.fits as pyfits
# Or, just import pyfits if that's what you have installed

Using pyfits we can open a FITS file. If we call info() on the file handle, we can figure out some information about the file's contents. The file in this example stores three 3-D velocity fields, one per HDU (header-data-unit):

In [18]:
f = pyfits.open(join(data_dir, "UnigridData", "velocity_field_20.fits"))
f.info()
Filename: /mnt/yt/UnigridData/velocity_field_20.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  X-VELOCITY    1 PrimaryHDU      25   (200, 200, 200)   float64   
  1  Y-VELOCITY    1 ImageHDU        26   (200, 200, 200)   float64   
  2  Z-VELOCITY    1 ImageHDU        26   (200, 200, 200)   float64   

We can put the data into a dictionary in the same way as before, iterating over the HDUs in the file handle f. hdu.name gives the field name and hdu.data the actual data. Each of these velocity fields is in km/s. We can check that we got the correct fields:

In [19]:
data = {}
for hdu in f:
    name = hdu.name.lower()
    data[name] = (hdu.data,"km/s")
print (data.keys())
dict_keys(['x-velocity', 'y-velocity', 'z-velocity'])

The velocity field names in this case are slightly different than the standard yt field names for velocity fields, so we will reassign the field names:

In [20]:
data["velocity_x"] = data.pop("x-velocity")
data["velocity_y"] = data.pop("y-velocity")
data["velocity_z"] = data.pop("z-velocity")

Now we load the data into yt. Let's assume that the box size is a Mpc. Since these are velocity fields, we can overlay velocity vectors on slices, just as if we had loaded in data from a supported code.

In [21]:
ds = yt.load_uniform_grid(data, data["velocity_x"][0].shape, length_unit=(1.0,"Mpc"))
slc = yt.SlicePlot(ds, "x", [("gas", "velocity_x"), ("gas", "velocity_y"), ("gas", "velocity_z")])
for ax in "xyz":
    slc.set_log(("gas", f"velocity_{ax}"), False)
slc.annotate_velocity()
slc.show()