Filtering your Dataset

Large datasets are oftentimes too overwhelming to deal with in their entirety, and it can be useful and faster to analyze subsets of these datasets. Furthermore, filtering the dataset based on some field condition can reveal subtle information not easily accessible by looking at the whole dataset. Filters can be generated based on spatial position, say in a sphere in the center of your dataset space, or more generally they can be defined by the properties of any field in the simulation.

Because mesh fields are internally different from particle fields, there are different ways of filtering each type as indicated below; however, filtering fields by spatial location (i.e. geometric objects) will apply to both types equally.

Filtering Mesh Fields

Mesh fields can be filtered by two methods: cut region objects (YTCutRegion) and NumPy boolean masks. Boolean masks are simpler, but they only work for examining datasets, whereas cut regions objects create wholly new data objects suitable for full analysis (data examination, image generation, etc.)

Boolean Masks

NumPy boolean masks can be used with any NumPy array simply by passing the array a conditional. As a general example of this:

In [1]:
import numpy as np
a = np.arange(5)
bigger_than_two = (a > 2)
print("Original Array: a = \n%s" % a)
print("Boolean Mask: bigger_than_two = \n%s" % bigger_than_two)
print("Masked Array: a[bigger_than_two] = \n%s" % a[bigger_than_two])
Original Array: a = 
[0 1 2 3 4]
Boolean Mask: bigger_than_two = 
[False False False  True  True]
Masked Array: a[bigger_than_two] = 
[3 4]

Similarly, if you’ve created a yt data object (e.g. a region, a sphere), you can examine its field values as a NumPy array by simply indexing it with the field name. Thus, it too can be masked using a NumPy boolean mask. Let’s set a simple mask based on the contents of one of our fields.

In [1]:
import yt
ds = yt.load('Enzo_64/DD0042/data0042')
ad = ds.all_data()
hot = ad["temperature"].in_units('K') > 1e6
print('Temperature of all data: ad["temperature"] = \n%s' % ad["temperature"])
print("Boolean Mask: hot = \n%s" % hot)
print('Temperature of "hot" data: ad["temperature"][hot] = \n%s' %
Temperature of all data: ad["temperature"] = 
[  1.00000000e+00   1.00000000e+00   1.00000000e+00 ...,   1.87798863e+07
   1.77985684e+07   1.73020029e+07] K
Boolean Mask: hot = 
[False False False ...,  True  True  True]
Temperature of "hot" data: ad["temperature"][hot] = 
[  6502480.87990464   7005697.58048854   7453051.94593615 ...,
  18779886.27258056  17798568.39391833  17302002.90929025] K

This was a simple example, but one can make the conditionals that define a boolean mask have multiple parts, and one can stack masks together to make very complex cuts on one’s data. Once the data is filtered, it can be used if you simply need to access the NumPy arrays:

In [1]:
import yt
ds = yt.load('Enzo_64/DD0042/data0042')
ad = ds.all_data()
overpressure_and_fast = (ad["pressure"] > 1e-14) & (ad["velocity_magnitude"].in_units('km/s') > 1e2)
print('Density of all data: ad["density"] = \n%s' % ad['density'])
print('Density of "overpressure and fast" data: ad["density"][overpressure_and_fast] = \n%s' %
Density of all data: ad["density"] = 
[  2.05645650e-31   1.87920300e-31   2.73545747e-31 ...,   3.17163369e-28
   2.45725655e-28   2.13200943e-28] g/cm**3
Density of "overpressure and fast" data: ad["density"][overpressure_and_fast] = 
[  9.68106360e-29   1.03163794e-28   1.17164845e-28 ...,   3.17163369e-28
   2.45725655e-28   2.13200943e-28] g/cm**3

Cut Regions

Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc.


Let us demonstrate this with an example using the same dataset as we used with the boolean masks.

In [1]:
import yt
ds = yt.load("Enzo_64/DD0042/data0042")

The only argument to a cut region is a conditional on field output from a data object. The only catch is that you must denote the data object in the conditional as "obj" regardless of the actual object's name.

Here we create three new data objects which are copies of the all_data object (a region object covering the entire spatial domain of the simulation), but we've filtered on just "hot" material, the "dense" material, and the "overpressure and fast" material.

In [2]:
ad = ds.all_data()
hot_ad = ad.cut_region(["obj['temperature'] > 1e6"])
dense_ad = ad.cut_region(['obj["density"] > 5e-30'])

# you can chain cut regions in two ways:
dense_and_cool_ad = dense_ad.cut_region(["obj['temperature'] < 1e5"])
overpressure_and_fast_ad = ad.cut_region(['(obj["pressure"] > 1e-14) & (obj["velocity_magnitude"].in_units("km/s") > 1e2)'])

Upon inspection of our "hot_ad" object, we can still get the same results as we got with the boolean masks example above:

In [3]:
print ("Temperature of all cells:\n ad['temperature'] = \n%s\n" % ad["temperature"])
print ("Temperatures of all \"hot\" cells:\n hot_ad['temperature'] = \n%s" % hot_ad['temperature'])
Temperature of all cells:
 ad['temperature'] = 
[  1.00000000e+00   1.00000000e+00   1.00000000e+00 ...,   1.87798863e+07
   1.77985684e+07   1.73020029e+07] K

Temperatures of all "hot" cells:
 hot_ad['temperature'] = 
[  6502480.87990464   7005697.58048854   7453051.94593615 ...,
  18779886.27258056  17798568.39391833  17302002.90929025] K
In [4]:
print ("Density of dense, cool material:\n dense_and_cool_ad['density'] = \n%s\n" % dense_and_cool_ad['density'])
print ("Temperature of dense, cool material:\n dense_and_cool_ad['temperature'] = \n%s" % dense_and_cool_ad['temperature'])
Density of dense, cool material:
 dense_and_cool_ad['density'] = 
[  5.14642762e-30   7.12875101e-30   5.63989313e-30 ...,   5.80347887e-30
   9.08863749e-30   5.50362729e-30] g/cm**3

Temperature of dense, cool material:
 dense_and_cool_ad['temperature'] = 
[  9.58828100e+04   9.36330508e+04   9.10436886e+04 ...,   1.11450104e+00
   3.58217286e+04   1.99072499e+02] K

Now that we've constructed a cut_region, we can use it as a data source for further analysis. To create a plot based on a cut_region, use the data_source keyword argument provided by yt's plotting objects.

Here's an example using projections:

In [5]:
proj1 = yt.ProjectionPlot(ds, 'x', "density", weight_field="density")
proj1.annotate_title('No Cuts')

proj2 = yt.ProjectionPlot(ds, 'x', "density", weight_field="density", data_source=hot_ad)
proj2.annotate_title('Hot Gas')
proj2.set_zlim("density", 3e-31, 3e-27)

/usr/lib64/python3.4/site-packages/matplotlib/ RuntimeWarning: invalid value encountered in less_equal
  mask |= resdat <= 0

The data_source keyword argument is also accepted by SlicePlot, ProfilePlot and PhasePlot:

In [6]:
slc1 = yt.SlicePlot(ds, 'x', "density", center='m')
slc1.set_zlim('density', 3e-31, 3e-27)
slc1.annotate_title('No Cuts')

slc2 = yt.SlicePlot(ds, 'x', "density", center='m', data_source=dense_ad)
slc2.set_zlim('density', 3e-31, 3e-27)
slc2.annotate_title('Dense Gas')

In [7]:
ph1 = yt.PhasePlot(ad, 'density', 'temperature', 'cell_mass', weight_field=None)
ph1.set_xlim(3e-31, 3e-27)
ph1.annotate_title('No Cuts')

ph1 = yt.PhasePlot(dense_ad, 'density', 'temperature', 'cell_mass', weight_field=None)
ph1.set_xlim(3e-31, 3e-27)
ph1.annotate_title('Dense Gas')

(mesh_filter.ipynb; mesh_filter_evaluated.ipynb;

Cut regions can also operator on particle fields, but a single cut region object cannot operate on both particle fields and mesh fields at the same time.

Filtering Particle Fields

Particle filters create new particle fields based on the manipulation and cuts on existing particle fields. You can apply cuts to them to effectively mask out everything except the particles with which you are concerned.

Creating a particle filter takes a few steps. You must first define a function which accepts a data object (e.g. all_data, sphere, etc.) as its argument. It uses the fields and information in this geometric object in order to produce some sort of conditional mask that is then returned to create a new particle type.

Here is a particle filter to create a new star particle type. For Enzo simulations, stars have particle_type set to 2, so our filter will select only the particles with particle_type (i.e. field = ('all', 'particle_type') equal to 2.

@yt.particle_filter(requires=["particle_type"], filtered_type='all')
def stars(pfilter, data):
    filter = data[(pfilter.filtered_type, "particle_type")] == 2
    return filter

The particle_filter() decorator takes a few options. You must specify the names of the particle fields that are required in order to define the filter — in this case the particle_type field. Additionally, you must specify the particle type to be filtered — in this case we filter all the particle in dataset by specifying the all particle type.

In addition, you may specify a name for the newly defined particle type. If no name is specified, the name for the particle type will be inferred from the name of the filter definition — in this case the inferred name will be stars.

Lastly, the filter must be applied to our dataset of choice. Note that this filter can be added to as many datasets as we wish. It will only actually create new filtered fields if the dataset has the required fields, though.

import yt
ds = yt.load('IsolatedGalaxy/galaxy0030/galaxy0030')

And that’s it! We can now access all of the (‘stars’, field) fields from our dataset ds and treat them as any other particle field. In addition, it created some deposit fields, where the particles were deposited on to the grid as mesh fields.

As an alternative syntax, you can also define a new particle filter via the add_particle_filter() function.

def Stars(pfilter, data):
    filter = data[(pfilter.filtered_type, "particle_type")] == 2
    return filter

add_particle_filter("stars", function=Stars, filtered_type='all',

This is equivalent to our use of the particle_filter decorator above. The choice to use either the particle_filter decorator or the add_particle_filter function is a purely stylistic choice.


Let us go through a full worked example. Here we have a Tipsy SPH dataset. By general inspection, we see that there are stars present in the dataset, since there are fields with field type: Stars in the ds.field_list. Let's look at the derived_field_list for all of the Stars fields.

In [1]:
import yt
import numpy as np

ds = yt.load("TipsyGalaxy/galaxy.00300")
for field in ds.derived_field_list:
    if field[0] == 'Stars':
        print (field)
('Stars', 'Coordinates')
('Stars', 'Epsilon')
('Stars', 'FeMassFrac')
('Stars', 'Fe_fraction')
('Stars', 'FormationTime')
('Stars', 'Mass')
('Stars', 'Metals')
('Stars', 'Phi')
('Stars', 'Velocities')
('Stars', 'creation_time')
('Stars', 'mesh_id')
('Stars', 'metallicity')
('Stars', 'particle_angular_momentum')
('Stars', 'particle_angular_momentum_magnitude')
('Stars', 'particle_angular_momentum_x')
('Stars', 'particle_angular_momentum_y')
('Stars', 'particle_angular_momentum_z')
('Stars', 'particle_cylindrical_velocity_theta')
('Stars', 'particle_cylindrical_velocity_z')
('Stars', 'particle_mass')
('Stars', 'particle_ones')
('Stars', 'particle_position')
('Stars', 'particle_position_cylindrical_radius')
('Stars', 'particle_position_cylindrical_theta')
('Stars', 'particle_position_cylindrical_z')
('Stars', 'particle_position_relative')
('Stars', 'particle_position_relative_x')
('Stars', 'particle_position_relative_y')
('Stars', 'particle_position_relative_z')
('Stars', 'particle_position_spherical_phi')
('Stars', 'particle_position_spherical_radius')
('Stars', 'particle_position_spherical_theta')
('Stars', 'particle_position_x')
('Stars', 'particle_position_y')
('Stars', 'particle_position_z')
('Stars', 'particle_radial_velocity')
('Stars', 'particle_radius')
('Stars', 'particle_specific_angular_momentum')
('Stars', 'particle_specific_angular_momentum_x')
('Stars', 'particle_specific_angular_momentum_y')
('Stars', 'particle_specific_angular_momentum_z')
('Stars', 'particle_spherical_position_phi')
('Stars', 'particle_spherical_position_radius')
('Stars', 'particle_spherical_position_theta')
('Stars', 'particle_spherical_velocity_phi')
('Stars', 'particle_spherical_velocity_radius')
('Stars', 'particle_spherical_velocity_theta')
('Stars', 'particle_velocity')
('Stars', 'particle_velocity_cylindrical_radius')
('Stars', 'particle_velocity_cylindrical_theta')
('Stars', 'particle_velocity_cylindrical_z')
('Stars', 'particle_velocity_magnitude')
('Stars', 'particle_velocity_relative')
('Stars', 'particle_velocity_relative_x')
('Stars', 'particle_velocity_relative_y')
('Stars', 'particle_velocity_relative_z')
('Stars', 'particle_velocity_spherical_phi')
('Stars', 'particle_velocity_spherical_radius')
('Stars', 'particle_velocity_spherical_theta')
('Stars', 'particle_velocity_x')
('Stars', 'particle_velocity_y')
('Stars', 'particle_velocity_z')

We will filter these into young stars and old stars by masking on the ('Stars', 'creation_time') field.

In order to do this, we first make a function which applies our desired cut. This function must accept two arguments: pfilter and data. The first argument is a ParticleFilter object that contains metadata about the filter its self. The second argument is a yt data container.

Let's call "young" stars only those stars with ages less 5 million years. Since Tipsy assigns a very large creation_time for stars in the initial conditions, we need to also exclude stars with negative ages.

Conversely, let's define "old" stars as those stars formed dynamically in the simulation with ages greater than 5 Myr. We also include stars with negative ages, since these stars were included in the simulation initial conditions.

We make use of pfilter.filtered_type so that the filter definition will use the same particle type as the one specified in the call to add_particle_filter below. This makes the filter definition usable for arbitrary particle types. Since we're only filtering the "Stars" particle type in this example, we could have also replaced pfilter.filtered_type with "Stars" and gotten the same result.

In [2]:
def young_stars(pfilter, data):
    age = data.ds.current_time - data[pfilter.filtered_type, "creation_time"]
    filter = np.logical_and(age.in_units('Myr') <= 5, age >= 0)
    return filter

def old_stars(pfilter, data):
    age = data.ds.current_time - data[pfilter.filtered_type, "creation_time"]
    filter = np.logical_or(age.in_units('Myr') >= 5, age < 0)
    return filter

Now we define these as particle filters within the yt universe with the add_particle_filter() function.

In [3]:
yt.add_particle_filter("young_stars", function=young_stars, filtered_type='Stars', requires=["creation_time"])

yt.add_particle_filter("old_stars", function=old_stars, filtered_type='Stars', requires=["creation_time"])

Let us now apply these filters specifically to our dataset.

Let's double check that it worked by looking at the derived_field_list for any new fields created by our filter.

In [4]:

for field in ds.derived_field_list:
    if "young_stars" in field or "young_stars" in field[1]:
        print (field)
('deposit', 'young_stars_cic')
('deposit', 'young_stars_cic_velocity_x')
('deposit', 'young_stars_cic_velocity_y')
('deposit', 'young_stars_cic_velocity_z')
('deposit', 'young_stars_count')
('deposit', 'young_stars_density')
('deposit', 'young_stars_mass')
('deposit', 'young_stars_nn_velocity_x')
('deposit', 'young_stars_nn_velocity_y')
('deposit', 'young_stars_nn_velocity_z')
('young_stars', 'Coordinates')
('young_stars', 'Epsilon')
('young_stars', 'FeMassFrac')
('young_stars', 'Fe_fraction')
('young_stars', 'FormationTime')
('young_stars', 'Mass')
('young_stars', 'Metals')
('young_stars', 'Phi')
('young_stars', 'Velocities')
('young_stars', 'creation_time')
('young_stars', 'mesh_id')
('young_stars', 'metallicity')
('young_stars', 'particle_angular_momentum')
('young_stars', 'particle_angular_momentum_magnitude')
('young_stars', 'particle_angular_momentum_x')
('young_stars', 'particle_angular_momentum_y')
('young_stars', 'particle_angular_momentum_z')
('young_stars', 'particle_cylindrical_velocity_theta')
('young_stars', 'particle_cylindrical_velocity_z')
('young_stars', 'particle_mass')
('young_stars', 'particle_ones')
('young_stars', 'particle_position')
('young_stars', 'particle_position_cylindrical_radius')
('young_stars', 'particle_position_cylindrical_theta')
('young_stars', 'particle_position_cylindrical_z')
('young_stars', 'particle_position_relative')
('young_stars', 'particle_position_relative_x')
('young_stars', 'particle_position_relative_y')
('young_stars', 'particle_position_relative_z')
('young_stars', 'particle_position_spherical_phi')
('young_stars', 'particle_position_spherical_radius')
('young_stars', 'particle_position_spherical_theta')
('young_stars', 'particle_position_x')
('young_stars', 'particle_position_y')
('young_stars', 'particle_position_z')
('young_stars', 'particle_radial_velocity')
('young_stars', 'particle_radius')
('young_stars', 'particle_specific_angular_momentum')
('young_stars', 'particle_specific_angular_momentum_x')
('young_stars', 'particle_specific_angular_momentum_y')
('young_stars', 'particle_specific_angular_momentum_z')
('young_stars', 'particle_spherical_position_phi')
('young_stars', 'particle_spherical_position_radius')
('young_stars', 'particle_spherical_position_theta')
('young_stars', 'particle_spherical_velocity_phi')
('young_stars', 'particle_spherical_velocity_radius')
('young_stars', 'particle_spherical_velocity_theta')
('young_stars', 'particle_velocity')
('young_stars', 'particle_velocity_cylindrical_radius')
('young_stars', 'particle_velocity_cylindrical_theta')
('young_stars', 'particle_velocity_cylindrical_z')
('young_stars', 'particle_velocity_magnitude')
('young_stars', 'particle_velocity_relative')
('young_stars', 'particle_velocity_relative_x')
('young_stars', 'particle_velocity_relative_y')
('young_stars', 'particle_velocity_relative_z')
('young_stars', 'particle_velocity_spherical_phi')
('young_stars', 'particle_velocity_spherical_radius')
('young_stars', 'particle_velocity_spherical_theta')
('young_stars', 'particle_velocity_x')
('young_stars', 'particle_velocity_y')
('young_stars', 'particle_velocity_z')

We see all of the new young_stars fields as well as the 4 deposit fields. These deposit fields are mesh fields generated by depositing particle fields on the grid. Let's generate a couple of projections of where the young and old stars reside in this simulation by accessing some of these new fields.

In [5]:
p = yt.ProjectionPlot(ds, 'z', [('deposit', 'young_stars_cic'), ('deposit', 'old_stars_cic')], width=(40, 'kpc'), center='m')