Filtering your Dataset
Large datasets are often too unwieldy to analyze in their entirety, and working with a subset can be both more useful and faster. Furthermore, filtering the dataset on some field condition can reveal subtle information that is not easily seen when looking at the whole dataset. Filters can be based on spatial position, say a sphere in the center of your dataset space, or more generally on the properties of any field in the simulation.
Because mesh fields are internally different from particle fields, there are different ways of filtering each type as indicated below; however, filtering fields by spatial location (i.e. geometric objects) will apply to both types equally.
Filtering Mesh Fields
Mesh fields can be filtered by two methods: cut region objects and NumPy boolean masks. Boolean masks are simpler, but they only work for examining datasets, whereas cut region objects create wholly new data objects suitable for full analysis (data examination, image generation, etc.).
NumPy boolean masks can be used with any NumPy array simply by passing the array a conditional. As a general example of this:
    import numpy as np

    a = np.arange(5)
    bigger_than_two = (a > 2)
    print("Original Array: a = \n%s" % a)
    print("Boolean Mask: bigger_than_two = \n%s" % bigger_than_two)
    print("Masked Array: a[bigger_than_two] = \n%s" % a[bigger_than_two])

    Original Array: a =
    [0 1 2 3 4]
    Boolean Mask: bigger_than_two =
    [False False False  True  True]
    Masked Array: a[bigger_than_two] =
    [3 4]
Similarly, if you’ve created a yt data object (e.g. a region, a sphere), you can examine its field values as a NumPy array by simply indexing it with the field name. Thus, it too can be masked using a NumPy boolean mask. Let’s set a simple mask based on the contents of one of our fields.
    import yt

    ds = yt.load('Enzo_64/DD0042/data0042')
    ad = ds.all_data()
    hot = ad["temperature"].in_units('K') > 1e6
    print('Temperature of all data: ad["temperature"] = \n%s' % ad["temperature"])
    print("Boolean Mask: hot = \n%s" % hot)
    print('Temperature of "hot" data: ad["temperature"][hot] = \n%s' % ad['temperature'][hot])

    Temperature of all data: ad["temperature"] =
    [1.00000000e+00 1.00000000e+00 1.00000000e+00 ... 1.87798863e+07 1.77985684e+07 1.73020029e+07] K
    Boolean Mask: hot =
    [False False False ... True True True]
    Temperature of "hot" data: ad["temperature"][hot] =
    [ 6502480.87990464 7005697.58048854 7453051.94593615 ... 18779886.27258056 17798568.39391833 17302002.90929025] K
This was a simple example, but the conditionals that define a boolean mask can have multiple parts, and masks can be stacked together to make very complex cuts on your data. The filtered data can then be used wherever you simply need access to the NumPy arrays:
    import yt

    ds = yt.load('Enzo_64/DD0042/data0042')
    ad = ds.all_data()

    # Combine two conditions into a single boolean mask
    overpressure_and_fast = (ad["pressure"] > 1e-14) & (ad["velocity_magnitude"].in_units('km/s') > 1e2)

    print('Density of all data: ad["density"] = \n%s' % ad['density'])
    # Index the field array with the mask; the mask itself holds no fields
    print('Density of "overpressure and fast" data: ad["density"][overpressure_and_fast] = \n%s' % ad['density'][overpressure_and_fast])
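The stacking of masks described above can also be tried without loading a dataset. The following is a minimal sketch using plain NumPy arrays as stand-ins for field values; the names `temperature` and `density` and all of the numbers are made up for illustration and do not come from any simulation.

```python
import numpy as np

# Hypothetical stand-ins for two field arrays of equal length
# (illustrative values only, not taken from a real dataset).
temperature = np.array([1.0e4, 5.0e5, 2.0e6, 8.0e6, 3.0e7])
density = np.array([1.0e-26, 5.0e-27, 2.0e-28, 4.0e-26, 1.0e-29])

# Each comparison yields a boolean array of the same shape.
hot = temperature > 1.0e6
dense = density > 1.0e-27

# Stack masks element-wise with & (and), | (or), and ~ (not).
# When writing the comparisons inline, parenthesize each one,
# because & and | bind more tightly than > and <.
hot_and_dense = hot & dense
hot_or_dense = hot | dense
hot_not_dense = hot & ~dense

print("Hot and dense temperatures:", temperature[hot_and_dense])
print("Hot but not dense temperatures:", temperature[hot_not_dense])
print("Number of hot or dense cells:", hot_or_dense.sum())
```

Because every mask has the same shape as the arrays it was built from, a mask derived from one field can be used to index any other field of the same data object.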
Cut regions are a more general solution to filtering mesh fields. The output of a cut region is an entirely new data object, which can be treated like any other data object to generate images, examine its values, etc.