sage_analysis.Model

This module contains the Model class. The Model class holds all the data paths, cosmology, etc., used to calculate galaxy properties.

To read SAGE data, we make use of specialized Data Classes (e.g., SageBinaryData and SageHdf5Data). We refer to ../user/data_class for more information about adding your own Data Class to ingest data.

To calculate (and plot) extra properties from the SAGE output, we refer to ../user/calc.rst and ../user/plotting.rst.

class sage_analysis.model.Model(sage_file: str, sage_output_format: Optional[str], label: Optional[str], first_file_to_analyze: int, last_file_to_analyze: int, num_sage_output_files: Optional[int], random_seed: Optional[int], IMF: str, plot_toggles: Dict[str, bool], plots_that_need_smf: List[str], sample_size: int = 1000, sSFRcut: float = -11.0)

Handles all the galaxy data (including calculated properties) for a SAGE model.

The ingestion of data is handled by individual Data Classes (e.g., SageBinaryData and SageHdf5Data). We refer to ../user/data_class for more information about adding your own Data Class to ingest data.

__init__(sage_file: str, sage_output_format: Optional[str], label: Optional[str], first_file_to_analyze: int, last_file_to_analyze: int, num_sage_output_files: Optional[int], random_seed: Optional[int], IMF: str, plot_toggles: Dict[str, bool], plots_that_need_smf: List[str], sample_size: int = 1000, sSFRcut: float = -11.0)

Sets the galaxy path and number of files to be read for a model. Also initialises the plot toggles that dictate which properties will be calculated.

Parameters:
  • label (str, optional) – The label that will be placed on the plots for this model. If not specified, will use FileNameGalaxies read from sage_file.

  • sage_output_format (str, optional) – The format SAGE wrote its output in (e.g., "sage_binary" or "sage_hdf5"). If not specified, will use the OutputFormat read from sage_file.

  • num_sage_output_files (int, optional) – Specifies the number of output files that were generated by running SAGE. This can be different to the range specified by [first_file_to_analyze, last_file_to_analyze].

    Notes

    This variable only needs to be specified if sage_output_format is sage_binary.

  • sample_size (int, optional) – Specifies the length of the entries of properties that are stored as 1-dimensional ndarray. These properties are initialized using init_scatter_properties().

  • sSFRcut (float, optional) – The specific star formation rate above which a galaxy is flagged as “star forming”. Units are log10.

calc_properties(calculation_functions, gals, snapshot: int)

Calculates galaxy properties for a single file of galaxies.

Parameters:
  • calculation_functions (dict [string, function]) – Specifies the functions used to calculate the properties. All functions in this dictionary are called on the galaxies. The function signature is required to be func(Model, gals)
  • gals (exact format given by the Model Data Class.) – The galaxies for this file.
  • snapshot (int) – The snapshot that we’re calculating properties for.

Notes

If sage_output_format is sage_binary, gals is a numpy structured array. If sage_output_format is sage_hdf5, gals is an open HDF5 group. We refer to ../user/data_class for more information about adding your own Data Class to ingest data.
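The func(Model, gals) contract can be illustrated with a minimal sketch for sage_binary-style data. Everything here is hypothetical: SimpleModel is a stand-in for Model, calc_SMF_sketch is not the package's calc_SMF(), and the StellarMass field name, its assumed units of 1e10 Msun/h, the "snapshot_<N>" key format and the bin edges are illustrative assumptions.

```python
import numpy as np


class SimpleModel:
    """Hypothetical stand-in for Model with only the attributes used below."""

    def __init__(self, snapshot, hubble_h):
        self.snapshot = snapshot
        self.hubble_h = hubble_h
        # Stellar mass bin edges in log10(Msun): 8.0, 8.5, ..., 12.0.
        self.bins = {"stellar_mass_bins": np.arange(8.0, 12.5, 0.5)}
        num_bins = len(self.bins["stellar_mass_bins"]) - 1
        self.properties = {f"snapshot_{snapshot}": {"SMF": np.zeros(num_bins)}}


def calc_SMF_sketch(model, gals):
    """Hypothetical calculation function following func(Model, gals)."""
    # Assumption: StellarMass is stored in units of 1e10 Msun/h.
    stellar_mass = gals["StellarMass"] * 1.0e10 / model.hubble_h
    log_mass = np.log10(stellar_mass[stellar_mass > 0.0])
    counts, _ = np.histogram(log_mass, bins=model.bins["stellar_mass_bins"])
    # Accumulate, since the function is called once per sub-file.
    model.properties[f"snapshot_{model.snapshot}"]["SMF"] += counts


# Usage with three made-up galaxies (one has zero stellar mass and is skipped).
gals = np.zeros(3, dtype=[("StellarMass", np.float32)])
gals["StellarMass"] = [0.1, 1.0, 0.0]
model = SimpleModel(snapshot=63, hubble_h=0.7)
calc_SMF_sketch(model, gals)
```

Accumulating with += rather than assigning is what lets the same function be called once per sub-file without losing earlier counts.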

calc_properties_all_files(calculation_functions, snapshot: int, close_file: bool = True, use_pbar: bool = True, debug: bool = False)

Calculates galaxy properties for all files of a single Model.

Parameters:
  • calculation_functions (dict [string, list(function, dict[string, variable])]) – Specifies the functions used to calculate the properties of this Model. The key of this dictionary is the name of the plot toggle. The value is a list with the 0th element being the function and the 1st element being a dictionary of additional keyword arguments to be passed to the function. The inner dictionary is keyed by the keyword argument names with the value specifying the keyword argument value.

    All functions in this dictionary are called after the galaxies for each sub-file have been loaded. The function signature is required to be func(Model, gals, <Extra Keyword Arguments>).

  • snapshot (int) – The snapshot that we’re calculating properties for.

  • close_file (boolean, optional) – Some data formats read all data from a single file rather than opening and closing the sub-files in read_gals(). Hence, once the properties are calculated, that file must be closed. This variable flags whether the Data Class-specific close_file() method should be called upon completion of this method.

  • use_pbar (Boolean, optional) – If set, uses the tqdm package to create a progress bar.

  • debug (Boolean, optional) – If set, prints out extra useful debug information.
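The per-file loop described above can be outlined as follows. This is a sketch under stated assumptions, not the package's implementation: read_gals is a hypothetical callable standing in for the Data Class reader, and progress bars, debug output and close_file() handling are omitted. It shows how each [function, keyword-argument dictionary] pair is unpacked and called on every sub-file in the inclusive range [first_file_to_analyze, last_file_to_analyze].

```python
def calc_properties_all_files_sketch(model, calculation_functions, read_gals,
                                     first_file_to_analyze, last_file_to_analyze):
    """Hypothetical outline: load each sub-file, then run every calculation on it."""
    for file_num in range(first_file_to_analyze, last_file_to_analyze + 1):
        gals = read_gals(file_num)
        for func, extra_kwargs in calculation_functions.values():
            func(model, gals, **extra_kwargs)


# Usage: a made-up function that records what it was called with.
calls = []


def calc_record(model, gals, calc_sub_populations=False):
    calls.append((len(gals), calc_sub_populations))


calculation_functions = {"SMF": [calc_record, {"calc_sub_populations": True}]}
fake_files = {0: [1, 2, 3], 1: [4, 5]}  # hypothetical galaxies per sub-file
calc_properties_all_files_sketch(None, calculation_functions, fake_files.get, 0, 1)
```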

init_binned_properties(bin_low: float, bin_high: float, bin_width: float, bin_name: str, property_names: List[str], snapshot: int)

Initializes the properties (and respective bins) that will be binned on some variable. For example, the stellar mass function (SMF) describes the number of galaxies within each stellar mass bin.

bins can be accessed via Model.bins["bin_name"] and are initialized as ndarray. properties can be accessed via Model.properties["property_name"] and are initialized using numpy.zeros.

Parameters:
  • bin_low, bin_high, bin_width (floats) – Values that define the minimum, maximum and width of the bins respectively. This defines the binning axis that the property_names properties will be binned on.
  • bin_name (string) – Name of the binning axis, accessed by Model.bins["bin_name"].
  • property_names (list of strings) – Name of the properties that will be binned along the defined binning axis. Properties can be accessed using Model.properties["property_name"]; e.g., Model.properties["SMF"] would return the stellar mass function that is binned using the bin_name bins.
  • snapshot (int) – The snapshot we’re initialising the properties for.
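A minimal sketch of this initialization, assuming the bin edges are built with numpy.arange so that bin_high is the final edge, and that each binned property holds one value per bin (one fewer than the number of edges). The function name and its explicit bins/properties arguments are hypothetical; the real method stores these on the Model instance.

```python
import numpy as np


def init_binned_properties_sketch(bins, properties, bin_low, bin_high, bin_width,
                                  bin_name, property_names, snapshot):
    """Hypothetical: create bin edges and zero-filled, per-bin properties."""
    # Assumption: bin_high is included as the last edge.
    bins[bin_name] = np.arange(bin_low, bin_high + bin_width, bin_width)
    snapshot_props = properties.setdefault(f"snapshot_{snapshot}", {})
    for name in property_names:
        # One value per bin, i.e. one fewer than the number of edges.
        snapshot_props[name] = np.zeros(len(bins[bin_name]) - 1)


bins, properties = {}, {}
init_binned_properties_sketch(bins, properties, 8.0, 12.0, 0.5,
                              "stellar_mass_bins", ["SMF", "red_SMF"], 63)
```

With a step that is not exactly representable in binary (e.g., 0.1), numpy.arange can gain or lose an edge through rounding, so numpy.linspace may be a safer choice in that case.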
init_scatter_properties(property_names: List[str], snapshot: int)

Initializes the properties that will be extended as ndarray. These are used to plot, e.g., the star formation rate versus stellar mass for a subset of sample_size galaxies. Initializes as empty ndarray.

Parameters:
  • property_names (list of strings) – Name of the properties that will be extended as ndarray.
  • snapshot (int) – The snapshot we’re initialising the properties for.
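A sketch of how scatter properties start empty and are extended as each sub-file contributes its selected galaxies. The property names and values are hypothetical, and numpy.append (which returns a new array on each call) is assumed here for clarity rather than taken from the package.

```python
import numpy as np

snapshot_props = {}
for name in ["scatter_stellar_mass", "scatter_sSFR"]:  # hypothetical names
    snapshot_props[name] = np.array([])

# Each sub-file appends the values of its randomly selected galaxies.
for file_values in ([10.1, 10.5], [9.8]):  # made-up log10 stellar masses
    snapshot_props["scatter_stellar_mass"] = np.append(
        snapshot_props["scatter_stellar_mass"], file_values
    )
```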
init_single_properties(property_names: List[str], snapshot: int) → None

Initializes the properties that are described using a single number. This is used to plot, e.g., the sum of stellar mass across all galaxies. Initializes as 0.0.

Parameters:
  • property_names (list of strings) – Name of the properties that will be described using a single number.
  • snapshot (int) – The snapshot we’re initialising the properties for.
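Single-number properties start at 0.0 and accumulate a running total as each sub-file is processed. The sketch below uses a hypothetical "SMD" (summed stellar mass) entry and made-up masses in Msun; it illustrates the pattern only.

```python
snapshot_props = {}
for name in ["SMD"]:  # hypothetical property name
    snapshot_props[name] = 0.0

# Each sub-file adds its contribution to the running total.
for file_masses in ([1.0e9, 2.0e9], [5.0e8]):  # made-up stellar masses
    snapshot_props["SMD"] += sum(file_masses)
```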
select_random_galaxy_indices(inds: numpy.ndarray, num_inds_selected_already: int) → numpy.ndarray

Selects random indices (representing galaxies) from inds. This method assumes that the total number of galaxies selected across all SAGE files analyzed is sample_size and that (preferably) these galaxies should be selected equally amongst all files analyzed.

For example, if we are analyzing 8 SAGE output files and wish to select 10,000 galaxies, this function would hence select 1,250 indices from inds.

If the length of inds is less than the number of requested values (e.g., inds only contains 1,000 values), then the next file analyzed will attempt to select 1,500 random galaxies (the base 1,250 plus an additional 250, because the previous file could not supply enough galaxies).

At the end of the analysis, if there have not been enough galaxies selected, then a message is sent to the user.
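The carry-over rule above can be sketched by computing a cumulative target after each file (sample_size times the fraction of files processed so far) and requesting whatever is still missing, capped by the number of available indices. The function name, its extra arguments (sample_size, num_files, files_done, rng) and the use of numpy.random.Generator.choice are assumptions for illustration; the real method reads its settings from the Model. The usage reproduces the worked example: 8 files, 10,000 galaxies, and a first file with only 1,000 candidates.

```python
import numpy as np


def select_random_galaxy_indices_sketch(inds, num_inds_selected_already,
                                        sample_size, num_files, files_done, rng):
    """Hypothetical sketch of the carry-over selection described above.

    files_done is the number of files already processed before this one.
    """
    # Cumulative target after this file, e.g. 1,250, 2,500, ... for 10,000 over 8 files.
    target = sample_size * (files_done + 1) // num_files
    num_to_select = target - num_inds_selected_already
    # Never ask for more indices than this file has (or a negative number).
    num_to_select = max(0, min(num_to_select, len(inds)))
    return rng.choice(inds, size=num_to_select, replace=False)


rng = np.random.default_rng(666)  # plays the role of random_seed
first = select_random_galaxy_indices_sketch(np.arange(1000), 0, 10000, 8, 0, rng)
second = select_random_galaxy_indices_sketch(np.arange(5000), len(first), 10000, 8, 1, rng)
print(len(first), len(second))  # 1000 1500
```

Tracking a cumulative target, rather than a fixed per-file quota, is what makes any shortfall automatically roll over to the next file.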

IMF

The initial mass function.

Type:{"Chabrier", "Salpeter"}
base_sage_data_path

Base path to the output data. This is the path without specifying any extra information about redshift or the file extension itself.

Type:string
bins

The bins used to bin some properties. Bins are initialized through init_binned_properties(). The key is the name of the bin (bin_name in init_binned_properties()).

Type:dict [string, ndarray ]
box_size

Size of the simulation box. Units are Mpc/h.

Type:float
calculation_functions

A dictionary of functions that are used to compute the properties of galaxies. Here, the key is the name of the toggle (e.g., "SMF") and the value is a tuple containing the function itself (e.g., calc_SMF()) and a dictionary of optional keyword arguments for that function, keyed by variable name (e.g., "calc_sub_populations") with the variable value as the value (e.g., True).

Type:dict[str, tuple[func, dict[str, any]]]
first_file_to_analyze

The first SAGE sub-file to be read. If sage_output_format is sage_binary, files read must be labelled sage_data_path.XXX. If sage_output_format is sage_hdf5, the file read will be sage_data_path and the groups accessed will be Core_XXX. In both cases, XXX represents the numbers in the range [first_file_to_analyze, last_file_to_analyze] inclusive.

Type:int
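To make the naming scheme concrete, here is a hypothetical helper that lists what would be read for each sub-file. It follows the sage_data_path.XXX and Core_XXX templates given above and assumes XXX is the plain, unpadded file number; the helper name and the example path are made up.

```python
def sub_file_targets(sage_data_path, sage_output_format,
                     first_file_to_analyze, last_file_to_analyze):
    """Hypothetical: names of the files (binary) or groups (HDF5) to read."""
    file_nums = range(first_file_to_analyze, last_file_to_analyze + 1)  # inclusive
    if sage_output_format == "sage_binary":
        return [f"{sage_data_path}.{num}" for num in file_nums]
    if sage_output_format == "sage_hdf5":
        # Groups inside the single file sage_data_path.
        return [f"Core_{num}" for num in file_nums]
    raise ValueError(f"Unknown sage_output_format: {sage_output_format}")


print(sub_file_targets("output/model", "sage_binary", 0, 2))
# ['output/model.0', 'output/model.1', 'output/model.2']
```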
hubble_h

Value of the dimensionless Hubble parameter. That is, H0 = 100*hubble_h km/s/Mpc.

Type:float
label

Label that will go on axis legends for this Model.

Type:string
last_file_to_analyze

The last SAGE sub-file to be read. If sage_output_format is sage_binary, files read must be labelled sage_data_path.XXX. If sage_output_format is sage_hdf5, the file read will be sage_data_path and the groups accessed will be Core_XXX. In both cases, XXX represents the numbers in the range [first_file_to_analyze, last_file_to_analyze] inclusive.

Type:int
num_gals_all_files

Number of galaxies across all files. For HDF5 data formats, this represents the number of galaxies across all Core_XXX sub-groups.

Type:int
num_sage_output_files

The number of files that SAGE wrote. This will be equal to the number of processors that SAGE ran with.

Notes

If sage_output_format is sage_hdf5, this attribute is not required.

Type:int
output_path

Path to where some plots will be saved. Used for plot_spatial_3d().

Type:string
parameter_dirpath

The directory path to where the SAGE parameter file is located. This is only the base directory path and does not include the name of the file itself.

Type:str
plot_toggles

Specifies which plots should be created for this model. This will control which properties should be calculated; e.g., if no stellar mass function is to be plotted, the stellar mass function will not be computed.

Type:dict[str, bool]
plots_that_need_smf

Specifies the plot toggles that require the stellar mass function to be properly computed and analyzed. For example, plotting the quiescent fraction of galaxies requires knowledge of the total number of galaxies. The strings here must EXACTLY match the keys in plot_toggles.

Type:list of str
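A minimal sketch of how plots_that_need_smf might be used to decide whether the SMF must be computed. The helper name and the toggle names other than "SMF" are hypothetical. Because the lookup uses plot_toggles.get(name, False), a misspelled entry silently counts as disabled, which is why the strings must exactly match the keys in plot_toggles.

```python
def smf_required(plot_toggles, plots_that_need_smf):
    """Hypothetical: True if any enabled plot depends on the SMF."""
    return any(plot_toggles.get(name, False) for name in plots_that_need_smf)


plot_toggles = {"SMF": False, "quiescent": True, "bulge_fraction": False}
print(smf_required(plot_toggles, ["SMF", "quiescent"]))  # True
print(smf_required(plot_toggles, ["SMF", "Quiescent"]))  # False (case mismatch)
```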
properties

The galaxy properties stored across the input files and snapshots. These properties are updated within the respective calc_<plot_toggle> functions.

The outer key is "snapshot_XX", where XX is the snapshot number for the property. The inner key is the name of the property (e.g., "SMF").

Type:dict [string, dict [string, ndarray ]] or dict [string, dict [string, float]]
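The nested layout can be pictured as below. The snapshot numbers, property names and values are invented for illustration, and the "snapshot_<N>" key format is assumed to match the "snapshot_XX" description above.

```python
import numpy as np

properties = {
    "snapshot_63": {"SMF": np.zeros(8), "SMD": 0.0},  # hypothetical contents
    "snapshot_37": {"SMF": np.zeros(8), "SMD": 0.0},
}

smf_snapshot_63 = properties["snapshot_63"]["SMF"]  # binned property (ndarray)
smd_snapshot_37 = properties["snapshot_37"]["SMD"]  # single-number property (float)

# Recover the snapshot numbers from the outer keys.
snapshots = sorted(int(key.split("_")[1]) for key in properties)
```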
random_seed

Specifies the seed used for the random number generator, used to select galaxies for plotting purposes. If None, then uses default call to seed().

Type:Optional[int]
redshifts

Redshifts for this simulation.

Type:ndarray
sSFRcut

The specific star formation rate above which a galaxy is flagged as “star forming”. Units are log10.

Type:float
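A sketch of the star-forming flag, assuming the specific star formation rate is SFR divided by stellar mass, with SFR in Msun/yr and stellar mass in Msun (so sSFRcut is compared against log10 of a rate in yr^-1), and that galaxies with zero SFR or zero mass are treated as not star forming. The function name and these conventions are assumptions, not taken from the package.

```python
import numpy as np


def star_forming_mask(sfr, stellar_mass, sSFRcut=-11.0):
    """Hypothetical: flag galaxies with log10(SFR / M*) above sSFRcut."""
    sfr = np.asarray(sfr, dtype=float)
    stellar_mass = np.asarray(stellar_mass, dtype=float)
    valid = (sfr > 0.0) & (stellar_mass > 0.0)
    # Galaxies without a valid sSFR get -inf, so they are never flagged.
    log_ssfr = np.full(sfr.shape, -np.inf)
    log_ssfr[valid] = np.log10(sfr[valid] / stellar_mass[valid])
    return log_ssfr > sSFRcut


print(star_forming_mask([1.0, 0.001, 0.0], [1e10, 1e10, 1e10]))  # [ True False False]
```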
sage_data_path

Path to the output data. If sage_output_format is sage_binary, files read must be labelled sage_data_path.XXX. If sage_output_format is sage_hdf5, the file read will be sage_data_path and the groups accessed will be Core_XXX at the snapshot given by snapshot. In both cases, XXX represents the numbers in the range [first_file_to_analyze, last_file_to_analyze] inclusive.

Type:string
sage_file

The path to where the SAGE .ini file is located.

Type:str
sage_output_format

The output format SAGE wrote in. A specific Data Class (e.g., SageBinaryData and SageHdf5Data) must be written and used for each sage_output_format option. We refer to ../user/data_class for more information about adding your own Data Class to ingest data.

Type:{"sage_binary", "sage_hdf5"}
sample_size

Specifies the length of the entries of properties that are stored as 1-dimensional ndarray. These properties are initialized using init_scatter_properties().

Type:int
snapshot

Specifies the snapshot to be read. If sage_output_format is sage_hdf5, this specifies the HDF5 group to be read. Otherwise, if sage_output_format is sage_binary, this attribute will be used to index redshifts and generate the suffix for sage_data_path.

Type:int
volume

Volume spanned by the trees analyzed by this model. This depends upon the number of files processed, [first_file_to_analyze, last_file_to_analyze], relative to the total number of files the simulation spans over, num_sim_tree_files.

Notes

This is not necessarily box_size cubed. It is possible that this model is only analysing a subset of files and hence the volume will be less.

Type:float
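Under the assumption that the trees are spread evenly across the files, the analyzed volume scales with the fraction of files processed. The helper below is hypothetical and only illustrates that proportionality; num_sim_tree_files is the total number of tree files the simulation spans, as above.

```python
def analyzed_volume(box_size, first_file_to_analyze, last_file_to_analyze,
                    num_sim_tree_files):
    """Hypothetical: box volume scaled by the fraction of tree files analyzed."""
    num_files_analyzed = last_file_to_analyze - first_file_to_analyze + 1  # inclusive
    return box_size ** 3 * num_files_analyzed / num_sim_tree_files


# A made-up 62.5 Mpc/h box, analyzing files 0 to 3 out of 8.
print(analyzed_volume(62.5, 0, 3, 8))  # 122070.3125, in (Mpc/h)^3
```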