Miscellaneous

Utilities

class omnigenbench.src.misc.utils.RNA2StructureCache(cache_file=None, *args, **kwargs)[source]

Bases: dict

A cache for RNA secondary structure predictions using ViennaRNA.

This class provides a caching mechanism for RNA secondary structure predictions to avoid redundant computations. It supports both single sequence and batch processing with optional multiprocessing for improved performance.

Variables:
  • cache (dict) – Dictionary storing sequence-structure mappings

  • cache_file (str) – Path to the cache file on disk

  • queue_num (int) – Counter for tracking cache updates

fold(sequence, return_mfe=False, num_workers=1)[source]

Predicts RNA secondary structure for given sequences.

This method predicts RNA secondary structures using ViennaRNA. It supports both single sequences and batches of sequences. The method uses caching to avoid redundant predictions and supports multiprocessing for batch processing on non-Windows systems.

Parameters:
  • sequence (str or list) – A single RNA sequence or a list of sequences.

  • return_mfe (bool) – Whether to return minimum free energy along with structure. Defaults to False.

  • num_workers (int) – Number of worker processes for batch processing. Defaults to 1. Set to None for auto-detection.

Returns:

str or list

The predicted structure(s). If return_mfe is True, returns tuples of (structure, mfe).

Example

>>> cache = RNA2StructureCache()
>>> # Predict structure for a single sequence
>>> structure = cache.fold("GGGAAAUCC")
>>> print(structure)  # "(((...)))"
>>> # Predict structures for multiple sequences
>>> structures = cache.fold(["GGGAAAUCC", "AUUGCUAA"])
>>> print(structures)  # ["(((...)))", "........"]
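
The caching behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `StructureCache` and `placeholder_predict` are hypothetical names, and the placeholder returns an all-unpaired structure instead of calling ViennaRNA, so that the example runs without external dependencies.

```python
class StructureCache(dict):
    """Hypothetical dict-backed memo for structure predictions."""

    def __init__(self, predict_fn):
        super().__init__()
        self.predict_fn = predict_fn  # stand-in for a real folding call
        self.misses = 0               # number of predictions actually computed

    def fold(self, sequence):
        # Accept a single sequence or a list, mirroring the documented interface.
        if isinstance(sequence, str):
            return self._fold_one(sequence)
        return [self._fold_one(s) for s in sequence]

    def _fold_one(self, seq):
        if seq not in self:
            self[seq] = self.predict_fn(seq)
            self.misses += 1
        return self[seq]


def placeholder_predict(seq):
    # Placeholder only: an all-unpaired structure of the same length.
    return "." * len(seq)


cache = StructureCache(placeholder_predict)
first = cache.fold(["GGGAAAUCC", "AUUGCUAA", "GGGAAAUCC"])
second = cache.fold("GGGAAAUCC")  # served from the cache, no new prediction
```

Only two predictions are computed here, because the repeated sequence is looked up rather than folded again.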
update_cache_file(cache_file=None)[source]

Updates the cache file on disk.

This method saves the in-memory cache to disk. It only saves when the queue_num reaches 100 to avoid excessive disk I/O.

Parameters:

cache_file (str, optional) – Path to the cache file. If None, uses the instance’s cache_file.

Example

>>> cache.update_cache_file()  # Writes to disk once queue_num reaches 100
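
A rough sketch of the "write only after enough updates" idea follows. The class name `ThrottledStore`, the `pending` counter, the `force` flag, and the JSON file format are all assumptions made for illustration; the real class may store and count differently.

```python
import json
import os
import tempfile


class ThrottledStore(dict):
    """Hypothetical store that writes to disk every `flush_every` updates."""

    def __init__(self, path, flush_every=100):
        super().__init__()
        self.path = path
        self.flush_every = flush_every
        self.pending = 0  # updates since the last write

    def put(self, key, value):
        self[key] = value
        self.pending += 1
        self.save()

    def save(self, force=False):
        # Skip the write until enough updates have accumulated.
        if not force and self.pending < self.flush_every:
            return False
        with open(self.path, "w") as fh:
            json.dump(self, fh)
        self.pending = 0
        return True


path = os.path.join(tempfile.mkdtemp(), "cache.json")
store = ThrottledStore(path, flush_every=3)
store.put("A", ".")
store.put("AU", "..")
written_early = os.path.exists(path)  # only two updates so far
store.put("AUG", "...")
written_late = os.path.exists(path)   # third update triggers the write
```

Batching writes this way trades durability for speed: updates made since the last write are lost if the process exits before the threshold is reached.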
omnigenbench.src.misc.utils.check_bench_version(bench_version, omnigenbench_version)[source]

Check if benchmark version is compatible with OmniGenBench version.

This function compares the benchmark version with the OmniGenBench version to ensure compatibility and warns if there are potential issues.

Parameters:
  • bench_version (str) – Version of the benchmark

  • omnigenbench_version (str) – Version of OmniGenBench

Example

>>> check_bench_version("0.2.0", "0.3.0")
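
One plausible way to perform such a check is sketched below. The comparison rule (warn when the benchmark is newer than the library) and the names `parse_version` and `warn_if_newer` are assumptions; the source does not state which mismatches trigger a warning.

```python
import warnings


def parse_version(version):
    # Keep only the leading numeric components, e.g. "0.3.0" -> (0, 3, 0).
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def warn_if_newer(bench_version, library_version):
    """Hypothetical check: warn when the benchmark is newer than the library."""
    if parse_version(bench_version) > parse_version(library_version):
        warnings.warn(
            f"Benchmark version {bench_version} is newer than "
            f"library version {library_version}."
        )
        return False
    return True


ok = warn_if_newer("0.2.0", "0.3.0")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    not_ok = warn_if_newer("0.10.0", "0.9.1")
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.9.1".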
omnigenbench.src.misc.utils.clean_temp_checkpoint(days_threshold=7)[source]

Clean up temporary checkpoint files older than specified days.

This function removes temporary checkpoint files that are older than the specified threshold to free up disk space.

Parameters:

days_threshold (int) – Number of days after which files are considered old. Defaults to 7.

Example

>>> clean_temp_checkpoint(3)  # Remove files older than 3 days
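
Age-based cleanup of this kind can be sketched with the standard library alone. `remove_old_files` is a hypothetical helper that takes the directory explicitly; which directories and file names the real function inspects is not stated in the source.

```python
import os
import tempfile
import time


def remove_old_files(directory, days_threshold=7, now=None):
    """Hypothetical helper: delete regular files older than the threshold."""
    now = time.time() if now is None else now
    cutoff = now - days_threshold * 86400  # 86400 seconds per day
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


# Demonstration in a throwaway directory with one backdated file.
workdir = tempfile.mkdtemp()
for name in ("old.ckpt", "new.ckpt"):
    with open(os.path.join(workdir, name), "w") as fh:
        fh.write("x")
ten_days_ago = time.time() - 10 * 86400
os.utime(os.path.join(workdir, "old.ckpt"), (ten_days_ago, ten_days_ago))
removed = remove_old_files(workdir, days_threshold=3)
```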
omnigenbench.src.misc.utils.clean_temp_dir_pt_files()[source]

Clean up temporary PyTorch files in the current directory.

This function removes temporary PyTorch files (like .pt, .pth files) that may be left over from previous runs.

Example

>>> clean_temp_dir_pt_files()
omnigenbench.src.misc.utils.env_meta_info()[source]

Collects metadata about the current environment and library versions.

This function gathers information about the current Python environment, including versions of key libraries like PyTorch and Transformers, as well as OmniGenBench version information.

Returns:

dict

A dictionary containing environment metadata including:
  • library_name: Name of the OmniGenBench library

  • omnigenbench_version: Version of OmniGenBench

  • torch_version: PyTorch version with CUDA info

  • transformers_version: Transformers library version

Example

>>> metadata = env_meta_info()
>>> print(metadata['torch_version'])  # "2.0.0+cu118+git..."
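
For readers who want similar metadata without the library, the following sketch collects installed package versions through `importlib.metadata`. The function name `collect_versions` and the choice to report missing packages as `None` are assumptions; the keys also differ from the documented return value, which includes CUDA details.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers")):
    """Hypothetical collector; missing packages are reported as None."""
    info = {"python_version": platform.python_version()}
    for name in packages:
        try:
            info[f"{name}_version"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[f"{name}_version"] = None
    return info


info = collect_versions()
```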
omnigenbench.src.misc.utils.fprint(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)[source]

Enhanced print function with automatic flushing. It provides a print-like interface and flushes the stream so that output is displayed immediately, which is useful for real-time logging and progress tracking.

Parameters:
  • *objects – Objects to print

  • sep (str) – Separator between objects (default: " ")

  • end (str) – String appended after the last value (default: "\n")

  • file – File-like object to write to (default: sys.stdout)

  • flush (bool) – Whether to flush the stream (default: False)

Example

>>> fprint("Training started...", flush=True)
>>> fprint("Epoch 1/10", "Loss: 0.5", sep=" | ")
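
A print wrapper of this kind can be approximated in a few lines. In this sketch `flushing_print` is a hypothetical name, and it defaults to `flush=True`, which is an assumption that differs from the documented signature.

```python
import io
import sys


def flushing_print(*objects, sep=" ", end="\n", file=None, flush=True):
    """Hypothetical print wrapper that flushes by default."""
    # Resolve sys.stdout at call time so later redirection is respected.
    stream = sys.stdout if file is None else file
    print(*objects, sep=sep, end=end, file=stream, flush=flush)


buffer = io.StringIO()
flushing_print("Epoch 1/10", "Loss: 0.5", sep=" | ", file=buffer)
output = buffer.getvalue()
```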
omnigenbench.src.misc.utils.load_module_from_path(module_name, file_path)[source]

This function dynamically loads a Python module from a file path, useful for loading configuration files or custom modules.

Parameters:
  • module_name (str) – Name to assign to the loaded module

  • file_path (str) – Path to the Python file to load

Returns:

module – The loaded module object

Example

>>> config = load_module_from_path("config", "config.py")
>>> print(config.some_variable)
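
Dynamic loading from a path is commonly done with `importlib.util`. The sketch below shows that standard-library pattern; `load_from_path` is a hypothetical stand-in, and whether the library function uses the same calls is not confirmed by the source.

```python
import importlib.util
import os
import tempfile


def load_from_path(module_name, file_path):
    """Sketch of dynamic loading with importlib (standard library)."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {module_name!r} from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demonstration with a throwaway configuration file.
config_path = os.path.join(tempfile.mkdtemp(), "config.py")
with open(config_path, "w") as fh:
    fh.write("learning_rate = 0.001\n")
config = load_from_path("config", config_path)
```

Note that `exec_module` executes the file, so this should only be used on trusted paths.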
omnigenbench.src.misc.utils.naive_secondary_structure_repair(sequence, structure)[source]

This function attempts to repair malformed RNA secondary structure representations by ensuring proper bracket matching. It handles common issues like unmatched brackets by converting them to dots.

Parameters:
  • sequence (str) – A string representing the sequence.

  • structure (str) – A string representing the secondary structure.

Returns:

str – A string representing the repaired secondary structure.

Example

>>> sequence = "GGGAAAUCC"
>>> structure = "(((...)"  # Malformed structure
>>> repaired = naive_secondary_structure_repair(sequence, structure)
>>> print(repaired)  # Unmatched brackets are replaced with dots
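
The bracket-matching idea can be sketched with a stack of open positions. This is an illustrative reimplementation under stated assumptions, not the library's code: `repair_dot_bracket` is a hypothetical name, it ignores the sequence argument, and it handles only round brackets.

```python
def repair_dot_bracket(structure):
    """Hypothetical repair: replace unmatched brackets with dots."""
    chars = list(structure)
    open_positions = []  # indices of "(" still waiting for a partner
    for i, symbol in enumerate(chars):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if open_positions:
                open_positions.pop()
            else:
                chars[i] = "."  # closing bracket with nothing to close
    for i in open_positions:
        chars[i] = "."  # opening brackets that were never closed
    return "".join(chars)


repaired = repair_dot_bracket("(((...)")
print(repaired)  # ..(...)
```

The output keeps the input length, and every remaining bracket has a partner.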
omnigenbench.src.misc.utils.print_args(config, logger=None)[source]

This function prints the arguments from a configuration object to the console or a logger. It’s useful for debugging and logging experiment parameters.

Parameters:
  • config – A Namespace object containing the arguments.

  • logger – A logger object. If None, prints to console.

Example

>>> from argparse import Namespace
>>> config = Namespace(learning_rate=0.001, batch_size=32)
>>> print_args(config)
omnigenbench.src.misc.utils.save_args(config, save_path)[source]

This function saves the arguments from a configuration object to a text file. It’s useful for logging experiment parameters and configurations.

Parameters:
  • config – A Namespace object containing the arguments.

  • save_path (str) – A string representing the path of the file to be saved.

Example

>>> from argparse import Namespace
>>> config = Namespace(learning_rate=0.001, batch_size=32)
>>> save_args(config, "config.txt")
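
As a rough illustration of how Namespace arguments can be listed and written out, the sketch below uses `vars()` to enumerate attributes. The helper names and the "name: value" line format are assumptions; the actual file layout produced by the library may differ.

```python
import os
import tempfile
from argparse import Namespace


def format_args(config):
    # One "name: value" line per attribute, sorted for stable output.
    return [f"{key}: {value}" for key, value in sorted(vars(config).items())]


def write_args(config, save_path):
    """Hypothetical writer; the real file layout may differ."""
    with open(save_path, "w") as fh:
        fh.write("\n".join(format_args(config)) + "\n")


config = Namespace(learning_rate=0.001, batch_size=32)
save_path = os.path.join(tempfile.mkdtemp(), "config.txt")
write_args(config, save_path)
with open(save_path) as fh:
    saved = fh.read()
```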
omnigenbench.src.misc.utils.seed_everything(seed=42)[source]

Sets random seeds for reproducibility across all random number generators. This function sets seeds for Python’s random module, NumPy, PyTorch (CPU and CUDA), and sets the PYTHONHASHSEED environment variable to ensure reproducible results across different runs.

Parameters:

seed (int) – The seed value to use for all random number generators. Defaults to 42.

Example

>>> # Set seeds for reproducibility
>>> seed_everything(42)
>>> # Now all random operations will be reproducible
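
A seeding routine along these lines might look like the following sketch. `seed_all` is a hypothetical name, and treating NumPy and PyTorch as optional imports is an assumption made so that the example runs in any environment.

```python
import os
import random


def seed_all(seed=42):
    """Sketch: seed every generator that is available in this environment."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch


seed_all(42)
first = [random.random() for _ in range(3)]
seed_all(42)
second = [random.random() for _ in range(3)]  # same values as `first`
```

Setting PYTHONHASHSEED at runtime affects only child processes started afterwards; hash randomization for the running interpreter is fixed at startup.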