CLI Commands¶

Overview¶

OmniGenBench provides a comprehensive command-line interface (CLI) for codeless operation, enabling users to quickly execute benchmarking, training, inference, and specialized genomic tasks directly from the terminal without writing Python scripts.

Version: OmniGenBench v0.3.23alpha

Key Features¶

✅ Unified CLI Entry Points:

ogb - Main unified command (recommended)
python -m omnigenbench.cli.omnigenome_cli - Alternative entry point

✅ Core Commands:

autobench - Automated benchmarking on standard datasets
autotrain - Automated training/fine-tuning
autoinfer - Automated inference with fine-tuned models
rna_design - RNA sequence design for target structures

✅ Supported Benchmarks:

RGB - RNA Genomic Benchmark
PGB - Protein Genomic Benchmark
GUE - Genomics Understanding Evaluation
GB - General Benchmark
BEACON - Comprehensive benchmark suite

Quick Start Examples¶

Unified Command (OGB CLI)¶

Recommended: Use the ogb command for all operations:

# Automated benchmarking
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB

# Automated training
ogb autotrain --dataset yangheng/tfb_promoters --model zhihan1996/DNABERT-2-117M

# Automated inference
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCGATCGATCG"

Alternative Entry Point¶

Using the Python module (legacy):

# RNA design (use ogb rna_design instead, recommended)
python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"

# AutoInfer (use ogb autoinfer instead, recommended)
python -m omnigenbench.cli.omnigenome_cli autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCG"

Command Overview¶

AutoBench - Automated Benchmarking¶

Benchmark genomic foundation models on standard datasets:

ogb autobench --model MODEL_NAME --benchmark BENCHMARK_NAME [OPTIONS]

Key Arguments:

--model, -m (required): Model name or path (HF model ID or local path)
--benchmark, -b (required): Benchmark name (RGB, PGB, GUE, GB, BEACON)
--tokenizer, -t: Tokenizer to use (default: same as model)
--trainer: Trainer type (default: accelerate)
--bs_scale: Batch size scale factor for GPU memory utilization
--overwrite: Overwrite existing results

Examples:

# Benchmark OmniGenome on RGB
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB

# Benchmark DNABERT on GUE with custom trainer
ogb autobench --model zhihan1996/DNABERT-2-117M --benchmark GUE --trainer accelerate

# Benchmark with increased batch size
ogb autobench --model yangheng/OmniGenome-186M --benchmark BEACON --bs_scale 4

AutoTrain - Automated Training¶

Automatically train or fine-tune models on genomic datasets:

ogb autotrain --dataset DATASET_NAME --model MODEL_NAME [OPTIONS]

Key Arguments:

--dataset (required): Dataset name or path
--model (required): Pre-trained model name or path
--tokenizer: Tokenizer to use (default: same as model)
--output-dir: Directory to save fine-tuned model
--num-epochs: Number of training epochs
--batch-size: Training batch size
--learning-rate: Learning rate
--trainer: Trainer type (default: accelerate)
--overwrite: Overwrite existing output directory

Examples:

# Basic training
ogb autotrain --dataset yangheng/tfb_promoters --model zhihan1996/DNABERT-2-117M

# Training with custom parameters
ogb autotrain \\
    --dataset ./my_dataset \\
    --model yangheng/OmniGenome-186M \\
    --num-epochs 10 \\
    --batch-size 32 \\
    --learning-rate 5e-5 \\
    --output-dir ./my_model

AutoInfer - Automated Inference¶

Run inference with fine-tuned models on arbitrary sequences:

ogb autoinfer --model MODEL_NAME [--sequence SEQ | --input-file FILE] [OPTIONS]

Key Arguments:

--model (required): Fine-tuned model name or path
--sequence: Input sequence(s) (single sequence, comma-separated, or .txt file)
--input-file: Path to JSON/CSV file with input data
--output-file: Output file for predictions (default: inference_results.json)
--batch-size: Batch size for inference (default: 32)
--device: Device to use (e.g., ‘cuda:0’, ‘cpu’; auto-detected if not specified)

Input Formats:

Single sequence: --sequence "ATCGATCGATCG"
Multiple sequences: --sequence "ATCG,CGTA,TACG"
Text file: --sequence sequences.txt (one per line)
JSON file: --input-file data.json
- Format 1: {"sequences": ["ATCG", "CGTA"]}
- Format 2: {"data": [{"sequence": "ATCG", "id": 1}, ...]}
- Format 3: ["ATCG", "CGTA"]
CSV file: --input-file data.csv (must have ‘sequence’ column)

Examples:

# Single sequence inference
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCGATCGATCG"

# Multiple sequences
ogb autoinfer --model yangheng/ogb_te_finetuned --sequence "ATCG,CGTA,TACG"

# Batch inference from JSON
ogb autoinfer --model yangheng/ogb_tfb_finetuned --input-file sequences.json --batch-size 64

# CSV input with metadata
ogb autoinfer --model yangheng/ogb_tfb_finetuned --input-file data.csv --device cuda:0

# Text file input
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence sequences.txt --output-file results.json

RNA Design¶

Design RNA sequences that fold into specific secondary structures using genetic algorithms combined with masked language modeling:

ogb rna_design --structure STRUCTURE [OPTIONS]

Algorithm Overview:

The RNA design command uses an evolutionary algorithm enhanced with masked language modeling (MLM) to generate RNA sequences that fold into target secondary structures. The process involves:

Initialization: Generate initial population using MLM-guided sequence generation
Evolution: Iteratively improve sequences through:
- Crossover: Multi-point recombination between parent sequences
- Mutation: MLM-guided mutations to explore sequence space
- Selection: Multi-objective optimization using NSGA-II-like sorting
Evaluation: Structure prediction using ViennaRNA with fitness scoring based on:
- Structure similarity (Hamming distance to target)
- Thermodynamic stability (Minimum Free Energy)

Key Arguments:

--structure (required): Target RNA structure in dot-bracket notation (e.g., “(((…)))”)
- Use ( for opening pairs, ) for closing pairs, . for unpaired bases
- Structure length determines sequence length
- Example simple hairpin: "(((...))) (8 nucleotides)
- Example complex structure: "(((..(((...)))..))) (18 nucleotides)
--model: Pre-trained model name or path (default: yangheng/OmniGenome-186M)
- Can use HuggingFace model IDs or local paths
- Larger models may produce better results but run slower
- Recommended: yangheng/OmniGenome-186M for balance of speed and quality
--mutation-ratio: Mutation ratio for genetic algorithm (0.0-1.0, default: 0.5)
- Controls the fraction of nucleotides mutated per generation
- Lower values (0.1-0.3): More conservative, slower convergence
- Higher values (0.5-0.8): More exploration, may be unstable
- Recommended: Start with 0.5, decrease if not converging
--num-population: Population size (default: 100)
- Number of candidate sequences per generation
- Larger populations explore more solutions but run slower
- Recommended: 100-500 depending on structure complexity
- Simple structures (< 20 nt): 100 is sufficient
- Complex structures (> 50 nt): Consider 200-500
--num-generation: Number of generations (default: 100)
- Maximum evolutionary iterations
- Algorithm terminates early if perfect solution found
- Recommended: 50-200 depending on difficulty
- Monitor output to see if more generations are needed
--output-file: Output JSON file to save results
- Saves designed sequences and parameters
- JSON format includes structure, parameters, and best sequences
- If not specified, only prints to console

Output Format:

The command outputs designed sequences to console and optionally saves to JSON file:

{
  "structure": "(((...)))",
  "parameters": {
    "model": "yangheng/OmniGenome-186M",
    "mutation_ratio": 0.5,
    "num_population": 100,
    "num_generation": 100
  },
  "best_sequences": [
    "GCGAAACGC",
    "GCUAAAGCC",
    ...
  ]
}

Examples:

# Basic RNA design for simple hairpin
ogb rna_design --structure "(((...)))"

# Design with custom parameters for better results
ogb rna_design \
    --structure "(((...)))" \
    --model yangheng/OmniGenome-186M \
    --mutation-ratio 0.3 \
    --num-population 200 \
    --num-generation 150 \
    --output-file hairpin_design.json

# Design complex structure (stem-loop-stem)
ogb rna_design \
    --structure "(((..(((...)))..)))" \
    --num-population 300 \
    --num-generation 200 \
    --output-file complex_design.json

# Fast design with smaller population
ogb rna_design \
    --structure "((((....))))" \
    --num-population 50 \
    --num-generation 50 \
    --mutation-ratio 0.6

Legacy Command (still supported):

python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"

Performance Tips:

Start Simple: Test with small structures first to verify functionality
Monitor Progress: The algorithm shows progress with tqdm progress bar
Early Termination: Algorithm stops when perfect match is found (Hamming distance = 0)
GPU Acceleration: Uses GPU automatically if available for MLM inference
Parallel Folding: Enable with parallel=True in Python API for faster structure evaluation

Troubleshooting:

Poor Convergence: Try lowering mutation ratio (0.2-0.3) or increasing population
Too Slow: Reduce population/generations or use smaller model
No Perfect Match: Some structures are difficult; best sequence is still returned
Out of Memory: Reduce batch size in MLM inference or use CPU
Invalid Structure: Ensure balanced parentheses in dot-bracket notation

Common Structure Patterns:

Hairpin: ``”(((…)))”`
Stem-Loop-Stem: ``”(((..(((…)))..)))”`
Multi-loop: ``”(((…(((…)))..(((…))).)))”`
Pseudoknot: Not supported (dot-bracket notation limitation)

Related Python API:

from omnigenbench import OmniModelForRNADesign

model = OmniModelForRNADesign(model="yangheng/OmniGenome-186M")
sequences = model.design(
    structure="(((...)))",
    mutation_ratio=0.5,
    num_population=100,
    num_generation=100
)
print("Designed sequences:", sequences)

Available Benchmarks¶

Benchmark	Description	Key Tasks
RGB	RNA Genomic Benchmark	RNA structure prediction, stability, interactions
PGB	Protein Genomic Benchmark	Protein function, structure, interaction prediction
GUE	Genomics Understanding Evaluation	Comprehensive genomic understanding tasks
GB	General Benchmark	General genomic prediction tasks
BEACON	Comprehensive Benchmark Suite	Long-range dependencies, context understanding

Trainer Types¶

All training and benchmarking commands support multiple trainer backends:

Trainer	Description	Use Case
accelerate	HuggingFace Accelerate (default)	Multi-GPU distributed training
native	Native PyTorch trainer	Single-GPU, debugging
hf_trainer	HuggingFace Trainer	Full HF ecosystem integration

Example:

# Use Accelerate trainer (recommended for multi-GPU)
ogb autotrain --dataset data --model model --trainer accelerate

# Use native trainer (for single-GPU)
ogb autotrain --dataset data --model model --trainer native

Best Practices¶

✅ Recommended Workflows

Model Benchmarking:

# Benchmark model on standard dataset
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB

Model Fine-tuning:

# Fine-tune on custom dataset
ogb autotrain --dataset ./my_data --model yangheng/OmniGenome-186M --num-epochs 10

Inference Pipeline:

# Inference on new sequences
ogb autoinfer --model ./my_finetuned_model --input-file new_data.csv

❌ Common Pitfalls

Not providing either --sequence or --input-file for autoinfer
Using invalid benchmark names
Forgetting to specify output directory when fine-tuning
Using batch size too large for GPU memory

Detailed API Documentation¶

Below are the detailed API references for each CLI component.

OGB CLI (Unified Entry Point)¶

OmniGenBench (OGB) Command Line Interface

This is the main entry point for all OmniGenBench CLI commands. It provides four main subcommands: - autobench: Automated benchmarking of genomic foundation models - autotrain: Automated training/fine-tuning of models - autoinfer: Automated inference with fine-tuned models - rna_design: RNA sequence design for target structures

omnigenbench.cli.ogb_cli.create_autobench_parser(subparsers)[source]: Create the autobench subcommand parser.

omnigenbench.cli.ogb_cli.create_autoinfer_parser(subparsers)[source]: Create the autoinfer subcommand parser.

omnigenbench.cli.ogb_cli.create_autotrain_parser(subparsers)[source]: Create the autotrain subcommand parser.

omnigenbench.cli.ogb_cli.create_rna_design_parser(subparsers)[source]: Create the parser for RNA design command.

omnigenbench.cli.ogb_cli.main()[source]

Main entry point for the OGB (OmniGenBench) CLI.

This provides a unified interface for all OmniGenBench command-line tools.

omnigenbench.cli.ogb_cli.run_autobench(args)[source]: Execute the autobench command.

omnigenbench.cli.ogb_cli.run_autoinfer(args)[source]: Execute the autoinfer command.

omnigenbench.cli.ogb_cli.run_autotrain(args)[source]: Execute the autotrain command.

omnigenbench.cli.ogb_cli.run_rna_design(args)[source]: Execute the RNA design command.

The ogb command is the recommended unified entry point for all OmniGenBench CLI operations. It provides three main subcommands:

autobench: Automated benchmarking
autotrain: Automated training
autoinfer: Automated inference

Usage:

ogb {autobench,autotrain,autoinfer} [options]

OmniGenome CLI (Alternative Entry Point)¶

omnigenbench.cli.omnigenome_cli.main()[source]

The main entry point for the OmniGenome command-line interface. This function sets up the command-line argument parser and handles the execution of different subcommands. Supports RNA design and model inference functionality.

Example

>>> # Design RNA sequences from command line
>>> python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"
>>> # Run model inference
>>> python -m omnigenbench.cli.omnigenome_cli autoinfer --model "yangheng/ogb_tfb_finetuned" --sequence "ATCGATCGATCG"

Alternative entry point via Python module. Supports:

rna_design: RNA sequence design
autoinfer: Model inference (also available via ogb)

Usage:

python -m omnigenbench.cli.omnigenome_cli {rna_design,autoinfer} [options]

Base Commands¶

class omnigenbench.cli.commands.base.BaseCommand[source]

Bases: ABC

This class provides a common interface for all command-line interface commands in the OmniGenome framework. It defines the structure that all command classes must follow, including registration and common argument handling. Subclasses must implement the register_command method to define their specific command-line interface and arguments.

Example

>>> class MyCommand(BaseCommand):
...     @classmethod
...     def register_command(cls, subparsers):
...         parser = subparsers.add_parser("mycommand", help="My command")
...         parser.add_argument("--input", required=True)
...         parser.set_defaults(func=cls.execute)
...
...     @staticmethod
...     def execute(args):
...         print(f"Executing with input: {args.input}")

classmethod add_common_arguments(parser)[source]

This method adds standard arguments that are common across all OmniGenome CLI commands, such as logging level and output directory.

Parameters:: parser – The ArgumentParser for the specific command

Example

>>> parser = argparse.ArgumentParser()
>>> BaseCommand.add_common_arguments(parser)

abstractmethod classmethod register_command(subparsers)[source]

This abstract method must be implemented by all subclasses to define their specific command-line interface, including arguments, help text, and default functions.

Parameters:: subparsers – The subparsers object from the main ArgumentParser

Example

>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> MyCommand.register_command(subparsers)

The BaseCommand abstract class provides a common interface for all CLI commands. All command classes inherit from this base and implement the register_command method.

Key Methods:

register_command(cls, subparsers): Register command with argparse
add_common_arguments(cls, parser): Add common CLI arguments

Bench Commands¶

class omnigenbench.cli.commands.bench.bench_cli.BenchCommand[source]

Bases: BaseCommand

This class provides a CLI interface for the AutoBench functionality, allowing users to easily run comprehensive evaluations of genomic models across multiple benchmarks. It supports various benchmarks, models, and training configurations.

Variables:

benchmarks (list) – List of available benchmarks (RGB, PGB, GUE, GB, BEACON)
trainers (list) – List of available trainers (native, accelerate, hf_trainer)

Example

>>> # Run basic benchmark
>>> python -m omnigenbench.cli autobench --model "model_name" --benchmark "RGB"
>>> # Run with custom settings
>>> python -m omnigenbench.cli autobench
...     --model "model_name"
...     --benchmark "RGB"
...     --trainer "accelerate"
...     --bs_scale 2
...     --overwrite True

static execute(args: Namespace)[source]

Execute the autobench command with the provided arguments. It handles model and tokenizer loading, benchmark execution, and result logging.

Parameters:: args (argparse.Namespace) – Parsed command-line arguments containing benchmark configuration and model settings

Example

>>> args = parser.parse_args(['autobench', '--model', 'model_name'])
>>> BenchCommand.execute(args)

classmethod register_command(subparsers)[source]

This method sets up the command-line interface for the autobench functionality, including all necessary arguments and their descriptions.

Parameters:: subparsers – The subparsers object from argparse to add the command to

Example

>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> BenchCommand.register_command(subparsers)

omnigenbench.cli.commands.bench.bench_cli.register_command(subparsers)[source]

This function is a convenience wrapper for registering the BenchCommand with the argument parser.

Parameters:: subparsers – The subparsers object from argparse to add the command to

Example

>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> register_command(subparsers)

The BenchCommand class implements automated benchmarking functionality. It evaluates genomic foundation models on standard benchmark datasets.

Supported Benchmarks:

RGB (RNA Genomic Benchmark)
PGB (Protein Genomic Benchmark)
GUE (Genomics Understanding Evaluation)
GB (General Benchmark)
BEACON (Comprehensive benchmark suite)

Example:

# Basic benchmarking
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB

# With custom settings
ogb autobench \\
    --model yangheng/OmniGenome-186M \\
    --benchmark RGB \\
    --trainer accelerate \\
    --bs_scale 2 \\
    --overwrite

RNA Commands¶

class omnigenbench.cli.commands.rna.rna_design.RNADesignCommand[source]

Bases: BaseCommand

This class provides a CLI interface for designing RNA sequences that fold into specific secondary structures. It uses genetic algorithms with customizable parameters to optimize sequence design for target structures.

The design process involves:

Loading a pre-trained RNA design model
Running genetic algorithm optimization
Generating sequences that match the target structure
Saving results to file (optional)

Variables:

model (str) – Path or name of the pre-trained RNA design model
structure (str) – Target RNA secondary structure in dot-bracket notation
mutation_ratio (float) – Genetic algorithm mutation rate
num_population (int) – Population size for genetic algorithm
num_generation (int) – Number of generations for evolution

Example

>>> # Basic RNA design
>>> python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"
>>> # Design with custom parameters
>>> python -m omnigenbench.cli.omnigenome_cli rna_design \
...     --structure "(((...)))" \
...     --model "yangheng/OmniGenome-186M" \
...     --mutation-ratio 0.3 \
...     --num-population 200 \
...     --num-generation 150 \
...     --output-file "results.json"

static execute(args: Namespace)[source]

This method runs the RNA sequence design process using genetic algorithms. It validates parameters, loads the model, runs the design optimization, and outputs or saves the results.

Parameters:: args (argparse.Namespace) – Parsed command-line arguments containing design parameters and model settings
Raises:: ValueError – If mutation_ratio is not between 0.0 and 1.0

Example

>>> args = parser.parse_args(['design', '--structure', '(((...)))'])
>>> RNADesignCommand.execute(args)

classmethod register_command(subparsers)[source]

This method sets up the command-line interface for RNA sequence design, including all necessary arguments and their descriptions.

Parameters:: subparsers – The subparsers object from argparse to add the command to

Example

>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> RNADesignCommand.register_command(subparsers)

omnigenbench.cli.commands.rna.rna_design.register_command(subparsers)[source]

This function is a convenience wrapper for registering the RNADesignCommand with the argument parser.

Parameters:: subparsers – The subparsers object from argparse to add the command to

Example

>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> register_command(subparsers)

The RNADesignCommand class provides RNA sequence design functionality using genetic algorithms.

Design Process:

Load pre-trained RNA design model
Run genetic algorithm optimization
Generate sequences matching target structure
Save results (optional)

Example:

# Basic design (recommended)
ogb rna_design \
    --structure "(((...)))"

# Advanced design
ogb rna_design \
    --structure "(((...)))" \
    --model yangheng/OmniGenome-186M \
    --mutation-ratio 0.3 \
    --num-population 200 \
    --num-generation 150 \
    --output-file results.json

# Legacy command (still supported)
python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"

Advanced Usage Examples¶

Batch Processing Pipeline¶

Complete pipeline from training to inference:

# Step 1: Fine-tune model
ogb autotrain \\
    --dataset yangheng/tfb_promoters \\
    --model yangheng/OmniGenome-186M \\
    --num-epochs 10 \\
    --output-dir ./tfb_model

# Step 2: Benchmark the fine-tuned model
ogb autobench \\
    --model ./tfb_model \\
    --benchmark RGB

# Step 3: Run inference on new data
ogb autoinfer \\
    --model ./tfb_model \\
    --input-file new_sequences.csv \\
    --output-file predictions.json

Multi-GPU Training¶

Using Accelerate for distributed training:

# Configure Accelerate (one-time setup)
accelerate config

# Launch distributed training
accelerate launch -m omnigenbench.cli.ogb_cli autotrain \\
    --dataset yangheng/tfb_promoters \\
    --model yangheng/OmniGenome-186M \\
    --trainer accelerate

# Or use torchrun
torchrun --nproc_per_node=4 -m omnigenbench.cli.ogb_cli autotrain \\
    --dataset data \\
    --model model

Inference with Different Input Formats¶

JSON Format:

# sequences.json
{
  "sequences": [
    "ATCGATCGATCG",
    "CGTAGCTAGCTA"
  ]
}

# Run inference
ogb autoinfer --model model --input-file sequences.json

CSV Format with Metadata:

# data.csv
# sequence,gene_id,organism
# ATCGATCGATCG,GENE001,human
# CGTAGCTAGCTA,GENE002,mouse

# Run inference (preserves metadata)
ogb autoinfer --model model --input-file data.csv

Complex JSON with Metadata:

# complex_data.json
{
  "data": [
    {"sequence": "ATCG", "id": 1, "type": "promoter"},
    {"sequence": "CGTA", "id": 2, "type": "enhancer"}
  ]
}

# Run inference
ogb autoinfer --model model --input-file complex_data.json

Custom Benchmark Configuration¶

# Run benchmark with custom batch size and trainer
ogb autobench \\
    --model yangheng/OmniGenome-186M \\
    --benchmark BEACON \\
    --trainer accelerate \\
    --bs_scale 4 \\
    --overwrite

# Results will be saved to the benchmark directory
# with model name and timestamp

RNA Design Workflow¶

Complete workflow for designing RNA sequences for multiple target structures:

# Design multiple structures in batch (recommended: use ogb command)
for structure in "(((...)))" "(((..)))" "((((....))))"; do
    ogb rna_design \
        --structure "$structure" \
        --num-population 200 \
        --num-generation 200 \
        --output-file "design_${structure//[().]/_}.json"
done

# Or using a structures file
while IFS= read -r structure; do
    echo "Designing for: $structure"
    ogb rna_design \
        --structure "$structure" \
        --num-population 300 \
        --num-generation 150 \
        --output-file "design_$(echo $structure | md5sum | cut -d' ' -f1).json"
done < structures.txt

Advanced: Python Script for Batch Design:

from omnigenbench import OmniModelForRNADesign
import json
from pathlib import Path

# Define target structures
structures = [
    "(((...)))",           # Simple hairpin
    "(((..(((...)))..)))", # Stem-loop-stem
    "(((..(((...)))..(((...))).)))",  # Multi-loop
]

# Initialize model once for efficiency
model = OmniModelForRNADesign(model="yangheng/OmniGenome-186M")

# Design sequences for each structure
results = {}
for structure in structures:
    print(f"Designing sequences for: {structure}")
    sequences = model.design(
        structure=structure,
        mutation_ratio=0.5,
        num_population=200,
        num_generation=150
    )
    results[structure] = sequences
    print(f"  Found {len(sequences) if isinstance(sequences, list) else 1} solutions")

# Save all results
Path("batch_design_results.json").write_text(
    json.dumps(results, indent=2)
)
print(f"\\nAll results saved to batch_design_results.json")

Troubleshooting¶

Common Issues and Solutions¶

Issue: “Either –sequence or –input-file must be provided for inference”

Solution: Always provide input data for autoinfer:

# Correct
ogb autoinfer --model model --sequence "ATCG"
ogb autoinfer --model model --input-file data.json

# Wrong
ogb autoinfer --model model  # Missing input!

Issue: CUDA out of memory during benchmarking

Solution: Reduce batch size or use gradient accumulation:

# Reduce batch size
ogb autobench --model model --benchmark RGB --bs_scale 1

# Or use smaller model
ogb autobench --model smaller_model --benchmark RGB

Issue: “Invalid benchmark name”

Solution: Use correct benchmark names:

# Correct
ogb autobench --model model --benchmark RGB
ogb autobench --model model --benchmark GUE

# Wrong
ogb autobench --model model --benchmark rgb  # Case sensitive!

Issue: Model not found

Solution: Verify model path/name:

# HuggingFace Hub model
ogb autoinfer --model yangheng/OmniGenome-186M --sequence "ATCG"

# Local model (use absolute path)
ogb autoinfer --model /path/to/model --sequence "ATCG"

Performance Tips¶

Use appropriate batch size:

# For large GPU memory
ogb autobench --model model --benchmark RGB --bs_scale 4

# For limited memory
ogb autobench --model model --benchmark RGB --bs_scale 1

Enable mixed precision:

# Automatically enabled with accelerate trainer
ogb autotrain --dataset data --model model --trainer accelerate

Parallel inference:

# Increase batch size for faster inference
ogb autoinfer --model model --input-file large_data.csv --batch-size 128

GPU selection:

# Specify GPU device
ogb autoinfer --model model --sequence "ATCG" --device cuda:0

Environment Variables¶

OmniGenBench respects the following environment variables:

# HuggingFace cache directory
export HF_HOME=/path/to/cache

# CUDA device selection
export CUDA_VISIBLE_DEVICES=0,1,2,3

# Disable warnings
export PYTHONWARNINGS=ignore

Output Formats¶

AutoInfer Output¶

The autoinfer command produces JSON output with the following structure:

{
  "model": "model_name",
  "total_sequences": 100,
  "results": [
    {
      "sequence": "ATCGATCGATCG",
      "metadata": {"index": 0},
      "predictions": [0.8, 0.2],
      "probabilities": [0.8, 0.2],
      "logits": [1.5, -0.3]
    },
    ...
  ]
}

Fields:

model: Model used for inference
total_sequences: Total number of sequences processed
results: List of per-sequence results
- sequence: Input sequence
- metadata: Original metadata from input
- predictions: Model predictions
- probabilities: Class probabilities (if available)
- logits: Raw logits (if available)
- error: Error message (if processing failed)

RNA Design Output¶

RNA design produces JSON with the following structure:

{
  "structure": "(((...)))",
  "parameters": {
    "mutation_ratio": 0.5,
    "population": 100,
    "generations": 100
  },
  "best_sequences": [
    "GCGAAACGC",
    "GCUAAAGCC",
    ...
  ]
}

CLI Commands¶

Overview¶

Key Features¶

Quick Start Examples¶

Unified Command (OGB CLI)¶

Alternative Entry Point¶

Command Overview¶

AutoBench - Automated Benchmarking¶

AutoTrain - Automated Training¶

AutoInfer - Automated Inference¶

RNA Design¶

Available Benchmarks¶

Trainer Types¶

Best Practices¶

Detailed API Documentation¶

OGB CLI (Unified Entry Point)¶

OmniGenome CLI (Alternative Entry Point)¶

Base Commands¶

Bench Commands¶

RNA Commands¶

Advanced Usage Examples¶

Batch Processing Pipeline¶

Multi-GPU Training¶

Inference with Different Input Formats¶

Custom Benchmark Configuration¶

RNA Design Workflow¶

Troubleshooting¶

Common Issues and Solutions¶

Performance Tips¶

Environment Variables¶

Output Formats¶

AutoInfer Output¶

RNA Design Output¶

See Also¶

External Resources¶

Getting Help¶

CLI Commands¶

Overview¶

Key Features¶

Quick Start Examples¶

Unified Command (OGB CLI)¶

Alternative Entry Point¶

Command Overview¶

AutoBench - Automated Benchmarking¶

AutoTrain - Automated Training¶

AutoInfer - Automated Inference¶

RNA Design¶

Available Benchmarks¶

Trainer Types¶

Best Practices¶

Detailed API Documentation¶

OGB CLI (Unified Entry Point)¶

OmniGenome CLI (Alternative Entry Point)¶

Base Commands¶

Bench Commands¶

RNA Commands¶

Advanced Usage Examples¶

Batch Processing Pipeline¶

Multi-GPU Training¶

Inference with Different Input Formats¶

Custom Benchmark Configuration¶

RNA Design Workflow¶

Troubleshooting¶

Common Issues and Solutions¶

Performance Tips¶

Environment Variables¶

Output Formats¶

AutoInfer Output¶

RNA Design Output¶

See Also¶

Related Documentation¶

External Resources¶

Getting Help¶