CLI Commands¶
Overview¶
OmniGenBench provides a comprehensive command-line interface (CLI) for codeless operation, enabling users to quickly execute benchmarking, training, inference, and specialized genomic tasks directly from the terminal without writing Python scripts.
Version: OmniGenBench v0.3.23alpha
Key Features¶
- ✅ Unified CLI Entry Points:
ogb- Main unified command (recommended)python -m omnigenbench.cli.omnigenome_cli- Alternative entry point
- ✅ Core Commands:
autobench- Automated benchmarking on standard datasetsautotrain- Automated training/fine-tuningautoinfer- Automated inference with fine-tuned modelsrna_design- RNA sequence design for target structures
- ✅ Supported Benchmarks:
RGB - RNA Genomic Benchmark
PGB - Protein Genomic Benchmark
GUE - Genomics Understanding Evaluation
GB - General Benchmark
BEACON - Comprehensive benchmark suite
Quick Start Examples¶
Unified Command (OGB CLI)¶
Recommended: Use the ogb command for all operations:
# Automated benchmarking
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB
# Automated training
ogb autotrain --dataset yangheng/tfb_promoters --model zhihan1996/DNABERT-2-117M
# Automated inference
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCGATCGATCG"
Alternative Entry Point¶
Using the Python module (legacy):
# RNA design (use ogb rna_design instead, recommended)
python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"
# AutoInfer (use ogb autoinfer instead, recommended)
python -m omnigenbench.cli.omnigenome_cli autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCG"
Command Overview¶
AutoBench - Automated Benchmarking¶
Benchmark genomic foundation models on standard datasets:
ogb autobench --model MODEL_NAME --benchmark BENCHMARK_NAME [OPTIONS]
Key Arguments:
--model, -m(required): Model name or path (HF model ID or local path)--benchmark, -b(required): Benchmark name (RGB, PGB, GUE, GB, BEACON)--tokenizer, -t: Tokenizer to use (default: same as model)--trainer: Trainer type (default: accelerate)--bs_scale: Batch size scale factor for GPU memory utilization--overwrite: Overwrite existing results
Examples:
# Benchmark OmniGenome on RGB
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB
# Benchmark DNABERT on GUE with custom trainer
ogb autobench --model zhihan1996/DNABERT-2-117M --benchmark GUE --trainer accelerate
# Benchmark with increased batch size
ogb autobench --model yangheng/OmniGenome-186M --benchmark BEACON --bs_scale 4
AutoTrain - Automated Training¶
Automatically train or fine-tune models on genomic datasets:
ogb autotrain --dataset DATASET_NAME --model MODEL_NAME [OPTIONS]
Key Arguments:
--dataset(required): Dataset name or path--model(required): Pre-trained model name or path--tokenizer: Tokenizer to use (default: same as model)--output-dir: Directory to save fine-tuned model--num-epochs: Number of training epochs--batch-size: Training batch size--learning-rate: Learning rate--trainer: Trainer type (default: accelerate)--overwrite: Overwrite existing output directory
Examples:
# Basic training
ogb autotrain --dataset yangheng/tfb_promoters --model zhihan1996/DNABERT-2-117M
# Training with custom parameters
ogb autotrain \\
--dataset ./my_dataset \\
--model yangheng/OmniGenome-186M \\
--num-epochs 10 \\
--batch-size 32 \\
--learning-rate 5e-5 \\
--output-dir ./my_model
AutoInfer - Automated Inference¶
Run inference with fine-tuned models on arbitrary sequences:
ogb autoinfer --model MODEL_NAME [--sequence SEQ | --input-file FILE] [OPTIONS]
Key Arguments:
--model(required): Fine-tuned model name or path--sequence: Input sequence(s) (single sequence, comma-separated, or .txt file)--input-file: Path to JSON/CSV file with input data--output-file: Output file for predictions (default: inference_results.json)--batch-size: Batch size for inference (default: 32)--device: Device to use (e.g., ‘cuda:0’, ‘cpu’; auto-detected if not specified)
Input Formats:
Single sequence:
--sequence "ATCGATCGATCG"Multiple sequences:
--sequence "ATCG,CGTA,TACG"Text file:
--sequence sequences.txt(one per line)JSON file:
--input-file data.jsonFormat 1:
{"sequences": ["ATCG", "CGTA"]}Format 2:
{"data": [{"sequence": "ATCG", "id": 1}, ...]}Format 3:
["ATCG", "CGTA"]
CSV file:
--input-file data.csv(must have ‘sequence’ column)
Examples:
# Single sequence inference
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence "ATCGATCGATCG"
# Multiple sequences
ogb autoinfer --model yangheng/ogb_te_finetuned --sequence "ATCG,CGTA,TACG"
# Batch inference from JSON
ogb autoinfer --model yangheng/ogb_tfb_finetuned --input-file sequences.json --batch-size 64
# CSV input with metadata
ogb autoinfer --model yangheng/ogb_tfb_finetuned --input-file data.csv --device cuda:0
# Text file input
ogb autoinfer --model yangheng/ogb_tfb_finetuned --sequence sequences.txt --output-file results.json
RNA Design¶
Design RNA sequences that fold into specific secondary structures using genetic algorithms combined with masked language modeling:
ogb rna_design --structure STRUCTURE [OPTIONS]
Algorithm Overview:
The RNA design command uses an evolutionary algorithm enhanced with masked language modeling (MLM) to generate RNA sequences that fold into target secondary structures. The process involves:
Initialization: Generate initial population using MLM-guided sequence generation
Evolution: Iteratively improve sequences through:
Crossover: Multi-point recombination between parent sequences
Mutation: MLM-guided mutations to explore sequence space
Selection: Multi-objective optimization using NSGA-II-like sorting
Evaluation: Structure prediction using ViennaRNA with fitness scoring based on:
Structure similarity (Hamming distance to target)
Thermodynamic stability (Minimum Free Energy)
Key Arguments:
--structure(required): Target RNA structure in dot-bracket notation (e.g., “(((…)))”)Use
(for opening pairs,)for closing pairs,.for unpaired basesStructure length determines sequence length
Example simple hairpin:
"(((...)))(8 nucleotides)Example complex structure:
"(((..(((...)))..)))(18 nucleotides)
--model: Pre-trained model name or path (default: yangheng/OmniGenome-186M)Can use HuggingFace model IDs or local paths
Larger models may produce better results but run slower
Recommended: yangheng/OmniGenome-186M for balance of speed and quality
--mutation-ratio: Mutation ratio for genetic algorithm (0.0-1.0, default: 0.5)Controls the fraction of nucleotides mutated per generation
Lower values (0.1-0.3): More conservative, slower convergence
Higher values (0.5-0.8): More exploration, may be unstable
Recommended: Start with 0.5, decrease if not converging
--num-population: Population size (default: 100)Number of candidate sequences per generation
Larger populations explore more solutions but run slower
Recommended: 100-500 depending on structure complexity
Simple structures (< 20 nt): 100 is sufficient
Complex structures (> 50 nt): Consider 200-500
--num-generation: Number of generations (default: 100)Maximum evolutionary iterations
Algorithm terminates early if perfect solution found
Recommended: 50-200 depending on difficulty
Monitor output to see if more generations are needed
--output-file: Output JSON file to save resultsSaves designed sequences and parameters
JSON format includes structure, parameters, and best sequences
If not specified, only prints to console
Output Format:
The command outputs designed sequences to console and optionally saves to JSON file:
{
"structure": "(((...)))",
"parameters": {
"model": "yangheng/OmniGenome-186M",
"mutation_ratio": 0.5,
"num_population": 100,
"num_generation": 100
},
"best_sequences": [
"GCGAAACGC",
"GCUAAAGCC",
...
]
}
Examples:
# Basic RNA design for simple hairpin
ogb rna_design --structure "(((...)))"
# Design with custom parameters for better results
ogb rna_design \
--structure "(((...)))" \
--model yangheng/OmniGenome-186M \
--mutation-ratio 0.3 \
--num-population 200 \
--num-generation 150 \
--output-file hairpin_design.json
# Design complex structure (stem-loop-stem)
ogb rna_design \
--structure "(((..(((...)))..)))" \
--num-population 300 \
--num-generation 200 \
--output-file complex_design.json
# Fast design with smaller population
ogb rna_design \
--structure "((((....))))" \
--num-population 50 \
--num-generation 50 \
--mutation-ratio 0.6
Legacy Command (still supported):
python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"
Performance Tips:
Start Simple: Test with small structures first to verify functionality
Monitor Progress: The algorithm shows progress with tqdm progress bar
Early Termination: Algorithm stops when perfect match is found (Hamming distance = 0)
GPU Acceleration: Uses GPU automatically if available for MLM inference
Parallel Folding: Enable with
parallel=Truein Python API for faster structure evaluation
Troubleshooting:
Poor Convergence: Try lowering mutation ratio (0.2-0.3) or increasing population
Too Slow: Reduce population/generations or use smaller model
No Perfect Match: Some structures are difficult; best sequence is still returned
Out of Memory: Reduce batch size in MLM inference or use CPU
Invalid Structure: Ensure balanced parentheses in dot-bracket notation
Common Structure Patterns:
Hairpin: ``”(((…)))”`
Stem-Loop-Stem: ``”(((..(((…)))..)))”`
Multi-loop: ``”(((…(((…)))..(((…))).)))”`
Pseudoknot: Not supported (dot-bracket notation limitation)
Related Python API:
from omnigenbench import OmniModelForRNADesign
model = OmniModelForRNADesign(model="yangheng/OmniGenome-186M")
sequences = model.design(
structure="(((...)))",
mutation_ratio=0.5,
num_population=100,
num_generation=100
)
print("Designed sequences:", sequences)
Available Benchmarks¶
Benchmark |
Description |
Key Tasks |
|---|---|---|
RGB |
RNA Genomic Benchmark |
RNA structure prediction, stability, interactions |
PGB |
Protein Genomic Benchmark |
Protein function, structure, interaction prediction |
GUE |
Genomics Understanding Evaluation |
Comprehensive genomic understanding tasks |
GB |
General Benchmark |
General genomic prediction tasks |
BEACON |
Comprehensive Benchmark Suite |
Long-range dependencies, context understanding |
Trainer Types¶
All training and benchmarking commands support multiple trainer backends:
Trainer |
Description |
Use Case |
|---|---|---|
accelerate |
HuggingFace Accelerate (default) |
Multi-GPU distributed training |
native |
Native PyTorch trainer |
Single-GPU, debugging |
hf_trainer |
HuggingFace Trainer |
Full HF ecosystem integration |
Example:
# Use Accelerate trainer (recommended for multi-GPU)
ogb autotrain --dataset data --model model --trainer accelerate
# Use native trainer (for single-GPU)
ogb autotrain --dataset data --model model --trainer native
Best Practices¶
✅ Recommended Workflows
Model Benchmarking:
# Benchmark model on standard dataset ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB
Model Fine-tuning:
# Fine-tune on custom dataset ogb autotrain --dataset ./my_data --model yangheng/OmniGenome-186M --num-epochs 10
Inference Pipeline:
# Inference on new sequences ogb autoinfer --model ./my_finetuned_model --input-file new_data.csv
❌ Common Pitfalls
Not providing either
--sequenceor--input-filefor autoinferUsing invalid benchmark names
Forgetting to specify output directory when fine-tuning
Using batch size too large for GPU memory
Detailed API Documentation¶
Below are the detailed API references for each CLI component.
OGB CLI (Unified Entry Point)¶
OmniGenBench (OGB) Command Line Interface
This is the main entry point for all OmniGenBench CLI commands. It provides four main subcommands: - autobench: Automated benchmarking of genomic foundation models - autotrain: Automated training/fine-tuning of models - autoinfer: Automated inference with fine-tuned models - rna_design: RNA sequence design for target structures
- omnigenbench.cli.ogb_cli.create_autobench_parser(subparsers)[source]
Create the autobench subcommand parser.
- omnigenbench.cli.ogb_cli.create_autoinfer_parser(subparsers)[source]
Create the autoinfer subcommand parser.
- omnigenbench.cli.ogb_cli.create_autotrain_parser(subparsers)[source]
Create the autotrain subcommand parser.
- omnigenbench.cli.ogb_cli.create_rna_design_parser(subparsers)[source]
Create the parser for RNA design command.
- omnigenbench.cli.ogb_cli.main()[source]
Main entry point for the OGB (OmniGenBench) CLI.
This provides a unified interface for all OmniGenBench command-line tools.
- omnigenbench.cli.ogb_cli.run_autobench(args)[source]
Execute the autobench command.
- omnigenbench.cli.ogb_cli.run_autoinfer(args)[source]
Execute the autoinfer command.
- omnigenbench.cli.ogb_cli.run_autotrain(args)[source]
Execute the autotrain command.
- omnigenbench.cli.ogb_cli.run_rna_design(args)[source]
Execute the RNA design command.
The ogb command is the recommended unified entry point for all OmniGenBench CLI operations. It provides three main subcommands:
autobench: Automated benchmarkingautotrain: Automated trainingautoinfer: Automated inference
Usage:
ogb {autobench,autotrain,autoinfer} [options]
OmniGenome CLI (Alternative Entry Point)¶
- omnigenbench.cli.omnigenome_cli.main()[source]
The main entry point for the OmniGenome command-line interface. This function sets up the command-line argument parser and handles the execution of different subcommands. Supports RNA design and model inference functionality.
Example
>>> # Design RNA sequences from command line >>> python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))" >>> # Run model inference >>> python -m omnigenbench.cli.omnigenome_cli autoinfer --model "yangheng/ogb_tfb_finetuned" --sequence "ATCGATCGATCG"
Alternative entry point via Python module. Supports:
rna_design: RNA sequence designautoinfer: Model inference (also available via ogb)
Usage:
python -m omnigenbench.cli.omnigenome_cli {rna_design,autoinfer} [options]
Base Commands¶
- class omnigenbench.cli.commands.base.BaseCommand[source]
Bases:
ABCThis class provides a common interface for all command-line interface commands in the OmniGenome framework. It defines the structure that all command classes must follow, including registration and common argument handling. Subclasses must implement the register_command method to define their specific command-line interface and arguments.
Example
>>> class MyCommand(BaseCommand): ... @classmethod ... def register_command(cls, subparsers): ... parser = subparsers.add_parser("mycommand", help="My command") ... parser.add_argument("--input", required=True) ... parser.set_defaults(func=cls.execute) ... ... @staticmethod ... def execute(args): ... print(f"Executing with input: {args.input}")
- classmethod add_common_arguments(parser)[source]
This method adds standard arguments that are common across all OmniGenome CLI commands, such as logging level and output directory.
- Parameters:
parser – The ArgumentParser for the specific command
Example
>>> parser = argparse.ArgumentParser() >>> BaseCommand.add_common_arguments(parser)
- abstractmethod classmethod register_command(subparsers)[source]
This abstract method must be implemented by all subclasses to define their specific command-line interface, including arguments, help text, and default functions.
- Parameters:
subparsers – The subparsers object from the main ArgumentParser
Example
>>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> MyCommand.register_command(subparsers)
The BaseCommand abstract class provides a common interface for all CLI commands. All command classes inherit from this base and implement the register_command method.
Key Methods:
register_command(cls, subparsers): Register command with argparseadd_common_arguments(cls, parser): Add common CLI arguments
Bench Commands¶
- class omnigenbench.cli.commands.bench.bench_cli.BenchCommand[source]
Bases:
BaseCommandThis class provides a CLI interface for the AutoBench functionality, allowing users to easily run comprehensive evaluations of genomic models across multiple benchmarks. It supports various benchmarks, models, and training configurations.
- Variables:
benchmarks (list) – List of available benchmarks (RGB, PGB, GUE, GB, BEACON)
trainers (list) – List of available trainers (native, accelerate, hf_trainer)
Example
>>> # Run basic benchmark >>> python -m omnigenbench.cli autobench --model "model_name" --benchmark "RGB" >>> # Run with custom settings >>> python -m omnigenbench.cli autobench ... --model "model_name" ... --benchmark "RGB" ... --trainer "accelerate" ... --bs_scale 2 ... --overwrite True
- static execute(args: Namespace)[source]
Execute the autobench command with the provided arguments. It handles model and tokenizer loading, benchmark execution, and result logging.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments containing benchmark configuration and model settings
Example
>>> args = parser.parse_args(['autobench', '--model', 'model_name']) >>> BenchCommand.execute(args)
- classmethod register_command(subparsers)[source]
This method sets up the command-line interface for the autobench functionality, including all necessary arguments and their descriptions.
- Parameters:
subparsers – The subparsers object from argparse to add the command to
Example
>>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> BenchCommand.register_command(subparsers)
- omnigenbench.cli.commands.bench.bench_cli.register_command(subparsers)[source]
This function is a convenience wrapper for registering the BenchCommand with the argument parser.
- Parameters:
subparsers – The subparsers object from argparse to add the command to
Example
>>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> register_command(subparsers)
The BenchCommand class implements automated benchmarking functionality. It evaluates genomic foundation models on standard benchmark datasets.
Supported Benchmarks:
RGB (RNA Genomic Benchmark)
PGB (Protein Genomic Benchmark)
GUE (Genomics Understanding Evaluation)
GB (General Benchmark)
BEACON (Comprehensive benchmark suite)
Example:
# Basic benchmarking
ogb autobench --model yangheng/OmniGenome-186M --benchmark RGB
# With custom settings
ogb autobench \\
--model yangheng/OmniGenome-186M \\
--benchmark RGB \\
--trainer accelerate \\
--bs_scale 2 \\
--overwrite
RNA Commands¶
- class omnigenbench.cli.commands.rna.rna_design.RNADesignCommand[source]
Bases:
BaseCommandThis class provides a CLI interface for designing RNA sequences that fold into specific secondary structures. It uses genetic algorithms with customizable parameters to optimize sequence design for target structures.
The design process involves:
Loading a pre-trained RNA design model
Running genetic algorithm optimization
Generating sequences that match the target structure
Saving results to file (optional)
- Variables:
model (str) – Path or name of the pre-trained RNA design model
structure (str) – Target RNA secondary structure in dot-bracket notation
mutation_ratio (float) – Genetic algorithm mutation rate
num_population (int) – Population size for genetic algorithm
num_generation (int) – Number of generations for evolution
Example
>>> # Basic RNA design >>> python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))" >>> # Design with custom parameters >>> python -m omnigenbench.cli.omnigenome_cli rna_design \ ... --structure "(((...)))" \ ... --model "yangheng/OmniGenome-186M" \ ... --mutation-ratio 0.3 \ ... --num-population 200 \ ... --num-generation 150 \ ... --output-file "results.json"
- static execute(args: Namespace)[source]
This method runs the RNA sequence design process using genetic algorithms. It validates parameters, loads the model, runs the design optimization, and outputs or saves the results.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments containing design parameters and model settings
- Raises:
ValueError – If mutation_ratio is not between 0.0 and 1.0
Example
>>> args = parser.parse_args(['design', '--structure', '(((...)))']) >>> RNADesignCommand.execute(args)
- classmethod register_command(subparsers)[source]
This method sets up the command-line interface for RNA sequence design, including all necessary arguments and their descriptions.
- Parameters:
subparsers – The subparsers object from argparse to add the command to
Example
>>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> RNADesignCommand.register_command(subparsers)
- omnigenbench.cli.commands.rna.rna_design.register_command(subparsers)[source]
This function is a convenience wrapper for registering the RNADesignCommand with the argument parser.
- Parameters:
subparsers – The subparsers object from argparse to add the command to
Example
>>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> register_command(subparsers)
The RNADesignCommand class provides RNA sequence design functionality using genetic algorithms.
Design Process:
Load pre-trained RNA design model
Run genetic algorithm optimization
Generate sequences matching target structure
Save results (optional)
Example:
# Basic design (recommended)
ogb rna_design \
--structure "(((...)))"
# Advanced design
ogb rna_design \
--structure "(((...)))" \
--model yangheng/OmniGenome-186M \
--mutation-ratio 0.3 \
--num-population 200 \
--num-generation 150 \
--output-file results.json
# Legacy command (still supported)
python -m omnigenbench.cli.omnigenome_cli rna_design --structure "(((...)))"
Advanced Usage Examples¶
Batch Processing Pipeline¶
Complete pipeline from training to inference:
# Step 1: Fine-tune model
ogb autotrain \\
--dataset yangheng/tfb_promoters \\
--model yangheng/OmniGenome-186M \\
--num-epochs 10 \\
--output-dir ./tfb_model
# Step 2: Benchmark the fine-tuned model
ogb autobench \\
--model ./tfb_model \\
--benchmark RGB
# Step 3: Run inference on new data
ogb autoinfer \\
--model ./tfb_model \\
--input-file new_sequences.csv \\
--output-file predictions.json
Multi-GPU Training¶
Using Accelerate for distributed training:
# Configure Accelerate (one-time setup)
accelerate config
# Launch distributed training
accelerate launch -m omnigenbench.cli.ogb_cli autotrain \\
--dataset yangheng/tfb_promoters \\
--model yangheng/OmniGenome-186M \\
--trainer accelerate
# Or use torchrun
torchrun --nproc_per_node=4 -m omnigenbench.cli.ogb_cli autotrain \\
--dataset data \\
--model model
Inference with Different Input Formats¶
JSON Format:
# sequences.json
{
"sequences": [
"ATCGATCGATCG",
"CGTAGCTAGCTA"
]
}
# Run inference
ogb autoinfer --model model --input-file sequences.json
CSV Format with Metadata:
# data.csv
# sequence,gene_id,organism
# ATCGATCGATCG,GENE001,human
# CGTAGCTAGCTA,GENE002,mouse
# Run inference (preserves metadata)
ogb autoinfer --model model --input-file data.csv
Complex JSON with Metadata:
# complex_data.json
{
"data": [
{"sequence": "ATCG", "id": 1, "type": "promoter"},
{"sequence": "CGTA", "id": 2, "type": "enhancer"}
]
}
# Run inference
ogb autoinfer --model model --input-file complex_data.json
Custom Benchmark Configuration¶
# Run benchmark with custom batch size and trainer
ogb autobench \\
--model yangheng/OmniGenome-186M \\
--benchmark BEACON \\
--trainer accelerate \\
--bs_scale 4 \\
--overwrite
# Results will be saved to the benchmark directory
# with model name and timestamp
RNA Design Workflow¶
Complete workflow for designing RNA sequences for multiple target structures:
# Design multiple structures in batch (recommended: use ogb command)
for structure in "(((...)))" "(((..)))" "((((....))))"; do
ogb rna_design \
--structure "$structure" \
--num-population 200 \
--num-generation 200 \
--output-file "design_${structure//[().]/_}.json"
done
# Or using a structures file
while IFS= read -r structure; do
echo "Designing for: $structure"
ogb rna_design \
--structure "$structure" \
--num-population 300 \
--num-generation 150 \
--output-file "design_$(echo $structure | md5sum | cut -d' ' -f1).json"
done < structures.txt
Advanced: Python Script for Batch Design:
from omnigenbench import OmniModelForRNADesign
import json
from pathlib import Path
# Define target structures
structures = [
"(((...)))", # Simple hairpin
"(((..(((...)))..)))", # Stem-loop-stem
"(((..(((...)))..(((...))).)))", # Multi-loop
]
# Initialize model once for efficiency
model = OmniModelForRNADesign(model="yangheng/OmniGenome-186M")
# Design sequences for each structure
results = {}
for structure in structures:
print(f"Designing sequences for: {structure}")
sequences = model.design(
structure=structure,
mutation_ratio=0.5,
num_population=200,
num_generation=150
)
results[structure] = sequences
print(f" Found {len(sequences) if isinstance(sequences, list) else 1} solutions")
# Save all results
Path("batch_design_results.json").write_text(
json.dumps(results, indent=2)
)
print(f"\\nAll results saved to batch_design_results.json")
Troubleshooting¶
Common Issues and Solutions¶
Issue: “Either –sequence or –input-file must be provided for inference”
Solution: Always provide input data for autoinfer:
# Correct
ogb autoinfer --model model --sequence "ATCG"
ogb autoinfer --model model --input-file data.json
# Wrong
ogb autoinfer --model model # Missing input!
Issue: CUDA out of memory during benchmarking
Solution: Reduce batch size or use gradient accumulation:
# Reduce batch size
ogb autobench --model model --benchmark RGB --bs_scale 1
# Or use smaller model
ogb autobench --model smaller_model --benchmark RGB
Issue: “Invalid benchmark name”
Solution: Use correct benchmark names:
# Correct
ogb autobench --model model --benchmark RGB
ogb autobench --model model --benchmark GUE
# Wrong
ogb autobench --model model --benchmark rgb # Case sensitive!
Issue: Model not found
Solution: Verify model path/name:
# HuggingFace Hub model
ogb autoinfer --model yangheng/OmniGenome-186M --sequence "ATCG"
# Local model (use absolute path)
ogb autoinfer --model /path/to/model --sequence "ATCG"
Performance Tips¶
Use appropriate batch size:
# For large GPU memory ogb autobench --model model --benchmark RGB --bs_scale 4 # For limited memory ogb autobench --model model --benchmark RGB --bs_scale 1
Enable mixed precision:
# Automatically enabled with accelerate trainer ogb autotrain --dataset data --model model --trainer accelerate
Parallel inference:
# Increase batch size for faster inference ogb autoinfer --model model --input-file large_data.csv --batch-size 128
GPU selection:
# Specify GPU device ogb autoinfer --model model --sequence "ATCG" --device cuda:0
Environment Variables¶
OmniGenBench respects the following environment variables:
# HuggingFace cache directory
export HF_HOME=/path/to/cache
# CUDA device selection
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Disable warnings
export PYTHONWARNINGS=ignore
Output Formats¶
AutoInfer Output¶
The autoinfer command produces JSON output with the following structure:
{
"model": "model_name",
"total_sequences": 100,
"results": [
{
"sequence": "ATCGATCGATCG",
"metadata": {"index": 0},
"predictions": [0.8, 0.2],
"probabilities": [0.8, 0.2],
"logits": [1.5, -0.3]
},
...
]
}
Fields:
model: Model used for inferencetotal_sequences: Total number of sequences processedresults: List of per-sequence resultssequence: Input sequencemetadata: Original metadata from inputpredictions: Model predictionsprobabilities: Class probabilities (if available)logits: Raw logits (if available)error: Error message (if processing failed)
RNA Design Output¶
RNA design produces JSON with the following structure:
{
"structure": "(((...)))",
"parameters": {
"mutation_ratio": 0.5,
"population": 100,
"generations": 100
},
"best_sequences": [
"GCGAAACGC",
"GCUAAAGCC",
...
]
}
See Also¶
External Resources¶
HuggingFace Hub - Pre-trained models
GitHub Repository - Source code
Paper - Research paper (if available)
Getting Help¶
If you encounter issues or have questions:
Check the documentation
Search GitHub Issues
Open a new issue with:
Command used
Full error message
OmniGenBench version (
pip show omnigenbench)Python version
CUDA version (if using GPU)