# LLM Agent

The LLM Agent is a reinforcement learning agent that uses a local Ollama language model to make policy decisions. Instead of learning through traditional Q-learning or neural networks, it leverages the reasoning capabilities of large language models to select actions based on state descriptions.

## Features

- **LLM-powered decision making**: Uses Ollama models for intelligent action selection
- **Robust fallback system**: Falls back to random action selection if the LLM fails
- **Configurable models**: Support for any Ollama model (llama2, mistral, codellama, etc.)
- **Temperature control**: Adjustable randomness in LLM responses
- **Custom system prompts**: Customize the agent's behavior through system prompts
- **Save/Load functionality**: Persist and restore agent configurations
- **Logging**: Comprehensive logging of decisions and errors

## Prerequisites

1. **Ollama installation**: Install Ollama from [https://ollama.ai/](https://ollama.ai/)
2. **Model download**: Pull a model, e.g., `ollama pull llama3.2`
3. **Python dependencies**: The `ollama` package is included in requirements.txt

## Usage

```python
from elsciRL.agents.LLM_Agent import LLMAgent

# Initialize the agent
agent = LLMAgent(
    parameter=0.1,                    # General parameter for future use
    model_name="llama3.2",            # Ollama model name
    temperature=0.7,                  # Controls randomness (0.0 = deterministic)
    system_prompt="Custom prompt..."  # Optional custom system prompt
)

# Use the agent to make decisions
state = "You are in a maze with walls to the north and west. There are open paths east and south."
legal_actions = ["move_east", "move_south", "stay"]

chosen_action = agent.policy(state, legal_actions)
print(f"Agent chose: {chosen_action}")
```

## Configuration Options

### Model Selection

- `llama3.2`: General-purpose model, good balance of performance and speed
- `mistral`: Fast and efficient, good for quick decisions
- `codellama`: Optimized for code and logic-based reasoning
- `llama2:13b`: Larger model for better reasoning (requires more resources)

### Temperature Settings

- `0.0`: Deterministic, always chooses the most likely action
- `0.3-0.5`: Slightly creative but mostly consistent
- `0.7`: Good balance of creativity and consistency (default)
- `1.0+`: Very creative and unpredictable

### System Prompt Examples

**Strategic Agent:**

```python
system_prompt = """You are a strategic decision maker. Analyze the situation carefully,
consider long-term consequences, and choose the action that maximizes expected future reward."""
```

**Risk-Averse Agent:**

```python
system_prompt = """You are a cautious agent that prioritizes safety and risk minimization.
Always choose the safest action that still makes progress toward the goal."""
```

**Exploratory Agent:**

```python
system_prompt = """You are a curious explorer. When faced with unknown situations,
prefer actions that provide new information or lead to unexplored areas."""
```

## Error Handling

The agent includes robust error handling, sketched after this list:

1. **Invalid action selection**: If the LLM suggests an action not in `legal_actions`, falls back to random selection
2. **JSON parsing errors**: If the LLM response isn't valid JSON, falls back to the first available action
3. **Ollama connection issues**: Falls back to random selection and logs the error
4. **Model not found**: Provides helpful error messages about installing models
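For illustration, here is a minimal sketch of that fallback pattern. It is not the package's actual implementation: the `query_llm` callable and the `{"action": ...}` JSON response shape are assumptions made for the example.

```python
import json
import logging
import random

logger = logging.getLogger(__name__)

def choose_action(query_llm, state, legal_actions):
    """Illustrative fallback logic: prefer the LLM's choice, otherwise degrade gracefully.

    `query_llm` is a hypothetical callable that sends the state/action prompt to Ollama
    and returns the raw response text.
    """
    try:
        raw = query_llm(state, legal_actions)        # may raise on connection errors
        choice = json.loads(raw).get("action")       # expects e.g. {"action": "move_east"}
        if choice in legal_actions:
            return choice
        # LLM suggested an action outside legal_actions -> random fallback
        logger.warning("LLM suggested illegal action %r; choosing randomly", choice)
        return random.choice(legal_actions)
    except json.JSONDecodeError:
        # Response was not valid JSON -> first available action
        logger.warning("LLM response was not valid JSON; using first legal action")
        return legal_actions[0]
    except Exception as err:
        # Connection or model errors -> random fallback, logged
        logger.error("LLM call failed (%s); choosing randomly", err)
        return random.choice(legal_actions)
```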
## Save/Load Functionality

```python
# Save agent configuration
saved_config = agent.save()

# Load configuration into a new agent
new_agent = LLMAgent(parameter=0.1)
new_agent.load(saved_config)
```

## Integration with Environment

The LLM Agent follows the same interface as other elsciRL agents:

- `policy(state, legal_actions)`: Returns the chosen action
- `learn(state, next_state, reward, action)`: Updates from experience
- `save()`/`load()`: Persistence functionality

## Limitations

- **Latency**: LLM inference can be slower than traditional RL agents
- **Consistency**: May make different decisions for identical states
- **Resource usage**: Requires local GPU/CPU resources for model inference
- **Learning**: Currently doesn't update the underlying model from rewards (future enhancement)

## Future Enhancements

- **Fine-tuning**: Use reward signals to fine-tune the model
- **Experience replay**: Store successful action patterns
- **Multi-turn reasoning**: Allow the agent to "think" through complex decisions
- **Ensemble methods**: Combine multiple models for better decisions
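As a closing illustration, the snippet below sketches how the interface listed under *Integration with Environment* might slot into a simple episode loop. The `ToyEnv` class and its `reset`/`legal_actions`/`step` methods are hypothetical stand-ins, not part of elsciRL.

```python
from elsciRL.agents.LLM_Agent import LLMAgent

class ToyEnv:
    """Stand-in environment used only to demonstrate the agent interface."""
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return "You are at the maze entrance. Open paths lead east and south."

    def legal_actions(self, state):
        return ["move_east", "move_south", "stay"]

    def step(self, action):
        self.steps += 1
        next_state = f"You chose {action}. Open paths lead east and south."
        reward = 1.0 if action != "stay" else 0.0
        done = self.steps >= 5
        return next_state, reward, done

env = ToyEnv()
agent = LLMAgent(parameter=0.1, model_name="llama3.2", temperature=0.3)

state = env.reset()
done = False
while not done:
    legal_actions = env.legal_actions(state)
    action = agent.policy(state, legal_actions)      # LLM-backed action choice
    next_state, reward, done = env.step(action)
    agent.learn(state, next_state, reward, action)   # does not yet update the model (see Limitations)
    state = next_state
```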