Troubleshooting

This guide covers common issues you may encounter when using Stackbooster.io and explains how to resolve them.

AWS Integration

Access Denied Errors

If you encounter "Access Denied" errors when connecting your AWS account:

  1. Check IAM Role Permissions

    • Verify that the IAM role has all required permissions listed in the AWS Integration Guide
    • Ensure the role trust relationship includes the correct Stackbooster.io account ID
    • Confirm the external ID matches exactly what's displayed in the Stackbooster.io dashboard
  2. Organizational Policies

    • Check if your AWS organization has Service Control Policies (SCPs) that might restrict certain actions
    • Review any permission boundaries applied to the IAM role
    • Ensure your AWS account isn't restricting service-linked role creation
  3. Region Restrictions

    • Verify that the regions you're trying to access aren't restricted in the IAM policy
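
The trust relationship checks above can be partly automated. The following is a minimal sketch, not an official Stackbooster.io tool: it assumes you have saved the role's trust policy as JSON (for example from `aws iam get-role --role-name <your-role> --query Role.AssumeRolePolicyDocument`) and that you pass in the account ID and external ID shown in your dashboard. The function name and the sample values are hypothetical.

```python
def _as_list(value):
    """Normalize IAM's 'string or list of strings' fields to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _account_of(principal: str) -> str:
    """'arn:aws:iam::123456789012:root' -> '123456789012'; bare IDs pass through."""
    parts = principal.split(":")
    return parts[4] if principal.startswith("arn:") and len(parts) > 4 else principal


def trust_policy_problems(policy: dict, account_id: str, external_id: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    assume_found = account_ok = external_ok = False
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action")):
            continue
        assume_found = True
        principal = statement.get("Principal", {})
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        if not any(_account_of(p) == account_id for p in _as_list(aws)):
            continue
        account_ok = True
        # The external ID comparison is exact and case-sensitive.
        condition = statement.get("Condition", {}).get("StringEquals", {})
        if external_id in _as_list(condition.get("sts:ExternalId")):
            external_ok = True
    if not assume_found:
        return ["no Allow statement grants sts:AssumeRole"]
    if not account_ok:
        return [f"no sts:AssumeRole statement trusts account {account_id}"]
    if not external_ok:
        return [f"sts:ExternalId does not match {external_id!r}"]
    return []


# Placeholder values for illustration only; use the ones from your dashboard.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "abc-123"}},
    }],
}
```

This sketch only inspects the trust policy. SCPs, permission boundaries, and region restrictions still need to be reviewed separately.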

AWS Account Connection Failures

If your AWS account fails to connect:

  1. CloudFormation Stack Issues

    • Check the CloudFormation events log for specific error messages
    • Verify that the user creating the stack has sufficient permissions
    • Try deleting the stack and recreating it from the Stackbooster.io dashboard
  2. Manual Setup Problems

    • Double-check the ARN entered in the Stackbooster.io dashboard
    • Ensure the custom policy document matches exactly what's provided
    • Verify that all required managed policies are attached
  3. Cross-Account Role Issues

    • Check that the trust relationship is correctly configured
    • Ensure MFA or source IP restrictions aren't blocking Stackbooster.io's access
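
When double-checking the role ARN for a manual setup, a quick format check can catch copy-and-paste mistakes such as stray whitespace or a user ARN pasted in place of a role ARN. This is a rough sketch based on the general IAM role ARN format; the function name is hypothetical, and the check does not confirm that the role actually exists.

```python
import re

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)$",
    re.ASCII,
)


def role_arn_problems(arn: str) -> list[str]:
    """Return format problems found in an IAM role ARN (empty list if none)."""
    problems = []
    if arn != arn.strip():
        problems.append("leading or trailing whitespace")
    if not ROLE_ARN.match(arn.strip()):
        problems.append("not a well-formed IAM role ARN")
    return problems
```

For example, `role_arn_problems("arn:aws:iam::123456789012:role/MyRole")` returns an empty list, where the account ID and role name are placeholders.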

Kubernetes Agent

Agent Installation Failures

If the Kubernetes agent fails to install:

  1. RBAC Permissions

    • Ensure your kubectl context has cluster-admin privileges
    • Check for any admission controllers that might block the installation
    • Verify that the namespace for the agent isn't subject to restrictive policies
  2. Networking Issues

    • Check if your cluster has outbound internet access
    • Verify that firewall rules allow connections to the Stackbooster.io API endpoints
    • Ensure DNS resolution is working correctly within the cluster
  3. Resource Constraints

    • Check if the cluster has sufficient resources for the agent pods
    • Verify that there are no strict resource quotas preventing deployment
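
To separate DNS problems from firewall problems, you can run a small probe from inside the cluster, for example in a temporary pod that has Python available. The sketch below is an illustration only: the function name is hypothetical, and this guide does not list the Stackbooster.io API hostnames, so substitute the endpoint from your agent configuration.

```python
import socket


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return "ok", "dns-failure", or "connect-failure" for host:port."""
    try:
        # Step 1: name resolution. A failure here points at cluster DNS.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    # Step 2: TCP connect. A failure here points at firewalls, proxies,
    # or missing outbound internet access.
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect-failure"
```

A "dns-failure" result suggests checking DNS resolution within the cluster first, while "connect-failure" suggests reviewing firewall rules and outbound access. The probe only tests TCP reachability; it does not validate TLS or the API key.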

Agent Communication Problems

If the agent installs but shows "Disconnected" status:

  1. Network Connectivity

    • Verify outbound connectivity from the cluster to Stackbooster.io API endpoints
    • Check for any proxies or firewalls that might be blocking traffic
    • Ensure the agent pod has the correct API key in its configuration
  2. Pod Status Issues

    • Check the agent pod logs for error messages: kubectl logs -n stackbooster-system -l app=stackbooster-agent
    • Verify that the pod is in Running state: kubectl get pods -n stackbooster-system
    • Check for any CrashLoopBackOff or other pod status issues
  3. API Key Problems

    • Verify that the API key used during installation is valid and active
    • Try reinstalling the agent with a newly generated API key
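
The pod status checks above can be summarized with a short script. This sketch assumes you have captured the output of `kubectl get pods -n stackbooster-system -o json` and parsed it with `json.load`; the function name is hypothetical. Note that a pod in CrashLoopBackOff usually still reports the Running phase, so the script also inspects each container's waiting state.

```python
def unhealthy_pods(pod_list: dict) -> list[str]:
    """Summarize pods that are not Running or have waiting containers."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase != "Running":
            findings.append(f"{name}: phase is {phase}")
        for container in status.get("containerStatuses", []):
            waiting = container.get("state", {}).get("waiting")
            if waiting:
                reason = waiting.get("reason", "unknown")
                restarts = container.get("restartCount", 0)
                findings.append(
                    f"{name}/{container['name']}: waiting ({reason}), restarts={restarts}"
                )
    return findings
```

An empty result means every pod is in the Running phase with no waiting containers. It does not prove that the agent is connected, so check the logs as well.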

Optimization Issues

No Cost Savings Recommendations

If you don't see any cost optimization recommendations:

  1. Insufficient Data Collection

    • Ensure the agent has been running for at least 24 hours to collect sufficient data
    • Check if the agent is reporting metrics correctly in the dashboard
    • Verify that you have workloads running on the cluster
  2. Already Optimized Cluster

    • Your cluster might already be well optimized, in which case few or no recommendations are expected
    • Try running a test workload to generate more varied usage patterns
    • Check if cost-saving features are already enabled on your cluster
  3. Configuration Issues

    • Review your optimization settings and adjust the aggressiveness level
    • Ensure that the right namespaces are included in the analysis
    • Check if any exclusion rules might be preventing recommendations

Unexpected Scaling Behavior

If your cluster is scaling in unexpected ways:

  1. Competing Autoscalers

    • Check if the native Kubernetes Cluster Autoscaler is also running
    • Look for other tools that might be modifying node groups
    • Review AWS Auto Scaling Group settings for manual modifications
  2. Misconfigured Scaling Settings

    • Review your scaling configuration in the Stackbooster.io dashboard
    • Check workload priority settings and adjust if needed
    • Verify that scale-up and scale-down thresholds are appropriate
  3. Workload Spikes

    • Examine usage patterns for unexpected load spikes
    • Check for cron jobs or batch processes that might cause sudden resource demands
    • Review application logs for issues causing excessive resource consumption
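
To look for competing autoscalers, you can scan the cluster's Deployments for well-known names. The sketch below assumes the parsed output of `kubectl get deployments --all-namespaces -o json`. The marker list is an assumption chosen for illustration (Cluster Autoscaler and Karpenter are two common examples); extend it with any other tools you run. The function name is hypothetical.

```python
# Substrings to look for in Deployment names and container images.
# This list is illustrative, not exhaustive.
KNOWN_AUTOSCALERS = ("cluster-autoscaler", "karpenter")


def find_autoscalers(deployment_list: dict) -> list[str]:
    """Return 'namespace/name (marker)' for active Deployments that match."""
    hits = []
    for deployment in deployment_list.get("items", []):
        meta = deployment["metadata"]
        spec = deployment.get("spec", {})
        if spec.get("replicas", 1) == 0:
            continue  # scaled to zero, so it is not acting on the cluster
        containers = spec.get("template", {}).get("spec", {}).get("containers", [])
        images = [c.get("image", "") for c in containers]
        haystack = " ".join([meta["name"], *images]).lower()
        for marker in KNOWN_AUTOSCALERS:
            if marker in haystack:
                hits.append(f"{meta.get('namespace', 'default')}/{meta['name']} ({marker})")
                break
    return hits
```

A match is only a hint that another tool may be adjusting node groups. Manual changes to AWS Auto Scaling Groups will not show up here.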

Application Performance

Pods Failing to Schedule

If pods can't be scheduled after optimization:

  1. Resource Availability

    • Check if node resources are too fragmented to schedule your pods
    • Verify that the cluster has enough nodes to accommodate your workloads
    • Review pod resource requests and limits for accuracy
  2. Node Selectors and Taints

    • Ensure your pods' node selectors match available nodes
    • Check for taints on nodes that might prevent scheduling
    • Verify that node affinity and anti-affinity rules can be satisfied
  3. PodDisruptionBudgets

    • Review PDBs to ensure they're not too restrictive
    • Verify that enough replicas exist to satisfy PDB requirements
    • Check if kube-system pods have appropriate PDBs
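
A PodDisruptionBudget whose status reports zero allowed disruptions blocks voluntary evictions, such as those triggered by a node drain. The following sketch lists such PDBs from the parsed output of `kubectl get pdb --all-namespaces -o json`; the function name is hypothetical.

```python
def blocking_pdbs(pdb_list: dict) -> list[str]:
    """List PDBs that currently allow no voluntary disruptions."""
    blocked = []
    for pdb in pdb_list.get("items", []):
        status = pdb.get("status", {})
        if status.get("disruptionsAllowed", 0) == 0:
            meta = pdb["metadata"]
            blocked.append(
                f"{meta.get('namespace', 'default')}/{meta['name']}: "
                f"healthy {status.get('currentHealthy', 0)}"
                f"/{status.get('desiredHealthy', 0)} required"
            )
    return blocked
```

If a PDB appears in this list, consider adding replicas or relaxing its `minAvailable` or `maxUnavailable` setting.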

Degraded Application Performance

If applications perform poorly after optimization:

  1. Resource Contention

    • Check if pods are experiencing CPU throttling
    • Look for memory pressure on nodes
    • Verify that pods have appropriate resource requests and limits
  2. Node Type Mismatches

    • Ensure that workloads are running on suitable instance types
    • Check if CPU-intensive or memory-intensive apps are placed appropriately
    • Verify that specialized workloads (GPU, high I/O) are on appropriate nodes
  3. Aggressive Consolidation

    • Review workload consolidation settings and reduce aggressiveness
    • Check if pod anti-affinity rules need to be added
    • Consider adjusting bin-packing settings to leave more headroom
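
One way to check for CPU throttling is to read a container's cgroup `cpu.stat` file, which records how many CFS scheduling periods elapsed (`nr_periods`) and in how many of them the container was throttled (`nr_throttled`). The sketch below computes that ratio. The file location is an assumption that depends on your nodes: with cgroup v2 it is typically `/sys/fs/cgroup/cpu.stat` inside the container. The function name is hypothetical.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Each line of cpu.stat is "<key> <integer value>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods recorded yet
    return stats.get("nr_throttled", 0) / periods
```

A consistently high ratio suggests the CPU limit is too low for the workload. The counters are cumulative, so compare two readings taken some time apart to see current behavior.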

Dashboard and Reporting

Missing or Incorrect Data

If your dashboard shows incomplete or incorrect data:

  1. Agent Connection Issues

    • Check agent connectivity status in the dashboard
    • Verify that all agent pods are running correctly
    • Review agent logs for data collection errors
  2. Metrics Collection Problems

    • Ensure the Kubernetes metrics server is running properly
    • Check if custom metrics sources are correctly configured
    • Verify that cloud provider metrics are being collected
  3. Dashboard Caching

    • Try refreshing the page or clearing the browser cache
    • Check the data timestamp to verify recency
    • Log out and back in to refresh your session
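
To confirm that the Kubernetes metrics server is serving data, you can inspect the `Available` condition on its APIService. This sketch assumes the parsed output of `kubectl get apiservice v1beta1.metrics.k8s.io -o json`, which is the API name registered by a standard metrics-server installation; the function name is hypothetical.

```python
def metrics_api_status(apiservice: dict) -> tuple[bool, str]:
    """Return (available, message) from an APIService's Available condition."""
    for condition in apiservice.get("status", {}).get("conditions", []):
        if condition.get("type") == "Available":
            return condition.get("status") == "True", condition.get("message", "")
    return False, "no Available condition reported"
```

If the first element is `False`, the message usually indicates why the metrics API is unreachable, which is a good starting point before reviewing the agent logs.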

Report Generation Failures

If cost or optimization reports fail to generate:

  1. Data Availability

    • Ensure sufficient data exists for the requested time period
    • Check if the cluster was connected during the entire reporting period
    • Verify that cost data from AWS is being collected properly
  2. Report Configurations

    • Check for invalid filters or groupings in report settings
    • Verify that the report timeframe is valid
    • Try generating a report with default settings
  3. Browser Issues

    • Try a different browser or clear the browser cache
    • Ensure your browser is updated to the latest version
    • Check for browser console errors when generating reports

Account and Billing

Access and Permission Issues

If users can't access features or clusters:

  1. User Permissions

    • Review role assignments in the account settings
    • Check if the user's role has permission for the specific action
    • Verify cluster-specific access controls
  2. Team Configuration

    • Ensure the user is assigned to the correct team
    • Check if team permissions are correctly configured
    • Verify that team resource limits aren't restricting access
  3. Session Issues

    • Try logging out and back in to refresh the session
    • Clear browser cookies and cache
    • Verify that the user's account is active and not suspended

Billing and Subscription Problems

If you encounter billing or subscription issues:

  1. Payment Method

    • Verify that the payment method on file is current and valid
    • Check for any failed payment notifications
    • Update payment details if necessary
  2. Subscription Status

    • Verify that your subscription is active
    • Check if you've reached any limits in your current plan
    • Review subscription renewal dates
  3. Usage Accounting

    • Verify that cluster and node counts match your expectations
    • Check for any unexpected changes in resource usage
    • Review billing statements for accuracy

Still Need Help?

If you're still experiencing issues after trying these troubleshooting steps:

  1. Contact Support

    • Email [email protected] with details of your issue
    • Include relevant logs, screenshots, and steps to reproduce
    • Provide your account ID and affected cluster names
  2. Community Forums

    • Check our community forums for similar issues
    • Post a detailed description of your problem for community assistance
  3. Office Hours

    • Join our weekly office hours for live troubleshooting
    • Schedule a one-on-one session with our support engineers

Released under the MIT License. Contact us at [email protected]