Discourse on Rancher: Troubleshooting Guide
This guide covers common issues that can arise when deploying Discourse on a Rancher-managed Kubernetes cluster.
Deployment Issues
Pods Stuck in Pending State
Symptoms:
- Pods remain in “Pending” status
- Deployment doesn’t proceed
Possible Causes:
- Insufficient cluster resources
- PVC issues
- Node selector/affinity issues
Troubleshooting Steps:
# Check pod status with details
kubectl describe pod <pod-name> -n discourse-dev
# Check PVC status
kubectl get pvc -n discourse-dev
# Check node resources
kubectl describe nodes
Resolution:
- If resources are insufficient, lower the resource requests in values.yaml or add node capacity
- If a PVC is not Bound, verify that its storageClass exists and is configured correctly
- Ensure enough nodes match any node selectors or affinity rules
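The checks above can be narrowed to just the pods and events that matter. The following is a minimal sketch, assuming a Bash shell with kubectl already pointed at the cluster; the function name `pending_report` is hypothetical, and the namespace defaults to the `discourse-dev` used throughout this guide.

```bash
# Hypothetical helper: show Pending pods and the scheduler events explaining them.
pending_report() {
  local ns="${1:-discourse-dev}"
  echo "== Pending pods =="
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending
  echo "== FailedScheduling events =="
  kubectl get events -n "$ns" --field-selector=reason=FailedScheduling \
    --sort-by='.lastTimestamp'
}
```

Call it as `pending_report` or `pending_report <namespace>`. The FailedScheduling messages usually state whether the cause is CPU, memory, an unbound PVC, or an unmatched node selector.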
Container Creation Errors
Symptoms:
- Pods show status like “CreateContainerError”, “CrashLoopBackOff”, or “ImagePullBackOff”
- Containers repeatedly restart
Possible Causes:
- Invalid configuration
- Secret/ConfigMap issues
- Image pull issues
Troubleshooting Steps:
# Check pod logs
kubectl logs <pod-name> -n discourse-dev
# Check events
kubectl get events -n discourse-dev --sort-by='.lastTimestamp'
# Check container status
kubectl describe pod <pod-name> -n discourse-dev
Resolution:
- Fix configuration errors in values.yaml
- Ensure secrets and configmaps exist and are correctly formatted
- Verify container registry access if using custom images
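When a container is restarting, the current logs are often empty because the new instance has only just started. A minimal sketch, assuming a Bash shell and a configured kubectl, that prints restart counts and then the logs of the previous (crashed) instance; the function name `crash_info` is hypothetical.

```bash
# Hypothetical helper: restart counts plus logs from the crashed container instance.
crash_info() {
  local pod="$1" ns="${2:-discourse-dev}"
  # Restart count and last termination reason for each container in the pod
  kubectl get pod "$pod" -n "$ns" -o \
    jsonpath='{range .status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{" last="}{.lastState.terminated.reason}{"\n"}{end}'
  # --previous prints output from the last terminated instance of the container
  kubectl logs "$pod" -n "$ns" --previous --tail=100
}
```

Usage: `crash_info <pod-name>`. A last termination reason of `OOMKilled` points at memory limits rather than configuration.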
Ingress/Networking Issues
Cannot Access Discourse UI
Symptoms:
- Web UI not accessible at configured domain
- Browser shows connection errors
Possible Causes:
- Ingress misconfiguration
- TLS certificate issues
- DNS issues
Troubleshooting Steps:
# Check ingress status
kubectl get ingress -n discourse-dev
kubectl describe ingress <ingress-name> -n discourse-dev
# Verify TLS secret
kubectl get secret <tls-secret-name> -n discourse-dev
# Check services
kubectl get svc -n discourse-dev
Resolution:
- Ensure DNS points to correct Kubernetes ingress IP
- Verify TLS certificate exists and is valid
- Check ingress controller logs for routing errors
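To check the first resolution item, the address published by the ingress can be compared with what DNS returns for the domain. This is a sketch under several assumptions: a Bash shell on Linux or WSL where `getent` is available, and an ingress controller that reports an IP (some cloud load balancers report a hostname instead, in which case the IP field is empty). The function name `check_ingress_dns` is hypothetical.

```bash
# Hypothetical helper: compare the ingress IP with the DNS record for a host.
check_ingress_dns() {
  local ingress="$1" host="$2" ns="${3:-discourse-dev}"
  local ingress_ip dns_ip
  ingress_ip=$(kubectl get ingress "$ingress" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # First address the local resolver returns for the host
  dns_ip=$(getent hosts "$host" | awk '{print $1; exit}')
  echo "ingress=${ingress_ip:-none} dns=${dns_ip:-none}"
  # Succeeds only when both are known and identical
  [ -n "$ingress_ip" ] && [ "$ingress_ip" = "$dns_ip" ]
}
```

Usage: `check_ingress_dns <ingress-name> <domain>`. A non-zero exit status means the two addresses differ or one of them could not be determined.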
TLS Certificate Issues
Symptoms:
- Browser shows certificate errors
- Certificate invalid or for wrong domain
Possible Causes:
- Incorrect secret name in ingress config
- Expired or invalid certificate
- cert-manager issues (if in use)
Troubleshooting Steps:
# Check certificate secret
kubectl describe secret <tls-secret-name> -n discourse-dev
# If using cert-manager, check certificate resource
kubectl get certificate -n discourse-dev
kubectl describe certificate <cert-name> -n discourse-dev
Resolution:
- Update ingress configuration with correct secretName
- Renew expired certificates
- Fix cert-manager configuration if needed
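`kubectl describe secret` shows only the size of the certificate data, not its contents. A minimal sketch for reading the subject, issuer, and expiry date from the secret, assuming a Bash shell with `openssl` installed locally and a standard `kubernetes.io/tls` secret that stores the certificate under the `tls.crt` key. The function name `tls_secret_info` is hypothetical.

```bash
# Hypothetical helper: decode the certificate in a TLS secret and print key fields.
tls_secret_info() {
  local secret="$1" ns="${2:-discourse-dev}"
  # The dot in the key name must be escaped in the jsonpath expression
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode \
    | openssl x509 -noout -subject -issuer -enddate
}
```

Usage: `tls_secret_info <tls-secret-name>`. The `notAfter` line shows whether the certificate has expired, and the subject shows which domain it was issued for.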
Database Issues
PostgreSQL Connection Failures
Symptoms:
- Discourse shows database connection errors
- PostgreSQL logs show authentication failures
Possible Causes:
- Incorrect database credentials
- Database pod not ready
- Network policy blocking connections
Troubleshooting Steps:
# Check PostgreSQL pod status
kubectl get pods -l app.kubernetes.io/name=postgresql -n discourse-dev
# Check PostgreSQL logs
kubectl logs <postgresql-pod-name> -n discourse-dev
# Check if credentials in secret match values.yaml (values under .data are base64-encoded)
kubectl get secret <discourse-postgresql-secret> -n discourse-dev -o yaml
Resolution:
- Update credentials in values.yaml and redeploy
- Ensure PostgreSQL pod is running correctly
- Check and update network policies if necessary
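Comparing credentials is easier with the decoded value of a single key than with the full YAML. A minimal sketch, assuming a Bash shell; the function name `secret_value` is hypothetical, and the key name you pass (for example `password` or `postgres-password`) is an assumption that depends on how the chart created the secret, so check the key names in the `-o yaml` output first.

```bash
# Hypothetical helper: print the decoded value of one key in a secret.
secret_value() {
  local secret="$1" key="$2" ns="${3:-discourse-dev}"
  # Keys containing dots would need escaping in jsonpath; simple names work as-is
  kubectl get secret "$secret" -n "$ns" -o jsonpath="{.data.$key}" | base64 --decode
}
```

Usage: `secret_value <discourse-postgresql-secret> <key>`. The output is the plaintext credential, so avoid running this in shared terminals or logged sessions.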
Database Migration Failures
Symptoms:
- Discourse initialization logs show migration errors
- Application doesn’t start properly
Possible Causes:
- Database version mismatch
- Insufficient permissions
- Schema conflicts
Troubleshooting Steps:
# Check Discourse container logs for migration errors
kubectl logs <discourse-pod-name> -n discourse-dev
# Check PostgreSQL logs
kubectl logs <postgresql-pod-name> -n discourse-dev
Resolution:
- Ensure compatible PostgreSQL version
- Verify database user has sufficient permissions
- For major version upgrades, follow Discourse migration documentation
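Migration errors can be hard to spot in long startup logs, and the PostgreSQL server version is worth confirming directly. The sketch below assumes a Bash shell; both function names are hypothetical, and the grep patterns are a heuristic guess at what migration and permission errors look like, not a list taken from Discourse itself.

```bash
# Hypothetical helper: filter recent Discourse logs for likely migration errors.
migration_errors() {
  local pod="$1" ns="${2:-discourse-dev}"
  kubectl logs "$pod" -n "$ns" --tail=500 \
    | grep -iE 'migrat|PG::|permission denied' || echo "no matching lines"
}

# Hypothetical helper: print the PostgreSQL server version.
pg_version() {
  local pod="$1" user="$2" db="$3" ns="${4:-discourse-dev}"
  kubectl exec "$pod" -n "$ns" -- psql -U "$user" -d "$db" -tAc 'SHOW server_version;'
}
```

Usage: `migration_errors <discourse-pod-name>` and `pg_version <postgresql-pod-name> <username> <database>`.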
Application Issues
Plugin Installation Failures
Symptoms:
- Plugins not appearing in admin UI
- Error messages during plugin installation
Possible Causes:
- Plugin URL incorrect
- Plugin compatibility issues
- Network connectivity issues
Troubleshooting Steps:
# Check Discourse container logs
kubectl logs <discourse-pod-name> -n discourse-dev
# Verify plugin configuration in values.yaml
# Check plugin repository URLs
Resolution:
- Update plugin URLs in values.yaml
- Verify plugin compatibility with your Discourse version
- Ensure container has network access to plugin repositories
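Network access to a plugin repository can be tested from inside the Discourse pod. This sketch assumes a Bash shell and, importantly, that `git` is installed in the Discourse container image, which this guide does not confirm. The function name `plugin_repo_reachable` is hypothetical.

```bash
# Hypothetical helper: check that a plugin's git repository is reachable from the pod.
plugin_repo_reachable() {
  local pod="$1" url="$2" ns="${3:-discourse-dev}"
  # ls-remote --exit-code fails if the repository or its HEAD cannot be read
  kubectl exec "$pod" -n "$ns" -- git ls-remote --exit-code "$url" HEAD >/dev/null \
    && echo "reachable: $url" || echo "unreachable: $url"
}
```

Usage: `plugin_repo_reachable <discourse-pod-name> <plugin-repo-url>`. An "unreachable" result for a URL that works from your workstation suggests an egress restriction or proxy issue in the cluster.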
Email Delivery Issues
Symptoms:
- Emails not being sent
- Error messages related to SMTP in logs
Possible Causes:
- Incorrect SMTP configuration
- Network connectivity to SMTP server
- Authentication issues
Troubleshooting Steps:
# Check Discourse logs for email errors
kubectl logs <discourse-pod-name> -n discourse-dev
# Verify SMTP configuration in values.yaml
# Test SMTP server connectivity from within the pod (open a shell, then run nc inside it)
kubectl exec -it <discourse-pod-name> -n discourse-dev -- bash
nc -zv <smtp-host> <smtp-port>
Resolution:
- Update SMTP configuration in values.yaml
- Ensure network connectivity to SMTP server
- Verify SMTP credentials are correct
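If `nc` is not installed in the container, the same connectivity test can be done with Bash's built-in `/dev/tcp` redirection, and without opening an interactive shell. This is a sketch assuming the container provides `bash` and the coreutils `timeout` command; the function name `smtp_reachable` is hypothetical.

```bash
# Hypothetical helper: test a TCP connection to the SMTP server from inside the pod.
smtp_reachable() {
  local pod="$1" host="$2" port="$3" ns="${4:-discourse-dev}"
  # Inside the pod, $0 and $1 are the host and port passed after the script string
  kubectl exec "$pod" -n "$ns" -- \
    bash -c 'timeout 5 bash -c "exec 3<>/dev/tcp/$0/$1" && echo reachable || echo unreachable' \
    "$host" "$port"
}
```

Usage: `smtp_reachable <discourse-pod-name> <smtp-host> <smtp-port>`. This only confirms that the port accepts connections; authentication and TLS settings still have to be verified against the Discourse logs.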
Resource/Performance Issues
High CPU/Memory Usage
Symptoms:
- Pods showing high resource utilization
- Performance degradation
Possible Causes:
- Insufficient resource allocation
- Traffic spikes
- Database inefficiency
Troubleshooting Steps:
# Check resource usage (requires metrics-server in the cluster)
kubectl top pods -n discourse-dev
# Check detailed metrics if Prometheus is set up
# Review Grafana dashboards
# Check database performance
kubectl exec -it <postgresql-pod-name> -n discourse-dev -- psql -U <username> -d <database> -c "SELECT * FROM pg_stat_activity;"
Resolution:
- Increase resource limits in values.yaml
- Consider enabling autoscaling
- Optimize database queries and indexes
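`SELECT * FROM pg_stat_activity` returns every session, including idle ones, which makes slow queries hard to find. A minimal sketch, assuming a Bash shell, that sorts pods by memory and lists only the longest-running active queries; both function names are hypothetical.

```bash
# Hypothetical helper: per-container usage, highest memory first.
top_memory() {
  kubectl top pods -n "${1:-discourse-dev}" --containers --sort-by=memory
}

# Hypothetical helper: the ten longest-running non-idle PostgreSQL sessions.
long_queries() {
  local pod="$1" user="$2" db="$3" ns="${4:-discourse-dev}"
  kubectl exec "$pod" -n "$ns" -- psql -U "$user" -d "$db" -c \
    "SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query
     FROM pg_stat_activity
     WHERE state <> 'idle'
     ORDER BY runtime DESC NULLS LAST
     LIMIT 10;"
}
```

Usage: `top_memory` and `long_queries <postgresql-pod-name> <username> <database>`. Queries that stay near the top across several runs are the first candidates for index or query tuning.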
Persistent Volume Issues
Symptoms:
- PVCs remain in Pending state
- Storage-related errors in logs
Possible Causes:
- StorageClass issues
- Insufficient storage capacity
- Volume provisioner problems
Troubleshooting Steps:
# Check PVC status
kubectl get pvc -n discourse-dev
kubectl describe pvc <pvc-name> -n discourse-dev
# Check StorageClass
kubectl get storageclass
kubectl describe storageclass <storageclass-name>
Resolution:
- Verify that the StorageClass exists; if the PVC does not name one, confirm that a default StorageClass is set
- Check storage provider capacity and quotas
- Ensure volume provisioner is working correctly
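The two most common findings, an unbound PVC and a missing default StorageClass, can be checked quickly. A minimal sketch, assuming a Bash shell with `awk` and `grep`; both function names are hypothetical, and the parsing relies on the default column layout of `kubectl get`.

```bash
# Hypothetical helper: list PVCs whose STATUS column is anything other than Bound.
unbound_pvcs() {
  local ns="${1:-discourse-dev}"
  kubectl get pvc -n "$ns" --no-headers | awk '$2 != "Bound" {print $1, $2}'
}

# Hypothetical helper: show the StorageClass marked "(default)", if any.
default_storageclass() {
  kubectl get storageclass | grep '(default)' || echo "no default StorageClass"
}
```

Usage: `unbound_pvcs` followed by `default_storageclass`. An unbound PVC combined with no default StorageClass usually means the PVC needs an explicit storageClass in values.yaml.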
Upgrade Issues
Failed Helm Upgrades
Symptoms:
- Helm upgrade command fails
- Error messages during upgrade
Possible Causes:
- Configuration changes incompatible with current state
- Chart version incompatibilities
- Resource conflicts
Troubleshooting Steps:
# Check Helm release status
helm --kubeconfig C:\Users\myuser\.kube\rancher-cluster.yaml status discourse -n discourse-dev
# Get the rendered Kubernetes manifests of the release
helm --kubeconfig C:\Users\myuser\.kube\rancher-cluster.yaml get manifest discourse -n discourse-dev
# Check for any issues in the cluster
kubectl get events -n discourse-dev --sort-by='.lastTimestamp'
Resolution:
- Resolve configuration conflicts
- Consider a fresh installation if upgrade path is problematic
- Back up data before attempting major version upgrades
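Configuration conflicts can often be caught before anything changes in the cluster by rendering the upgrade with `--dry-run`. A minimal sketch, assuming a Bash shell and that helm finds the cluster through its default kubeconfig or the `KUBECONFIG` variable (add the `--kubeconfig` flag used elsewhere in this guide if not). The function name `preview_upgrade` is hypothetical, and the chart reference you pass is an assumption that must match how the release was installed.

```bash
# Hypothetical helper: render an upgrade of the "discourse" release without applying it.
preview_upgrade() {
  local chart="$1" values="${2:-values.yaml}" ns="${3:-discourse-dev}"
  helm upgrade discourse "$chart" -n "$ns" -f "$values" --dry-run
}
```

Usage: `preview_upgrade <chart-reference>`. Template and validation errors appear here in the same form as in a real upgrade, but the release revision is not changed.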
Post-Upgrade Functionality Issues
Symptoms:
- Features stop working after upgrade
- New errors in application logs
Possible Causes:
- Plugin compatibility with new version
- Configuration changes needed for new version
- Database schema changes
Troubleshooting Steps:
# Check application logs
kubectl logs <discourse-pod-name> -n discourse-dev
# Review Discourse upgrade notes for the version
# Check for required site setting changes
Resolution:
- Update plugins to compatible versions
- Apply any required configuration changes
- Run rake tasks if needed for database migrations
Recovery Procedures
Restoring from Backup
If you need to restore Discourse from a backup:
- Locate your backup file (from S3 or local storage)
- Stop the Discourse application:
kubectl scale deployment discourse --replicas=0 -n discourse-dev
- Restore the database:
# Copy backup to PostgreSQL pod
kubectl cp backup.sql <postgresql-pod-name>:/tmp/ -n discourse-dev
# Restore database
kubectl exec -it
- Restore uploads (if separate from database backup):
# Copy uploads to Discourse pod
kubectl cp uploads.tar.gz <discourse-pod-name>:/tmp/ -n discourse-dev
# Extract uploads
kubectl exec -it <discourse-pod-name> -n discourse-dev -- bash -c "cd /bitnami/discourse && tar -xzvf /tmp/uploads.tar.gz"
- Restart Discourse:
kubectl scale deployment discourse --replicas=1 -n discourse-dev
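A minimal sketch of the database restore step, under stated assumptions: the backup is a plain SQL dump already copied to `/tmp/backup.sql` in the PostgreSQL pod, `psql` can authenticate as the given user without prompting, and a Bash shell is used. The function name `restore_database` is hypothetical; a custom-format dump would need `pg_restore` instead.

```bash
# Hypothetical helper: replay a plain SQL dump inside the PostgreSQL pod.
restore_database() {
  local pod="$1" user="$2" db="$3" ns="${4:-discourse-dev}"
  # ON_ERROR_STOP makes psql abort on the first failing statement
  kubectl exec -i "$pod" -n "$ns" -- \
    psql -U "$user" -d "$db" -v ON_ERROR_STOP=1 -f /tmp/backup.sql
}
```

Usage: `restore_database <postgresql-pod-name> <username> <database>`. Stopping at the first error avoids ending up with a partially restored database that looks healthy at first glance.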
Rolling Back a Deployment
If you need to roll back to a previous version:
- List Helm release revisions:
helm --kubeconfig C:\Users\myuser\.kube\rancher-cluster.yaml history discourse -n discourse-dev
- Roll back to a specific revision:
helm --kubeconfig C:\Users\myuser\.kube\rancher-cluster.yaml rollback discourse <revision-number> -n discourse-dev
- Verify that the rollback succeeded:
helm --kubeconfig C:\Users\myuser\.kube\rancher-cluster.yaml status discourse -n discourse-dev
kubectl get pods -n discourse-dev
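Pod listings only show a snapshot, so it can help to wait until the rolled-back deployment has finished rolling out. A minimal sketch, assuming a Bash shell and that the workload is a Deployment named `discourse`, as in the `kubectl scale` commands earlier in this guide. The function name `wait_for_discourse` is hypothetical.

```bash
# Hypothetical helper: block until the discourse deployment is rolled out or times out.
wait_for_discourse() {
  local ns="${1:-discourse-dev}" timeout="${2:-300s}"
  kubectl rollout status deployment/discourse -n "$ns" --timeout="$timeout"
}
```

Usage: `wait_for_discourse` or `wait_for_discourse <namespace> 600s`. A non-zero exit status means the deployment did not become ready within the timeout, and the pod events should be checked next.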
Getting Help
If you encounter issues not covered in this guide:
- Check the Discourse Meta forum for similar issues
- Review the Bitnami Discourse Helm Chart documentation
- Check the Rancher documentation for Kubernetes-specific issues
- Consult the project team for assistance