
Troubleshooting Common Issues in ManageEngine RecoveryManager Plus

ManageEngine RecoveryManager Plus is a powerful backup and recovery tool for AD, Office 365, Exchange, SharePoint, and SQL workloads. When it runs smoothly, it saves time and prevents data loss; when it doesn’t, diagnosing the cause quickly is essential. This article walks you through systematic troubleshooting of the most common issues, with practical diagnostic steps, concrete fixes, and preventative tips to reduce repeat problems.


1. Preparation: collect information and reproduce the issue

Before you change settings or restart services, gather evidence and attempt to reproduce the problem. This saves time and prevents unnecessary changes.

  • Check exact error messages, timestamps, and affected objects (users, databases, mailboxes, domains).
  • Note software version (RecoveryManager Plus build), OS version, Java version (if used), and recent changes (patches, configuration changes, network updates).
  • Try to reproduce: run the same backup, restore, or discovery operation manually to capture logs and behavior.
  • Identify scope: single object, single server, or entire deployment.
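Scoping becomes easier once you list every failure as a (server, object) pair pulled from the job logs. The sketch below is a hypothetical helper, not part of the product: `classify_scope` and its category names are illustrative, and it assumes you already know how many servers the deployment protects.

```python
def classify_scope(failures, total_servers):
    """Classify how widespread a problem is.

    `failures` is a list of (server, object_name) tuples collected from job
    logs (hypothetical input format); `total_servers` is the number of
    servers the deployment protects.
    """
    if not failures:
        return "none"
    servers = {server for server, _ in failures}
    objects = {obj for _, obj in failures}
    # Every protected server is affected: look at shared infrastructure first.
    if total_servers > 1 and len(servers) >= total_servers:
        return "entire deployment"
    # One object fails: suspect that object's permissions, size, or naming.
    if len(objects) == 1:
        return "single object"
    # Many objects on one server: suspect that server's connectivity or state.
    if len(servers) == 1:
        return "single server"
    return "multiple servers"
```

For example, `classify_scope([("dc1", "jdoe"), ("dc1", "asmith")], 3)` points you at `dc1` rather than at individual accounts.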

2. Common categories of issues and first-line checks

Problems generally fit into one of these categories: connectivity/authentication, backup job failures, restore failures, slow performance, discovery issues, license or database problems, and agent/service crashes. For every category begin with these baseline checks:

  • Ensure RecoveryManager Plus services are running. On Windows: check Services or Task Manager; on Linux: check systemd or the process list.
  • Verify network connectivity between the RecoveryManager Plus server and target servers (ping, traceroute, port checks). Essential ports depend on product modules (LDAP/AD, Exchange, Office 365/Graph API, SQL, SMB).
  • Confirm credentials used by the product (domain account, service account, API credentials) are valid, non-expired, and have required permissions.
  • Check disk space and I/O on the RecoveryManager Plus server and target servers (backups can fail on low disk or saturated I/O).
  • Open the product’s built-in logs (installation directory/logs) and the server OS event logs for correlated errors.
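The disk-space check is easy to script so you can run it on the RecoveryManager Plus server and on each repository volume. This is a minimal sketch using Python's standard `shutil.disk_usage`; the 15% threshold is an assumption, not a vendor recommendation.

```python
import shutil

def check_free_space(path, min_free_percent=15.0):
    """Return (ok, free_percent) for the volume that holds `path`.

    The default threshold of 15% free is an illustrative assumption.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_percent = usage.free / usage.total * 100
    return free_percent >= min_free_percent, round(free_percent, 1)
```

Call it with the installation directory and each backup repository path, for example `check_free_space(r"D:\Backups")` (a hypothetical path), and treat a `False` result as a likely cause of failed or skipped backups.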

3. Connectivity and authentication issues

Symptoms: discovery fails, backup jobs fail with authentication errors, permission denied, or repeated prompts for credentials.

Troubleshooting steps:

  • Verify account permissions. For AD-related tasks, the service account needs domain read permissions and appropriate rights for objects targeted. For Exchange/Office 365, ensure the account has the required Exchange Online or Graph API roles and app permissions if using OAuth.
  • Test connectivity from the RecoveryManager Plus server to the target ports: LDAP (389), LDAPS (636), SMB (445), RPC (135 plus the dynamic RPC range), SQL (1433 or the instance’s port), and Exchange Web Services/Graph endpoints (443). Use telnet/nc or PowerShell’s Test-NetConnection.
  • If OAuth/Modern Authentication is used for Office 365, ensure tokens aren’t expired and app registrations in Azure AD are intact. Re-authenticate if necessary.
  • Check time synchronization (NTP). Kerberos and certificate validation are time-sensitive; clock drift beyond the allowed tolerance (five minutes by default for Kerberos in Active Directory) causes authentication failures.
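If telnet or Test-NetConnection is not available, the port and clock checks can be scripted. The sketch below uses Python's `socket.create_connection` for TCP reachability and compares two timestamps against the default five-minute Kerberos tolerance. The port map mirrors the list above and is an assumption to adjust per module; how you obtain the remote time is up to you.

```python
import socket
from datetime import datetime, timedelta, timezone

# Active Directory's default Kerberos policy tolerates 5 minutes of clock skew.
MAX_SKEW = timedelta(minutes=5)

# Ports from the checklist above; adjust to the modules you actually use.
COMMON_PORTS = {"LDAP": 389, "LDAPS": 636, "RPC endpoint mapper": 135,
                "SMB": 445, "SQL Server (default instance)": 1433,
                "HTTPS (EWS/Graph)": 443}

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False

def check_host(host, ports=COMMON_PORTS, timeout=3.0):
    """Return {label: reachable?} for each port on `host`."""
    return {label: port_open(host, port, timeout) for label, port in ports.items()}

def skew_ok(local_time, remote_time, max_skew=MAX_SKEW):
    """Return True if two clock readings differ by no more than max_skew."""
    return abs(local_time - remote_time) <= max_skew
```

Run `check_host("dc01.example.com")` (a hypothetical host name) from the RecoveryManager Plus server itself, since firewall rules are evaluated from the source. For the clock, compare `datetime.now(timezone.utc)` with the target's time, for example as reported by `w32tm /stripchart /computer:<host>` on Windows.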

Fixes:

  • Update/change credentials and re-run discovery/backup.
  • Reassign required roles/permissions in AD, Exchange, or Azure AD.
  • Reconfigure firewall/transport rules to allow necessary ports.
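For the OAuth check in this section, Azure AD access tokens are JWTs whose `exp` claim records the expiry time in Unix seconds. If you acquire a token while testing the app registration, a sketch like the one below shows how long it remains valid. It assumes a standard three-part JWT and deliberately skips signature verification, so use it only for diagnostics; RecoveryManager Plus itself may not expose the tokens it uses.

```python
import base64
import json
import time

def jwt_seconds_left(token, now=None):
    """Return seconds until the JWT's `exp` claim; negative means expired.

    The payload is decoded WITHOUT verifying the signature: diagnostics only,
    never use this for authorization decisions.
    """
    payload_b64 = token.split(".")[1]             # header.payload.signature
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return claims["exp"] - now
```

A negative result means the token has expired and you need to re-authenticate.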

4. Backup job failures and incomplete backups

Symptoms: jobs fail mid-run, items are skipped, or backup reports incomplete data.

Troubleshooting steps:

  • Check job-specific logs (job name, timestamp) inside RecoveryManager Plus logs. Identify the exact failure point: network timeout, locked file, SQL error, or out-of-memory.
  • Verify source system state: for database backups, check DB health and transaction log status; for mailboxes, check mailbox size and special characters in item names.
  • If incremental/differential backups fail, test a full backup to isolate whether the issue is with change tracking or previous backup chain corruption.
  • Inspect retention policies and storage quota on the backup repository; insufficient space can cause silent failures or skipped objects.
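When a job log is long, counting lines per failure category quickly shows which cause dominates. The sketch below is illustrative: the regular expressions are assumptions about typical wording (including Java's `OutOfMemoryError`), not the exact messages RecoveryManager Plus writes, so tune them against your own logs.

```python
import re
from collections import Counter

# Hypothetical patterns: adjust to the wording your job logs actually use.
FAILURE_PATTERNS = {
    "network timeout": re.compile(r"timed? ?out", re.I),
    "file locked": re.compile(r"locked|being used by another process", re.I),
    "sql error": re.compile(r"SQLException|SQL error", re.I),
    "out of memory": re.compile(r"OutOfMemoryError|out of memory", re.I),
    "disk full": re.compile(r"no space left|insufficient (disk )?space", re.I),
}

def classify_failures(log_lines):
    """Count log lines per failure category (first matching category wins)."""
    counts = Counter()
    for line in log_lines:
        for category, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
                break
    return counts
```

Feed it the lines of a single job's log, for example `classify_failures(open(path, encoding="utf-8", errors="replace"))`, and start with the most frequent category.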

Fixes:

  • Clear or expand repository storage, then re-run or reschedule the job.
  • Use recommended backup modes for the workload (VSS-based for Windows; proper database-aware modes for SQL/Exchange).
  • For locked files, schedule backups during low-activity windows or use VSS snapshots.
  • If you suspect a broken or corrupted backup chain, run a full backup to rebuild the baseline.
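Whether a chain needs a new baseline can be reasoned about as a simple linked-list check: each incremental must depend on the backup immediately before it, and the chain must start from a full backup. The `BackupPoint` model below is hypothetical (RecoveryManager Plus does not necessarily expose parent IDs), and it covers incrementals only; a differential would point at the last full backup instead.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BackupPoint:
    backup_id: str
    kind: str                 # "full" or "incremental" (hypothetical model)
    parent_id: Optional[str]  # backup this point depends on; None for a full

def first_broken_link(chain):
    """Return the index of the first point that breaks the chain, or None.

    `chain` is ordered oldest to newest. Index 0 means there is no usable
    full baseline at all.
    """
    if not chain or chain[0].kind != "full":
        return 0
    for i in range(1, len(chain)):
        point = chain[i]
        if point.kind == "full":
            continue  # a new full backup starts a fresh baseline
        if point.parent_id != chain[i - 1].backup_id:
            return i
    return None
```

Every restore point from the returned index onward depends on missing data, which is exactly the situation a fresh full backup resolves.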

5. Restore failures and data mismatch

Symptoms: restore fails, restored items missing or corrupted, or permissions aren’t retained.

Troubleshooting steps:

  • Confirm that the backup containing the required data exists and is listed in RecoveryManager Plus. Check job history and the specific backup snapshot.
  • Read restore logs to see if errors occurred during extraction or writeback (permission denied, path not found, schema mismatch).
  • If restoring to a different environment, check compatibility (AD forest/domain differences, Exchange versions, SQL collation).
  • For Office 365 restores, API limits or throttling can interrupt large restores—check throttling messages and the app’s API usage.

Fixes:

  • Restore smaller batches to avoid API throttling; add delays or use throttling-aware options.
  • Use the exact target paths, mappings, and accounts that have permission to write/modify the destination.
  • If permissions are not retained, make sure the “preserve permissions” setting is enabled and that the target supports the same ACL model.
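The batching-and-delay advice follows a standard pattern: split the work into fixed-size batches and, when the service signals throttling (Microsoft Graph answers with HTTP 429 and usually a `Retry-After` header), wait and retry instead of failing the whole restore. In this sketch `restore_batch` and `ThrottledError` are hypothetical stand-ins for whatever performs and reports a restore call; batch size and delays are assumptions.

```python
import time

class ThrottledError(Exception):
    """Hypothetical: raised by restore_batch when the service throttles."""
    def __init__(self, retry_after=None):
        super().__init__("throttled")
        self.retry_after = retry_after  # seconds, e.g. from a Retry-After header

def restore_in_batches(items, restore_batch, batch_size=50,
                       max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Restore `items` batch by batch, backing off whenever throttled."""
    restored = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                restore_batch(batch)
                restored += len(batch)
                break
            except ThrottledError as exc:
                if attempt == max_retries:
                    raise  # give up on this batch after the last retry
                # Honor the server's hint if present, else back off exponentially.
                if exc.retry_after is not None:
                    delay = exc.retry_after
                else:
                    delay = base_delay * 2 ** attempt
                sleep(delay)
    return restored
```

The `sleep` parameter is injectable so the retry logic can be tested without real waits; smaller batches trade total throughput for fewer throttling interruptions.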

6. Discovery and inventory problems

Symptoms: incomplete domain discovery, missing OUs, devices, or mailboxes not detected.

Troubleshooting steps:

  • Validate discovery credentials and scopes: service accounts must have read access across the OU/domain.
  • Increase discovery log verbosity temporarily to capture LDAP errors or referral issues.
  • For large environments, discovery can time out; check discovery timeouts and paging settings. AD referrals or multi-domain forests may require separate discovery runs per domain/forest.

Fixes:

  • Break discovery into smaller scopes or increase timeout/paging settings.
  • Use domain-joined credentials or ensure cross-domain trusts are healthy.
  • Reconfigure discovery filters to include required OUs and object classes.
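Both discovery fixes rest on two general patterns. Paging retrieves a large result set in pages linked by an opaque cookie (as the LDAP Simple Paged Results control in RFC 2696 does), and splitting the scope runs one OU or domain at a time so a single failure does not abort everything. In this sketch `fetch_page` and `fetch_for_scope` are hypothetical callables standing in for your directory query; they are not RecoveryManager Plus APIs.

```python
def fetch_all(fetch_page, page_size=500):
    """Collect every entry from a paged query.

    `fetch_page(page_size, cookie)` is hypothetical and returns
    (entries, next_cookie); an empty cookie means there are no more pages.
    """
    entries, cookie = [], b""
    while True:
        page, cookie = fetch_page(page_size, cookie)
        entries.extend(page)
        if not cookie:
            return entries

def discover_by_scope(scopes, fetch_for_scope):
    """Run discovery one OU/domain at a time, recording failures per scope."""
    results, failures = {}, {}
    for scope in scopes:
        try:
            results[scope] = fetch_for_scope(scope)
        except Exception as exc:  # keep going so one bad scope doesn't hide the rest
            failures[scope] = str(exc)
    return results, failures
```

The `failures` dictionary tells you which OU or domain to investigate (credentials, trusts, referrals) while the rest of the inventory stays usable.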

7. Performance issues (slow backups, UI lag)

Symptoms: slow job completion, slow UI responses, long report generation.

Troubleshooting steps:

  • Check server resource usage: CPU, memory, disk I/O, and network bandwidth on the RecoveryManager Plus server.
  • Identify whether slowness is server-side (agent/engine) or target-side (source servers slow to respond). Look at per-job resource consumption.
  • Review database performance. RecoveryManager Plus uses an internal DB, and high latency or locks in it can slow every operation.
  • Ensure antivirus/endpoint protection is not scanning backup repositories or application folders during operations.

Fixes:

  • Increase server resources (CPU/RAM), move DB to faster disk/SSD, or tune JVM/heap settings if applicable.
  • Schedule jobs to avoid peak hours and spread heavy jobs across windows.
  • Exclude product folders and backup repositories from real-time scanning.
  • Consider scaling: distribute workloads using multiple RecoveryManager Plus installations or dedicated proxy/collector components if supported.
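Spreading heavy jobs across windows is a load-balancing problem. A simple, well-known heuristic is longest-processing-time-first: sort jobs by estimated duration, largest first, and give each job to the window with the least work so far. The job names and hour estimates below are hypothetical; take real durations from your job history.

```python
import heapq

def spread_jobs(jobs, windows):
    """Assign jobs to backup windows using the LPT (largest-first) heuristic.

    `jobs` maps job name -> estimated hours; `windows` is a list of window names.
    """
    # Min-heap of (current load, tie-break index, window name).
    heap = [(0.0, i, w) for i, w in enumerate(windows)]
    heapq.heapify(heap)
    plan = {w: [] for w in windows}
    for name, hours in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        load, i, window = heapq.heappop(heap)  # least-loaded window so far
        plan[window].append(name)
        heapq.heappush(heap, (load + hours, i, window))
    return plan
```

With `{"ad-full": 4, "m365-mail": 3, "exchange": 2, "sharepoint": 2, "ad-incr": 1}` and two weekend windows, each window ends up with six hours of work instead of one window carrying most of it.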

8. Licensing, database corruption, and upgrade issues

Symptoms: license errors, product shows expired/wrong license, or upgrade fails and breaks functionality.

Troubleshooting steps:

  • Verify license file and expiry; check the product’s license summary page. Confirm license matches modules in use.
  • If you suspect database corruption (recurring DB errors or crashes), locate the DB logs and run the product-recommended consistency checks. Back up the DB before attempting any repair.
  • For upgrade failures, check the compatibility matrix (OS, Java, DB) before upgrading, back up the application and DB, and consult the upgrade logs for the exact failing step.

Fixes:

  • Reapply or renew license from the ManageEngine license portal and restart services.
  • Restore DB from a recent clean backup if repair tools can’t resolve corruption; contact ManageEngine support for DB repair scripts if necessary.
  • Roll back to the pre-upgrade snapshot and retry the upgrade after meeting the prerequisites.
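The license checks reduce to two comparisons: days remaining until expiry, and modules in use that the license does not cover. This sketch does not parse the license file (its format isn't documented here); it assumes you copy the expiry date and module names from the license summary page. The 30-day warning window and the module names are illustrative.

```python
from datetime import date

def license_status(expiry, today=None, warn_days=30):
    """Return (status, days_left) where status is 'expired', 'expiring', or 'ok'."""
    today = today or date.today()
    days_left = (expiry - today).days
    if days_left < 0:
        return "expired", days_left
    if days_left <= warn_days:
        return "expiring", days_left
    return "ok", days_left

def unlicensed_modules(licensed, in_use):
    """Modules in use that the license does not cover, sorted by name."""
    return sorted(set(in_use) - set(licensed))
```

Running these on a schedule turns a license error during a backup window into a warning weeks in advance.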

9. Agent/service crashes and abnormal exits

Symptoms: application crashes, service stops unexpectedly, JVM crashes, or high memory usage leading to OOM.

Troubleshooting steps:

  • Capture stack traces, JVM crash logs (hs_err_pid*.log), or Windows Event Viewer entries.
  • Check for recent configuration changes, plugin/add-on installs, or external integrations that coincide with crashes.
  • Verify JVM memory settings and garbage collection (GC) logs for memory pressure patterns.

Fixes:

  • Increase JVM heap size per vendor guidance, tune GC options, or apply hotfixes for known memory leaks.
  • Remove or disable suspect plugins and test stability.
  • Keep the product up to date with patches that address stability issues.
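When reading GC logs for memory pressure, the useful signal is heap occupancy right after each full collection. A baseline that keeps climbing suggests a leak; one that stays close to the maximum heap suggests the heap is simply too small. This sketch assumes you have already extracted those post-GC values in MB (GC log formats vary by JVM version, so no parser is included), and the thresholds are assumptions.

```python
def heap_trend(post_gc_mb):
    """Least-squares slope (MB per sample) of heap occupancy after full GCs."""
    n = len(post_gc_mb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(post_gc_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(post_gc_mb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def memory_pressure(post_gc_mb, max_heap_mb, slope_limit=1.0, fill_limit=0.85):
    """Return (rising_baseline, near_ceiling) flags for post-GC samples."""
    rising = heap_trend(post_gc_mb) > slope_limit
    near_full = bool(post_gc_mb) and post_gc_mb[-1] / max_heap_mb >= fill_limit
    return rising, near_full
```

A rising baseline points toward a leak (apply hotfixes, disable suspect plugins), while a flat baseline near the ceiling points toward raising the heap size per vendor guidance.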

10. Useful logs and diagnostics to gather

Always collect these when escalating or opening a support ticket:

  • RecoveryManager Plus application logs (install_dir/logs).
  • Job-specific logs and export of job history for the failing job.
  • OS event logs (Windows Event Viewer or syslog).
  • Network traces (tcpdump/wireshark) for connectivity issues.
  • JVM crash logs and heap dumps if applicable.
  • Screenshots of error messages and full text of any stack traces.
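Packaging the logs into a single archive makes escalation faster and ensures nothing is left out. This sketch uses Python's standard `zipfile` and `pathlib`; the archive name and the labels are hypothetical, and the source folders are whatever you collect (for example, the install_dir/logs folder described above).

```python
import zipfile
from datetime import datetime
from pathlib import Path

def bundle_logs(sources, out_dir, since=None):
    """Zip log files into one archive for a support ticket.

    `sources` maps a label (folder name inside the zip) to a directory.
    If `since` (a naive local datetime) is given, only files modified after it
    are included. Keep `out_dir` outside the source folders.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = Path(out_dir) / f"rmp-diagnostics-{stamp}.zip"  # hypothetical name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for label, folder in sources.items():
            root = Path(folder)
            for path in root.rglob("*"):
                if not path.is_file():
                    continue
                if since and datetime.fromtimestamp(path.stat().st_mtime) < since:
                    continue  # skip files older than the incident window
                zf.write(path, arcname=f"{label}/{path.relative_to(root).as_posix()}")
    return archive
```

Passing `since` set shortly before the failure's timestamp keeps the archive small while still covering the incident.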

11. Preventative maintenance and best practices

  • Keep RecoveryManager Plus and its components patched and updated.
  • Use dedicated service accounts that follow least privilege while holding the roles each module requires. Rotate credentials on a schedule and revalidate jobs after each change.
  • Monitor disk space, DB health, and job success rates with alerts for failures.
  • Schedule regular full backups in addition to incremental chains to avoid dependence on long chains.
  • Document environment topology and maintain up-to-date discovery scopes and credentials.
  • Test restores periodically to ensure recovery procedures work end-to-end.
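Alerting on job success rates works best when it looks at a sliding window of recent runs rather than only the last one. This sketch assumes you export job history as a list of pass/fail results per job (the export format and job names are hypothetical). It flags any job below a 95% success rate over its last 10 runs, or any job whose most recent run failed; both thresholds are assumptions.

```python
def success_rate(results):
    """Fraction of successful runs; `results` is a list of booleans."""
    return sum(results) / len(results) if results else 1.0

def jobs_needing_alert(history, threshold=0.95, window=10):
    """Return {job: recent success rate} for jobs that need attention.

    A job is flagged if its success rate over the last `window` runs is
    below `threshold`, or if its most recent run failed.
    """
    alerts = {}
    for job, results in history.items():
        recent = results[-window:]
        rate = success_rate(recent)
        if rate < threshold or (recent and not recent[-1]):
            alerts[job] = round(rate, 2)
    return alerts
```

The window catches jobs that fail intermittently, which a check of only the latest run would miss.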

12. When to contact ManageEngine support

Contact support if:

  • You’ve collected logs and reproduced the issue following the steps above and the problem persists.
  • There’s evidence of database corruption, product crashes with JVM errors, or upgrade failures you can’t recover from.
  • A bug or unexpected behavior appears after applying patches and you need vendor fixes.

Provide support with: product build/version, OS details, exact error messages, relevant logs, steps to reproduce, and timestamps.


Troubleshooting RecoveryManager Plus is largely methodical: collect data, isolate the layer (network, auth, storage, app), test fixes in a controlled way, and escalate with full diagnostics when needed. The steps above cover the most frequent problems and should get you to a resolution faster.
