GUI Desktop Automation in Practice: Building Cross-Application Workflows with DesireCore

4/2/2026 · DesireCore

GUI automation · Computer Use · HostAgent · workflow · cross-application

In the wave of digital transformation, businesses and individuals alike face a common challenge: vast amounts of repetitive manual tasks consuming precious working hours. From entering customer records one by one to copying and pasting data between multiple applications, from daily monitoring dashboard checks to batch document processing — these mechanical, tedious yet indispensable tasks are draining the creativity and energy of knowledge workers.

Traditional RPA (Robotic Process Automation) tools have attempted to address this problem, but they typically require complex script writing, precise pixel-coordinate targeting, and break down the moment an interface undergoes even the slightest change. More critically, traditional RPA lacks the ability to “understand” — it merely executes preset steps mechanically and cannot make flexible judgments when facing unexpected situations.

DesireCore introduces an entirely new solution: AI-native GUI desktop automation. Through the Computer Use capability, DesireCore’s AI agents can “operate computer and mobile graphical interfaces just like a human,” not only comprehending on-screen content but also making intelligent decisions based on context. Combined with the intelligent task orchestration engine, individual operations can be organized into complex cross-application workflows, achieving true end-to-end automation.

This article provides a comprehensive guide — from concept to setup, operational capabilities, real-world case studies, and security mechanisms — on how to leverage DesireCore to build cross-application workflows and eliminate repetitive manual work once and for all.

Part 1: What Is Computer Use — The Smart Version of Remote Desktop

From Remote Desktop to Intelligent Control

If you have ever used TeamViewer, AnyDesk, or Windows Remote Desktop, you are already familiar with the concept of “remote control.” Traditional remote desktop allows you to connect to another computer over the network and control its graphical interface with your mouse and keyboard.

[Figure: The five-step Computer Use workflow]

DesireCore’s Computer Use can be understood as “the smart version of remote desktop.” Unlike traditional remote desktop, the operator is no longer a human but an AI agent with visual comprehension capabilities. This agent can:

Read screen content: Through screenshot recognition technology, the AI understands what is currently displayed on screen — text, buttons, input fields, dropdown menus, tables, and even information within charts and images.
Understand operational context: The AI does not merely recognize individual elements; it comprehends the entire page layout and logical relationships, knowing which interface of which application is currently active and what task is being performed.
Make intelligent decisions: When facing unexpected pop-ups, loading delays, or interface changes, the AI can flexibly adjust its operational strategy based on the current state rather than simply throwing an error and stopping, as traditional RPA would.
Natural language interaction: You do not need to write any scripts or code — simply describe the task you want to accomplish in natural language, and the AI will automatically plan and execute the corresponding operations.

The Computer Use Workflow

DesireCore’s Computer Use follows a clear five-step workflow:

Step 1: User issues a task. You describe the work to be done to the AI agent in natural language. For example: “Please enter the customer information from this Excel spreadsheet into the CRM system one by one.”

Step 2: The agent formulates an operation plan. The AI analyzes the task requirements and breaks the complex task into a series of specific operational steps. It considers which applications need to be opened, the order of operations, potential exceptions, and contingency plans.

Step 3: HostAgent executes operations. The HostAgent plugin installed on the target device receives operation commands from the agent and performs concrete actions on the device’s graphical interface — moving the mouse, clicking buttons, typing text, switching windows, and more.

Step 4: Screenshot feedback and verification. After each operation, HostAgent captures the current screen and sends it back to the agent. The agent uses visual recognition to confirm whether the operation was successfully executed and whether the current interface state matches expectations.

Step 5: Result reporting. Upon task completion, the agent reports the execution results to the user, including which operations were successfully completed, what issues were encountered, and the final execution status.

The core advantage of this workflow lies in closed-loop verification. Traditional RPA typically operates “blindly,” executing preset steps without confirming whether the results are correct. In DesireCore, every operation is followed by a screenshot check, so a failed step is detected where it occurs rather than at the end of the run.
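The execute-then-verify loop can be sketched in a few lines of Python. This is an illustrative sketch only: `Step`, `run_with_verification`, and the `execute`, `capture`, and `verify` callables are hypothetical names chosen for this example, not DesireCore or HostAgent APIs, and the screen is modeled as a plain string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One planned GUI action plus the screen state expected afterwards."""
    action: str      # e.g. "click Save"
    expected: str    # e.g. "record saved"


def run_with_verification(
    steps: List[Step],
    execute: Callable[[str], None],       # hypothetical: performs the action
    capture: Callable[[], str],           # hypothetical: describes the current screen
    verify: Callable[[str, str], bool],   # hypothetical: compares screen with expectation
    max_attempts: int = 2,
) -> List[str]:
    """Run each step, check the screen, retry, and stop at the first unverified step."""
    report = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            execute(step.action)
            if verify(capture(), step.expected):
                report.append(f"ok: {step.action} (attempt {attempt})")
                break
        else:
            report.append(f"failed: {step.action}")
            break  # never continue from an unverified screen state
    return report


# In-memory stand-in for a device, used only to exercise the loop.
screen = {"state": "blank"}


def fake_execute(action: str) -> None:
    screen["state"] = "done " + action


def fake_capture() -> str:
    return screen["state"]


def fake_verify(observed: str, expected: str) -> bool:
    return observed == expected


demo_report = run_with_verification(
    [Step("open form", "done open form"), Step("click Save", "record saved")],
    fake_execute, fake_capture, fake_verify,
)
```

In the demo, the first step verifies on its first attempt, while the second never reaches its expected state and is reported as failed after two attempts. A blind script would have carried on regardless.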

Why Not Just Use APIs?

One might ask: since APIs can directly manipulate data, why operate through the GUI? The answer is simple: not all systems provide APIs.

In real-world work scenarios, numerous internal enterprise systems, legacy applications, and third-party SaaS services either lack open APIs, have incomplete API functionality, or require complex approval processes for API access. A graphical interface, however, is something virtually every application has. Through Computer Use, DesireCore can operate any application with a graphical interface, free from API limitations, serving as a “universal connector.”

Furthermore, many operations are inherently GUI-level — for example, generating a report within a specific application and exporting it as PDF, or filling out a multi-step form on a webpage that requires dynamic interaction. For these operations, GUI automation is often more intuitive and reliable even when APIs exist.

Part 2: HostAgent Installation and Configuration Guide

What Is HostAgent?

HostAgent is the execution engine for DesireCore’s Computer Use capability. It is a lightweight client plugin that must be installed on the target device you wish to automate. Think of it as the AI agent’s “hands” on the target device — the agent’s brain resides in the cloud, but the actual mouse clicks, keyboard inputs, and other operations are performed locally through HostAgent.

HostAgent is designed with the following principles:

Lightweight: Small installation package, low runtime resource consumption, no impact on normal device usage.
Secure: All communications are encrypted, and operations follow the principle of least privilege.
Cross-platform: Supports six major platforms — Windows, macOS, Linux, Android, iOS, and HarmonyOS.

Three Steps to Complete Setup

Regardless of your platform, HostAgent installation follows a unified three-step process:

Step 1: Download and Install HostAgent

Visit the DesireCore website’s download page and select the installation package corresponding to your target device’s operating system.

Windows:

Download the .exe installer
Double-click to run the installation wizard and follow the prompts
After installation, HostAgent displays an icon in the system tray
It is recommended to set HostAgent to start automatically at boot

macOS:

Download the .dmg disk image
Open the image file and drag HostAgent to the “Applications” folder
On first launch, macOS may warn “cannot verify the developer” — go to “System Settings > Privacy & Security” to allow it to run
HostAgent will appear in the menu bar

Linux:

Both .deb (Debian/Ubuntu) and .rpm (Fedora/CentOS) packages are available
Install using the appropriate package manager: sudo dpkg -i hostagent.deb or sudo rpm -i hostagent.rpm
Start the service with systemctl start hostagent
Enable auto-start with systemctl enable hostagent

Android:

Download the APK from the DesireCore website (a Google Play version is under review)
Allow “Install from unknown sources” and proceed with installation
Open the app and follow the initialization guide

iOS:

Install via TestFlight or enterprise signing (App Store version under review)
Open the app after installation and follow the initialization guide

HarmonyOS:

Download from the DesireCore website or Huawei AppGallery
Installation process is similar to Android

Step 2: Add Device in DesireCore and Enter Pairing Code

After installation, open HostAgent and you will see a pairing code (typically a 6-character alphanumeric combination). This code is single-use, designed to securely link your target device with the DesireCore platform.

Log in to the DesireCore desktop client or web interface
Navigate to the “Device Management” page
Click “Add New Device”
Enter the pairing code displayed by HostAgent
Confirm device information (operating system, device name, etc.)
Click “Complete Pairing”

After successful pairing, you will see this device in the device management list with its status showing “Online.” You can assign each device an easily identifiable name, such as “Office PC - Windows” or “Test Phone - Android.”
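The article does not describe how pairing codes are generated or checked. As a rough illustration of what “single-use” means in practice, the sketch below issues a random 6-character alphanumeric code and accepts it exactly once. `PairingRegistry` and its methods are hypothetical names and do not reflect DesireCore's actual implementation.

```python
import secrets
import string
from typing import Dict, Optional

ALPHABET = string.ascii_uppercase + string.digits


class PairingRegistry:
    """Hypothetical server-side store mapping pending codes to device names."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def issue(self, device_name: str, length: int = 6) -> str:
        """Create a random alphanumeric code for a device awaiting pairing."""
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._pending[code] = device_name
        return code

    def redeem(self, code: str) -> Optional[str]:
        """Return the device name once; later attempts with the same code get None."""
        return self._pending.pop(code.strip().upper(), None)


registry = PairingRegistry()
code = registry.issue("Office PC - Windows")
first = registry.redeem(code)    # pairing succeeds
second = registry.redeem(code)   # the code has already been consumed
```

Removing the code from the pending set on first use is what prevents a leaked or reused code from pairing a second device.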

Step 3: Grant Necessary Permissions Based on Operating System

This is the most critical step. For HostAgent to perform GUI operations, it needs relevant permissions at the operating system level. Required permissions vary by platform:

Windows permissions:

Administrator privileges: Some applications (especially those running as administrator) require HostAgent to also have admin privileges. It is recommended to run HostAgent as administrator for first use.
Screen recording permission: Windows 10/11 typically allows screen capture by default.
Accessibility permission: HostAgent leverages Windows UI Automation interfaces for more precise element recognition; the system usually grants access automatically.

macOS permissions:

Accessibility: The most critical permission, allowing HostAgent to control mouse and keyboard. Go to “System Settings > Privacy & Security > Accessibility” and enable HostAgent.
Screen Recording: Allows HostAgent to capture screen content. Go to “System Settings > Privacy & Security > Screen Recording” and enable HostAgent.
Automation: macOS may prompt for authorization when specific applications need to be controlled — select “Allow.”

Linux permissions:

X11/Wayland permission: Typically automatic under X11. Wayland environments require additional configuration — refer to the Wayland setup guide in DesireCore documentation.
Input device permission: Ensure the HostAgent user is in the input group: sudo usermod -aG input $USER (log out and back in for the group change to take effect)

Android permissions:

Accessibility Service: Go to “Settings > Accessibility > HostAgent” and enable it. This is the core permission for GUI automation on Android.
Overlay permission: Allows HostAgent to display a status indicator above other apps.
Screen capture permission: The system will prompt for authorization on first use.
Storage permission (if file operations are involved)

iOS permissions:

iOS permission management is more restrictive. HostAgent operates through Accessibility APIs and Shortcuts integration.
Configuration is completed under “Settings > Accessibility.”

HarmonyOS permissions:

Similar to Android — grant Accessibility Service, overlay, and screen capture permissions.
The HarmonyOS permission management interface path may differ slightly; follow system prompts.

Configuration Verification

After completing the three steps above, verify that the configuration is successful:

Select the added device in DesireCore
Enter a simple command in the dialog box, such as “open the calculator”
Observe whether the target device successfully opens the calculator application

If the operation executes successfully, HostAgent has been properly installed, paired, and granted necessary permissions — you are ready to start using Computer Use.

Multi-Device Management

DesireCore supports managing multiple devices simultaneously. The device management page displays all paired devices with their online status, operating system information, and last activity time. When issuing tasks, you can specify which device to execute operations on, or create cross-device workflows — for example, exporting data from an ERP system on a Windows PC, processing it with specialized software on a macOS machine, and then sending the results via enterprise IM on an Android phone.

Part 3: Full Operational Capabilities — Mouse, Keyboard, Screenshots, and Application Control

DesireCore’s Computer Use provides a comprehensive set of GUI operation capabilities covering every type of action a human might perform when using graphical interfaces. Let us examine each category in detail.

Mouse Operations

The mouse is the foundational tool for GUI interaction. DesireCore supports the following mouse operations:

Click: The most basic operation for pressing buttons, selecting menu items, and activating input fields. The agent first locates the target element’s position through visual recognition, then instructs HostAgent to click at that position.

Double-click: Used for opening files, selecting words, and other scenarios requiring a double-click. The agent can determine when a double-click is needed instead of a single click.

Right-click: Opens the context menu for accessing shortcuts like copy, paste, and properties. The agent can recognize options within the context menu and perform subsequent operations.

Drag and drop: Moves elements from one position to another. Commonly used in file management, interface layout adjustment, and chart element manipulation. The agent precisely calculates drag start and end points.

Scroll: Scrolls pages or lists up, down, left, or right. When content extends beyond the visible area, the agent automatically determines the scroll direction and distance. This is particularly important for handling long lists, lengthy pages, or large tables.

Hover: Moves the mouse to a specific position without clicking, used to trigger tooltips, expand submenus, or activate hover effects.

Keyboard Input

Keyboard operations range from simple text entry to complex key combinations:

Typing: Entering text content in input fields, text editors, and similar locations. Supports input in Chinese, English, and other languages. For Chinese input, HostAgent can use the clipboard method to avoid input method compatibility issues.

Shortcut keys: Pressing single keys such as Tab (switch focus), Enter (confirm), Escape (cancel), and Delete.

Key combinations: Executing operations requiring multiple simultaneous key presses, such as Ctrl+C (copy), Ctrl+V (paste), Ctrl+S (save), Alt+Tab (switch window), and Ctrl+Shift+S (save as). The agent intelligently selects the appropriate key combination based on task requirements.

Special keys: Supports function keys (F1-F12), arrow keys, Page Up/Down, Home/End, and other special keys.
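A key combination such as Ctrl+Shift+S ultimately becomes an ordered series of key-down and key-up events. The sketch below shows one plausible way to expand a combination string. The function names and the event format are assumptions made for illustration, not HostAgent's actual command protocol.

```python
from typing import List, Tuple

MODIFIERS = {"ctrl", "alt", "shift", "cmd", "win"}


def parse_combo(combo: str) -> Tuple[List[str], str]:
    """Split a string like 'Ctrl+Shift+S' into its modifiers and final key."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combination")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return mods, key


def key_events(combo: str) -> List[Tuple[str, str]]:
    """Press modifiers in order, tap the key, then release modifiers in reverse."""
    mods, key = parse_combo(combo)
    events = [("down", m) for m in mods]
    events += [("down", key), ("up", key)]
    events += [("up", m) for m in reversed(mods)]
    return events
```

For example, `key_events("Ctrl+C")` yields ctrl down, c down, c up, ctrl up. Releasing modifiers last avoids leaving a modifier key logically held for the next action.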

Screenshot Recognition

Screenshot recognition serves as Computer Use’s “eyes” and is the foundation for closed-loop verification:

Full-screen capture: Captures the entire screen for a global view and status overview.

Region capture: Captures a specific area of the screen for focused analysis of particular interface elements or regions.

Element recognition: Based on screenshot content, the AI identifies various interface elements — buttons, input fields, text labels, dropdown menus, checkboxes, radio buttons, table rows and columns, tabs, and more. This recognition does not rely on fixed pixel coordinates but on visual semantic understanding, enabling accurate element location even when interface layouts change.

OCR (Optical Character Recognition): Extracts text information from screenshots for reading data, error messages, and status prompts displayed on screen. This allows the agent to “read” on-screen content and make informed decisions.

State assessment: Analyzes screenshots to determine whether operations were successful — for example, whether a success message appeared after form submission, or whether the page changed as expected after a button click.
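To make the contrast with fixed pixel coordinates concrete, the sketch below looks up an element by its visible label and type and clicks the center of wherever it was found. The `Element` structure and the sample layouts are invented for this example. In practice the recognized elements would come from the visual model, whose output format the article does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """A recognized UI element: visible label, kind, and bounding box in pixels."""
    label: str
    kind: str
    box: Tuple[int, int, int, int]  # left, top, right, bottom


def find_element(elements: List[Element], label: str, kind: str) -> Optional[Element]:
    """Look up an element by what it says, not by where it was last time."""
    wanted = label.strip().lower()
    for el in elements:
        if el.kind == kind and el.label.strip().lower() == wanted:
            return el
    return None


def click_point(el: Element) -> Tuple[int, int]:
    """Center of the bounding box, a common choice of click target."""
    left, top, right, bottom = el.box
    return (left + right) // 2, (top + bottom) // 2


# The same "Save" button in two layouts: the lookup still succeeds after it moves.
old_layout = [Element("Cancel", "button", (100, 500, 180, 530)),
              Element("Save", "button", (200, 500, 280, 530))]
new_layout = [Element("Save", "button", (600, 40, 680, 70)),
              Element("Cancel", "button", (700, 40, 780, 70))]

old_target = click_point(find_element(old_layout, "Save", "button"))
new_target = click_point(find_element(new_layout, "save", "button"))
```

A coordinate-based script recorded against the old layout would click empty space in the new one. The label-based lookup returns a different point but the same button.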

Application Operations

Beyond operating within applications, DesireCore can manage applications themselves:

Open applications: Launch specified desktop applications. The agent can open applications through the Start menu, Dock, desktop shortcuts, or command line.

Switch applications: Switch between multiple open applications using taskbar clicks or Alt+Tab.

Close applications: Close specified applications to free system resources. The agent confirms whether data needs to be saved before closing.

Window management: Adjust application window size and position, minimize, maximize, or restore windows. In multi-monitor environments, windows can be moved to specific displays.

Form Filling

Form filling is one of the most common GUI automation requirements, and DesireCore provides specialized optimization:

Auto-location: The agent identifies each field and its label within forms, automatically positioning the input cursor in the correct field. Even complex form layouts with irregularly distributed fields are accurately recognized.

Smart filling: Automatically selects the appropriate filling method based on field type:

Text boxes: Direct text input
Dropdown menus: Expand the option list and select the correct choice
Checkboxes/radio buttons: Check or uncheck as needed
Date pickers: Select the correct date through the date control
File uploads: Select specified files for upload

Data validation: After filling, the agent checks for error prompts or validation warnings, automatically correcting issues or reporting them to the user.
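Choosing a filling method by field type is essentially a dispatch table. The sketch below is a simplified illustration with three field types. The handler names and the action strings are hypothetical and only stand in for the real GUI operations.

```python
from typing import Callable, Dict, List, Tuple


def fill_text(name: str, value: str) -> List[str]:
    return [f"click '{name}'", f"type '{value}'"]


def fill_dropdown(name: str, value: str) -> List[str]:
    return [f"click '{name}'", f"choose option '{value}'"]


def fill_checkbox(name: str, value: str) -> List[str]:
    # value is "on" or "off"; toggle only if the current state differs
    return [f"ensure '{name}' is {value}"]


HANDLERS: Dict[str, Callable[[str, str], List[str]]] = {
    "text": fill_text,
    "dropdown": fill_dropdown,
    "checkbox": fill_checkbox,
}


def plan_form(fields: List[Tuple[str, str, str]]) -> List[str]:
    """Turn (field name, field type, value) triples into an ordered action list."""
    actions: List[str] = []
    for name, kind, value in fields:
        handler = HANDLERS.get(kind)
        if handler is None:
            # Unknown field types are reported rather than guessed at.
            actions.append(f"report unsupported field type '{kind}' for '{name}'")
            continue
        actions.extend(handler(name, value))
    return actions


plan = plan_form([
    ("Company", "text", "Acme Ltd"),
    ("Source", "dropdown", "Web"),
    ("Signature", "canvas", ""),
])
```

Date pickers and file uploads would be further entries in the same table. The fallback branch mirrors the behavior described above of reporting issues to the user instead of failing silently.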

File Operations

For operations involving the file system, DesireCore provides complete support:

File copying: Copy files from one location to another, accomplished through file manager GUI operations or keyboard shortcuts.

File moving: Move files to specified directories, supporting both drag-and-drop and cut-paste methods.

File renaming: Select a file and execute a rename operation, entering the new filename.

Batch operations: Perform the same operation on multiple files, such as batch renaming or batch moving to a specified folder.

Combining Operational Capabilities

The individual capabilities described above can be flexibly combined to form complex operation sequences. For example, “open Chrome browser > navigate to a URL > fill in the login form > click login > wait for page load > enter keywords in the search box > scroll through results > copy result data to Excel” — this complete operation sequence involves application operations, keyboard input, mouse clicks, form filling, scrolling, and file operations. DesireCore’s agent can automatically plan and execute such complex sequences.

Part 4: Intelligent Task Orchestration — From Single Operations to Complex Workflows

While individual GUI operations are useful, the real productivity gains come from orchestrating multiple operations into complete workflows. DesireCore’s intelligent task orchestration engine is designed precisely for this purpose.

Three Core Steps of the Orchestration Engine

Intent Recognition

When you describe a task to the agent, the orchestration engine first performs intent recognition. It analyzes your natural language description and extracts the following key information:

Objective: What result do you want to achieve?
Input data: What data or files need to be processed?
Involved applications: Which applications does the task require?
Constraints: Are there specific ordering requirements, time limitations, or quality standards?

For example, when you say “import this customer list from Excel into Salesforce, making sure each record’s phone number is in the correct format,” the orchestration engine identifies: the objective is data import, the input is a customer list in an Excel file, the involved applications are Excel and Salesforce, and the constraint is phone number format validation.
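The four extracted items map naturally onto a small record type. The following sketch shows one possible shape for that record, filled in with the Salesforce example. `TaskIntent` is a hypothetical name, and the article does not document the engine's internal representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskIntent:
    """Hypothetical container for the four items extracted from a request."""
    objective: str
    input_data: str
    applications: List[str]
    constraints: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of required items that are still empty, so the agent can ask for them."""
        required = {
            "objective": self.objective,
            "input_data": self.input_data,
            "applications": self.applications,
        }
        return [name for name, value in required.items() if not value]


intent = TaskIntent(
    objective="data import",
    input_data="customer list in an Excel file",
    applications=["Excel", "Salesforce"],
    constraints=["phone numbers must be in the correct format"],
)
incomplete = TaskIntent(objective="data import", input_data="", applications=[])
```

Treating constraints as optional and the other three items as required gives the agent a simple rule for when to ask a clarifying question before planning.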

Task Decomposition

After identifying intent, the orchestration engine breaks the overall task into a series of fine-grained subtasks. Each subtask is an independently executable and verifiable unit. Decomposition considers:

Dependencies: Which subtasks must execute sequentially, and which can run in parallel?
Data flow: How does the output of one step become the input for the next?
Error handling strategy: Should each subtask retry, skip, or abort the entire workflow upon failure?
Checkpoints: At which key nodes should intermediate results be verified?

Continuing the example above, the orchestration engine might decompose it into:

1. Open the Excel file
2. Read the current row of customer data (starting with the first)
3. Validate the phone number format (mark and record if incorrect)
4. Open the Salesforce new lead page
5. Fill in the customer information form
6. Submit the form and confirm successful save
7. Return to Excel and move to the next row
8. Repeat steps 2-7 until all rows are processed
9. Generate a processing report (N successful, M failed, with a list of failure reasons)
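Dependencies between subtasks determine which of them must run in sequence and which could run side by side. The sketch below uses Python's standard `graphlib` module to group a simplified version of this example into execution batches. The subtask names and the assumption that opening Excel and opening Salesforce are independent are illustrative, not taken from DesireCore.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def execution_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group subtasks into batches; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Keys are subtasks, values are the subtasks they depend on.
deps = {
    "open_excel": set(),
    "open_salesforce": set(),
    "read_row": {"open_excel"},
    "validate_phone": {"read_row"},
    "fill_form": {"validate_phone", "open_salesforce"},
    "submit_form": {"fill_form"},
}
batches = execution_batches(deps)
```

Here the two "open" subtasks land in the same first batch, while everything from reading the row onward forms a strict chain, which matches the intuition that form filling cannot start until the row has been read and validated.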

Automatic Capability Matching

After decomposition, the orchestration engine automatically matches each subtask with the most appropriate execution capability. DesireCore offers not only GUI operation capabilities but also integrates multiple tools and capabilities, including:

Computer Use (GUI operations): Used when tasks require graphical interface manipulation
API calls: Preferred when the target application provides an API and the API approach is more efficient
Data processing: Format conversion, validation, aggregation, and other data processing
File processing: Reading, writing, and converting various file formats
Notification delivery: Sending notifications via email, instant messaging, and other channels

The orchestration engine automatically selects the optimal capability combination. For example, reading Excel data preferentially uses file processing capabilities (direct file parsing), while filling forms in Salesforce uses Computer Use when only the GUI is available. If Salesforce has API access configured, the system may instead choose whichever of “API call” and “GUI operation” is more efficient for the step.
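The matching logic can be pictured as a small set of routing rules. The rules, subtask kinds, and capability names in the sketch below are assumptions made for illustration. The real engine presumably weighs more factors than this.

```python
from typing import Optional, Set


def match_capability(kind: str, app: Optional[str], apps_with_api: Set[str]) -> str:
    """Pick a capability for one subtask using simple, illustrative rules."""
    if kind in {"read_file", "write_file", "convert_file"}:
        return "file_processing"
    if kind in {"validate", "aggregate", "convert_format"}:
        return "data_processing"
    if kind == "notify":
        return "notification"
    if app is not None and app in apps_with_api:
        return "api_call"
    return "computer_use"  # fall back to GUI operation


excel_read = match_capability("read_file", "Excel", set())
gui_fill = match_capability("fill_form", "Salesforce", set())
api_fill = match_capability("fill_form", "Salesforce", {"Salesforce"})
```

The important property is the final fallback: when no cheaper capability applies, GUI operation is always available, which is what makes the GUI route a universal option.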

Two Execution Modes

DesireCore’s orchestration engine supports two execution modes to accommodate different automation scenarios:

Fixed Mode (SOP/Workflow)

Fixed mode is suitable for clearly defined, standardized processes that need to be executed repeatedly. In this mode:

Pre-defined workflows: You can manually perform an operation once, and the system records the entire operation sequence, codifying it as a standard operating procedure (SOP).
Stable and reliable: Each execution strictly follows the pre-defined steps, ensuring consistent results.
Schedulable: Codified workflows can be triggered on a schedule, by events, or manually.
Optimizable: Through data feedback from multiple executions, the workflow’s efficiency and accuracy can be continuously improved.

Fixed mode is particularly suitable for:

Routine tasks that need to be repeated daily/weekly
Compliance tasks with strict operational requirements
Team collaboration tasks where multiple people follow the same process
Critical business processes (such as financial reconciliation, order processing)

Flexible Mode (AI-Driven Orchestration)

Flexible mode leverages AI’s intelligent judgment to dynamically plan operational steps based on real-time conditions. In this mode:

Dynamic planning: The agent adjusts its operational strategy in real-time based on the current screen state and task progress.
Exception handling: When unexpected situations arise, the AI can autonomously determine how to respond without needing pre-defined exception paths.
Context awareness: The agent adjusts subsequent operations based on the results of previous steps, achieving truly adaptive execution.
Natural language driven: The entire process requires only a natural language description of the task objective, without pre-orchestrated workflow steps.

Flexible mode is particularly suitable for:

New tasks being executed for the first time (no established standard process)
Judgment-based tasks requiring different handling based on data content
Unstructured tasks involving complex decision-making
Exploratory tasks (uncertain optimal execution path)

Full Status Tracking

Regardless of the execution mode, DesireCore provides comprehensive status tracking:

Real-time progress display: You can check the current task’s execution progress at any time — how many steps have been completed, which step is currently executing, and estimated remaining time.

Timeout alerts: If a step’s execution time exceeds expectations, the system automatically issues an alert. You can choose to continue waiting, skip the current step, or abort the entire workflow.

Automatic reassignment: When a step fails, the system can automatically retry based on preset strategies or reassign the task to another device for execution.

Completion summary: After task completion, the system generates a detailed execution report including each step’s execution time, success/failure status, volume of data processed, and more. These reports are invaluable for subsequent workflow optimization.
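The retry-then-reassign behavior described under automatic reassignment can be sketched as two nested loops: retries on the current device, then a move to the next one. The function name, the device names, and the `attempt` callable are hypothetical stand-ins, not DesireCore APIs.

```python
from typing import Callable, List, Tuple


def run_step(
    step: str,
    devices: List[str],
    attempt: Callable[[str, str], bool],  # hypothetical: run step on device, True on success
    retries_per_device: int = 2,
) -> Tuple[bool, List[str]]:
    """Try a step on each device in turn, retrying before moving to the next device."""
    log: List[str] = []
    for device in devices:
        for n in range(1, retries_per_device + 1):
            if attempt(step, device):
                log.append(f"{step}: succeeded on {device} (attempt {n})")
                return True, log
            log.append(f"{step}: failed on {device} (attempt {n})")
        log.append(f"{step}: reassigning away from {device}")
    return False, log


# Stand-in executor: the first device always fails, the second always succeeds.
def fake_attempt(step: str, device: str) -> bool:
    return device == "Backup PC"


ok, log = run_step("export report", ["Office PC", "Backup PC"], fake_attempt)
```

The accumulated log doubles as raw material for the completion summary, since it records every attempt and where it ran.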

Part 5: Case Study 1 — Bulk CRM Data Entry

Let us demonstrate the complete DesireCore GUI automation workflow through a concrete case study.

Scenario Description

A company’s sales team receives a batch of new prospect information each week, stored in an Excel spreadsheet containing fields such as customer name, company name, job title, phone number, email, and source channel. The sales assistant needs to enter each record into the company’s CRM system (Salesforce in this example).

Approximately 200-300 new records arrive each week, with each record taking 2-3 minutes to enter manually (opening the new record page, filling in each field, selecting dropdown options, saving). At those rates the entire process takes roughly 7 to 15 hours. This is not only extremely time-consuming but also error-prone: rows can be miscopied, dropdown selections can be wrong, and phone number formats may be inconsistent.

Preparation

Before beginning automation, the following preparations are needed:

Confirm device connection: Ensure the computer with HostAgent installed is paired with DesireCore and online.
Prepare the data file: Ensure the Excel file is saved at a specified location on the target computer with standardized data format (clear column headers, no merged cells).
Confirm CRM login status: Ensure Salesforce is logged in, or have login credentials ready.
Field mapping confirmation: Clearly define which Excel column maps to which CRM field.

Execution Process

In DesireCore’s chat interface, enter the task description:

“Please enter the customer data from ‘New Customer List 0402.xlsx’ on the desktop into the ‘Leads’ module in Salesforce, one by one. The Excel columns are: A-Name, B-Company, C-Title, D-Phone, E-Email, F-Source Channel. Phone numbers must be in 11-digit format. If a record has issues, skip it and mark it.”

The agent executes the following operation sequence:

Phase 1: Data Preprocessing

Open the Excel file on the desktop
Read the table data and count total rows
Pre-check phone number column format
Report to user: “Detected 256 records total, 12 with abnormal phone number format (not 11 digits), these will be skipped and marked during entry. Confirm to proceed?”
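The pre-check in this phase amounts to a format test on column D. A minimal sketch, assuming the rows have already been read into dictionaries and that "11-digit format" means exactly eleven ASCII digits with no separators, might look like this. The sample names and numbers are made up.

```python
import re
from typing import Dict, List, Tuple

ELEVEN_DIGITS = re.compile(r"[0-9]{11}")

Row = Dict[str, str]


def split_by_phone(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows whose phone number is exactly 11 digits from those to skip."""
    valid: List[Row] = []
    skipped: List[Row] = []
    for row in rows:
        phone = row.get("Phone", "").strip()
        if ELEVEN_DIGITS.fullmatch(phone):
            valid.append(row)
        else:
            skipped.append(row)
    return valid, skipped


rows = [
    {"Name": "A. Chen", "Phone": "13800001111"},
    {"Name": "B. Liu", "Phone": "1380000222"},      # only 10 digits
    {"Name": "C. Wang", "Phone": "138-0000-3333"},  # contains separators
]
valid, skipped = split_by_phone(rows)
summary = (f"Detected {len(rows)} records total, "
           f"{len(skipped)} with abnormal phone number format")
```

Whether numbers with separators should be normalized instead of skipped is a policy choice. The task description in this case study asks for problem records to be skipped and marked, so the sketch does not attempt any repair.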

Phase 2: Record-by-Record Entry

For each valid record:

Switch to the Salesforce page in Chrome
Click the “New Lead” button
Wait for the new record form to load
Enter the customer name in the “Name” field
Enter the company name in the “Company” field
Enter the job title in the “Title” field
Enter the phone number in the “Phone” field
Enter the email in the “Email” field
Select the corresponding source channel from the “Source” dropdown
Click the “Save” button
Verify save success via screenshot (check for success message or detail page)
If save fails, record error information and capture screenshot
Switch back to Excel, add status marker to the current row (“Entered” or “Failed + Reason”)
Move to the next row

Phase 3: Result Reporting

After processing all records, the agent generates an execution report:

“Bulk entry task completed. Processing summary:

Total records: 256

Successfully entered: 241

Skipped due to format issues: 12 (phone number not 11 digits)

Entry failed: 3 (2 rejected by Salesforce due to duplicate email, 1 due to special characters in company name)

Total time: 1 hour 47 minutes

Excel file has been updated with entry status markers

Failed record details have been summarized in Sheet2 of the Excel file.”

Efficiency Comparison

Metric	Manual Operation	DesireCore Automation
Time for 256 records	~10 hours	~1.8 hours
Error rate	3-5% (fatigue-induced)	<1% (format validation + visual verification)
Human involvement	100% (fully manual)	5% (issue command + review report)
Repeatability	Depends on operator condition	Consistent and stable
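The figures in the table can be cross-checked against the numbers reported earlier in this case study. The short calculation below uses only values stated in the article (256 records, 2-3 minutes per manual entry, a 1 hour 47 minute automated run, and the 241/12/3 outcome split).

```python
records_total = 256
succeeded, skipped, failed = 241, 12, 3
assert succeeded + skipped + failed == records_total  # the report's counts add up

manual_low_h = records_total * 2 / 60     # 2 minutes per record, about 8.5 hours
manual_high_h = records_total * 3 / 60    # 3 minutes per record, 12.8 hours
automated_min = 1 * 60 + 47               # reported run time in minutes
automated_h = automated_min / 60          # about 1.8 hours
speedup = 10 / automated_h                # against the table's ~10 hour manual figure

# Skipped rows never reach Salesforce, so only 244 entries were attempted.
seconds_per_attempt = automated_min * 60 / (succeeded + failed)
```

The table's "~10 hours" sits inside the 8.5 to 12.8 hour manual range, the automated run is about 5.6 times faster, and each attempted entry takes roughly 26 seconds including verification.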

Advanced Optimization: Codifying as SOP

If this task needs to be performed weekly, you can codify the workflow as a standard process:

After task completion, select “Save as Workflow”
Set trigger conditions (e.g., “auto-execute every Monday at 9:00 AM” or “trigger when a new Excel file appears in the designated folder”)
Configure parameterized options (such as Excel file path, CRM module name, which can be specified for each execution)
Set completion notification method (email, Slack, Teams, etc.)
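A weekly trigger such as "every Monday at 9:00 AM" reduces to computing the next matching timestamp. The sketch below shows that calculation with the standard library. It illustrates the idea only and says nothing about how DesireCore's scheduler is actually implemented.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 9, minute: int = 0) -> datetime:
    """Next occurrence of the given weekday and time (Monday is 0) strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # today's slot has already passed
    return candidate


# 2 April 2026 is a Thursday, so the next Monday 9:00 AM run is 6 April.
from_thursday = next_weekly_run(datetime(2026, 4, 2, 10, 0))
from_monday_early = next_weekly_run(datetime(2026, 4, 6, 8, 0))
from_monday_at_nine = next_weekly_run(datetime(2026, 4, 6, 9, 0))
```

The "strictly after" rule matters when the scheduler wakes up exactly at 9:00: it prevents the same slot from being scheduled twice.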

Thereafter, the entire entry process runs automatically — you only need to review the execution report after receiving the completion notification.

Part 6: Case Study 2 — Cross-Application Data Transfer (Backend > Excel > Email)

Scenario Description

The operations team needs to export the previous day’s sales data from the company’s backend management system daily, organize and perform pivot analysis in Excel, then send the analysis results to management via email. This workflow involves three different applications — the web backend system, Excel, and Outlook/email client — requiring data to flow and be processed across them.

[Figure: Cross-application data transfer flow]

Manually executing this workflow typically takes 30-45 minutes, and since it is a daily mandatory task, it often becomes the first “chore” operations staff face upon arriving at work each morning.

Task Description

In DesireCore, enter:

“Automatically execute the following task daily at 8:30 AM:

Log in to the company backend system (address: admin.company.com), navigate to the ‘Sales Data’ module, and export yesterday’s sales detail report as an Excel file

Open the exported Excel file, create a pivot table summarizing sales amount and order count by product category, and generate a bar chart

Send the prepared Excel file as an attachment via Outlook to the leadership distribution group (leadership@company.com), with the subject ‘Daily Report: [Date] Sales Data Analysis,’ and include a key data summary in the email body”
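The [Date] placeholder in the subject refers to the reporting day, which is the day before the run. A minimal sketch of filling it in, assuming an ISO date format and the file name pattern used later in the execution flow, is shown below. `report_names` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict


def report_names(run_day: date) -> Dict[str, str]:
    """Fill the [Date] placeholder with the day before the run, in ISO format."""
    report_day = (run_day - timedelta(days=1)).isoformat()
    return {
        "subject": f"Daily Report: {report_day} Sales Data Analysis",
        "filename": f"Sales Daily Report_{report_day}.xlsx",
    }


names = report_names(date(2026, 4, 2))
```

Deriving both strings from one computed date keeps the email subject and the attachment name from drifting apart, for example when a run starts just before midnight and finishes after it.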

Detailed Execution Flow

Phase 1: Backend Data Export

Open Chrome browser
Navigate to admin.company.com
If login is required, enter username and password (provided by user on first run; securely stored credentials used subsequently)
Navigate to the “Sales Data” module
Set the date filter to “yesterday”
Click the “Export” button
Select Excel as the export format
Wait for the file download to complete
Capture screenshot confirming file downloaded to the “Downloads” folder

Phase 2: Excel Data Processing

Open the downloaded Excel file
Verify data integrity (row count, correct column headers)
Select the data range
Insert a pivot table into a new worksheet
Set the row field to “Product Category”
Set value fields to “Sales Amount” (sum) and “Order Count” (count)
Sort by sales amount in descending order
Select the pivot table and insert a bar chart
Set chart title and formatting
Return to the pivot table, extract key data: total sales, total orders, Top 3 product categories and their sales
Save the file, rename to “Sales Daily Report_[Date].xlsx”
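The figures extracted in Phase 2 (total sales, total orders, Top 3 categories) are the same aggregation the pivot table performs. As a way to cross-check what the agent reads off the screen, the calculation can be sketched in plain Python. The row keys `category` and `amount` are assumptions about the export format, and each row is assumed to represent one order.

```python
from collections import defaultdict

def summarize_sales(rows):
    """Aggregate order rows by product category.

    Each row is a dict with hypothetical keys 'category' and 'amount';
    one row is treated as one order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["amount"]
        counts[row["category"]] += 1

    total_sales = sum(totals.values())
    # Sort categories by sales amount, descending, and keep the top three.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top3 = [
        {
            "category": cat,
            "sales": amt,
            "share": amt / total_sales if total_sales else 0.0,
        }
        for cat, amt in ranked[:3]
    ]
    return {
        "total_sales": total_sales,
        "total_orders": sum(counts.values()),
        "top3": top3,
    }
```

Comparing this result with the values read from the pivot table gives a second source of truth before the email goes out.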

Phase 3: Email Sending

Open Outlook
Click “New Email”
Enter leadership@company.com in the recipient field
Enter the email subject: “Daily Report: 2026-04-01 Sales Data Analysis”
Compose the data summary in the body:

Dear Leadership Team,

Below is the sales data summary for April 1, 2026:

Total sales: $178,432

Total orders: 456

Day-over-day change: +12.3%

Top 3 product categories:

Smart Hardware $66,220 (37%)

Software Services $49,961 (28%)

Accessories & Consumables $33,902 (19%)

Please see the attached file for detailed analysis.

Add the Excel file as an attachment
Click “Send”
Capture screenshot confirming email sent successfully
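The subject and body in Phase 3 follow a fixed template filled with the Phase 2 figures. A minimal sketch of that templating is shown here; `render_daily_report` is a hypothetical helper, the body uses an ISO date rather than the spelled-out month, and the day-over-day line is left out because it needs the previous day's total, which this sketch does not have.

```python
from datetime import date

def render_daily_report(report_date, total_sales, total_orders, top3):
    """Return (subject, body) for the daily report email.

    top3 is a list of (category, sales) tuples; shares are derived
    from total_sales.
    """
    subject = f"Daily Report: {report_date.isoformat()} Sales Data Analysis"
    lines = [
        "Dear Leadership Team,",
        "",
        f"Below is the sales data summary for {report_date.isoformat()}:",
        "",
        f"Total sales: ${total_sales:,.0f}",
        f"Total orders: {total_orders}",
        "",
        "Top 3 product categories:",
    ]
    for category, sales in top3:
        share = sales / total_sales if total_sales else 0.0
        lines.append(f"{category} ${sales:,.0f} ({share:.0%})")
    lines += ["", "Please see the attached file for detailed analysis."]
    return subject, "\n".join(lines)
```

Generating the text from the numbers keeps the percentages in the body consistent with the totals.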

The Key to Cross-Application Coordination

The core challenge of this case study is data flow across three applications. When traditional RPA handles cross-application scenarios, it requires separate operation scripts for each application, relying on fixed file paths or the clipboard to transfer data. If any link encounters an unexpected situation (download path changes, Excel version differences causing menu position shifts, Outlook interface updates), the entire workflow collapses.

DesireCore’s AI agent, with its visual comprehension capabilities, identifies the current interface state in real-time after each operation step, automatically adapting to interface changes. For example:

If the backend system’s export button has moved, the AI can locate it through text recognition
If a different Excel version changes the “Insert Pivot Table” menu path, the AI adapts
If Outlook’s interface has been updated, the AI similarly identifies the new “New Email” button

This adaptive capability is the core advantage of AI-native automation over traditional RPA.

Scheduled Execution and Exception Handling

After setting this workflow as a scheduled task, DesireCore’s scheduling system automatically triggers execution at 8:30 AM daily. If exceptions occur during execution, the system handles them as follows:

Backend system inaccessible: Wait 5 minutes and retry, up to 3 times. If still failing, notify the operations staff: “Backend system access error, today’s daily report requires manual processing.”
Empty data export: Possibly a holiday with no sales data. The system sends an email notification: “No sales data yesterday; daily report will not be sent.”
Outlook not logged in: Attempt automatic login; if two-factor authentication is required, notify the user for manual handling.
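The "wait 5 minutes and retry, up to 3 times" rule for an inaccessible backend is a standard fixed-delay retry. A minimal sketch follows, reading the rule as one initial attempt plus up to three retries (that reading is an assumption). `run_with_retry` is a hypothetical name, and the `sleep` parameter is injectable so the logic can be exercised without actually waiting.

```python
import time

def run_with_retry(action, retries=3, wait_seconds=300, sleep=time.sleep):
    """Call action(); on failure wait and retry up to `retries` more times.

    Returns (True, result) on success, or (False, last_error) once all
    attempts have failed so the caller can notify operations staff.
    """
    last_error = None
    for attempt in range(retries + 1):
        if attempt > 0:
            sleep(wait_seconds)  # pause before every retry, not before the first try
        try:
            return True, action()
        except Exception as exc:
            last_error = exc
    return False, last_error
```

When the function returns `(False, error)`, the caller would send the "requires manual processing" notification described above.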

Part 7: Case Study 3 — Scheduled GUI Inspection and Anomaly Alerts

Scenario Description

The IT operations team needs to regularly check multiple monitoring dashboards (such as Grafana, Zabbix, and company-built operations screens) to confirm that all metrics are normal and no alerts are present. This work is typically performed manually by on-duty staff every 1-2 hours: opening each monitoring page, checking key metrics one by one, confirming whether any anomaly alerts exist, and recording and escalating any findings.

While each inspection session is relatively short (about 10-15 minutes), the high-frequency repetition accumulates to consume significant human resources. During overnight shifts, the reliability of manual inspections also decreases due to fatigue.

Task Configuration

Configure a scheduled inspection task in DesireCore:

“Execute the following GUI inspection task every hour:

Open Chrome, visit the Grafana monitoring dashboard (grafana.company.com/dashboard/main)

Check if CPU utilization exceeds 80%

Check if memory utilization exceeds 85%

Check if disk utilization exceeds 90%

Check for any red alert indicators

Switch to the Zabbix page (zabbix.company.com)

Check if the ‘unacknowledged problems’ list is empty

If there are unacknowledged problems, record the details

Switch to the company operations screen (ops-screen.company.com)

Check if all service availability indicators are green

Check if all response times are within thresholds

If any anomalies are found:

Capture the anomaly screen

Send an alert to the operations group chat via enterprise messaging (including anomaly description and screenshot)

For severe anomalies (such as service unavailability), additionally notify the on-duty lead by phone”
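The escalation rule at the end of this task (group chat for every anomaly, plus a phone call for severe ones) can be expressed as a small routing function. This is an illustrative sketch; the `severity` key and the channel names are assumptions, not DesireCore configuration values.

```python
def choose_channels(anomalies):
    """Pick notification channels for a list of anomalies.

    Each anomaly is a dict with a hypothetical 'severity' key:
    'severe' (e.g. service unavailable) or 'normal'.
    """
    if not anomalies:
        return []                 # nothing to report this round
    channels = ["group_chat"]     # every anomaly goes to the operations chat
    if any(a["severity"] == "severe" for a in anomalies):
        channels.append("phone")  # severe anomalies also reach the on-duty lead
    return channels
```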

Inspection Execution Details

Steps for each inspection round:

Grafana check:
- Open the Grafana main monitoring dashboard
- Capture a full-screen screenshot
- Read current CPU, memory, and disk utilization values via OCR
- Compare against defined thresholds
- Scan the page for red/orange alert icons
- Record check results
Zabbix check:
- Navigate to the Zabbix problem list page
- Read the count of “unacknowledged problems”
- If the count is non-zero, read each problem’s name, severity, duration, and scope of impact
- Record check results
Operations screen check:
- Navigate to the operations screen page
- Check the color of each service status indicator
- Read response time values
- Record check results
Result processing:
- If all checks are normal: log the inspection record and wait for the next round
- If anomalies are found:
  - Generate an anomaly report (including screenshots, anomalous metrics, potential impact)
  - Send an alert message via enterprise messaging API or GUI operation
  - Determine whether to trigger a phone notification based on severity
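The "compare against defined thresholds" step is a direct numeric check once the values have been read from the screenshot. A minimal sketch using the three limits from the task text is given below; `find_breaches` is a hypothetical helper, and treating an unreadable value as a finding is an assumption about how OCR failures should be handled.

```python
THRESHOLDS = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}  # percent, from the task text

def find_breaches(readings, thresholds=THRESHOLDS):
    """Return (metric, value, limit) for each reading above its limit.

    A metric missing from `readings` (for example, OCR could not read it)
    is reported with value None so it is not silently treated as healthy.
    """
    breaches = []
    for metric, limit in thresholds.items():
        value = readings.get(metric)
        if value is None or value > limit:
            breaches.append((metric, value, limit))
    return breaches
```

With the readings from the normal log in the next section (CPU 45%, memory 62%, disk 71%), the function returns an empty list.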

Inspection Report Examples

Normal inspection log:

Inspection time: 2026-04-02 14:00:00

Inspection result: All normal

Grafana: CPU 45%, Memory 62%, Disk 71% — all within thresholds

Zabbix: Unacknowledged problems: 0

Operations screen: All services green, response times normal

Anomaly alert message:

[ALERT] Anomaly detected during inspection — 2026-04-02 15:00:00

The following anomalies were found during scheduled inspection:

Grafana — Server prod-web-03 CPU utilization at 94% (threshold 80%), sustained for 23 minutes

Zabbix — 2 unacknowledged problems:

[High] prod-db-01 disk I/O latency anomaly (triggered at 14:52)

[Medium] prod-cache-02 connection count nearing limit (triggered at 14:47)

Anomaly screenshots are attached. Please address promptly.

Value of Automated Inspection

Aspect	Manual Inspection	DesireCore Automated Inspection
Frequency	Once every 1-2 hours (human limitation)	As frequent as every 5 minutes
Overnight reliability	Affected by fatigue, prone to omissions	Consistent 24/7 execution
Response speed	5-10 minutes from detection to escalation	<1 minute from detection to alert
Inspection granularity	Depends on personnel experience, may miss details	Systematic item-by-item checking per defined rules
Historical traceability	Depends on manual records, may be incomplete	Automatic archiving of every inspection round
Labor cost	Requires dedicated on-duty personnel	Frees human resources for higher-value work

Part 8: Security Mechanisms — Whitelists, Human Gates, and Audit Logs

Allowing an AI agent to operate your computer and phone naturally makes security the foremost concern. DesireCore designed Computer Use with security as a core consideration and built a multi-layered protection system around it.

[Figure: Five-layer security protection mechanism]

Application Whitelist Control

Not all applications are suitable for automated operation. DesireCore provides an application whitelist mechanism for precise control over which applications the AI agent can operate:

Whitelist configuration:

Configure the list of permitted applications for each device on the device management page
Only applications on the whitelist will respond to agent operation commands
Even if an agent requests to operate a non-whitelisted application, HostAgent will refuse execution

Typical configuration example:

Allowed: Chrome, Excel, Outlook, company CRM system, ERP system
Blocked: Online banking applications, password managers, system settings (partial), antivirus software

Dynamic management:

Whitelists can be adjusted at any time
Support time-based whitelist policies (e.g., allow CRM operations during work hours, only inspection operations outside work hours)
Support per-workflow whitelists (each workflow can only operate its required applications)
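The three whitelist layers described here (per-device list, per-workflow list, time-based policy) combine as a chain of checks in which any layer can refuse. The sketch below illustrates that logic only; `is_allowed` and its parameters are hypothetical and do not reflect HostAgent's actual configuration format.

```python
from datetime import time

def is_allowed(app, at, device_apps, workflow_apps=None, time_windows=None):
    """Return True only if every configured layer permits `app` at time `at`.

    device_apps:   set of apps whitelisted on the device
    workflow_apps: optional set of apps the current workflow may touch
    time_windows:  optional {app: (start, end)} restricting when an app is usable
    """
    if app not in device_apps:
        return False                      # not on the device whitelist
    if workflow_apps is not None and app not in workflow_apps:
        return False                      # workflow may only use its own apps
    window = (time_windows or {}).get(app)
    if window is not None:
        start, end = window
        return start <= at < end          # time-based rule for this app
    return True
```

Under this sketch, an application is refused by default: it must appear on the device whitelist before any other rule is even consulted.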

Human Gate Confirmation Mechanism

For sensitive operations, DesireCore introduces the “Human Gate” confirmation mechanism. This is conceptually similar to safety gates in industrial production — before executing critical operations, explicit human confirmation must be obtained.

Trigger conditions: The human gate does not trigger at every operation step — that would defeat the purpose of automation. It triggers only in the following situations:

Financial operations: Involving payments, transfers, order confirmations, and other financial actions
Data deletion operations: Executing irreversible operations like data deletion or record clearing
Permission change operations: Modifying user permissions, role assignments, and other security-sensitive operations
External communication operations: Sending emails or messages to external contacts (configurable)
System configuration changes: Modifying system settings, network configurations, or other changes that could affect service stability
Custom rules: You can define which operations require human gate confirmation

Confirmation flow:

The agent pauses when reaching a step requiring confirmation
A confirmation request is sent to the user, including:
- Description of the operation to be performed
- The operation’s target and expected effect
- Data or objects involved in the operation
- Current screen screenshot
After review, the user selects:
- Confirm execution: Proceed with the operation
- Reject execution: Skip the operation; subsequent workflow behavior depends on policy
- Modify and execute: Adjust operation parameters before execution
The confirmation action is recorded in the audit log

Human gate flexibility:

Confirmation timeout can be set (e.g., auto-skip if unconfirmed within 5 minutes)
Specific approvers can be designated (not necessarily the task initiator — could be a supervisor or security reviewer)
“Batch confirmation” mode is available (for similar operations, confirm once and subsequent similar operations execute automatically)
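The human gate amounts to two decisions: whether an operation needs confirmation at all, and what to do with the reviewer's answer. A minimal sketch of both is shown here. The category names, the `Decision` enum, and the operation dictionary shape are all assumptions made for illustration, and a timeout is treated as a skip, matching the "auto-skip" example above.

```python
from enum import Enum

class Decision(Enum):
    CONFIRM = "confirm"
    REJECT = "reject"
    MODIFY = "modify"
    TIMEOUT = "timeout"

# Hypothetical labels for the built-in trigger conditions listed above.
GATED_CATEGORIES = {
    "financial",
    "data_deletion",
    "permission_change",
    "external_communication",
    "system_configuration",
}

def needs_gate(operation, custom_rules=()):
    """True if the operation's category is gated or any custom rule matches."""
    if operation["category"] in GATED_CATEGORIES:
        return True
    return any(rule(operation) for rule in custom_rules)

def resolve(operation, decision, modified_params=None):
    """Return the operation to execute, or None to skip it."""
    if decision is Decision.CONFIRM:
        return operation
    if decision is Decision.MODIFY:
        params = {**operation.get("params", {}), **(modified_params or {})}
        return {**operation, "params": params}
    return None  # REJECT or TIMEOUT: do not execute
```

Custom rules are plain callables in this sketch, so a team could gate, for example, every operation that targets a particular application.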

Comprehensive Operation Audit Logs

Every Computer Use operation is fully recorded in audit logs, ensuring traceability and compliance:

Log contents:

Timestamp: Precise time of operation execution
Operator: Identity of the user who initiated the task
Target device: Which device the operation was executed on
Operation type: Mouse operation, keyboard input, application operation, etc.
Operation details: Specific operation content (e.g., “entered ‘John Smith’ in Salesforce Name field”)
Screenshot archive: Before and after screenshots of the operation
Execution result: Whether the operation succeeded; if failed, the failure reason
Human gate records: If a human gate was triggered, records the approver, confirmation time, and outcome

Log uses:

Compliance auditing: Meets industry compliance requirements (e.g., regulated industries like finance and healthcare)
Troubleshooting: When automation workflows encounter anomalies, logs pinpoint the exact step and cause
Workflow optimization: Analyzing execution time and success rate data to identify optimization opportunities
Security traceability: In the event of a security incident, audit logs provide a complete evidence chain

Log management:

Searchable by time range, device, user, operation type, and other dimensions
Exportable in CSV and JSON formats
Configurable retention policies (e.g., retain for 90 days)
Critical operation logs can be set as non-deletable
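To make the log fields and retention rules concrete, here is a small sketch of an audit record, JSON export, and a retention filter that never drops critical entries. The `AuditRecord` fields mirror the list above but the names are hypothetical; this is not DesireCore's actual log schema, and screenshot and human gate fields are omitted for brevity.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta

@dataclass
class AuditRecord:
    timestamp: datetime
    operator: str
    device: str
    operation_type: str
    details: str
    success: bool
    failure_reason: str = ""
    critical: bool = False   # critical records are exempt from retention cleanup

def to_json_line(record):
    """Serialize one record as a single JSON line for export."""
    data = asdict(record)
    data["timestamp"] = record.timestamp.isoformat()
    return json.dumps(data, ensure_ascii=False)

def apply_retention(records, now, days=90):
    """Keep records newer than `days`, plus all critical records."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if r.critical or r.timestamp >= cutoff]
```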

One-Click Interrupt

At any time, you can use the one-click interrupt feature to immediately stop all AI agent operations on the target device:

Keyboard shortcut interrupt: Press the designated shortcut in the DesireCore client (default Ctrl+Shift+Esc)
Button interrupt: Click the “Emergency Stop” button on the task execution interface
Device-side interrupt: Select “Stop All Operations” from HostAgent’s tray icon on the target device
Remote interrupt: Remotely stop operations on any device via the DesireCore mobile app

After interruption, the agent immediately ceases all operations and reports the current execution status and completed steps, helping you decide on next steps.
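One common way to implement this kind of interrupt is a shared stop flag that the executor checks before every step. The sketch below uses Python's `threading.Event` for the flag; `run_steps` is a hypothetical executor, not DesireCore's, and it assumes steps are short enough that checking between them is responsive.

```python
import threading

def run_steps(steps, stop_event):
    """Run (name, callable) steps in order, checking the stop flag before each.

    Returns (completed_names, interrupted) so the caller can report
    how far execution got.
    """
    completed = []
    for name, step in steps:
        if stop_event.is_set():
            return completed, True
        step()
        completed.append(name)
    return completed, False
```

Any of the four interrupt paths (shortcut, button, tray icon, mobile app) would then only need to call `stop_event.set()`.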

Rate Limiting Protection

To prevent the AI agent’s rapid operation speed from causing issues with target applications (such as triggering anti-bot mechanisms or exceeding API rate limits), DesireCore includes built-in rate limiting:

Default rate limiting: Reasonable intervals between mouse clicks and keyboard inputs (simulating human operation cadence)
Custom rate limiting: Different operation speeds can be set for different applications
Intelligent rate limiting: The AI automatically adjusts operation cadence based on application response speed — if a page loads slowly, it waits longer before the next operation
Rate alerts: When the operation frequency approaches an application’s threshold, the system automatically slows down and issues a warning
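Default and intelligent rate limiting can both be modeled as a minimum gap between actions, where the gap grows when the application responds slowly. The following is a minimal sketch under that assumption; `ActionPacer` is a hypothetical class, and the clock and sleep functions are injectable so the pacing can be tested without real delays.

```python
import time

class ActionPacer:
    """Enforce a minimum interval between consecutive GUI actions."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous action, if any

    def wait_turn(self, last_response_time=0.0):
        """Block until the next action is allowed.

        The gap is at least min_interval, and at least as long as the
        application took to respond to the previous action.
        """
        interval = max(self.min_interval, last_response_time)
        if self._last is not None:
            remaining = self._last + interval - self._clock()
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Per-application limits would map to one `ActionPacer` instance per application, each with its own `min_interval`.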

Part 9: Mobile Automation — Android/iOS/HarmonyOS

With the rise of mobile work, an increasing number of workflows involve mobile device operations. DesireCore’s Computer Use covers not only desktop platforms but also fully supports mobile automation.

Unique Challenges of Mobile Automation

Compared to desktop environments, mobile automation faces several unique challenges:

Small screen size: Mobile devices have limited screen space, requiring frequent scrolling to view complete content
Touch interaction: Phones use touch rather than mouse input, with different interaction patterns (tap, long press, swipe, pinch-to-zoom)
Strict system permissions: Especially on iOS and HarmonyOS, with numerous restrictions on background operations
Variable network environments: Mobile devices may switch between WiFi and cellular networks
Notification interference: Various notification pop-ups on phones can disrupt automation operations

Android Automation

Android is the most mature platform for mobile automation. DesireCore’s HostAgent achieves comprehensive GUI operation capabilities through Android’s AccessibilityService:

Supported operations:

Touch input: Single-finger tap, long press, swipe (all directions), two-finger pinch
Text input: Via clipboard method (bypassing input method compatibility issues)
Application management: Open, switch, and close applications
Notification handling: Read and respond to notifications
System operations: Adjust settings, connect to WiFi, etc.

Typical scenarios:

Batch-approving pending items in enterprise apps
Updating customer follow-up status in mobile CRM
Sending standardized replies in instant messaging apps
Completing regular check-in and attendance tasks on mobile

Notes:

Android 8.0 or above recommended
Disable battery optimization to prevent the system from killing HostAgent
Keep the screen on during execution (enable “Developer Options > Stay Awake,” which keeps the screen on while the device is charging)

iOS Automation

iOS’s closed ecosystem imposes more restrictions on automation compared to Android, but DesireCore still achieves a viable automation solution through multiple technical approaches:

Implementation methods:

Basic interface operations via iOS Accessibility APIs
System-level operations via iOS Shortcuts integration
For jailbroken devices, more complete operation capabilities are available

Supported operations:

Touch input: Tap, swipe, long press
Application switching: Via accessibility shortcut methods
Text input: Via clipboard method
Select system operations: Implemented through Shortcuts

Limitations and solutions:

Limitation: iOS does not support background screenshots. Solution: DesireCore obtains visuals through the screen recording interface.
Limitation: iOS restricts cross-application operations. Solution: These are achieved by combining accessibility features with Shortcuts.
Limitation: iOS permission dialogs require manual confirmation. Solution: After initial authorization, subsequent operations proceed automatically.

HarmonyOS Automation

HarmonyOS is an emerging mobile operating system from Huawei, and DesireCore provides native support for it:

Technical foundation:

Based on HarmonyOS’s accessibility framework (AccessibilityExtensionAbility)
Supports HarmonyOS 4.0 and above
Compatible with both native HarmonyOS applications and Android-compatible applications

Distinctive features:

Leverages HarmonyOS’s distributed capabilities for seamless cross-device task handoff
Supports HarmonyOS Atomic Services
Potential integration with Huawei’s Celia assistant

Mobile Case Study: Automating Repetitive In-App Tasks

Scenario: An e-commerce operations staff member needs to modify promotional prices for 50 products daily in a product management app.

Manual process: Open app > search for product > enter edit page > modify price > save > return to list > search for next product… Repeat 50 times, taking approximately 1.5 hours.

DesireCore automation:

Prepare data containing product SKUs and new prices (can be an Excel or text file)
Create a task in DesireCore:

“Please modify product prices in the product management app on my phone according to the following list:

[Product SKU] > [New Price]
SKU001 > 199.00
SKU002 > 299.00
…

Confirm successful save after each modification. If a product cannot be found, skip and mark it.”

The agent automatically executes on the phone:
- Open the product management app
- Enter the product SKU in the search box
- Tap the search result to enter product details
- Tap the “Edit” button
- Locate the price field
- Clear the existing price and enter the new price
- Tap “Save”
- Verify save success via screenshot
- Return to the list and process the next product
Generate a modification report upon completion

Time required: Approximately 20 minutes (machine operation speed is consistent and fatigue-free)
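The batch logic behind this task (parse the price list, update each product, skip and mark the ones that cannot be found) can be sketched as follows. The one-pair-per-line "SKU > price" input format and the `update_price` callback are assumptions; in practice the callback would stand in for the agent's on-device steps listed above.

```python
from decimal import Decimal, InvalidOperation

def parse_price_list(text):
    """Parse lines like 'SKU001 > 199.00' into (sku, price) pairs.

    Lines without a valid price (such as a header row) are ignored.
    """
    items = []
    for line in text.splitlines():
        if ">" not in line:
            continue
        sku, price = (part.strip() for part in line.split(">", 1))
        try:
            items.append((sku, Decimal(price)))
        except InvalidOperation:
            continue  # e.g. '[Product SKU] > [New Price]'
    return items

def run_batch(items, update_price):
    """Apply update_price(sku, price) to each item and build a report.

    update_price returns True on a verified save, False if the
    product could not be found.
    """
    report = {"updated": [], "skipped": []}
    for sku, price in items:
        if update_price(sku, price):
            report["updated"].append(sku)
        else:
            report["skipped"].append(sku)
    return report
```

The returned dictionary corresponds to the modification report generated at the end of the run.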

Desktop and Mobile Collaboration

One of DesireCore’s most powerful capabilities is supporting collaborative workflows across desktop and mobile platforms. For example:

Export data from a database on a Windows PC
Generate reports using specialized software on a macOS machine
Send the reports to clients via enterprise messaging on an Android phone
Annotate approval comments using Apple Pencil on an iPad

This cross-platform, cross-device collaboration capability makes DesireCore not merely a point automation tool but a true “universal digital assistant.”

Part 10: Pairing with Super Document — The Complete Document Processing Loop

Computer Use solves the GUI operation automation problem, but in many workflows, document processing is an indispensable component. DesireCore’s “Super Document” feature is designed specifically for document scenarios, and when used in conjunction with Computer Use, it creates a complete loop from data acquisition to document output.

What Is Super Document?

Super Document applies code review mechanisms to document writing, with the core philosophy of “AI writes for you, you review.” Unlike traditional AI writing tools, Super Document does not simply generate a complete document for you to “accept or reject wholesale.” Instead, like code review, it marks each modification individually, provides choices, and includes rationale.

Core workflow:

AI drafts/revises: You provide a document draft or requirements description, and the AI generates or modifies the document content.
Individual change marking: Every change is clearly marked — added content, deleted content, modified content — all visible at a glance.
Three action options: For each change, you can:
- Accept: Agree with the change and keep the AI’s suggestion
- Reject: Disagree with the change and keep the original text
- Edit: Further adjust based on the AI’s suggestion
Change rationale: The AI provides a rationale for each change, explaining why it was made (e.g., “grammatical inconsistency here,” “this description is not precise enough,” “recommend using more professional terminology”).
Git-style version history: All changes and review records are preserved, and you can revert to any historical version at any time.
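The per-change review model can be pictured as a list of proposed changes, each resolved independently by the reviewer. Here is a minimal sketch of that idea; the `Change` dataclass and the decision encoding are hypothetical and are not Super Document's data model.

```python
from dataclasses import dataclass

@dataclass
class Change:
    original: str    # text before the AI's edit
    suggested: str   # text the AI proposes
    rationale: str   # why the AI proposes it

def apply_review(changes, decisions):
    """Resolve each change according to the reviewer's decision.

    decisions[i] is 'accept', 'reject', or a tuple ('edit', replacement_text).
    Returns the resulting text segments in order.
    """
    result = []
    for change, decision in zip(changes, decisions):
        if decision == "accept":
            result.append(change.suggested)
        elif decision == "reject":
            result.append(change.original)
        else:
            _, text = decision
            result.append(text)   # reviewer's own wording wins
    return result
```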

Computer Use + Super Document Collaboration Scenarios

When Computer Use and Super Document are combined, they enable the following powerful workflows:

Scenario 1: Automated data collection + intelligent report generation

Computer Use automatically collects data from multiple systems (e.g., exporting sales data from ERP, HR data from the HR system, cost data from the financial system)
Super Document automatically generates a monthly business analysis report based on the collected data
The user reviews AI-generated analysis conclusions and recommendations through the review interface, confirming or modifying each point
Computer Use sends the final version of the report to management via email

In this entire workflow, the human only needs to perform the “review” step — data collection, report generation, and email sending are all handled automatically by AI.

Scenario 2: Contract review and revision

The user uploads a contract document
Super Document automatically reviews contract terms, marking potential risk points and suggested clause revisions
The user reviews each marked item, accepting, rejecting, or modifying the AI’s suggestions
After review is complete, Computer Use automatically opens the company’s contract management system, uploads the revised contract, and fills in the approval form

Scenario 3: Multilingual document translation and proofreading

The user provides a Chinese document
Super Document generates an English translation with paragraph-by-paragraph comparison marking
The user reviews translation quality, modifying unsatisfactory paragraphs
Computer Use uploads the translated document to the company’s document management system, updating the multilingual version

Practical Value of Version History

Super Document’s Git-style version history is more than just “being able to roll back.” It brings an entirely new experience to document collaboration:

Change tracking: Every modification is recorded, clearly showing “who made what change and when.”
Version comparison: Any two versions can be compared to see their differences and understand the document’s evolution.
Review chain: The complete review record forms a review chain, satisfying compliance requirements (such as ISO document control standards).
Rollback capability: If the latest modifications prove problematic, you can roll back to a previous stable version with one click.
Branch collaboration: Multiple people can make parallel modifications based on different versions of the same document, then merge the results.
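Version comparison of the kind described here is conceptually a text diff between two saved versions. As an illustration, Python's standard `difflib` module can produce a unified diff; this sketch shows the general technique and does not describe how Super Document computes or stores its history.

```python
import difflib

def diff_versions(old_text, new_text, old_label="v1", new_label="v2"):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )
```

Lines prefixed with `-` exist only in the older version and lines prefixed with `+` only in the newer one, which is the same convention Git uses.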

Super Document vs. Traditional AI Writing Tools

Feature	Traditional AI Writing Tools	DesireCore Super Document
Output method	One-time complete document generation	Individual change marking
User control	Accept or reject wholesale	Accept/reject/edit per change
Change transparency	Opaque (unclear what changed)	Fully transparent (every change marked)
Change rationale	None	Rationale provided for each change
Version management	None or simple undo	Git-style full version history
Automation integration	Typically unsupported	Seamless collaboration with Computer Use

Conclusion: The Future from Manual to Intelligent Automation

Through this comprehensive guide, we can see that DesireCore’s GUI desktop automation is not a simple replacement for traditional RPA but a paradigm-level upgrade. Let us review the key takeaways:

Core Technical Breakthroughs

AI-native visual understanding: No longer relying on fixed pixel coordinates or element selectors, but understanding interface semantics through AI visual capabilities and adapting to interface changes.
Natural language driven: No scripts or code needed — describe the task in natural language for automatic execution.
Closed-loop verification: Every operation is accompanied by screenshot verification to ensure execution accuracy.
Intelligent exception handling: Capable of autonomous judgment and response when encountering unexpected situations, rather than halting at the first error.
Cross-platform coverage: Comprehensive support across six major platforms — Windows, macOS, Linux, Android, iOS, and HarmonyOS.

Real-World Application Value

From the three case studies in this article, we can see that DesireCore’s GUI automation delivers significant efficiency improvements across different scenarios:

Bulk CRM data entry: Reduced from 10 hours to 1.8 hours, error rate dropped from 3-5% to below 1%
Cross-application data transfer: Transformed from 30-45 minutes of manual work to fully automated scheduled execution
Scheduled GUI inspection: Shifted from manual on-duty staff dependence to 24/7 automated monitoring with minute-level response times

Security and Control

DesireCore has not sacrificed security in pursuit of automation. Five major security mechanisms — application whitelists, human gate confirmation, comprehensive audit logs, one-click interrupt, and rate limiting — ensure the AI agent always operates under your control. The human gate mechanism in particular provides additional security for sensitive operations, letting you enjoy the convenience of automation without worrying about loss of control.

Complete Toolchain

Computer Use does not exist in isolation. Combined with the intelligent task orchestration engine, individual operations can be organized into complex workflows. Combined with Super Document, data collection and document processing form a complete loop. Two execution modes — fixed mode (SOP/Workflow) and flexible mode (AI-driven orchestration) — accommodate varying automation needs.

Looking Ahead

GUI desktop automation is at an exciting stage of development. As AI visual understanding and reasoning capabilities continue to advance, we can anticipate:

More complex task handling: AI will be able to handle advanced tasks requiring multi-step reasoning and complex judgment, not just mechanical repetitive operations.
More natural human-AI collaboration: Collaboration between humans and AI will become more fluid, with AI proactively seeking human guidance when needed and making independent decisions when confident.
Broader platform support: Beyond the six currently supported platforms, future expansion to additional device types and operating systems is likely.
Stronger learning capabilities: AI will be able to learn from user operation habits, automatically optimizing operational strategies and workflows.
Deeper system integration: Integration with enterprise internal systems will grow increasingly deep, expanding from GUI operations to hybrid API calls, direct database connections, and other approaches.

DesireCore is leading this transformation from manual to intelligent automation. Whether you are an individual user needing to automate daily repetitive tasks or an enterprise team looking to boost operational efficiency, DesireCore’s Computer Use capability is well worth trying.

Starting today, let an AI agent become your digital assistant, freeing your time and energy for work that truly requires human creativity and judgment. This is not merely an efficiency improvement — it is a fundamental transformation of how we work.

This article is based on the latest version of DesireCore. For more information or to get started, visit the DesireCore website to download the client, or consult the product documentation for detailed usage guides.