LangEngine Goes Hardcore Open Source: Replicating OpenManus in 24 Hours

Article link: https://blog.csdn.net/weixin_68964112/article/details/147832815

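Abstract

LangEngine v1.2.1 ships a Java version of openmanus-preview. Because the LangEngine framework had already accumulated a good deal of groundwork, openmanus could be replicated quickly, and the result can already complete tasks autonomously across the whole path from planning to execution. This article walks through its features.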

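Monica.im released Manus, billed as the world's first autonomous AI Agent product. It shook the AI industry, and invitation codes became extremely hard to come by. Manus achieved SOTA results on the GAIA benchmark, with performance exceeding that of OpenAI's models at the same level. The open-source community has since produced frameworks with Manus-like functionality, such as openmanus and owl, but these are all written in Python. LangEngine, as a representative pure-Java framework for AI application development, provides a fresh implementation of the openmanus functionality and improves the planning capability along with the BrowserUse and DeepSearch tools.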

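What is Manus? Described as the world's first truly general-purpose AI Agent, Manus can complete tasks autonomously from planning through execution, such as writing reports or building spreadsheets. It does not just generate ideas; it reasons independently and takes action. By thinking through, planning, and executing complex tasks on its own and delivering finished results directly, it shows an unusual degree of generality and execution ability.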

LangEngine open-source repository: https://github.com/AIDC-AI/ali-langengine

OpenManus For Java

How It Works

Tool Set

BrowserUse

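A UI automation component built on the Java version of Selenium. Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE, Edge, Mozilla Firefox, Safari, Google Chrome, and Opera. Its main capabilities are: browser compatibility testing, which checks whether an application works well across different browsers and operating systems; functional testing, which builds regression tests to verify software functionality and user requirements; and automatic recording of actions with generation of test scripts in languages such as .NET, Java, and Perl.

Before using Selenium you need to download ChromeDriver, from https://googlechromelabs.github.io/chrome-for-testing/ (newer versions) or https://developer.chrome.com/docs/chromedriver/downloads?hl=zh-cn (older versions), and then bind the driver in your application.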

// Locate the ChromeDriver binary bundled under src/main/resources/data.
URL resource = getClass().getClassLoader().getResource("data/chromedriver");
if (resource == null) {
    throw new IllegalStateException("Chromedriver not found in resources");
}
// Resolve it to an absolute file path and tell Selenium where to find it.
String chromedriverPath = Paths.get(resource.getPath()).toFile().getAbsolutePath();
System.setProperty("webdriver.chrome.driver", chromedriverPath);
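One caveat with the snippet above: a classpath resource is only a real file while the application runs from an exploded directory. Once packaged into a JAR, the resource URL no longer maps to a path on disk. A minimal sketch of one workaround, copying the bundled binary to a temporary file first, is shown below. The class and method names are hypothetical and are not part of LangEngine; only JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of LangEngine): extracts a bundled driver binary to disk.
class DriverExtractSketch {

    // Copies the stream into a fresh temp file and marks that file executable.
    static Path copyToTempExecutable(InputStream in, String prefix) {
        try {
            Path target = Files.createTempFile(prefix, ".bin");
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().setExecutable(true);
            target.toFile().deleteOnExit();
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up a classpath resource (for example "data/chromedriver") and extracts it.
    static Path extractResource(ClassLoader loader, String resourceName) {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalStateException(resourceName + " not found in resources");
            }
            return copyToTempExecutable(in, "driver-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path could then be passed to System.setProperty("webdriver.chrome.driver", path.toString()) in place of the path derived from the resource URL.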

Tool implementation:

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    log.info("BrowserUseTool toolInput:" + toolInput);
    Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});

    // Optional arguments; each one stays null when the model omits it.
    String action = (String) toolInputMap.get("action");
    String url = (String) toolInputMap.get("url");
    Integer index = (Integer) toolInputMap.get("index");
    String text = (String) toolInputMap.get("text");
    String script = (String) toolInputMap.get("script");
    Integer scrollAmount = (Integer) toolInputMap.get("scroll_amount");
    Integer tabId = (Integer) toolInputMap.get("tab_id");

    // A switch on a null String throws NullPointerException, so reject it up front.
    if (action == null) {
        return new ToolExecuteResult("Action is required");
    }
    try {
        switch (action) {
            case "navigate":
                if (url == null) {
                    return new ToolExecuteResult("URL is required for 'navigate' action");
                }
                driver.get(url);
                return new ToolExecuteResult("Navigated to " + url);

            case "click":
                if (index == null) {
                    return new ToolExecuteResult("Index is required for 'click' action");
                }
                // The index refers to the position among all elements on the page.
                List<WebElement> elements = driver.findElements(By.cssSelector("*"));
                if (index < 0 || index >= elements.size()) {
                    return new ToolExecuteResult("Element with index " + index + " not found");
                }
                elements.get(index).click();
                return new ToolExecuteResult("Clicked element at index " + index);

            case "input_text":
                if (index == null || text == null) {
                    return new ToolExecuteResult("Index and text are required for 'input_text' action");
                }
                // Here the index refers to the position among input and textarea elements only.
                WebElement inputElement = driver.findElements(By.cssSelector("input, textarea")).get(index);
                inputElement.sendKeys(text);
                return new ToolExecuteResult("Input '" + text + "' into element at index " + index);

            case "screenshot":
                TakesScreenshot screenshot = (TakesScreenshot) driver;
                String base64Screenshot = screenshot.getScreenshotAs(OutputType.BASE64);
                return new ToolExecuteResult("Screenshot captured (base64 length: " + base64Screenshot.length() + ")");

            case "get_html":
                // Truncate very long pages so the result fits in the model context.
                String html = driver.getPageSource();
                return new ToolExecuteResult(html.length() > MAX_LENGTH ? html.substring(0, MAX_LENGTH) + "..." : html);

            case "get_text":
                // Text of the "unusual traffic" block page in Chinese; it is matched literally, so keep it as-is.
                final String blockedMarker = "我们的系统检测到您的计算机网络中存在异常流量";
                String body = driver.findElement(By.tagName("body")).getText();
                log.info("get_text body is {}", body);
                // While the block page is shown, wait 10 seconds and re-read, up to 5 times.
                for (int retry = 1; retry <= 5 && body != null && body.contains(blockedMarker); retry++) {
                    Thread.sleep(10000);
                    body = driver.findElement(By.tagName("body")).getText();
                    log.info("retry {} get_text body is {}", retry, body);
                }
                return new ToolExecuteResult(body);

            case "execute_js":
                if (script == null) {
                    return new ToolExecuteResult("Script is required for 'execute_js' action");
                }
                JavascriptExecutor jsExecutor = (JavascriptExecutor) driver;
                Object result = jsExecutor.executeScript(script);
                // Scripts without a return value yield null, so avoid calling toString() on it.
                return new ToolExecuteResult(String.valueOf(result));

            case "scroll":
                if (scrollAmount == null) {
                    return new ToolExecuteResult("Scroll amount is required for 'scroll' action");
                }
                ((JavascriptExecutor) driver).executeScript("window.scrollBy(0," + scrollAmount + ");");
                String direction = scrollAmount > 0 ? "down" : "up";
                return new ToolExecuteResult("Scrolled " + direction + " by " + Math.abs(scrollAmount) + " pixels");

            case "new_tab":
                if (url == null) {
                    return new ToolExecuteResult("URL is required for 'new_tab' action");
                }
                ((JavascriptExecutor) driver).executeScript("window.open('" + url + "', '_blank');");
                return new ToolExecuteResult("Opened new tab with URL " + url);

            case "close_tab":
                driver.close();
                return new ToolExecuteResult("Closed current tab");

            case "switch_tab":
                if (tabId == null) {
                    return new ToolExecuteResult("Tab ID is required for 'switch_tab' action");
                }
                Object[] windowHandles = driver.getWindowHandles().toArray();
                if (tabId < 0 || tabId >= windowHandles.length) {
                    return new ToolExecuteResult("Tab ID is out of range for 'switch_tab' action");
                }
                driver.switchTo().window(windowHandles[tabId].toString());
                return new ToolExecuteResult("Switched to tab " + tabId);

            case "refresh":
                driver.navigate().refresh();
                return new ToolExecuteResult("Refreshed current page");

            default:
                return new ToolExecuteResult("Unknown action: " + action);
        }
    } catch (Exception e) {
        return new ToolExecuteResult("Browser action '" + action + "' failed: " + e.getMessage());
    }
}
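The tool above reads its arguments with direct casts such as (Integer) toolInputMap.get("index"). Those casts fail with a ClassCastException if the JSON parser produces a Long or a Double, or if the model passes the number as a string. The sketch below shows one more tolerant way to read optional arguments. ToolArgsSketch and its methods are hypothetical helpers, not part of LangEngine.

```java
import java.util.Map;

// Hypothetical helper for reading optional tool arguments from a parsed JSON map.
class ToolArgsSketch {

    // Returns the value as a String, or null when the key is absent.
    static String optString(Map<String, Object> args, String key) {
        Object value = args.get(key);
        return value == null ? null : value.toString();
    }

    // Accepts any numeric type (Integer, Long, Double) as well as numeric strings.
    static Integer optInt(Map<String, Object> args, String key) {
        Object value = args.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        try {
            return Integer.valueOf(value.toString().trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("'" + key + "' must be an integer, got: " + value);
        }
    }
}
```

With helpers like these, a malformed argument produces a message that names the offending key, which the catch block in the tool can pass back to the model.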

PythonExecute

Java can invoke system commands through ProcessBuilder or Runtime.getRuntime().exec(), so a Python script can be executed the same way.

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    log.info("PythonExecute toolInput:" + toolInput);
    Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});
    String code = (String) toolInputMap.get("code");
    // Write the code to a uniquely named temporary .py file and run it.
    CodeExecutionResult codeExecutionResult = CodeUtils.executeCode(code, "python", "tmp_" + LogIdGenerator.generateUniqueId() + ".py", arm64, new HashMap<>());
    // The captured logs of the run become the tool result.
    String result = codeExecutionResult.getLogs();
    return new ToolExecuteResult(result);
}
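The internals of CodeUtils.executeCode are not shown in this article. As a rough illustration of the ProcessBuilder approach mentioned above, the sketch below writes the generated code to a temporary .py file, runs an interpreter on it, and returns the merged output. The class name and the interpreter name passed by the caller (for example "python3") are assumptions, and no sandboxing is applied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the LangEngine implementation.
class PythonScriptSketch {

    // Writes the generated code to a temporary .py file.
    static Path writeScript(String code) {
        try {
            Path script = Files.createTempFile("tmp_", ".py");
            Files.write(script, code.getBytes(StandardCharsets.UTF_8));
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the command line: the interpreter followed by the script path.
    static List<String> buildCommand(String interpreter, Path script) {
        return Arrays.asList(interpreter, script.toAbsolutePath().toString());
    }

    // Runs the script, returns merged stdout and stderr, then deletes the temp file.
    static String execute(String interpreter, String code) {
        Path script = writeScript(code);
        try {
            Process process = new ProcessBuilder(buildCommand(interpreter, script))
                    .redirectErrorStream(true)
                    .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            process.waitFor();
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the script", e);
        } finally {
            script.toFile().delete();
        }
    }
}
```

A call such as PythonScriptSketch.execute("python3", "print(1 + 1)") would run the snippet if a python3 interpreter is on the PATH.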

Bash

As above, this tool also uses ProcessBuilder. In a production environment, commands should be executed inside a sandbox.

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    log.info("Bash toolInput:" + toolInput);
    Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});
    String command = (String) toolInputMap.get("command");
    // The helper accepts a list of command lines; only one is passed here.
    List<String> commandList = new ArrayList<>();
    commandList.add(command);
    List<String> result = BashProcess.executeCommand(commandList, workingDirectoryPath);
    return new ToolExecuteResult(JSON.toJSONString(result));
}

public static List<String> executeCommand(List<String> commandList, String workingDirectoryPath) {
    return commandList.stream().map(commandLine -> {
        try {
            ProcessBuilder pb = new ProcessBuilder("bash", "-c", commandLine);
            if (!StringUtils.isEmpty(workingDirectoryPath)) {
                pb.directory(new File(workingDirectoryPath));
            }
            // Merge stderr into stdout so an unread error stream cannot block the process.
            pb.redirectErrorStream(true);

            // Start the process
            Process process = pb.start();

            // Collect the command output
            StringBuilder builder = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    log.warn(line);
                    builder.append(line).append("\n");
                }
            }

            // Wait for the command to finish
            int exitCode = process.waitFor();
            if (exitCode == 0) {
                log.warn("Bash command executed successfully.");
            } else {
                log.error("Failed to execute Bash command, exit code {}.", exitCode);
            }
            return builder.toString();
        } catch (Throwable e) {
            log.error("Bash command failed: " + commandLine, e);
        }
        // A null entry marks a command that could not be run.
        return null;
    }).collect(Collectors.toList());
}
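A command issued by a model may never terminate, and the helper above waits for it indefinitely. The sketch below shows one way to bound the run time with the JDK alone: output goes to a temporary file so that a full pipe cannot stall the child, and the process is killed once the timeout expires. TimedCommandSketch is a hypothetical name, and this is not a substitute for the sandbox recommended above.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run a command with a time limit and capture its output.
class TimedCommandSketch {

    static final class Result {
        final int exitCode;   // -1 means the process was killed after the timeout
        final String output;  // merged stdout and stderr

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeoutSeconds) {
        File log = null;
        try {
            log = File.createTempFile("cmd-", ".log");
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)  // merge stderr into stdout
                    .redirectOutput(log)        // write to a file instead of a pipe
                    .start();
            boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
            if (!finished) {
                process.destroyForcibly();
                process.waitFor();
            }
            String output = new String(Files.readAllBytes(log.toPath()), StandardCharsets.UTF_8);
            return new Result(finished ? process.exitValue() : -1, output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the command", e);
        } finally {
            if (log != null) {
                log.delete();
            }
        }
    }
}
```

For the Bash tool, the command list would be Arrays.asList("bash", "-c", commandLine). Note that destroyForcibly() stops the bash process itself; child processes it spawned may need separate handling.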

GoogleSearch

API capabilities provided for Google search: https://developers.google.com/custom-search/v1/overview?hl=zh_CN and https://serpapi.com/search-api

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    log.info("GoogleSearch toolInput:" + toolInput);

    Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});
    String query = (String) toolInputMap.get("query");

    // Default to two results unless the caller asks for more.
    Integer numResults = 2;
    if (toolInputMap.get("num_results") != null) {
        numResults = (Integer) toolInputMap.get("num_results");
    }
    Map<String, Object> response = service.search(query, 0, numResults, serpapiKey);

    if (response.containsKey("error")) {
        throw new RuntimeException("Got error from SerpAPI: " + response.get("error"));
    }
    // answer_box may arrive as a list; keep only its first entry.
    if (response.containsKey("answer_box") && response.get("answer_box") instanceof List) {
        response.put("answer_box", ((List) response.get("answer_box")).get(0));
    }

    // Pick the most direct answer available, falling back from answer box to organic results.
    String toret = "";
    if (response.containsKey("answer_box") && ((Map<String, Object>) response.get("answer_box")).containsKey("answer")) {
        toret = ((Map<String, Object>) response.get("answer_box")).get("answer").toString();
    } else if (response.containsKey("answer_box") && ((Map<String, Object>) response.get("answer_box")).containsKey("snippet")) {
        toret = ((Map<String, Object>) response.get("answer_box")).get("snippet").toString();
    } else if (response.containsKey("answer_box")
            && ((Map<String, Object>) response.get("answer_box")).containsKey("snippet_highlighted_words")
    ) {
        toret = ((List<String>) ((Map<String, Object>) response.get("answer_box")).get("snippet_highlighted_words")).get(0);
    } else if (response.containsKey("sports_results") && ((Map<String, Object>) response.get("sports_results")).containsKey("game_spotlight")) {
        toret = ((Map<String, Object>) response.get("sports_results")).get("game_spotlight").toString();
    } else if (response.containsKey("shopping_results") && ((List<Map<String, Object>>) response.get("shopping_results")).get(0).containsKey("title")) {
        List<Map<String, Object>> shoppingResults = (List<Map<String, Object>>) response.get("shopping_results");
        // Take at most three items; fewer may be returned.
        List<Map<String, Object>> subList = shoppingResults.subList(0, Math.min(3, shoppingResults.size()));
        toret = subList.toString();
    } else if (response.containsKey("knowledge_graph") && ((Map<String, Object>) response.get("knowledge_graph")).containsKey("description")) {
        toret = ((Map<String, Object>) response.get("knowledge_graph")).get("description").toString();
    } else if (response.containsKey("organic_results") && (((List<Map<String, Object>>) response.get("organic_results")).get(0)).containsKey("snippet")) {
        toret = (((List<Map<String, Object>>) response.get("organic_results")).get(0)).get("snippet").toString();
    } else if (response.containsKey("organic_results") && (((List<Map<String, Object>>) response.get("organic_results")).get(0)).containsKey("link")) {
        toret = (((List<Map<String, Object>>) response.get("organic_results")).get(0)).get("link").toString();
    } else if (response.containsKey("images_results") && ((Map<String, Object>) ((List<Map<String, Object>>) response.get("images_results")).get(0)).containsKey("thumbnail")) {
        List<String> thumbnails = new ArrayList<>();
        List<Map<String, Object>> imageResults = (List<Map<String, Object>>) response.get("images_results");
        // Take at most ten thumbnails; fewer may be returned.
        for (Map<String, Object> item : imageResults.subList(0, Math.min(10, imageResults.size()))) {
            thumbnails.add(item.get("thumbnail").toString());
        }
        toret = thumbnails.toString();
    } else {
        toret = "No good search result found";
    }
    log.warn("SerpapiTool result:" + toret);
    return new ToolExecuteResult(toret);
}
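The fallback chain above repeats the same casts for every branch. As an illustration only, the sketch below expresses a subset of that priority order as data, with a small helper that walks nested maps and tolerates missing keys and empty lists. The field names are taken from the code above; SearchAnswerSketch, dig, and pickAnswer are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose the first available answer field from a nested response.
class SearchAnswerSketch {

    // Candidate paths in priority order (a subset of the branches in the tool).
    static final String[][] PRIORITY = {
        {"answer_box", "answer"},
        {"answer_box", "snippet"},
        {"knowledge_graph", "description"},
        {"organic_results", "snippet"},
        {"organic_results", "link"},
    };

    // Follows a path of map keys; a List met along the way is replaced by its first element.
    static Object dig(Object node, String... path) {
        for (String key : path) {
            if (node instanceof List) {
                List<?> list = (List<?>) node;
                node = list.isEmpty() ? null : list.get(0);
            }
            if (!(node instanceof Map)) {
                return null;
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return node;
    }

    // Returns the first non-null candidate, or a fixed message when none is present.
    static String pickAnswer(Map<String, Object> response) {
        for (String[] path : PRIORITY) {
            Object value = dig(response, path);
            if (value != null) {
                return value.toString();
            }
        }
        return "No good search result found";
    }
}
```

Adding or reordering a source then means editing the PRIORITY table rather than another else-if branch.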

DeepSearch

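What is Deep Search? The core idea of DeepSearch is to cycle repeatedly through three stages, searching, reading, and reasoning, until the best answer is found. The search stage uses a search engine to explore the internet, while the reading stage analyzes specific web pages in detail. The reasoning stage evaluates the current state and decides whether to break the original question into smaller sub-questions or to try a different search strategy.

How it works:

Following the working principles of DeepSearcher, we implemented a Java version of the DeepSearch framework. For details, see: https://github.com/AIDC-AI/ali-langengine/tree/main/alibaba-langengine-infrastructure/alibaba-langengine-deepsearch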

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    log.info("DeepSearchTool toolInput:" + toolInput);

    Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});
    String query = (String) toolInputMap.get("query");

    // Delegate the search, read, and reason loop to the DeepSearch framework.
    RetrievalResultData retrievalResultData = deepSearcher.query(query);
    log.warn("DeepSearchTool result:" + JSON.toJSONString(retrievalResultData));
    return new ToolExecuteResult(retrievalResultData.getAnswer());
}
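The search, read, and reason cycle described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code of the LangEngine DeepSearch module: the search engine, page reader, and reasoning model are passed in as plain functions, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a search -> read -> reason loop.
class DeepSearchLoopSketch {

    // Outcome of one reasoning step: a final answer, or follow-up queries to try.
    static final class Decision {
        final String answer;             // non-null when the loop can stop
        final List<String> nextQueries;  // sub-questions or rephrased queries

        private Decision(String answer, List<String> nextQueries) {
            this.answer = answer;
            this.nextQueries = nextQueries;
        }

        static Decision done(String answer) {
            return new Decision(answer, new ArrayList<>());
        }

        static Decision more(List<String> nextQueries) {
            return new Decision(null, nextQueries);
        }
    }

    interface Reasoner {
        Decision reason(String question, List<String> notes);
    }

    static String run(String question,
                      Function<String, List<String>> search,  // query -> URLs
                      Function<String, String> read,          // URL -> page text
                      Reasoner reasoner,
                      int maxRounds) {
        List<String> notes = new ArrayList<>();
        List<String> queries = new ArrayList<>();
        queries.add(question);
        for (int round = 0; round < maxRounds && !queries.isEmpty(); round++) {
            // Search and read: gather page text for every pending query.
            for (String query : queries) {
                for (String url : search.apply(query)) {
                    notes.add(read.apply(url));
                }
            }
            // Reason: either stop with an answer or continue with new queries.
            Decision decision = reasoner.reason(question, notes);
            if (decision.answer != null) {
                return decision.answer;
            }
            queries = decision.nextQueries;
        }
        return "No answer found within " + maxRounds + " rounds";
    }
}
```

The maxRounds bound matters in practice, since a reasoner that keeps proposing new queries would otherwise loop forever.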

PlanningTool

The planning tool that the planning model invokes through ToolCall. It mainly performs CRUD operations on plans and preserves the conversation context.

@Override
public ToolExecuteResult run(String toolInput, ExecutionContext executionContext) {
    try {
        log.info("PlanningTool toolInput:" + toolInput);
        Map<String, Object> toolInputMap = JSON.parseObject(toolInput, new TypeReference<Map<String, Object>>() {});

        // Optional arguments; each one stays null when the model omits it.
        String command = (String) toolInputMap.get("command");
        String planId = (String) toolInputMap.get("plan_id");
        String title = (String) toolInputMap.get("title");
        List<String> steps = (List<String>) toolInputMap.get("steps");
        Integer stepIndex = (Integer) toolInputMap.get("step_index");
        String stepStatus = (String) toolInputMap.get("step_status");
        String stepNotes = (String) toolInputMap.get("step_notes");

        // A switch on a null String throws NullPointerException, so report it clearly.
        if (command == null) {
            throw new RuntimeException("Command is required");
        }

        // Dispatch to the matching plan operation.
        switch (command) {
            case "create":
                return createPlan(planId, title, steps);
            case "update":
                return updatePlan(planId, title, steps);
            case "list":
                return listPlans();
            case "get":
                return getPlan(planId);
            case "set_active":
                return setActivePlan(planId);
            case "mark_step":
                return markStep(planId, stepIndex, stepStatus, stepNotes);
            case "delete":
                return deletePlan(planId);
            default:
                throw new RuntimeException("Unrecognized command: " + command + ". Allowed commands are: create, update, list, get, set_active, mark_step, delete");
        }
    } catch (Throwable e) {
        throw new RuntimeException(e);
    }
}
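The methods that the switch dispatches to (createPlan, markStep, and so on) are not shown in this article. To make the CRUD idea concrete, here is a minimal in-memory sketch of a plan store. The class, its method signatures, and the status strings ("not_started", "completed") are assumptions for illustration and may differ from the LangEngine implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory plan store, keyed by plan id.
class PlanStoreSketch {

    static final class Plan {
        final String title;
        final List<String> steps;
        final List<String> statuses;  // one status per step

        Plan(String title, List<String> steps) {
            this.title = title;
            this.steps = new ArrayList<>(steps);
            this.statuses = new ArrayList<>();
            for (int i = 0; i < steps.size(); i++) {
                statuses.add("not_started");  // assumed initial status
            }
        }
    }

    private final Map<String, Plan> plans = new LinkedHashMap<>();

    private Plan require(String planId) {
        Plan plan = plans.get(planId);
        if (plan == null) {
            throw new IllegalArgumentException("No plan found with ID: " + planId);
        }
        return plan;
    }

    String create(String planId, String title, List<String> steps) {
        if (plans.containsKey(planId)) {
            throw new IllegalArgumentException("Plan already exists: " + planId);
        }
        plans.put(planId, new Plan(title, steps));
        return "Plan created: " + planId;
    }

    String markStep(String planId, int stepIndex, String status) {
        Plan plan = require(planId);
        if (stepIndex < 0 || stepIndex >= plan.steps.size()) {
            throw new IllegalArgumentException("Invalid step index: " + stepIndex);
        }
        plan.statuses.set(stepIndex, status);
        return "Step " + stepIndex + " marked as " + status;
    }

    // Renders the plan as text so it can be fed back to the model.
    String get(String planId) {
        Plan plan = require(planId);
        StringBuilder sb = new StringBuilder("Plan: " + plan.title + "\n");
        for (int i = 0; i < plan.steps.size(); i++) {
            sb.append(i).append(". [").append(plan.statuses.get(i)).append("] ")
              .append(plan.steps.get(i)).append("\n");
        }
        return sb.toString();
    }

    String delete(String planId) {
        require(planId);
        plans.remove(planId);
        return "Plan deleted: " + planId;
    }
}
```

Returning the plan as plain text keeps the tool result readable for the model, which sees the current status of every step on each call to get.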

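Use Cases

Paper Generation

Stock Price Search

User question: Use Baidu to look up Alibaba's stock price over the past week, plot a stock price trend chart, and save it to a local directory.

Result: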

Creating a Company Organizational Chart

User question: Please create a comprehensive OpenAI organizational chart using public information, showing current hierarchy and reporting structures. Include key personnel (noting recent departures) and identify team members across departments. Deliver in HTML format.

Result:

April Trip to Japan

User question: I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation. Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey. And Html use 'python3 -m http.server 8000' execute.

Result:

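Buying a Home in Hangzhou

User question: I want to buy a property in Hangzhou's Xihu District. I have one child in middle school and one in kindergarten, and I would like their education to be taken into account. My wife and I have a combined monthly income of 50,000 yuan. Please search with Baidu, recommend developments that offer good value for money, and write the results to a local file.

Process:

Result: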

Using the Framework

Code Implementation

Open-source repository: https://github.com/AIDC-AI/ali-langengine/tree/main/alibaba-langengine-community/alibaba-langengine-openmanus

OpenManusMain

public static void main(String[] args) {
    // Locate the bundled ChromeDriver binary and register it with Selenium.
    URL resource = OpenManusMain.class.getClassLoader().getResource("data/chromedriver");
    if (resource == null) {
        throw new IllegalStateException("Chromedriver not found in resources");
    }
    String chromedriverPath = Paths.get(resource.getPath()).toFile().getAbsolutePath();
    System.setProperty("webdriver.chrome.driver", chromedriverPath);

    // Register the Manus agent as the executor used by the planning flow.
    ManusAgent manusAgent = new ManusAgent();

    Map<String, BaseAgent> agentMap = new HashMap<String, BaseAgent>() {{
        put("manus", manusAgent);
    }};
    Map<String, Object> data = new HashMap<>();
    PlanningFlow planningFlow = new PlanningFlow(agentMap, data);

    Scanner scanner = new Scanner(System.in);

    try {
        System.out.print("Enter your prompt: ");
        String prompt = scanner.nextLine().trim();

        if (prompt.isEmpty()) {
            System.out.println("Empty prompt provided.");
            return;
        }

        System.out.println("Processing your request...");

        try {
            // Plan and execute the request, then report the elapsed time.
            long startTime = System.currentTimeMillis();
            String result = planningFlow.execute(prompt);

            long elapsedTime = System.currentTimeMillis() - startTime;
            System.out.println("Request processed in " + elapsedTime / 1000.0 + " seconds");
            log.info(result);
        } catch (Exception e) {
            log.error("Error: " + e.getMessage());
        }

    } catch (Throwable e) {
        log.error("Unexpected Error: " + e.getMessage());
    }

    // nextLine() returns only after Enter is pressed.
    System.out.println("Press Enter to exit:");
    scanner.nextLine();
    System.out.println("You finished");
    scanner.close();
}
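The entry point above hands the prompt to PlanningFlow.execute, whose internals are not shown here. As a rough mental model only, a planning flow can be thought of as: turn the prompt into ordered steps, run each step through an agent, and collect the results. The sketch below illustrates that shape with stand-in types. It is an assumption about the general pattern, not the LangEngine implementation, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a plan-then-execute flow.
class PlanningFlowSketch {

    // Stand-in for an agent that carries out a single plan step.
    interface StepAgent {
        String run(String step);
    }

    private final Map<String, StepAgent> agents;
    private final Function<String, List<String>> planner;  // prompt -> ordered steps

    PlanningFlowSketch(Map<String, StepAgent> agents, Function<String, List<String>> planner) {
        this.agents = agents;
        this.planner = planner;
    }

    // Plans the prompt, runs every step with the chosen agent, and joins the results.
    String execute(String prompt, String agentKey) {
        StepAgent agent = agents.get(agentKey);
        if (agent == null) {
            throw new IllegalArgumentException("Unknown agent: " + agentKey);
        }
        List<String> results = new ArrayList<>();
        for (String step : planner.apply(prompt)) {
            results.add(step + " -> " + agent.run(step));
        }
        return String.join("\n", results);
    }
}
```

In a real flow the planner would be a model call that creates the plan through the PlanningTool, and each step would be marked as completed once its agent returns.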

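Conclusion

Langengine-Openmanus can already use a large model to plan and execute steps, operate a browser through its UI, and run Python commands locally. The purpose of this article is to help readers learn and understand the underlying principles quickly through this framework.

The open-source ecosystem is a core driving force behind the rapid development of Agent technology. From algorithms and models to engineering frameworks, open sharing among developers worldwide allows frontier innovations to be implemented and validated quickly.

There is, however, a real barrier between a technical prototype and a mature product. Replicating code can be fast, but building a stable, usable service takes accumulated technical depth, including engineering optimization, model evolution, scenario adaptation, and stability guarantees. This depends on a deep understanding of open-source technology and, even more, on facing the complex demands of real-world scenarios head-on. We look forward to continued collaboration with the community to refine the product across algorithms, engineering, and user experience, and to turn frontier technology into intelligent solutions that truly serve the public.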