On December 7, 2024, at the StarRocks Summit 2024, ‘Deltaverse’, the data brand incubated by the Tencent Games data team, made its official debut and launched its first data product: ‘UData’, an intelligent data assistant for the big data era. At the summit, Liu Yan, technical lead and senior expert engineer of the Tencent Games data team, delivered a keynote speech titled ‘Building a Lakehouse Data System in the AI Era’, sharing Tencent Games’ practical experience with ‘AI+Lakehouse’ and how UData has improved the efficiency of data work across Tencent Games’ businesses.
As a Q&A-style AI data assistant, UData is built on large language model technology and a lakehouse architecture, supported by a new generation of AI data asset systems. Because these assets can be understood and used by AI, accuracy improves along the whole path from business requirement to data delivery. UData lets users query, explore, analyze, and visualize data through natural language interaction. According to Liu Yan, UData has been applied to over 80 internal business applications at Tencent Games, increasing SQL writing efficiency by 300%. On delivery accuracy, the most critical measure, UData has achieved a first-attempt accuracy rate of 89%, meeting the demands of actual business scenarios.
Liu Yan stated, ‘Tencent Games’ existing businesses generate hundreds of thousands of data mining and extraction requirements annually. Compared with BI scenarios, data mining has to deal with tens of thousands of tables, which must be understood by AI, and the results must reach human-level accuracy to meet actual business needs.’ He added, ‘We have been exploring how to better empower data work with AI capabilities, applying AI to actual business scenarios, and making Data+AI a core competitive strength of the enterprise. UData is the best practice within the Tencent Games data team, solving key issues in building a “Data+AI” system.’
The key to improving AI delivery accuracy: requirement construction and asset development

Through extensive practice and analysis, the Tencent Games data team found that the low accuracy of AI-written SQL in actual business scenarios is often not due to insufficient large-model capability but to two other causes: first, the AI’s ambiguous understanding of data requirements; and second, the AI’s ambiguous understanding of data assets, because the large model does not receive complete information.
To address these two pain points, UData focuses its technical approach on requirement construction and asset building, raising AI accuracy through engineering. For requirement construction, it first defines standards that both AI and humans can understand. Based on these standards, it matches requirement cases with industry knowledge and rewrites requirements posed by humans into a standard format, eliminating ambiguity in the AI’s understanding of them.
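As a rough illustration of rewriting a free-form request into a standard format, the Python sketch below maps informal phrases to canonical metric and time-range names and flags anything still missing. The article does not describe UData's actual standards or knowledge base, so the glossaries, the `StandardRequirement` structure, and the `normalize` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical glossaries (assumptions for illustration only): they map
# informal phrases to canonical names that both people and a model can share.
METRIC_ALIASES = {
    "dau": "daily_active_users",
    "daily actives": "daily_active_users",
    "next-day retention": "retention_d1",
    "d1 retention": "retention_d1",
    "seven-day retention": "retention_d7",
}
TIME_ALIASES = {
    "yesterday": "last_1_day",
    "last week": "last_7_days",
    "past 7 days": "last_7_days",
}


@dataclass
class StandardRequirement:
    """A request rewritten into an unambiguous, structured form."""
    metrics: List[str] = field(default_factory=list)
    time_range: Optional[str] = None
    missing: List[str] = field(default_factory=list)  # fields to ask the user about


def normalize(text: str) -> StandardRequirement:
    """Rewrite a free-form request using the glossaries above."""
    lowered = text.lower()
    req = StandardRequirement()
    for alias, canonical in METRIC_ALIASES.items():
        # Naive substring matching keeps the sketch short.
        if alias in lowered and canonical not in req.metrics:
            req.metrics.append(canonical)
    for alias, canonical in TIME_ALIASES.items():
        if alias in lowered:
            req.time_range = canonical
            break
    # Record what is still ambiguous instead of guessing a default.
    if not req.metrics:
        req.missing.append("metrics")
    if req.time_range is None:
        req.missing.append("time_range")
    return req


if __name__ == "__main__":
    print(normalize("Show DAU and next-day retention for last week"))
```

Recording gaps in `missing` rather than filling in a default reflects the idea that ambiguity should be resolved before SQL is generated, not papered over.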
Additionally, when a data requirement is complex, the requirement agent can break it down into simpler sub-requirements, reducing the difficulty of generation for the AI and keeping delivery quality stable and controllable through engineering. For example, suppose a user requests: ‘Rank various gameplays within a game by daily participation rate + next-day retention ranking + seven-day retention ranking, and calculate an overall ranking.’ UData queries the relevant game domain knowledge, breaks this complex requirement down into four sub-requirements, calculates and generates SQL for participation rate, active users, gameplay participation rate, and next-day and seven-day retention, and finally merges the SQL results of the four data packages to generate the final SQL.

In terms of asset building, to enable AI to better understand and use assets, UData has created an ‘AI-driven data asset system’. Traditional asset systems suffer from a lack of structured standards, construction that lags behind business needs, and high governance costs, so they cannot support large language models in delivering data requirements quickly and accurately. UData’s ‘new-generation AI data assets’ therefore aim to let AI understand the assets and deliver correct SQL in a self-service manner, and they define semantic-layer modeling standards covering industry knowledge, metrics, dimensions, features, metadata, and more. By understanding semantic assets, the AI adopts a different asset usage strategy for each kind of requirement: for requirements that already have metric and dimension assets, it satisfies them by recommending existing dashboards; for requirements that involve new metrics or new dimensions, it generates the metrics and dimensions from feature assets; and for requirements that lack semantic assets, it detects the gap and raises an alert, and delivery by AI becomes possible once features and other semantic assets have been supplemented. The upgrade from traditional data platforms to new-generation AI data assets establishes connections among business needs, industry knowledge, and data structures and, through domain models, ensures that the assets are understood and used by AI.

‘Stable and controllable requirement construction and an AI-understandable asset system are key to UData’s improvement of AI delivery accuracy and are its differentiated advantage over other products in the industry,’ said Liu Yan, head of data technology at Tencent Games. ‘Based on the current internal application at Tencent Games, the accuracy rate has been steadily maintained at 89%, and we firmly believe that this direction is reliable.’

Leveraging lakehouse capabilities for intelligent and dynamic computational acceleration

Writing SQL correctly is not enough on its own to support real-time exploration and analysis of detailed data.
Traditional data warehouse architectures (such as Lambda) perform a large amount of computation offline on a T+1 basis, which cannot support fast, real-time querying of all data. To address this, UData has upgraded its data infrastructure to a lakehouse architecture, achieving efficient querying of real-time detailed data through real-time data access, virtual data warehouses, and hot-cold data tiering. In parallel, UData has built a cost-efficiency optimization engine that quickly identifies assets requiring optimization and acceleration along three dimensions: asset popularity, execution speed, and data scale. By combining assets with materialized views, it enables low-cost, high-efficiency use of data.

Building a sustainably optimized operations platform on large-model capabilities and new-generation AI data assets

Through general-purpose large models, domain models, and a multi-agent architecture, AI capabilities are put to better use. Currently, UData can adapt to various industry-standard large models, including GPT and Hunyuan. Moreover, for industry-specific know-how and enterprise knowledge, UData introduces ‘domain models’ that help large models better understand data assets through knowledge graphs, semantic understanding, retrieval, and ranking technologies. In terms of platform workflow, UData uses a multi-agent architecture to create an operations platform on which humans and AI collaborate closely and which can be sustained and continuously optimized. A job is broken down into several tasks, some completed by AI alone and others completed collaboratively by humans and AI (requirement collaboration and acceptance collaboration), covering the entire process from business requirement to data delivery. The agent at each node can interact with users in real time, detect issues promptly, and intervene to correct them, ensuring the sustainable optimization of the system.
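The job-and-task flow can be pictured with a small Python sketch in which a job is a sequence of tasks, and a task either runs on AI alone or passes its output through a human checkpoint. This is only an illustration under stated assumptions: the `Task` structure, the `run_job` function, and the placeholder step functions are hypothetical and do not represent UData's actual agents or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """One step of a job. `review` is an optional human checkpoint."""
    name: str
    run: Callable[[str], str]                       # AI step: previous output in, new output out
    review: Optional[Callable[[str], str]] = None   # None means the task is AI-only


def run_job(tasks: List[Task], initial: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Run tasks in order and record which ones involved a human."""
    state = initial
    log = []
    for task in tasks:
        state = task.run(state)
        mode = "ai"
        if task.review is not None:
            # A person can inspect and correct the AI output before it moves on.
            state = task.review(state)
            mode = "human+ai"
        log.append((task.name, mode))
    return state, log


# Placeholder steps standing in for real agents (assumptions for illustration).
def draft_requirement(text: str) -> str:
    return f"REQ({text.strip()})"

def human_confirm(req: str) -> str:
    return req  # a reviewer could edit the requirement here

def generate_sql(req: str) -> str:
    return f"SELECT /* {req} */ 1"  # placeholder SQL, not a real query

def validate(sql: str) -> str:
    return sql

def human_accept(sql: str) -> str:
    return sql + " -- accepted"


JOB = [
    Task("requirement collaboration", draft_requirement, human_confirm),
    Task("sql generation", generate_sql),
    Task("acceptance collaboration", validate, human_accept),
]

if __name__ == "__main__":
    result, log = run_job(JOB, "  daily active users  ")
    print(result)
    print(log)
```

Modeling the human checkpoint as an optional callable on each task makes it easy to see where a person can intervene, which is the property the article attributes to each node agent.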
The AI multi-agent architecture allows AI to reconstruct various domains of data work

UData has been applied to over 80 business units within Tencent Games, tailored to the specific needs of different game categories such as MOBA, MMORPG, and tactical competition, and the product continues to be iterated and upgraded. Beyond gaming, UData’s product capabilities can also be used in other industries, such as catering, finance, and education, helping traditional enterprises achieve AI-driven digital transformation, enhance data work efficiency, and improve data governance ROI through the new generation of AI data assets, thereby reducing costs and increasing efficiency.

The application of AI technology in data work still holds great potential, and Deltaverse, from the Tencent Games data team, continues to explore it. In addition to generating SQL through AI to improve data acquisition efficiency, the team is working to integrate ‘AI + lakehouse’ capabilities with more tools and systems, to tap more of AI’s potential and reconstruct the various fields of data work with AI.
Enterprises and partners interested in Deltaverse, UData, and the Tencent Games data team’s data technologies can visit the Deltaverse official website at www.deltaverse.net to view more information and apply for a free product trial.