In 2024, the "battle of a hundred models," which has raged for more than half a year, reached the field of video. Rapidly evolving models and the brute-force accumulation of computing power have shown enormous potential for change. However, massive volumes of video data that are difficult to develop and use have become a new bottleneck.

"In the era of data, large models are the core tools, and scenario-based applications are the key to value realization," said Zhou Wenkai, Vice President of the R&D Center of Dahua Shares, to Zhi Dongxi. "Currently, data elements are very popular, but there are still many problems to be solved in the production, circulation, and trading of video data, which is greatly related to the privacy, sensitivity, and security of video data."

As a leading company in the AIoT field, Dahua Shares has worked in video for more than a decade. Zhou Wenkai believes that greater value comes from extracting structured information from video data, guided by an understanding of the business scenario, and integrating it deeply with business applications.

As a representative enterprise of the video-centered data industry, how is Dahua Shares building a model for the future? Through a conversation with Zhou Wenkai, this article looks for answers in Dahua Shares' practical experience across thousands of industries.

I. Data is king, and the data industry with video as the core is particularly important.

According to the research firm IDC, the total volume of global data will exceed 180 ZB by 2025, with China's data volume ranking first in the world. On this trend, the data trading market is expected to exceed 220 billion yuan. Taking into account the computing, storage, AI technology, and software infrastructure that data drives, the overall market is expected to break through the 200 trillion yuan mark.

What development prospects does this open up? Zhou Wenkai told Zhi Dongxi that, among all data types generated today, unstructured data centered on video accounts for more than 90% of total data volume. Every link around video, including collection, circulation, analysis, computation, and application, holds tremendous value. A closer look at the video data industry chain, however, shows that many challenges remain. One is how to achieve precise data collection and cross-network interconnection for large numbers of devices across complex perception scenarios and a wide range of IoT protocols. Another lies in analysis and processing services, where unstructured data such as video is still mined only to a very limited degree and needs further mining by industry-specific intelligent algorithms and applications. The penetration rate of artificial intelligence in China is currently below 10%. Greater value can be realized only by structuring video content and combining it with business needs, which makes scenario-based applications the key to realizing value in the data industry.

"China's video data element market has built a clear three-level structure: the upstream focuses on the first-level market of data collection and governance, the midstream focuses on the second-level market of data processing and analysis, and the downstream focuses on the third-level market of data application evaluation. This specialized division of labor system has not only promoted the refined development of the video data industry chain but also significantly accelerated the pace of video data moving towards industrialization," said Zhou Wenkai.

Overall, the vast data resources and the three-level video data element market are all fertile soil for the data industry.

II. Leveraging the "Multiplier Effect" of Video Elements, Four Hurdles to Overcome

On January 4th of this year, the National Data Bureau and 16 other departments jointly issued the "Data Element ×" Three-Year Action Plan (2024-2026) (hereinafter referred to as the Action Plan), proposing to select 12 industries and fields such as industrial manufacturing, modern agriculture, commercial trade, transportation, and financial services to promote the multiplier effect of data elements and unlock the value of data elements.

Zhou Wenkai told Zhi Dongxi that more than 90% of the data involved in the 12 industries named in the Action Plan is unstructured data centered on images and videos. To leverage the "multiplier effect" of these video elements, at least four hurdles must be overcome.

Specifically, these video data include various types such as user-generated data, professionally produced data, data generated from public resources, and data shared on social media. Compared to structured text data, these video data are more complex in terms of storage, processing, understanding, and computation.

1. Large storage volume. Unlike structured text data, a large amount of image and video data requires substantial storage. This raises higher demands for the effective use of storage space, including the need for strong encoding and decoding technology and effective storage capabilities.

2. Complex data processing. Video data comes in a variety of types with numerous encoding methods, requiring support for the processing of data with different resolutions, frame rates, encoding methods, and formats. This demands that data processors accumulate the ability to handle these different types of data.

3. Difficulty in video understanding. Video content is complex and diverse. To understand content in context, it is best to combine small and large models to parse the data and extract valuable information from it. Beyond the raw data needed to train models, the real transactional value of video data lies in the structured data produced from it, so the quality of video understanding largely determines how much value the video yields.

4. Large computational load. Processing video data is computationally expensive, and large models make it more so. Manufacturers must therefore balance accuracy against efficiency. A video parsing model should have a moderate number of parameters; it cannot have hundreds of billions of parameters like large text models, or the computational load would be too great.

These characteristics also make video data harder to price, trade, and circulate.
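To give a rough sense of the storage hurdle described above, the following Python sketch compares the bytes needed for raw video with the bytes needed for structured records extracted from it. All figures (a 4 Mbps stream, 600 events per hour, 200 bytes per event) and both function names are illustrative assumptions, not numbers from Dahua or IDC.

```python
def video_storage_bytes(bitrate_mbps: float, hours: float) -> float:
    """Bytes needed to store a constant-bitrate stream for the given duration."""
    bits = bitrate_mbps * 1_000_000 * hours * 3600
    return bits / 8


def structured_storage_bytes(events_per_hour: float,
                             bytes_per_event: int,
                             hours: float) -> float:
    """Bytes needed to store only the structured events extracted from video."""
    return events_per_hour * bytes_per_event * hours


# Hypothetical single camera recording for one day (assumed values).
raw = video_storage_bytes(bitrate_mbps=4.0, hours=24)
structured = structured_storage_bytes(events_per_hour=600,
                                      bytes_per_event=200,
                                      hours=24)

print(f"raw video:        {raw / 1e9:.1f} GB")         # 43.2 GB
print(f"structured data:  {structured / 1e6:.2f} MB")  # 2.88 MB
print(f"ratio:            {raw / structured:,.0f}x")   # 15,000x
```

Under these assumed inputs, the structured records are several orders of magnitude smaller than the raw stream, which is one reason structured output is easier to circulate than the video itself.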

Zhou Wenkai said that video data is less convenient to circulate than structured text data: it is huge in scale, and most of it is not machine-readable. If it is to be traded, uploading and downloading massive volumes of video places high demands on network bandwidth and security protection and raises structuring costs, and pricing standards based on cost and value still need to be set. All of these issues remain to be resolved.

In response, Zhou Wenkai believes that developing the value of video data elements can borrow the "separation of three rights" used in the real estate industry: ownership, usage rights, and management rights. Users of video data pay fees to the data owners, while operators of video data work to mine as much value from video as can be done safely, thereby revitalizing the market for video data elements.

From a technical perspective, making video data tradable and usable also depends on how well AI understands the data. Dahua has built up deep expertise in video encoding and decoding, big data platforms, data governance, visual large models, and security compliance. One example is fusing and computing over heterogeneous video and text data, which reduces the difficulty and cost of developing and using video data.

III. The key to realizing the value of data is deploying scenario-based applications.

In the digital age, data is regarded as a new factor of production. Its value lies not only in the data itself but, more importantly, in how effectively it is used. Deploying scenario-based applications is the key to realizing that value. Dahua began building a large-scale IoT and data intelligence platform in 2017, formed a complete platform architecture in 2019, and released the Dahua Think # strategy in 2021. With that strategy it launched "one system, two platforms," that is, the "IoT and data intelligence middle platform system" and "City Platform 2.0, Enterprise Platform 3.0," to help industries explore the value of video data.

In 2023, Dahua upgraded its strategy to Dahua Think #2.0 and its IoT and data intelligence platform to version 2.0. The platform integrates IoT perception, computing-network integration, visual large models, and data intelligence, improves software engineering capabilities, and supports customer applications across industries.

For cities, Dahua has expanded into fields such as efficient urban governance, autonomous operation, security system upgrades, and collaborative ecological governance, covering more than 200 urban scenarios. In traffic management, for example, Zhou Wenkai said that small models could previously recognize only local scenes, such as how long the queue at a given checkpoint would take. With large models, a city's overall traffic situation can be managed as a whole, and traffic allocation decisions become more scientific.

For enterprises, Dahua helps build comprehensive security systems, raise digital-intelligence productivity, and strengthen business decision-making, providing digital management tools that address business pain points. In the energy field, for example, Zhou Wenkai said that some customers urgently need video to support safe production, cost reduction, and efficiency gains. Demand is also strong among intelligent driving customers, who need to combine video and radar.

Of course, these application upgrades depend on adding large model capabilities. Many large models are not suited to direct use and deliver value only when combined with scenarios. "Whether generative or analytical, these large models are foundation models, which essentially improve understanding and cognitive ability compared with small models. Once cognition is in place, people build capabilities for various scenarios on top of it, which is the essence of the value of large models."

Zhou Wenkai said that Dahua's large models are vision-centered and integrate multimodal capabilities. They have five major features: a leap in accuracy and generalization, definition of new functions through graphic prompts, a breakthrough in visual cognitive ability, autonomous analysis across all scenarios, and coordination among large models, small models, and computing power. They have been applied across industries.

Beyond large models, this also depends on a series of optimizations Dahua has made around video encoding, decoding, governance, analysis, networking, and storage.

For instance, in data analysis, Dahua has independently built a data center of more than a thousand servers, designed specifically to simulate a range of big data demand scenarios. This lets Dahua run full-process experiments in data storage, data governance, and data analysis, helping ensure its technological leadership and stability.

In terms of data security, Dahua has accumulated a multitude of security-related technologies, providing numerous security control measures for data transfer and circulation on its one-stop data intelligence engine. These measures include project isolation, data isolation, permission isolation, access isolation, and security auditing.
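As a minimal illustration of how two of the controls listed above (project isolation and permission isolation) can combine with security auditing, consider the Python sketch below. It is not Dahua's implementation; the `User`, `Dataset`, and `AccessController` names and the single `"read"` permission are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    project: str            # the single project this user belongs to
    permissions: frozenset  # e.g. frozenset({"read"})


@dataclass(frozen=True)
class Dataset:
    name: str
    project: str            # the project that owns this dataset


@dataclass
class AccessController:
    # Every decision, allowed or denied, is recorded here for later auditing.
    audit_log: list = field(default_factory=list)

    def can_read(self, user: User, dataset: Dataset) -> bool:
        same_project = user.project == dataset.project  # project isolation
        has_permission = "read" in user.permissions     # permission isolation
        allowed = same_project and has_permission
        self.audit_log.append((user.name, dataset.name, allowed))
        return allowed


controller = AccessController()
clips = Dataset("intersection_clips", "traffic")
alice = User("alice", "traffic", frozenset({"read"}))
bob = User("bob", "energy", frozenset({"read"}))

print(controller.can_read(alice, clips))  # True: same project, has "read"
print(controller.can_read(bob, clips))    # False: different project
print(controller.audit_log)
```

Logging denied requests as well as granted ones is a deliberate choice here, since an audit trail is most useful when it also shows attempted access across project boundaries.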

Conclusion: The fertile land of video data, where scenario-based data elements highlight their value.

Data elements are now at a critical stage of accelerating into thousands of industries, and exploring the value of scenario-based data in depth has become especially important.

Dahua Shares' practice shows that domestic players are overcoming the challenges video data faces in storage, processing, understanding, computing, pricing, and circulation, and are deeply integrating video data processing capabilities with a variety of scenario applications. This draws not only on the company's decades of industry experience and data processing capabilities, but also on industry knowledge and professional insight (know-how) distilled across multiple business cycles.

In an era where data is king, alongside stacking computing power and refining models, making massive video data elements deliver a multiplier effect has become key to achieving a leading position in the world's digital industry.