
编者按
“中国算力网”的概念是在什么背景下提出的?为什么要建设“中国算力网”?
近日,在第二十届北京论坛上,中国工程院院士、中国数字经济50人论坛委员、鹏城实验室主任、北京大学信息与工程科学部主任高文作了题为《中国算力网的需求与挑战》的主旨报告。高文院士提出,希望像建设电网一样建立“算力网”,像运营互联网一样运营“算力网”,让用户像用电一样方便地使用算力。
中国工程院院士、中国数字经济50人论坛委员、鹏城实验室主任、北京大学信息与工程科学部主任高文
高文,中国工程院院士、中国数字经济50人论坛委员、鹏城实验室主任、北京大学信息与工程科学部主任、北京大学博雅讲席教授,新一代人工智能产业技术创新战略联盟理事长,全国专业标准化技术委员会副主任,数字音视频编解码技术标准(AVS)工作组组长,IEEE Fellow、ACM Fellow。主要从事人工智能应用和多媒体技术、计算机视觉、模式识别与图像处理、虚拟现实方面的研究。曾一次获得国家技术发明二等奖、五次获得国家科技进步二等奖、一次获得国家自然科学二等奖,获得“2005中国十大教育英才”称号和中国计算机学会王选奖。
今天我想跟大家介绍的,是名叫“中国算力网”的项目。“中国算力网”有三个重要部分,一是算力节点,二是网络连接,三是资源调度。
算力应该是我们整个时代发展中非常重要的一个支撑要素,无论是人工智能还是互联网的发展,都离不开算力。从整个经济的发展来看,算力和GDP正相关。研究表明,GDP越大,算力指数越高,反之亦然。现在全球GDP最高的是美国,其算力指数也是最高,中国GDP是美国的70%左右,算力指数刚好比美国低了30%,经济排名第三的日本算力指数也是世界第三。
这几年,中国经济发展速度非常快,算力发展的速度也在攀升,我们可以清晰地看到中国算力指数在所有国家中增长最快,平均年增长在13%左右。那么,既然算力这么重要,为了未来经济科学和绿色地发展,我们需要考虑今后的算力怎么布局?安放在哪里?怎么使用?未来算力能否像今天的电力一样,不管放在哪里,想用的时候插上就能用?
我们的设想是,希望在中国建立一张网,“这张网”可以把中国算力连接起来,任何人、任何企业、任何大学想使用算力时,可以将“接口”插到一个插座上面,这个插座就能把算力送到你的桌面。在算力的布局方面,我们希望算力的计算放在西部,这需要解决很多问题,例如算力如何分装,如何满足设施要求,如何让带宽不受限制,这些都是我们必须回答的问题。
为此我们提出了“中国算力网”的概念——希望像建设电网一样建立“算力网”,像运营互联网一样运营“算力网”,让用户像用电一样方便地使用算力,这是我们设定这个项目的发展愿景。而做到这一点需要面临很多挑战,包括算力的供给,越是在大城市,大学和企业越需要更多的算力。最近,工信部发布了《算力基础设施高质量发展行动计划》,提出了2025年发展量化指标,到2025年中国算力规模超过300EFLOPS,一个E就是10的18次方。这里面的算力分三种,分为超算算力(超级计算机)、智能算力、云算力,三种算力加在一起规模超过300EFLOPS,其中和AI有关的算力占到1/3,约为105EFLOPS。
第二个需要考虑的问题,就是如何把算力连接起来,让它延迟最短。很多云计算的算力中心、云中心和用户之间的距离不会超过200公里,否则会导致实时响应不够。如果要落实“东数西算”,把算力中心放在成都、重庆、青海,相互间相隔两三千公里以上的距离,我们需要超低延迟和超宽带链接来保证传输的效率。当前,算力正在被不同的运营商、不同的互联网厂商管理着,难以做到统一调度。因此亟需构建一个全新的调度网络,像通过电网调度电力一样,将算力调度到需要的地方。
鹏城实验室正在牵头做“中国算力网”,我们主要落实三件比较大的工作:第一,建立超级算力节点,“超级算力”的概念大概是中国所需要的算力的1/6。
第二,建立比现在市场上连接速度更快的网络连接,达到差不多100T到P级的连接,即10的15次方,目前这是现有技术无法实现的目标,我们正在研发该项技术,通过使用不同光纤,冲刺比现有任何速度快100倍的速度,甚至更快。
第三,做好算力调度,建立云原生网络的调度系统。我们在深圳建立智能超算平台,“鹏城云脑Ⅱ”智能算力平台大概有1000P的算力,目前正在研发的下一代鹏城云脑,预计能达到16,000P的算力,这个数字正好是2025年中国需要的智能算力的1/6。“鹏城云脑Ⅱ”AI性能是全世界超级计算机里面最好的,在全球IO500总榜单已经连续6次排名第一名,在AIPerf500连续3年排第一。这台机器做出来后,我们支持了很多国内企业做大模型的计算,包括华为、百度等,他们很多大模型都是在我们的机器上进行训练。除了提供给国内的合作伙伴外,我们实验室还训练了一批AI模型,这些模型大部分与北京大学、清华大学等高校合作,包括了自然语言模型、计算机视觉模型、生物医学模型等。
最近,我们刚刚完成了一项工作,训练了“鹏城·脑海”大模型,这个模型有200B的规模,2000亿参数。鹏城实验室通过开源的方式在做“鹏城·脑海”,最后都将变成Open Source模型,我们把上面可能需要的一些工具做完后,将开源开放,供大家使用。
目前“鹏城云脑Ⅱ”上运行的200B的AI大模型,训练一次需要几个月的时间。为了让效率更高,我们正在研发下一个版本、拥有16,000p算力的机器,叫做下一代鹏城云脑,做出来之后将比现在的机器算力提高20倍。原本训练AI大模型需要200天,现在10天就能训练结束,这台机器将会是算力节点。
还有几个问题我们也在思考。关于光网络,希望把所有的算力节点和枢纽用的光网络连接起来。设计光网络,要特别考虑在远距离时,实现不低于100T的带宽。设计光网络有很多科学问题,既涉及到光,也涉及到通信,包括传输、交换、管控、光纤等等,我们设置了多芯光纤,一束光纤可以有若干根“芯”,至少4根,也可能19根,使用的技术是SDM技术,它能使通信的速度呈19倍增长。由于光纤的成本增加很少,可以使用新的技术实现长距离、大带宽的通信连接,目前我们已经完成了200T、2000公里的光通信实验。而网络运营商现在提供的光纤网络,单根光纤100G或者400G,一根纤上面一个波,一根线上可以用很多波,现在4根纤对应同一类设备成本大大降低,将使得整个传输系统更高效。
关于调度,不同的算力资源如何组合起来,让用户需要的时候直接拿到算力,这个需要实现跨地域异构算力。各类算力本身用的芯片系统不一样,如何跨地域使用,存在比较难的封装问题,不同类型的算力封装方式不一样,就如不同的发电厂和源不一样,需要我们尽量去规范。算力原来是什么不要紧,如果要入网重新封装后加入成网并最终让大家看到一样的东西,第一步就要做好异构算力跨域调度的工作,这方面还是有很大的挑战;第二步是统一提交同步做;第三步是跨中心异构做;第四步是把不同的算力源整合进来。
“中国算力网”所有的理念和以往的云计算不一样,我们引入了云原生网络,所有底层都采用同样逻辑、一套体系,第一步在上面建立逻辑调度,对现有的网络做重新梳理更换、提升。现在有很多案例,通过云原生网络可以把所有的数据变成源数据,通过源数据进行调度,通过调度可以就近选择算力源。第二步做到“数”随“算”走,第三步“算”随“数”走,算力网要考虑数据的存在。
2019年我们开始做“中国算力网”的0.1版本,在国家发改委、科技部的支持下,用了不到3年时间,做了“中国算力网”第一期,把全国不同区域几个算力中心整合到一起,通过调度打通,实现不同算力的分配和使用。“中国算力网”的1.0版,可以实现分布式协调训练。
下一步,鹏城实验室在推进“中国算力网”建设的过程中,将通盘考虑所有方面,为中国绿色发展、高效经济发展、智能发展、数字发展提供技术支撑和支持。我们希望“中国算力网”这件事不仅仅在中国能做,还期待未来开展更广泛的国际合作。

翻译:
Academician Gao Wen: The Needs and Challenges of China’s Computing Power Network
Editor’s note
In what context was the concept of the “China Computing Power Network” proposed? Why build it?
Recently, at the 20th Beijing Forum, Gao Wen, academician of the Chinese Academy of Engineering, member of the China Digital Economy 50 Forum, director of Pengcheng Laboratory, and director of the Department of Information and Engineering Science of Peking University, delivered a keynote report entitled “The Needs and Challenges of China’s Computing Power Network”. He proposed building a “computing power network” the way a power grid is built and operating it the way the Internet is operated, so that users can draw on computing power as easily as they use electricity.
Gao Wen, Academician of Chinese Academy of Engineering, member of China Digital Economy 50 Forum, Director of Pengcheng Laboratory, Director of Information and Engineering Science Department of Peking University
Gao Wen is an academician of the Chinese Academy of Engineering, member of the China Digital Economy 50 Forum, Director of Pengcheng Laboratory, Director of the Department of Information and Engineering Science of Peking University, Boya Chair Professor of Peking University, Chairman of the New Generation Artificial Intelligence Industry Technology Innovation Strategic Alliance, Deputy Director of the National Professional Standardization Technical Committee, head of the Digital Audio and Video Coding Standard (AVS) Working Group, IEEE Fellow, and ACM Fellow. His research covers artificial intelligence applications and multimedia technology, computer vision, pattern recognition and image processing, and virtual reality. He has won the second prize of the National Technological Invention Award once, the second prize of the National Science and Technology Progress Award five times, and the second prize of the National Natural Science Award once, as well as the title of “2005 Top Ten Educational Talents of China” and the Wang Xuan Award of the China Computer Federation.
Today I would like to introduce a project called the “China Computing Power Network”. It has three important parts: computing power nodes, network connections, and resource scheduling.
Computing power is a key supporting element of our era: neither artificial intelligence nor the Internet can develop without it. At the level of the whole economy, computing power is positively correlated with GDP. Research shows that the larger a country’s GDP, the higher its computing power index, and vice versa. The United States has the world’s highest GDP and also the highest computing power index; China’s GDP is about 70% of that of the United States, and its computing power index is about 30% lower; Japan, the third-largest economy, also ranks third in computing power index.
In recent years China’s economy has grown very quickly, and its computing power has grown with it. China’s computing power index is growing faster than that of any other country, at an average of about 13% per year. Since computing power matters this much, and since we want the economy to develop scientifically and sustainably, we need to consider how computing power should be laid out, where it should be located, and how it should be used. Could computing power one day be like electricity today, so that wherever it is produced, you simply plug in when you need it?
Our vision is to build a network that links China’s computing power together, so that any person, company, or university that needs computing power can plug an “interface” into a socket and have computing power delivered to the desktop. As for layout, we hope to place the computation in the west of the country. That raises many problems, such as how to package computing power, how to meet facility requirements, and how to keep bandwidth from becoming a constraint. These are all questions we must answer.
To this end, we put forward the concept of the “China Computing Power Network”: build a “computing power network” the way a power grid is built, operate it the way the Internet is operated, and let users draw on computing power as easily as they use electricity. This is the vision we set for the project. Realizing it means facing many challenges. The first is the supply of computing power: the larger the city, the more computing power its universities and companies need. Recently, the Ministry of Industry and Information Technology issued the Action Plan for the High-Quality Development of Computing Power Infrastructure, which sets quantitative targets for 2025: by then, China’s total computing power should exceed 300 EFLOPS, where one E is 10 to the 18th power. The total covers three kinds of computing power: supercomputing (supercomputers), intelligent computing, and cloud computing. Of the combined 300 EFLOPS, AI-related computing accounts for about one third, roughly 105 EFLOPS.
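The unit bookkeeping behind these targets can be checked with a few lines of Python. This is only an illustrative sketch: the figures are the ones quoted in the talk, and the variable names are made up for this example.

```python
# Unit bookkeeping for the 2025 targets quoted in the talk.
# Variable names are illustrative, not from any official tool.
PFLOPS = 10**15  # 1 P = 10^15 floating-point operations per second
EFLOPS = 10**18  # 1 E = 10^18 floating-point operations per second

total_target = 300 * EFLOPS  # national target for 2025
ai_target = 105 * EFLOPS     # AI-related portion quoted in the talk

# Share of the total that is AI-related (the talk rounds this to "1/3").
ai_share = ai_target / total_target

# The same AI target expressed in P, the unit used for individual machines.
ai_target_in_p = ai_target // PFLOPS

print(f"AI share of total: {ai_share:.0%}")
print(f"AI target: {ai_target_in_p:,} P")
```

Expressing the AI target in P makes it directly comparable with the per-machine figures given later in the talk.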
The second issue is how to interconnect computing power with the lowest possible latency. Many cloud computing centers are located no more than 200 kilometers from their users, because otherwise real-time responsiveness suffers. To implement the “East Data, West Computing” strategy, with computing centers in places such as Chengdu, Chongqing, and Qinghai that are two to three thousand kilometers or more apart, we need ultra-low-latency, ultra-high-bandwidth links to keep transmission efficient. At present, computing power is managed by different carriers and different Internet companies, which makes unified scheduling difficult. We therefore urgently need a new scheduling network that dispatches computing power to where it is needed, just as the grid dispatches electricity.
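To see why distance matters, here is a rough estimate of the propagation delay over optical fiber. It is a sketch under stated assumptions: the group index of 1.468 is a typical value for silica fiber and is not a figure from the talk, and real links add switching, queuing, and routing detours on top of pure propagation.

```python
# Rough fiber propagation delay; ignores switching and queuing delay.
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_INDEX = 1.468          # assumption: typical group index of silica fiber


def one_way_delay_ms(distance_km: float, index: float = FIBER_INDEX) -> float:
    """Propagation time in milliseconds over `distance_km` of fiber."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return distance_km * index / C_VACUUM_KM_S * 1000.0


def round_trip_delay_ms(distance_km: float, index: float = FIBER_INDEX) -> float:
    """Request plus response over the same path."""
    return 2.0 * one_way_delay_ms(distance_km, index)


for km in (200, 2000, 3000):
    print(f"{km:>5} km: one-way {one_way_delay_ms(km):6.2f} ms, "
          f"round trip {round_trip_delay_ms(km):6.2f} ms")
```

Under these assumptions a 200 km link costs about 1 ms each way, while 3000 km costs roughly 15 ms each way before any equipment delay is counted, which is why long-haul links need careful design.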
Pengcheng Laboratory is leading the work on the “China Computing Power Network”, and we are focusing on three major tasks. The first is to build super computing power nodes; a “super” node here means roughly 1/6 of the computing power China needs.
The second is to build network links faster than anything currently on the market, from about 100T up to the P level, that is, 10 to the 15th power. Existing technology cannot reach this target. We are developing the technology now, using different kinds of optical fiber to push for speeds 100 times faster than any existing link, or more.
The third is to get computing power scheduling right by building a scheduling system on a cloud-native network. In Shenzhen we built an intelligent supercomputing platform, “Pengcheng Cloud Brain II”, with about 1000P of computing power. The next-generation Pengcheng Cloud Brain, now under development, is expected to reach 16,000P, which is about 1/6 of the intelligent computing power China will need in 2025. The AI performance of “Pengcheng Cloud Brain II” is the best among the world’s supercomputers: it has ranked first on the global IO500 overall list six consecutive times and first on AIPerf500 for three consecutive years. Since the machine went into service, we have supported many Chinese companies, including Huawei and Baidu, in large-model computation, and many of their large models were trained on our machine. Beyond serving domestic partners, our laboratory has also trained a number of AI models of its own, mostly in collaboration with universities such as Peking University and Tsinghua University, including natural language, computer vision, and biomedical models.
Recently we finished training the “Pengcheng Mind” large model, which has a scale of 200B, that is, 200 billion parameters. Pengcheng Laboratory is developing “Pengcheng Mind” in an open-source manner, and it will ultimately be released as an open-source model: once the supporting tools are finished, we will open it up for everyone to use.
The 200B large AI model currently running on “Pengcheng Cloud Brain II” takes several months to train. To improve efficiency, we are developing the next version, a machine with 16,000P of computing power called the next-generation Pengcheng Cloud Brain, which will deliver 20 times the computing power of the current machine. Training a large AI model that used to take 200 days will then finish in 10 days. This machine will serve as a computing power node.
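As a back-of-the-envelope check, training time can be estimated by assuming it scales inversely with available computing power. That is an idealization (real training rarely scales perfectly linearly), and the function name below is made up for this sketch.

```python
def ideal_training_days(baseline_days: float, baseline_p: float, new_p: float) -> float:
    """Training time on a new machine, assuming perfectly linear scaling.

    Assumption: time is inversely proportional to peak computing power.
    Real workloads are also limited by memory, I/O, and communication.
    """
    if baseline_p <= 0 or new_p <= 0:
        raise ValueError("computing power must be positive")
    return baseline_days * baseline_p / new_p


# Figures quoted in the talk: 200 days on ~1000P, next machine at 16,000P.
by_peak_ratio = ideal_training_days(200, 1000, 16_000)

# The talk instead quotes a 20x improvement, i.e. 200 days down to 10.
by_quoted_speedup = 200 / 20

print(f"From the peak ratio (16x): {by_peak_ratio} days")
print(f"From the quoted 20x speedup: {by_quoted_speedup} days")
```

The peak ratio alone gives 16 times, so the quoted 20 times and 10 days do not follow from the two peak figures by themselves; the talk does not say where the additional gain comes from.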
We are also thinking about several other problems. For the optical network, we want to connect all computing power nodes and hubs by optical links. The design must in particular achieve a bandwidth of no less than 100T over long distances. Designing such a network raises many scientific problems in both optics and communications, including transmission, switching, management and control, and the fiber itself. We adopted multi-core fiber: a single fiber can contain several “cores”, at least 4 and possibly 19. The technique used is SDM (space-division multiplexing), which can raise communication speed by up to 19 times. Because the cost of the fiber increases very little, this new technology makes long-distance, high-bandwidth links achievable, and we have already completed a 200T, 2000-kilometer optical communication experiment. The fiber networks that carriers offer today run at 100G or 400G per wavelength, and one fiber can carry many wavelengths; with four cores sharing the same class of equipment, cost falls substantially and the whole transmission system becomes more efficient.
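The capacity gain from multi-core fiber can be sketched as a simple product of cores, wavelengths per core, and rate per wavelength. The wavelength count of 80 below is an assumption chosen for illustration, not a figure from the talk, and the model ignores inter-core crosstalk and other physical limits.

```python
def fiber_capacity_gbps(cores: int, wavelengths_per_core: int, gbps_per_wavelength: int) -> int:
    """Aggregate capacity of one fiber in Gb/s under an idealized SDM model.

    Assumption: every core carries the same number of wavelengths at the
    same rate, with no crosstalk penalty between cores.
    """
    if min(cores, wavelengths_per_core, gbps_per_wavelength) <= 0:
        raise ValueError("all arguments must be positive")
    return cores * wavelengths_per_core * gbps_per_wavelength


WAVELENGTHS = 80         # assumption: illustrative channel count per core
RATE_GBPS = 400          # per-wavelength rate mentioned in the talk

single_core = fiber_capacity_gbps(1, WAVELENGTHS, RATE_GBPS)
four_core = fiber_capacity_gbps(4, WAVELENGTHS, RATE_GBPS)
nineteen_core = fiber_capacity_gbps(19, WAVELENGTHS, RATE_GBPS)

for label, gbps in (("1 core", single_core), ("4 cores", four_core), ("19 cores", nineteen_core)):
    print(f"{label:>8}: {gbps / 1000:.0f} Tb/s")
```

In this idealized model capacity grows linearly with the number of cores, which is the sense in which a 19-core fiber can carry up to 19 times the traffic of a single-core one.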
As for scheduling, the question is how to combine different computing resources so that users get computing power directly when they need it. This requires scheduling heterogeneous computing power across regions. Different kinds of computing power use different chips and systems, so using them across regions poses a difficult packaging problem: each type is packaged differently, much as different power plants use different energy sources, and we need to standardize as far as possible. It does not matter what the computing power originally was; once it is repackaged and joined to the network, everyone should see the same thing. The first step is cross-domain scheduling of heterogeneous computing power, which remains very challenging; the second is unified job submission and synchronization; the third is heterogeneous operation across centers; the fourth is integrating different sources of computing power.
The ideas behind the “China Computing Power Network” differ from earlier cloud computing. We introduced a cloud-native network in which all underlying layers follow the same logic and a single architecture. The first step is to build logical scheduling on top of it and to reorganize, replace, and upgrade the existing network. There are already many cases in which the cloud-native network turns all data into source data, scheduling is driven by that source data, and the scheduler can pick the nearest source of computing power. The second step is for data to follow computation, and the third is for computation to follow data; a computing power network has to take the location of data into account.
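The idea of picking the nearest suitable source of computing power can be illustrated with a toy selector. This is a minimal sketch, not the project’s actual scheduler: the `ComputeSource` record, the `pick_nearest` function, and the site names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComputeSource:
    """Hypothetical description of one computing power source."""
    name: str
    kind: str            # e.g. "ai", "hpc", or "cloud"
    free_pflops: float   # capacity currently available
    distance_km: float   # distance from where the data or user sits


def pick_nearest(sources: Sequence[ComputeSource], kind: str,
                 needed_pflops: float) -> Optional[ComputeSource]:
    """Return the closest source of the right kind with enough free capacity."""
    candidates = [
        s for s in sources
        if s.kind == kind and s.free_pflops >= needed_pflops
    ]
    # min() with default=None covers the case where nothing qualifies.
    return min(candidates, key=lambda s: s.distance_km, default=None)


# Illustrative catalog; the sites and numbers are made up.
catalog = [
    ComputeSource("site-a", "ai", free_pflops=50.0, distance_km=120.0),
    ComputeSource("site-b", "ai", free_pflops=400.0, distance_km=1800.0),
    ComputeSource("site-c", "hpc", free_pflops=900.0, distance_km=300.0),
    ComputeSource("site-d", "ai", free_pflops=800.0, distance_km=2600.0),
]

choice = pick_nearest(catalog, kind="ai", needed_pflops=200.0)
print(choice.name if choice else "no suitable source")
```

Here the closest site is skipped because it lacks capacity, so the job lands on the nearest site that can actually run it. A real scheduler would also weigh data location, link bandwidth, and cost.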
In 2019 we began work on version 0.1 of the “China Computing Power Network”. With support from the National Development and Reform Commission and the Ministry of Science and Technology, we completed the first phase in under three years, integrating several computing centers in different regions of the country and linking them through scheduling so that different kinds of computing power can be allocated and used. Version 1.0 of the “China Computing Power Network” supports distributed collaborative training.
Going forward, as Pengcheng Laboratory advances the construction of the “China Computing Power Network”, it will take all of these aspects into account and provide technical support for China’s green, efficient, intelligent, and digital development. We hope the “China Computing Power Network” will not be limited to China, and we look forward to broader international cooperation in the future.
由CXO UNION-CXO联盟(cxounion.cn)转载而成,来源于创壹智库;编辑/翻译:CXO UNIONCXO联盟小U。
如需加入CXO UNION(CXO联盟)高管社群,请联系社群小伙伴哦~

免责声明: 本网站(http://www.cxounion.cn/)内容主要来自原创、合作媒体供稿和第三方投稿,凡在本网站出现的信息,均仅供参考。本网站将尽力确保所提供信息的准确性及可靠性,但不保证有关资料的准确性及可靠性,读者在使用前请进一步核实,并对任何自主决定的行为负责。本网站对有关资料所引致的错误、不确或遗漏,概不负任何法律责任。
本网站刊载的所有内容(包括但不仅限文字、图片、LOGO、音频、视频、软件、程序等) 版权归原作者所有。任何单位或个人认为本网站中的内容可能涉嫌侵犯其知识产权或存在不实内容时,请及时通知本站,予以删除。