Section A Information Storage 信息存储
1. The Importance of Information 信息的重要性
词汇
reside vi属于,驻留
tablet n平板电脑
laptop n笔记本电脑
repository n仓库
claim n索赔
regulatory a法规的,监管的
contractual a合同的
obligation n义务,责任
Information is increasingly important in our daily lives. We have become information dependent in the 21st century, living in an on-command, on-demand world, which means we need information when and where it is required.
信息在我们日常生活中变得越来越重要。在21世纪,我们已经变得依赖信息,生活在一个随需应变的世界,这意味着我们需要在任何时间和任何地点获取信息。
We access the Internet every day to perform searches, participate in social networking, send and receive e-mails, share pictures and videos, and use scores of other applications.
我们每天访问互联网进行搜索、参与社交网络、发送和接收电子邮件、分享图片和视频,以及使用许多其他应用程序。
Because individuals are equipped with a growing number of content-generating devices, more information is now created by individuals than by organizations. Information created by individuals gains value when shared with others.
随着越来越多的内容生成设备的出现,个人创造的信息比组织创造的更多。个人创造的信息在与他人分享时增值。
2. What is data 什么是数据
词汇
binary n二进制
digital a数字的
DBMS(Database Management System) 数据库管理系统
query v查询
retrieve v检索
Data is a collection of raw facts from which conclusions may be drawn.
数据是从中可以得出结论的原始事实的集合。
Before the advent of computers, the methods adopted for data creation and sharing were limited to a few forms, such as paper and film.
在计算机出现之前,数据的创建和共享方式仅限于纸张和胶片等少数几种形式。
Today, the same data can be converted into more convenient forms, such as an e-mail message, an e-book, a digital image, or a digital movie.
今天,相同的数据可以转换为更方便的形式,如电子邮件、电子书、数字图像或数字电影。
This data can be generated using a computer and stored as strings of binary numbers (0s and 1s).
这些数据可以由计算机生成,并以二进制数字(0和1)的字符串形式存储。
Data in this form is called digital data and is accessible by the user only after a computer processes it.
以这种形式存在的数据被称为数字数据,只有在计算机处理后才能被用户访问。
Data can be classified as structured or unstructured based on how it is stored and managed.
数据可以根据其存储和管理方式被分类为结构化数据或非结构化数据。
Structured data is organized in rows and columns in a rigidly defined format so that applications can retrieve and process it efficiently.
结构化数据以严格定义的格式按行和列组织,以便应用程序可以高效地检索和处理。
Structured data is typically stored using a database management system (DBMS).
结构化数据通常使用数据库管理系统(DBMS)存储。
Data is unstructured if its elements cannot be stored in rows and columns, which makes it difficult for applications to query and retrieve it.
如果数据元素不能存储在行和列中,那么数据就是非结构化的,这使得应用程序难以查询和检索。
For example, customer contacts may be stored in various forms, such as sticky notes, e-mail messages, business cards, or even digital files in formats such as .doc, .txt, and .pdf.
例如,客户联系信息可能以各种形式存储,如便笺、电子邮件、名片,甚至是.doc、.txt和.pdf等格式的数字文件。
Due to its unstructured nature, this data is difficult to retrieve using a traditional customer relationship management application.
由于其非结构化的特性,使用传统的客户关系管理应用程序难以检索这些数据。
A vast majority of new data being created today is unstructured.
今天创建的绝大多数新数据是非结构化的。
The industry is challenged to develop new architectures, technologies, techniques, and skills to store, manage, analyze, and derive value from unstructured data from numerous sources.
业界面临的挑战是开发新的架构、技术、技巧和技能,以存储、管理、分析来自众多来源的非结构化数据并从中提取价值。
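To make the structured/unstructured distinction concrete, here is a minimal Python sketch. The `customers` table, its columns, and the sample notes are all hypothetical. SQLite (Python's standard `sqlite3` module) stands in for a DBMS: once data sits in rows and columns, a query retrieves exactly what is asked for. The same facts scattered through free-text notes can only be extracted with a heuristic such as a regular expression, which may miss or misread entries.

```python
import re
import sqlite3

# Structured data: a rigid schema of rows and columns managed by a DBMS (SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Alice", "alice@example.com", "Beijing"), ("Bob", "bob@example.com", "Shanghai")],
)
# A declarative query returns exactly the columns and rows that were asked for.
rows = conn.execute("SELECT name, email FROM customers WHERE city = ?", ("Beijing",)).fetchall()
print(rows)

# Unstructured data: similar facts buried in free text such as notes or e-mails.
notes = [
    "Met Alice at the expo in Beijing, her e-mail is alice@example.com",
    "Bob (Shanghai) said to reach him at bob@example.com next week",
]
# With no schema to query, the application must fall back on a pattern-matching heuristic.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
beijing_emails = [
    email for note in notes if "Beijing" in note for email in EMAIL_PATTERN.findall(note)
]
print(beijing_emails)
```

The regular expression is only an illustration. Real contact notes would need far more robust parsing, which is why unstructured data is hard for traditional applications to handle.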
3. Evolution of Storage Architecture 存储架构的演变
词汇
storage n存储
term vt把……称为
mainframe n主机,大型机
tape reel 磁带卷
disk pack 磁盘组
affordability n可购性,成本合理性
deployment n部署
maintenance n维护,维修
consolidate v整合
leverage n杠杆;v利用
Storage devices 存储设备
- a media card in a cell phone or digital camera
  手机或数码相机中的存储卡
- DVDs, CD-ROMs, and disk drives in personal computers
  个人电脑中的DVD、CD-ROM和磁盘驱动器
  - DVD abbr. (digital video disc, or digital versatile disc) 数字影碟,数字光碟
  - CD-ROM abbr. (compact disc read-only memory) 【计】(信息容量极大的)光盘只读存储器
- internal hard disks, external disk arrays, and tapes
  内部硬盘、外部磁盘阵列和磁带
Storage architecture 存储架构
- server-centric storage architecture 以服务器为中心的存储架构
In earlier implementations of open systems, the storage was typically internal to the server. These storage devices could not be shared with any other servers.
在早期的开放系统中,存储通常位于服务器内部。这些存储设备不能与其他服务器共享。
In this architecture, each server has a limited number of storage devices, and any administrative tasks, such as maintenance of the server or increasing storage capacity, might result in unavailability of information.
在这种架构中,每个服务器只有有限数量的存储设备,任何管理任务,如服务器维护或增加存储容量,都可能导致信息不可用。
- information-centric architecture 以信息为中心的架构
In this architecture, storage devices are managed centrally and independently of servers. These centrally managed storage devices are shared with multiple servers.
在这种架构中,存储设备被集中管理,并且独立于服务器。这些集中管理的存储设备与多个服务器共享。
When a new server is deployed in the environment, storage is assigned to that server from the same shared storage devices.
当环境中部署了一台新服务器时,会从相同的共享存储设备中为该服务器分配存储。
The capacity of shared storage can be increased dynamically by adding more storage devices without impacting information availability.
可以通过添加更多的存储设备来动态增加共享存储的容量,而不影响信息的可用性。
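The information-centric model can be sketched as a single shared pool that hands out capacity to servers. `SharedStoragePool`, its methods, the server names, and the GB figures are hypothetical and chosen only for illustration. This is not any vendor's API. The sketch shows two behaviors: adding a device raises total capacity without touching existing allocations, and a newly deployed server draws its storage from the same pool.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStoragePool:
    """Hypothetical model of centrally managed storage shared by many servers."""

    device_capacities_gb: list[int] = field(default_factory=list)
    allocations_gb: dict[str, int] = field(default_factory=dict)

    @property
    def total_gb(self) -> int:
        return sum(self.device_capacities_gb)

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.allocations_gb.values())

    def add_device(self, capacity_gb: int) -> None:
        # Capacity grows online: existing allocations are left untouched.
        self.device_capacities_gb.append(capacity_gb)

    def allocate(self, server: str, size_gb: int) -> None:
        # Every server, old or newly deployed, is served from the same shared pool.
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free, {size_gb} GB requested")
        self.allocations_gb[server] = self.allocations_gb.get(server, 0) + size_gb


pool = SharedStoragePool()
pool.add_device(1000)
pool.allocate("web-01", 400)
pool.allocate("db-01", 500)
pool.add_device(1000)         # expand capacity without disturbing web-01 or db-01
pool.allocate("app-01", 600)  # a newly deployed server gets storage from the same pool
print(pool.total_gb, pool.free_gb, pool.allocations_gb)
```

In the server-centric model, by contrast, each server would own its own fixed list of devices, and growing capacity would mean working on that particular server.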
4. Storage networking technologies 存储网络技术
词汇
SAN(Storage Area Network) 存储区域网络
fibre n光纤
Gb 千兆位,吉比特(gigabit的缩写)
scalable a可扩展的
robust a强健,鲁棒
NAS(Network Attached Storage)网络连接存储
seamlessly adv无缝地
Object-based Storage 基于对象的存储
flat a扁平的,平坦的
- SAN(Storage Area Network) 存储区域网络
- NAS(Network Attached Storage)网络连接存储
- Object-based Storage 基于对象的存储
More than 90% of the data being generated is unstructured. Traditional solutions cannot handle this growth efficiently.
正在生成的数据超过90%是非结构化的。传统的解决方案在处理这种增长方面效率低下。
These challenges demanded a smarter approach to managing unstructured data based on its content.
这些挑战要求基于内容以更智能的方式管理非结构化数据。
Object-based storage stores file data as objects in a flat address space, based on their content and attributes rather than their names and locations.
基于对象的存储以对象的形式将文件数据存储在扁平地址空间中,依据的是其内容和属性,而不是名称和位置。
Figure 4A-4 displays the key components of an object-based storage device.
图4A-4显示了基于对象的存储设备的关键组件。
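Here is a minimal sketch of the object-based idea. It assumes that each object's ID is a SHA-256 hash of its data, which is how content-addressed systems typically work, though real object stores differ. `FlatObjectStore`, `StoredObject`, and their methods are hypothetical. Objects live in one flat dictionary rather than a directory tree, and they are found by ID or by metadata attributes instead of by name and path.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str  # unique ID derived from the content (SHA-256 assumed here)
    data: bytes  # the file data itself
    metadata: dict = field(default_factory=dict)  # attributes such as type or owner


class FlatObjectStore:
    """Hypothetical object store: a single flat namespace, no directories or paths."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}  # object ID -> object

    def put(self, data: bytes, **metadata: str) -> str:
        object_id = hashlib.sha256(data).hexdigest()
        # Identical content yields the same ID, so it is stored only once.
        self._objects.setdefault(object_id, StoredObject(object_id, data, metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def find(self, **criteria: str) -> list[str]:
        # Look objects up by their attributes rather than by file name and location.
        return [
            obj.object_id
            for obj in self._objects.values()
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"<x-ray image bytes>", type="image", patient="P001")
report_id = store.put(b"diagnosis text", type="report", patient="P001")
copy_id = store.put(b"<x-ray image bytes>", type="image", patient="P001")
print(scan_id == copy_id, len(store.find(patient="P001")))
```

Storing the same bytes twice returns the same ID and keeps one copy. This is one consequence of addressing data by content rather than by name.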
5. Challenges of Storage 存储的挑战
词汇
data science 数据科学
simultaneously adv同时地
provision v供应,配置(资源)
- data science 数据科学
- data center 数据中心
- virtualization and cloud computing 虚拟化和云计算
练习A1(知识点填空)
Although the majority of information is created by individuals, it is stored and managed by a relatively small number of organizations.
尽管大部分信息是由个人创造的,但它是由相对较少的组织来存储和管理的。
Data is a collection of raw facts from which conclusions may be drawn.
数据是从中可以得出结论的原始事实的集合。
This data can be generated using a computer and stored as strings of binary numbers (0s and 1s).
这些数据可以用计算机生成,并以二进制数字(0和1)的字符串形式存储。
Data can be classified as structured or unstructured based on how it is stored and managed.
数据可以根据其存储和管理方式被分类为结构化数据或非结构化数据。
Structured data is typically stored using a database management system (DBMS).
结构化数据通常使用数据库管理系统(DBMS)来存储。
In an information-centric architecture, storage devices are managed centrally and independently of servers.
在以信息为中心的架构中,存储设备被集中管理,并且独立于服务器。
A data center is a facility that contains storage, compute, network, and other IT resources to provide centralized data-processing capabilities.
数据中心是一个包含存储、计算、网络和其他IT资源的设施,提供集中的数据处理能力。
Cloud infrastructure is usually built upon virtualized data centers, which provide resource pooling and rapid provisioning of resources.
云基础设施通常建立在虚拟化数据中心之上,虚拟化数据中心提供资源池和资源的快速供应。
练习A2(词汇翻译)
- data center - 数据中心
- binary - 二进制
- digital - 数字的
- data science - 数据科学
- DBMS (Database Management System) - 数据库管理系统
- mainframe - 主机(大型计算机)
- tape reel - 磁带卷
- disk pack - 磁盘组
- SAN (Storage Area Network) - 存储区域网络
- Fibre Channel (FC) - 光纤通道
- scalable - 可扩展的
- robust - 健壮的、可靠的
- NAS (Network Attached Storage) - 网络连接存储
- object-based storage - 基于对象的存储
Section B Data Mining 数据挖掘
词汇
interrogation n询问
data warehouse 数据仓库
lieu n代替,场所
statistics n统计学
machine learning 机器学习
neural network 神经网络
cluster analysis 聚类分析
association analysis 关联分析
outlier analysis 孤立点分析
deviation n偏离
sequential pattern analysis 序列模式分析
empirical a经验的
bioinformatics n生物信息学
genomics n基因学
biometrics n生物统计学
coincidence n巧合,一致
ethical a道德的,伦理的
练习B1(知识点填空)
A rapidly expanding subject that is closely associated with database technology is data mining, which consists of techniques for discovering patterns in collections of data.
一个与数据库技术紧密相关的迅速扩展的学科是数据挖掘,它包括在数据集合中发现模式的技术。
Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
数据挖掘是在大型关系数据库中发现数十个字段之间的相关性或模式的过程。
Several types of analytical software in data mining are available: statistical, machine learning, and neural networks.
数据挖掘中有几种类型的分析软件:统计学、机器学习和神经网络。
Association analysis involves looking for links between data groups.
关联分析包括寻找数据组之间的联系。
Outlier analysis tries to identify data entries that do not comply with the norm.
孤立点分析尝试识别不符合常规的数据条目。
Data mining encompasses a vast number of ethical issues involving the rights of individuals represented in the data warehouse.
数据挖掘涉及大量与数据仓库中所记录的个人的权利相关的道德问题。
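Association analysis and outlier analysis can both be illustrated in a few lines of Python. The association part uses two common measures, checked by brute force over item pairs. Support is the fraction of transactions that contain both items. Confidence is how often B appears in transactions that contain A. This is not a full Apriori implementation. The outlier check uses a simple z-score rule, which is only one of many possible tests. The transactions, sales figures, and thresholds are made up for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def association_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Brute-force rules A -> B over item pairs that meet both thresholds."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        both = sum(1 for t in baskets if a in t and b in t)
        support = both / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs in basket | lhs in basket)
            confidence = both / sum(1 for t in baskets if lhs in t)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return rules


def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates from the norm
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical market-basket transactions: each set is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]
rules = association_rules(transactions)
print(rules)

# Hypothetical daily sales with one entry that does not comply with the norm.
daily_sales = [120, 118, 125, 122, 119, 121, 540]
outliers = zscore_outliers(daily_sales)
print(outliers)
```

Here "butter -> bread" has confidence 1.0 because every basket with butter also has bread. The value 540 is flagged because it lies far more than two standard deviations from the mean.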
练习B2(词汇翻译)
- data mining - 数据挖掘
- knowledge discovery in data (KDD) - 数据中的知识发现
- data warehouses - 数据仓库
- machine learning - 机器学习
- neural networks - 神经网络
- cluster analysis - 聚类分析:一种将数据集中的对象分组的统计方法,使得同一组内的对象比其他组的对象更相似。
- association analysis - 关联分析:一种用于发现大数据集中变量之间有趣关系的方法,常见的应用包括市场篮子分析。
- outlier analysis - 孤立点分析:一种用于识别数据集中异常值或离群点的分析方法,这些点可能代表了测量误差、数据录入错误或真实的变异。
- sequential pattern analysis - 序列模式分析:一种分析数据集中的序列信息,以发现项目之间有意义的时序关联模式的方法。
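The next plain-Python sketch covers cluster analysis and sequential pattern analysis from the list above. For clustering, it uses k-means, one common clustering algorithm that is assumed here, with the first k points as initial centroids. The sequential-pattern part is restricted to length-2 patterns (item A bought before item B), and each pattern is counted at most once per customer sequence. All data and thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters by repeatedly moving centroids to cluster means."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


def frequent_ordered_pairs(sequences, min_support=0.5):
    """Return (A, B) patterns, A occurring before B, found in >= min_support of sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps positional order, so (a, b) means a came before b;
        # the set ensures each pattern is counted once per sequence.
        counts.update({(a, b) for a, b in combinations(seq, 2) if a != b})
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)

purchases = [
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["phone", "case"],
    ["laptop", "mouse"],
]
patterns = frequent_ordered_pairs(purchases)
print(patterns)
```

Points within each cluster end up closer to one another than to points in the other cluster. The frequent patterns show that half of the customers bought a laptop before a mouse, and half bought a laptop before a bag.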