DA 2 Glossary

DA2 Glossary Part A: Basic Data Analytics Terminology

Note:

  • Except in one case, the source of these definitions is Chapter 1 Glossary of Camm’s “Essentials of Business Analytics”.  If you’d like to learn about these terms in context, you can access this chapter by clicking here.
  • The DA Terminology Quiz to be held in class November 14th (Session 04) will cover Part A and Part B of the DA2 Glossary
Business analytics The scientific process of transforming data into insight for making better decisions. (Source: Essentials)
Data dashboard A collection of tables, charts, and maps to help management monitor selected aspects of the company’s performance. (Source: Essentials)
Data query A request for information with certain characteristics from a database. (Source: Essentials)
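As a rough illustration of a data query, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name `orders`, its columns, and the sample rows are made up for this example and do not come from the textbook.

```python
import sqlite3

# Build a small in-memory database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 45.5)],
)

# The query: request only the rows with a certain characteristic.
east_orders = conn.execute(
    "SELECT id, amount FROM orders WHERE region = ? ORDER BY id", ("East",)
).fetchall()
print(east_orders)
conn.close()
```

The WHERE clause is what expresses the "certain characteristics" in the definition: only orders from the East region are returned.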
Data scientists Analysts trained in both computer science and statistics who know how to effectively process and analyze massive amounts of data. (Source: Essentials)
Data security Protecting stored data from destructive forces or unauthorized users. (Source: Essentials)
Decision analysis A technique used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events. (Source: Essentials)
Descriptive analytics Analytical tools and methods that describe what has happened in the past. (Source: Essentials, adapted)
Hadoop An open-source programming environment that supports big data processing through distributed storage and distributed processing on clusters of computers. (Source: Essentials)
Internet of Things (IoT) Technologies that allow data collected from sensors in all types of machines to be sent over the Internet to repositories where it can be stored and analyzed. (Source: Essentials)
MapReduce Programming model used within Hadoop that performs the two major steps for which it is named: the map step and the reduce step. The map step divides the data into manageable subsets and distributes them to the computers in the cluster for storing and processing. The reduce step collects answers from the nodes and combines them into an answer to the original problem. (Source: Essentials)
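The two steps can be sketched with a word count, a common teaching example. This is only a single-machine illustration of the idea, not Hadoop's actual API: the function names `map_step` and `reduce_step` and the sample text are assumptions, and in a real cluster each chunk would be processed on a different node.

```python
from collections import defaultdict

def map_step(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_step(pairs):
    """Reduce: combine the pairs from all chunks into one count per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Each chunk stands in for the subset of data held by one node.
chunks = ["big data big insight", "data drives insight", "big decisions"]
mapped = [pair for chunk in chunks for pair in map_step(chunk)]
word_counts = reduce_step(mapped)
print(word_counts)
```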
Model A model is a description of a system using mathematical concepts and language. A model may help to explain a system, to study the effects of its different components, and to make predictions about its behavior. (Source: Wikipedia)
Operational decision A decision concerned with how the organization is run from day to day. (Source: Essentials)
Optimization model A mathematical model that gives the best decision, subject to the situation’s constraints. (Source: Essentials)
Predictive analytics Techniques that use models constructed from past data to predict the future or to ascertain the impact of one variable on another. (Source: Essentials)
Prescriptive analytics Techniques that analyze input data and yield a best course of action. (Source: Essentials)
Simulation optimization The use of probability and statistics to model uncertainty, combined with optimization techniques, to find good decisions in highly complex and highly uncertain settings. (Source: Essentials)
Simulation The use of probability and statistics to construct a computer model to study the impact of uncertainty on the decision at hand. (Source: Essentials)
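To make simulation and simulation optimization more concrete, here is a minimal sketch of an ordering decision under uncertain demand. Everything in it is an assumption for illustration: the price, the cost, the demand range of 60 to 140 units, and the list of candidate order quantities.

```python
import random

PRICE, COST = 10, 6          # hypothetical selling price and unit cost
CANDIDATES = [60, 80, 100, 120, 140]

def average_profit(order_qty, trials=10_000, seed=42):
    """Simulation: estimate average profit for one order quantity."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0
    for _ in range(trials):
        demand = rng.randint(60, 140)          # uncertain demand
        total += PRICE * min(order_qty, demand) - COST * order_qty
    return total / trials

# Simulation optimization: simulate each candidate decision, keep the best.
results = {qty: average_profit(qty) for qty in CANDIDATES}
best_qty = max(results, key=results.get)
print(best_qty, round(results[best_qty], 2))
```

The function `average_profit` alone is a simulation: it studies the impact of uncertainty on one decision. Wrapping it in a search over candidate decisions is what turns it into a (very simple) simulation optimization.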

DA2 Glossary Part B: Basic Information Technology Terminology

Note:

  • These definitions are based on a variety of sources, but mainly wikipedia.com and whatis.com.  They are not presented in alphabetical order; rather, terms on related topics are grouped together and presented in the order that I deemed most logical. (If you prefer alphabetical order, you can download the table and sort by column 1.)
  • The DA Terminology Quiz to be held in class November 14th (Session 04) will cover Part A and Part B of the DA2 Glossary
Bit A bit (from Binary Digit) is the basic unit of storage in a digital computer. A bit can take on a value of only 0 or 1, which can stand for the binary numbers zero and one, or for “on” and “off”.
Byte In modern computers, a byte is a grouping of 8 bits. In unit abbreviations, an upper-case B stands for bytes and a lower-case b stands for bits. So, 10 MB is (about) 10 million bytes, while 10 Mb is (about) 10 million bits.
Kilobyte (KB) Approximately one thousand bytes (exactly 2^10 = 1024 bytes)
Megabyte (MB) Approximately one million bytes (exactly 2^20 = 1024*1024 bytes)
Gigabyte (GB) Approximately one billion bytes (exactly 2^30 = 1024 to the third power bytes). The reason for using multiples of 1024 instead of multiples of 1000 is that 1024 is a nice round number in binary math (i.e., 2 to the 10th power).
Terabyte (TB) Approximately one trillion bytes (exactly 2^40 = 1024 to the fourth power bytes)
Petabyte (PB) Approximately one thousand terabytes (exactly 2^50 = 1024 to the fifth power bytes)
Exabyte (EB) Approximately one million terabytes (exactly 2^60 = 1024 to the sixth power bytes)
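The arithmetic behind these units can be checked with a few lines of Python. This is just a sketch; the variable names are chosen for this example.

```python
# Binary size units from the glossary: each step up multiplies by 1024 (2**10).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]
bytes_per_unit = {unit: 2 ** (10 * (i + 1)) for i, unit in enumerate(UNITS)}

for unit, size in bytes_per_unit.items():
    print(f"1 {unit} = 2**{size.bit_length() - 1} = {size:,} bytes")

# Bytes versus bits: 10 megabytes contain eight times as many bits.
BITS_PER_BYTE = 8
bits_in_10_megabytes = 10 * bytes_per_unit["MB"] * BITS_PER_BYTE
print(f"10 MB = {bits_in_10_megabytes:,} bits")
```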
Random Access Memory (RAM) chips Store all the programs you currently have loaded, and all the data those programs are working with. RAM chips forget their contents when the power is turned off.
Flash memory A special kind of semiconductor memory that doesn’t forget its contents when the power is shut off. Flash memory is slower than RAM chips, but less expensive per byte.  It comes in the form of flash cards (such as those used in cameras and phones) and USB flash drives (which can be plugged into computers).
Cache A redundant storage for recently used data that can be accessed especially fast.  Microprocessors come with some amount of hard-wired cache memory to hold recently used data and programs.  When a program needs some data, it first looks in cache, and if the data isn’t there, then it looks in regular RAM.  Data can also be cached on a hard drive.  For example, when you browse the Internet, a copy of every page you visit is stored on your hard drive in disk cache. The principle is the same. If you want to look at the same page again (say, by hitting the “back” button) it can be called up from your disk cache many times faster than it takes to retrieve it again from the website you’re browsing.
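The "look in cache first" principle can be sketched in a few lines. The function names and the page name below are hypothetical, and `slow_fetch` merely pauses briefly to stand in for a slow source such as a remote website.

```python
import time

cache = {}  # recently used results, kept close at hand

def slow_fetch(page):
    """Stand-in for a slow source such as a remote website (hypothetical)."""
    time.sleep(0.01)
    return f"<contents of {page}>"

def get_page(page):
    """Look in the cache first; go to the slow source only on a miss."""
    if page in cache:
        return cache[page], "hit"
    contents = slow_fetch(page)
    cache[page] = contents
    return contents, "miss"

first = get_page("example.com/home")
second = get_page("example.com/home")
print(first[1], second[1])
```

The first request is a miss and pays the full cost of the slow fetch; the second request for the same page is a hit and is answered from the cache.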
Hard Disk Drives Physical magnetic disks spinning at a very high rate that permanently store data. They are the storage workhorses of modern computers. They are far less expensive per byte than RAM and flash memory. However, they are much slower, consume more power, and are vulnerable to shocks and bumps. For this reason, flash memory is becoming increasingly common in devices where shocks and power consumption are key issues (e.g., digital music players, smart phones, laptop computers).
Central processing unit (CPU) The part of the computer that actually runs your programs. The CPU fetches data and programs from memory and storage, does computations, and then sends the results back out to memory and storage.
Microprocessor A CPU on a single semiconductor chip. Modern computers and digital devices of all sorts have microprocessors as their CPU.
Transistor The most basic building block of the circuitry in computers and electronics of all sorts. Though it is possible to buy individual transistors, modern devices use integrated circuits, which are large numbers of transistors etched into semiconductor chips (microprocessors and memory chips).
Microprocessor Clockspeed A measure of how many “cycles” of work it can perform in a second. A typical clockspeed for a modern microprocessor is several gigahertz (billions of cycles per second).
Server In the context of computing hardware, a “server” refers to server-class computer hardware. Server-class computers are typically shared between many users and are quite powerful–with lots of processing power, memory, and hard drive space.  They are used to support large applications such as Enterprise Resource Planning or ecommerce websites.
Web servers The purpose of a web server is to fulfill requests in a networking environment. Every time you browse the Internet, you are sending requests to a server, which processes your request (e.g., by generating a web page) and then sends you the result.
Mainframe The largest and most powerful kind of computer in common use. Mainframes are capable of supporting hundreds of simultaneous users. Today’s mainframes usually perform the role of servers, and in fact have essentially been renamed as “enterprise servers”.
Embedded Systems These are special purpose computers (often just a single microprocessor chip) that are increasingly being incorporated into both complex machinery (cars, aircraft engines) and everyday appliances (coffee makers, TVs). Embedded systems turn what used to be “dumb” machines into “smart” machines that can be programmed and controlled in new ways. Embedded systems increasingly include wireless networking capability, which enables remote diagnosis of problems and, in some cases, remote maintenance.
Source code The actual code that a programmer writes in a language like Java or Python.  Before a program can be run it has to be converted to a language that computers understand, called object code or machine language.
Open Source Software (OSS) Refers to a software distribution model in which developers and end users are given access to an application’s source code, not just its object code. Because developers and customers can read the source code, they are able to make modifications to enhance the product or to fix bugs. This is difficult with object code.  Developers and users can also redistribute the software, so long as they agree to keep the software open and free.
Compiler A software program that converts source code to object code, which is used to actually run programs.
Computer operating system A layer of software that stands between application programs and the hardware they run on (computers, phones, other “smart” devices).   The leading operating systems in use today are Windows, Mac OS, and Linux (for PCs) and iOS and Android, a Linux variant (for mobile devices).  Without the operating system, every program would have to know how to talk directly to every component in the system, which would make programming nearly impossible. Instead, programs talk to the operating system, which then talks to each component. It controls the communication between all the computer’s components (CPU, memory, hard drive, peripherals) and assigns tasks to the components.  Most operating systems also come with a variety of bundled applications.
The Internet A global network of interconnected computers that communicate according to a set of standards, such as TCP/IP, HTML, FTP, and many others.
World Wide Web (WWW) The subset of the Internet that involves hypertext  web pages. It is the most visible part, but by no means the whole Internet.
Packet-Switching An approach to transmitting digital information over a network (e.g., the Internet) in which a “message” (i.e., a web-page, a file transfer, some words spoken on Skype) is broken up into a set of packages or “packets” which are each sent separately from source to destination, using routers. Each packet contains data identifying the address of the source, the address of the destination, its order in the message, the actual contents of part of the message, and other information needed to route the packet efficiently.
TCP/IP Stands for Transmission Control Protocol/Internet Protocol. It is what makes the Internet possible.  TCP/IP is a particular way to do packet-switching. The TCP part governs how data is divided up into packets, while the IP part figures out how to move the packets from router to router across the network. The packets comprising a message can take different routes across the network, and so sometimes they arrive out of order. Part of what TCP does is put the packets back in order before delivering the message to the recipient.
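The idea of splitting a message into numbered packets and putting them back in order on arrival can be sketched as follows. This is a toy model only: real TCP packets carry headers, checksums, and acknowledgements that are not shown here, and the function names and the packet size of 5 characters are assumptions.

```python
import random

def to_packets(message, size):
    """Split a message into numbered packets of at most `size` characters."""
    return [
        {"seq": i, "payload": message[start:start + size]}
        for i, start in enumerate(range(0, len(message), size))
    ]

def reassemble(packets):
    """Put packets back in sequence order and join their payloads."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return "".join(p["payload"] for p in ordered)

message = "Packets may arrive out of order."
packets = to_packets(message, size=5)
random.Random(0).shuffle(packets)  # simulate packets taking different routes
received = reassemble(packets)
print(received == message)
```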
Router (or Internet router) A device that moves packets along towards their destinations over the Internet. It takes packets in from connected links, and then sends them out again in the direction of their destination. Routers also manage the efficiency of the network as a whole. They are aware of how heavy traffic is along different links, and send packets by different links to even out the load.
Voice over IP (VoIP) Refers to the growing popularity of transmitting voice traffic using internet standards (Internet Protocol) and packet switching. Skype is the leading VoIP service. Also, consumers are increasingly opting for digital telephone service (which also uses VoIP) as part of a service bundle through their local phone company or cable provider.
HTML Stands for hypertext markup language. It is the language of web pages. Your Internet Browser reads HTML, and translates that into the text and images you see on your computer screen.
HTTP Stands for Hypertext Transfer Protocol. It is the protocol that governs the transmission of web pages.
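IP (Internet Protocol) Address The numeric address of a particular computer attached to the Internet. An IPv4 address is written as four numbers, each between 0 and 255, separated by dots (so at most 12 digits), e.g., monet.bc.edu = 136.167.49.70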
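Python's standard ipaddress module can be used to inspect the example address above. The sketch shows that the dotted notation is a readable way of writing four bytes, which together form one 32-bit number.

```python
import ipaddress

addr = ipaddress.ip_address("136.167.49.70")  # example address from the glossary
as_integer = int(addr)        # the same address as a single 32-bit number
octets = list(addr.packed)    # the four bytes behind the dotted notation
print(addr.version, octets, as_integer)
```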
Uniform Resource Locator (URL) Akin to the street address of a webpage, a URL provides the path to a particular file on a particular computer. The structure is protocol://host.computer/directory.path/file.name.
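The parts of a URL can be pulled apart with the standard library's urllib.parse module. The URL below is made up for this example.

```python
from urllib.parse import urlsplit

# A made-up URL following the protocol://host/path/file pattern.
parts = urlsplit("https://www.example.edu/courses/analytics/glossary.html")
print(parts.scheme)    # protocol
print(parts.hostname)  # host computer
print(parts.path)      # directory path and file name
```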
Domain Name A way to refer to a server (or a set of servers) for some organization that has a presence on the Internet.  For example, BC’s domain is bc.edu. The top level domain name is “edu” and the second level domain name is “bc”.
Electronic Data Interchange (EDI) A fixed format (defined in advance by the EDI standards committee) for exchanging electronic business documents such as orders, invoices, requests for quotes and shipping documents.
XML Stands for eXtensible Markup Language. The unique aspect of XML is each document includes a section that defines its own content. So XML is like a flexible form of EDI which allows new kinds of documents to be defined by anyone, instead of having to go through a standards committee. For this reason (and others) XML is replacing EDI as the standard for electronic document exchange.
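As a small illustration of a self-describing document, the sketch below parses a made-up purchase order with Python's built-in xml.etree.ElementTree module. The tag and attribute names (`order`, `customer`, `item`, `sku`, `quantity`) are invented for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# A made-up purchase order; the tag names are chosen by whoever defines the document.
document = """
<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-17" quantity="3">Widget</item>
  <item sku="B-42" quantity="1">Gadget</item>
</order>
"""

root = ET.fromstring(document)
customer = root.findtext("customer")
total_quantity = sum(int(item.get("quantity")) for item in root.findall("item"))
print(root.get("id"), customer, total_quantity)
```

Because the tags label each piece of data, a program can find the customer and the item quantities by name rather than by fixed position in the file.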
Enterprise Systems A class of large, integrated information systems that serve many business processes and departments. The most common enterprise systems are Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) and Supply Chain Management (SCM). The transactional data contained in databases supporting these systems is one of the main sources of data for analytics.
Enterprise Resource Planning (ERP) The most commonly used type of enterprise system, and the most comprehensive. ERP usually covers three broad areas–(1) operations (e.g., order entry, inventory, manufacturing, logistics), (2) financials (e.g., accounts payable, accounts receivable, billing) and (3) human resources (e.g., recruiting, benefits, payroll, and expense reporting).
Customer Relationship Management (CRM) Refers to an enterprise system that covers everything related to managing relationships with customers and potential customers, including (1) sales force automation (e.g., contact management, sales forecasting, commissions), (2) customer service transactions (e.g., in person, phone, email, web) and (3) marketing automation (e.g., develop and manage marketing campaigns through channels like call centers, email, direct mail, field sales.).
Supply Chain Management (SCM) Refers to an enterprise system that provides a set of modules for managing a firm’s supply chain.  Typical SCM modules include Demand Planning and Pricing, Supply Management, and Transportation and Logistics.
Cloud computing Refers to a model of computing where some combination of computing resources (hardware, networking, applications) is provided by a vendor as a service to customers over the Internet (e.g., Gmail, Google Docs), rather than a customer owning and operating all their own computing resources. Cloud computing solutions can vary according to how much is done by the cloud vendor, and how much is done by the customer. The three options are Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
Cloud Computing – Advantages Customers don’t have to install or maintain hardware and (in most cases) software. Data is backed up automatically. Data and applications are accessible from any Internet-connected device.
Cloud Computing – Disadvantages Doing everything over the Internet can cause network delays. Also, if you don’t have an Internet connection at the moment (e.g., you’re on a plane), then you may not be able to access your personal data. (Note: many cloud services have various synchronization options that allow you to make temporary copies of your data to work on when you’re not connected.) If your Internet connection goes down in the middle of doing something, you may lose some work.
Infrastructure as a Service (IaaS) The most limited variant of cloud computing, in terms of what the vendor provides.  The cloud vendor (Amazon.com is one of the leaders here) makes hardware and networking services available to the customer. The customer has to install and manage everything else, i.e., their preferred operating system, application development environment, database, applications, and so forth. This requires the most effort and sophistication on the customer’s part, but allows the most flexibility.
Platform as a Service (PaaS) A form of cloud computing where the cloud vendor maintains a specific “platform” consisting of a customer’s preferred combination of hardware, operating system, application development environment, database software, etc. They make that platform available to the customer to build and run their own applications. This variant requires less effort and sophistication on the part of customers than IaaS, but also has less flexibility than IaaS.
Software as a Service (SaaS) A form of cloud computing in which the cloud vendor manages (and controls) everything, and all the customer sees and interacts with is the high-level application (e.g., Google Docs).  It’s the easiest variant for customers, but provides very little flexibility.
Virtualization Refers to the use of software that makes a single physical computer act as if it were actually multiple computers. The software divides up the computer into multiple “virtual machines,” each with its own operating system and resources.

 
