level: Level 1 of Chapter 2

Questions and Answers List

Question / Answer
The nature of data
   - Data is a collection of facts obtained through experiences, observations, or experiments.
   - Data can consist of numbers, words, or images.
   - Data is the lowest level of abstraction: data --> information --> knowledge.
   - Data is the source of information.
   - Data quality and integrity are critical to analytics.
   - Structured data: numbers.
   - Unstructured data: text, images.
Definition of data
   Data: facts obtained through experiments, observations, sensors, or transactions.
What is structured data?
   Structured data is what computers typically process. It is either categorical or numerical.
   - Categorical
      - Nominal: descriptive, non-numeric labels with no order (e.g., the color of a phone)
      - Ordinal: ordered categories (first, second, third; low, medium, high)
   - Numerical
      - Interval: differences between values are meaningful, but there is no true zero (IQ score, temperature in °C or °F)
      - Ratio: has a true zero, so ratios are meaningful (weight, height)
What is unstructured data?
   - Textual
   - Multimedia (audio, image, video)
What is data preprocessing?
   Data preprocessing: getting data ready for analysis.
What are the main data preprocessing tasks?
   1. Data consolidation (sourcing): collect the relevant data.
   2. Data cleaning (quality): eliminate incorrect values and impute missing values.
   3. Data transformation: put the data in the correct form for processing, e.g., convert a numerical variable into a categorical one (low, medium, high).
   4. Data reduction
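The cleaning and transformation steps can be sketched in a few lines of Python. This is only an illustration: the sample values, the choice of mean imputation, and the cut points 25 and 60 are assumptions, not part of the course material.

```python
from statistics import mean

def impute_missing(values):
    """Replace None entries with the mean of the observed values (one common choice)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def to_category(value, low_cut, high_cut):
    """Bin a number into 'low', 'medium', or 'high'. The cut points are assumptions."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

# Hypothetical raw readings with two missing entries.
raw = [12, None, 45, 78, None, 30]
cleaned = impute_missing(raw)                       # data cleaning
labels = [to_category(v, 25, 60) for v in cleaned]  # data transformation
print(cleaned)
print(labels)
```

Mean imputation is used here only because it is short; median imputation or dropping the rows are equally valid choices, depending on the data.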
What is RFID?
   Radio-Frequency Identification: a tag that, when scanned, transmits the data stored on it. For example, ski resorts use RFID to let skiers with passes go through checkpoints before getting on the chairlift.
What is statistics?
   A collection of mathematical techniques to characterize and interpret data.
What is descriptive statistics?
   Describing the characteristics of the data as it is. Used for descriptive analytics.
What is inferential statistics?
   Drawing insights about the population based on sample data (sample → population).
Describe the characteristics of negative skewness.
   - The long tail extends to the left (left-skewed).
   - Typically mode > median > mean.
Describe the characteristics of positive skewness.
   - The long tail extends to the right (right-skewed).
   - Typically mode < median < mean.
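The ordering of mode, median, and mean under skewness can be checked on a small sample. The two samples below are made up for illustration; the second is simply the first mirrored, so it is skewed the other way.

```python
from statistics import mean, median, mode

def describe(sample):
    """Return (mode, median, mean) for a sample."""
    return mode(sample), median(sample), mean(sample)

# Hypothetical right-skewed sample: a few large values stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the sample above, so the long tail is on the left.
left_skewed = [16 - x for x in right_skewed]

print(describe(right_skewed))  # mode < median < mean
print(describe(left_skewed))   # mode > median > mean
```

The ordering is a rule of thumb for typical unimodal data, not a guarantee for every sample.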
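Describe the characteristics of kurtosis.
   - Kurtosis is associated with the height and flatness of a distribution.
   - A normal distribution has kurtosis = 3, so its excess kurtosis (kurtosis - 3) is 0.
   - Lower values (negative excess kurtosis): flatter and shorter.
   - Higher values (positive excess kurtosis): more peaked and taller.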
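A minimal sketch of the calculation, assuming the population formula (fourth standardized moment minus 3); statistical packages often apply a small-sample correction, so their numbers may differ slightly. Both samples are hypothetical.

```python
from statistics import mean

def excess_kurtosis(sample):
    """Population kurtosis (fourth standardized moment) minus 3."""
    m = mean(sample)
    n = len(sample)
    variance = sum((x - m) ** 2 for x in sample) / n
    fourth_moment = sum((x - m) ** 4 for x in sample) / n
    return fourth_moment / variance ** 2 - 3

# Hypothetical samples: evenly spread values versus values piled up at the center.
flat = [1, 2, 3, 4, 5, 6]
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: more peaked than a normal distribution
```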
Simple regression versus multiple regression
   Simple regression has one input variable, while multiple regression has more than one. See the slides for more.
Interpreting regression analysis
   - Multiple R is the correlation coefficient, which measures the strength of a linear relationship between two variables. The larger its absolute value, the stronger the relationship.
      - 1 means a perfect positive linear relationship
      - -1 means a perfect negative linear relationship
      - 0 means no linear relationship
   - R Square is the coefficient of determination, which shows goodness of fit: how well the data fit the regression model. In our example, R Square is 0.97, which is an excellent fit. In other words, 97% of the variation in the dependent variable (y-values) is explained by the independent variables (x-values).
   - Adjusted R Square is a modified version of R Square that adjusts for the number of predictors, so adding predictors that do not improve the model lowers it.
   - Standard error is also a goodness-of-fit measure: it estimates the typical distance between the observed values and the regression line, so smaller is better.
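The four summary statistics can be computed by hand for a simple (one-predictor) regression. This is a sketch using ordinary least squares; the function name and the data are hypothetical and are not the example from the slides. Multiple R is reported here as the square root of R Square, which is how spreadsheet regression output usually shows it, so it is never negative.

```python
from math import sqrt
from statistics import mean

def simple_regression_summary(x, y):
    """Fit y = intercept + slope * x by least squares and summarize the fit."""
    n = len(x)
    k = 1  # number of predictors in a simple regression
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared errors (unexplained) and total sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    r_square = 1 - sse / sst
    adjusted = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
    return {
        "multiple_r": sqrt(max(r_square, 0.0)),
        "r_square": r_square,
        "adjusted_r_square": adjusted,
        "standard_error": sqrt(sse / (n - k - 1)),
    }

# Hypothetical data that lie close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = simple_regression_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```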
What is a time series?
   A time series is a sequence of data points of a variable of interest over a period of time. The data points must be evenly spaced, e.g., quarterly sales over several years. See the slides for more.
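Difference between MAD, MSE, and MAPE
   - MAD (mean absolute deviation): the average of the absolute errors
   - MAPE (mean absolute percentage error): the average absolute error expressed as a percentage of the actual value
   - MSE (mean squared error): the average of the squared errors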
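The three error measures differ only in how each forecast error is treated before averaging. A short sketch, with made-up actual and forecast values:

```python
def mad(actual, forecast):
    """Mean absolute deviation: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean squared error: average of (actual - forecast)^2."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical demand and forecasts for four periods.
actual = [100, 110, 120, 130]
forecast = [90, 115, 120, 143]

print(mad(actual, forecast))
print(mse(actual, forecast))
print(mape(actual, forecast))
```

Because MSE squares each error, one large miss affects it much more than it affects MAD.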
What is a business report?
   Business report: information presented in a useful form for business decision makers.
What is a business report's purpose?
   To improve managerial decisions:
   - Persuade: make an argument with supporting evidence
   - Inform: provide information, analysis, etc.
   - Empower the user to act
Time series: naive approach
   Assumes demand in the next period is the same as demand in the most recent period, e.g., if January sales were 68, then February sales are predicted to be 68.
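In code, the naive approach just returns the last observation. The January figure of 68 comes from the card; the earlier months are made up for illustration.

```python
def naive_forecast(history):
    """Forecast the next period as the most recent observed value."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return history[-1]

# Hypothetical sales for November and December, then January = 68.
sales = [55, 61, 68]
print(naive_forecast(sales))  # forecast for February
```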
Time series: moving average method
   - A moving average is a series of arithmetic means.
   - All data points are equally weighted.
   - Used if there is little or no trend.
   - Often used for smoothing.
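A minimal sketch of a moving-average forecast: average the last few observations, all with equal weight. The window size and the data are assumptions chosen for illustration.

```python
def moving_average_forecast(history, window):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical demand for four periods; forecast period five from the last three.
demand = [10, 20, 30, 40]
print(moving_average_forecast(demand, 3))
```

A larger window smooths more but reacts more slowly to real changes in demand.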
Time series: weighted moving average method
   - Used when some trend might be present.
   - Older data is usually less important, so more recent data is weighted more heavily.
   - Weights are based on experience and intuition.
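A weighted moving average can be sketched the same way as a simple one, except that each observation is multiplied by a weight before averaging. The weights 1, 2, 3 below are arbitrary illustrative values, listed from oldest to newest.

```python
def weighted_moving_average_forecast(history, weights):
    """Forecast the next period from the last len(weights) observations.

    `weights` are ordered oldest to newest and need not sum to 1;
    the result is divided by their total.
    """
    if not weights or len(weights) > len(history):
        raise ValueError("need between 1 and len(history) weights")
    recent = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

# Hypothetical demand; the newest period counts three times as much as the oldest.
demand = [10, 20, 30]
print(weighted_moving_average_forecast(demand, [1, 2, 3]))
```

With equal weights this reduces to the simple moving average.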
Time series: exponential smoothing method
   - A form of weighted moving average.
      - The most recent data is weighted most.
      - Weights decline exponentially.
   - Requires a smoothing constant (α).
      - Ranges from 0 to 1.
      - Subjectively chosen.
      - The higher the value of α, the more weight is placed on more recent data.
   - Advantage: involves little record keeping of past data.
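One common way to write exponential smoothing is: new forecast = previous forecast + α × (actual - previous forecast). The sketch below assumes that form and, as a further assumption, starts from the first observation when no initial forecast is given. The data and α values are illustrative.

```python
def exponential_smoothing_forecast(history, alpha, initial_forecast=None):
    """Return the forecast for the period after the last observation.

    Only the running forecast is carried forward, which is why the method
    needs little record keeping of past data.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    if not history:
        raise ValueError("history must contain at least one observation")
    # Assumption: seed with the first observation if no starting forecast is supplied.
    forecast = history[0] if initial_forecast is None else initial_forecast
    for actual in history:
        forecast = forecast + alpha * (actual - forecast)
    return forecast

# Hypothetical demand for three periods.
demand = [100, 120, 90]
print(exponential_smoothing_forecast(demand, 0.5))
print(exponential_smoothing_forecast(demand, 1.0))  # alpha = 1 reproduces the naive forecast
```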
What kind of chart is this?
   Line chart
What kind of chart is this?
   Bar chart
What kind of chart is this?
   Multivariable chart
What kind of chart is this?
   Stacked bar chart
What kind of chart is this?
   Scatter plot
What kind of chart is this?
   Histogram: like a bar chart, but it shows the frequency distribution of a continuous variable.