UNIT 4: BIVARIATE STATISTICS

    Key unit competence: Extend understanding, analysis and interpretation of bivariate
    data to correlation coefficients and regression lines.

    4.0 INTRODUCTORY ACTIVITY

    In Kabeza village, UMULISA made 9 observations about farming. In each house
    observed, she recorded the number of cows (X) and the number of domestic ducks (Y),
    and she obtained the following results:
    (1, 4), (2, 8), (3, 4), (4, 12), (5, 10), (6, 14), (7, 16), (8, 6), (9, 18)

    a. Represent this information graphically in (x, y)-coordinates.
    b. Find the equation of the line joining any two points of the graph and guess the
    name of this line.
    c. According to your observation from (a), explain in your own words whether there
    is any relationship between the variation of cows (X) and the variation of domestic
    ducks (Y).
    4.1 Bivariate data, scatter diagram and types of correlation
    ACTIVITY 4.1
    Consider the situation in which the mass, y (g), of a chemical is
    related to the time, x minutes, for which the chemical reaction has
    been taking place, according to the table.

    [Table: time x (minutes) and mass y (g)]

    a) Plot the above information in (x, y) coordinates.

    b) Explain in your own words the relationship between x and y

    In statistics, bivariate data (or a double series) consist of paired observations of
    two variables. Bivariate analysis includes techniques for analysing such data, with a
    focus on the relationship between a dependent variable y and an independent variable x.

    Examples include the relationships between age and weight, weight and height, years
    of education and salary, and amount of daily exercise and cholesterol level. As with
    data for a single variable, we can describe bivariate data both graphically and
    numerically. In both cases we are primarily concerned with determining whether or not
    there is a linear relationship between the two variables under consideration.

    It should be kept in mind that a statistical relationship between two variables
    does not necessarily imply a causal relationship between them. For example,
    a strong relationship between weight and height does not imply that either
    variable causes the other.

    Scatter plots or Scatter diagram and types of correlation

    Consider the following data, which relate x, the respective number of branches
    that 10 different banks have in a given common market, with y, the corresponding
    market share of total deposits held by the banks:

    [Table and scatter diagram: number of branches x against market share y]

    The scatter plot or scatter diagram (in the figure above) indicates that, roughly
    speaking, the market share increases as the number of branches increases. We
    say that x and y have a positive correlation.

    On the other hand, consider the data below, which relate average daily
    temperature x, in degrees Fahrenheit, and daily natural gas consumption y, in
    cubic metres.

    [Table and scatter diagram: average daily temperature x against daily natural gas
    consumption y]

    We see that y tends to decrease as x increases. Here, x and y have a negative
    correlation.

    Finally, consider the data items (x, y) below, which relate daily temperature x
    over a 10-day period to the Dow Jones stock average y: (63, 3385); (72, 3330);
    (76, 3325); (70, 3320); (71, 3330); (65, 3325); (70, 3280); (74, 3280); (68, 3300);
    (61, 3265).

    There is no apparent relationship between x and y (no correlation or weak
    correlation).

    APPLICATION ACTIVITY 4.1
    One measure of personal fitness is the time taken for an individual’s
    pulse rate to return to normal after strenuous exercise: the greater the
    fitness, the shorter the time. Following a short program of strenuous
    exercise, Norman recorded his pulse rate P at time t minutes after
    he had stopped exercising. Norman’s results are given in the table
    below.

    [Table: time t (minutes) and pulse rate P]

    a) Draw a scatter diagram to represent this information in
    (x, y) coordinates.

    b) Explain the relationship between Norman’s pulse P and time t.

    In the case of two variables, say x and y, there is another important quantity called
    the covariance of x and y, denoted cov(x, y).
    The covariance of variables x and y is a measure of how these two variables
    change together. If the greater values of one variable mainly correspond with the
    greater values of the other variable, and the same holds for the smaller values,
    i.e. the variables tend to show similar behavior, the covariance is positive. In
    the opposite case, when the greater values of one variable mainly correspond
    to the smaller values of the other, i.e. the variables tend to show opposite
    behavior, the covariance is negative. If the covariance is zero, the variables are said
    to be uncorrelated, which means that there is no linear relationship between them.

    Therefore, the sign of the covariance shows the tendency in the linear relationship
    between the variables. The magnitude of the covariance is not easy to interpret,
    because it depends on the units in which x and y are measured.
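
    The description of covariance can be made concrete with a short Python sketch. It
    assumes the usual definition of covariance as the mean of the products of the
    deviations from the two means (dividing by n, not by n − 1), which is an assumption
    and not a formula quoted from this unit. The function name `covariance` is
    illustrative. The data are the cow and duck observations from the introductory
    activity.

```python
def covariance(xs, ys):
    """Covariance as the mean product of deviations (assumed definition, divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Observations from the introductory activity: cows (x) and ducks (y).
cows = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ducks = [4, 8, 4, 12, 10, 14, 16, 6, 18]

cov_xy = covariance(cows, ducks)
print(cov_xy)  # about 8.44: positive, so x and y tend to increase together
```

    Only the sign is read directly here: a positive value suggests that houses with more
    cows tend to have more ducks.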


    Pearson’s coefficient of correlation (also called the product moment coefficient of
    correlation, or simply the coefficient of correlation), denoted by r, is a measure of
    the strength of the linear relationship between two variables.
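
    As a hedged illustration, the sketch below computes r under the standard assumption
    that r = cov(x, y) / (σx σy), where σx and σy are the standard deviations of x and y
    (all computed by dividing by n). The function name `pearson_r` is illustrative. The
    data are the temperature and Dow Jones values listed in section 4.1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r as covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (sd_x * sd_y)


# Daily temperature (x) and Dow Jones stock average (y) from section 4.1.
temperature = [63, 72, 76, 70, 71, 65, 70, 74, 68, 61]
dow_jones = [3385, 3330, 3325, 3320, 3330, 3325, 3280, 3280, 3300, 3265]

r = pearson_r(temperature, dow_jones)
print(round(r, 2))  # about -0.07: very close to 0, so almost no linear relationship
```

    The value close to 0 agrees with the scatter of these points, which shows no apparent
    relationship between x and y.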


    Properties of the coefficient of correlation
    a) The coefficient of correlation does not depend on the measurement scale.
    That is, whether the height is expressed in meters or in feet, the coefficient of
    correlation does not change.

    b) The sign of the coefficient of correlation is the same as the sign of the covariance.

    c) The square of the coefficient of correlation is equal to the product of the
    gradient of the regression line of y on x and the gradient of the regression
    line of x on y.


    g) If the linear coefficient of correlation takes values close to −1, the
    correlation is strong and negative, and it becomes stronger the closer
    r approaches −1.

    h) If the linear coefficient of correlation takes values close to 1, the correlation
    is strong and positive, and it becomes stronger the closer r approaches 1.

    i) If the linear coefficient of correlation takes values close to 0, the correlation
    is weak.

    j) If r = 1 or r = −1, there is perfect correlation and the line on the scatter
    plot is increasing or decreasing respectively.

    k) If r = 0, there is no linear correlation.
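
    Properties (a), (b) and (c) can be checked numerically. The sketch below is only an
    illustration under stated assumptions: it takes r = cov(x, y) / (σx σy), and it takes
    the gradients of the regression lines of y on x and of x on y to be cov(x, y)/var(x)
    and cov(x, y)/var(y) respectively (the usual least-squares results, not quoted from
    this unit). The scale factor 3.28 is a hypothetical change of units, and the data are
    those of the introductory activity.

```python
import math


def covariance(xs, ys):
    """Mean product of deviations from the means (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """r = cov(x, y) / (sd_x * sd_y); covariance(v, v) is the variance of v."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [4, 8, 4, 12, 10, 14, 16, 6, 18]
r = pearson_r(x, y)

# Property (a): multiplying every x by a positive constant (a change of units)
# leaves r unchanged. The factor 3.28 is a hypothetical conversion factor.
x_scaled = [3.28 * value for value in x]
r_scaled = pearson_r(x_scaled, y)

# Property (b): r has the same sign as the covariance.
same_sign = (r > 0) == (covariance(x, y) > 0)

# Property (c): r squared equals the product of the two regression gradients.
slope_y_on_x = covariance(x, y) / covariance(x, x)
slope_x_on_y = covariance(x, y) / covariance(y, y)

print(math.isclose(r, r_scaled), same_sign,
      math.isclose(r ** 2, slope_y_on_x * slope_x_on_y))
```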

    Examples:
    1) A survey of 200 families recorded the number of children (x) and the number of
    beds (y) per family. The results are collected in the table below.

    [Table: number of children x and number of beds y for 200 families]

    a) What is the average number of children and of beds per family?
    b) Find the covariance.
    c) Can we confirm that there is a high linear correlation between the number of
    children and the number of beds per family?

    Solution

    a) Average number of children per family:

    Contingency table:


    Spearman’s coefficient of rank correlation
    Spearman’s coefficient of rank correlation, or Spearman’s rho, is a measure
    of statistical dependence between two variables. It assesses how well the
    relationship between two variables can be described using a monotonic
    function. Spearman’s coefficient of rank correlation is denoted by $\rho$ and
    defined by

    $$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

    where d refers to the difference of ranks between paired items in the two series and
    n is the number of observations. It is much easier to calculate Spearman’s
    coefficient of rank correlation than to calculate Pearson’s coefficient
    of correlation, as there is far less working involved. However, in general,
    Pearson’s coefficient of correlation is a more accurate measure of correlation
    when data are numerical.
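
    A minimal sketch of this calculation follows, assuming the usual form of the
    coefficient, 1 − 6Σd²/(n(n² − 1)), with d the difference between paired ranks and no
    tied ranks. The function name `spearman_rho` and the two lists of ranks (ranks given
    by two judges to seven competitors) are hypothetical and serve only as an
    illustration.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of paired ranks (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same seven competitors.
judge_a = [3, 7, 2, 4, 5, 6, 1]
judge_b = [3, 7, 4, 2, 6, 5, 1]

rho = spearman_rho(judge_a, judge_b)
print(round(rho, 3))  # 0.821: the two rankings agree fairly closely
```

    Here the squared differences are 0, 0, 4, 4, 1, 1, 0, so Σd² = 10 and the
    coefficient is 1 − 60/336.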

    Method of ranking

    Ranking can be done in ascending order or descending order.

    Examples:

    1) Suppose that we have the marks, x, of seven students in this order:
    12, 18, 10, 13, 15, 16, 9
    We assign the ranks 1, 2, 3, 4, 5, 6, 7 such that the smallest value of x is
    ranked 1.

    That is

    x:      12   18   10   13   15   16    9
    Rank:    3    7    2    4    5    6    1
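
    The ranking step can be sketched in Python as follows. This is an illustration, not
    a prescribed method: the helper name `assign_ranks` is hypothetical, and it assumes
    that all values are distinct (tied values would need a separate rule).

```python
def assign_ranks(values, descending=False):
    """Rank distinct values: rank 1 goes to the smallest (or largest if descending)."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(value) + 1 for value in values]


# Marks of the seven students from the example.
marks = [12, 18, 10, 13, 15, 16, 9]

ascending_ranks = assign_ranks(marks)
descending_ranks = assign_ranks(marks, descending=True)

print(ascending_ranks)   # [3, 7, 2, 4, 5, 6, 1]
print(descending_ranks)  # [5, 1, 6, 4, 3, 2, 7]
```

    Either order can be used, provided the same order is applied to both series being
    compared.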



    CONTENT SUMMARY

    We use the regression line of y on x to predict a value of y for any given value
    of x and, vice versa, we use the regression line of x on y to predict a value of
    x for a given value of y. The “best” line would make the best predictions: the
    observed y-values should stray as little as possible from the line. This straight
    line is the regression line of y on x; its algebraic expression is written as
    y = ax + b, where a is the gradient and b is the y-intercept.
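
    As an illustrative sketch, the gradient and intercept of the regression line of y on
    x can be computed as below. The formulas a = cov(x, y)/var(x) and b = ȳ − a·x̄ are
    the usual least-squares results and are assumed here, since the derivation is not
    reproduced in this summary. The function name `regression_y_on_x` is hypothetical,
    and the data are those of the introductory activity.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b (assumed formulas)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    a = cov_xy / var_x          # gradient
    b = mean_y - a * mean_x     # y-intercept: the line passes through (mean_x, mean_y)
    return a, b


cows = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ducks = [4, 8, 4, 12, 10, 14, 16, 6, 18]

a, b = regression_y_on_x(cows, ducks)
print(round(a, 3), round(b, 3))  # 1.267 3.889
```

    For these data the line is approximately y = 1.27x + 3.89.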


    APPLICATION ACTIVITY 4.4

    1. Consider the following table


    4.5 Interpretation of statistical data (Application)
    ACTIVITY 4.5
    Explain in your own words how statistics, especially bivariate
    statistics, can be used in our daily life.

    Bivariate statistics can help in predicting the value of one variable if we know
    the value of the other.
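
    A hedged sketch of such predictions is given below. It fits both regression lines by
    least squares (an assumption about the method; the helper name `fit_line` is
    hypothetical) and uses the line of y on x to predict y from x, and the line of x on y
    to predict x from y. The data are those of the introductory activity, and the inputs
    6 and 12 are arbitrary values inside the observed ranges.

```python
def fit_line(us, vs):
    """Least-squares line v = slope*u + intercept (assumed formulas)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n
    var_u = sum((u - mean_u) ** 2 for u in us) / n
    slope = cov_uv / var_u
    return slope, mean_v - slope * mean_u


cows = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ducks = [4, 8, 4, 12, 10, 14, 16, 6, 18]

slope_yx, intercept_yx = fit_line(cows, ducks)   # regression line of y on x
slope_xy, intercept_xy = fit_line(ducks, cows)   # regression line of x on y

predicted_ducks = slope_yx * 6 + intercept_yx    # predict y when x = 6
predicted_cows = slope_xy * 12 + intercept_xy    # predict x when y = 12

print(round(predicted_ducks, 2), round(predicted_cows, 2))
```

    The two lines are different in general, so the line chosen should match the variable
    being predicted.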

    Examples:

    1. One measure of personal fitness is the time taken for an individual’s pulse
    rate to return to normal after strenuous exercise: the greater the fitness, the
    shorter the time. Following a short program of strenuous exercise, Norman
    recorded his pulse rate P at time t minutes after he had stopped exercising.
    Norman’s results are given in the table below.

    [Table: time t (minutes) and pulse rate P]

    iv. State, with a reason, whether it is sensible to conclude from your
    answer to part (iii) that x and y are linearly related.
    v. The line of regression of y on x has equation y = ax + b. Calculate the
    values of a and b, each correct to three significant figures.
    vi. Use your regression line to estimate what the contrast index
    corresponding to the damaged piece of film would have been if the
    piece had been undamaged.
    vii. State, with a reason, whether it would be sensible to use your
    regression equation to estimate the contrast index when the quantity
    of chemical applied to the film is zero.

    4.6 END UNIT ASSESSMENT
    1) The following results were obtained from lineups in Mathematics
    and Physics examinations:

    [Table: Mathematics and Physics examination results]

    4) The table below shows the marks awarded to six students in a
    competition:

    [Table: marks awarded to six students]

    Calculate a coefficient of rank correlation.
    5) A company is to replace its fleet of cars. Eight possible models
    are considered and the transport manager is asked to rank them,
    from 1 to 8, in order of preference. A saleswoman is asked to use
    each type of car for a week and grade them according to their
    suitability for the job (A: very suitable, to E: unsuitable).

    [Table: price, transport manager’s ranking and saleswoman’s grade for the eight models]

    a. Calculate Spearman’s coefficient of rank correlation between
    i. price and transport manager’s rankings,
    ii. price and saleswoman’s grades.
    b. Based on the result of a., state, giving a reason, whether it would
    be necessary to use all three different methods of assessing the cars.
    c. A new employee is asked to collect further data and to do some
    calculations. He produces the following results:
    The coefficient of correlation between
          i. price and boot capacity is 1.2,
          ii. maximum speed and fuel consumption in miles per
               gallon is -0.7,
          iii. price and engine capacity is -0.9.
    For each of his results say, giving a reason, whether you think
    it is reasonable.

    d. Suggest two sets of circumstances where Spearman’s coefficient
    of rank correlation would be preferred to Pearson’s
    coefficient of correlation as a measure of association.

UNIT 5: CONDITIONAL PROBABILITY AND BAYES THEOREM