• Unit 13: STATISTICS (BIVARIATE DATA)

    Key unit competence: By the end of this unit, the learner should be able to
        collect, represent and interpret bivariate data.

    Unit outline

    • Definition and examples of bivariate data.

    • Frequency distribution table of bivariate data.

    • Review of data representation using graphs.

    • Definition of scatter diagrams.

    • Correlation

    • Unit summary

    • Unit Test

    Introduction

    Unit Focus Activity

    In a certain school, 15 boys took two examination papers in the same subject.
    The percentage marks obtained by each boy are given in Table 13.1, where each
    boy’s marks are in the same column.

    (a) Is the performance in the two papers consistent? Which is the
          better of the two performances?

    (b) Plot the corresponding points (x, y) on a Cartesian plane and
         describe the resulting graph.

    (c) Can you obtain a rule relating x and y? Explain your answer.

    (d) Calculate the mean mark in each paper and represent the same
         on your graph paper. Can you describe the performance in each
         paper with reference to the mean mark?

    (e) Can you draw a line that roughly represents the points in your graph?

    (f) Calculate the median mark on each paper and denote the medians as
         m_x and m_y respectively.

    (g) Using your graph of Table 13.1, divide the set into 3 regions, each
         containing 5 entries.
    (iii) Plot the three points on the same graph and join them
         with the best line you can draw.

    Most of the statistical skills that we developed in our earlier work were based
    on one variate, i.e. one set of data only. Such variates included height, mass,
    age, marks, etc. In everyday life, situations may arise in which there is a
    need to compare two sets of data. In this unit we shall learn how to
    represent and analyse data that involve two variables for the same person
    or activity.

    13.1 Definition and examples of bivariate data
      
          Activity 13.1
    1. Use reference books or the internet to define bivariate data.

    2. Give examples of bivariate data that can be obtained.

    (i) From members of your class

    (ii) From any other source.

    Consider statistical data whose observations have exactly two measurements
    or variables for the same group of persons, items or activities.
    Such data is known as bivariate data. Some examples of bivariate data are:

    • Age (x) years and mass (y) kg.

    • Height (x) cm and age (y) years.

    • Height (x) cm and mass (y) kg.

    • Expenditure in a business (x) and profit (y) for a particular period of
       time, etc.

    13.2 Frequency distribution table for bivariate data
       
          Activity 13.2


    This data can be displayed in a frequency table using the information
    for the whole class for later use. In bivariate data, x and y are
    considered as an ordered pair, written as (x1, y1), (x2, y2), …, (xn, yn). This means
    that both variables have an equal number of elements, with each pair referring to the same entry.
    Suppose a Mathematics examination consists of two distinct papers, i.e. an
    algebra paper and a geometry paper. The mark scored by a candidate in the
    algebra paper is one variate, and that in the geometry paper is another variate. In
    order to assess the overall performance in Mathematics, the examiner must
    consider the two corresponding marks together for each student.
    Let x denote the algebra mark and y denote the geometry mark,
    and let the number of students in the class be 15.

    The set of marks (x, y) in Table 13.3 above
     is an example of bivariate data. Each pair denotes the corresponding
     marks for one student. For example, student number 1 scored 45% in algebra
     and 57% in geometry.
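A minimal sketch of how such ordered pairs might be stored and checked, assuming Python. Only the first pair (45, 57) comes from the text; the other marks and the variable names are hypothetical and are used purely for illustration.

```python
# Algebra marks (x) and geometry marks (y), listed in the same student order.
# Only student 1's marks (45, 57) come from the text; the rest are hypothetical.
algebra = [45, 62, 70]
geometry = [57, 60, 75]

# Both variates must have the same number of entries.
assert len(algebra) == len(geometry)

# Pair the marks so that each ordered pair (x, y) refers to one student.
pairs = list(zip(algebra, geometry))

for number, (x, y) in enumerate(pairs, start=1):
    print(f"Student {number}: algebra {x}%, geometry {y}%")
```

Keeping the two lists in the same student order is what makes each pair meaningful; sorting one list on its own would break the link between x and y.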

    13.3 Review of data presentation using graphs
          
            Activity 13.3


    4. What kind of graph do you obtain? Would it make sense to join these
       points?

    To be able to represent any two variables graphically, one should first write the
    given information in co-ordinate form.

    • For an accurate graph, an appropriate scale should be chosen and used.

    • In order to analyse a graph accurately, it must be drawn accurately and
      the points joined using either:

    i) A straight line or

    ii) A curve or

    iii) A series of zigzag line segments.





    Graph 13.1

    This graph represents a curve with an increasing upward trend.

    (a) Graph 13.1 shows the required graph of y against x.

    (b) (i) when x = 2.5, y = 30

    (ii) when y = 150, x = 4.6

    (c) When y = 500, x = 7.

    Note:
    The graph in our example results in a curve.

    13.4 Representing bivariate data using scatter diagrams

    13.4.1 Definition of scatter diagrams

    Activity 13.4

    In a certain county, data relating 18- to 25-year-old drivers and fatal traffic
    accidents was gathered and recorded over a period of 3 months. The data
    was presented as in Table 13.9.
    2. Plot the points (18, 8), (18, 10), (18, 6), …, (25, 2).

    3. What patterns do the points reveal that can help you draw
       any conclusion?

    4. Do you think any group or groups of people would be
        interested in the conclusion of this activity? Explain.

    Often graphs are in the form of straight lines, curves or continuous jagged line
    segments.

    However, some graphs do not fit into any of these categories. Some graphs result in
    a scatter of points rather than a collection of points falling on a well-defined line
    or curve. Sometimes there may be an underlying linear relation between the
    variables, but the points in the graph are scattered along it rather than fall on it.
    Such a graph is called a scatter graph or scatter diagram.

    13.4.2 The line of best fit in a scatter diagram
        
              Activity 13.5
    Using the graph you obtained in Activity 13.4, draw a line as follows.
    1. Use a transparent ruler.

    2. Place the ruler on the scatter diagram in the direction of the
        trend of the points so that you can see all the points.

    3. Adjust the ruler until it appears as though it passes through the
        centre of the data (points). Then draw the line.

    Although the points in Fig 13.1 do not fall on a line, they seem to scatter in a specific
    direction. Therefore, a line that closely approximates the data can be drawn and
    used to analyse the data further. In a scatter diagram, it is not easy
    to define a relation between the two variables involved. However, the points
    may appear to point in some direction, which may be approximated by a line
    known as the line of best fit.

    Using points on such a line, we can
    analyse the given data as follows:

    i) Describe the relation between the two variables.

    ii) Find the equation of the line.

    iii) Interpret the data, i.e. read values from the graph.

    iv) Describe the trend of the graph.
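As an illustration of finding the equation of the line, the sketch below computes the gradient and y-intercept of the straight line through two points read off a line of best fit. The function name `line_through` and the two points are hypothetical; they are not taken from any table in this unit.

```python
def line_through(p, q):
    """Return (gradient, y_intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line has no finite gradient")
    gradient = (y2 - y1) / (x2 - x1)
    y_intercept = y1 - gradient * x1
    return gradient, y_intercept


# Hypothetical points read off a line of best fit.
m, c = line_through((2, 5), (6, 13))
print(f"y = {m}x + {c}")  # y = 2.0x + 1.0

# Reading a value from the line: estimate y when x = 4.
print(m * 4 + c)  # 9.0
```

Choosing two points that are far apart on the line usually reduces the effect of small reading errors on the gradient.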


    (a) Write the data in co-ordinate form.

    (b) Plot the points to obtain a scatter diagram.

    (c) Draw the line of best fit.

    (d) Identify three points on the line and use them to find the equation of the
          line.

    (e) Describe the trend of the graph.

    (f) Use the graph to estimate: (i) x
        when y = 42.0; (ii) y when x = 85.

    Use the data to draw a scatter diagram. Use the scatter diagram
    obtained to draw the line of best fit. Use your graph to estimate the shoe
    size you expect someone 171 cm tall to wear.

    4. In an experiment, different masses were attached to the end of a spring
       wire and the length of the wire was noted and recorded as in Table 13.14 below.
    (a) Obtain the scatter diagram for the data in Table 13.15 to estimate
         the line of best fit.

    (b) What velocity corresponds to 8.0 volts?

    (c) What can you say about the gradient of the line?

    (d) Find the equation of the line.

    Alternative method of obtaining the line of best fit (The median fit line)

    Activity 13.6

    Table 13.16 shows the result of a survey done on nine supermarket
    stores. It shows the amount of money spent on advertising
    each day and the corresponding amount of money earned as
    profit.


    Use Table 13.17 to draw a scatter diagram.

    Step 1

    1. Divide the scatter diagram into three regions. Each region should
        have the same number of points.

    Step 2

    1. For each region, using the table of values above,

    (i) find the median of the x co-ordinates (x - median)

    (ii) find the median of the y co-ordinates (y - median).

    (iii) Write the x- and y-medians for each region in the coordinate
          form.

    (iv) On the scatter diagram, plot the three median points.

    (v) Place the edge of a ruler between the first and third median points. If
        the middle point is not on the line formed, then slide
         the ruler about a third of the way towards the second
         point without changing the direction of the
         line. Draw the median fit line.
    (b) The median points are (20, 14), (55, 31) and (90, 49.5). They are marked with
         crosses (×) on the graph.
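The ruler procedure for the median fit line can be sketched numerically. The example below uses the three median points quoted above and assumes that "sliding the ruler a third of the way" means shifting the line, without changing its gradient, by one third of the vertical gap between it and the middle median point. The function name `median_fit_line` is hypothetical.

```python
def median_fit_line(first, middle, third):
    """Return (gradient, y_intercept) of the median fit line.

    The line through the first and third median points is shifted
    one third of the way towards the middle median point, keeping
    its gradient unchanged.
    """
    (x1, y1), (x2, y2), (x3, y3) = first, middle, third
    gradient = (y3 - y1) / (x3 - x1)
    intercept = y1 - gradient * x1
    # Vertical gap between the middle point and the line at x = x2.
    gap = y2 - (gradient * x2 + intercept)
    return gradient, intercept + gap / 3


m, c = median_fit_line((20, 14), (55, 31), (90, 49.5))
print(f"y = {m:.3f}x + {c:.3f}")  # y = 0.507x + 3.607
```

Here the middle point lies 0.75 below the line through the outer points, so the line is lowered by 0.25. A line drawn by eye with a ruler will only agree with these figures approximately.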
    • The broken lines in Graph 13.4 (b) represent the mean marks for
       the two subjects.

    • From this graph we can see that any Physics mark above 33 is
      above average score. Similarly any Mathematics mark above 40 is
      above average score.

    • We can see that the above-average scorers are the same in the two
      subjects.

    • Also, the below-average scorers are the same.

    From these observations, we can conclude that, for this sample, ability in
    Mathematics is closely associated with ability in Physics.
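The mean-mark comparison described above can be sketched in code. The marks below are hypothetical: they are not the data behind Graph 13.4 and were chosen only so that the means come out as 33 (Physics) and 40 (Mathematics), the values quoted in the text. The function name `quadrant_counts` is also an assumption.

```python
from statistics import mean


def quadrant_counts(pairs):
    """Count points in the regions formed by the two mean lines."""
    mean_x = mean(x for x, _ in pairs)
    mean_y = mean(y for _, y in pairs)
    counts = {"both above": 0, "both below": 0, "mixed": 0}
    for x, y in pairs:
        if x > mean_x and y > mean_y:
            counts["both above"] += 1
        elif x < mean_x and y < mean_y:
            counts["both below"] += 1
        else:
            # Above average in one subject only, or exactly on a mean line.
            counts["mixed"] += 1
    return mean_x, mean_y, counts


# Hypothetical (Physics, Mathematics) marks, for illustration only.
marks = [(20, 25), (25, 30), (30, 38), (40, 45), (50, 62)]
mean_x, mean_y, counts = quadrant_counts(marks)
print(mean_x, mean_y, counts)
```

When nearly all points fall in the "both above" and "both below" regions, as in this made-up sample, the two sets of marks are closely associated. Many "mixed" points would suggest a weak or negative relationship.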

    Exercise 13.3

    1. The mass (kg) and the average daily food consumption for a group of 13
       teenage boys were recorded as in Table 13.19.
    Use the table to:

    (a) Draw a scatter diagram.

    (b) Estimate the line of best fit.

    (c) Describe the slope of the line.

    2. Table 13.20 below shows marks scored by 25 students in two subjects,
        Chemistry and Social studies.
    (a) Use the data to draw a scatter diagram.

    (b) Use the method of mean marks to establish the type of relationship if
        any between the Chemistry marks and the Social studies marks.

    13.5 Correlation

          Activity 13.7

    1. Use a dictionary or the internet to find the meaning of the word
        correlation.
    Table 13.21 shows entrance test marks (x) and the corresponding
    course grade (y) for a particular year.

    2. Draw a scatter diagram for the data in Table 13.21 and draw a
        line of best fit. Comment on the suitability of the entrance test for
        the placement of the students in the respective courses.

    By definition, correlation is a mutual relationship between two or more things.
    Examples of correlation in real life include:

    • As a student’s study time increases, the test average increases too.

    • As the number of trees cut down increases, soil erosion increases too.

    • The more you exercise your muscles, the stronger they get.

    • As a child grows, so does the clothing size.

    • The more one smokes, the fewer years one is likely to live.

    • A student who has many absences has a decrease in grades scored.

    Types of correlation

    Activity 13.8

    Identify the positive and negative
    correlation situations in the following cases:

    a) The more often people have unprotected sex with different
         partners, the higher the rate of HIV in a society.

    b) The more people save from their incomes, the more financially
        stable they become.

    c) As the weather gets colder, air conditioning costs decrease.

    d) The more alcohol one consumes, the poorer one’s judgment becomes.

    e) The more one cleans the house, the less likely there are to be pest
         problems.

    Correlation can be positive or negative depending on the situation.
    When one variable increases as the other increases, we
    say there is positive correlation. When one variable decreases as the other
    increases, we say there is negative correlation.
    In a scatter diagram, we can determine whether the correlation is positive
    or negative by following the trend of the points and the gradient of the line of
    best fit.

    • If the gradient is positive, then positive correlation occurs.

    • If the gradient is negative, then negative correlation occurs.

    • If the gradient is zero, then there is no correlation between the two variables.
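The gradient rule above can be illustrated with a short sketch. Note the substitution: instead of a line of best fit drawn by eye, the code uses the gradient of the least-squares line, which is a different (calculated) way of fitting a line and is not the method used in this unit. The data sets and function names are hypothetical.

```python
def least_squares_gradient(pairs):
    """Gradient of the least-squares line through the (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all x values are equal; the gradient is undefined")
    return sxy / sxx


def describe_correlation(gradient, tolerance=1e-9):
    """Classify the correlation from the sign of the gradient."""
    if gradient > tolerance:
        return "positive correlation"
    if gradient < -tolerance:
        return "negative correlation"
    return "no correlation"


# Hypothetical data sets, for illustration only.
rising = [(1, 2), (2, 4), (3, 5), (4, 8)]
falling = [(1, 9), (2, 7), (3, 4), (4, 2)]
print(describe_correlation(least_squares_gradient(rising)))   # positive correlation
print(describe_correlation(least_squares_gradient(falling)))  # negative correlation
```

The sign of the gradient gives only the direction of the correlation. How closely the points cluster around the line still has to be judged from the scatter diagram.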

    Example 13.5

    The amounts of government bursaries allocated to certain administrative
    regions in the country in a certain year are listed together with their population sizes.

    Use the data in table 13.22 to draw a scatter diagram and assess whether
    there appears to be correlation between the two measurements labeled x and y.
    Vertical scale: 1 cm : 50 000
    Horizontal scale: 1 cm : 10 m

    Solution

    Graph 13.5 shows the scatter diagram and the line of best fit of the sets of data.
    The two measurements (x) and (y) have a correlation.

    The line of best fit in this scatter diagram follows the general trend of the
    points. This line has a positive gradient. Thus, the relation in this data is called
    a positive correlation.

    Example 13.6

    The following measurements were made and the data recorded to the nearest cm.

    (b) Draw the line of best fit for the data.

    (c) Find the equation of the line in (b) above.

    (d) Describe the correlation.

    (e) Would it be reasonable to use your graph to estimate the height of a
          father whose son is 158 cm tall?

    Solution

    Let the height of the boy be x cm and that of the father y cm. Using a scale of
    1 cm to 5 cm on both axes, mark x on the horizontal axis and y on the vertical axis.

    Exercise 13.4

    1. The set of data in Table 13.24 shows the number of vehicles and road deaths
        in some 10 countries.
    Use Table 13.24 to draw a scatter
    diagram and use it to determine whether there is a correlation
    between x and y. If yes, describe the correlation.

    2. Data were collected on the mass of a rabbit in kilograms at various ages in
         weeks (Table 13.25).
    Use the data in the table above to:
    (a) Draw a scatter diagram for the data

    (b) Draw a line of best fit

    (c) Use your graph to predict the mass of the rabbit when it is 8 weeks old.

    (d) How would you describe the correlation in this data?

    3. In laboratory research, data were collected on the mass of hens and the mass of
        their hearts, and recorded in Table 13.26.
    Use the table to:
    (a) Construct a scatter diagram for the data.

    (b) Estimate a line of best fit.

    (c) Find the equation of the line.

    (d) How can you use the equation in (c) to make predictions?

    (e) Describe the correlation in this data.

    4. In a junior cross country race the masses (kg) of sample participants
        were checked and recorded as they entered the race in Table 13.27. Their
        finishing positions in the race were also noted. Use a scatter diagram
       to determine whether there was any correlation between the sets of data.
       Give reason(s) for your answer.
    5. A basketball coach recorded the amount of time each player played
       and the number of points the player scored. Table 13.28 shows the data.
    (a) Make a scatter diagram and estimate the line of best fit.

    (b) Describe the type of correlation if any.

    (c) Use your graph to estimate how much time a player who scored
        67 points played.

    6. A production manager had 10 newly recruited workers under him.
        For one week, he kept a record of the number of times that each employee
        needed help with a task (Table 13.29).
    (a) Make a scatter diagram for the data and estimate the line of best fit.

    (b) What type of correlation is there?

    (c) What conclusion do you make from your graph? Justify your answer.

    Unit Summary

    The data in Table 13.30 below belong to the same sample.
    • Such data is called bivariate data. This data has two variates, x and y.
      – The data has ten entries.
      – Each entry has two variates, x and y.
      – This data can be presented graphically using co-ordinates
        (x, y), i.e. (46, 33), (17, 17), …, (56, 39).

    • The graph below represents the given data. Such a graph is called a scatter
      diagram. The points do not lie on a line or a curve, hence the name.
    The line drawn through the points is an approximation. It is called the line
    of best fit. It shows the general trend of the data or graph.

    • The equation of such a line can be found and used to analyse the data.
       Equation: 20y = 11x + 80.

    • The line of best fit shows that there is a relationship between the two data sets,
      though the line is an approximation. Because the line shows a clear trend,
      i.e. it has a non-zero gradient, we say the data has a correlation. Since
      the line has a positive gradient, the correlation is positive.
      If the line had a negative gradient, we would say the correlation is negative.

    • We can use the graph to estimate missing variants. For example, we
       can find:

    (i) y when x = 29; Ans y = 20

    (ii) x when y = 37; Ans x = 60

    (iii) y when x = 44; Ans y = 28
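Estimates read from the graph can be checked against the equation of the line, 20y = 11x + 80. The sketch below is one possible check, assuming Python; the function names `y_from_x` and `x_from_y` are hypothetical. On this line, x = 60 corresponds to y = 37.

```python
def y_from_x(x):
    """Estimate y from x using the line 20y = 11x + 80."""
    return (11 * x + 80) / 20


def x_from_y(y):
    """Estimate x from y by rearranging 20y = 11x + 80."""
    return (20 * y - 80) / 11


print(y_from_x(29))  # 19.95, about 20
print(y_from_x(44))  # 28.2, about 28
print(x_from_y(37))  # 60.0
```

Values read from a hand-drawn graph are approximate, so small differences from the calculated values are expected.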

    Unit 13 Test

    1. The table below shows the data collected and recorded on ten football
         players in a season.

    Use the given information to answer the following questions.

    (a) Make a scatter diagram for the data.

    (b) Estimate the line of best fit representing the data in (Table 13.31).

    (c) Use your graph to estimate:

    (i) The number of points earned by a player who scored 25 goals.

    (ii) The number of goals scored by a player who earned 45 points.

    (d) Explain how, for any line, you can tell whether the slope (gradient)
         is positive or negative.

    (e) Find the equation of the line you drew in question (b) above.

    (f) Describe the correlation in the data in this question.

    (g) What conclusion can you draw from this graph?

    2. The data below was collected from a certain supermarket in Kigali City.
        The prices are in Francs.
    (a) Calculate the average price for all commodities.

    (i) May 2016

    (ii) June 2016

    (b) Plot a scatter diagram for the two prices for all the commodities.

    (c) Draw the line of best fit for the data.

    (d) What conclusion can you draw from the scatter diagram plotted?

    3. (a) Calculate x̄ and ȳ (the means of x and y) from the data given below:
    (b) Plot a scatter diagram for the data above.

    4. The table below shows the number of students (x) and the number of days
        (y) they remained at school at the end of term one in 2017.
    (a) Make a scatter diagram for the data.

    (b) Explain how any line of best fit can be drawn.

    (c) Describe the correlation of the data.