OpenIntro Statistics PDF download

OpenIntro Statistics is a dynamic take on the traditional curriculum, used successfully everywhere from community colleges to the Ivy League. Bringing a fresh approach to intro statistics, ISRS introduces inference faster using randomization and simulation techniques. Also known as "OpenIntro Biostatistics", this book is suited for both undergraduate and graduate courses. As part of a pilot program, OpenIntro is providing desk copies for this textbook by Dr. Gregory Hartman, et al.

Free desk copies are available to Verified Teachers on openintro. To apply for verification, you may register an account on openintro.

Eight students have completed only 32, 35, 37, 30, 33, 36, 35 and 37 pages. In this chapter we always assume stationary transition probabilities. Sarmiento What is Statistics? This module presents important terms that will be used throughout the text. If you are the bsc student and currently studying the statistics subject then you might be looking for the bsc statistics notes pdf. Divide 48 pieces of clothing into 2 groups so the ratio is 1 to 3. All other files are saved as Adobe pdf files.

Solution: In our day-to-day life we may collect data in various ways, a few View Chapter 1 Assessment. Chapter 4 Exploratory Data Analysis A first look at the data. In this experimental design the change in the outcome measurement can be as- This chapter will explore the following concepts and explain how they are tested on the SAT: 1. Let X be a random variable that takes the value of 1 if the coin shows heads and the value of 0 if the coin shows tails.

If yes then you are at right page because here we have shared the BSc Statistics Chapter 1 Notes Introduction pdf - 1st yearNotes of statistics book pdf free to download for all students of Pakistan Punjab Boards and others boards. So, we have to make it in continuous class interval. Adjoint of a matrix A is T. Statistics for the behavioral and social sciences a brief course, Arthur Aron, Elaine Aron, Jan 1, , Psychology, pages.

The post is tagged and categorized under in bsc notes, bsc statistics, Education News, Notes Tags. First you need to look at descriptive statistics since you will use the descriptive statistics when making inferences. Scatterplots and Correlation 1 Chapter 4. Chapter 3 Review. Published on the Web by ndt.

Descriptive statistics is covered in one chapter (chapter 2). As used in this chapter: 1. Items produced by a manufacturing process are supposed to weigh 90 grams. The purpose of this chapter View Chapter 1 Assessment. Chapter 8 2 Test 8B 5. Chapter 1. A variable is a characteristic or attribute that can assume different values. Probability and related concepts are covered across four chapters chapters Statistical techniques are used to make many decisions that affect our lives. No matter what your career, you will make Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training.

This fact alone makes descriptive statistics preferable to either enumeration or visual presentation. Read a statistics book: The Think Stats book is available as a free PDF or in print and is a great introduction to statistics. View Chapter 1 Assessment. Chapter 1 - Introduction to Statistics. Statistics: The science of collecting, describing, and interpreting data. Let x be the number of successes observed. We have three healthy children. Still, the day that my calculus teacher got her comeuppance is a top five life moment.

The fact that I nearly failed the makeup final exam did not significantly diminish this wonderful life experience. The calculus exam incident tells you much of what you need to know about my relationship with mathematics—but not everything. Curiously, I loved physics in high school, even though physics relies very heavily on the very same calculus that I refused to do in Mrs.

Because physics has a clear purpose. I distinctly remember my high school physics teacher showing us during the World Series how we could use the basic formula for acceleration to estimate how far a home run had been hit.

Once I arrived in college, I thoroughly enjoyed probability, again because it offered insight into interesting real-life situations. That brings me to statistics which, for the purposes of this book, includes probability. I love statistics. Statistics can be used to explain everything from DNA testing to the idiocy of playing the lottery. Statistics can help us identify the factors associated with diseases like cancer and heart disease; it can help us spot cheating on standardized tests.

Statistics can even help you win on game shows. Monty Hall explained to the player that there was a highly desirable prize behind one of the doors—something like a new car—and a goat behind the other two. The idea was straightforward: the player chose one of the doors and would get the contents behind that door.

As each player stood facing the doors with Monty Hall, he or she had a 1 in 3 chance of choosing the door that would be opened to reveal the valuable prize. After the player chose a door, Monty Hall would open one of the two remaining doors, always revealing a goat. For the sake of example, assume that the player has chosen Door no. Monty would then open Door no. Two doors would still be closed, nos. If the valuable prize was behind no. But then things got more interesting: Monty would turn to the player and ask whether he would like to change his mind and switch doors from no.

Should he switch? The answer is yes. The paradox of statistics is that they are everywhere—from batting averages to presidential polls— but the discipline itself has a reputation for being uninteresting and inaccessible.
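The claim that switching is the better move can be checked by brute force. The following is a minimal simulation sketch, not anything from the book itself; the function names `play` and `win_rate`, the number of trials, and the random seed are all illustrative assumptions. It plays the game many times with each strategy and reports how often the player wins the car.

```python
import random


def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car,
    # so he always reveals a goat.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is still closed and was not picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

Under these assumptions the "stay" strategy wins about one time in three and the "switch" strategy about two times in three, which is why the answer is yes.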

Many statistics books and classes are overly laden with math and jargon. Every chapter in this book promises to answer the basic question that I asked to no effect of my high school calculus teacher: What is the point of this? This book is about the intuition. It is short on math, equations, and graphs; when they are used, I promise that they will have a clear and enlightening purpose. Meanwhile, the book is long on examples to convince you that there are great reasons to learn this stuff.

The idea for this book was born not terribly long after my unfortunate experience in Mrs. I went to graduate school to study economics and public policy. For three weeks, we learned math all day in a windowless, basement classroom (really). On one of those days, I had something very close to a career epiphany.

Our instructor was trying to teach us the circumstances under which the sum of an infinite series converges to a finite number. Stay with me here for a minute because this concept will become clear.

The three dots mean that the pattern continues to infinity. This is the part we were having trouble wrapping our heads around. One of my classmates, Will Warshauer, would have none of it, despite the impressive mathematical proof.

To be honest, I was a bit skeptical myself. How can something that is infinite add up to something that is finite? Then I got an inspiration, or more accurately, the intuition of what the instructor was trying to explain. I turned to Will and talked him through what I had just worked out in my head. Imagine that you have positioned yourself exactly 2 feet from a wall.

Now move half the distance to that wall (1 foot), so that you are left standing 1 foot away. And so on. You will gradually get pretty darn close to the wall. But you will never hit the wall, because by definition each move takes you only half the remaining distance.

In other words, you will get infinitely close to the wall but never hit it. Therein lies the insight: Even though you will continue moving forever—with each move taking you half the remaining distance to the wall—the total distance you travel can never be more than 2 feet, which is your starting distance from the wall. For mathematical purposes, the total distance you travel can be approximated as 2 feet, which turns out to be very handy for computation purposes.
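The walk toward the wall is easy to trace numerically. This is a small sketch under the setup described above (a starting distance of 2 feet, each move covering half of what remains); the function name `distance_walked` and the step counts are illustrative choices, not part of the original text.

```python
def distance_walked(steps, start=2.0):
    """Return (total distance walked, distance still remaining) after
    repeatedly moving half of the remaining distance to the wall."""
    remaining = start
    walked = 0.0
    for _ in range(steps):
        move = remaining / 2  # each move covers half of what is left
        walked += move
        remaining -= move
    return walked, remaining


for n in (1, 2, 5, 10, 30):
    walked, remaining = distance_walked(n)
    print(f"after {n:2d} moves: walked {walked:.10f} ft, {remaining:.10f} ft to go")
```

However many moves are taken, the distance walked stays below 2 feet while the gap to the wall shrinks toward zero, which is the intuition behind saying the infinite sum converges to 2.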

The point is that I convinced Will. I convinced myself. And when I do, it will probably make sense. In my experience, the intuition makes the math and other technical details more understandable—but not necessarily the other way around. The point of this book is to make the most important statistical concepts more intuitive and more accessible, not just for those of us forced to study them in windowless classrooms but for anyone interested in the extraordinary power of numbers and data.

The problem is that if the data are poor, or if the statistical techniques are used improperly, the conclusions can be wildly misleading and even potentially dangerous. Imagine that headline popping up while you are surfing the Web. According to a seemingly impressive study of 36, office workers a huge data set! Clearly we need to act on this kind of finding—perhaps some kind of national awareness campaign to prevent short breaks on the job.

Or maybe we just need to think more clearly about what many workers are doing during that ten-minute break. My professional experience suggests that many of those workers who report leaving their offices for short breaks are huddled outside the entrance of the building smoking cigarettes (creating a haze of smoke through which the rest of us have to walk in order to get in or out).

Statistics is like a high-caliber weapon: helpful when used correctly and potentially disastrous in the wrong hands. This is not a textbook, which is liberating in terms of the topics that have to be covered and the ways in which they can be explained. The book has been designed to introduce the statistical concepts with the most relevance to everyday life. How do scientists conclude that something causes cancer? How does polling work and what can go wrong? How does your credit card company use data on what you are buying to predict if you are likely to miss a payment?

Seriously, they can do that. If you want to understand the numbers behind the news and to appreciate the extraordinary and growing power of data, this is the stuff you need to know. But I have even bolder aspirations than that.

I think you might actually enjoy statistics. The underlying ideas are fabulously interesting and relevant. The key is to separate the important ideas from the arcane technical details that can get in the way.

That is Naked Statistics. Students will complain that statistics is confusing and irrelevant. Then the same students will leave the classroom and happily talk over lunch about batting averages during the summer or the windchill factor during the winter or grade point averages always.

The same data (completion rate, average yards per pass attempt, percentage of touchdown passes per pass attempt, and interception rate) could be combined in a different way, such as giving greater or lesser weight to any of those inputs, to generate a different but equally credible measure of performance. Is the quarterback rating perfect? Does it provide meaningful information in an easily accessible way? I am a Chicago Bears fan. During the playoffs, the Bears played the Packers; the Packers won.

There are a lot of ways I could describe that game, including pages and pages of analysis and raw data.

But here is a more succinct analysis. Chicago Bears quarterback Jay Cutler had a passer rating of In contrast, Green Bay quarterback Aaron Rodgers had a passer rating of That tells you a lot of what you need to know in order to understand why the Bears beat the Packers earlier in the season but lost to them in the playoffs. That is a very helpful synopsis of what happened on the field.

Does it simplify things? Yes, that is both the strength and the weakness of any descriptive statistic. The curious thing is that the same people who are perfectly comfortable discussing statistics in the context of sports or the weather or grades will seize up with anxiety when a researcher starts to explain something like the Gini index, which is a standard tool in economics for measuring income inequality.

As such, it has the strengths of most descriptive statistics, namely that it provides an easy way to compare the income distribution in two countries, or in a single country at different points in time.

The statistic can be calculated for wealth or for annual income, and it can be calculated at the individual level or at the household level. All of these statistics will be highly correlated but not identical. A country in which every household had identical wealth would have a Gini index of zero. As you can probably surmise, the closer a country is to one, the more unequal its distribution of wealth.
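To make the two endpoints concrete, here is a rough sketch of one common way to compute a Gini index: half the average absolute difference between every pair of households, divided by the mean. The text does not say which formula the published figures use, so treat this formulation, the function name `gini`, and the toy wealth figures as assumptions for illustration only.

```python
def gini(values):
    """Gini index via the mean absolute difference between all pairs.

    Returns 0 when every value is identical and approaches 1 as the
    total becomes concentrated in a single holder.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # Equivalent to abs_diff_sum / (2 * n**2 * mean), since mean = total / n.
    return abs_diff_sum / (2 * n * total)


equal_wealth = [50] * 10            # hypothetical: ten identical households
concentrated = [0] * 9 + [500]      # hypothetical: one household holds everything

print(gini(equal_wealth))   # 0.0
print(gini(concentrated))   # 0.9
```

With only ten households the fully concentrated case tops out at 0.9; as the number of households grows, the same pattern pushes the index toward one.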

The United States has a Gini index of. Once that number is put into context, it can tell us a lot. For example, Sweden has a Gini index of. We can also compare different points in time. The Gini index for the United States was. The most recent CIA data are for This tells us in an objective way that while the United States grew richer over that period of time, the distribution of wealth grew more unequal. Again, we can compare the changes in the Gini index across countries over roughly the same time period.

Inequality in Canada was basically unchanged over the same stretch. Sweden has had significant economic growth over the past two decades, but the Gini index in Sweden actually fell from.

Is the Gini index the perfect measure of inequality? Absolutely not—just as the passer rating is not a perfect measure of quarterback performance.

But it certainly gives us some valuable information on a socially significant phenomenon in a convenient format. We have also slowly backed our way into answering the question posed in the chapter title: What is the point? The point is that statistics helps us process data, which is really just a fancy name for information. Sometimes the data are trivial in the grand scheme of things, as with sports statistics.

Sometimes they offer insight into the nature of human existence, as with the Gini index. How does Netflix know what kind of movies you like? How can we figure out what substances or behaviors cause cancer, given that we cannot conduct cancer-causing experiments on humans? Does praying for surgical patients improve their outcomes? Is there really an economic benefit to getting a degree from a highly selective college or university?

What is causing the rising incidence of autism? Statistics can help answer these questions or, we hope, can soon. The world is producing more and more data, ever faster and faster. Here is a quick tour of how statistics can bring meaning to raw data.

Description and Comparison A bowling score is a descriptive statistic. So is a batting average. Most American sports fans over the age of five are already conversant in the field of descriptive statistics. We use numbers, in sports and everywhere else in life, to summarize information. How good a baseball player was Mickey Mantle?

He was a career. To a baseball fan, that is a meaningful statement, which is remarkable when you think about it, because it encapsulates an eighteen-season career. We evaluate the academic performance of high school and college students by means of a grade point average, or GPA. A letter grade is assigned a point value; typically an A is worth 4 points, a B is worth 3, a C is worth 2, and so on.
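The arithmetic behind a GPA is just an average of point values. The sketch below assumes the common scale hinted at above continues downward with a D worth 1 and an F worth 0, and that every course counts equally; both are assumptions, since schools vary, and the names `GRADE_POINTS` and `gpa` are made up for the example.

```python
# Assumed point scale: the text gives A=4, B=3, C=2; D=1 and F=0 are
# the usual continuation but are an assumption here.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa(grades):
    """Unweighted grade point average for a list of letter grades."""
    if not grades:
        raise ValueError("need at least one grade")
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


print(gpa(["A", "A", "B", "C"]))  # 3.25
```

Note what the single number throws away: nothing in the calculation knows which courses the grades came from.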

By graduation, when high school students are applying to college and college students are looking for jobs, the grade point average is a handy tool for assessing their academic potential. Someone who has a 3. That makes it a nice descriptive statistic. The GPA does not reflect the difficulty of the courses that different students may have taken.

How can we compare a student with a 3. This caused its own problems. Instead, they paid to send me to a private driving school, at nights over the summer. Was that insane? But one theme of this book will be that an overreliance on any descriptive statistic can lead to misleading conclusions, or cause undesirable behavior.

Descriptive statistics exist to simplify, which always implies some loss of nuance or detail. Anyone working with numbers needs to recognize as much.

Inference How many homeless people live on the streets of Chicago? How often do married people have sex? These may seem like wildly different kinds of questions; in fact, they both can be answered (not perfectly) by the use of basic statistical tools. One key function of statistics is to use the data we have to make informed conjectures about larger questions for which we do not have full information.

It is expensive and logistically difficult to count the homeless population in a large metropolitan area. Yet it is important to have a numerical estimate of this population for purposes of providing social services, earning eligibility for state and federal revenues, and gaining congressional representation. One important statistical practice is sampling, which is the process of gathering data for a small area, say, a handful of census tracts, and then using those data to make an informed judgment, or inference, about the homeless population for the city as a whole.

Sampling requires far fewer resources than trying to count an entire population; done properly, it can be every bit as accurate.

A political poll is one form of sampling. A research organization will attempt to contact a sample of households that are broadly representative of the larger population and ask them their views about a particular issue or candidate. This is obviously much cheaper and faster than trying to contact every household in an entire state or country. The polling and research firm Gallup reckons that a methodologically sound poll of 1, households will produce roughly the same results as a poll that attempted to contact every household in America.
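A quick simulation shows why a modest sample can stand in for a whole population. Everything specific here is a hypothetical choice for illustration: the true level of support (52 percent), the sample size (1,000 respondents), the number of repeated polls (200), and the function name `poll`. The sketch also assumes each respondent is drawn independently at random from a very large population, which real polls can only approximate.

```python
import random


def poll(population_share, sample_size, rng):
    """Share of a simple random sample that supports the candidate."""
    hits = sum(rng.random() < population_share for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(42)   # fixed seed so the run is repeatable
true_share = 0.52         # hypothetical true support in the population

# Run many independent polls to see how far a single poll tends to stray.
estimates = [poll(true_share, 1000, rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
worst_miss = max(abs(e - true_share) for e in estimates)

print(f"average of poll results: {mean_estimate:.3f}")
print(f"largest miss by any single poll: {worst_miss:.3f}")
```

The individual polls scatter around the true value by a few percentage points at most, without anyone having to contact the entire population.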

In the mids, the National Opinion Research Center at the University of Chicago carried out a remarkably ambitious study of American sexual behavior.

The results were based on detailed surveys conducted in person with a large, representative sample of American adults. If you read on, Chapter 10 will tell you what they learned. How many other statistics books can promise you that? That does not mean that they are making money at any given moment. When the bells and whistles go off, some high roller has just won thousands of dollars. The whole gambling industry is built on games of chance, meaning that the outcome of any particular roll of the dice or turn of the card is uncertain.

At the same time, the underlying probabilities for the relevant events—drawing 21 at blackjack or spinning red in roulette—are known. This turns out to be a powerful phenomenon in areas of life far beyond casinos. Many businesses must assess the risks associated with assorted adverse outcomes. However, any business facing uncertainty can manage these risks by engineering processes so that the probability of an adverse outcome, anything from an environmental catastrophe to a defective product, becomes acceptably low.

Wall Street firms will often evaluate the risks posed to their portfolios under different scenarios, with each of those scenarios weighted based on its probability. The financial crisis of was precipitated in part by a series of market events that had been deemed extremely unlikely, as if every player in a casino drew blackjack all night.

I will argue later in the book that these Wall Street models were flawed and that the data they used to assess the underlying risks were too limited, but the point here is that any model to deal with risk must have probability as its foundation. The entire insurance industry is built upon charging customers to protect them against some adverse outcome, such as a car crash or a house fire. The insurance industry does not make money by eliminating these events; cars crash and houses burn every day.

Sometimes cars even crash into houses, causing them to burn. Instead, the insurance industry makes money by charging premiums that are more than sufficient to pay for the expected payouts from car crashes and house fires.
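The premium logic comes down to an expected value: each possible payout multiplied by its probability, summed up. The probabilities, claim costs, and premium below are invented purely for illustration, as is the function name `expected_payout`; real insurers estimate these from claims data.

```python
def expected_payout(risks):
    """Expected annual payout for one policy, given (probability, cost) pairs."""
    return sum(probability * cost for probability, cost in risks)


# Hypothetical risks for a single policyholder in one year.
risks = [
    (0.02, 10_000),    # assumed 2% chance of a $10,000 car crash claim
    (0.001, 200_000),  # assumed 0.1% chance of a $200,000 house fire claim
]

expected = expected_payout(risks)
premium = 500          # hypothetical annual premium
margin = premium - expected

print(f"expected payout per policy: ${expected:,.2f}")
print(f"margin per policy at a ${premium} premium: ${margin:,.2f}")
```

Any one policyholder may cost the company far more than the premium, but across many policies the average payout should settle near the expected value, so a premium above that figure leaves a margin.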

The insurance company may also try to lower its expected payouts by encouraging safe driving, fences around swimming pools, installation of smoke detectors in every bedroom, and so on. Probability can even be used to catch cheats in some situations.

The mathematical logic stems from the fact that we cannot learn much when a large group of students all answer a question correctly. But when those same test takers get an answer wrong, they should not all consistently have the same wrong answer.

If they do, it suggests that they are copying from one another or sharing answers via text. Of course, you can see the limitations of using probability. A large group of test takers might have the same wrong answers by coincidence; in fact, the more schools we evaluate, the more likely it is that we will observe such patterns just as a matter of chance.

A statistical anomaly does not prove wrongdoing. We cannot arrest Mr. Kinney for fraud on the basis of that calculation alone (though we might inquire whether he has any relatives who work for the state lottery). Probability is one weapon in an arsenal that requires good judgment. We have an answer for that question, but the process of answering it was not nearly as straightforward as one might think. The scientific method dictates that if we are testing a scientific hypothesis, we should conduct a controlled experiment in which the variable of interest e.

If we observe a marked difference in some outcome between the two groups e. We cannot do that kind of experiment on humans. If our working hypothesis is that smoking causes cancer, it would be unethical to assign recent college graduates to two groups, smokers and nonsmokers, and then see who has cancer at the twentieth reunion.

Smokers and nonsmokers are likely to be different in ways other than their smoking behavior. For example, smokers may be more likely to have other habits, such as drinking heavily or eating badly, that cause adverse health outcomes.

If the smokers are particularly unhealthy at the twentieth reunion, we would not know whether to attribute this outcome to smoking or to other unhealthy things that many smokers happen to do. We would also have a serious problem with the data on which we are basing our analysis. Smokers who have become seriously ill with cancer are less likely to attend the twentieth reunion.

As a result, any analysis of the health of the attendees at the twentieth reunion (related to smoking or anything else) will be seriously flawed by the fact that the healthiest members of the class are the most likely to show up.

The further the class gets from graduation, say, a fortieth or a fiftieth reunion, the more serious this bias will be. We cannot treat humans like laboratory rats. As a result, statistics is a lot like good detective work. The data yield clues and patterns that can ultimately lead to meaningful conclusions.

You have probably watched one of those impressive police procedural shows like CSI: New York in which very attractive detectives and forensic experts pore over minute clues—DNA from a cigarette butt, teeth marks on an apple, a single fiber from a car floor mat—and then use the evidence to catch a violent criminal.

The appeal of the show is that these experts do not have the conventional evidence used to find the bad guy, such as an eyewitness or a surveillance videotape. So they turn to scientific inference instead. Statistics does basically the same thing. The data present unorganized clues—the crime scene. Statistical analysis is the detective work that crafts the raw data into some meaningful conclusion. After Chapter 11, you will appreciate the television show I hope to pitch: CSI: Regression Analysis, which would be only a small departure from those other action-packed police procedurals.

When you read in the newspaper that eating a bran muffin every day will reduce your chances of getting colon cancer, you need not fear that some unfortunate group of human experimental subjects has been force-fed bran muffins in the basement of a federal laboratory somewhere while the control group in the next building gets bacon and eggs.

Instead, researchers will gather detailed information on thousands of people, including how frequently they eat bran muffins, and then use regression analysis to do two crucial things: 1 quantify the association observed between eating bran muffins and contracting colon cancer e.

Of course, CSI: Regression Analysis will star actors and actresses who are much better looking than the academics who typically pore over such data. What individuals are most likely to become terrorists?

Olympic beach volleyball team. When she gets the printout from her statistical analysis, she sees exactly what she has been looking for: a large and statistically significant relationship in her data set between some variable that she had hypothesized might be important and the onset of autism. She must share this breakthrough immediately! The researcher takes the printout and runs down the hall, slowed somewhat by the fact that she is wearing high heels and a relatively small, tight black skirt.

She finds her male partner, who is inexplicably fit and tan for a guy who works fourteen hours a day in a basement computer lab, and shows him the results. Together the regression analysis experts walk briskly to see their boss, a grizzled veteran who has overcome failed relationships and a drinking problem.

Just about every social challenge that we care about has been informed by the systematic analysis of large data sets. In many cases, gathering the relevant data, which is expensive and time-consuming, plays a crucial role in this process (as will be explained in Chapter 7). I may have embellished my characters in CSI: Regression Analysis but not the kind of significant questions they could examine.

There is an academic literature on terrorists and suicide bombers, a subject that would be difficult to study by means of human subjects (or lab rats, for that matter). One such book, What Makes a Terrorist, was written by one of my graduate school statistics professors. The book draws its conclusions from data gathered on terrorist attacks around the world.

A sample finding: Terrorists are not desperately poor, or poorly educated. Well, that exposes one of the limitations of regression analysis. We can isolate a strong association between two variables by using statistical analysis, but we cannot necessarily explain why that relationship exists, and in some cases, we cannot know for certain that the relationship is causal, meaning that a change in one variable is really causing a change in the other.

In the case of terrorism, Professor Krueger hypothesizes that since terrorists are motivated by political goals, those who are most educated and affluent have the strongest incentive to change society. These individuals may also be particularly rankled by suppression of freedom, another factor associated with terrorism. This discussion leads me back to the question posed by the chapter title: What is the point? The point is not to do math, or to dazzle friends and colleagues with advanced statistical techniques.

The point is to learn things that inform our lives. As a result, there are numerous reasons that intellectually honest individuals may disagree about statistical results or their implications.

At the most basic level, we may disagree on the question that is being answered. As the next chapter will point out, more socially significant questions fall prey to the same basic challenge. What is happening to the economic health of the American middle class? Nor can we create two identical nations —except that one is highly repressive and the other is not—and then compare the number of suicide bombers that emerge in each. Even when we can conduct large, controlled experiments on human beings, they are neither easy nor cheap.

Researchers did a large-scale study on whether or not prayer reduces postsurgical complications, which was one of the questions raised earlier in this chapter. We conduct statistical analysis using the best data and methodologies and resources available. Statistical analysis is more like good detective work (hence the commercial potential of CSI: Regression Analysis).

Smart and honest people will often disagree about what the data are trying to tell us. But who says that everyone using statistics is smart or honest? As mentioned, this book began as an homage to How to Lie with Statistics, which was first published in and has sold over a million copies.

The reality is that you can lie with statistics. Or you can make inadvertent errors. In either case, the mathematical precision attached to statistical analysis can dress up some serious nonsense.

This book will walk through many of the most common statistical errors and misrepresentations so that you can recognize them, not put them to use. So, to return to the title chapter, what is the point of learning statistics? To summarize huge quantities of data. To make better decisions.

To answer important social questions. To recognize patterns that can refine how we do everything from selling diapers to catching criminals. To catch cheaters and prosecute criminals.

To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations. And to spot the scoundrels who use these very same powerful tools for nefarious ends. If you can do all of that while looking great in a Hugo Boss suit or a short black skirt, then you might also be the next star of CSI: Regression Analysis.

In that case, the United States would have a Gini Index of The first question is profoundly important. It tends to be at the core of presidential campaigns and other social movements. The second question is trivial (in the literal sense of the word), but baseball enthusiasts can argue about it endlessly. What the two questions have in common is that they can be used to illustrate the strengths and limitations of descriptive statistics, which are the numbers and calculations we use to summarize raw data.

That would be raw data, and it would take a while to digest, given that Jeter has played seventeen seasons with the New York Yankees and taken 9, at bats.

Or I can just tell you that at the end of the season Derek Jeter had a career batting average of. It is easy to understand, elegant in its simplicity, and limited in what it can tell us. Baseball experts have a bevy of descriptive statistics that they consider to be more valuable than the batting average. I called Steve Moyer, president of Baseball Info Solutions (a firm that provides a lot of the raw data for the Moneyball types), to ask him, (1) What are the most important statistics for evaluating baseball talent?

Ideally we would like to find the economic equivalent of a batting average, or something even better. We would like a simple but accurate measure of how the economic well-being of the typical American worker has been changing in recent years. Are the people we define as middle class getting richer, poorer, or just running in place?

Per capita income is a simple average: total income divided by the size of the population. Congratulations to us. There is just one problem. My quick calculation is technically correct and yet totally wrong in terms of the question I set out to answer.

To begin with, the figures above are not adjusted for inflation. Per capita income merely takes all of the income earned in the country and divides by the number of people, which tells us absolutely nothing about who is earning how much of that income—in or in As the Occupy Wall Street folks would point out, explosive growth in the incomes of the top 1 percent can raise per capita income significantly without putting any more money in the pockets of the other 99 percent.
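A toy economy makes the problem visible. In this sketch, which uses entirely made-up incomes and a hypothetical helper named `per_capita`, ninety-nine people earn the same amount in both periods while one top earner's income grows fivefold.

```python
def per_capita(incomes):
    """Simple average: total income divided by the number of people."""
    return sum(incomes) / len(incomes)


# Hypothetical economy of 100 people.
others = [50_000] * 99                 # the other 99 percent, unchanged
before = others + [1_000_000]          # top earner in the first period
after = others + [5_000_000]           # top earner in the second period

per_capita_before = per_capita(before)
per_capita_after = per_capita(after)

print(per_capita_before)  # 59500.0
print(per_capita_after)   # 99500.0
```

Per capita income jumps by two-thirds even though ninety-nine of the hundred people have exactly the same income as before.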

In other words, average income can go up without helping the average American. As with the baseball statistic query, I have sought outside expertise on how we ought to measure the health of the American middle class. From baseball to income, the most basic task when working with data is to summarize a great deal of information. There are some million residents in the United States.

A spreadsheet with the name and income history of every American would contain all the information we could ever want about the economic health of the country—yet it would also be so unwieldy as to tell us nothing at all. The irony is that more data can often present less clarity. So we simplify. We perform calculations that reduce a complex array of data into a handful of numbers that describe those data, just as we might encapsulate a complex, multifaceted Olympic gymnastics performance with one number: 9.

The good news is that these descriptive statistics give us a manageable and meaningful summary of the underlying phenomenon. The bad news is that any simplification invites abuse. Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading.

You have finished reading about day seven of the marriage when your boss shows up with two enormous files of data. One file has warranty claim information for each of the 57, laser printers that your firm sold last year. For each printer sold, the file documents the number of quality problems that were reported during the warranty period. The other file has the same information for each of the , laser printers that your chief competitor sold during the same stretch.

In this case, we want to know the average number of quality problems per printer sold for your firm and for your competitor. You would simply tally the total number of quality problems reported for all printers during the warranty period and then divide by the total number of printers sold.
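The tally-and-divide calculation described above might look like this. It is a minimal sketch: the function name and the two short lists are hypothetical stand-ins for the warranty files, with one entry per printer sold.

```python
def average_problems_per_printer(problems_per_printer):
    """Mean number of quality problems per printer sold.

    `problems_per_printer` holds one number per printer: how many quality
    problems were reported for it during the warranty period.
    """
    total_problems = sum(problems_per_printer)
    printers_sold = len(problems_per_printer)
    return total_problems / printers_sold


# Hypothetical records, far smaller than the real files would be.
our_firm = [0, 0, 1, 0, 3, 0, 2, 0, 0, 4]
competitor = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

print(average_problems_per_printer(our_firm))    # 1.0
print(average_problems_per_printer(competitor))  # 0.7
```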

Remember, the same printer can have multiple problems while under warranty. You would do that for each firm, creating an important descriptive statistic: the average number of quality problems per printer sold. That was easy. Or maybe not.

Bill Gates walks into the bar with a talking parrot perched on his shoulder. The parrot has nothing to do with the example, but it kind of spices things up. Obviously none of the original ten drinkers is any richer (though it might be reasonable to expect Bill Gates to buy a round or two).

The sensitivity of the mean to outliers is why we should not gauge the economic health of the American middle class by looking at per capita income. Because there has been explosive growth in incomes at the top end of the distribution—CEOs, hedge fund managers, and athletes like Derek Jeter—the average income in the United States could be heavily skewed by the megarich, making it look a lot like the bar stools with Bill Gates at the end. The median is the point that divides a distribution in half, meaning that half of the observations lie above the median and half lie below.

If there is an even number of observations, the median is the midpoint between the two middle observations. If you literally envision lining up the bar patrons on stools in ascending order of their incomes, the income of the guy sitting on the sixth stool represents the median income for the group.

If Warren Buffett comes in and sits down on the twelfth stool next to Bill Gates, the median still does not change.
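The bar-stool reasoning can be sketched in a few lines. The income figures here are assumptions chosen for illustration (ten drinkers at 35,000 each, and a round 1,000,000,000 for each billionaire); only the stool counts come from the passage above.

```python
def median(values):
    """Middle observation, or the midpoint of the two middle observations."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Assumed incomes, for illustration only.
drinkers = [35_000] * 10
gates = 1_000_000_000
buffett = 1_000_000_000

with_gates = drinkers + [gates]     # 11 stools: the median is the sixth stool
with_both = with_gates + [buffett]  # 12 stools: midpoint of sixth and seventh

print(median(with_gates))  # 35000
print(median(with_both))   # 35000.0

# The mean, by contrast, is dragged far above what any original drinker earns.
mean_with_gates = sum(with_gates) / len(with_gates)
print(mean_with_gates > 90_000_000)  # True
```

With an even number of patrons the function returns a float, but the value is the same: the median does not move when another billionaire sits down.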

The number of quality problems per printer is arrayed along the bottom; the height of each bar represents the percentage of printers sold with that number of quality problems. Because the distribution includes all possible quality outcomes, including zero defects, the proportions must sum to 1 (or 100 percent). The distribution is slightly skewed to the right by the small number of printers with many reported quality defects.
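A frequency distribution like the one described can be built by counting how many printers had each number of problems and dividing by the total. This is a sketch on a made-up sample of ten printers, not the data behind the figure.

```python
from collections import Counter
import math


def frequency_distribution(problems_per_printer):
    """Map each problem count to the proportion of printers with that count."""
    counts = Counter(problems_per_printer)
    total = len(problems_per_printer)
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical sample: most printers have no problems, one has many.
sample = [0, 0, 0, 0, 0, 1, 1, 1, 2, 7]
dist = frequency_distribution(sample)

print(dist)  # {0: 0.5, 1: 0.3, 2: 0.1, 7: 0.1}

# Every printer falls into exactly one bar, so the proportions sum to 1.
print(math.isclose(sum(dist.values()), 1.0))  # True
```

The single printer with seven problems is the kind of observation that pulls the right tail out and nudges the mean above the median.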

These outliers move the mean slightly rightward but have no impact on the median. With a few keystrokes, you get the result. Because the Kardashian marriage is getting monotonous, and because you are intrigued by this finding, you print a frequency distribution for your own quality problems.

These outliers inflate the mean but not the median. More important from a production standpoint, you do not need to retool the whole manufacturing process; you need only figure out where the egregiously low-quality printers are coming from and fix that. Meanwhile, the median has some useful relatives.

The distribution can be further divided into quarters, or quartiles. The first quartile consists of the bottom 25 percent of the observations; the second quartile consists of the next 25 percent of the observations; and so on. Or the distribution can be divided into deciles, each with 10 percent of the observations.
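Dividing a distribution into quartiles or deciles amounts to sorting the observations and cutting them into equal-sized groups. The helper below is a hypothetical sketch run on twenty invented observations; it assumes the group count divides the data reasonably evenly.

```python
def split_into_groups(values, n_groups):
    """Sort the observations and cut them into n_groups consecutive groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        ordered[i * n // n_groups:(i + 1) * n // n_groups]
        for i in range(n_groups)
    ]


# Twenty hypothetical observations: the values 1 through 20.
observations = list(range(1, 21))

quartiles = split_into_groups(observations, 4)
deciles = split_into_groups(observations, 10)

print(quartiles[0])  # [1, 2, 3, 4, 5]  (bottom 25 percent)
print(quartiles[1])  # [6, 7, 8, 9, 10] (next 25 percent)
print(deciles[-1])   # [19, 20]         (top 10 percent)
```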

If your income is in the top decile of the American income distribution, you would be earning more than 90 percent of your fellow workers. We can go even further and divide the distribution into hundredths, or percentiles. The benefit of these kinds of descriptive statistics is that they describe where a particular observation lies compared with everyone else. If I tell you that your child scored in the 3rd percentile on a reading comprehension test, you should know immediately that the family should be logging more time at the library.

If the test was easy, then most test takers will have a high number of answers correct, but your child will have fewer correct than most of the others.
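A percentile describes position relative to everyone else, which is why the difficulty of the test does not matter. The sketch below uses one common convention (the share of scores strictly below the given score); other definitions exist, and the scores are invented.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores that fall strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical hard test: 100 test takers with 1 to 100 answers correct.
hard_test = list(range(1, 101))
print(percentile_rank(4, hard_test))   # 3.0

# Hypothetical easy test: everyone gets 50 more answers correct.
easy_test = [s + 50 for s in hard_test]
print(percentile_rank(54, easy_test))  # 3.0
```

The absolute score changes from 4 to 54, yet the child's standing compared with the other test takers is the same.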

Here is a good point to introduce some useful terminology. If I shoot 83 for eighteen holes of golf, that is an absolute figure.


