anvaya prep


High Yield · Medium · 20 min read

Data comparison

A complete SAT guide to Data comparison — covering key concepts, exam-focused explanations, and high-yield FAQs.

Overview

Data comparison is a fundamental skill tested extensively on the SAT math section, requiring students to analyze, interpret, and draw conclusions from multiple data sets or representations. This topic encompasses the ability to compare statistical measures, evaluate trends across different groups, and make informed judgments about relationships between data sets. On the SAT, data comparison questions appear in various formats, including tables, charts, graphs, and written descriptions, often requiring students to synthesize information from multiple sources simultaneously.

SAT data comparison matters because these questions recur throughout the Problem Solving and Data Analysis domain and frequently combine with other statistical concepts like measures of center, spread, and data interpretation. Mastering data comparison enables students to tackle multi-step problems that integrate algebra, statistics, and logical reasoning. These questions often bridge basic statistical literacy and more advanced analytical thinking, which makes them important for students aiming at scores in the upper percentile ranges.

Data comparison connects directly to broader mathematical concepts including ratios, percentages, functions, and algebraic reasoning. Students who excel at comparing data sets demonstrate proficiency in pattern recognition, proportional thinking, and critical analysis, skills that extend beyond statistics into virtually every other SAT math domain. This topic also serves as the foundation for understanding experimental design, sampling methods, and the validity of statistical claims, all of which appear regularly on the exam.

Learning Objectives

  • Identify key features of data comparison including measures of center, spread, and distribution shape
  • Explain how data comparison appears on the SAT through various question formats and data representations
  • Apply data comparison to answer SAT-style questions involving multiple data sets and statistical measures
  • Evaluate the relative differences between two or more data sets using appropriate statistical reasoning
  • Synthesize information from multiple graphical and tabular representations to draw valid conclusions
  • Determine which statistical measure (mean, median, range, standard deviation) is most appropriate for comparing specific data sets
  • Analyze how changes in data affect comparative relationships between groups

Prerequisites

  • Basic statistical measures: Understanding mean, median, mode, and range is essential because data comparison requires calculating and interpreting these values across multiple sets
  • Reading graphs and tables: Proficiency in extracting information from bar graphs, line graphs, scatterplots, and tables is necessary since SAT questions present data in various visual formats
  • Percentages and ratios: These concepts underpin many comparison calculations, particularly when determining relative differences between groups
  • Basic algebra: Solving equations and manipulating variables appears frequently when comparing data sets mathematically
  • Number sense: Understanding magnitude, estimation, and relative size helps quickly evaluate which data set has larger or smaller values

Why This Topic Matters

Data comparison skills extend far beyond standardized testing into everyday decision-making and professional contexts. Consumers compare product ratings, prices, and reviews; medical professionals compare treatment outcomes across patient groups; business analysts compare sales figures across regions and time periods. The ability to critically evaluate competing claims based on data has become essential in an information-rich society where statistical arguments appear in news media, advertising, and public policy debates.

On the SAT specifically, data comparison questions appear frequently: students can expect 3-5 dedicated questions per test, with additional questions incorporating comparison elements within broader statistical contexts. These questions appear in both multiple-choice and student-produced response formats, with difficulty ranging from straightforward two-set comparisons to complex multi-variable analyses. The College Board places data comparison in the "Problem Solving and Data Analysis" domain, which made up approximately 29% of the math section (17 of 58 questions) on the paper-based SAT and accounts for roughly 15% of the 44 math questions on the digital SAT.

Common SAT presentations include comparing box plots showing different distributions, analyzing two-way tables to identify group differences, evaluating scatterplots with multiple data series, and interpreting survey results across demographic categories. Questions often ask students to identify which group has a higher mean, which data set shows greater variability, or whether observed differences are meaningful given the context. The integration of data comparison with real-world scenarios—such as comparing student test scores, analyzing scientific experiments, or evaluating business metrics—makes this topic particularly high-yield for exam preparation.

Core Concepts

Comparing Measures of Center

The measures of center—mean, median, and mode—provide different perspectives on typical values within data sets, and comparing these measures across groups forms the foundation of data comparison. When comparing means, students must recognize that the mean represents the arithmetic average and is sensitive to extreme values (outliers). If Data Set A has a mean of 75 and Data Set B has a mean of 82, Set B has a higher average value by 7 units, representing approximately a 9.3% increase.

The median comparison proves particularly valuable when data sets contain outliers or skewed distributions. Unlike the mean, the median represents the middle value when data is ordered, making it resistant to extreme values. On the SAT, questions frequently ask which measure better represents "typical" values, requiring students to evaluate data distribution characteristics before selecting the appropriate comparison metric.

Measure | Sensitivity to Outliers | Best Used When | Comparison Interpretation
Mean | High | Data is symmetric | Represents average difference
Median | Low | Data is skewed or has outliers | Represents typical value difference
Mode | None | Identifying most common value | Shows frequency difference
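
To see why the table rates the mean as highly sensitive and the median as resistant, the following minimal Python sketch adds one extreme low score to a small data set and measures how far each statistic moves. The scores are hypothetical values invented for illustration; they are not taken from an actual SAT question.

```python
from statistics import mean, median

# Hypothetical scores, invented for illustration only.
scores = [70, 72, 75, 78, 80]
scores_with_outlier = scores + [10]  # add one extreme low value

# How far does each measure of center move when the outlier is added?
mean_shift = mean(scores_with_outlier) - mean(scores)
median_shift = median(scores_with_outlier) - median(scores)

print(f"mean:   {mean(scores):.1f} -> {mean(scores_with_outlier):.1f}")
print(f"median: {median(scores):.1f} -> {median(scores_with_outlier):.1f}")
```

With these values the mean drops by more than ten points while the median moves by only 1.5, which is why the median is the safer basis for comparison when an outlier is present.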

Comparing Measures of Spread

Measures of spread quantify variability within data sets, and comparing spread is crucial for understanding data consistency and reliability. The range (maximum minus minimum) provides the simplest spread measure but is highly sensitive to outliers. When comparing ranges, a larger range indicates greater variability between extreme values. For example, if Class A's test scores range from 60-95 (range = 35) and Class B's scores range from 70-90 (range = 20), Class B demonstrates more consistent performance despite potentially similar means.

The interquartile range (IQR), calculated as Q3 - Q1, measures the spread of the middle 50% of data and resists outlier influence. Comparing IQRs reveals which data set has more consistent central values. The standard deviation quantifies average distance from the mean; larger standard deviations indicate greater variability. On the SAT, students must recognize that two data sets can have identical means but vastly different spreads, fundamentally changing their comparative interpretation.
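
The point about identical means with different spreads can be checked with a short Python sketch. The two classes below are hypothetical data sets chosen to share a mean of 80. The quartile helper assumes the median-of-halves convention (the median is excluded from each half when the count is odd); other quartile conventions can give slightly different IQR values.

```python
from statistics import mean, median, pstdev

def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    s = sorted(data)
    half = len(s) // 2          # the middle value is left out when the count is odd
    return median(s[:half]), median(s[-half:])

def spread_summary(data):
    """Collect the mean plus three measures of spread for one data set."""
    q1, q3 = quartiles(data)
    return {
        "mean": mean(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "std_dev": pstdev(data),  # population standard deviation
    }

# Hypothetical scores, invented for illustration only.
class_a = [60, 70, 80, 90, 100]
class_b = [76, 78, 80, 82, 84]

summary_a = spread_summary(class_a)
summary_b = spread_summary(class_b)
print(summary_a)
print(summary_b)
```

Both hypothetical classes average 80, yet Class A's range, IQR, and standard deviation are each five times as large as Class B's, so Class B is the more consistent group.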

Comparing Distributions Using Visual Representations

Box plots (box-and-whisker plots) enable efficient visual comparison of multiple distributions simultaneously. When comparing box plots, students should examine: (1) the median line position within each box, (2) the box lengths (representing IQR), (3) the whisker lengths (showing range), and (4) the presence of outliers (typically marked as individual points). A box plot positioned higher on the scale indicates generally larger values, while a longer box indicates greater variability in the middle 50% of data.
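
A box plot is a picture of the five-number summary, so comparing two box plots amounts to comparing those five values. The sketch below computes them for two hypothetical groups (values invented for illustration), again assuming the median-of-halves convention for quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    s = sorted(data)
    half = len(s) // 2          # the middle value is left out when the count is odd
    return min(s), median(s[:half]), median(s), median(s[-half:]), max(s)

# Hypothetical data, invented for illustration only.
group_x = [52, 60, 64, 70, 75, 81, 90]
group_y = [66, 68, 71, 72, 74, 76, 79]

summary_x = five_number_summary(group_x)
summary_y = five_number_summary(group_y)

for name, (lo, q1, med, q3, hi) in (("X", summary_x), ("Y", summary_y)):
    print(f"Group {name}: median={med}, box length (IQR)={q3 - q1}, range={hi - lo}")
```

Here Group Y has the higher median line (72 versus 70) but the shorter box (IQR of 8 versus 21), so its box plot would sit slightly higher and be noticeably more compact.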

Histograms display frequency distributions and allow comparison of distribution shapes. When comparing histograms, students should identify whether distributions are symmetric, skewed left, skewed right, uniform, or bimodal. The shape comparison reveals fundamental differences: a right-skewed distribution has mean > median, while a left-skewed distribution has mean < median. SAT questions often present two histograms and ask which group has higher variability or which measure of center would be most affected by the distribution shape.

Comparing Categorical Data in Two-Way Tables

Two-way tables (contingency tables) organize data by two categorical variables, enabling comparison across groups. When analyzing two-way tables for comparison purposes, students should calculate and compare:

  1. Row percentages: Divide each cell by its row total to compare how a characteristic distributes within each category
  2. Column percentages: Divide each cell by its column total to compare how categories distribute within each characteristic
  3. Marginal distributions: Compare row and column totals to identify overall group differences
  4. Conditional probabilities: Calculate the probability of one event given another has occurred, then compare across conditions

For example, given a table showing survey responses (Agree/Disagree) by grade level (9th/10th), calculating the percentage who agree within each grade enables direct comparison of opinion differences between grades.
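
As a rough illustration of the row-percentage step, the sketch below uses a two-way table of Agree/Disagree responses by grade. The counts are hypothetical and the group sizes are deliberately unequal, to show how raw counts and percentages can point in different directions.

```python
# Hypothetical counts, invented for illustration; group sizes differ on purpose.
table = {
    "9th":  {"Agree": 60, "Disagree": 90},
    "10th": {"Agree": 48, "Disagree": 32},
}

def row_percentages(two_way):
    """Divide each cell by its row total and express the result as a percent."""
    result = {}
    for group, counts in two_way.items():
        total = sum(counts.values())
        result[group] = {label: 100 * n / total for label, n in counts.items()}
    return result

pct = row_percentages(table)
for group, shares in pct.items():
    print(group, shares)
```

In this made-up table more 9th graders agree in raw count (60 versus 48), but the agreement rate is higher among 10th graders (60% versus 40%), which is the comparison the SAT usually wants.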

Comparing Trends and Rates of Change

When comparing linear relationships or trends across data sets, students must evaluate slopes, y-intercepts, and correlation strengths. A steeper slope indicates a faster rate of change. If Company A's revenue increases by $5,000 per month (slope = 5000) and Company B's increases by $3,000 per month (slope = 3000), Company A demonstrates faster growth despite potentially different starting values.

Scatterplot comparison requires evaluating correlation direction (positive/negative), strength (how closely points cluster around a line), and the presence of outliers. When comparing two scatterplots, students should identify which shows stronger correlation (points closer to a line) and which has a steeper trend line. The SAT frequently presents scenarios where one group shows a strong positive correlation while another shows weak or negative correlation, requiring students to interpret the practical meaning of these differences.
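
Slope and correlation strength are separate properties, and a small calculation can make that concrete. The sketch below fits a least-squares line and computes the correlation coefficient r by hand for two hypothetical revenue series (in thousands of dollars per month, values invented for illustration). The SAT does not require computing r; this is only a way to see what "steeper" and "stronger" each measure.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return the least-squares slope and the correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical monthly revenue in thousands of dollars, invented for illustration.
months = [1, 2, 3, 4, 5]
company_a = [12, 17, 21, 28, 32]   # grows faster, with some scatter
company_b = [20, 23, 26, 29, 32]   # grows slower, perfectly linear

slope_a, r_a = slope_and_r(months, company_a)
slope_b, r_b = slope_and_r(months, company_b)
print(f"Company A: slope={slope_a:.1f}, r={r_a:.3f}")
print(f"Company B: slope={slope_b:.1f}, r={r_b:.3f}")
```

In this example Company A has the steeper trend line while Company B has the stronger correlation, so "which grows faster" and "which relationship is tighter" have different answers.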

Comparing Proportions and Percentages

Proportional comparison forms the basis of many SAT data questions. When comparing percentages across groups, students must distinguish between absolute differences and relative differences. If Group A has a 40% success rate and Group B has a 60% success rate, the absolute difference is 20 percentage points, but in relative terms Group B's success rate is 50% higher than Group A's (calculated as (60-40)/40 = 0.5 = 50%).

The SAT frequently tests whether students can identify which comparison method is appropriate for the question context. Questions asking "how much more" typically require absolute differences, while questions asking "what percent more" or "how many times greater" require relative comparisons. This distinction proves critical for avoiding common calculation errors.
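
The two comparison types can be written as two small functions. This is a minimal sketch using the 40% and 60% success rates from the example above; the function names are illustrative, not standard terminology from the test.

```python
def absolute_difference(base_pct, other_pct):
    """Difference in percentage points (answers 'how much more')."""
    return other_pct - base_pct

def relative_difference(base_pct, other_pct):
    """Percent by which other_pct exceeds base_pct (answers 'what percent more')."""
    return (other_pct - base_pct) / base_pct * 100

group_a, group_b = 40, 60  # success rates in percent

points = absolute_difference(group_a, group_b)
percent_more = relative_difference(group_a, group_b)
print(f"{points} percentage points, {percent_more:.0f}% higher")
```

Note that the relative difference depends on which group is treated as the base: dividing by Group A's rate answers "how much higher is B than A," which is the usual reading of "what percent more."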

Concept Relationships

Data comparison concepts form an interconnected web where understanding one element enhances comprehension of others. Measures of center (mean, median, mode) connect directly to measures of spread (range, IQR, standard deviation) because together they provide complete distributional information—center tells "where" the data is located, while spread tells "how dispersed" it is. This relationship becomes crucial when comparing data sets: two groups might have identical means but completely different spreads, fundamentally changing the comparison's interpretation.

Visual representations (box plots, histograms, scatterplots) serve as the bridge between raw data and statistical measures. Box plots directly display median, quartiles, and range, making them ideal for comparing distributions visually before calculating numerical measures. Histograms reveal distribution shape, which determines whether mean or median provides better comparison basis. This creates the relationship: Distribution Shape → Appropriate Measure Selection → Valid Comparison.

Two-way tables connect to proportional reasoning and conditional probability, as comparing groups within tables requires calculating percentages and ratios. This relationship extends to rate of change comparison when tables show data across time periods, creating the pathway: Categorical Data → Proportional Analysis → Trend Comparison.

The prerequisite knowledge of basic statistics, percentages, and graph reading feeds directly into all comparison concepts. Without understanding what a mean represents, students cannot meaningfully compare means across groups. Similarly, percentage fluency enables proportional comparisons, while graph literacy allows extraction of comparison-relevant information from visual displays. This creates a hierarchical structure: Basic Statistical Literacy → Single Data Set Analysis → Multi-Set Comparison → Complex Comparative Reasoning.


High-Yield Facts

  • When comparing data sets with outliers, median provides a more reliable comparison than mean because it resists extreme value influence
  • Two data sets can have identical means but vastly different standard deviations, indicating different levels of consistency
  • In a right-skewed distribution, mean > median; in a left-skewed distribution, mean < median; this affects which measure better represents "typical" values
  • When comparing percentages, distinguish between absolute difference (percentage points) and relative difference (percent change)
  • A larger interquartile range (IQR) indicates greater variability in the middle 50% of data, suggesting less consistent performance
  • Range is calculated as maximum minus minimum and provides the simplest measure of spread but is highly sensitive to outliers
  • When comparing box plots, the median is shown by the line inside the box, not the center of the box itself
  • Standard deviation measures average distance from the mean; comparing standard deviations reveals which data set has more variability
  • In two-way tables, calculating row percentages enables comparison of how characteristics distribute within each category
  • A steeper slope in a linear relationship indicates a faster rate of change when comparing trends across groups
  • Correlation strength (how closely points cluster around a line) differs from correlation direction (positive or negative)
  • When comparing proportions, ensure denominators are appropriate for the comparison being made (part-to-whole vs. part-to-part)

Common Misconceptions

Misconception: A larger range always means the data set has greater overall variability → Correction: Range only measures the distance between the two extreme values and can be inflated by a single outlier; standard deviation and IQR are better measures of overall variability because they consider all data points or the middle 50%, respectively.

Misconception: If Data Set A has a mean of 50 and Data Set B has a mean of 100, Set B's values are always larger than Set A's values → Correction: Means represent averages, not individual values; Set A could contain values ranging from 0-100 while Set B contains values from 90-110, meaning some Set A values exceed some Set B values despite the lower mean.

Misconception: When comparing percentages, a change from 20% to 40% is the same as a change from 60% to 80% because both increase by 20 percentage points → Correction: While the absolute difference is identical (20 percentage points), the relative change differs dramatically—20% to 40% represents a 100% increase, while 60% to 80% represents only a 33% increase.

Misconception: The center of a box in a box plot represents the mean → Correction: The line inside the box represents the median, not the mean; the box itself extends from Q1 to Q3, and its center has no specific statistical meaning.

Misconception: A data set with larger values always has a larger standard deviation → Correction: Standard deviation measures spread relative to the mean, not absolute magnitude; a data set with values {100, 101, 102} has smaller standard deviation than a set with values {1, 5, 10} despite having larger values.

Misconception: When comparing two groups in a two-way table, simply comparing the raw counts provides valid comparison → Correction: Raw counts can be misleading if group sizes differ; calculating percentages or proportions within each group enables valid comparison by accounting for different sample sizes.
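
The standard deviation misconception in this list can be checked directly with the two sets it names, {100, 101, 102} and {1, 5, 10}. A minimal sketch using the population standard deviation:

```python
from statistics import pstdev

large_values = [100, 101, 102]  # bigger numbers, tightly packed
small_values = [1, 5, 10]       # smaller numbers, widely spaced

sd_large = pstdev(large_values)
sd_small = pstdev(small_values)
print(f"SD of {large_values}: {sd_large:.2f}")
print(f"SD of {small_values}: {sd_small:.2f}")
```

The set with the larger values has the smaller standard deviation, because standard deviation depends on how far values sit from their own mean, not on how big they are.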

Worked Examples

Example 1: Comparing Test Score Distributions

Problem: Two classes took the same exam. Class A had scores with a mean of 78, median of 80, and range of 45. Class B had scores with a mean of 78, median of 78, and range of 20. Which statement best describes the difference between the classes?

Solution:

Step 1: Identify what's being compared. Both classes have identical means (78), so average performance is the same.

Step 2: Analyze the median differences. Class A's median (80) exceeds its mean (78), suggesting a left-skewed distribution with some low outliers pulling the mean down. Class B's median equals its mean (78), suggesting a more symmetric distribution.

Step 3: Compare the ranges. Class A's range (45) is more than twice Class B's range (20), indicating Class A has much greater variability between highest and lowest scores.

Step 4: Synthesize the comparison. Class A shows more inconsistent performance with greater spread and likely contains some very low scores (causing the left skew), while Class B demonstrates more consistent performance with scores clustered closer together.

Answer: Class A has greater variability in performance despite identical average scores, with some students performing significantly below the mean while others perform above it. Class B shows more consistent performance with scores more tightly clustered around the mean.

Connection to Learning Objectives: This example demonstrates how comparing multiple statistical measures (mean, median, range) provides deeper insight than any single measure alone, and shows how distribution shape affects interpretation.
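
The problem gives only summary statistics, but it can help to see concrete scores that would produce them. The two five-student lists below are hypothetical, constructed only so that they match the stated mean, median, and range for each class; many other score lists would match as well.

```python
from statistics import mean, median

def summarize(scores):
    """Return (mean, median, range) for a list of scores."""
    return mean(scores), median(scores), max(scores) - min(scores)

# Hypothetical scores built to match the summary statistics in Example 1.
class_a = [50, 78, 80, 87, 95]   # one very low score pulls the mean below the median
class_b = [68, 74, 78, 82, 88]   # symmetric around 78

stats_a = summarize(class_a)
stats_b = summarize(class_b)
print("Class A (mean, median, range):", stats_a)
print("Class B (mean, median, range):", stats_b)
```

In the Class A list, the single score of 50 is what drags the mean below the median, which matches the left-skew reasoning in Step 2.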

Example 2: Comparing Survey Results Using Two-Way Tables

Problem: A school surveyed 200 students about whether they support a new lunch policy. The results are shown below:

Group | Support | Oppose | Total
Freshmen | 45 | 55 | 100
Seniors | 70 | 30 | 100
Total | 115 | 85 | 200

What percent more seniors support the policy compared to freshmen?

Solution:

Step 1: Calculate the percentage of freshmen who support the policy.

Freshmen support rate = 45/100 = 0.45 = 45%

Step 2: Calculate the percentage of seniors who support the policy.

Senior support rate = 70/100 = 0.70 = 70%

Step 3: Determine what type of comparison is requested. The question asks "what percent more," indicating a relative comparison is needed.

Step 4: Calculate the relative difference.

Relative difference = (70% - 45%) / 45% = 25% / 45% = 0.556 = 55.6%

Alternatively, recognize that 70% is 25 percentage points higher than 45%, and 25/45 ≈ 0.556.

Answer: The senior support rate is approximately 55.6% higher than the freshman support rate (about 56% when rounded).

Common Error to Avoid: Students might incorrectly answer "25% more" by calculating only the absolute difference (70% - 45% = 25 percentage points). However, the question asks "what percent more," requiring the relative comparison: how much larger is 70% compared to 45%?

Connection to Learning Objectives: This example demonstrates the critical distinction between absolute and relative comparisons when working with proportions, and shows how to properly analyze two-way tables for group comparisons.

Exam Strategy

When approaching SAT data comparison questions, begin by identifying exactly what is being compared—measures of center, spread, proportions, or trends. Read the question carefully to determine whether it asks for absolute differences, relative differences, or qualitative comparisons. This initial classification prevents calculation errors and ensures the correct comparison method is applied.

Trigger words and phrases signal specific comparison types:

  • "How much greater/more" typically requires absolute difference
  • "What percent more/greater" or "how many times" requires relative comparison (ratio or percent change)
  • "More consistent" or "more variable" requires comparing measures of spread
  • "Typical value" suggests comparing medians, especially if outliers are present
  • "Average" specifically refers to mean
  • "Middle 50%" refers to the interquartile range

For visual comparison questions, systematically examine each element before answering. With box plots, check median position, box length (IQR), whisker length (range), and outliers in that order. With histograms, identify distribution shape first, then compare centers and spreads. With scatterplots, evaluate correlation direction and strength before comparing slopes or making predictions.

Process-of-elimination strategies prove particularly effective for data comparison questions. If a question asks which data set has greater variability, immediately eliminate answer choices that discuss center measures (mean, median) since these don't measure spread. If comparing skewed distributions, eliminate choices that claim mean and median are equal. If a question involves percentages, eliminate choices that confuse absolute and relative differences.

Time allocation: Most data comparison questions require 60-90 seconds. Simple two-set comparisons (comparing two means or medians) should take 30-45 seconds. Complex questions involving multiple calculations or visual interpretation may require up to 2 minutes. If a question requires more than 2 minutes, mark it for review and move forward—these questions often become clearer on second viewing.

Exam Tip: When comparing data from graphs or tables, write down the key values you extract before attempting calculations. This prevents re-reading the visual multiple times and reduces errors from misreading scales or labels.

Memory Techniques

CORDS - Remember what to compare in data sets:

  • Center (mean, median, mode)
  • Outliers (presence and impact)
  • Range (and IQR)
  • Distribution shape (symmetric, skewed)
  • Standard deviation (overall spread)

"Mean is MEAN to outliers" - The mean is sensitive to (affected by) outliers, while the median is resistant. This helps remember which measure to use when outliers are present.

"Right skew, mean FLEW" - In a right-skewed distribution, the mean "flew" to the right of the median (mean > median). Conversely, in a left-skewed distribution, the mean is left of the median (mean < median).

"Absolute is Addition, Relative is Ratio" - When comparing percentages or values:

  • Absolute difference uses subtraction (addition's inverse): 70% - 45% = 25 percentage points
  • Relative difference uses division (ratio): (70% - 45%) / 45% = 55.6%

Visualization for box plot comparison: Picture two boxes side-by-side. The higher box has larger values overall. The longer box has more spread in the middle 50%. The box with longer whiskers has more extreme values. The box with the line (median) closer to one edge is skewed toward the opposite edge.

"Same mean, different spread = CONSISTENCY" - When two data sets have identical means but different standard deviations or ranges, the comparison is about consistency, not average performance. The set with smaller spread is more consistent.

Summary

Data comparison on the SAT requires students to analyze and interpret differences between two or more data sets using appropriate statistical measures and reasoning. Mastery involves understanding when to compare measures of center (mean, median, mode) versus measures of spread (range, IQR, standard deviation), recognizing how distribution shape affects which measures provide meaningful comparisons, and distinguishing between absolute and relative differences when comparing proportions or percentages. Visual representations including box plots, histograms, and scatterplots enable efficient comparison of distributions, trends, and relationships across groups. Two-way tables require calculating conditional proportions to make valid group comparisons. Success on SAT data comparison questions depends on systematically identifying what is being compared, selecting appropriate statistical measures based on data characteristics, performing accurate calculations, and interpreting results in context. Students must recognize that identical measures of center can mask important differences in spread, that outliers affect different measures differently, and that the question wording determines whether absolute or relative comparison is required.

Key Takeaways

  • Data comparison questions recur throughout the SAT's Problem Solving and Data Analysis domain and require comparing statistical measures, distributions, or trends across multiple data sets
  • When data contains outliers or is skewed, median provides more reliable comparison than mean because it resists extreme value influence
  • Two data sets can have identical means but vastly different spreads (measured by range, IQR, or standard deviation), fundamentally changing the comparison's interpretation
  • Distinguish between absolute differences (subtraction: 70% - 45% = 25 percentage points) and relative differences (ratio: 25/45 = 55.6% more) based on question wording
  • Visual comparisons using box plots, histograms, and scatterplots require systematic examination of center, spread, shape, and outliers before drawing conclusions
  • In two-way tables, calculate percentages within each group (row or column percentages) rather than comparing raw counts to make valid comparisons across different-sized groups
  • Distribution shape determines which measure of center is most appropriate: use median for skewed distributions, mean for symmetric distributions

Related Topics

Sampling Methods and Bias: Understanding how data is collected affects the validity of comparisons between groups; biased sampling can create misleading differences that don't reflect true population characteristics.

Margin of Error and Confidence Intervals: These concepts extend data comparison by quantifying uncertainty in measurements and determining whether observed differences between groups are statistically meaningful or could result from random variation.

Correlation vs. Causation: When comparing relationships between variables across groups, distinguishing correlation from causation prevents incorrect conclusions about why differences exist.

Data Transformations: Understanding how operations like adding constants or multiplying by factors affect statistical measures enables prediction of how transformations change comparative relationships.

Probability and Expected Value: These concepts connect to data comparison when evaluating which of several options has better expected outcomes based on probability distributions.
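
The Data Transformations entry above can be illustrated with a short sketch: adding a constant to every value shifts the mean but leaves the standard deviation unchanged, while multiplying every value by a factor scales both. The scores are hypothetical, invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical scores, invented for illustration only.
scores = [60, 70, 80, 90, 100]
shifted = [x + 5 for x in scores]   # e.g. a 5-point curve for everyone
scaled = [x * 2 for x in scores]    # e.g. doubling every score

for label, data in (("original", scores), ("shifted +5", shifted), ("scaled x2", scaled)):
    print(f"{label:>10}: mean={mean(data):.1f}, SD={pstdev(data):.2f}")
```

When two groups receive the same shift, the difference between their means and their relative spreads stays the same, so the comparison between them does not change.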


Now that you've mastered the core concepts of data comparison, it's time to solidify your understanding through active practice. Attempt the practice questions to apply these strategies to authentic SAT-style problems, and use the flashcards to reinforce high-yield facts and formulas. Remember, data comparison questions reward systematic thinking and careful attention to what's being compared—skills that improve dramatically with deliberate practice. Each practice problem you complete strengthens your pattern recognition and builds the confidence needed to tackle these high-frequency questions efficiently on test day. You've got this!


Frequently Asked Questions