Table of Contents

Glossary
1. An Overview of Assessment in Early Childhood
2. How Infants and Young Children Should be Assessed
3. How Standardized Tests Are Used, Designed, and Selected
4. Using and Reporting Standardized Test Results
5. Observation
6. Checklists, Rating Scales, and Rubrics
7. Teacher-Designed Strategies
8. Performance-Based Strategies
9. Portfolio Assessment
10. Communicating with Families

All sections are by Sue C. Wortham.





achievement test A test that measures the extent to which a person has acquired information or mastered certain skills, usually as a result of instruction or training.

alternative assessment An assessment that is different from traditional written or multiple-choice tests. Usually related to authentic and performance assessments.

alternative-form reliability The correlation between results on alternative forms of a test. Reliability is the extent to which the two forms are consistent in measuring the same attributes.

analytic rubric A rubric that provides diagnostic feedback and is more specific than a holistic rubric.

anecdotal record A written description of an incident in a child’s behavior that can be significant in understanding the child.

aptitude test A test designed to predict future learning or performance on some task if appropriate education or training is provided.

arena assessment An assessment process whereby a group of specialists in developmental disabilities observes a child in natural play and working situations. A profile of the child is developed by the group, comparing their individual observations of some facet of the child’s behaviors.

assessment software Software that has been developed to enable children to be assessed using a computer. Textbook publishers and developers of early childhood assessment tools make assessment software available as an option alongside traditional assessment tools.

attitude measure An instrument that measures how an individual is predisposed to feel or think about something (a referent). A teacher can design a scale to measure students’ attitudes toward reading or mathematics.

authentic achievement Learning that is real and meaningful. Achievement that is worthwhile.

authentic assessment An assessment that uses some type of performance by a child to demonstrate understanding.

authentic performance assessment See authentic assessment.

behavioral objective An educational or instructional statement that includes the behavior to be exhibited, the conditions under which the behavior will be exhibited, and the level of performance required for mastery.

checklist A sequence or hierarchy of concepts and/or skills organized in a format that can be used to plan instruction and keep records.

concurrent validity The extent to which scores on two forms of a test are correlated when the forms are given at the same time.

construct validity The extent to which a test measures a psychological trait or construct. Tests of personality, verbal ability, and critical thinking are examples of tests with construct validity.

From Glossary of Assessment in Early Childhood Education, 6/e. Sue C. Wortham. Copyright © 2012 by Pearson Education. All rights reserved.


content validity The extent to which the content of a test such as an achievement test represents the objectives of the instructional program it is designed to measure.

contract An agreement between teacher and child about activities the child will complete to achieve a specific objective or purpose.

correctives Instructional materials and methods used with mastery learning that are implemented after formative evaluation to provide alternative learning strategies and resources.

criterion-referenced test A test designed to provide information on specific knowledge or skills possessed by a student. The test measures specific skills or instructional objectives.

criterion-related validity To establish validity of a test, scores are correlated with an external criterion, such as another established test of the same type.

developmental checklist A checklist that emphasizes areas and levels of development in early childhood.

developmental rubric A rubric that is organized using domains of development.

developmental screening Evaluation of the young child to determine whether development is proceeding normally. It is used to identify children whose development is delayed.

diagnostic evaluation An evaluation to analyze an individual’s areas of weaknesses or strengths and to determine the nature and causes of the weaknesses.

diagnostic interview An interview to determine a child’s learning needs or assess weaknesses. May be part of a diagnostic evaluation.

directed assignment A specific assignment to assess a child’s performance on a learning objective or skill.

direct performance measure A performance measure that requires the student to apply knowledge in an activity specified by the teacher.

documentation A process of documenting information about progress of project activities and recording information about children’s interests, ideas, thinking, and problem solving within their activities.

electronic management of learning (EML) Resources available to early childhood programs for instructional experiences using the computer. The materials can include creative, skill development, and assessment software.

enrichment activity In the context of mastery learning, a challenging activity at a higher cognitive level on Bloom’s taxonomy than the instructional objective described on a table of specifications.

equivalent forms Alternative forms of a test that are parallel. The forms of the test measure the same domain or objectives, have the same format, and are of equal difficulty.

event sampling An observation strategy used to determine when a particular behavior is likely to occur. The setting in which the behavior occurs is more important than the time it is likely to occur.

formative assessment An assessment designed to measure progress on an objective rather than to give a qualitative result.

formative evaluation Evaluation conducted during instruction to provide the teacher with information on the learning progress of the student and the effectiveness of instructional methods and materials.

formative test A test designed to evaluate progress on specific learning objectives or a unit of study.

game In the context of authentic assessment, a structured assessment whereby the student’s performance progress is evaluated through engagement with the game.

grade equivalent The grade level for which a given score on a standardized test is the estimated average. Grade-equivalent scores, commonly used for elementary achievement tests, are expressed in terms of the grade and month.

grade norms Norms on standardized tests based on the performance of students in given grades.



graphic rating scale A rating scale that can be used as a continuum. The rater marks characteristics by descriptors on the scale at any point along the continuum.

group test A test that can be administered to more than one person at a time.

holistic rubric A rubric with competency levels that indicate levels of performance. It assigns a single score to a student’s performance.

inclusion The process of including children with disabilities into a classroom where they would have been placed if they had not experienced a disability.

indirect performance measure A measure that assesses what a student knows about a topic. The teacher’s assessment is accomplished by observing a student activity or examining a written test.

individualized instruction Instruction based on the learning needs of individual students. It may be based on criterion-related evaluation or diagnosis.

individual test A test that can be administered to only one person at a time. Many early childhood tests are individual tests because of the low maturity level of the examinees.

informal test A test that has not been standardized. Teacher-designed tests are an example.

instructional objective See behavioral objective.

integration Facilitating the participation of children with disabilities into the classroom with peers who do not have disabilities. The child is integrated with other children, and the needs of all children are met without treating some children as “special.”

intelligence quotient (IQ) An index of intelligence expressed as the ratio of mental age to chronological age. It is derived from an individual’s performance on an intelligence test as compared with that of others of the same age.

intelligence test A test measuring developed abilities that are considered signs of intelligence. Intelligence is general potential independent of prior learning.

interest inventory A measure used to determine interest in an occupation or vocation. Students’ interest in reading may be determined by such an inventory.

internal consistency The degree of relationship among items on a test. A type of reliability that indicates whether items on the test are positively correlated and measure the same trait or characteristic.

interview A discussion that the teacher conducts with a child to make an assessment.

item analysis The analysis of single test items to determine their difficulty value and discriminating power. Item analysis is conducted in the process of developing a standardized test.

learning disability A developmental difference or delay in a young or school-age child that interferes with the individual’s ability to learn through regular methods of instruction.

mainstreaming A process of placing children with disabilities into regular classrooms for part of the school day with children who do not have disabilities. Mainstreaming is being replaced by inclusion or integration, in which the child with disabilities is not singled out as being different.

mastery testing Evaluation to determine the extent to which a test taker has mastered particular skills or learning objectives. Performance is compared to a predetermined standard of proficiency.

mean The arithmetic average of a set of test scores.

minimum-competency testing Evaluation to measure whether test takers have achieved a minimum level of proficiency in a given academic area.

multiple choice A type of test question in which the test taker must choose the best answer from among several options.

narrative report An alternative to report cards for reporting a child’s progress. The teacher writes a narrative to describe the child’s growth and accomplishments.

neonatologist A physician who specializes in babies less than 1 month old.



normal distribution The hypothetical distribution of scores that has a bell-shaped appearance. This distribution is used as a model for many scoring systems and test statistics.

norm-referenced test A test in which the test taker’s performance is compared with the performance of people in a norm group.

norms Statistics that supply a frame of reference based on the actual performance of test takers in a norm group. A set of scores that represents the distribution of test performance in the norm group.

numerical rating scale A series of numerals, such as 1 to 5, that allows an observer to indicate the degree to which an individual possesses a particular characteristic.

obstetrician A physician who specializes in pregnancy and childbirth.

pediatrician A physician who specializes in the development, care, and diseases of young children.

percentile A point or score in a distribution at or below which falls the percentage of cases indicated by the percentile. The score scale on a normal distribution is divided into 100 segments, each containing the same number of scores.

percentile rank The test taker’s test score, as expressed in terms of its position within a group of 100 scores. The percentile rank is the percentage of scores equal to or lower than the test taker’s score.

performance assessment An assessment in which the child demonstrates knowledge by applying it to a task or a problem-solving activity.

performance-based assessment An assessment of development and/or learning that is based on the child’s natural performance, rather than on contrived tests or tasks.

personality test A test designed to obtain information on the affective characteristics of an individual (emotional, motivational, or attitudinal). The test measures psychological makeup rather than intellectual abilities.

play-based assessment Assessment often used for children with disabilities that is conducted through observation in play environments. Play activities can be spontaneous or planned. Play-based assessment can be conducted by an individual or through arena assessment.

portfolio A format for conducting an evaluation of a child. Portfolios are a collection of a child’s work, teacher assessments, and other information that contribute to a picture of the child’s progress.

preassessment An assessment conducted before the beginning of the school year or prior to any instruction at the beginning of the school year.

project An authentic learning activity that can also be used to demonstrate student achievement.

rating scale A scale using categories that allow the observer to indicate the degree of a characteristic that the person possesses.

raw score The number of right answers a test taker obtains on a test.

reliability The extent to which a test is consistent in measuring over time what it is designed to measure.

rubric An instrument developed to measure authentic and performance assessments. Descriptions are given for qualitative characteristics on a scale.

running record A description of a sequence of events in a child’s behavior that includes all behaviors observed over a period of time.

scope (sequence of skills) A list of learning objectives established for areas of learning and development at a particular age, grade level, or content area.

specimen record Detailed observational reports of children’s behavior over a period of time that are used for research purposes.

split-half reliability A measure of reliability whereby scores on equivalent sections of a single test are correlated for internal consistency.

standard deviation A measure of the variability of a distribution of scores around the mean.



standard error of measurement An estimate of the possible magnitude of error present in test scores.

standardized test A test that has specified content, procedures for administration and scoring, and normative data for interpreting scores.

standard score A transformed score that reports performance in terms of the number of standard deviation units the raw score is from the mean.

stanine A scale on the normal curve divided into nine sections, with all divisions except the first and the last being 0.5 standard deviation wide.

structured interview A planned interview conducted by the teacher for assessment purposes.

structured performance assessment A performance assessment that has been planned by the teacher to include specific tasks or activities.

summative assessment A final assessment to assign a grade or determine mastery of an objective. Similar to summative evaluation.

summative evaluation An evaluation obtained at the end of a cycle of instruction to determine whether students have mastered the objectives and whether the instruction has been effective.

summative test A test to determine mastery of learning objectives administered for grading purposes.

T score A standard score scale with a mean of 50 and a standard deviation of 10.

table of specifications A table of curriculum objectives that have been analyzed to determine to what level of Bloom’s taxonomy of educational objectives the student must demonstrate mastery.

test–retest reliability A type of reliability obtained by administering the same test a second time after a short interval and then correlating the two sets of scores.

time sampling Observation to determine the frequency of a behavior. The observer records how many times the behavior occurs during uniform time periods.

true score A hypothetical score on a test that is free of error. Because no standardized test is free of measurement error, a true score can never be obtained.

unstructured interview An assessment interview conducted by the teacher as the result of a naturally occurring performance by a child. The interview is not planned.

unstructured performance assessment An assessment that is part of regular classroom activities.

validity The degree to which a test serves the purpose for which it is to be used.

work sample An example of a child’s work. Work samples include products of all types of activities that can be used to evaluate the child’s progress.

Z score A standard score that expresses performance in terms of the number of standard deviations from the mean.
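Several of the statistical terms defined above (mean, standard deviation, Z score, T score, and percentile rank) can be illustrated with a short calculation. The Python sketch below is only an illustration under stated assumptions: the eight raw scores are hypothetical and do not come from the text, the function names are invented for this example, and the standard deviation is computed for the group itself (a population standard deviation) rather than estimated from a sample.

```python
from statistics import mean, pstdev


def z_score(raw, scores):
    """Number of standard deviations the raw score lies from the group mean."""
    return (raw - mean(scores)) / pstdev(scores)


def t_score(raw, scores):
    """Standard score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(raw, scores)


def percentile_rank(raw, scores):
    """Percentage of scores in the group equal to or lower than the raw score."""
    at_or_below = sum(1 for s in scores if s <= raw)
    return 100 * at_or_below / len(scores)


# Hypothetical raw scores for a group of eight test takers (not from the text).
scores = [10, 20, 20, 20, 25, 25, 35, 45]

print(mean(scores))                 # arithmetic average of the group
print(pstdev(scores))               # variability of the scores around the mean
print(z_score(45, scores))          # distance of a raw score of 45 from the mean, in SD units
print(t_score(45, scores))          # the same performance on the T scale
print(percentile_rank(35, scores))  # percentage of scores at or below 35
```

In this hypothetical group the mean is 25 and the standard deviation is 10, so a raw score of 45 lies two standard deviations above the mean. Published standardized tests derive these values from a large norm group, not from a single class, so the sketch shows only the arithmetic behind the definitions.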




An Overview of Assessment in Early Childhood

Chapter Objectives

As a result of reading this chapter, you will be able to

1. Understand the purposes of assessment in early childhood
2. Understand different meanings of the term assessment
3. Understand the history of tests and measurements in early childhood
4. Develop an awareness of issues in testing young children

From Chapter 1 of Assessment in Early Childhood Education, 6/e. Sue C. Wortham. Copyright © 2012 by Pearson Education. All rights reserved.



Understanding Assessment in Infancy and Early Childhood

Not too long ago, resources on early childhood assessment were limited to occasional articles in journals, chapters in textbooks on teaching in early childhood programs, and a few small textbooks that were used as secondary texts in an early childhood education course. Very few teacher preparation programs offered a course devoted to assessment in early childhood. Now, in the 21st century, assessment of very young children has experienced a period of very rapid growth and expansion. In fact, it has been described as a “virtual explosion of testing in public schools” (Meisels & Atkins-Burnett, 2005, p. 1).

There has also been an explosion in the numbers of infants, toddlers, and preschoolers in early childhood programs and the types of programs that serve them. Moreover, the diversity among these young children increases each year. Currently, Head Start programs serve children and families who speak at least 140 different languages. In some Head Start classrooms, ten different languages might be used. Head Start teaching teams may be multilingual as well, reflecting the same diversity (David, 2005).

What Is Assessment?

What do we need to know about all these diverse children with all kinds of families, cultures, and languages? The study of individuals for measurement purposes begins before birth with assessment of fetal growth and development. At birth and throughout infancy and early childhood, various methods of measurement are used to evaluate the child’s growth and development. Before a young child enters a preschool program, he or she is measured through medical examinations. Children are also measured through observations of developmental milestones, such as saying the first word or walking independently, by parents and other family members. Children might also be screened or evaluated for an early childhood program or service. Assessment is really a process. A current definition describes the assessment process: “Assessment is the process of gathering information about children from several forms of evidence, then organizing and interpreting that information” (McAfee, Leong, & Bodrova, 2004, p. 3).

Assessment of children from birth through the preschool years is different from assessment of older people. Not only can young children not write or read, but also the young developing child presents different challenges that influence the choice of measurement strategy, or how to measure or assess the child. Assessment methods must be matched with the level of mental, social, and physical development at each stage. Developmental change in young children is rapid, and there is a need to assess whether development is progressing normally. If development is not normal, the measurement and evaluation procedures used are important in making decisions regarding appropriate intervention services during infancy and the preschool years.



Purposes of Assessment

Assessment is used for various purposes. We may want to learn about individual children. We may conduct an evaluation to assess a young child’s development in language or mathematics. When we need to learn more, we may assess the child by asking her or him to describe what she or he has achieved. For example, a first-grade teacher may use measurement techniques to determine what reading skills have been mastered and what weaknesses exist that indicate a need for additional instruction.

Assessment strategies may be used for diagnosis. Just as a medical doctor conducts a physical examination of a child to diagnose an illness, psychologists, teachers, and other adults who work with children can conduct an informal or formal assessment to diagnose a developmental delay or identify causes for poor performance in learning.

If medical problems, birth defects, or developmental delays in motor, language, cognitive, or social development are discovered during the early, critical periods of development, steps can be taken to correct, minimize, or remediate them before the child enters school. For many developmental deficits or differences, the earlier they are detected and the earlier intervention is planned, the more likely the child will be able to overcome them or compensate for them. For example, if a serious hearing deficit is identified early, the child can learn other methods of communicating and acquiring information.

Assessment of young children is also used for placement—to place them in infant or early childhood programs or to provide special services. To ensure that a child receives the best services, careful screening and more extensive testing may be conducted before selecting the combination of intervention programs and other services that will best serve the child.

Program planning is another purpose of assessment. After children have been identified and evaluated for an intervention program or service, assessment results can be used in planning the programs that will serve them. These programs, in turn, can be evaluated to determine their effectiveness.

Besides identifying and correcting developmental problems, assessment of very young children is conducted for other purposes. One purpose is research. Researchers study young children to better understand their behavior or to measure the appropriateness of the experiences that are provided for them.

The National Early Childhood Assessment Resource Group summarized the purposes for appropriate uses of assessment in the early childhood years as follows:

Purpose 1: Assessing to promote children’s learning and development
Purpose 2: Identifying children for health and social services
Purpose 3: Monitoring trends and evaluating programs and services
Purpose 4: Assessing academic achievement to hold individual students, teachers, and schools accountable (Shepard, Kagan, Lynn, & Wurtz, 1998). (See Figure 2-1.)

How were these assessment strategies developed? In the next section, I describe how certain movements or factors, especially during the past century, have affected the development of testing instruments, procedures, and other measurement techniques that are used with infants and young children.



The Evolution of Assessment of Young Children

Interest in studying young children to understand their growth and development dates back to the initial recognition of childhood as a separate period in the life cycle. Johann Pestalozzi, a pioneer in developing educational programs specifically for children, wrote about the development of his 3½-year-old son in 1774 (Irwin & Bushnell, 1980). Early publications also reflected concern for the proper upbringing and education of young children. Some Thoughts Concerning Education by John Locke (1699), Emile (Rousseau, 1762/1911), and Frederick Froebel’s Education of Man (1896) were influential in focusing attention on the characteristics and needs of children in the 18th and 19th centuries. Rousseau believed that human nature was essentially good and that education must allow that goodness to unfold. He stated that more attention should be given to studying the child so that education could be adapted to meet individual needs (Weber, 1984). The study of children, as advocated by Rousseau, did not begin until the late 19th and early 20th centuries.

Scientists throughout the world used observation to measure human behaviors. Ivan Pavlov proposed a theory of conditioning to change behaviors. Alfred Binet developed the concept of a normal mental age by studying memory, attention, and intelligence in children. Binet and Theophile Simon developed an intelligence scale to determine mental age that made it possible to differentiate the abilities of individual children (Weber, 1984). American psychologists expanded these early efforts, developing instruments for various types of measurement.

Early Intervention for a Child with Hearing Impairment

Julio, who is 2 years old, was born prematurely. He did not have regular checkups during his first year, but his mother took him to a community clinic when he had a cold and fever at about 9 months of age. When the doctor noticed that Julio did not react to normal sounds in the examining room, she stood behind him and clapped her hands near each ear. Because Julio did not turn toward the clapping sounds, the doctor suspected that he had a hearing loss. She arranged for Julio to be examined by an audiologist at an eye, ear, nose, and throat clinic.

Julio was found to have a significant hearing loss in both ears. He was fitted with hearing aids and is attending a special program twice a week for children with hearing deficits. Therapists in the program are teaching Julio to speak. They are also teaching his mother how to make Julio aware of his surroundings and help him to develop a vocabulary. Had Julio not received intervention services at an early age, he might have entered school with severe cognitive and learning deficits that would have put him at a higher risk for failing to learn.

The study and measurement of young children today has evolved from the child study movement, the development of standardized tests, Head Start and other federal programs first funded in the 1960s, and the passage of Public Law 94-142 (the Individuals with Disabilities Education Act) and Public Law 99-457 (an expansion of PL 94-142 to include infants). Currently, there is a movement toward more meaningful learning or authentic achievement and assessment (Newmann, 1996; Wiggins, 1993). At the same time, continuing progress is being made in identifying, diagnosing, and providing more appropriate intervention for infants and young children with disabilities (Meisels & Fenichel, 1996).

The Child Study Movement

G. Stanley Hall, Charles Darwin, and Lawrence Frank were leaders in the development of …