What Is the Danielson Framework for Teaching?
The Danielson Framework for Teaching (FFT) is the most widely adopted teacher evaluation rubric in the United States. Developed by Charlotte Danielson in 1996 and updated through multiple editions, the framework provides a common language for defining effective teaching across four domains and 22 components. More than 20 states have adopted Danielson-based evaluation systems, and individual districts in nearly every state reference the framework for professional development and accountability.
Unlike narrower rubrics that focus primarily on instruction, the Danielson Framework takes a comprehensive view of teaching practice. It recognizes that what happens before students enter the classroom (planning) and after the lesson ends (professional reflection) matters as much as the teaching itself. This holistic approach is why state education agencies from Pennsylvania to Hawaii have built their evaluation systems on the FFT foundation.
For evaluators, the Danielson Framework offers a structured, evidence-based approach to classroom observation. For teachers, it provides clear expectations and a pathway for growth. But as any principal who has scored a hundred observations knows, consistently applying the framework across evaluators, schools, and districts remains the single biggest challenge -- and the reason AI-assisted evaluation is gaining traction.
The Four Domains of the Danielson Framework
The Framework for Teaching is organized into four domains that together capture the full scope of teaching practice. Each domain contains five or six components, for a total of 22 components that evaluators assess.
Domain 1: Planning and Preparation
Domain 1 addresses the intellectual work of teaching -- the behind-the-scenes planning that makes effective instruction possible. Its six components span knowledge of content and pedagogy, knowledge of students, setting instructional outcomes, designing coherent instruction, demonstrating knowledge of resources, and designing student assessments. Evaluators typically review lesson plans, unit plans, and assessment designs to score this domain.
Domain 2: The Classroom Environment
Domain 2 focuses on the non-instructional interactions that define a classroom culture. Its five components include creating an environment of respect and rapport, establishing a culture for learning, managing classroom procedures, managing student behavior, and organizing physical space. During an observation, evaluators listen for tone, watch transitions, and note how students interact with one another and the teacher.
Domain 3: Instruction
Domain 3 is the one most directly observed during a classroom visit. Its five components cover communicating with students, using questioning and discussion techniques, engaging students in learning, using assessment in instruction, and demonstrating flexibility and responsiveness. This is where the quality of teacher-student dialogue, the rigor of tasks, and the pacing of the lesson are evaluated.
Domain 4: Professional Responsibilities
Domain 4 extends beyond the classroom to include reflecting on teaching, maintaining accurate records, communicating with families, participating in a professional community, growing and developing professionally, and showing professionalism. Like Domain 1, this domain is often assessed through artifacts and conversations rather than direct observation.
22 Components and the Scoring Continuum
Each of the 22 components in the Danielson Framework is scored on a four-level continuum: Unsatisfactory, Basic, Proficient, and Distinguished. The levels are not arbitrary labels -- they describe qualitatively different teaching practices.
- Unsatisfactory: The teacher does not appear to understand the concepts underlying the component. Practice reflects a lack of awareness or skill.
- Basic: The teacher demonstrates partial understanding. Practice is inconsistent or developing, with the teacher doing most of the intellectual work.
- Proficient: The teacher clearly understands the component and implements it consistently. This is the level expected of experienced teachers and the target for most evaluation systems.
- Distinguished: Teaching at this level is characterized by student ownership, seamless integration, and evidence that the teacher has created a self-sustaining learning community. Students contribute to the success of the classroom, not just the teacher.
The distinction between Proficient and Distinguished is where most inter-rater disagreements occur. Distinguished is not simply "more" of what Proficient looks like -- it requires a qualitative shift toward student-driven learning. A teacher using excellent questioning techniques is Proficient; when students begin asking equally rigorous questions of one another without teacher prompting, that is Distinguished.
This scoring complexity is precisely why inter-rater reliability has been a persistent challenge. Two well-trained evaluators can watch the same lesson and disagree on whether a component is Proficient or Distinguished because the boundary depends on judgment about student ownership and initiative -- concepts that are inherently subjective without clear evidence anchors.
10 States Using Danielson: One Framework, Many Adaptations
While the core Danielson Framework remains consistent, each state that adopts it customizes the implementation to fit local policy, collective bargaining agreements, and accountability requirements. Upraiser supports all 10 state-specific Danielson variants, each calibrated to the state's scoring rules and component emphasis.
Pennsylvania -- Act 82 (Danielson FFT)
Pennsylvania was one of the earliest adopters of the Danielson Framework under Act 82 of 2012. The state uses all four domains with component-level scoring. PA is notable for its structured observation cycle that includes pre-observation conferences, formal observations, and post-observation reflections -- all of which feed into the final evaluation rating.
Illinois -- Danielson FFT
Illinois adopted the Danielson Framework statewide, making it the default evaluation instrument for most districts. IL uses the full 22-component framework with flexibility for districts to weight domains differently based on local priorities. The Illinois State Board of Education provides training modules aligned to the FFT rubric.
Wisconsin -- Danielson FFT (2022 Edition)
Wisconsin updated its Danielson implementation in 2022 to align with the latest edition of the framework, which includes revised component descriptors and updated language around equity and culturally responsive teaching. WI districts use the Educator Effectiveness system built on the FFT.
Kentucky -- PGES (Danielson)
Kentucky's Professional Growth and Effectiveness System (PGES) uses the Danielson Framework as its observation rubric. KY evaluators score Domains 2 and 3 during classroom observations and assess Domains 1 and 4 through professional practice reviews and student growth data.
Maryland -- Danielson FFT
Maryland uses the Danielson Framework across its districts with a strong emphasis on evidence collection during observations. MD evaluators are expected to script detailed evidence notes that map to specific components -- a practice that makes AI-assisted evidence tagging particularly valuable.
Delaware -- DPAS II (Danielson)
Delaware's Performance Appraisal System II (DPAS II) is built on the Danielson Framework. DE uses a formative-summative cycle where teachers receive mid-year feedback and a final summative rating. The system includes specific weighting of components for different career stages.
Hawaii -- EES (Danielson)
Hawaii's Educator Effectiveness System (EES) applies the Danielson Framework across all public schools in the state. HI's implementation includes connections between observation scores and professional development planning, creating a growth-oriented evaluation process.
New Mexico -- Elevate NM (Danielson)
New Mexico's Elevate NM evaluation system uses the Danielson Framework with state-specific adaptations that emphasize culturally and linguistically responsive instruction. NM's implementation reflects the state's diverse student population and multilingual teaching contexts.
Idaho and South Dakota -- Danielson FFT
Both Idaho and South Dakota use the standard Danielson Framework with state-level guidance on observation protocols. These states provide evaluator training and certification requirements aligned to the FFT, ensuring that administrators are calibrated before conducting evaluations.
The Inter-Rater Reliability Problem
Inter-rater reliability -- the degree to which different evaluators assign the same score to the same teaching performance -- is the Achilles' heel of the Danielson Framework. Research consistently shows that even well-trained evaluators disagree on component-level scores roughly 30-40% of the time, particularly at the Proficient-Distinguished boundary.
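Agreement rates like the one above are typically measured with statistics such as Cohen's kappa, which discounts the agreement two raters would reach by chance. A minimal sketch, using illustrative rating data (not actual evaluation results):

```python
from collections import Counter

LEVELS = ["Unsatisfactory", "Basic", "Proficient", "Distinguished"]

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of components."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[lvl] * freq_b[lvl] for lvl in LEVELS) / (n * n)
    return (observed - expected) / (1 - expected)

# Two evaluators scoring the same 10 components (made-up example data).
a = ["Proficient"] * 6 + ["Distinguished"] * 4
b = ["Proficient"] * 8 + ["Distinguished"] * 2
print(round(cohens_kappa(a, b), 2))  # → 0.55
```

Here the raters agree on 8 of 10 components (80% raw agreement), yet kappa is only 0.55, illustrating why raw agreement percentages can overstate true consistency at the Proficient-Distinguished boundary.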
The reasons for this inconsistency are well documented. First, the Danielson rubric uses qualitative descriptors rather than quantitative thresholds. "Students contribute to the success of the classroom" (Distinguished) versus "the teacher creates a successful classroom" (Proficient) requires judgment calls about the source of classroom success. Second, evaluators bring different mental models of what "student ownership" looks like across grade levels, content areas, and school contexts. A Distinguished kindergarten classroom looks fundamentally different from a Distinguished AP Physics classroom.
Third, evaluator training varies dramatically. Some districts invest in multi-day calibration sessions with video anchors; others provide a half-day overview and hand evaluators the rubric. Without ongoing calibration, evaluators drift -- their internal standards shift based on the range of teaching they observe, creating a "local norm" problem where the same teaching performance receives different scores in different buildings.
For teachers, this inconsistency undermines trust in the evaluation system. When a teacher rated Distinguished by one evaluator is rated Proficient by another, the feedback feels arbitrary rather than developmental. For districts, it creates legal and contractual vulnerabilities, especially when evaluation scores are tied to retention, tenure, or compensation decisions.
How AI-Assisted Scoring Addresses Consistency
AI-assisted evaluation does not replace the evaluator's judgment. Instead, it provides a calibrated reference point that anchors the scoring process in evidence. Here is how the approach works in practice with Upraiser.
Consistent Rubric Interpretation
The most fundamental advantage of AI scoring is that it applies the same rubric interpretation to every observation. When an AI model is trained on the Danielson Framework's component descriptors, it does not drift over time, develop biases toward certain teaching styles, or recalibrate based on the last five classrooms it visited. The rubric criteria for 3b (Using Questioning and Discussion Techniques) are applied identically whether it is the first observation of the year or the hundredth.
Evidence-Based Citations
Upraiser's AI does not simply output a score. For every component it evaluates, it provides specific transcript citations -- the exact teacher and student dialogue that supports the score. An evaluator can see, for example, that the AI scored Component 3b as Proficient because "the teacher asked 14 open-ended questions during the 45-minute observation, but student-to-student dialogue was limited to two brief exchanges." This evidence chain makes scores auditable and defensible.
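Upraiser's internal schema is not public, but an evidence-linked score record of the kind described above might look something like this hypothetical sketch (all field names and values are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    timestamp: str  # position in the observation transcript (assumed format)
    speaker: str    # "teacher" or a student identifier
    quote: str      # verbatim transcript excerpt supporting the score

@dataclass
class ComponentScore:
    component: str  # e.g. "3b" (Using Questioning and Discussion Techniques)
    level: str      # Unsatisfactory / Basic / Proficient / Distinguished
    rationale: str
    citations: list[Citation] = field(default_factory=list)

score = ComponentScore(
    component="3b",
    level="Proficient",
    rationale="Frequent open-ended questioning; limited student-to-student dialogue.",
    citations=[Citation("00:12:41", "teacher",
                        "What evidence from the text supports that claim?")],
)
print(score.component, score.level, len(score.citations))
```

The key design point is that the score and its supporting quotes travel together, so an evaluator auditing a rating never has to hunt through the transcript to find the basis for it.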
Draft Scores as Calibration Anchors
Upraiser presents AI-generated scores as draft recommendations, not final ratings. The evaluator reviews the evidence, adjusts scores where their professional judgment differs, and submits the final evaluation. This workflow has a powerful calibration effect: when an evaluator consistently overrides the AI in one direction on a specific component, it signals a potential drift that can be addressed through targeted recalibration.
Full Transcript Analysis
Human evaluators take notes during observations, but even the most skilled scribe captures only a fraction of what happens in a classroom. Upraiser transcribes the entire observation, giving the AI access to every teacher-student interaction. This means the AI can identify patterns that a human observer might miss: the ratio of teacher talk to student talk, the distribution of high-order versus low-order questions, and whether specific students dominated classroom discourse.
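A metric like teacher-to-student talk ratio falls out directly from a diarized transcript. A minimal sketch, assuming the transcript arrives as (speaker, utterance) pairs (the sample dialogue is invented):

```python
def talk_ratio(transcript):
    """Fraction of spoken words attributed to the teacher.

    `transcript` is a list of (speaker, utterance) pairs, the kind of
    diarized output a transcription engine typically produces.
    """
    teacher = sum(len(u.split()) for s, u in transcript if s == "teacher")
    total = sum(len(u.split()) for _, u in transcript)
    return teacher / total if total else 0.0

sample = [
    ("teacher", "Why do you think the author opens with a question?"),
    ("student_1", "Maybe to make us curious."),
    ("teacher", "Say more about that."),
    ("student_2", "It sets up the mystery before the answer comes."),
]
print(round(talk_ratio(sample), 2))  # → 0.5
```

The same pattern extends to question classification or turn-taking distribution: once every utterance is attributed and timestamped, classroom discourse becomes countable.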
Image Analysis for Visual Evidence
Beyond audio, Upraiser can analyze images captured during observations -- student work displays, classroom arrangements, anchor charts, learning objectives posted on the board. These visual artifacts provide evidence for components like 2e (Organizing Physical Space), 1e (Designing Coherent Instruction), and 3c (Engaging Students in Learning) that are difficult to assess from audio alone.
How Upraiser Handles State-Specific Danielson Variations
Although every Danielson-based system shares the same four-domain, 22-component structure, the differences between state implementations are significant enough that a one-size-fits-all approach does not work. Upraiser maintains separate rubric configurations for each state.
Scoring Level Labels
Some states use Danielson's original labels (Unsatisfactory, Basic, Proficient, Distinguished). Others have renamed them -- Delaware's DPAS II uses "Needs Improvement" instead of "Basic," and some states use numerical scales alongside the qualitative labels. Upraiser's state configurations map each state's labels to the appropriate scoring levels so that reports and feedback use the terminology evaluators and teachers expect.
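Conceptually, this is a lookup from canonical Danielson levels to state vocabulary. A hypothetical configuration sketch (only Delaware's "Needs Improvement" rename comes from this article; everything else here is assumed for illustration):

```python
# Canonical Danielson scoring levels, in ascending order.
DANIELSON_LEVELS = ("Unsatisfactory", "Basic", "Proficient", "Distinguished")

STATE_LABELS = {
    "default": DANIELSON_LEVELS,
    # Delaware's DPAS II renames "Basic"; the other three labels are assumed.
    "DE": ("Unsatisfactory", "Needs Improvement", "Proficient", "Distinguished"),
}

def localize(level: str, state: str) -> str:
    """Translate a canonical Danielson level into a state's label."""
    idx = DANIELSON_LEVELS.index(level)
    return STATE_LABELS.get(state, STATE_LABELS["default"])[idx]

print(localize("Basic", "DE"))  # state-specific label: Needs Improvement
print(localize("Basic", "PA"))  # falls back to the canonical label: Basic
```

Keeping the canonical levels as the internal representation and localizing only at report time means scoring logic never has to branch on state terminology.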
Component Weighting and Selection
Not all states evaluate all 22 components in every observation. Kentucky's PGES focuses observations on Domains 2 and 3, while Domains 1 and 4 are assessed through professional practice portfolios. Wisconsin's 2022 update added emphasis on equity-related indicators within existing components. Upraiser's AI evaluates only the components required by each state's protocol, ensuring that scores are contextually appropriate.
Summative Rating Calculations
States differ in how they aggregate component scores into a final summative rating. Pennsylvania uses a weighted average across domains with specific thresholds for each rating level. Illinois allows districts to set their own aggregation methods within state guidelines. Upraiser generates component-level scores and presents them in the state's required format, making it straightforward for evaluators to complete their district's summative forms.
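Mechanically, a weighted-average aggregation is straightforward; the hard part is that every state sets its own weights and thresholds. A sketch with invented weights (real state formulas are set by policy and differ from these):

```python
# Illustrative domain weights only -- actual state weightings vary.
WEIGHTS = {"Domain 1": 0.20, "Domain 2": 0.30, "Domain 3": 0.30, "Domain 4": 0.20}

def summative(domain_scores: dict) -> float:
    """Weighted average of domain-level scores on a 0-3 scale."""
    return sum(WEIGHTS[d] * s for d, s in domain_scores.items())

scores = {"Domain 1": 2.0, "Domain 2": 2.5, "Domain 3": 2.0, "Domain 4": 3.0}
print(round(summative(scores), 2))  # → 2.35
```

Because the aggregation step is pure arithmetic over component-level scores, presenting those scores in each state's required layout is a formatting problem rather than a scoring one.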
Cultural and Contextual Adaptations
New Mexico's Elevate NM framework includes specific indicators for culturally and linguistically responsive instruction that are not present in the standard Danielson FFT. Hawaii's EES connects evaluation to the state's unique geographic and cultural context. Upraiser's state-specific rubric prompts incorporate these adaptations so that the AI's scoring reflects each state's values and priorities, not a generic interpretation of the framework.
The Danielson Evaluation Workflow with Upraiser
Understanding the technology is one thing; seeing how it fits into an evaluator's daily work is another. Here is how a typical Danielson-based evaluation flows through Upraiser.
- Capture: The evaluator opens Upraiser on a phone or tablet and starts recording during the classroom observation. They can snap photos of student work, anchor charts, or the physical layout. They can also tag notes with specific Danielson domains in real time.
- Transcription: After the observation, the audio is sent to Upraiser's transcription engine (powered by AssemblyAI). The full transcript is generated within minutes, with word-level timestamps for playback synchronization.
- AI Scoring: The transcript and any captured images are analyzed against the state-specific Danielson rubric. The AI generates draft scores for each applicable component, with transcript citations and evidence justifications.
- Evaluator Review: The evaluator reviews the AI's draft scores alongside the evidence. They can adjust any score, add their own notes, and finalize the evaluation. The AI's citations make it easy to confirm or challenge each score.
- Feedback Delivery: The completed evaluation is structured by domain and component, with clear evidence for each score. Teachers see exactly what was observed and how it maps to the Danielson rubric, making post-observation conferences more productive and evidence-based.
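The five steps above form a simple pipeline: capture feeds transcription, transcription feeds scoring, and a human reviews before anything is final. A stubbed sketch of that flow (the function bodies are placeholders, not Upraiser's actual API):

```python
def transcribe(audio):
    """Step 2 (stub): diarized speech-to-text with timestamps."""
    return [("teacher", "Turn and talk with your partner.")]

def ai_score(transcript, images):
    """Step 3 (stub): draft component scores with supporting evidence."""
    return {"3b": ("Proficient", transcript[:1])}

def review(draft, evaluator_adjustments):
    """Step 4: the evaluator's judgment overrides any draft score."""
    final = dict(draft)
    final.update(evaluator_adjustments)
    return final

draft = ai_score(transcribe(audio=b""), images=[])
final = review(draft, evaluator_adjustments={})
print(sorted(final))  # → ['3b']
```

The structural point is that the AI's output is only ever a `draft`; nothing reaches the teacher until `review` has applied the evaluator's adjustments.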
The entire process -- from classroom observation to completed evaluation -- takes a fraction of the time required for manual rubric scoring. Evaluators report saving 45-60 minutes per evaluation, which adds up quickly when a principal is responsible for 20-30 evaluations per year.
Getting Started with AI-Powered Danielson Evaluation
Whether your state uses Pennsylvania's Act 82 system, Kentucky's PGES, Delaware's DPAS II, or any of the other eight Danielson-based frameworks that Upraiser supports, the path to more consistent evaluation starts with the same step: seeing the AI in action on your state's specific rubric.
Upraiser is designed for evaluators who already know the Danielson Framework. The AI is not a replacement for evaluator expertise -- it is a tool that amplifies that expertise by ensuring every observation is scored against the same rubric interpretation, with evidence citations that make scores defensible and feedback actionable.
Districts currently using Upraiser for Danielson-based evaluation report three consistent outcomes: faster evaluation completion times, higher inter-rater agreement across their evaluator teams, and more productive post-observation conferences because both the teacher and evaluator have access to the same transcript evidence.
Request a demo to see how Upraiser handles your state's specific Danielson variant -- with real transcript evidence, component-level scoring, and the consistency that 22 components demand.
See Danielson scoring that's actually consistent
Watch Upraiser evaluate a classroom observation against the Framework for Teaching -- with evidence citations for every component, every time.
Request a Danielson Demo




