Façade Perception Research

Integrated Multimodal Analysis

Building Façade Perception Research

A comprehensive eye-tracking study examining how people perceive and describe building façades, integrating Weighted Voronoi tessellation, group attention heatmaps, AI-based AOI segmentation, and verbal description analysis.

52 Participants
3 Experiments
1,851+ Voronoi Images
403 Group Heatmaps

Abstract

This study investigates visual perception of building façades through a multimodal approach combining eye-tracking, verbal descriptions, and similarity judgements. Fifty-two participants viewed line-drawing façade stimuli across three task conditions (Preview, Response, Compare). The analysis shows that attention is allocated out of proportion to element area: windows (7% area → 30% attention) and decorative features (12% area → 43% attention) are systematically over-attended, whilst walls (63% area → 29% attention) are under-attended. Task type significantly modulates gaze strategy, with Response tasks eliciting concentrated attention (TCI = 0.234) and Compare tasks producing distributed scanning (TCI = 0.180). Cross-modal analysis identifies nine significant correlations between gaze metrics and verbal features, suggesting that attention allocation directly shapes descriptive content.

Section I

Research Methodology

This study employed a within-subjects experimental design in which 52 participants viewed line-drawing façade stimuli across three distinct task conditions. Eye movements were recorded using Tobii Pro eye-tracking technology at 60 Hz, capturing fixation coordinates, durations, and saccade patterns throughout each trial.

The analytical pipeline integrates four complementary modalities: Time-Weighted Voronoi tessellation for individual attention mapping, group-level attention heatmaps for aggregate fixation density, AI-based AOI segmentation using GPT-4 Vision for architectural element classification, and verbal description analysis for linguistic feature extraction.

Preview (10 sec): Initial scanning of a 3×3 building grid to form first impressions

Response (35 sec): Detailed verbal description of individual building façade features

Compare (10 sec): Side-by-side comparison and similarity judgement of building pairs

Figure 1. Distribution of key Voronoi metrics (fixation count, mean duration, TCI, gaze dispersion) across all task types, showing significant variation between experimental conditions.
Figure 2. Correlation matrix of Voronoi metrics. Strong positive correlation between fixation count and total duration (r = 0.92); negative correlation between TCI and gaze dispersion (r = −0.68).

Section II

Weighted Voronoi Tessellation

Time-Weighted Voronoi diagrams partition the visual field into one cell per fixation, with each cell weighted by fixation duration. A cell's size is inversely proportional to dwell time, so small, dense cells mark sustained attention and broad cells mark brief, exploratory fixations. The Temporal Concentration Index (TCI) summarises this concentration as a single metric. More than 1,851 individual Voronoi images were generated across all participants and experiments.
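The following is a minimal sketch of one way to build such a partition on a pixel grid. It assumes a multiplicatively weighted construction in which distance is multiplied by fixation duration, so that longer fixations claim smaller cells. The study's exact weighting scheme and its TCI formula are not specified here; the function name and the fixation data are hypothetical.

```python
import math

def weighted_voronoi_areas(fixations, width, height):
    """Assign every pixel to the fixation with the smallest weighted distance.

    fixations: list of (x, y, duration_ms) tuples.
    Assumption: weighted distance = Euclidean distance * duration, so a long
    fixation "pushes" pixels away and ends up with a small cell.
    Returns the pixel count of each fixation's cell.
    """
    def weighted_distance(i, px, py):
        x, y, duration = fixations[i]
        return math.hypot(px - x, py - y) * duration

    areas = [0] * len(fixations)
    for py in range(height):
        for px in range(width):
            nearest = min(range(len(fixations)),
                          key=lambda i: weighted_distance(i, px, py))
            areas[nearest] += 1
    return areas

# Hypothetical data: one long fixation (600 ms) and one short fixation (200 ms).
fixations = [(10, 20, 600), (30, 20, 200)]
areas = weighted_voronoi_areas(fixations, width=40, height=40)
print(areas)  # the 600 ms fixation receives the smaller cell
```

The brute-force pixel loop is only practical for small grids; it is meant to make the inverse relationship between dwell time and cell size concrete, not to reproduce the study's images.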

Figure 3. Conceptual illustration of Weighted Voronoi tessellation overlaid on a building façade, with cell colours indicating attention weight.
Figure 4a. User 1, Preview task. Broad cells indicate rapid exploratory scanning.
Figure 4b. User 1, Response task. Dense cells around windows show concentrated attention.
Figure 4c. User 1, Compare task. Medium cells reflect strategic comparison scanning.

Task-Type Comparison

[Chart: mean fixations (axis 0–80) and mean TCI (axis 0–0.24) for the Response, Compare, and Preview tasks]

Figure 5. Mean fixation count and TCI by task type. All differences significant at p < 0.001 (Kruskal-Wallis).
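For readers unfamiliar with the Kruskal-Wallis test named in Figure 5, the sketch below computes its H statistic from scratch. The TCI values are hypothetical placeholders, not data from the study, and the tie correction is omitted for brevity.

```python
def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic (without tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Map each distinct value to its average rank (ranks start at 1).
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum over groups of (rank sum squared / group size).
    total = sum(sum(ranks[v] for v in group) ** 2 / len(group)
                for group in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical per-trial TCI values for the three task types.
response = [0.25, 0.22, 0.24]
compare = [0.18, 0.17, 0.19]
preview = [0.12, 0.14, 0.13]

h = kruskal_wallis_h(response, compare, preview)
print(round(h, 2))
```

With three groups, H is compared against a chi-square distribution with two degrees of freedom (critical value 5.991 at the 0.05 level).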

Figure 6. Comprehensive task-type comparison across all Voronoi metrics with statistical significance indicators.
Figure 7. TCI distribution across experiments showing consistent patterns of attention concentration.

Section III

Group Attention Heatmaps

Group heatmaps aggregate fixation data across all participants to reveal collective attention patterns. A total of 403 group heatmaps were generated across three experiments. Each pixel intensity represents cumulative dwell time, normalised across the participant pool, providing immediate visual confirmation of which façade elements attract the most attention.
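As an illustration of the aggregation step, the sketch below accumulates dwell time on a coarse grid, averages it over participants, and scales the result so the hottest cell equals 1. The grid size, the averaging rule, and the fixation data are assumptions; a production heatmap would normally also apply spatial smoothing, which is left out here.

```python
def group_heatmap(fixations_by_participant, width, height, cell=10):
    """Accumulate dwell time per grid cell, averaged over participants.

    fixations_by_participant: one list of (x, y, duration_ms) per participant.
    Returns a rows x cols grid scaled so the peak cell is 1.0.
    """
    cols, rows = width // cell, height // cell
    grid = [[0.0] * cols for _ in range(rows)]
    n = len(fixations_by_participant)

    for fixations in fixations_by_participant:
        for x, y, duration in fixations:
            col, row = int(x) // cell, int(y) // cell
            if 0 <= col < cols and 0 <= row < rows:  # ignore off-stimulus gaze
                grid[row][col] += duration / n

    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[value / peak for value in row] for row in grid]
    return grid

# Hypothetical fixations for two participants on a 40 x 20 pixel stimulus.
participant_1 = [(15, 15, 300), (35, 5, 100)]
participant_2 = [(12, 18, 500)]
heat = group_heatmap([participant_1, participant_2], width=40, height=20)
for row in heat:
    print(row)
```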

Figure 8. Illustrative group attention heatmap showing fixation density on a classical façade. Hot regions (red–yellow) indicate concentrated collective attention.

Experiment 1 — Response Tasks

Figure 9a. Response 1: Central window zone attracts peak attention.
Figure 9b. Response 3: Distributed pattern across decorative elements.
Figure 9c. Response 5: Strong focus on upper façade and roofline.

Experiment 1 — Compare Task

Figure 10. Compare 1: Attention alternates between left and right buildings, with fixation hotspots on distinguishing features.

Key Observations

01 — Central bias is consistently observed, with fixation density peaking at the geometric centre of each façade.

02 — Windows and decorative elements act as primary attention attractors, receiving disproportionate fixation relative to their spatial extent.

03 — Response tasks produce more concentrated heatmaps than Preview tasks, reflecting deeper engagement during verbal description.


Section IV

AOI Segmentation Analysis

Area of Interest (AOI) analysis classifies each façade into architectural elements (walls, windows, roof, entrance, and decorative features) to quantify how attention is distributed across functional building components. The segmentation uses GPT-4 Vision for grid-based classification, producing an element map for each stimulus image. A total of 216 segmentation maps were generated.
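Once a grid-based element map exists, attention per element can be computed by looking up the grid cell under each fixation. The sketch below assumes a hand-written 3×4 label grid in place of the GPT-4 Vision output, a cell size of 100 pixels, and hypothetical fixations; none of these values come from the study.

```python
from collections import Counter

# Hypothetical element map: one label per grid cell (rows run top to bottom).
ELEMENT_GRID = [
    ["roof", "roof", "roof", "roof"],
    ["wall", "window", "window", "wall"],
    ["wall", "wall", "entrance", "wall"],
]
CELL_SIZE = 100  # pixels per grid cell (assumed)

def attention_share(fixations, grid, cell):
    """Return each element's share of total dwell time.

    fixations: list of (x, y, duration_ms); points outside the grid are skipped.
    """
    dwell = Counter()
    for x, y, duration in fixations:
        row, col = int(y // cell), int(x // cell)
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
            dwell[grid[row][col]] += duration
    total = sum(dwell.values())
    return {label: d / total for label, d in dwell.items()} if total else {}

fixations = [(150, 150, 400), (250, 120, 300), (50, 250, 100), (220, 260, 200)]
shares = attention_share(fixations, ELEMENT_GRID, CELL_SIZE)
print(shares)
```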

Figure 12. AI-based AOI segmentation showing colour-coded architectural elements: windows (steel blue), walls (warm grey), roof (terracotta), decorative (gold), entrance (olive).
Figure 13a. Exp 1, Building 1: Grid-based classification identifies distinct zones.
Figure 13b. Exp 1, Building 3: Different architectural style yields different segmentation.
Figure 13c. Exp 2, Building 1: Segmentation adapts to varied façade compositions.

Attention Efficiency

Figure 14a. Attention distribution by architectural element.

[Scatter plot: Area % (0–80) on the x-axis, Attention % (0–60) on the y-axis]

Figure 14b. Area vs. attention scatter. Points above diagonal indicate over-representation.
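The "above the diagonal" reading can be expressed as a simple ratio of attention share to area share. The sketch below uses the three area and attention pairs reported in the abstract; the name "efficiency ratio" is a label chosen here for illustration, not a term defined by the study.

```python
# (area %, attention %) pairs as reported in the abstract.
elements = {
    "windows": (7, 30),
    "decorative": (12, 43),
    "walls": (63, 29),
}

# Ratio > 1: the element draws more attention than its area predicts
# (above the diagonal); ratio < 1: less attention (below the diagonal).
ratios = {name: attention / area for name, (area, attention) in elements.items()}

for name, ratio in ratios.items():
    status = "over-attended" if ratio > 1 else "under-attended"
    print(f"{name}: {ratio:.2f} ({status})")
```

On these figures, windows receive roughly 4.3 times and decorative features roughly 3.6 times the attention their area would predict, while walls receive a little under half.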

Figure 15. Fixation distribution by architectural element across all three experiments, confirming consistent over-attention to windows and decorative features.
Figure 16. Attention efficiency scatter plot showing the relationship between element area and fixation proportion.
Figure 17. Mean TCI by architectural element and task type. Decorative elements show the highest TCI during Response tasks.
Figure 18. Response vs. Compare attention patterns. Response tasks show more concentrated attention on specific elements.

Section V

Verbal Description Analysis

Participants provided verbal descriptions during Response tasks. These were transcribed and analysed for architectural feature mentions. The average description contained 5.97 distinct features, with windows and storey count being the most frequently mentioned attributes across all experiments.
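Feature mentions in a transcript can be counted with a keyword lexicon, as in the minimal sketch below. The lexicon, the sample sentence, and the function name are hypothetical; the study's actual coding scheme is not described here and may differ.

```python
import re
from collections import Counter

# Hypothetical keyword lists for a few architectural features.
LEXICON = {
    "windows": {"window", "windows"},
    "storeys": {"storey", "storeys", "floor", "floors"},
    "roof": {"roof", "roofline"},
    "entrance": {"entrance", "door", "doorway"},
}

def count_features(description):
    """Count keyword hits per feature in one transcribed description."""
    tokens = re.findall(r"[a-z]+", description.lower())
    counts = Counter()
    for token in tokens:
        for feature, keywords in LEXICON.items():
            if token in keywords:
                counts[feature] += 1
    return counts

text = ("A three-storey building with tall windows, a flat roof "
        "and arched windows above the door.")
counts = count_features(text)
print(dict(counts))
print("distinct features:", len(counts))
```

Averaging the number of distinct features over all descriptions would give a figure comparable to the reported mean of 5.97, assuming a lexicon that covers all coded features.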

[Chart: mention count (axis 0–3,200) for Windows, Storeys, Roof, Shape, Colour, Entrance, Material, and Symmetry]

Figure 19. Architectural feature mention frequency. Windows (3,047) and storeys (2,216) dominate.

Figure 20. Feature distribution heatmap across experiments and participants.
Figure 21. Experiment 1 feature frequency heatmap showing per-user variation.
Figure 22. Word count distribution across verbal descriptions, showing typical description length and variability.

Section VI

Cross-Modal Integration

The cross-modal analysis examines relationships between visual attention (Voronoi metrics) and verbal descriptions (speech features). Nine statistically significant correlations were identified, suggesting that attention allocation directly influences which features participants choose to describe.
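A correlation between a gaze metric and a speech feature can be computed as sketched below. This assumes the reported r values are Pearson coefficients, which the text does not state explicitly, and the paired values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical per-participant values: mean TCI and number of shape mentions.
tci = [0.15, 0.18, 0.20, 0.23, 0.26]
shape_mentions = [5, 4, 4, 2, 1]

r = pearson_r(tci, shape_mentions)
print(f"r = {r:.2f}")  # negative: higher TCI goes with fewer shape mentions
```

A negative r of this kind matches the direction of the reported TCI and shape-mention relationship, but a correlation alone does not establish which variable drives the other.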

[Chart: attention (axis 0–36) across Windows, Wall, Decorative, Roof, and Entrance for the Response, Compare, and Preview tasks]

Figure 23. Attention distribution across elements by task type. Response tasks show stronger window focus.

Figure 24. Correlation heatmap between Voronoi metrics and speech features. Shape mentions negatively correlate with TCI (r = −0.76, p < 0.001).
Figure 25. Cross-experiment comparison of key metrics, showing consistency across different building sets.
Figure 26. Similarity score distribution across experiments. Mean similarity of 4.1/10 indicates moderate perceived resemblance.

Significant Correlations

Speech Feature        Gaze Metric       r       p-value
Shape mentions        TCI               −0.76   < 0.001
Window mentions       Fixation count    −0.45   0.03
Storey mentions       Mean duration     +0.52   0.01
Decorative mentions   Gaze dispersion   +0.61   0.005
Figure 27. Per-user attention heatmap for Experiment 1, showing individual differences in gaze distribution across participants.