Integrated Multimodal Analysis

Building Façade Perception Research

A comprehensive eye-tracking study examining how people perceive and describe building façades, integrating Weighted Voronoi tessellation, group attention heatmaps, AI-based AOI segmentation, and verbal description analysis.

52 Participants

3 Experiments

1,851+ Voronoi Images

403 Group Heatmaps

Abstract

This study investigates visual perception of building façades through a multimodal approach combining eye-tracking, verbal descriptions, and similarity judgements. Fifty-two participants viewed line-drawing façade stimuli across three task conditions (Preview, Response, Compare). Analysis reveals that attention is allocated highly disproportionately to the area each architectural element occupies: windows (7% area → 30% attention) and decorative features (12% area → 43% attention) are systematically over-attended, whilst walls (63% area → 29% attention) are under-attended. Task type significantly modulates gaze strategy, with Response tasks eliciting concentrated attention (TCI = 0.234) and Compare tasks producing distributed scanning (TCI = 0.180). Cross-modal analysis identifies nine significant correlations between gaze metrics and verbal features, suggesting that attention allocation directly shapes descriptive content.

Section I

Research Methodology

This study employed a within-subjects experimental design in which 52 participants viewed line-drawing façade stimuli across three distinct task conditions. Eye movements were recorded using Tobii Pro eye-tracking technology at 60 Hz, capturing fixation coordinates, durations, and saccade patterns throughout each trial.

The analytical pipeline integrates four complementary modalities: Time-Weighted Voronoi tessellation for individual attention mapping, group-level attention heatmaps for aggregate fixation density, AI-based AOI segmentation using GPT-4 Vision for architectural element classification, and verbal description analysis for linguistic feature extraction.

Preview (10 sec): Initial scanning of a 3×3 building grid to form first impressions

Response (35 sec): Detailed verbal description of individual building façade features

Compare (10 sec): Side-by-side comparison and similarity judgement of building pairs

Figure 1. Distribution of key Voronoi metrics (fixation count, mean duration, TCI, gaze dispersion) across all task types, showing significant variation between experimental conditions.

Figure 2. Correlation matrix of Voronoi metrics. Strong positive correlation between fixation count and total duration (r = 0.92); negative correlation between TCI and gaze dispersion (r = −0.68).

Section II

Weighted Voronoi Tessellation

Time-Weighted Voronoi diagrams partition the visual field into one region per fixation, with each region scaled by fixation duration. Each fixation point generates a Voronoi cell whose size is inversely proportional to dwell time, so small, dense cells mark concentrated attention and broad cells mark rapid scanning. The Temporal Concentration Index (TCI) summarises this pattern as a single metric. More than 1,851 individual Voronoi images were generated across all participants and experiments.
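The tessellation can be illustrated with a minimal sketch. It assumes a multiplicatively weighted Voronoi rule (each pixel goes to the fixation with the smallest distance divided by weight) and follows the statement above that cell size shrinks as dwell time grows; the study's exact weighting scheme and its TCI formula are not given here, so the function names, the `inverse` switch, and the sample fixations are all hypothetical.

```python
import math

def weighted_voronoi(fixations, width, height, inverse=True):
    """Label every pixel with the index of the fixation that owns it.

    fixations: list of (x, y, duration_ms) with positive durations.
    Assumed rule: a pixel belongs to the fixation with the smallest
    distance / weight. With inverse=True the weight is 1 / duration,
    so longer dwell gives a smaller cell, as the text describes.
    """
    labels = []
    for py in range(height):
        row = []
        for px in range(width):
            best_idx, best_score = -1, math.inf
            for i, (fx, fy, dur) in enumerate(fixations):
                weight = 1.0 / dur if inverse else dur
                score = math.hypot(px - fx, py - fy) / weight
                if score < best_score:
                    best_idx, best_score = i, score
            row.append(best_idx)
        labels.append(row)
    return labels

def cell_areas(labels, n_fixations):
    """Count the pixels assigned to each fixation."""
    areas = [0] * n_fixations
    for row in labels:
        for idx in row:
            areas[idx] += 1
    return areas

# Hypothetical data: one long fixation and one short fixation.
fixations = [(10, 10, 600.0), (30, 10, 200.0)]
labels = weighted_voronoi(fixations, width=40, height=20)
areas = cell_areas(labels, len(fixations))
```

With these inputs the 600 ms fixation ends up with the smaller cell. Passing `inverse=False` flips the convention, which makes it easy to compare both readings of "time-weighted".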

Figure 4a. User 1, Preview task. Broad cells indicate rapid exploratory scanning.

Figure 4b. User 1, Response task. Dense cells around windows show concentrated attention.

Figure 4c. User 1, Compare task. Medium cells reflect strategic comparison scanning.

Task-Type Comparison

Figure 5. Mean fixation count and TCI by task type. All differences significant at p < 0.001 (Kruskal-Wallis).
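A Kruskal-Wallis comparison of the three task types can be sketched without external libraries. The version below is an illustration, not the study's analysis code: it computes the tie-corrected H statistic and uses the fact that, for exactly three groups, the chi-square survival function with 2 degrees of freedom reduces to exp(-H/2). The TCI values are hypothetical.

```python
import math

def kruskal_wallis(*groups):
    """Return (H, p) for exactly three groups, using average ranks for ties."""
    if len(groups) != 3:
        raise ValueError("the p-value shortcut assumes exactly three groups")
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    h /= 1.0 - tie_term / (n**3 - n)  # tie correction
    p = math.exp(-h / 2.0)  # chi-square survival function, 2 degrees of freedom
    return h, p

# Hypothetical per-trial TCI values for each task type.
preview = [0.15, 0.16, 0.17, 0.175]
compare = [0.18, 0.19, 0.20, 0.21]
response = [0.22, 0.23, 0.24, 0.25]
h_stat, p_value = kruskal_wallis(preview, compare, response)
```

The rank-based test suits metrics such as TCI and fixation count because it does not assume normally distributed values.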

Figure 6. Comprehensive task-type comparison across all Voronoi metrics with statistical significance indicators.

Figure 7. TCI distribution across experiments showing consistent patterns of attention concentration.

Section III

Group Attention Heatmaps

Group heatmaps aggregate fixation data across all participants to reveal collective attention patterns. A total of 403 group heatmaps were generated across three experiments. Each pixel's intensity represents cumulative dwell time, normalised across the participant pool, providing immediate visual confirmation of which façade elements attract the most attention.
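The aggregation can be sketched as follows. This is one plausible reading rather than the study's implementation: each participant's dwell time is spread with a Gaussian kernel (the `sigma` value is an assumption), divided by that participant's total dwell time so that no single viewer dominates, summed over the pool, and scaled to the range 0 to 1. All names and data are hypothetical.

```python
import math

def group_heatmap(fixations_by_participant, width, height, sigma=2.0):
    """Build a normalised group heatmap as a 2D list of floats in [0, 1].

    fixations_by_participant: {participant_id: [(x, y, duration_ms), ...]}
    """
    heat = [[0.0] * width for _ in range(height)]
    for fixations in fixations_by_participant.values():
        total = sum(dur for _, _, dur in fixations)
        if total == 0:
            continue  # participant contributed no dwell time
        for fx, fy, dur in fixations:
            share = dur / total  # per-participant normalisation (assumed)
            for y in range(height):
                for x in range(width):
                    dist2 = (x - fx) ** 2 + (y - fy) ** 2
                    heat[y][x] += share * math.exp(-dist2 / (2 * sigma**2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat

# Hypothetical fixations for two participants on a 20 x 10 pixel stimulus.
data = {
    "p01": [(5, 5, 400.0), (15, 5, 100.0)],
    "p02": [(5, 5, 300.0)],
}
heat = group_heatmap(data, width=20, height=10)
```

Here both participants dwell longest near (5, 5), so that pixel carries the peak value of 1.0 after scaling.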

Experiment 1 — Response Tasks

Figure 9a. Response 1: Central window zone attracts peak attention.

Figure 9b. Response 3: Distributed pattern across decorative elements.

Figure 9c. Response 5: Strong focus on upper façade and roofline.

Experiment 1 — Compare Task

Figure 10. Compare 1: Attention alternates between left and right buildings, with fixation hotspots on distinguishing features.

Key Observations

01 — Central bias is consistently observed, with fixation density peaking at the geometric centre of each façade.

02 — Windows and decorative elements act as primary attention attractors, receiving disproportionate fixation relative to their spatial extent.

03 — Response tasks produce more concentrated heatmaps than Preview tasks, reflecting deeper engagement during verbal description.

Section IV

AOI Segmentation Analysis

Area of Interest (AOI) analysis classifies each façade into architectural elements (walls, windows, roof, entrance, and decorative features) to quantify attention distribution across functional building components. The segmentation employs GPT-4 Vision for grid-based classification, producing an element map for each stimulus image. A total of 216 segmentation maps were generated.
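Once an element map exists, mapping fixations onto it is straightforward. The sketch below assumes the map is a rectangular grid of element labels with square cells; the grid, the cell size, and the fixations are hypothetical, and the classification step itself (done with GPT-4 Vision in the study) is not reproduced.

```python
from collections import defaultdict

def shares_by_element(grid, fixations, cell_size):
    """Return {element: (area_share, attention_share)}.

    grid: 2D list of element labels, one per grid cell.
    fixations: list of (x, y, duration_ms) in stimulus pixel coordinates.
    """
    area = defaultdict(int)
    for row in grid:
        for label in row:
            area[label] += 1
    n_cells = sum(area.values())

    dwell = defaultdict(float)
    for x, y, dur in fixations:
        col, row = int(x // cell_size), int(y // cell_size)
        # Fixations that fall outside the stimulus are ignored.
        if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
            dwell[grid[row][col]] += dur
    total = sum(dwell.values())

    return {
        label: (area[label] / n_cells, dwell[label] / total if total else 0.0)
        for label in area
    }

# Hypothetical 3 x 4 element map with 100-pixel cells.
grid = [
    ["roof", "roof", "roof", "roof"],
    ["wall", "window", "window", "wall"],
    ["wall", "entrance", "wall", "wall"],
]
fixations = [(150, 150, 300.0), (250, 120, 200.0), (50, 250, 100.0), (150, 250, 400.0)]
shares = shares_by_element(grid, fixations, cell_size=100)
```

In this toy example the windows cover 2 of 12 cells but collect half of the dwell time, the same kind of imbalance the study reports.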

Figure 12. AI-based AOI segmentation showing colour-coded architectural elements: windows (steel blue), walls (warm grey), roof (terracotta), decorative (gold), entrance (olive).

Figure 13a. Exp 1, Building 1: Grid-based classification identifies distinct zones.

Figure 13b. Exp 1, Building 3: Different architectural style yields different segmentation.

Figure 13c. Exp 2, Building 1: Segmentation adapts to varied façade compositions.

Attention Efficiency

Figure 14a. Attention distribution by architectural element.

Figure 14b. Area vs. attention scatter. Points above diagonal indicate over-representation.
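The over- and under-representation shown in the scatter can be expressed as a simple ratio of attention share to area share, where a value above 1 corresponds to a point above the diagonal. The sketch uses the percentages reported in the abstract; the name `attention_ratio` is an illustrative choice, not a metric defined by the study.

```python
# (area %, attention %) as reported in the abstract.
reported = {
    "windows": (7, 30),
    "decorative": (12, 43),
    "walls": (63, 29),
}

def attention_ratio(area_pct, attention_pct):
    """Attention share divided by area share; > 1 means over-attended."""
    return attention_pct / area_pct

for element, (area, attention) in reported.items():
    ratio = attention_ratio(area, attention)
    status = "over-attended" if ratio > 1 else "under-attended"
    print(f"{element}: {ratio:.2f} ({status})")
```

Windows come out at roughly 4.29, decorative features at 3.58, and walls at 0.46, so windows draw about four times the attention their area alone would predict.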

Figure 15. Fixation distribution by architectural element across all three experiments, confirming consistent over-attention to windows and decorative features.

Figure 16. Attention efficiency scatter plot showing the relationship between element area and fixation proportion.

Figure 17. Mean TCI by architectural element and task type. Decorative elements show the highest TCI during Response tasks.

Figure 18. Response vs. Compare attention patterns. Response tasks show more concentrated attention on specific elements.

Section V

Verbal Description Analysis

Participants provided verbal descriptions during Response tasks. These were transcribed and analysed for architectural feature mentions. The average description contained 5.97 distinct features, with windows and storey count being the most frequently mentioned attributes across all experiments.
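Counting feature mentions in a transcript can be sketched with keyword matching. The lexicon below is hypothetical and far smaller than any real coding scheme; it only illustrates how per-feature counts and the number of distinct features per description might be derived.

```python
import re

# Hypothetical keyword patterns, not the study's coding scheme.
LEXICON = {
    "windows": r"\bwindows?\b",
    "storeys": r"\b(?:storeys?|stories|story|floors?)\b",
    "roof": r"\broof(?:s|line)?\b",
    "entrance": r"\b(?:entrances?|doors?|doorways?)\b",
    "shape": r"\b(?:shapes?|rectangular|square|symmetrical|arched)\b",
}

def count_mentions(transcript):
    """Return {feature: number of keyword matches} for one description."""
    text = transcript.lower()
    return {feature: len(re.findall(pattern, text)) for feature, pattern in LEXICON.items()}

def distinct_features(transcript):
    """Number of features mentioned at least once."""
    return sum(1 for n in count_mentions(transcript).values() if n > 0)

sample = (
    "Three storeys, with arched windows on each floor and a central door. "
    "The windows are tall."
)
mentions = count_mentions(sample)
n_distinct = distinct_features(sample)
```

Averaging `distinct_features` over all descriptions would give a figure comparable to the reported 5.97, provided the lexicon covered the same feature set.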

Figure 19. Architectural feature mention frequency. Windows (3,047) and storeys (2,216) dominate.

Figure 20. Feature distribution heatmap across experiments and participants.

Figure 21. Experiment 1 feature frequency heatmap showing per-user variation.

Figure 22. Word count distribution across verbal descriptions, showing typical description length and variability.

Section VI

Cross-Modal Integration

The cross-modal analysis examines relationships between visual attention (Voronoi metrics) and verbal descriptions (speech features). Nine statistically significant correlations were identified, suggesting that attention allocation directly influences which features participants choose to describe.

Figure 23. Attention distribution across elements by task type. Response tasks show stronger window focus.

Figure 24. Correlation heatmap between Voronoi metrics and speech features. Shape mentions negatively correlate with TCI (r = −0.76, p < 0.001).

Figure 25. Cross-experiment comparison of key metrics, showing consistency across different building sets.

Figure 26. Similarity score distribution across experiments. Mean similarity of 4.1/10 indicates moderate perceived resemblance.

Significant Correlations

Speech Feature	Gaze Metric	r	p-value
Shape mentions	TCI	−0.76	< 0.001
Window mentions	Fixation count	−0.45	0.03
Storey mentions	Mean duration	+0.52	0.01
Decorative mentions	Gaze dispersion	+0.61	0.005
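Each row of the table pairs one speech feature with one gaze metric. A minimal sketch of such a correlation is shown below, assuming Pearson's r (the type of coefficient is not stated here) and using hypothetical per-participant values. The t statistic is the usual one for testing r against zero with n - 2 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero; requires |r| < 1 and n > 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical values: mean TCI and number of shape mentions per participant.
tci = [0.15, 0.18, 0.20, 0.23, 0.26]
shape_mentions = [5, 4, 4, 2, 1]
r = pearson_r(tci, shape_mentions)
t = t_statistic(r, len(tci))
```

A negative `r` here mirrors the direction of the reported shape-mention result: the more concentrated the gaze, the fewer shape terms in the description.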

Figure 27. Per-user attention heatmap for Experiment 1, showing individual differences in gaze distribution across participants.