Efficient translation of HTML to JSON for enhanced web content production

Di Stefano A.;Ramzan F.;Reforgiato Recupero D.
2026-01-01

Abstract

The automated transformation of unstructured HTML into schema-compliant data is a foundational challenge for platform interoperability and the scalability of modern no-code web editors. While powerful, Large Language Models (LLMs) are often ill-suited for this task due to their inherent stochasticity and computational cost, failing to guarantee the deterministic precision required at scale. This paper addresses this challenge by introducing a novel, deterministic pipeline to translate arbitrary HTML emails into the proprietary, grid-based JSON of the Beefree content platform. Our core contributions are: (1) a hybrid methodology that combines Document Object Model (DOM) analysis for semantics with computer vision for geometric layout interpretation; (2) a vision-based abstraction technique using visual placeholders for robust row-column detection, resilient to DOM structural variations; and (3) a rigorous, dual-faceted validation of its real-world viability via a large-scale assessment on over 16,000 HTML emails and qualitative usability studies (SUS) with 16 industry professionals and 10 academic researchers. The results confirm our deterministic, vision-augmented approach is a highly effective and scalable alternative to generative models for structured content creation in production environments.
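As a rough illustration of the row-column detection idea mentioned in the abstract, the sketch below groups element bounding boxes into rows and columns and emits a grid-shaped JSON document. Everything in it is an assumption made for illustration: the `Box` type, the `group_rows` and `to_grid` helpers, the vertical-overlap rule, and the output shape. It does not reproduce the paper's actual pipeline or the Beefree JSON schema, which this record does not describe.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a rendered element, in pixels."""
    name: str
    x: int
    y: int
    w: int
    h: int


def group_rows(boxes):
    """Group boxes into rows: a box joins the current row when it starts
    above the lowest bottom edge seen so far in that row."""
    rows = []
    row_bottom = None
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if rows and box.y < row_bottom:
            rows[-1].append(box)
            row_bottom = max(row_bottom, box.y + box.h)
        else:
            rows.append([box])
            row_bottom = box.y + box.h
    return rows


def to_grid(boxes):
    """Build an illustrative rows/columns structure (not the Beefree schema)."""
    grid = {"rows": []}
    for row in group_rows(boxes):
        # Columns are ordered left to right within each row.
        columns = [
            {"content": b.name, "width": b.w}
            for b in sorted(row, key=lambda b: b.x)
        ]
        grid["rows"].append({"columns": columns})
    return grid


if __name__ == "__main__":
    # Made-up layout: a header, a two-column body, and a footer.
    sample = [
        Box("header", 0, 0, 600, 80),
        Box("left", 0, 100, 300, 200),
        Box("right", 300, 120, 300, 150),
        Box("footer", 0, 320, 600, 40),
    ]
    print(json.dumps(to_grid(sample), indent=2))
```

Because the grouping relies only on sorting and comparisons, the same set of boxes always yields the same grid, regardless of input order. That is the kind of determinism the abstract contrasts with generative models.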
2026
English
85
1
19
Anonymous experts
scientific
Html management
Html translation
Large language models
Web content production
Di Stefano, A.; Fadda, M.; Marini, C.; Ramzan, F.; Reforgiato Recupero, D.
1.1 Journal article
info:eu-repo/semantics/article
1 Journal contribution::1.1 Journal article
262
5
mixed
Files in This Item:
File Size Format  
s11042-026-21232-7-1.pdf

Archive administrators only

Type: publisher's version
Size 2.38 MB
Format Adobe PDF
s11042-026-21232-7-1 (1) (1) (1).pdf

embargo until 23/01/2027

Type: Author's Accepted Manuscript (AAM), post-print (version accepted by the publisher)
Size 2.22 MB
Format Adobe PDF
