Received 01.08.2024, Revised 24.10.2024, Accepted 25.11.2024
This study aimed to examine existing approaches and technologies for the digitisation of genealogical documents, drawing on international experience. Such an examination is intended to support more efficient organisation of the processes and mechanisms for digitising archive groups, their centralised storage, faster genealogical research, and improved user accessibility. The digitisation of archives has become a critically important aspect of preserving cultural heritage, particularly in the context of Russia's military aggression against Ukraine. The introduction of automatic text recognition technology has helped to optimise this process, facilitating access to information and enhancing the efficiency of research, particularly in the field of genealogy. The study analysed the operating principles of optical character recognition, its advantages, the features of ready-made solutions, and the functionality of software based on this technology. The strategy for digitisation in Ukraine was assessed, along with the challenges facing the archival sector in terms of digitisation and access to archive groups. The research also examined the outcomes of implementing automatic text recognition in leading archives worldwide, as well as the capabilities of online archives that offer contextual search functions. Particular attention was given to the opportunities afforded to researchers through the integration of such systems into archival operations, notably the ease of locating required information, the increased speed of data processing, and the provision of round-the-clock access to archival resources regardless of users’ geographical location. The study also reviewed the work of scholars involved in the development and implementation of optical character recognition in archival institutions.
Drawing on international experience, the potential of modern optical character recognition technologies to modernise the archival sector in Ukraine was identified, with positive implications for genealogical research and the preservation of cultural heritage. The practical value of the study lies in demonstrating the effectiveness of information technologies in improving the digitisation of archival documents and enhancing access to them. The proposed recommendations aim to optimise the organisation of digital archives, improve document storage and retrieval processes, and accelerate genealogical research. These developments will contribute to the preservation of cultural heritage and improve access to archival information for users.
archive group; scanning; genealogical research; Optical Character Recognition; information technologies; automation
[1] Amazon rekognition. (2024). Amazon Web Services. Retrieved from https://aws.amazon.com/rekognition/?nc1=h_ls.
[2] Artemenkova, O. (2022). Information tools of genealogical research in the archives of Ukraine. Visnyk of Kharkiv State Academy of Culture, 61, 81-93. doi: 10.31516/2410-5333.061.08.
[3] Artemenkova, O. (2023). Information technologies as a tool for popularizing genealogical research in the archives of Ukraine. (Doctoral dissertation, Kyiv National University of Culture and Arts, Kyiv, Ukraine).
[4] Creating a digital scholarly edition of the Lovelace papers with Jessica Cook. (2021). Transkribus. Retrieved from https://www.transkribus.org/success-story/lovelace-cook.
[5] Digital transformations in Ukraine: Do domestic institutional conditions meet external challenges and the European agenda? (2020). Chernihiv: Polissya Foundation for International and Regional Studies.
[6] Documentation and archiving. (2024). Arolsen Archives. Retrieved from https://arolsen-archives.org/en/about-us/what-we-do/documentation-and-archiving/.
[7] Dutch handwriting 17th-19th century. Free public AI model for handwritten text recognition with Transkribus. (2023). The Hague: National Archives Netherlands.
[8] Ferro, S., Pelillo, M., & Traviglia, A. (2023). AI-assisted digitalisation of historical documents. In The international archives of the photogrammetry, remote sensing and spatial information sciences. 29th CIPA symposium “Documenting, understanding, preserving cultural heritage: Humanities and digital technologies for shaping the future” (Vol. XLVIII-M-2-2023, pp. 557-562). Florence, Italy. doi: 10.5194/isprs-archives-XLVIII-M-2-2023-557-2023.
[9] Friedewald, M., Székely, I., & Karaboga, M. (2024). Preserving the past, enabling the future: Assessing the European policy on access to archives in the digital age. Preservation, Digital Technology & Culture, 53(2), 61-71. doi: 10.1515/pdtc-2024-0003.
[10] IBM cloud pak for business automation. (2022). IBM. Retrieved from https://www.ibm.com/products/cloud-pak-for-business-automation.
[11] Khoma, I., Vovk, N., Holoshchuk, R., & Muravska, S. (2023). Promoting the Ukrainian education and culture centre “Oseredok” through the digitization of Ukrainian studies archival collections in Canada. In SCIA-2023: 2nd international workshop on social communication and information activity in digital humanities. Lviv, Ukraine.
[12] Korzhyk, N., Solianyk, A., Borysova, A., & Aleksander, M. (2023). State archives in Ukraine during the russian aggression: Challenges and achievements. In SCIA-2023: 2nd international workshop on social communication and information activity in digital humanities. Lviv, Ukraine.
[13] Kovalska, L. (2019). Document communication of archive information users. Intercultural Communication, 6, 231-248. doi: 10.13166/inco/103415.
[14] Kovtaniuk, Yu. (2023). Normative and legal regulation of digitization of fonds of cultural institutions as requirement for development of state integration electronic information resources of national historical and cultural heritage. Manuscript and Book Heritage of Ukraine, 31, 379-406. doi: 10.15407/rksu.31.379.
[15] Lipianina-Honcharenko, K., Yarych, V., Ivasechko, A., Filinyuk, A., Yurkiv, K., & Lebid, T. (2024). Evaluating the effectiveness of attention-gated-CNNBGRU models for historical manuscript recognition in Ukraine. In The first international workshop of young scientists on artificial intelligence for sustainable development. Ternopil, Ukraine.
[16] Logvynenko, B., et al. (2024). Anatolii Khromov: On digitisation, preservation and openness of archives. Ukrainer. Retrieved from https://www.ukrainer.net/anatoliy-khromov/.
[17] Martínez-Cardama, S., & Pacios, A.R. (2022). National archives’ priorities: An international overview. Archival Science, 22, 1-42. doi: 10.1007/s10502-021-09367-y.
[18] National archives. (2024). Retrieved from https://www.nationaalarchief.nl/.
[19] Nockels, J., Gooding, P., & Terras, M. (2024). The implications of handwritten text recognition for accessing the past at scale. Journal of Documentation, 80(7), 148-167. doi: 10.1108/JD-09-2023-0183.
[20] Onuchak, V. (2024). Electronic archives in Ukraine: How the state plans to store electronic documents. Vchasno. EDO. Retrieved from https://vchasno.ua/elektronni-arhivy-ukrainy/.
[21] Order of the Cabinet of the Ministers of Ukraine No. 1353-r “On Approval of the Strategy for Digital Transformation of the Social Sphere”. (2020, October). Retrieved from https://zakon.rada.gov.ua/laws/show/1353-2020-%D1%80#n10.
[22] Paliienko, M. (2023). Rethinking approaches to archival theory and practice in Ukraine in the context of digital transformation of the society. Atlantic +, 33(2), 83-99.
[23] Prebor, G. (2024). From digitization and images to text and content: Transkribus as a case study. Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies, 9(1), 72-89. doi: 10.1353/mns.2024.a930877.
[24] Report on the implementation of the Ministry of Digital Transformation of Ukraine’s work plan for 2023. (2023). Official website of the Ministry of Digital Transformation of Ukraine. Retrieved from https://thedigital.gov.ua/community/reports.
[25] Rybachok, O. (2018). International integrated digital resources of documentary heritage of archives, libraries, museums: Stages of creation, development strategies (1980s-2010s). (PhD dissertation, Vernadsky National Library of Ukraine, Kyiv, Ukraine).
[26] Salamanca, L., Brandenberger, L., Gasser, L., Schlosser, S., Balode, M., Jung, V., Perez-Cruz, F., & Schweitzer, F. (2024). Processing large-scale archival records: The case of the Swiss parliamentary records. Swiss Political Science Review, 30(2), 140-153. doi: 10.1111/spsr.12590.
[27] Savon Työmies. (1920). Sanomlehdet. Retrieved from https://digi.kansalliskirjasto.fi/sanomalehti/binding/3112293?page=1.
[28] Silva, A.L., & Terra, A.L. (2023). Cultural heritage on the semantic web: The Europeana data model. IFLA Journal, 50(1), 93-107. doi: 10.1177/03400352231202506.
[29] Sokil, M., Syerov, Y., & Boiko, V. (2024). From destruction to digitization: Safeguarding Ukraine’s cultural and archival heritage in wartime. In P. Štarchoň, S. Fedushko & K. Gubíniová (Eds.), Data-centric business and applications. Lecture notes on data engineering and communications technologies (Vol. 208, pp. 253-280). Cham: Springer. doi: 10.1007/978-3-031-59131-0_12.
[30] Spina, S. (2023). Artificial intelligence in archival and historical scholarship workflow: HTS and ChatGPT. Digital Humanities, 16, 125-140. doi: 10.6092/issn.2532-8816/17205.
[31] State Archive of Zaporizhzhia Region. (2024). Retrieved from https://archivzp.gov.ua/uk/.
[32] Tesseract OCR. (2024). GitHub. Retrieved from https://github.com/tesseract-ocr/tesseract.
[33] Tikhonov, A., & Rabus, A. (2024). Handwritten text recognition of Ukrainian manuscripts in the 21st century: Possibilities, challenges, and the future of the first generic AI-based model. Kyiv-Mohyla Humanities Journal, 11, 226-247. doi: 10.18523/2313-4895.11.2024.226-247.
[34] Tiurmenko, I., Bozhuk, L., Struk, I., & Syerov, Y. (2022). Digital documentary collections of national cultural heritage on the Ukrainian regional state archives websites. In N. Kryvinska & M. Greguš (Eds.), Developments in information & knowledge management for business applications. Studies in systems, decision and control (Vol. 421, pp. 449-470). Cham: Springer. doi: 10.1007/978-3-030-97008-6_20.
[35] Unlock the past with Transkribus. (2024). Transkribus. Retrieved from https://www.transkribus.org/.
[36] User guide – introduction. (2024). OCR4all.org. Retrieved from https://www.ocr4all.org/guide/user-guide/introduction.
[37] Vision AI: Extract insights from images, documents, and videos. (2024). Cloud Vision API. Retrieved from https://cloud.google.com/vision?hl=en.
[38] What is OCR (Optical Character Recognition)? (2024). Amazon Web Services. Retrieved from https://aws.amazon.com/what-is/ocr/.
[39] What is optical character recognition (OCR)? (2024). IBM. Retrieved from https://www.ibm.com/think/topics/optical-character-recognition.
[40] Zurich council manuals 1642-1798. (2024). Canton of Zurich State Archives. Retrieved from https://ratsmanuale-zuerich.transkribus.eu/.