
Recent Advances in Computer Science and Communications


ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Systematic Review Article

Schema Extraction in NoSQL Databases: A Systematic Literature Review

Author(s): Saad Belefqih*, Ahmed Zellou and Mouna Berquedich

Volume 17, Issue 8, 2024

Published on: 15 December, 2023

Article ID: e151223224559 Pages: 13

DOI: 10.2174/0126662558273437231204061106


Introduction: NoSQL databases play an increasingly important role in storing massive data within companies. Because they share a common property, being schema-less, NoSQL databases offer great flexibility, particularly for storing data in different formats. However, despite their success in data storage, the absence of an explicit schema is a major obstacle in areas that require precise knowledge of that schema, especially data integration.

Method: This study presents a Systematic Literature Review (SLR) that explores, evaluates, and discusses existing research on novel schema extraction approaches. We conducted the review using a well-defined methodology to examine the problem of schema extraction from NoSQL databases.

Results: Our results highlight the existing schema extraction approaches and their limitations, giving researchers and practitioners the knowledge needed to devise new, more efficient approaches.

Conclusion: In future work, inspired by recent advances in quantum computing and the emergence of post-quantum cryptography (PQC), we aim to propose a schema extraction approach that combines these cutting-edge technologies with a strong focus on database security.

Keywords: Schema-less, NoSQL databases, schema extraction, schema inference, schema versioning, data integration.


© 2024 Bentham Science Publishers