How are informal diagrams used in software engineering? An exploratory study of open-source and industrial practices

Jongeling, Robbert; Cicchetti, Antonio; Ciccozzi, Federico

doi:10.1007/s10270-024-01252-3


Special Section Paper
Open access
Published: 20 December 2024

Volume 24, pages 601–613, (2025)

Software and Systems Modeling


Abstract

In software engineering practice, models created for communication and documentation are often informal. This limits the applicability of powerful model-driven engineering mechanisms. Understanding the motivations and use of informal diagrams can improve modelling techniques and tools, by bringing together the benefits of both informal diagramming and modelling using modelling languages and modelling tools. In this paper, we report on an initial exploration effort to investigate the use of informal diagramming in both open-source software repositories and industrial software engineering practices. We carried out a repository mining study on open-source software repositories seeking informal diagrams and classified them according to what they represent and how they are used. Additionally, we describe industrial practices that rely to some extent on informal diagramming, as gathered through unstructured interviews with practitioners. We compare the findings from these data sources and discuss how informal diagrams are used in practice.


1 Introduction

In model-driven engineering, we commonly regard conventional models as being artefacts that conform to a modelling language and that are created in modelling tools. Moreover, advanced applications for these models, such as automated transformation into source code, are provided too. Our experience from industrial practice is different, and informal models are at least as common as conventional ones. That is, we encounter free-form diagrams that are not created in modelling tools and that do not conform, entirely or partially, to a modelling language. These diagrams may still describe the structure or behaviour of a system under development, and we refer to them as “informal diagrams”.

Informal diagrams represent models in the sense that they are abstractions with a certain purpose [4]. However, they lack full conformance to a modelling language’s syntax and semantics. While commonly created to sketch early designs and to support communication, informal diagrams are also often used for the documentation of the system under development [1, 27]. Informal diagrams also differ from models in the way they are created, for example in general-purpose office tools or diagramming tools, rather than in modelling tools.

Despite the informal nature of these models, they are an essential part of modelling activities and understanding their usage can give the modelling community insights into the experienced shortcomings of modelling and modelling tools. Research in this direction might give us insight into what functionality is needed to be supported in modelling tools, in particular concerning various aspects of flexibility such as (non-)conformance, (in-)completeness, and (un-)certainty. Thereby, this work can contribute to narrowing the gap between theory and practice in the body of knowledge in model-driven engineering. Indeed, the modelling community highlights the importance of informal diagramming as a means to increase the adoption of modelling in industrial software engineering settings [6].

Our overall research question is the one posed in the title of this article: “How are informal diagrams used in software engineering?”. To answer this question, we perform exploratory research of public open-source projects and industrial closed-source practices that include informal diagramming. We consider aspects related to what the diagrams represent, how they are used, and how they are maintained. In particular, we mine open-source software repositories for informal diagrams and we collect closed-source industrial practices through unstructured interviews within previously established industry–academia collaborations.

The remainder of this article is organized as follows. Section 2 outlines related studies on informal diagramming in software engineering. Section 3 describes our research methodology. Section 4 presents the results of our repository mining study as well as the gathered industrial practices. Section 5 contains a discussion of our findings. Section 6 concludes.

2 Related work

In this section, we discuss initiatives that aimed at understanding and improving informal diagramming practices in software engineering. Firstly, we discuss definitions of modelling and informal diagramming. Secondly, we describe work on the relationship between informal diagramming and the unified modelling language (UML). Thirdly, we discuss flexibility in modelling.

2.1 Characterizing modelling and informal diagramming

The use of informal notations is common for communicating among engineers [1, 27]. Informal diagrams and free-form sketches may represent abstractions to describe or prescribe the intended structure or behaviour of systems, a characteristic they share with models [4]. However, we use the term informal diagrams for those representations that do not have an explicit conformance to a metamodel. An underlying metamodel may exist, but it is not captured in a form that allows the conformance of the informal diagram to it to be checked automatically.

Informal diagrams thus (i) do not conform to an explicitly defined modelling language, that is, conformance cannot be checked by automated means, and (ii) are created in drawing tools or by other means that allow for syntax not conforming to a pre-defined set of symbols (as is enforced in modelling tools). In contrast, we refer to canonical models as those models that (i) conform to a modelling language that is defined and available, and (ii) are created in modelling tools.

An example of an informal diagram is included in Fig. 2. We see that it is an abstract representation of a software system, describing some aspects of its implementation. We also see that the diagram does not follow a specific known underlying modelling language, and the arrows have no specified semantics. Moreover, it is created in a drawing tool and the diagram cannot be automatically machine processed due to the absence of an explicitly defined metamodel.

2.2 Use of UML and informal diagrams in practice

Robles et al. mined GitHub to build a dataset of UML models and found a majority of them residing in image file formats, indicating that these models had likely not been produced in modelling tools but rather by manual sketching or informal diagramming in other tools [26]. In our study, we are interested in extending this search space to cover not only diagrams using UML notations, but also other informal, ad hoc notations created by practitioners. Image file formats imply that the diagram is an artefact created once. While such an image can be used for communication during development, it is likely never updated because it is not editable in the way diagram file formats are. We do not consider image file formats in our study, since they cannot be used in the same incremental way as diagram files. Instead, we are interested in those types of diagrams that are possibly maintained during the development.

Störrle surveyed 96 practitioners and researchers and found that informal modelling is used much more than canonical modelling [27]. At the same time, the respondents reported that UML is widely used, even if mostly in an informal manner. Moreover, the respondents thought that architects would benefit the most from modelling, be it through informal diagrams or canonical models. The survey finds that most activities for which informal diagrams are used relate to communication between engineers, and to design and documentation of the system under development. In our study, we aim to extend these findings by looking at the content of the diagrams too.

Walny et al. have interviewed academics who develop software to chart the life cycles of informal diagrams used during software development [30]. The study also includes hand-drawn sketches and shows that the developers use them both for personal understanding and for communication to other stakeholders. In our study, we aim to expand on these initial findings by studying a broader set of artefacts and seeing not only the process in which they are used, but also understanding what these informal diagrams represent.

In a survey of 394 software professionals and researchers about their use of sketching and informal diagramming [1], Baltes and Diehl found that, while UML was rarely used, informal diagrams did typically include UML elements. Their survey distinguished 11 aspects of informal diagramming. One of those aspects shows that whiteboards were the preferred medium when involving more than two persons in informal diagramming. In the era of increased geographically distributed development teams, and open-source projects with rarely co-located collaborators, it is interesting to see how digital media are currently used for informal diagramming. That is why we aim in this study to collect diagrams from GitHub, from projects that may have a fundamentally different collaborative nature than those in which engineers are co-located and can discuss on whiteboards.

While the popularity of UML in industry is sometimes questioned, a recent SLR shows that UML is still the most used modelling language in practice [28]. The authors confirm also the trend shown by the other studies that models are mostly used for documentation and communication purposes. To fulfil these purposes, it is enough for models to be rather informal and they do not need to conform strictly to a modelling language.

2.3 Flexibility in modelling

Several works have discussed flexible modelling as a mechanism to support more relaxed or more strict conformance of models to modelling languages [14]. When considering the conformance dimension, flexible modelling applies to the practices we discuss in this paper.

Sketching as an easily accessible way of modelling has been recognized by other research too, leading to approaches such as FlexiSketch [31] and OctoUML [29], which allow sketched input diagrams to be automatically transformed into (UML) models. This represents one type of approach among several that have been proposed under the term “flexible modelling” [11]. There, the goal becomes to facilitate informal diagramming and help practitioners migrate the knowledge they are capturing informally into a more structured format, thereby allowing them to benefit more from the modelling paradigm.

Under the flexible modelling paradigm, some works have looked at sketches (hand-drawn) as input into the modelling process. This can be limited to recognizing hand-drawn shapes as UML elements and automatically converting them [7]. Other approaches go a step further and help with the creation of models by supporting the bottom-up creation of domain-specific languages [2].

Other approaches start closer to the type of informal diagrams that we are interested in, as created in office tools, diagramming tools, or other digital media. Examples of the most commonly used tools for informal diagramming are Microsoft Visio [23], Microsoft PowerPoint [25], and Draw.io [15]. Other studies show bridges between informal and formal representations [9, 18, 21, 24]. These studies show further evidence that informal diagrams are commonly used and that they carry knowledge that is worth preserving. Indeed, a flexible modelling approach would not be worth the effort if the knowledge captured in the informal diagrams was merely for one-time communication of an idea. While this may be the initial purpose of informal diagrams, when their purpose changes to being part of the system documentation, a need for well-defined modelling representations arises.

The reverse direction—allowing formal models to get informal—has been studied too. A common approach is to focus on the flexibility of the types associated with model elements. Some approaches focus on delaying typing (a posteriori) until it is needed, or simply do not enforce typing at all [3, 10, 17, 19, 22]. Other approaches allow model elements to have multiple types or to be retyped dynamically [5, 8, 20]. These approaches also support informal diagramming, albeit from the starting point of models that conform to modelling languages and that are created in modelling tools.

3 Methodology

We present exploratory research on the practices of informal diagrams in both open-source projects and closed-source industrial projects. The study consists of mining open-source repositories to extract informal diagrams and a sample of industrial practices from ongoing collaborations in our research projects portfolio. The overall research flow is shown in Fig. 1. In this section, we describe the design of our repository mining study and the collection of industrial practices.

3.1 Repository mining

In this step, we collected informal diagrams from GitHub.^{Footnote 1} We sought those types of diagrams that are used as development artefacts, for example to capture architecture designs, or to serve as up-to-date documentation of other aspects of the software. Given this rather narrow focus, we chose to search for informal diagrams by file types, as produced by tools that are commonly used to create such diagrams. We have selected a sample of tools based on our prior knowledge of such tools and an informal internet search for informal diagramming tools. While images may contain informal diagrams as well, we excluded them from the search, since they are most likely to be one-time used artefacts, or contain other information than development artefacts.

Before mining repositories for informal diagrams, we performed a quick exploration using GitHub advanced search^{Footnote 2} to determine the most frequently occurring file type from tools commonly used for informal diagramming. The search string used had the format “path:*.<extension>”. An overview of the searched file extensions and the reported number of occurrences of those files is included in Table 1. Note that the advanced search does not return precise numbers; moreover, reporting them in that detail would not be beneficial, since they change on a daily basis. The main point to take away from Table 1 is that .drawio is by far the most frequently occurring of the searched file formats. Due to the nature of the mining procedure, described below, it is most efficient to search for a single file type, and since Draw.io files were in such an overwhelming majority, we chose to limit our collection of informal diagrams to the .drawio extension.
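The search strings described above can be generated mechanically. The following is a minimal sketch, not the procedure actually used in the study: it builds one “path:*.<extension>” string per file extension. Apart from `drawio`, the extensions listed are illustrative assumptions (chosen because Visio and PowerPoint are mentioned in Sect. 2.3) and are not necessarily the set reported in Table 1; the function and variable names are hypothetical.

```python
def search_string(extension: str) -> str:
    """Return a GitHub advanced-search string that matches files by extension."""
    # Strip a leading dot so that both "drawio" and ".drawio" are accepted.
    return f"path:*.{extension.lstrip('.')}"


# Hypothetical candidate list; only "drawio" is confirmed by the text.
CANDIDATE_EXTENSIONS = ["drawio", "vsdx", "pptx"]

# One search string per candidate extension, to be entered in the search UI.
search_strings = [search_string(ext) for ext in CANDIDATE_EXTENSIONS]
```

Each resulting string would be entered separately in the advanced search, and the reported hit counts compared by hand.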

Table 1 GitHub advanced search on 24 January 2024, number of results per file extension (in thousands)


There are numerous pitfalls to consider when mining data from GitHub. For example, many repositories are personal projects, others do not show software development, and many are inactive [16]. Considering this, we wanted to collect informal diagrams from active repositories that show the software development of real (i.e. non-personal) projects. Conversely, we wanted to exclude repositories that contain homework assignments or answers to interview questions. Concretely, we formulated the following inclusion and exclusion criteria for repositories in our search.

Inclusion criteria

IC1: The repository must have at least 5000 stars^{Footnote 3}
IC2: The repository must be active in the last month
IC3: The repository must include at least one file with extension “.drawio”.

Exclusion criteria

EC1: Repositories containing only educational materials such as tutorials, course materials, or answers to common interview questions
EC2: Repositories using a natural language other than English

As a crude measure, we collected all .drawio diagrams from the main branches of repositories with at least 5000 stars. Although stars do not say everything, they give at least some indication of the popularity of a project. Moreover, we limited our search to projects that were active in the last month. We assume that active projects are more representative of current usage of informal diagrams. Another rationale for the cut-off point at 5000 stars and activity in the last month is simply to limit the size of the dataset to be analysed manually. As this is an exploratory study, we do not aim for completeness but rather aim to ensure a valid sample of current practices. We used the public GitHub API to collect the repositories with the following restrictions in the query following from the inclusion criteria: "stars:>5000 pushed:>2024-01-01". We collected the data on 31 January 2024.
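To make the query concrete, the sketch below shows how the stated restrictions could be turned into a request URL for the repository search endpoint of the public GitHub REST API. It only builds the URL and sends no request. The paging parameters, the function name, and the constant name are assumptions for illustration; the text does not state how the results were paged or post-processed. Note also that `stars:>5000` is a strict comparison, whereas IC1 reads “at least 5000”; the search syntax also accepts `stars:>=5000` for the inclusive reading.

```python
from urllib.parse import urlencode

# Repository search endpoint of the public GitHub REST API.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(min_stars: int, pushed_after: str,
                     page: int = 1, per_page: int = 100) -> str:
    """Build a search URL from the star and activity restrictions (hypothetical helper)."""
    # Same qualifiers as quoted in the text: strictly more than min_stars stars,
    # and a push after the given ISO date.
    query = f"stars:>{min_stars} pushed:>{pushed_after}"
    params = {"q": query, "page": page, "per_page": per_page}
    # urlencode percent-encodes ':' and '>' and turns spaces into '+'.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(5000, "2024-01-01")
```

Checking IC3 (presence of a .drawio file) is not part of this query and would require a separate step per repository.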

Out of nearly 5000 repositories in total, 52 repositories containing in total 245 informal diagrams matched the inclusion criteria. After manually applying the exclusion criteria, we were left with 104 informal diagrams across 41 repositories. In total, 105 diagrams from 4 repositories were excluded due to EC1 and EC2. Thirty-four diagrams from 5 repositories were excluded due to EC1 only. Two diagrams from 2 repositories were excluded due to EC2 only.
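The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the preceding paragraph; the variable names are ours.

```python
# Repositories and diagrams matching the inclusion criteria, as reported.
matched_repos, matched_diagrams = 52, 245

# (diagrams, repositories) excluded per reason, as reported.
excluded = {
    "EC1 and EC2": (105, 4),
    "EC1 only": (34, 5),
    "EC2 only": (2, 2),
}

excluded_diagrams = sum(diagrams for diagrams, _ in excluded.values())
excluded_repos = sum(repos for _, repos in excluded.values())

# What remains after applying the exclusion criteria.
remaining_diagrams = matched_diagrams - excluded_diagrams
remaining_repos = matched_repos - excluded_repos
```

The remainders agree with the 104 diagrams across 41 repositories stated in the text.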

Figure 2 shows an example of encountered informal diagrams.

3.2 Classifying the diagrams

We analysed the diagrams by considering the following four aspects that we used for classifying the diagrams as well.

1. Content. What does the diagram represent? (e.g. network architecture, component interaction, or user interface)
2. Purpose. What would be the usage of the diagram? (e.g. documentation or specification)
3. Notation. Does the diagram conform to any known syntax (even only partially)? (e.g. UML class diagram, something similar to a sequence diagram, or flow chart)
4. Creation and maintenance. How is the diagram created and maintained? Do multiple people collaborate on the diagram?

The answers to these questions are based on the layout and contents of the drawing, as well as on metadata available from mining the repository, such as file name and the number of commits. The coding relies on the experience and knowledge of the authors. For the purpose of the diagrams, we have distinguished between documentation of the software itself and documentation of the development process, based on the diagram types encountered. For the notation, we had neither an a priori list of known syntaxes to compare to, nor a strict definition of similarity. However, considering the knowledge of the persons rating the diagrams, at least the following syntaxes have been considered: UML/SysML, ER, BPMN, Petri nets, Simulink, Ecore, C4 model, ArchiMate, and general types of diagrams such as statecharts and flow charts. As a result, the types included in Table 3 represent a broad range of encountered diagrams, from rather specific to more general types. We attempt to summarize these types into the three categories further discussed in Sect. 4.

As this was a first exploration, we rely on agreement among researchers to arrive at the labels ultimately assigned to each diagram. Given the partially subjective nature of this classification, each diagram was coded by two out of three researchers. We divided the dataset into two parts: the first researcher (ordered as in the paper’s authors list) coded both parts, while the second and third coded one part each. We discuss the results of the classification in Sect. 4.
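The coding scheme can be illustrated with a small sketch. It assumes the dataset is an ordered list of diagram identifiers cut at some index; the text does not say how the two parts were formed or how large they were, so the split point, the coder names `R1` to `R3`, and the function name are hypothetical.

```python
def assign_coders(diagram_ids, split_at):
    """Map each diagram to its two coders: R1 codes everything, R2 and R3 one part each."""
    assignment = {}
    for index, diagram_id in enumerate(diagram_ids):
        # The first part goes to R2 and the second part to R3 (assumed split).
        second_coder = "R2" if index < split_at else "R3"
        assignment[diagram_id] = ("R1", second_coder)
    return assignment


# Illustrative identifiers only; these are not the mined diagrams.
example = assign_coders(["d1", "d2", "d3", "d4"], split_at=2)
```

Under this scheme every diagram receives exactly two independent labels, which can then be compared to reach agreement.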

3.3 Collecting industrial practices

In the context of an ongoing industry–academia collaborative research project, we have discussed with engineers several of their practices that, to some extent, rely on informal diagrams in their software engineering process. We talked to teams at five companies; their practices are briefly described in Table 2. We did not use a fixed instrument to collect the data. Rather, these are convenience samples based on conversations during project meetings with the partners. We had at least a one-hour online meeting with engineers from each company. For some of them, we had multiple meetings to discuss their practices and possible research directions in this and related topics. The discussions were focused on typical engineering activities in daily practice and were not limited in scope to the practices within our research collaboration. The summary descriptions in Table 2 result from these discussions.

Table 2 Brief descriptions of industrial practices in which informal diagrams are used as development artefacts. Each row corresponds to practices at a separate company


We typically talked to two or three experienced engineers per company in roles such as architect. The discussed practices typically apply to a given team within the company. Since the companies may be larger than just those teams we talked to, there may be multiple different practices in each company. The companies develop complex software-intensive systems, and the teams are involved in developing and maintaining components of these systems. They are thus describing software with a long lifetime and that needs to be integrated in larger systems. The practices we describe are summarized per team, where each team is at a separate company.

Our collection of practices is another convenience sample and it is not a complete overview of all possible practices. Hence, Table 2 should be seen only as an illustration of how informal diagrams are sometimes used in industry. We do not have extensive data on wider usage, but it can be assumed that these teams are not the only ones using informal diagramming. As this paper reports on an initial exploratory study, we do not focus on strengthening this dataset to enable generalizing the results.

4 Results

In this section, we present the results of the exploratory repository mining phase and an overview of the collected industrial practices. For both, we discuss the findings according to the four classification aspects listed in Sect. 3.2.

4.1 Repository mining

From the 104 inspected diagrams, 28 were marked as not representing development artefacts. Instead, they represented elements that do not describe the structure or behaviour of the system under development, such as example data, screenshots of code snippets, algorithm explanations, or a logo. Moreover, some were too unclear to mark as anything. The remaining 76 diagrams were classified as shown in Table 3. The complete set of mined diagrams is made available to the interested reader [13].

Content. Table 3 lists the results of classifying the informal diagrams by their types. Furthermore, we divide the collected types into three categories: architecture, software, and development. Architecture diagrams represent high-level descriptions of a system. These diagrams are the most commonly encountered, and 51 diagrams in our dataset are of this type. Software diagrams represent aspects of the system at the implementation/configuration level. We included 18 of them in the collected data. Diagrams representing developer documentation occur the least and we included 7 of them in the collected data.
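As a quick consistency check on these counts, the sketch below recomputes the total and derives the share of each category. Only the counts come from the text; the percentages are our own derived values and the variable names are hypothetical.

```python
# Diagrams inspected and diagrams marked as not being development artefacts.
inspected, not_development_artefacts = 104, 28
classified = inspected - not_development_artefacts

# Counts per category as reported in the text.
category_counts = {"architecture": 51, "software": 18, "development": 7}
total = sum(category_counts.values())

# Share of each category in per cent, rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1)
    for name, count in category_counts.items()
}
```

The three categories sum to the 76 classified diagrams, with architecture diagrams making up roughly two thirds of them.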

The assigned types are the result of agreement among the two authors labelling the diagrams. We use common software engineering vocabulary to indicate the content of the diagrams but do not have an underlying precise definition of these types. Instead, we consider these as illustrative of what such diagrams may represent, thereby directly contributing to understanding how informal diagrams are used in software engineering.

Table 3 Types of informal diagrams as found in the open-source projects, sorted by their categories, including the number of diagrams found in each category


Purpose. Almost all the diagrams seem to be used for documentation purposes. None of the diagrams contains enough detail to be used for specification, that is, to prescribe the structure or behaviour of a system. Rather, the diagrams seek to document and explain how the software is built, configured, or deployed. Moreover, we labelled 5 out of 76 diagrams as development documentation. These diagrams do not directly concern the software under development but rather describe things such as the working procedures with Git or around releases.

Notation. Eleven diagrams conformed partially or completely to a syntax according to a known modelling paradigm or language. Four diagrams use entity–relationship (ER) diagram notation. Two diagrams follow UML class diagram syntax and one follows UML sequence diagram syntax. In addition, 4 diagrams used a general flow chart notation, which is similar to the activity diagram notation of UML.

The majority of diagrams do not conform to any syntax known to us (nor do they declare a specific syntax that is used) but rather use ad hoc notations. The syntaxes we considered are listed in Sect. 3.2. In general, the diagrams consist of boxes and arrows, for which a meaning or legend is rarely provided. By boxes and arrows, we refer here to those kinds of diagrams without explicitly defined syntax or semantics, and without an established notation such as UML. They are simply graph structures and can be distinguished from high-level UML “boxes and lines” diagrams due to the difference in concrete syntax. Commonly, icons from the existing drawing palettes of the tools are used to imply a meaning for some of the boxes. Some diagrams include a lot of text to better describe what the diagram is intended to represent. This further implies that the diagrams are used for documentation and explanation/communication, rather than as a design specification.

Creation and maintenance. We noticed a low number of commits on the informal diagram files: the median is 1 and the maximum in our dataset is 6. These numbers come with some caveats. For example, diagrams may have been renamed and their corresponding history lost. Moreover, diagrams may be updated many times in a particular branch but then copied and uploaded as a new “final” diagram in the main branch. However, with a median of 1 and an average of 1.5, we can conclude that most diagrams are not updated during development.
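A per-file commit count of this kind could be obtained as sketched below. This is an illustration under our own assumptions, not the script used in the study: it calls `git log --follow`, which tracks a single file across renames that git can detect, and then summarizes the counts with the standard library. Whether rename tracking was used in the study is not stated, and the function names are hypothetical.

```python
import statistics
import subprocess


def commit_count(repo_dir: str, file_path: str) -> int:
    """Count the commits that touched file_path in the checked-out branch."""
    result = subprocess.run(
        # --follow continues the history across detected renames of this one file.
        ["git", "log", "--follow", "--format=%H", "--", file_path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    # One commit hash per line of output.
    return len(result.stdout.split())


def summarize(counts):
    """Return the median, mean, and maximum of a list of commit counts."""
    return {
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        "max": max(counts),
    }
```

Calling `summarize` on the list of counts for all diagram files yields the kind of median, average, and maximum reported above. Commits made on other branches before a diagram was copied into the main branch remain invisible to this approach.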

4.2 Classifying industrial practices

For each of the industrial practices briefly described in Table 2, we aimed to answer the same questions as for the informal diagrams, following the classification approach described in Sect. 3.2. The results are summarized in Table 4.

Table 4 Summary of state of practice in industry P1–P5 (see Table 2) for the four aspects listed in Sect. 3.2


Content. We see mostly informal diagrams at the architecture level; four of the five practices include diagrams at this level. Commonly, the diagrams describe component views of the system, including interfaces or specific connections to other components. This is consistent with several formal means of describing software architectures. However, the practices use other, informal notations to reflect architectures. This can have multiple causes. Sometimes, these are legacy practices that grow to include an increasing number of informal diagrams, simply because that is the way it has been done previously. Another reason could be that there is no central control and multiple teams model their architectures in their own ways, each for their own specific reasons. In those cases, there is little incentive to adopt canonical modelling languages and tools, since modelling in each small scope does not require any level of formality.

Purpose. The purpose of the architecture diagrams is commonly twofold. First, they are created in the early stages to capture design decisions. Second, the diagrams survive until later development stages, when they are referred to as approximate documentation for the system. Commonly, there is no automated support to update the diagram or to check how far it has deviated from the code. Consequently, the documentation is treated with caution: it may not tell the whole story, and it is certainly not a source of truth about the system, but it gives engineers an idea of what the overall architecture looks like.

Notation. The informal nature of the diagrams becomes evident when analysing the used notations. In the listed practices, the engineers do not follow existing modelling languages and instead use their own notations. Partially, these are similar to UML diagrams such as deployment diagrams or sequence diagrams, which is not surprising given the history of UML as a union of commonly used notations.

Creation and maintenance. Within the industrial practices, the target audience of the diagrams is commonly a small development team. Typically, a single person, or a single role such as architect (which may be filled by multiple persons), is responsible for the creation of the diagrams. In four of the practices, there are few to no updates to the diagrams. This implies an initial usage for high-level design and perhaps communication, but interestingly, the diagrams are not deleted but are stored in the repositories. That implies that, later in the development, the diagrams are consulted for information about the system, even if at that point there is no concrete link between the diagram and the system and, hence, no knowledge about how well it represents the system.

5 Discussion

In this section, we discuss the similarities and differences between the usage of informal diagrams in the sampled practices. Furthermore, we discuss our experiences during this exploratory study. Lastly, we discuss relevant future research directions based on our findings.

5.1 Contents of the diagrams

In both datasets, diagrams contain mostly high-level descriptions of the structure and sometimes behaviour of the system. This level of abstraction is pertinent during the initial design phases for engineers and remains significant in subsequent stages for comprehending how the system is built.

Communicating aspects of the software system is also a probable reason for the content of the few diagrams at a lower level of abstraction. In one exceptional case, code generation was used from informal diagrams, but all other cases represent other views of the system. In those views, the diagrams focus on conveying an explanation of other aspects of the system than its implementation. This also means that, at this level, stringent similarity between the diagram and implementation is not needed.

Among the industry practices, we encountered mostly informal diagramming at the level of the software architecture. Partly, that has to do with the collection of the practices (in our project, stakeholders are mostly interested in software architecture). Additionally, engineers do not need to communicate in terms of diagrams about either the software level or the meta-level (development process). Indeed, they still rely on other ways of communicating and establishing these aspects, such as in-person discussions, documented ways of working, and automated checkers, for example for coding guidelines. This informal communication is presumably more difficult to achieve in a distributed team in open-source practices. Further research is needed to understand the dynamics of teams and all their motivations for choosing informal diagramming (see also Sect. 5.5) and possible benefits of conventional modelling in those settings.

5.2 Purpose of the diagrams

In open source, we commonly see that diagrams are used for documentation, as well as for explaining the setup and the way of working. In the industrial practices, we see that the diagrams are also used prescriptively, as a means to denote the design of the system upfront. This leads to a wish for more automated support to analyse the diagrams, especially if they are intended to be used at a later stage during the development, as a form of documentation.

Another difference lies in the target audience of the diagrams. In the industrial practices, the intention seems to be to share the intended architecture with multiple development teams, each with multiple members. There may not be much close control over how these teams communicate on the architecture, and thereby, they may draw their diagrams using their own ad hoc notations. The nature of open-source projects leads to a different type of communication challenge related to distributed development and hence a different purpose for the informal diagrams.

An additional difference between the datasets is that, in the industrial practices, the diagrams are used more during the project, where they are often treated as software development artefacts. In contrast, in the dataset collected from open-source projects, the diagrams are used rather for a posteriori documentation.

Related studies report a common use of informal diagrams for understanding, design, and communication [1, 27, 30]. Our results are similar: we see limited use for other purposes, such as creating automation, strictly defining interfaces, or creating domain-specific modelling languages bottom-up. These activities would require canonical models, because models can only be processed automatically if they conform strictly to a modelling language.

5.3 Notation

There are similarities between the types of notations used for informal diagrams in both datasets. In both cases, the notations are mostly ad hoc and sometimes show similarities to established syntaxes, such as containing elements of UML diagrams. Surveyed engineers report similar usage: their notation is mostly informal, yet about half of the diagrams contain UML elements [1].

The choice of informal notation is presumably closely linked to the domain of the engineers and the purpose of the model. It would be interesting to further investigate to what extent the notations of existing graphical modelling languages are insufficient to meet the needs in practice. Alternatively, the notation may be of secondary interest, and instead, there may be other concerns driving the use of informal diagrams over canonical models.

So far, we have considered only graphical notations. It may be relevant to extend the work in the future to also include textual representations. We know from industrial settings that textual model representations are often appreciated because they are easier to manipulate and version than graphical models.

5.4 Creation and maintenance

In the open-source dataset, we notice that most diagrams are not updated. This indicates that they are most probably not kept in sync with the evolving implementation, but rather represent static documentation. Some work goes even further and states that the lifespan of sketches is often only a few days [1]. While not that extreme, our results align with related work, which found that most UML models (some of which were contained in informal diagrams) were introduced at the start of projects and, for the most part, never updated [12]. That is similar to all our industrial practices, where the diagrams are most often created once and rarely updated to reflect changes in the system. Partly, this is due to the abstraction level of the diagrams; at a high level of abstraction, not much changes throughout the system’s development.
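
The notion of a diagram being "not updated" can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a hypothetical commit log given as (file path, commit id) pairs and treats a diagram as never updated when exactly one commit touches it. The function name and the example data are assumptions and are not taken from the study's scripts or dataset.

```python
from collections import Counter


def never_updated_share(commit_log):
    """Return the share of files touched by exactly one commit.

    commit_log: iterable of (file_path, commit_id) pairs (hypothetical format).
    """
    commits_per_file = Counter()
    seen = set()
    for path, commit_id in commit_log:
        # Count each (file, commit) combination only once.
        if (path, commit_id) not in seen:
            seen.add((path, commit_id))
            commits_per_file[path] += 1
    if not commits_per_file:
        return 0.0
    single = sum(1 for n in commits_per_file.values() if n == 1)
    return single / len(commits_per_file)


# Illustrative data only, not from the mined dataset.
example_log = [
    ("docs/architecture.drawio", "a1"),
    ("docs/deployment.drawio", "b2"),
    ("docs/deployment.drawio", "c3"),
    ("docs/workflow.drawio", "d4"),
]
print(never_updated_share(example_log))
```

A single-commit criterion is a deliberately coarse proxy: it cannot tell whether the implementation itself changed in the meantime.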

We have limited data on collaboration in the open-source dataset due to the low number of commits. However, that fact in itself makes it plausible that there is little or no collaboration on those diagrams. This is similar to the industrial practices in which a single person, or at most a single role, is commonly responsible for creating the diagrams. Note that here we focus on the stakeholders creating and updating the diagrams. Possible collaboration in the target audience of the diagrams is not considered.

5.5 Lessons learnt from diagram classification

Classifying the diagrams was not straightforward. Given the absence of formal semantics or a legend stating the intended meaning of the symbols, there is some subjectivity in our interpretations of the intentions behind the diagrams. Part of the purpose of the diagrams is to document the project and thus to communicate design decisions even to uninitiated stakeholders, but based on the challenging nature of classifying these diagrams, we can say that it is hard to understand them in isolation. Presumably, more context is available to those who need to properly understand and work with them.

The use of informal diagrams in the industrial practices seems to be slightly more structured. At least, the diagrams have a clearer purpose and include legends as well as commonly used icons. This makes the diagrams closer to “models” usable for automation purposes (e.g. model transformations).

5.6 Limitations of the study

This work is exploratory in nature, and most of its limitations follow from that underlying decision. Our sample of industrial practices is a convenience sample and the sample of open-source practices is similarly opportunistic and not necessarily broadly representative of the domain. Moreover, it represents a snapshot of current practices and thereby ignores possibly relevant practices from the recent past. Nevertheless, from the perspective of an exploratory study, we find the collected data appropriate to illustrate several ways in which informal diagrams are used in practice.

The sampled industrial practices are limited and do not cover a complete overview. Nevertheless, these are existing practices and probably not unique to the specific teams and companies we talked to. Moreover, limiting our search of informal diagrams to open-source projects hosted on GitHub and to files with the .drawio extension has limited the size of our dataset. We cannot draw conclusions on the general usage of informal diagrams, but can indicate that at least the practices we encountered occur. It is likely that these practices are not unique to the selected samples, but we do not have data to support this intuition.
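
The file-type restriction described above can be illustrated as follows. This is a minimal sketch, assuming repositories have already been cloned into a local directory; the function name `find_drawio_files` and the directory layout are hypothetical and do not reproduce the study's mining script.

```python
from pathlib import Path
import tempfile


def find_drawio_files(root):
    """Return sorted relative paths of all files ending in .drawio under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.drawio")
        if p.is_file()
    )


# Small self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "repo" / "docs").mkdir(parents=True)
    (base / "repo" / "docs" / "architecture.drawio").write_text("<mxfile/>")
    (base / "repo" / "README.md").write_text("# demo")
    found = find_drawio_files(base)
    print(found)
```

Any diagram stored under another name or format is invisible to such a filter, which is one way the restriction narrows the dataset.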

The comparisons we draw between practices in the two studied domains do not rely on the completeness of our exploration, but are influenced by it. Therefore, we ensure that the discussion points and future research directions we identify are the result of indications we have of usage of informal diagrams. We do not (and cannot) present definitive quantitative findings on the usage of informal diagrams, but focus instead on illustrative qualitative findings.

Future work can elaborate on several aspects that we have not considered in this work. For example, diagrams.net offers templates for certain types of diagrams including UML state machines, activity diagrams and class diagrams. Other drawing tools may provide similar templates that might impact the type of diagrams commonly encountered in practice. Other diagramming tools may also affect the way of working with diagrams in other ways, which we have not considered in this study.

Because informal diagramming is broadly defined, it is very difficult to be complete in an assessment of its practices. We have made choices to significantly constrain the search space, for example by looking at a single file type and by limiting the repositories in which we search for these files. Consequently, the findings of this study are not generalizable statements about informal diagramming practices, but rather illustrations of several types of practices that we have seen.

5.7 Future research directions

Based on our observations in this exploratory work, we identified the following areas for further research.

Why practitioners create informal diagrams. How can we identify what is missing in modelling tools and methods to improve their adoption and thereby support more automation in analysing, for example, aspects of software architectures? In this paper, we explored how engineers benefit from informal diagramming. We also saw its limitations, especially when informal diagrams are to be used as actual development artefacts and are expected to accurately reflect the system under development. A part of our answer to the question above may be derived from this limitation. The reason for informal diagramming does not seem to be the limited expressiveness of modelling languages, but rather the convenience and freedom to tailor notations to the diagrams’ purpose and target audience. For any short-lived explanation or communication about the system, informal diagramming is the obvious favourite. However, using these informal diagrams in the long term as development artefacts should be discouraged, since changing the purpose of a diagram at a later stage in the development crucially changes the diagram’s requirements; to be able to reliably use the diagram to reason about the system, at least the semantics of the diagram must be clear, and some degree of consistency with the implementation should be in place. Among our industry contacts, software architecture practitioners feel the need for more automated support in their routines, and one way to achieve it is to capture designs in more conventional models, rather than in informal diagrams.

Providing maturing support for industrial practices. Regardless of the specific ways by which companies are using informal diagramming, the underlying motivation is similar in all cases: to document and communicate design decisions. Nevertheless, they also commonly want to increase the amount of automation surrounding their architecture. To facilitate automated manipulation and analysis of architectures, these must be captured in more strictly defined forms than informal diagrams with entirely free syntax and undefined semantics. At the same time, it may not always be possible to impose a certain means of modelling on a team. For example, an architecture may rely on third-party components, or individual teams or members may perceive the benefits in using canonical modelling for their particular tasks. Research should focus on providing stepping stones from an informal diagramming starting point towards a state of practice that can support more automation. Note that we are not advocating that modellers drop their models and start informal diagramming. Rather, we see a need for providing guidelines for software architects on what and how to model when they want to achieve a certain level of automation.

In this regard, we identified the need for a maturity model that can help practitioners in (i) identifying the maturity of their current practices regarding modelling and automated support or lack thereof, (ii) listing possible gains from model-based automated support, and (iii) identifying the means to migrate from the current practice to one enabling the envisioned goals. As a start for this work on a maturity model, we collaborated with multiple companies to identify how they currently carry out modelling activities (we summarized these practices in Table 2). Common questions that came up are: “why should we model?”, “what should we model?”, and “how should we model?” A maturity model could be very useful in helping practitioners understand the potential value of moving away from informal diagramming and benefiting more from the powerful mechanisms of modelling and model-driven engineering.

Flexible modelling. One way of gradually introducing more automation for the analysis of aspects of the software architecture is to support flexible modelling. When we acknowledge that a common starting point for modelling is with rather informal diagrams, and observe that the purpose of these diagrams sometimes changes during development, we can support practitioners by providing some flexibility in their modelling approach. That is, we may support practitioners by allowing them both to include informal aspects in canonical models and to establish semantics for informal diagrams, allowing their partial use for automating certain activities (as shown in [15]).
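
One way to picture establishing semantics for an informal diagram is to attach a legend that maps drawing styles to element types. The sketch below is an illustration under stated assumptions and is not the approach of [15]: it assumes an uncompressed diagrams.net XML file in which shapes and connectors are stored as `mxCell` elements, and the legend `LEGEND`, the sample diagram, and the type names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical legend: a style fragment mapped to an intended element type.
LEGEND = {"shape=cylinder": "database", "rounded=1": "component"}

# Hypothetical, hand-written sample in the uncompressed diagrams.net layout.
SAMPLE = """<mxfile><diagram name="Page-1"><mxGraphModel><root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<mxCell id="2" value="Web app" style="rounded=1;whiteSpace=wrap;" vertex="1" parent="1"/>
<mxCell id="3" value="Orders DB" style="shape=cylinder;whiteSpace=wrap;" vertex="1" parent="1"/>
<mxCell id="4" value="Note" style="text;html=1;" vertex="1" parent="1"/>
<mxCell id="5" style="endArrow=classic;" edge="1" source="2" target="3" parent="1"/>
</root></mxGraphModel></diagram></mxfile>"""


def classify(style):
    """Return the legend type whose style fragment occurs in style, else None."""
    parts = set(style.split(";"))
    for fragment, element_type in LEGEND.items():
        if fragment in parts:
            return element_type
    return None


def extract_model(xml_text):
    """Split the cells into typed nodes, untyped nodes, and labelled edges."""
    root = ET.fromstring(xml_text)
    typed, untyped, raw_edges, labels = {}, [], [], {}
    for cell in root.iter("mxCell"):
        if cell.get("vertex") == "1":
            label = cell.get("value", "")
            labels[cell.get("id")] = label
            element_type = classify(cell.get("style", ""))
            if element_type is None:
                untyped.append(label)  # no legend entry: stays informal
            else:
                typed[label] = element_type
        elif cell.get("edge") == "1":
            raw_edges.append((cell.get("source"), cell.get("target")))
    # Resolve edge endpoints from cell ids to node labels.
    edges = [(labels.get(s), labels.get(t)) for s, t in raw_edges]
    return typed, untyped, edges


typed, untyped, edges = extract_model(SAMPLE)
print(typed)
print(untyped)
print(edges)
```

Elements without a legend entry remain informal, while the typed elements and their connections could feed simple automated checks, which is the kind of partial use for automation discussed above.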

6 Conclusion

In this paper, we explored the use of informal diagrams in open-source projects and in closed-source industrial practices by observing their content, purpose, notation, and the way they are created and maintained. We found similarities in notation, creation, and maintenance. We also found differences related to the content and perceived purpose of the diagrams. In industrial practices, we experienced the use of informal diagrams as development artefacts, as if they were canonical models. We have not been able to identify a similar use in the open-source repositories, although it cannot be ruled out. By better understanding the usage of informal diagrams, we ultimately aim to improve modelling practices and facilitate their gradual adoption. To this end, we speculate on areas of future work on providing a maturity model and guidelines for maturing modelling practices from various starting points in industry.

Notes

The set of mined diagrams is available to the interested reader [13].
https://github.com/search/advanced
Stars allow users to bookmark repositories. Although “starring” a repository does not necessarily correlate with its quality, a large number of stars implies at least that a large number of people found the repository noteworthy. The lower limit of 5000 was chosen just to limit the number of results in our initial exploration.

References

Baltes, S., Diehl, S.: Sketches and diagrams in practice. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, pp. 530–541 (2014). https://doi.org/10.1145/2635868.2635891
Bartelt, C., Vogel, M., Warnecke, T.: Collaborative creativity: from hand drawn sketches to formal domain specific models and back again. In: MoRoCo@ECSCW, pp. 25–32 (2013)
Bider, I., Perjons, E., Bork, D.: Towards on-the-fly creation of modeling language jargons. In: ICTERI, pp. 142–157 (2021)
Brambilla, M., Cabot, J., Wimmer, M.: Model-driven software engineering in practice. Morgan & Claypool Publishers (2017). https://doi.org/10.1007/978-3-031-02549-5
Bruneliere, H., Garcia, J., Desfray, P., Khelladi, D.E., Hebig, R., Bendraou, R., Cabot, J.: On lightweight metamodel extension to support modeling tools agility. In: Modelling Foundations and Applications: 11th European Conference, ECMFA 2015, Held as Part of STAF 2015, LAquila, Italy, July 20-24, 2015. Proceedings 11, pp. 62–74. Springer (2015). https://doi.org/10.1007/978-3-319-21151-0_5
Bucchiarone, A., Ciccozzi, F., Lambers, L., Pierantonio, A., Tichy, M., Tisi, M., Wortmann, A., Zaytsev, V.: What is the future of modeling? IEEE Softw. 38(2), 119–127 (2021). https://doi.org/10.1109/MS.2020.3041522
Chen, Q., Grundy, J., Hosking, J.: SUMLOW: early design-stage sketching of UML diagrams on an e-whiteboard. Softw. Pract. Exp. 38(9), 961–994 (2008). https://doi.org/10.1002/spe.856
Degueule, T., Combemale, B., Blouin, A., Barais, O., Jézéquel, J.M.: Safe model polymorphism for flexible modeling. Comput. Lang. Syst. Struct. 49, 176–195 (2017). https://doi.org/10.1016/j.cl.2016.09.001
Demuth, A., Lopez-Herrejon, R.E., Egyed, A.: Cross-layer modeler: a tool for flexible multilevel modeling with consistency checking. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pp. 452–455 (2011). https://doi.org/10.1145/2025113.2025189
Gogolla, M., Selic, B., Kästner, A., Degrandow, L., Namegni, C.: From object to class models: more steps towards flexible modeling (short paper). In: FPVM@ STAF, vol. 3250. CEUR-WS (2022)
Guerra, E., de Lara, J.: On the quest for flexible modelling. In: Proceedings of the 21th ACM/IEEE International conference on model driven engineering languages and systems, pp. 23–33 (2018). https://doi.org/10.1145/3239372.3239376
Hebig, R., Quang, T.H., Chaudron, M.R., Robles, G., Fernandez, M.A.: The quest for open source projects that use uml: mining github. In: Proceedings of the ACM/IEEE 19th international conference on model driven engineering languages and systems, pp. 173–183 (2016). https://doi.org/10.1145/2976767.2976778
Jongeling, R.: Informal diagrams (.drawio) mined from github projects with more than 5000 stars. https://doi.org/10.5281/zenodo.10820951
Jongeling, R., Ciccozzi, F.: Flexible modelling: a systematic literature review. J. Object Technol. (2024). https://doi.org/10.5381/jot.2024.23.3.a3
Jongeling, R., Ciccozzi, F., Cicchetti, A., Carlson, J.: From informal architecture diagrams to flexible blended models. In: European conference on software architecture, pp. 143–158. Springer (2022). https://doi.org/10.1007/978-3-031-16697-6_10
Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: An in-depth study of the promises and perils of mining github. Empir. Softw. Eng. 21, 2035–2071 (2016). https://doi.org/10.1007/s10664-015-9393-5
Kimelman, D., Hirschman, K.: A spectrum of flexibility-lowering barriers to modeling tool adoption. In: ICSE 2011 Workshop on flexible modeling tools (2011)
Kolovos, D.S., Matragkas, N.D., Rodríguez, H.H., Paige, R.F.: Programmatic muddle management. XM@MoDELS pp. 2–10 (2013)
Lara, J.D., Guerra, E.: A posteriori typing for model-driven engineering: concepts, analysis, and applications. ACM Trans. Softw. Eng. Methodol. (TOSEM) 25(4), 1–60 (2017). https://doi.org/10.1145/3063384
Lara, J.D., Guerra, E., Kienzle, J.: Facet-oriented modelling. ACM Trans. Softw. Eng. Methodol. (TOSEM) 30(3), 1–59 (2021). https://doi.org/10.1145/3428076
Mukherjee, D., Dhoolia, P., Sinha, S., Rembert, A.J., Gowri Nanda, M.: From informal process diagrams to formal process models. In: International conference on business process management, pp. 145–161. Springer (2010). https://doi.org/10.1007/978-3-642-15618-2_12
Nachreiner, L., Raschke, A., Stegmaier, M., Tichy, M.: CouchEdit: a relaxed conformance editing approach. In: Proceedings of the 23rd ACM/IEEE international conference on model driven engineering languages and systems: companion proceedings, pp. 1–5 (2020). https://doi.org/10.1145/3417990.3421401
Ossher, H., Bellamy, R., Simmonds, I., Amid, D., Anaby-Tavor, A., Callery, M., Desmond, M., de Vries, J., Fisher, A., Krasikov, S.: Flexible modeling tools for pre-requirements analysis: conceptual architecture and research challenges. In: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, pp. 848–864 (2010). https://doi.org/10.1145/1869459.1869529
Peltonen, J., Felin, M., Vartiala, M.: From a freeform graphics tool to a repository based modeling tool. In: Proceedings of the fourth european conference on software architecture: companion volume, pp. 277–284 (2010). https://doi.org/10.1145/1842752.1842804
Reiz, A., Sandkuhl, K., Smirnov, A., Shilov, N.: Grass-root enterprise modeling: issues and potentials of retrieving models from powerpoint. In: The practice of enterprise modeling: 11th IFIP WG 8.1. Working Conference, PoEM 2018, Vienna, Austria, October 31–November 2, 2018, Proceedings 11, pp. 55–70. Springer (2018). https://doi.org/10.1007/978-3-030-02302-7_4
Robles, G., Ho-Quang, T., Hebig, R., Chaudron, M.R., Fernandez, M.A.: An extensive dataset of UML models in GitHub. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR), pp. 519–522. IEEE (2017). https://doi.org/10.1109/MSR.2017.48
Störrle, H.: How are conceptual models used in industrial software development? a descriptive survey. In: Proceedings of the 21st international conference on evaluation and assessment in software engineering, pp. 160–169 (2017). https://doi.org/10.1145/3084226.3084256
Verbruggen, C., Snoeck, M.: Practitioners’ experiences with model-driven engineering: a meta-review. Softw. Syst. Model. 22(1), 111–129 (2023). https://doi.org/10.1007/s10270-022-01020-1
Vesin, B., Jolak, R., Chaudron, M.R.: Octouml: an environment for exploratory and collaborative software design. In: 2017 IEEE/ACM 39th International conference on software engineering companion (ICSE-C), pp. 7–10. IEEE (2017). https://doi.org/10.1109/ICSE-C.2017.19
Walny, J., Haber, J., Dörk, M., Sillito, J., Carpendale, S.: Follow that sketch: lifecycles of diagrams and sketches in software development. In: 2011 6th international workshop on visualizing software for understanding and analysis (VISSOFT), pp. 1–8. IEEE (2011). https://doi.org/10.1109/VISSOF.2011.6069462
Wüest, D., Seyff, N., Glinz, M.: FlexiSketch: a lightweight sketching and metamodeling approach for end-users. Softw. Syst. Model. 18, 1513–1541 (2019). https://doi.org/10.1007/s10270-017-0623-8

Funding

Open access funding provided by Mälardalen University.

Author information

Authors and Affiliations

School of Innovation, Design and Engineering (IDT), Mälardalen University, Västerås, Sweden
Robbert Jongeling, Antonio Cicchetti & Federico Ciccozzi

Corresponding author

Correspondence to Robbert Jongeling.

Additional information

Communicated by Javier Troya and Alfonso Pierantonio.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (csv 167 KB)

Supplementary file 2 (py 3 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jongeling, R., Cicchetti, A. & Ciccozzi, F. How are informal diagrams used in software engineering? An exploratory study of open-source and industrial practices. Softw Syst Model 24, 601–613 (2025). https://doi.org/10.1007/s10270-024-01252-3

Received: 15 March 2024
Revised: 20 November 2024
Accepted: 29 November 2024
Published: 20 December 2024
Version of record: 20 December 2024
Issue date: June 2025
DOI: https://doi.org/10.1007/s10270-024-01252-3
