The Relevance of the Spectral OAS Ruleset (My Bachelor Thesis)

Jan 29, 25

Why?

In Germany, many universities require an academic thesis at the end of undergraduate degree programs. As I studied at a university of applied sciences, I had the chance to do this project under the supervision of the company I was working for (SAP LeanIX).

For some time I had worked with Spectral, a linting tool for OpenAPI. Spectral ships with a default ruleset that checks compliance with the OpenAPI standard. I investigated how closely providers of HTTP APIs that use OpenAPI adhere to the rules defined by Spectral. If many providers of real-world APIs violate a rule, this suggests that the rule does not target critical errors.

To give developers a better estimate of error severity, I analyzed the linting results of over 4000 real-world OpenAPI specifications and mapped the Spectral rules to a new rule severity assignment.

What was done?

In the course of the thesis I wrote a data pipeline that runs the Spectral linter on every OpenAPI specification in the APIs.guru dataset. I arranged the results in a boolean matrix over linter rules and OpenAPI specifications, in which each entry indicates whether a specification triggered a rule.
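To make the matrix idea concrete, here is a minimal sketch in Python. It is not the code from the thesis pipeline: the function name `build_rule_matrix`, the file names, and the shape of the `results` dictionary are assumptions for illustration, and the three rule codes are only used as sample labels.

```python
from typing import Dict, List, Set


def build_rule_matrix(
    results: Dict[str, Set[str]], rules: List[str]
) -> Dict[str, List[bool]]:
    """Return one row per rule and one column per specification.

    Columns follow the alphabetical order of the specification names.
    An entry is True if the specification triggered the rule.
    """
    specs = sorted(results)
    return {rule: [rule in results[spec] for spec in specs] for rule in rules}


# Hypothetical linter output: specification name -> codes of triggered rules.
results = {
    "spec_a.yaml": {"operation-description", "info-contact"},
    "spec_b.yaml": {"info-contact"},
    "spec_c.yaml": set(),  # triggers no rule at all
}
rules = ["info-contact", "operation-description", "path-params"]

matrix = build_rule_matrix(results, rules)

# Number of specifications that triggered each rule (the row sums).
triggered_by = {rule: sum(row) for rule, row in matrix.items()}
```

The row sums in `triggered_by` are the per-rule counts that a frequency-based metric can build on.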

Using the inverse document frequency (IDF) as a relevance metric, every rule was given an importance value. Using the k-means clustering algorithm, the rules were then mapped to three importance levels (as the Spectral API linter supports three severity levels). The result is the new rule severity assignment, which reflects how closely a rule was adhered to in real-world software projects.
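The two steps can be sketched roughly as follows. This is an illustration under assumptions, not the implementation from the thesis: the smoothed IDF formula, the deterministic centroid initialisation, the rule names `rule-a` to `rule-f`, and their counts are all made up for the example. Only the total of 4136 specifications is taken from the post.

```python
import math
from typing import Dict, List


def idf_scores(triggered_by: Dict[str, int], n_specs: int) -> Dict[str, float]:
    """Smoothed IDF (an assumption): rarely triggered rules get high scores.

    The +1 terms avoid a division by zero for rules that are never triggered.
    """
    return {
        rule: math.log((n_specs + 1) / (count + 1))
        for rule, count in triggered_by.items()
    }


def kmeans_1d(values: List[float], k: int = 3, iterations: int = 100) -> List[int]:
    """Lloyd's algorithm on scalars (requires k >= 2).

    Returns one cluster index per value; index 0 is the lowest centroid.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centroids evenly spaced over the value range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    # Relabel so that cluster indices follow ascending centroid order.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: i for i, c in enumerate(order)}
    return [rank[label] for label in labels]


# Hypothetical counts: how many of the 4136 specifications triggered each rule.
triggered_by = {
    "rule-a": 0, "rule-b": 2, "rule-c": 150,
    "rule-d": 200, "rule-e": 3000, "rule-f": 3900,
}
scores = idf_scores(triggered_by, n_specs=4136)
clusters = kmeans_1d(list(scores.values()), k=3)

levels = ["hint", "warn", "error"]  # lowest to highest importance
severity = {rule: levels[c] for rule, c in zip(scores, clusters)}
```

In this toy data the never or rarely triggered rules end up in the `error` cluster and the very commonly triggered ones in `hint`, which mirrors the idea that a rule almost everyone adheres to is more likely to guard against a critical error.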

Results

In total, 929,699 linter messages were recorded. 128 of the 4136 specifications do not trigger a single rule. The results show that most rules are commonly violated. Only 14 rules are adhered to by every specification. These rules are mapped to the highest severity level (error). The other rules are distributed between the levels warn and hint.

Read the whole thesis

If you are interested in reading the whole thesis, or in how I set up my data analysis pipeline and the LaTeX project repository, feel free to take a look here: github.com/paulbrenker/thesis
