Scroll to navigation

SQLFLUFF(1) SQLFluff SQLFLUFF(1)

NAME

sqlfluff - SQLFluff 2.3.5

Bored of not having a good SQL linter that works with whichever dialect you're working with? Fluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database.

Notable releases:

  • 1.0.x: First stable release, no major changes to take advantage of a point of relative stability.
  • 2.0.x: Recode of rules, whitespace fixing consolidation, sqlfluff format and removal of support for dbt versions pre 1.1. Note, that this release brings with it some breaking changes to rule coding and configuration, see Upgrading from 1.x to 2.0.

For more detail on other releases, see our Release Notes.

Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.

GETTING STARTED

To get started just install the package, make a sql file and then run SQLFluff and point it at the file. For more details or if you don't have python or pip already installed see Getting Started.

$ pip install sqlfluff
$ echo "  SELECT a  +  b FROM tbl;  " > test.sql
$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.

| Found ' '. [layout.spacing] L: 1 | P: 1 | LT02 | First line should not be indented.
| [layout.indent] L: 1 | P: 1 | LT13 | Files must not begin with newlines or whitespace.
| [layout.start_of_file] L: 1 | P: 11 | LT01 | Expected only single space before binary operator '+'.
| Found ' '. [layout.spacing] L: 1 | P: 14 | LT01 | Expected only single space before naked identifier.
| Found ' '. [layout.spacing] L: 1 | P: 27 | LT01 | Unnecessary trailing whitespace at end of file.
| [layout.spacing] L: 1 | P: 27 | LT12 | Files must end with a single trailing newline.
| [layout.end_of_file] All Finished 📜 🎉!


CONTENTS

Getting Started

To get started with SQLFluff you'll need python and pip installed on your machine, if you're already set up, you can skip straight to Installing sqlfluff.

Installing Python

How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here.

There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent.

You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error.

$ python --version
Python 3.9.1


For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above.

$ pip --version
pip 21.3.1 from ...


If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website.

Installing SQLFluff

Assuming that python and pip are already installed, then installing SQLFluff is straight forward.

$ pip install sqlfluff


You can confirm its installation by getting SQLFluff to show its version number.

$ sqlfluff version
2.3.5


Basic Usage

To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content:

SELECT a+b  AS foo,
c AS bar from my_table


You can then run sqlfluff lint test.sql --dialect ansi to lint this file.

$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is

| only one select target.
| [layout.select_targets] L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations
| and aggregates. [structure.column_order] L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
| [layout.indent] L: 1 | P: 9 | LT01 | Expected single whitespace between naked identifier and
| binary operator '+'. [layout.spacing] L: 1 | P: 10 | LT01 | Expected single whitespace between binary operator '+'
| and naked identifier. [layout.spacing] L: 1 | P: 11 | LT01 | Expected only single space before 'AS' keyword. Found '
| '. [layout.spacing] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces.
| [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'from'.
| [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords] All Finished 📜 🎉!


You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem there is. One of the errors has been found on line 1, position * (as shown by :code:`L: 1 | P: 9`) and it's a problem with rule *LT01 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space either side of the + symbol in a+b. Head into the file, and correct this issue so that the file now looks like this:

SELECT a + b  AS foo,
c AS bar from my_table


Rerun the same command as before, and you'll see that the original error (violation of LT01) no longer shows up.

$ sqlfluff lint test.sql --dialect ansi
== [test.sql] FAIL
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is

| only one select target.
| [layout.select_targets] L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations
| and aggregates. [structure.column_order] L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
| [layout.indent] L: 1 | P: 13 | LT01 | Expected only single space before 'AS' keyword. Found '
| '. [layout.spacing] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces.
| [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'from'.
| [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords]


To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff, which is the fix command. This allows more automated fixing of some errors, to save you time in sorting out your sql files. Not all rules can be fixed in this way and there may be some situations where a fix may not be able to be applied because of the context of the query, but in many simple cases it's a good place to start.

For now, we only want to fix the following rules: LT02, LT12, CP01

$ sqlfluff fix test.sql --rules LT02,LT12,CP01 --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.

| [layout.indent] L: 2 | P: 1 | LT02 | Expected indent of 4 spaces.
| [layout.indent] L: 2 | P: 9 | LT02 | Expected line break and no indent before 'FROM'.
| [layout.indent] L: 2 | P: 10 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords] ==== fixing violations ==== 4 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n]


...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard...

Are you sure you wish to attempt to fix these? [Y/n] ...
Attempting fixes...
Persisting Changes...
== [test.sql] PASS
Done. Please check your files to confirm.


If we now open up test.sql, we'll see the content is now different.

SELECT

a + b AS foo,
c AS bar FROM my_table


In particular:

  • The two columns have been indented to reflect being inside the SELECT statement.
  • The FROM keyword has been capitalised to match the other keywords.

We could also fix all of the fixable errors by not specifying --rules.

$ sqlfluff fix test.sql --dialect ansi
==== finding violations ====
== [test.sql] FAIL
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations

| and aggregates. [structure.column_order] L: 2 | P: 10 | LT01 | Expected only single space before 'AS' keyword. Found '
| '. [layout.spacing] ==== fixing violations ==== 2 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm.


If we now open up test.sql, we'll see the content has been updated again.

SELECT

c AS bar,
a + b AS foo FROM my_table


The SQL statement is now well formatted according to all the rules defined in SQLFluff.

The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined.

Custom Usage

So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their sql, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and rather than uppercase keywords, they should all be lowercase?

To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content:

[sqlfluff]
dialect = ansi
[sqlfluff:indentation]
tab_space_size = 2
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower


Then rerun the same command as before.

$ sqlfluff fix test.sql --rules LT02,LT12,CP01,ST06,LT09,LT01


Then examine the file again, and you'll notice that the file has been fixed accordingly.

select

c as bar,
a + b as foo from my_table


For a full list of configuration options check out Default Configuration. Note that in our example here we've only set a few configuration values and any other configuration settings remain as per the default config. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference.

Going further

From here, there are several more things to explore.

  • To understand how SQLFluff is interpreting your file explore the parse command. You can learn more about that command and more by running sqlfluff --help or sqlfluff parse --help.
  • To start linting more than just one file at a time, experiment with passing SQLFluff directories rather than just single files. Try running sqlfluff lint . (to lint every sql file in the current folder) or sqlfluff lint path/to/my/sqlfiles.
  • To find out more about which rules are available, see Rules Reference.
  • To find out more about configuring SQLFluff and what other options are available, see Configuration.

One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly.

SQL in the Wild

SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently with the rise of data as a business function, or a domain in its own right SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one off but as a form of infrastructure as code.

As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving in the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity.

Quality assurance

The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters.

Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself).

SQLFluff aims to fill this space for SQL.

Modularity

SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways:

1.
Using the limited inbuilt templating abilities of a programming language directly. For example in python this would be using the format string syntax:

"SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable")


Which would evaluate to:

SELECT bar FROM mytable


2.
Using a dedicated templating library such as jinja2. This allows a lot more flexibility and more powerful expressions and macros. See the Templating Configuration section for more detail on how this works.
Often there are tools like dbt or apache airflow which allow templated sql to be used directly, and they will implement a library like jinja2 under the hood themselves.


All of these templating tools are great for modularity but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured placeholder values, intended to improve modularity.

SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late).

During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible.

Vision for SQLFluff

SQLFluff has a few components:

1.
A generic parser for SQL which aims to be able to unify SQL written in different dialects into a comparable format. The parser.
2.
A mechanism for measuring written SQL against a set of rules, with the added ability to fix any violations found. The linter.
3.
An opinionated set of guidelines for how SQL should be structured and formatted. The rules.

The core vision [1] for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in SQL in the Wild.

Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead.

With regards to the rules, SQLFluff aims to be opinionated but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to.

Notes

[1]
Credit to this article for highlighting the importance of a good vision.

Rolling out SQLFluff with a new team

Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you.

The effect of SQLFluff should be to change your behaviours, not just your SQL.


With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be:

1.
Faster comprehension and collaboration by the team on a shared codebase. This includes more effective (and more enjoyable) code review on top of code which is easy to review and build upon.
2.
Easier and faster onboarding for new team members. By adopting a style which is clean and consistent with other organisations we make it easier for new people to join the team.
3.
Improved adoption of shared SQL from other sources. If the SQL found in open source projects is easy to read and looks familiar then you're more likely to use it. This means more reusable code across the industry.
4.
Productive discussions around style. By defining your organisation's style guide in code, it means you can version control it, discuss changes and ultimately give a concrete output to discussions over style.

You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be.


Consider which of these success measures is most important and most desirable for your team. Write that down.

The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity.

1. Assess the situation

This step is done by you, or a small group of people who already think that linting is a good idea.

  • Run sqlfluff lint on your project with the stock configuration to find out how things work out of the box.
  • Set up your Configuration so that things run and that you can get a readout of the errors which you would want the team to see and not the ones you don't. Great tools for this are to use Ignoring types of errors, --exclude-rules or --ignore in the CLI (see CLI Reference).
  • Identify which areas of your project are the worst and which are the tidiest. In particular, any areas which are particularly tidy already will be particularly useful in the next phase.

2. Make a plan

There are three sensible rollout phases:

1.
Pre CI/CD.
2.
Soft CI/CD (warnings but no strict fails).
3.
Hard CI/CD (violations mean deployments fail).

In each of these phases you have three levers to play with:

1.
Areas of the project in which to apply rules.
2.
Depth of rules enforced (this might also include whether to ignore parsing errors or not).
3.
Whether to just lint changes (Using SQLFluff on changes using diff-quality), or to lint all the existing code as well.

Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this:

1.
Pre CI/CD we get the tidiest area of a project to a stage that it fully passes the rules we eventually want to enforce. The core project team will do this. Liberal use of sqlfluff fix can be a lifesaver in this phase.
2.
Soft CI/CD is applied to the whole project, team members are encouraged to write tidy SQL, but not required to.
3.
Hard CI/CD is applied to the tidy areas of the project and also to any changes to the whole project. Anyone making changes is required to write SQL which passes check.
4.
Hard CI/CD is applied to the whole project on not just changes, with only a few particularly problematic files explicitly ignored using Ignoring types of errors.

3. Build the need

Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand *why* this is a good thing.

Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do.

At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter.

4. Do, Review & Reassess

Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular:

  • How would someone suggest changing the style guide or enabling/disabling a rule?
  • How do we assess whether the changes are working for the team or whether some are creating unnecessary stress?

It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change.

5. Spread the word 😁

Did it work? If so, spread the word. Tell a friend about SQLFluff.

If you're lucky they might share your views on comma placement 🤷‍♀️.

Let's talk about whitespace

If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁).

More specifically, whitespace divides into three key themes:

1.
Spacing: The amount of whitespace between elements on the same line.
2.
Line Breaks: The choice of where within the code it is inappropriate, appropriate or even compulsory to have a line break.
3.
Indentation: Given a line break, how much whitespace should precede the first code element on that line.

SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration.

NOTE:

This section of the docs handles the intent and reasoning behind how layout is handled by SQLFluff. For a deeper look at how this is achieved internally see Reflow Internals.


Spacing

Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing).

1.
No whitespace but a newline is allowed. This option is configured using the touch option in the spacing_* configuration settings. The most common example of this is the spacing around commas. For example SELECT a , b would be unusual and more normally be written SELECT a, b. Inserting a newline between the a and comma would not cause issues and may even be desired, for example:

SELECT

col_a
, col_b
-- Newline present before column
, col_c
-- When inline, comma should still touch element before.
, GREATEST(col_d, col_e) as col_f FROM tbl_a


2.
No whitespace and a newline is not allowed. This option is configured using the inline option in the spacing_* configuration settings. The most common example of this is spacing within the parts of qualified identifier e.g. my_schema.my_table. If a newline were present between the . and either my_schema or my_table, then the expression would not parse and so no newlines should be allowed.

Aligned elements

A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be be configured to achieve layouts like:

SELECT

a AS first_column,
b AS second_column,
(a + b) / 2 AS third_column FROM foo AS bar


In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is:

[sqlfluff:layout:type:alias_expression]
# We want non-default spacing _before_ the alias expressions.
spacing_before = align
# We want to align them within the next outer select clause.
# This means for example that alias expressions within the FROM
# or JOIN clause would _not_ be aligned with them.
align_within = select_clause
# The point at which to stop searching outward for siblings, which
# in this example would likely be the boundary of a CTE. Stopping
# when we hit brackets is usually a good rule of thumb for this
# configuration.
align_scope = bracketed


Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has.

-- With
--    align_scope = bracketed
--    align_within = select_clause
WITH foo as (

SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar; -- With -- align_scope = bracketed -- align_within = statement WITH foo as (
SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar -- Now the FROM alias is also aligned. -- With -- align_scope = file -- align_within = select_clause WITH foo as (
SELECT
a,
b,
c AS first_column -- Now the aliases here are aligned
d + e AS second_column -- with the outer query. ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar -- With -- align_scope = file -- align_within = statement WITH foo as (
SELECT
a,
b,
c AS first_column
d + e AS second_column ) SELECT
a AS first_column,
(a + b) / 2 AS third_column FROM foo AS bar


Line Breaks

When controlling line breaks, we are trying to achieve a few different things:

1.
Do we have enough line breaks that line length doesn't become excessive. Long lines are hard to read, especially given that readers may be on varying screen sizes or have multiple windows open. This is (of course) configurable, but the default is 80 characters (in line with the dbt Labs SQL style guide.)
2.
Is the positioning of blank lines (i.e. lines with nothing other than whitespace on them) appropriate. There are some circumstances where a blank line is desired (e.g. between CTEs). There are others where they are not, in particular multiple blank lines, for example at the beginning of a file.
3.
Where we do have line breaks, are they positioned appropriately and consistently with regards to other elements around them. This is most common when it comes to commas, and whether they should be leading (e.g. , my_column) or trailing (e.g. my_column,). In less common cases, it may also be desirable for some elements to have both a line break before and after (e.g. a set operator such as UNION).

Indentation

Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care.

The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example:

SELECT

nested_within_select AS first_column,
some_function(
nested_within_function,
also_nested_within_function
) AS indented_the_same_as_opening_bracket FROM indented_the_same_as_select


Comment Indents

NOTE:

The notes here about block comments are not implemented prior to 2.0.x. They should be coming in that release or soon after.


Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line.

Block comments cannot share a line with any code elements (so in effect they must start on their own new line), they cannot be followed by any code elements on the same line (and so in effect must be followed by a newline, if we are to avoid trailing whitespace). None of the lines within the block comment may have an indent less than the first line of the block comment (although additional indentation within a comment is allowed), and that first line should be aligned with the first code element following the block comment.

SELECT

/* This is a block comment starting on a new line
which contains a newline (continuing with at least
the same indent.
- potentially containing greater indents
- having no other code following it in the same line
- and aligned with the line of code following it */
this_column as what_we_align_the_column_to FROM my_table


Inline comments can be on the same line as other code, but are subject to the same line-length restrictions. If they don't fit on the same line (or if it just looks nicer) they can also be the only element on a line. In this latter case, they should be aligned with the first code element following the comment.

SELECT

-- This is fine
this_column as what_we_align_to,
another_column as something_short, -- Is ok
case
-- This is aligned correctly with below
when indented then take_care
else try_harder
end as the_general_guidance -- Even here we align with the line below FROM my_table


NOTE:

When fixing issues with comment indentation, SQLFluff will attempt to keep comments in their original position but if line length concerns make this difficult, it will either abandon the fix, or move same line comments up and before the line they are currently on. This is in line with the assumption that comments on their own line refer to the elements of code which they come before, not after.



Hanging Indents

One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be:

-- A select statement with two hanging indents:
SELECT no_line_break_before_me,

indented_to_match_the_first,
1 + (a
+ b) AS another_more_complex_example FROM my_table; -- This TSQL example is also in essence a hanging indent: DECLARE @prv_qtr_1st_dt DATETIME,
@last_qtr INT,
@last_qtr_first_mn INT,
@last_qtr_yr INT;


In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element.

Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer by supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code:

SELECT   this_is,
badly_formatted, code_and,

not_obvious,
what_was, intended FROM my_table


Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not and they just forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool, but also for people who write SQL that there two ways of indenting their SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents.

Implicit Indents

A close cousin of the hanging indent is the implicit indent. While it does look a little like a hanging indent, it's much more consistent in its behaviour and is supported from SQLFluff 2.0.0 onwards.

An implicit indent is exactly like a normal indent, but doesn't have to be actually taken to influence the indentation of lines after it - it just needs to be left un-closed before the end of the line. These are normally available in clauses which take the form of KEYWORD <expression>, like WHERE clauses or CASE expressions.

-- This WHERE clause here takes advantage of an implicit indent.
SELECT *
FROM my_table
WHERE condition_a

AND condition_b; -- With implicit indents disabled (which is currently the -- default), the above formulation is not allowed, and instead -- there should be a newline immediately after `WHERE` (which -- is the location of the _implicit_ indent). SELECT * FROM my_table WHERE
condition_a
AND condition_b;


When addressing both indentation and line-length, implicit indents allow a slightly more compact layout, without significant drawbacks in legibility. They also enable a style much closer to some established style guides.

They are however not recommended by many of the major style guides at time of writing (including the dbt Labs SQL style guide and the Mozilla SQL style guide), and so are disabled by default. To enable them, set the allow_implicit_indents flag in sqluff.indentation to True.

Templated Indents

SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout.

SELECT

a,
{% for n in ['b', 'c', 'd'] %}
-- This section is indented relative to 'a' because
-- it is inside a jinja for loop.
{{ n }},
{% endfor %}
e FROM my_table


This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration.

It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process.

SELECT

a,
{% if False %}
-- This section of the code cannot be linted because
-- it is never rendered due to the `if False` condition.
my + poorly
+ spaced - and/indented AS section_of_code
{% endif %}
e FROM my_table


More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly).

Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved.

The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration).

SELECT

a,
{% if True %}
-- This is a simple case. The opening and closing tag are
-- both at the same level within the SELECT clause.
simple_case AS example,
{% endif %}
b, {% if True %}
-- This is a complex case. The opening tag is within the SELECT
-- clause, but the closing tag is outside the statement
-- entirely.
complex_case AS example FROM table_option_one {% else %}
complex_case_two AS example FROM table_option_two {% endif %}


Configuring Layout

Configuration for layout is spread across three places:

1.
Indent behavior for particular dialect elements is controlled by the parser. This is because in the background SQLFluff inserts Indent and Dedent tokens into the parse tree where those things are expected. For more detail see Configuring indent locations.
2.
Configuration for the spacing and line position of particular types of element (such as commas or operators) is set in the layout section of the config file. For more detail see Configuring layout and spacing.
3.
Some elements of layout are still controlled by rules directly. These are usually very specific cases, see Rules Reference for more details.

Configuring indent locations

One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example.

Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable.

By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed.

For example, the default indentation would be as follows:

SELECT

a,
b FROM my_table JOIN another_table
ON
condition1
AND condition2


By setting your config file to:

[sqlfluff:indentation]
indented_joins = True


Then the expected indentation will be:

SELECT

a,
b FROM my_table
JOIN another_table
ON
condition1
AND condition2


There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become:

SELECT

a,
b FROM my_table JOIN another_table ON
condition1
AND condition2


It's worth noting at this point, that for some users, the additional line break after ON is unexpected, and this is a good example of an implicit indent. By setting your config to:

[sqlfluff:indentation]
indented_using_on = False
allow_implicit_indents = True


Then the expected indentation will be:

SELECT

a,
b FROM my_table JOIN another_table ON condition1
AND condition2


There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False (assuming implicit indents are still enabled) the original SQL would become:

SELECT

a,
b FROM my_table JOIN another_table
ON condition1
AND condition2


These can also be combined, so if indented_using_on config is set to False, indented_on_contents is also set to False, and allow_implicit_indents is set tot True then the SQL would become:

SELECT

a,
b FROM my_table JOIN another_table ON condition1 AND condition2


There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause:

WITH

some_cte AS (
SELECT 1 FROM table1
),
some_other_cte AS (
SELECT 1 FROM table1
) SELECT 1 FROM some_cte


There is also a similar indented_then config (defaulted to True) which can be set to False to allow THEN without an indent after WHEN:

SELECT

a,
CASE
WHEN b >= 42 THEN
1
ELSE 0
END AS c FROM some_table


By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff.

Configuring layout and spacing

The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration.

The syntax of the section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. ,.

[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing


Within these configurable sections there are a few key elements which are available:

Spacing Elements: spacing_before, spacing_after and spacing_within. For each of these options, there are a few possible settings:
  • The default spacing for all elements is single unless otherwise specified. In this state, elements will be spaced with a single space character unless there is a line break between them.
  • The value of touch allows line breaks, but if no line break is present, then no space should be present. A great example of this is the spacing before commas (as shown in the config above), where line breaks may be allowed, but if not they should touch the element before.
  • Both of the above can be qualified with the :inline modifier - which prevents newlines within the segment. This is best illustrated by the spacing found in a qualified identifier like my_schema.my_table which uses touch:inline or other clauses where we want to force some elements to be on the same line.

Line Position: set using the line_position option. By default this is unset, which implies no particular line position requirements. The available options are:
trailing and leading, which are most common in the placement of commas. Both of these settings also allow the option of a comma on its own on a line, or in the middle of a line, but if there is a line break on either side then they make sure it's on the correct side. By default we assume trailing commas, but if you (or your organisation) have settled on leading commas then you should add the following section to your config:

[sqlfluff:layout:type:comma]
line_position = leading


  • alone, which means if there is a line break on either side, then there must be a line break on both sides (i.e. that it should be the only thing on that line.
  • All of the above options can be qualified with the :strict modifier - which prevents the inline case. For example:

-- Setting line_position to just `alone`
-- within [sqlfluff:layout:type:set_operator]
-- would not allow:
SELECT a
UNION SELECT b;
-- ...or...
SELECT a UNION
SELECT b;
-- but *would* allow both of the following:
SELECT a UNION SELECT b;
SELECT a
UNION
SELECT b;
-- However the default is set to `alone:strict`
-- then the *only* acceptable configuration is:
SELECT a
UNION
SELECT b;




Rules Reference

Rules in SQLFluff are implemented as crawlers. These are entities which work their way through the parsed structure of a query to evaluate a particular rule or set of rules. The intent is that the definition of each specific rule should be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away. To understand how rules are enabled and disabled see Enabling and Disabling Rules.

Core Rules

Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria:

  • Stable
  • Applies to most dialects
  • Could detect a syntax issue
  • Isn’t too opinionated toward one style (e.g. the dbt style guide)

Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with.

We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suites their organization.

See the Configuration section for more information on how to enable only core rules by default.

Inline Ignoring Errors

SQLFluff features inline error ignoring. For example, the following will ignore the lack of whitespace surrounding the * operator.

a.a*a.b AS bad_1  -- noqa: LT01


Multiple rules can be ignored by placing them in a comma-delimited list.

a.a *  a.b AS bad_2,  -- noqa: LT01, LT03


It is also possible to ignore non-rule based errors, and instead opt to ignore templating (TMP) & parsing (PRS) errors.

WHERE

col1 = 2 AND
dt >= DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY) -- noqa: PRS


NOTE:

It should be noted that ignoring TMP and PRS errors can lead to incorrect sqlfluff lint and sqfluff fix results as SQLFluff can misinterpret the SQL being analysed.


Should the need arise, not specifying specific rules to ignore will ignore all rules on the given line.

a.a*a.b AS bad_3  -- noqa


Ignoring line ranges

Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.

-- Ignore rule AL02 from this line forward
SELECT col_a a FROM foo -- noqa: disable=AL02
-- Ignore all rules from this line forward
SELECT col_a a FROM foo -- noqa: disable=all
-- Enforce all rules from this line forward
SELECT col_a a FROM foo -- noqa: enable=all


Rule Index

Dialects Reference

SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below are a list of the currently available dialects. Each inherits from another, up to the root ansi dialect.

For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.

NOTE:

For technical users looking to add new dialects or add new features to existing ones, the dependent nature of how dialects have been implemented is to try and reduce the amount of repetition in how different elements are defined. As an example, when we say that the Redshift dialect inherits from the PostgreSQL dialect this is not because there is an agreement between those projects which means that features in one must end up in the other, but that the design of the Redshift dialect was heavily inspired by the postgres dialect and therefore when defining the dialect within sqlfuff it makes sense to use PostgreSQL as a starting point rather than starting from scratch.

Consider when adding new features to a dialect:

  • Should I be adding it just to this dialect, or adding it to a parent dialect?
  • If I'm creating a new dialect, which dialect would be best to inherit from?
  • Will the feature I'm adding break any downstream dependencies within dialects which inherit from this one?



ANSI

This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start.

This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL.

Athena

The dialect for Amazon Athena.

BigQuery

The dialect for Google BigQuery.

ClickHouse

The dialect for ClickHouse.

Databricks

The dialect Databricks.

Db2

The dialect for Db2.

DuckDB

The dialect for DuckDB.

Exasol

The dialect for Exasol.

Greenplum

The dialect for Greenplum.

Hive

The dialect for Hive.

Materialize

The dialect for Materialize.

MySQL

The dialect for MySQL.

Oracle

The dialect for Oracle SQL. Note: this does not include PL/SQL.

PostgreSQL

This is based around the PostgreSQL spec. Many other SQL instances are often based on PostreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect).

Redshift

The dialect for Amazon Redshift.

Snowflake

The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL.

SOQL

The dialect for SOQL (Salesforce Object Query Language).

SparkSQL

The dialect for Apache Spark SQL. It inherits from ANSI and includes relevant syntax from Greenplum for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect.

This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS.

Versions of Spark prior to 3.x will only support the Hive dialect.

SQLite

The dialect for SQLite.

T-SQL

The dialect for T-SQL (aka Transact-SQL).

Teradata

The dialect for Teradata.

Trino

The dialect for Trino.

Production Usage & Security

SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines.

Security Considerations

A full list of Security Advisories is available on GitHub.

Given the context of how SQLFluff is designed to be used, there are three different tiers of access which users may have access to manipulate how the tool functions in a secure environment.

1.
Users may have edit access to the SQL code which is being linted. While SQLFluff does not execute the SQL itself, in the process of the templating step (in particular via jinja or dbt), certain macros may have the ability to execute arbitrary SQL code (e.g. the dbt run_query macro). For the Jinja templater, SQLFluff uses the Jinja2 SandboxedEnvironment to limit the execution on unsafe code. When looking to further secure this situation, see below for ways to limit the ability of users to import other libraries.
2.
Users may have edit access to the SQLFluff :ref:`config-files`. In some (perhaps, many) environments, the users who can edit SQL files may also be able to access and edit the Configuration Files. It's important to note that because of In-File Configuration Directives, that users who can edit SQL files which are designed to be linted, will also have access to the vast majority of any configuration options available in Configuration Files. This means that there is minimal additional protection from restricting access to Configuration Files for users who already have access to edit the linting target files (as described above).
3.
Users may have access to change how SQLFluff is invoked. SQLFluff can be invoked either as a command line too or via the python API. Typically the method is fixed for a given application. When thinking about how to restrict the ability of users to call unsecure code, SQLFluff aims to provide options at the point of invocation. In particular, as described above, the primary risk vector for SQLFluff is the macro environment as described in Templating Configuration. To restrict users being able to bring arbitrary python methods into sqlfluff via the library_path configuration value (see Library Templating), we recommend that for secure environments you override this config value either by providing an override option to the FluffConfig object if using the Python API or via the --library-path CLI option:

To disable this option entirely via the CLI:

$ sqlfluff lint my_path --library-path none


To disable this option entirely via the python API:

"""This is an example of providing config overrides."""
from sqlfluff.core import FluffConfig, Linter
sql = "SELECT 1\n"
config = FluffConfig(

overrides={
"dialect": "snowflake",
# NOTE: We explicitly set the string "none" here rather
# than a None literal so that it overrides any config
# set by any config files in the path.
"library_path": "none",
} ) linted_file = Linter(config=config).lint_string(sql) assert linted_file.get_violations() == []



Using SQLFluff on a whole sql codebase

The exit code provided by SQLFluff when run as a command line utility is designed to assist usefulness in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned which can be interrogated to find out more.

  • An error code of 0 means operation success, no issues found.
  • An error code of 1 means operation success, issues found. For example this might mean that a linting issue was found, or that one file could not be parsed.
  • An error code of 2 means an error occurred and the operation could not be completed. For example a configuration issue or an internal error within SQLFluff.

Using SQLFluff on changes using diff-quality

For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues.

To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes.

Currently, diff-quality requires that you are using git for version control.

NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool.

Adding diff-quality to your builds

In your CI build script:

1. Set the current working directory to the git repository containing the SQL code to be checked.

2.
Run diff-quality, specifying SQLFluff as the underlying tool:

$ diff-quality --violations sqlfluff


The output will look something like:

-------------
Diff Quality
Quality Report: sqlfluff
Diff: origin/master...HEAD, staged and unstaged changes
-------------
sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%):
sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case.
-------------
Total:   1 line
Violations: 1 line
% Quality: 0%
-------------


These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line _numbers_, not the character position. If you need the character position, you will need to run SQLFluff directly.

For more information on diff-quality, see the documentation. It covers topics such as:

  • Generating HTML reports
  • Controlling which branch to compare against (i.e. to determine new/changed lines). The default is origin/master.
  • Configuring diff-quality to return an error code if the quality is too low.
  • Troubleshooting

Using pre-commit

pre-commit is a framework to manage git "hooks" triggered right before a commit is made.

A git hook is a git feature to "fire off custom scripts" when specific actions occur.

Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers.

With pre-commit, you also get the benefit of only linting/fixing the files that changed.

SQLFluff comes with two pre-commit hooks:

  • sqlfluff-lint: returns linting errors.
  • sqlfluff-fix: attempts to fix rule violations.

WARNING:

For safety reasons, sqlfluff-fix by default will not make any fixes in files that had templating or parse errors, even if those errors were ignored using noqa or --ignore`.

Although it is not advised, you can tell SQLFluff to try and fix these files by overriding the fix_even_unparsable setting in .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option.

Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay.



You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this:

repos:
- repo: https://github.com/sqlfluff/sqlfluff

rev: 2.3.5
hooks:
- id: sqlfluff-lint
# For dbt projects, this installs the dbt "extras".
# You will need to select the relevant dbt adapter for your dialect
# (https://docs.getdbt.com/docs/available-adapters):
# additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']
- id: sqlfluff-fix
# Arbitrary arguments to show an example
# args: [--rules, "LT02,CP02"]
# additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt']


When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt.

You can specify the version of dbt-adapter used in pre-commit, for example:

additional_dependencies : ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt']


See the list of available dbt-adapters.

Note that you can pass the same arguments available through the CLI using args:.

Using GitHub Actions to Annotate PRs

There are two way to utilize SQLFluff to annotate Github PRs.

1.
When sqlfluff lint is run with the --format github-annotation-native option, it produces output formatted as Github workflow commands which are converted into pull request annotations by Github.
2.
When sqlfluff lint is run with the --format github-annotation option, it produces output compatible with this action. Which uses Github API to annotate the SQL in GitHub pull requests.

For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository.

Configuration

SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated.

For details of what's available on the command line check out the CLI Reference.

Configuration Files

For file based configuration SQLFluff will look for the following files in order. Later files will (if found) will be used to overwrite any values read from earlier files.

  • setup.cfg
  • tox.ini
  • pep8.ini
  • .sqlfluff
  • pyproject.toml

Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a semicolon. For example the jinjacontext section will be indicated in the section started with [sqlfluff:jinjacontext].

For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types):

[sqlfluff]
templater = jinja
sql_file_exts = .sql,.sql.j2,.dml,.ddl
[sqlfluff:indentation]
indented_joins = False
indented_using_on = True
template_blocks_indent = False
[sqlfluff:templater]
unwrap_wrapped_queries = True
[sqlfluff:templater:jinja]
apply_dbt_builtins = True


For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext].

For example, a snippet from a pyproject.toml file:

[tool.sqlfluff.core]
templater = "jinja"
sql_file_exts = ".sql,.sql.j2,.dml,.ddl"
[tool.sqlfluff.indentation]
indented_joins = False
indented_using_on = True
template_blocks_indent = False
[tool.sqlfluff.templater]
unwrap_wrapped_queries = True
[tool.sqlfluff.templater.jinja]
apply_dbt_builtins = True
# For rule specific configuration, use dots between the names exactly
# as you would in .sqlfluff. In the background, SQLFluff will unpack the
# configuration paths accordingly.
[tool.sqlfluff.rules.capitalisation.keywords]
capitalisation_policy = "upper"


New Project Configuration

When setting up a new project with SQLFluff, we recommend keeping your configuration file fairly minimal. The config file should act as a form of documentation for your team i.e. a record of what decisions you've made which govern how your format your SQL. By having a more concise config file, and only defining config settings where they differ from the defaults - you are more clearly stating to your team what choices you've made.

However, there are also a few places where the default configuration is designed more for existing projects, rather than fresh projects, and so there is an opportunity to be a little stricter than you might otherwise be with an existing codebase.

Here is a simple configuration file which would be suitable for a starter project:

[sqlfluff]
# Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html
# Or run 'sqlfluff dialects'
dialect = snowflake
# One of [raw|jinja|python|placeholder]
templater = jinja
# Comma separated list of rules to exclude, or None
# See https://docs.sqlfluff.com/en/stable/configuration.html#enabling-and-disabling-rules
# AM04 (ambiguous.column_count) and ST06 (structure.column_order) are
# two of the more controversial rules included to illustrate usage.
exclude_rules = ambiguous.column_count, structure.column_order
# The standard max_line_length is 80 in line with the convention of
# other tools and several style guides. Many projects however prefer
# something a little longer.
# Set to zero or negative to disable checks.
max_line_length = 120
# CPU processes to use while linting.
# The default is "single threaded" to allow easy debugging, but this
# is often undesirable at scale.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0 means all cpus.
processes = -1
# If using the dbt templater, we recommend setting the project dir.
[sqlfluff:templater:dbt]
project_dir = ./
[sqlfluff:indentation]
# While implicit indents are not enabled by default. Many of the
# SQLFluff maintainers do use them in their projects.
allow_implicit_indents = True
# The default configuration for aliasing rules is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
[sqlfluff:rules:aliasing.table]
aliasing = explicit
[sqlfluff:rules:aliasing.column]
aliasing = explicit
[sqlfluff:rules:aliasing.length]
min_alias_length = 3
# The default configuration for capitalisation rules is "consistent"
# which will auto-detect the setting from the rest of the file. This
# is less desirable in a new project and you may find this (slightly
# more strict) setting more useful.
# Typically we find users rely on syntax highlighting rather than
# capitalisation to distinguish between keywords and identifiers.
# Clearly, if your organisation has already settled on uppercase
# formatting for any of these syntax elements then set them to "upper".
# See https://stackoverflow.com/questions/608196/why-should-i-capitalize-my-sql-keywords-is-there-a-good-reason
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = lower


Nesting

SQLFluff uses nesting in its configuration files, with files closer overriding (or patching, if you will) values from other files. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier:

0.
[...and this one doesn't really count] There's a default config as part of the SQLFluff package. You can find this below, in the Default Configuration section.
1.
It will look in the user's os-specific app config directory. On macOS and Unix this is ~/.config/sqlfluff, Windows is <home>\AppData\Local\sqlfluff\sqlfluff, for any of the filenames above in the main Configuration section. If multiple are present, they will patch/override each other in the order above.
2.
It will look for the same files in the user's home directory (~).
3.
It will look for the same files in the current working directory.
4.
[if parsing a file in a subdirectory of the current working directory] It will look for the same files in every subdirectory between the current working dir and the file directory.
5.
It will look for the same files in the directory containing the file being linted.

This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating.

In-File Configuration Directives

In addition to configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL file to modify a default configuration if they have specific needs.

When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration.

To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-seperated address of the configuation value you wish to set. A few common examples are shown below:

-- Set Indented Joins
-- sqlfluff:indentation:indented_joins:True
-- Set a smaller indent for this file
-- sqlfluff:indentation:tab_space_size:2
-- Set keywords to be capitalised
-- sqlfluff:rules:capitalisation.keywords:capitalisation_policy:upper
SELECT *
FROM a

JOIN b USING(c)


We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files.

Rule Configuration

Rules can be configured with the .sqlfluff config files.

Common rule configurations can be set in the [sqlfluff:rules] section.

For example:

[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all


Rule specific configurations are set in rule specific subsections.

For example, enforce that keywords are upper case by configuring the rule CP01:

[sqlfluff:rules:capitalisation.keywords]
# Keywords
capitalisation_policy = upper


All possible options for rule sections are documented in Rules Reference.

For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives).

Enabling and Disabling Rules

The decision as to which rules are applied to a given file is applied on a file by file basis, by the effective configuration for that file. There are two configuration values which you can use to set this:

  • rules, which explicitly enables the specified rules. If this parameter is unset or empty for a file, this implies "no selection" and so "all rules" is taken to be the meaning.
  • exclude_rules, which explicitly disables the specified rules. This parameter is applied after the rules parameter so can be used to subtract from the otherwise enabled set.

Each of these two configuration values accept a comma separated list of references. Each of those references can be:

  • a rule code e.g. LN01
  • a rule name e.g. layout.indent
  • a rule alias, which is often a deprecated code e.g. L003
  • a rule group e.g. layout or capitalisation

These different references can be mixed within a given expression, which results in a very powerful syntax for selecting exactly which rules are active for a given file.

NOTE:

It's worth mentioning here that the application of rules and exclude_rules, with groups, aliases and names, in projects with potentially multiple nested configuration files defining different rules for different areas of a project can get very confusing very fast. While this flexibility is intended for users to take advantage of, we do have some recommendations about how to do this is a way that remains manageable.

When considering configuration inheritance, each of rules and exclude_rules will totally overwrite any values in parent config files if they are set in a child file. While the subtraction operation between both of them is calculated "per file", there is no combination operation between two definitions of rules (just one overwrites the other).

The effect of this is that we recommend one of two approaches:

1.
Simply only use rules. This has the upshot of each area of your project being very explicit in which rules are enabled. When that changes for part of your project you just reset the whole list of applicable rules for that part of the project.
2.
Set a single rules value in your master project config file and then only use exclude_rules in sub-configuration files to turn off specific rules for parts of the project where those rules are inappropriate. This keeps the simplicity of only having one value which is inherited, but allows slightly easier and simpler rollout of new rules because we manage by exception.



For example, to disable the rules LT08 and RF02:

[sqlfluff]
exclude_rules = LT08, RF02


To enable individual rules, configure rules, respectively.

For example, to enable RF02:

[sqlfluff]
rules = RF02


Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'.

[sqlfluff]
rules = core


More information about 'core rules' can be found in the Rules Reference.

Additionally, some rules have a special force_enable configuration option, which allows to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference.

The default values can be seen in Default Configuration.

See also: Ignoring Errors & Files.

Downgrading rules to warnings

To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence.

The configuration of this behaves very like exclude_rules above:

[sqlfluff]
warnings = LT01, LT04


With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures.

== [test.sql] PASS
L:   2 | P:   9 | LT01 | WARNING: Missing whitespace before +
== [test2.sql] FAIL
L:   2 | P:   8 | CP02 | Unquoted identifiers must be consistently upper case.
L:   2 | P:  11 | LT01 | WARNING: Missing whitespace before +


This is particularly useful as a transitional tool when considering the introduction on new rules on a project where you might want to make users aware of issues without blocking their workflow (yet).

Layout & Spacing Configuration

The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout.

Templating Configuration

This section explains how to configure templating for SQL files.

When writing SQL files, users might utilise some kind of templating. The SQL file itself is written with placeholders which get rendered to proper SQL at run time. This can range from very simple placeholder templating to complex Jinja templating.

SQLFluff supports templated sections in SQL, see Stage 1, the templater. This is achieved by the following set of operations:

1.
SQLFluff pre-renders the templated SQL
2.
SQLFluff applies the lint and fix operations to the rendered file
3.
SQLFluff backports the rule violations to the templated section of the SQL.

SQLFluff does not automatically have access to the same environment used in production template setup. This means it is necessary to either provide that environment or provide dummy values to effectively render the template and generate valid SQL. Refer to the templater sections below for details.

SQLFluff natively supports the following templating engines

  • Jinja templater
  • Placeholder templater
  • Python templater

Also, SQLFluff has an integration to use dbt as a templater.

dbt templater (via plugin which is covered in a different section).

NOTE:

Templaters may not be able to generate a rendered SQL that cover the entire raw file.

For example, if the raw SQL uses a {% if condition %} block, the rendered version of the template will only include either the {% then %} or the {% else %} block (depending on the provided configuration for the templater), but not both.

In this case, because SQLFluff linting can only operate on the output of the templater, some areas of the raw SQL will never be seen by the linter and will not be covered by lint rules.

This is functionality we hope to support in future.



Generic Variable Templating

Variables are available in all the templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will be look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the python templater is found in the sqlfluff:templater:python:context section, the one for the placeholder templater is found in the sqlfluff:templater:placeholder:context section.

For example, if passed the following .sql file:

SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5


...and the following configuration in .sqlfluff in the same directory:

[sqlfluff:templater:jinja:context]
num_things=456
tbl_name=my_table


...then before parsing, the sql will be transformed to:

SELECT 456 FROM my_table WHERE id > 10 LIMIT 5


NOTE:

If there are variables in the template which cannot be found in the current configuration context, then this will raise a SQLTemplatingError and this will appear as a violation without a line number, quoting the name of the variable that couldn't be found.


Jinja templater

The Jinja templater uses Jinja2 to render templates.

There are multiple, complementary ways of configuring the Jinja templater.

  • Reading variables and Jinja macros directly from the SQLFLuff config file
  • Loading macros from a path
  • Using a library

Overview of Jinja templater's configuration options

Configuration Variables Macros Filters Documentation
Config file Complex Jinja Variable Templating and Jinja Macro Templating (from config)
Macro Path Jinja Macro Templating (from file)
Library Library Templating

For example, a snippet from a .sqlfluff file that uses all config options:

[sqlfluff]
templater = jinja
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
load_macros_from_path = my_macros
library_path = sqlfluff_libs
[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}
[sqlfluff:templater:jinja:macros]
a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}


Complex Jinja Variable Templating

Apart from the Generic variable templating that is supported for all templaters, two more advanced features of variable templating are available for Jinja.

case sensitivity and native python types. Both are illustrated in the following example:

[sqlfluff:templater:jinja:context]
my_list = ['a', 'b', 'c']
MY_LIST = ("d", "e", "f")
my_where_dict = {"field_1": 1, "field_2": 2}


SELECT

{% for elem in MY_LIST %}
'{{elem}}' {% if not loop.last %}||{% endif %}
{% endfor %} as concatenated_list FROM tbl WHERE
{% for field, value in my_where_dict.items() %}
{{field}} = {{value}} {% if not loop.last %}and{% endif %}
{% endfor %}


...will render as...

SELECT

'd' || 'e' || 'f' as concatenated_list FROM tbl WHERE
field_1 = 1 and field_2 = 2


Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types.

Jinja Macro Templating (from config)

Macros (which also look and feel like functions are available only in the jinja templater. Similar to Generic Variable Templating, these are specified in config files, what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example.

If passed the following .sql file:

SELECT {{ my_macro(6) }} FROM some_table


...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace):

[sqlfluff:templater:jinja:macros]
a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %}


...then before parsing, the sql will be transformed to:

SELECT 6 + 12 FROM some_table


Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required.

Jinja Macro Templating (from file)

In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file:

[sqlfluff:templater:jinja]
load_macros_from_path = my_macros


load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folder /home/my_project/my_macros/ (but not subfolders). Any macros defined in the config will always take precedence over a macro defined in the path.

  • .sql files: Macros in these files are available in every .sql file without requiring a Jinja include or import.
  • Folders: To use macros from the .sql files in folders, use Jinja include or import as explained below.

Note: The load_macros_from_path setting also defines the search path for Jinja include or import. Unlike with macros (as noted above), subdirectories are supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do:

{% include 'subdir/include_comment.sql' %}


NOTE:

Throughout the templating process whitespace will still be treated rigorously, and this includes newlines. In particular you may choose to provide dummy macros in your configuration different from the actual macros used in production.

REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful.



Builtin Jinja Macro Blocks

One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting.

NOTE:

SQLFluff has now a tighter integration with dbt through the "dbt" templater. It is the recommended templater for dbt projects. If used, it eliminates the need for the overrides described in this section.

To use the dbt templater, go to dbt templater.



SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for:

  • ref: The mock version of this provided simply returns the model reference as the name of the table. In most cases this is sufficient.
  • config: A regularly used macro in dbt to set configuration values. For linting purposes, this makes no difference and so the provided macro simply returns nothing.

NOTE:

If there are other builtin macros which would make your life easier, consider submitting the idea (or even better a pull request) on github.


Library Templating

If using SQLFluff with jinja as your templater, you may have library function calls within your sql files that can not be templated via the normal macro templating mechanisms:

SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }}


To template these libraries, you can use the sqlfluff:jinja:library_path config option:

[sqlfluff:templater:jinja]
library_path = sqlfluff_libs


This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as:

def group_by(n):

return "GROUP BY 1,2"


If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path.

SELECT

{{ custom_sum('foo', 'bar') }},
{{ foo.bar.another_sum('foo', 'bar') }} FROM
baz


sqlfluff_libs/__init__.py:

def custom_sum(a: str, b: str) -> str:

return a + b


sqlfluff_libs/foo/__init__.py:

# empty file


sqlfluff_libs/foo/bar.py:

def another_sum(a: str, b: str) -> str:

return a + b


Additionally, the library can be used to expose Jinja Filters to the Jinja environment used by SQLFluff.

This is achieve by setting a global variable named SQLFLUFF_JINJA_FILTERS. SQLFLUFF_JINJA_FILTERS is a dictionary where

  • dictionary keys map to the Jinja filter name
  • dictionary values map to the Python callable

For example, to make the Airflow filter ds available to SQLFLuff, add the following to the __init__.py of the library:

# https://github.com/apache/airflow/blob/main/airflow/templates.py#L53
def ds_filter(value: datetime.date | datetime.time | None) -> str | None:

"""Date filter."""
if value is None:
return None
return value.strftime("%Y-%m-%d") SQLFLUFF_JINJA_FILTERS = {"ds": ds_filter}


Now, ds can be used in SQL

SELECT "{{ "2000-01-01" | ds }}";


Interaction with --ignore=templating

Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context].

When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users.

Here's how it works:

  • Within the expanded SQL, undefined variables are automatically replaced with the corresponding string value.
  • If you do: {% include query %}, and the variable query is not defined, it returns a “file” containing the string query.
  • If you do: {% include "query_file.sql" %}, and that file does not exist or you haven’t configured a setting for load_macros_from_path, it returns a “file” containing the text query_file.

For example:

select {{ my_variable }}
from {% include "my_table.sql" %}


is interpreted as:

select my_variable
from my_table


The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types:

  • str
  • int
  • list
  • Jinja's Undefined class

Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example:

select {{ my_variable | default("col_a") }}
from my_table


is interpreted as:

select col_a
from my_table


Placeholder templater

Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query.

For example a query in SQLAlchemy can look like this:

SELECT * FROM table WHERE id = :myid


At runtime :myid will be replace by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is.

In order to parse these queries is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater.

Placeholder templating can be enabled in the config using:

[sqlfluff]
templater = placeholder


A few common styles are supported:


-- colon
WHERE bla = :my_name
-- colon_nospaces
-- (use with caution as more prone to false positives)
WHERE bla = table:my_name
-- numeric_colon
WHERE bla = :2
-- pyformat
WHERE bla = %(my_name)s
-- dollar
WHERE bla = $my_name or WHERE bla = ${my_name}
-- question_mark
WHERE bla = ?
-- numeric_dollar
WHERE bla = $3 or WHERE bla = ${3}
-- percent
WHERE bla = %s
-- ampersand
WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV}


These can be configured by setting param_style to the names above:

[sqlfluff:templater:placeholder]
param_style = colon
my_name = 'john'


then you can set sample values for each parameter, like my_name above. Notice that the value needs to be escaped as it will be replaced as a string during parsing. When the sample values aren't provided, the templater will use parameter names themselves by default.

When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1.

[sqlfluff:templater:placeholder]
param_style = question_mark
1 = 'john'


In case you need a parameter style different from the ones above, you can pass a custom regex.

[sqlfluff:templater:placeholder]
param_regex = __(?P<param_name>[\w_]+)__
my_name = 'john'


N.B. quotes around param_regex in the config are interpreted literally by the templater. e.g. param_regex='__(?P<param_name>[w_]+)__' matches '__some_param__' not __some_param__

the named parameter param_name will be used as the key to replace, if missing, the parameter is assumed to be positional and numbers are used instead.

Also consider making a pull request to the project to have your style added, it may be useful to other people and simplify your configuration.

Python templater

Uses native Python f-strings. As described in Generic Variable Templating, an example usage would look be configured as follows:

If passed the following .sql file:

SELECT * FROM {tbl_name}


...and the following configuration in .sqlfluff in the same directory:

[sqlfluff]
templater = python
[sqlfluff:templater:python:context]
tbl_name = my_table


...then before parsing, the sql will be transformed to:

SELECT * FROM my_table


dbt templater

NOTE:

From sqlfluff version 0.7.0 onwards, the dbt templater has been moved to a separate plugin and python package. Projects that were already using the dbt templater may initially fail after an upgrade to 0.7.0+. See the installation instructions below to install the dbt templater.

dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace.



dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff.

A simple rule of thumb might be:

  • If you are using SQLFluff in a CI/CD context, where speed is not critical but accuracy in rendering sql is, then the dbt templater may be more appropriate.
  • If you are using SQLFluff in an IDE or on a git hook, where speed of response may be more important, then the jinja templater may be more appropriate.

Pros:

Most (potentially all) macros will work

Cons:

More complex, e.g. using it successfully may require deeper understanding of your models and/or macros (including third-party macros)
  • More configuration decisions to make
  • Best practices are not yet established or documented

If your dbt model files access a database at compile time, using SQLFluff with the dbt templater will also require access to a database.
Note that you can often point SQLFluff and the dbt templater at a test database (i.e. it doesn't have to be the production database).

Runs slower

Installation & Configuration

In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration:

In .sqlfluff:

[sqlfluff]
templater = dbt


In .sqlfluffignore:

target/
# dbt <1.0.0
dbt_modules/
# dbt >=1.0.0
dbt_packages/
macros/


You can set the dbt project directory, profiles directory and profile with:

[sqlfluff:templater:dbt]
project_dir = <relative or absolute path to dbt_project directory>
profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file>
profile = <dbt profile>
target = <dbt target>


NOTE:

If the profiles_dir setting is omitted, SQLFluff will look for the profile in the default location, which varies by operating system. On Unix-like operating systems (e.g. Linux or macOS), the default profile directory is ~/.dbt/. On Windows, you can determine your default profile directory by running dbt debug --config-dir.


To use builtin dbt Jinja functions SQLFluff provides a configuration option that enables usage within templates.

[sqlfluff:templater:jinja]
apply_dbt_builtins = True


This will provide dbt macros like ref, var, is_incremental(). If the need arises builtin dbt macros can be customised via Jinja macros in .sqlfluff configuration file.

[sqlfluff:templater:jinja:macros]
# Macros provided as builtins for dbt projects
dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %}
dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %}
dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %}
dbt_var = {% macro var(variable, default='') %}item{% endmacro %}
dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}


If your project requires that you pass variables to dbt through command line, you can specify them in template:dbt:context section of .sqlfluff. See below configuration and its equivalent dbt command:

[sqlfluff:templater:dbt:context]
my_variable = 1


dbt run --vars '{"my_variable": 1}'


Known Caveats

  • To use the dbt templater, you must set templater = dbt in the .sqlfluff config file in the directory where sqlfluff is run. The templater cannot be changed in .sqlfluff files in subdirectories.
  • In SQLFluff 0.4.0 using the dbt templater requires that all files within the root and child directories of the dbt project must be part of the project. If there are deployment scripts which refer to SQL files not part of the project for instance, this will result in an error. You can overcome this by adding any non-dbt project SQL files to .sqlfluffignore.

CLI Arguments

You already know you can pass arguments (--verbose, --exclude-rules, etc.) through the CLI commands (lint, fix, etc.):

$ sqlfluff lint my_code.sql -v --exclude-rules LT08,RF02


You might have arguments that you pass through every time, e.g rules you always want to ignore. These can also be configured:

[sqlfluff]
verbose = 1
exclude_rules = LT08,RF02


Note that while the exclude_rules config looks similar to the above example, the verbose config has an integer value. This is because verbose is stackable meaning there are multiple levels of verbosity that are available for configuration. See CLI Reference for more details about the available CLI arguments. For more details about rule exclusion, see Enabling and Disabling Rules.

Ignoring Errors & Files

Ignoring individual lines

Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category.

-- Ignore all errors
SeLeCt  1 from tBl ;    -- noqa
-- Ignore rule CP02 & rule CP03
SeLeCt  1 from tBl ;    -- noqa: CP02,CP03
-- Ignore all parsing errors
SeLeCt from tBl ;       -- noqa: PRS


Ignoring line ranges

Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive.

-- Ignore rule AL02 from this line forward
SELECT col_a a FROM foo -- noqa: disable=AL02
-- Ignore all rules from this line forward
SELECT col_a a FROM foo -- noqa: disable=all
-- Enforce all rules from this line forward
SELECT col_a a FROM foo -- noqa: enable=all


Ignoring types of errors

General categories of errors can be ignored using the --ignore command line option or the ignore setting in .sqlfluffignore. Types of errors that can be ignored include:

  • lexing
  • linting
  • parsing
  • templating

.sqlfluffignore

Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation.

An example of a potential .sqlfluffignore placed in the root of your project would be:

# Comments start with a hash.
# Ignore anything in the "temp" path
/temp/
# Ignore anything called "testing.sql"
testing.sql
# Ignore any ".tsql" files
*.tsql


Ignore files can also be placed in subdirectories of a path which is being linted and the sub files will also be applied within that subdirectory.

Default Configuration

The default configuration is as follows, note the Builtin Jinja Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above.

NOTE:

This shows the entire default config. We do not recommend that users copy this whole config as the starter config file for their project.

This is for two reasons:

1.
The config file should act as a form of documentation for your team. A record of what decisions you've made which govern how your format your sql. By having a more concise config file, and only defining config settings where they differ from the defaults - you are more clearly stating to your team what choices you've made.
2.
As the project evolves, the structure of the config file may change and we will attempt to make changes as backward compatible as possible. If you have not overridden a config setting in your project, we can easily update the default config to match your expected behaviour over time. We may also find issues with the default config which we can also fix in the background. However, the longer your local config file, the more work it will be to update and migrate your config file between major versions.

If you are starting a fresh project and are looking for a good starter config, check out the New Project Configuration section above.



[sqlfluff]
# verbose is an integer (0-2) indicating the level of log output
verbose = 0
# Turn off color formatting of output
nocolor = False
# Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html
# Or run 'sqlfluff dialects'
dialect = None
# One of [raw|jinja|python|placeholder]
templater = jinja
# Comma separated list of rules to check, default to all
rules = all
# Comma separated list of rules to exclude, or None
exclude_rules = None
# Below controls SQLFluff output, see max_line_length for SQL output
output_line_length = 80
# Number of passes to run before admitting defeat
runaway_limit = 10
# Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating)
ignore = None
# Warn only for rule codes (one of more rule codes, seperated by commas: e.g. LT01,LT02)
# Also works for templating and parsing errors by using TMP or PRS
warnings = None
# Whether to warn about unneeded '-- noqa:' comments.
warn_unused_ignores = False
# Ignore linting errors found within sections of code coming directly from
# templated code (e.g. from within Jinja curly braces. Note that it does not
# ignore errors from literal code found within template loops.
ignore_templated_areas = True
# can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig
encoding = autodetect
# Ignore inline overrides (e.g. to test if still required)
disable_noqa = False
# Comma separated list of file extensions to lint
# NB: This config will only apply in the root folder
sql_file_exts = .sql,.sql.j2,.dml,.ddl
# Allow fix to run on files, even if they contain parsing errors
# Note altering this is NOT RECOMMENDED as can corrupt SQL
fix_even_unparsable = False
# Very large files can make the parser effectively hang.
# The more efficient check is the _byte_ limit check which
# is enabled by default. The previous _character_ limit check
# is still present for backward compatibility. This will be
# removed in a future version.
# Set either to 0 to disable.
large_file_skip_char_limit = 0
large_file_skip_byte_limit = 20000
# CPU processes to use while linting.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0  means all cpus.
processes = 1
# Max line length is set by default to be in line with the dbt style guide.
# https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md
# Set to zero or negative to disable checks.
max_line_length = 80
[sqlfluff:indentation]
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations
indent_unit = space
tab_space_size = 4
indented_joins = False
indented_ctes = False
indented_using_on = True
indented_on_contents = True
indented_then = True
indented_then_contents = True
allow_implicit_indents = False
template_blocks_indent = True
# This is a comma seperated list of elements to skip
# indentation edits to.
skip_indentation_in = script_content
# If comments are found at the end of long lines, we default to moving
# them to the line _before_ their current location as the convention is
# that a comment precedes the line it describes. However if you prefer
# comments moved _after_, this configuration setting can be set to "after".
trailing_comments = before
# Layout configuration
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing
[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing
[sqlfluff:layout:type:binary_operator]
spacing_within = touch
line_position = leading
[sqlfluff:layout:type:statement_terminator]
spacing_before = touch
line_position = trailing
[sqlfluff:layout:type:end_of_file]
spacing_before = touch
[sqlfluff:layout:type:set_operator]
line_position = alone:strict
[sqlfluff:layout:type:start_bracket]
spacing_after = touch
[sqlfluff:layout:type:end_bracket]
spacing_before = touch
[sqlfluff:layout:type:start_square_bracket]
spacing_after = touch
[sqlfluff:layout:type:end_square_bracket]
spacing_before = touch
[sqlfluff:layout:type:start_angle_bracket]
spacing_after = touch
[sqlfluff:layout:type:end_angle_bracket]
spacing_before = touch
[sqlfluff:layout:type:casting_operator]
spacing_before = touch
spacing_after = touch:inline
[sqlfluff:layout:type:slice]
spacing_before = touch
spacing_after = touch
[sqlfluff:layout:type:dot]
spacing_before = touch
spacing_after = touch
[sqlfluff:layout:type:comparison_operator]
spacing_within = touch
line_position = leading
[sqlfluff:layout:type:assignment_operator]
spacing_within = touch
line_position = leading
[sqlfluff:layout:type:object_reference]
spacing_within = touch:inline
[sqlfluff:layout:type:numeric_literal]
spacing_within = touch:inline
[sqlfluff:layout:type:sign_indicator]
spacing_after = touch:inline
[sqlfluff:layout:type:tilde]
spacing_after = touch:inline
[sqlfluff:layout:type:function_name]
spacing_within = touch:inline
spacing_after = touch:inline
[sqlfluff:layout:type:array_type]
spacing_within = touch:inline
[sqlfluff:layout:type:typed_array_literal]
spacing_within = touch
[sqlfluff:layout:type:sized_array_type]
spacing_within = touch
[sqlfluff:layout:type:struct_type]
spacing_within = touch:inline
[sqlfluff:layout:type:bracketed_arguments]
spacing_before = touch:inline
[sqlfluff:layout:type:typed_struct_literal]
spacing_within = touch
[sqlfluff:layout:type:semi_structured_expression]
spacing_within = touch:inline
spacing_before = touch:inline
[sqlfluff:layout:type:array_accessor]
spacing_before = touch:inline
[sqlfluff:layout:type:colon]
spacing_before = touch
[sqlfluff:layout:type:colon_delimiter]
spacing_before = touch
spacing_after = touch
[sqlfluff:layout:type:path_segment]
spacing_within = touch
[sqlfluff:layout:type:sql_conf_option]
spacing_within = touch
[sqlfluff:layout:type:sqlcmd_operator]
# NOTE: This is the spacing between the operator and the colon
spacing_before = touch
[sqlfluff:layout:type:comment]
spacing_before = any
spacing_after = any
[sqlfluff:layout:type:pattern_expression]
# Snowflake pattern expressions shouldn't have their spacing changed.
spacing_within = any
[sqlfluff:layout:type:placeholder]
# Placeholders exist "outside" the rendered SQL syntax
# so we shouldn't enforce any particular spacing around
# them.
spacing_before = any
spacing_after = any
[sqlfluff:layout:type:common_table_expression]
# The definition part of a CTE should fit on one line where possible.
# For users which regularly define column names in their CTEs they
# may which to relax this config to just `single`.
spacing_within = single:inline
# By setting a selection of clauses to "alone", we hint to the reflow
# algorithm that in the case of a long single line statement, the
# first place to add newlines would be around these clauses.
# Setting this to "alone:strict" would always _force_ line breaks
# around them even if the line isn't too long.
[sqlfluff:layout:type:select_clause]
line_position = alone
[sqlfluff:layout:type:where_clause]
line_position = alone
[sqlfluff:layout:type:from_clause]
line_position = alone
[sqlfluff:layout:type:join_clause]
line_position = alone
[sqlfluff:layout:type:groupby_clause]
line_position = alone
[sqlfluff:layout:type:orderby_clause]
# NOTE: Order by clauses appear in many places other than in a select
# clause. To avoid unexpected behaviour we use `leading` in this
# case rather than `alone`.
line_position = leading
[sqlfluff:layout:type:having_clause]
line_position = alone
[sqlfluff:layout:type:limit_clause]
line_position = alone
# Template loop tokens shouldn't dictate spacing around them.
[sqlfluff:layout:type:template_loop]
spacing_before = any
spacing_after = any
[sqlfluff:templater]
unwrap_wrapped_queries = True
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
# Some rules can be configured directly from the config common to other rules
[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all
[sqlfluff:rules:capitalisation.keywords]
# Keywords
capitalisation_policy = consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:capitalisation.identifiers]
# Unquoted identifiers
extended_capitalisation_policy = consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:capitalisation.functions]
# Function names
extended_capitalisation_policy = consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:capitalisation.literals]
# Null & Boolean Literals
capitalisation_policy = consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:capitalisation.types]
# Data Types
extended_capitalisation_policy = consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:ambiguous.join]
# Fully qualify JOIN clause
fully_qualify_join_types = inner
[sqlfluff:rules:ambiguous.column_references]
# GROUP BY/ORDER BY column references
group_by_and_order_by_style = consistent
[sqlfluff:rules:aliasing.table]
# Aliasing preference for tables
aliasing = explicit
[sqlfluff:rules:aliasing.column]
# Aliasing preference for columns
aliasing = explicit
[sqlfluff:rules:aliasing.length]
min_alias_length = None
max_alias_length = None
[sqlfluff:rules:aliasing.forbid]
# Avoid table aliases in from clauses and join conditions.
# Disabled by default for all dialects unless explicitly enabled.
# We suggest instead using aliasing.length (AL06) in most cases.
force_enable = False
[sqlfluff:rules:convention.select_trailing_comma]
# Trailing commas
select_clause_trailing_comma = forbid
[sqlfluff:rules:convention.count_rows]
# Consistent syntax to count all rows
prefer_count_1 = False
prefer_count_0 = False
[sqlfluff:rules:convention.terminator]
# Semi-colon formatting approach
multiline_newline = False
require_final_semicolon = False
[sqlfluff:rules:convention.blocked_words]
# Comma separated list of blocked words that should not be used
blocked_words = None
blocked_regex = None
match_source = False
[sqlfluff:rules:convention.quoted_literals]
# Consistent usage of preferred quotes for quoted literals
preferred_quoted_literal_style = consistent
# Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres)
force_enable = False
[sqlfluff:rules:convention.casting_style]
# SQL type casting
preferred_type_casting_style = consistent
[sqlfluff:rules:references.from]
# References must be in FROM clause
# Disabled for some dialects (e.g. bigquery)
force_enable = False
[sqlfluff:rules:references.qualification]
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:references.consistent]
# References must be consistently used
# Disabled for some dialects (e.g. bigquery)
force_enable = False
[sqlfluff:rules:references.keywords]
# Keywords should not be used as identifiers.
unquoted_identifiers_policy = aliases
quoted_identifiers_policy = none
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:references.special_chars]
# Special characters in identifiers
unquoted_identifiers_policy = all
quoted_identifiers_policy = all
allow_space_in_identifier = False
additional_allowed_characters = None
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:references.quoting]
# Policy on quoted and unquoted identifiers
prefer_quoted_identifiers = False
prefer_quoted_keywords = False
ignore_words = None
ignore_words_regex = None
force_enable = False
[sqlfluff:rules:layout.long_lines]
# Line length
ignore_comment_lines = False
ignore_comment_clauses = False
[sqlfluff:rules:layout.select_targets]
wildcard_policy = single
[sqlfluff:rules:structure.subquery]
# By default, allow subqueries in from clauses, but not join clauses
forbid_subquery_in = join
[sqlfluff:rules:structure.join_condition_order]
preferred_first_table_in_join_clause = earlier


CLI Reference

API Reference

SQLFluff exposes a public api for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below.

"""This is an example of how to use the simple sqlfluff api."""
from typing import Any, Dict, Iterator, List, Union
import sqlfluff
#  -------- LINTING ----------
my_bad_query = "SeLEct  *, 1, blah as  fOO  from mySchema.myTable"
# Lint the given string and return an array of violations in JSON representation.
lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery")
# lint_result =
# [
#     {
#         "code": "CP01",
#         "line_no": 1,
#         "line_pos": 1,
#         "description": "Keywords must be consistently upper case.",
#     }
#     ...
# ]
#  -------- FIXING ----------
# Fix the given string and get a string back which has been fixed.
fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery")
# fix_result_1 = 'SELECT  *, 1, blah AS  foo  FROM myschema.mytable\n'
# We can also fix just specific rules.
fix_result_2 = sqlfluff.fix(my_bad_query, rules=["CP01"])
# fix_result_2 = 'SELECT  *, 1, blah AS  fOO  FROM mySchema.myTable'
# Or a subset of rules...
fix_result_3 = sqlfluff.fix(my_bad_query, rules=["CP01", "CP02"])
# fix_result_3 = 'SELECT  *, 1, blah AS  fOO  FROM myschema.mytable'
#  -------- PARSING ----------
# Parse the given string and return a JSON representation of the parsed tree.
parse_result = sqlfluff.parse(my_bad_query)
# parse_result = {'file': {'statement': {...}, 'newline': '\n'}}
# This JSON structure can then be parsed as required.
# An example usage is shown below:
def get_json_segment(

parse_result: Dict[str, Any], segment_type: str ) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
"""Recursively search JSON parse result for specified segment type.
Args:
parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`.
segment_type (str): The segment type to search for.
Yields:
Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]:
Retrieves children of specified segment type as either a string for a raw
segment or as JSON or an array of JSON for non-raw segments.
"""
for k, v in parse_result.items():
if k == segment_type:
yield v
elif isinstance(v, dict):
yield from get_json_segment(v, segment_type)
elif isinstance(v, list):
for s in v:
yield from get_json_segment(s, segment_type) # e.g. Retrieve array of JSON for table references. table_references = list(get_json_segment(parse_result, "table_reference")) print(table_references) # [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]] # Retrieve raw table name from last identifier in the table reference. for table_reference in table_references:
table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1]
print(f"table_name: {table_name}") # table_name: myTable


Simple API commands

Advanced API usage

The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered as experimental only as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use.

Release Notes

This page aims to act as a guide for migrating between major SQLFluff releases. Necessarily this means that bugfix releases, or releases requiring no change for the user are not mentioned. For full details of each individual release, see the detailed changelog.

Upgrading to 2.2

This release changes some of the interfaces between SQLFluff core and our plugin ecosystem. The only breaking change is in the interface between SQLFluff and templater plugins (which are not common in the ecosystem, hence why this is only a minor and not a major release).

For all plugins, we also recommend a different structure for their imports (especially for rule plugins which are more common in the ecosystem) - for performance and stability reasons. Some users had been experiencing very long import times with previous releases as a result of the layout of plugin imports. Users with affected plugins will begin to see a warning from this release onward, which can be resolved for their plugin by updating to a new version of that plugin which follows the guidelines.

Templater plugins

Templaters before this version would pass a make_template() callable to the slicing methods as part of being able to map the source file. This method would accept a str and return a jinja2.environment.Template object to allow the templater to render multiple variants of the template to do the slicing operation (which allows linting issues found in templated files to be mapped accurately back to their position in the unrendered source file). This approach is not very generalisable, and did not support templating operations with libraries other than jinja2.

As a result, we have amended the interface to instead pass a render_func() callable, which accepts a str and returns a str. This works fine for the jinja templater (and by extension the dbt templater) as they can simply wrap the original callable with a method that calls render() on the original Template object. It also however opens up the door to other templating engines, and in particular to remote templaters which might pass unrendered code over a HTTP connection for rendering.

Specifically:

  • The slice_file() method of the base templater classes no longer accepts an optional make_template argument or a templated_str argument.
  • Instead a render_func callable should be passed which can be called to generate the templated_str on demand.
  • Unlike the optional make_template - render_func is not optional and should always be present.

Rule plugins

We recommend that the module in a plugin which defines all of the hook implementations (anything using the @hookimpl decorator) must be able to fully import before any rule implementations are imported. More specifically, SQLFluff must be able to both import and run any implementations of get_configs_info() before any plugin rules (i.e. any derivatives of BaseRule) are imported. Because of this, we recommend that rules are defined in a separate module to the root of the plugin and then only imported within the get_rules() method.

Importing in the main body of the module was previously our recommendation and so may be the case for versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented from this version onward, recommending that you update your plugin.

See the Developing Plugins section of the docs for an example.

Upgrading from 1.x to 2.0

Upgrading to 2.0 brings several important breaking changes:

  • All bundled rules have been recoded, both from generic L00X formats into groups within similar codes (e.g. an aliasing group with codes of the format AL0X), but also given names to allow much clearer referencing (e.g. aliasing.column).
  • Rule Configuration now uses the rule name rather than the rule code to specify the section. Any unrecognised references in config files (whether they are references which do match existing rules by code or alias, or whether the match no rules at all) will raise warnings at runtime.
  • A complete re-write of layout and whitespace handling rules (see Let's talk about whitespace), and with that a change in how layout is configured (see Configuring Layout) and the combination of some rules that were previously separate. One example of this is that the legacy rules L001, L005, L006, L008, L023, L024, L039, L048 & L071 have been combined simply into LT01.

To upgrade smoothly between versions, we recommend the following sequence:

1.
The upgrade path will be simpler if you have a slimmer configuration file. Before upgrading, consider removing any sections from your configuration file (often .sqlfluff, see Configuration) which match the current Default Configuration. There is no need to respecify defaults in your local config if they are not different to the stock config.
2.
In a local (or other non-production) environment, upgrade to SQLFluff 2.0.x. We recommend using a compatible release specifier such as ~=2.0.0, to ensure any minor bugfix releases are automatically included.
3.
Examine your configuration file (as mentioned above), and evaluate how rules are currently specified. We recommend primarily using either rules or exclude_rules rather than both, as detailed in Enabling and Disabling Rules. Using either the sqlfluff rules CLI command or the online Rules Reference, replace all references to legacy rule codes (i.e. codes of the form L0XX). Specifically:
  • In the rules and exclude_rules config values. Here, consider using group specifiers or names to make your config simpler to read and understand (e.g. capitalisation, is much more understandable than CP01,CP02,CP03,CP04,CP05, but the two specifiers will have the same effect). Note that while legacy codes will still be understood here (because they remain valid as aliases for those rules) - you may find that some rules no longer exist in isolation and so these references may be misleading. e.g. L005 is now an alias for layout.spacing but that rule is much more broad ranging than the original scope of L005, which was only spacing around commas.
  • In Rule Configuration. In particular here, legacy references to rule codes are no longer valid, will raise warnings, and until resolved, the configuration in those sections will be ignored. The new section references should include the rule name (e.g. [sqlfluff:rules:capitalisation.keywords] rather than [sqlfluff:rules:L010]). This switch is designed to make configuration files more readable, but we cannot support backward compatibility here without also having to resolve the potential ambiguity of the scenario where both code-based and name-based are both used.
  • Review the Configuring Layout documentation, and check whether any indentation or layout configuration should be revised.

4.
Check your project for In-File Configuration Directives which refer to rule codes. Alter these in the same manner as described above for configuration files.
5.
Test linting your project for unexpected linting issues. Where found, consider whether to use sqlfluff fix to repair them in bulk, or (if you disagree with the changes) consider changing which rules you enable or their configuration accordingly. In particular you may notice:
  • The indentation rule (L003 as was, now LT02) has had a significant rewrite, and while much more flexible and accurate, it is also more specific. Note that Hanging Indents are no longer supported, and that while not enabled by default, many users may find the enabling Implicit Indents fits their organisation's style better.
  • The spacing rule (LT01: layout.spacing) has a much wider scope, and so may pick up spacing issues that were not previously enforced. If you disagree with any of these, you can override the sqlfluff:layout sections of the config with different (or just more liberal settings, like any).


Example 2.0 config

To illustrate the points above, this is an illustrative example config for a 2.0 compatible project. Note that the config is fairly brief and sets only the values which differ from the default config.

[sqlfluff]
dialect = snowflake
templater = dbt
max_line_length = 120
# Exclude some specific rules based on a mixture of codes and names
exclude_rules = RF02, RF03, RF04, ST06, ST07, AM05, AM06, convention.left_join, layout.select_targets
[sqlfluff:indentation]
# Enabling implicit indents for this project.
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations
allow_implicit_indents = True
# Add a few specific rule configurations, referenced by the rule names
# and not by the rule codes.
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.identifiers]
capitalisation_policy = lower
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = lower
# An example of setting a custom layout specification which
# is more lenient than default config.
[sqlfluff:layout:type:set_operator]
line_position = alone


Upgrading to 1.4

This release brings several internal changes, and acts as a prelude to 2.0.0. In particular, the following config values have changed:

  • sqlfluff:rules:L007:operator_new_lines has been changed to sqlfluff:layout:type:binary_operator:line_position.
  • sqlfluff:rules:comma_style and sqlfluff:rules:L019:comma_style have both been consolidated into sqlfluff:layout:type:comma:line_position.

If any of these values have been set in your config, they will be automatically translated to the new values at runtime, and a warning will be shown. To silence the warning, update your config file to the new values. For more details on configuring layout see Configuring Layout.

Upgrading to 1.3

This release brings several potentially breaking changes to the underlying parse tree. For users of the cli tool in a linting context you should notice no change. If however your application relies on the structure of the SQLFluff parse tree or the naming of certain elements within the yaml format, then this may not be a drop-in replacement. Specifically:

  • The addition of a new end_of_file` meta segment at the end of the parse structure.
  • The addition of a template_loop` meta segment to signify a jump backward in the source file within a loop structure (e.g. a jinja for` loop).
  • Much more specific types on some raw segments, in particular identifier and literal type segments will now appear in the parse tree with their more specific type (which used to be called name) e.g. naked_identifier, quoted_identifier, numeric_literal etc...

If using the python api, the parent type (such as identifier) will still register if you call .is_type("identifier"), as this function checks all inherited types. However the eventual type returned by .get_type()` will now be (in most cases) what used to be accessible at .name. The name attribute will be deprecated in a future release.

Upgrading to 1.2

This release introduces the capability to automatically skip large files, and sets default limits on the maximum file size before a file is skipped. Users should see a performance gain, but may experience warnings associated with these skipped files.

Upgrades pre 1.0

  • 0.13.x new rule for quoted literals, option to remove hanging indents in rule L003, and introduction of ignore_words_regex.
  • 0.12.x dialect is now mandatory, the spark3 dialect was renamed to sparksql and datatype capitalisation was extracted from L010 to it's own rule L063.
  • 0.11.x rule L030 changed to use extended_capitalisation_policy.
  • 0.10.x removed support for older dbt versions < 0.20 and stopped fix attempting to fix unparsable SQL.
  • 0.9.x refinement of the Simple API, dbt 1.0.0 compatibility, and the official SQLFluff Docker image.
  • 0.8.x an improvement to the performance of the parser, a rebuild of the Jinja Templater, and a progress bar for the CLI.
  • 0.7.x extracted the dbt templater to a separate plugin and removed the exasol_fs dialect (now merged in with the main exasol).
  • 0.6.x introduced parallel processing, which necessitated a big re-write of several innards.
  • 0.5.x introduced some breaking changes to the API.
  • 0.4.x dropped python 3.5, added the dbt templater, source mapping and also introduced the python API.
  • 0.3.x drops support for python 2.7 and 3.4, and also reworks the handling of indentation linting in a potentially not backward compatible way.
  • 0.2.x added templating support and a big restructure of rules and changed how users might interact with SQLFluff on templated code.
  • 0.1.x involved a major re-write of the parser, completely changing the behaviour of the tool with respect to complex parsing.

Internals

It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars. Some more detail is also given on our Wiki including a Contributing Dialect Changes guide.

Architecture

At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same.

Stage 1, the templater

This stage only applies to templated SQL, most commonly Jinja and dbt. Vanilla SQL is sent straight to stage 2, the lexer.

In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback.

SQLFluff supports two templating engines: Jinja and dbt.

Under the hood dbt also uses Jinja, but in SQLFluff uses a separate mechanism which interfaces directly with the dbt python package.

For more details on how to configure the templater see Templating Configuration.

Stage 2, the lexer

The lexer takes SQL and separates it into segments of whitespace and code. Where we can impart some high level meaning to segments, we do, but the result of this operation is still a flat sequence of typed segments (all subclasses of RawSegment).

Stage 3, the parser

The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting.

1.
The lexed segments are parsed using the specified dialect's grammars. In SQLFluff, grammars describe the shape of SQL statements (or their components). The parser attempts to apply each potential grammar to the lexed segments until all the segments have been matched.
2.
In SQLFluff, segments form a tree-like structure. The top-level segment is a FileSegment, which contains zero or more StatementSegments, and so on. Before the segments have been parsed and named according to their type, they are 'raw', meaning they have no classification other than their literal value.
3.
A segment's .match() method uses the match_grammar, on which .match() is called. SQLFluff parses in a single pass through the file, so segments will recursively match the file based on their respective grammars. In the example of a FileSegment, it first divides up the query into statements, and then the .match() method of those segments works out the structure within them.
is called on a segment, this is the grammar which is used to decide whether there is a match.

pre-defined way. For example the OneOf grammar will match if any one of its child elements match.


1.
During the recursion, the parser eventually reaches segments which have no children (raw segments containing a single token), and so the recursion naturally finishes.

4.
If no match is found for a segment, the contents will be wrapped in an UnparsableSegment which is picked up as a parsing error later. This is usually facilitated by the ParseMode on some grammars which can be set to GREEDY, allowing the grammar to capture additional segments as unparsable. As an example, bracketed sections are often configured to capture anything unexpected as unparsable rather than simply failing to match if there is more than expected (which would be the default, STRICT, behaviour).
5.
The result of the .match() method is a MatchResult which contains the instructions on how to turn the flat sequence of raw segments into a nested tree of segments. Calling .apply() on this result at the end of the matching process is what finally creates the nested structure.

When working on the parser there are a couple of design principles to keep in mind.

  • Grammars are contained in dialects, the root dialect being the ansi dialect. The ansi dialect is used to host logic common to all dialects, and so does not necessarily adhere to the formal ansi specification. Other SQL dialects inherit from the ansi dialect, replacing or patching any segments they need to. One reason for the Ref grammar is that it allows name resolution of grammar elements at runtime and so a patched grammar with some elements overridden can still rely on lower-level elements which haven't been redeclared within the dialect
  • All grammars and segments attempt to match as much as they can and will return partial matches where possible. It is up to the calling grammar or segment to decide whether a partial or complete match is required based on the context it is matching in.

Stage 4, the linter

Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation.

Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file.

Reflow Internals

Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module.

This module aims to achieve several things: * Less code duplication by implementing reflow logic in only one place.

Provide a streamlined interface for rules to easily utilise reflow logic.
Given this requirement, it's important that reflow utilities work within the existing framework for applying fixes to potentially templated code. We achieve this by returning LintFix objects which can then be returned by each rule wanting to use this logic.

Provide a consistent way of configuring layout requirements. For more details on configuration see Configuring Layout.

To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout.

Developing Rules

Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away.

Traversal Options

recurse_into

Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include:

  • The rule only looks at a small portion of the file (e.g. the beginning or end).
  • The rule needs to traverse the parse tree in a non-standard way.

These rules can override BaseRule's recurse_into field, setting it to False. For these rules False, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules LT13 and LT12 , which only look at the beginning or end of the file, respectively.

_works_on_unparsable

By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants.

These are other fields on BaseRule. Rules can override them.

needs_raw_stack

needs_raw_stack defaults to False. Some rules use RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason, it is disabled by default.

lint_phase

There are two phases of rule running.

1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules run the runaway_limit number of times (default 10).

2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues).

The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times.

NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint.

Base Rules

base_rules Module

Functional API

These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules.

The modules listed below are submodules of sqlfluff.utils.functional.

segments Module

segment_predicates Module

raw_file_slices Module

raw_file_slice_predicates Module

Developing Plugins

SQLFluff is extensible through "plugins". We use the pluggy library to make linting Rules pluggable, which enable users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library.

NOTE:

We recommend that the module in a plugin which defines all of the hook implementations (anything using the @hookimpl decorator) must be able to fully import before any rule implementations are imported. More specifically, SQLFluff must be able to both import and run any implementations of get_configs_info() before any plugin rules (i.e. any derivatives of BaseRule) are imported. Because of this, we recommend that rules are defined in a separate module to the root of the plugin and then only imported within the get_rules() method.

Importing in the main body of the module was previously our recommendation and so may be the case for versions of some plugins. If one of your plugins does use imports in this way, a warning will be presented, recommending that you update your plugin.


# The root module will need to import `hookimpl`, but
# should not yet import the rule definitions for the plugin.
from sqlfluff.core.plugin import hookimpl
@hookimpl
def get_rules():
# Rules should be imported within the `get_rules` method instead
from my_plugin.rules import MyRule
return [MyRule]




Creating a plugin

We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins.

Few things to note about plugins:

Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters.

We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin.

A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash.

A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example.

An important thing to note when running custom implemented rules: Run pip install -e ., inside the plugin folder so custom rules in linting are included.

A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys.

We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing.

Giving feedback

Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄.

SQLFluff in the Wild

Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it?

Just add a section below by raising a PR on GitHub by editing this file ✏️.

  • SQLFluff in production dbt projects at tails.com. We use the SQLFluff cli as part of our CI pipeline in codeship to enforce certain styles in our SQL codebase (with over 650 models) and keep code quality high. Contact @alanmcruickshank.
  • Netlify's data team uses SQLFluff with dbt to keep code quality in more than 350 models (and growing). Previously, we had our SQL Guidelines defined in a site hosted with Netlify, and now we're enforcing these rules in our CI workflow thanks to SQLFluff.
  • Drizly's analytics team uses SQLFluff with dbt for over 700 models as part of our CI checks in GitHub. Before SQLFluff, we had SQL best practices outlined in a google doc and had to manually enforce through PR comments. We're now able to enforce much of our style guide automatically through SQLFluff.
  • Petal's data-eng team runs SQLFluff on our 100+ model dbt project. As a pre-commit hook and as a CI check, SQLFluff helps keep our SQL readable and consistent.
  • Surfline's Analytics Engineering team implemented SQLFluff as part of our continuous integration (CI) suite across our entire dbt project (700+ models). We implement the CI suite using GitHub Actions and Workflows. The benefits of using SQLFluff at Surfline are:
  • The SQL in our dbt models is consistent and easily readable.
  • Our style guide is maintained as code, not a README that is rarely updated.
  • Reduced burden on Analytics Engineers to remember every single style rule.
  • New Analytics Engineers can quickly see and learn what "good SQL" looks like at Surfline and start writing it from day 1.

  • The HTTP Archive uses SQLFluff to automatically check for quality and consistency of code submitted by the many contributors to this project. In particular our annual Web Almanac attracts hundreds of volunteers to help analyse our BigQuery dataset and being able automatically lint Pull Requests through GitHub Actions is a fantastic way to help us maintain our growing repository of over a thousand queries.
  • Brooklyn Data Co has a dbt_artifacts dbt package from which runs SQLFluff in CI to lint pull requests automatically. It uses the GitHub Actions workflow contributed by Greg Clunies, with annotations on pull requests to make it easy for contributors to see where their SQL has failed any rules. See an example pull request with SQLFluff annotations.
  • Markerr has tightly integrated SQLFluff into our CI/CD process for data model changes and process improvements. Since adopting SQLFluff across the organization, the clarity of our SQL code has risen dramatically, freeing up review time to focus on deeper data and process-specific questions.
  • Symend has a microservices platform supporting our SaaS product. We use SQLFLuff in the CI/CD process of several of our data-oriented microservices. Among other things, it validates our database migration scripts, deployed using schemachange and we have near-term plans to implement it for our dbt projects.
  • At CarePay we use SQLFLuff to lint and fix all our dbt models as well as several other SQL heavy projects. Locally we use SQLFluff with pre-commit and have also integrated it into our CI/CD pipelines.
  • Core Analytics Team from Typeform and videoask uses SQLFluff in the production dbt project for building our datawarehouse layer for both products:
  • We use it locally in our day to day work, helping us to write cleaner code.
  • We added SQLFluff to our CI processes, so during a PR we can check that any new or modified sql file has a consistent and easy-to-read format.


SQLFluff Slack

We have a fast-growing community on Slack, come and join us!

SQLFluff on Twitter

Follow us On Twitter @SQLFluff for announcements and other related posts.

  • Index
  • Module Index
  • Search Page

AUTHOR

Alan Cruickshank

COPYRIGHT

2024, Alan Cruickshank

February 22, 2024