Alex Rickabaugh 4213e8d5f0 fix(compiler): switch to 'referencedFiles' for shim generation (#36211)
Shim generation was built on a lie.

Shims are files added to the program which aren't original files authored by
the user, but files authored effectively by the compiler. These fall into
two categories: files which will be generated (like the .ngfactory shims we
generate for View Engine compatibility) as well as files used internally in
compilation (like the __ng_typecheck__.ts file).

Previously, shim generation was driven by the `rootFiles` passed to the
compiler as input. These are effectively the `files` listed in the
`tsconfig.json`. Each shim generator (e.g. the `FactoryGenerator`) would
examine the `rootFiles` and produce a list of shim file names which it would
be responsible for generating. These names would then be added to the
`rootFiles` when the program was created.

The fatal flaw here is that `rootFiles` does not always account for all of
the files in the program. In fact, it's quite rare that it does. Users don't
typically specify every file directly in `files`. Instead, they rely on
TypeScript, during program creation, starting with a few root files and
transitively discovering all of the files in the program.

This happens, however, during `ts.createProgram`, which is too late to add
new files to the `rootFiles` list.

As a result, shim generation only included shims for files actually listed
in the `tsconfig.json` file, and not for the transitive set of files in the
user's program as it should have.

This commit completely rewrites shim generation to use a different technique
for adding files to the program, inspired by View Engine's shim generator.
In this new technique, as the program is being created and `ts.SourceFile`s
are being requested from the `NgCompilerHost`, shims for those files are
generated and a reference to them is patched onto the original file's
`ts.SourceFile.referencedFiles`. This causes TS to think that the original
file references the shim, and causes the shim to be included in the program.
The original `referencedFiles` array is saved and restored after program
creation, hiding this little hack from the rest of the system.

The new shim generation engine differentiates between two kinds of shims:
top-level shims (such as the flat module entrypoint file and
__ng_typecheck__.ts) and per-file shims such as ngfactory or ngsummary
files. The former are included via `rootFiles` as before, the latter are
included via the `referencedFiles` of their corresponding original files.

As a result of this change, shims are now correctly generated for all files
in the program, not just the ones named in `tsconfig.json`.

A few mitigating factors kept this bug from being noticed until now:

* in g3, `files` does include the transitive closure of files in the program
* in CLI apps, shims are not really used

This change also makes use of a novel technique for associating information
with source files: the use of an `NgExtension` `Symbol` to patch the
information directly onto the AST object. This is used in several
circumstances:

* For shims, metadata about a `ts.SourceFile`'s status as a shim and its
  origins are held in the extension data.
* For original files, the original `referencedFiles` are stashed in the
  extension data for later restoration.

The main benefit of this technique is a lot less bookkeeping around `Map`s
of `ts.SourceFile`s to various kinds of data, which need to be tracked and
invalidated as part of incremental builds.

This technique is based on designs used internally in the TypeScript
compiler and is serving as a prototype of this design in ngtsc. If it works
well, it could have benefits across the rest of the compiler.

PR Close #36211
2020-05-05 18:40:42 -07:00

Shims

The shims package deals with the specification and generation of "shim files". These are files which are not part of the user's original program, but are added by the compiler at the user's request or in support of certain features. For example, users can request that the compiler produce .ngfactory files alongside user files to support migration from View Engine (which used .ngfactory files) to Ivy, which does not.

API

Shim generation is exposed through two interfaces: TopLevelShimGenerator and PerFileShimGenerator. Each implementation of one of these interfaces produces one or more shims of a particular type.

A top-level shim is a shim which is a "singleton" with respect to the program - it's one file that's generated and added in addition to all the user files.

A per-file shim is a shim generated from the contents of a particular file (like how .ngfactory shims are generated for each user input file, if requested).

Shims from either kind of generator can be emittable, in which case their ts.SourceFiles will be transpiled to JS and emitted alongside the user's code, or non-emittable, which means the user is unlikely to be aware of their existence.
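As a rough illustration of the two generator kinds, the following sketch models them with simplified, hypothetical TypeScript interfaces. The names `ShimFile`, `shimFileNameFor`, `extensionPrefix`, `shouldEmit`, `makeTopLevelShim` and `generateShimForFile` are assumptions chosen for the example; the real interfaces in this package work with ts.SourceFile objects and may differ in shape.

```typescript
// Hypothetical, simplified model of the two shim generator kinds.
// These are NOT the real ngtsc interfaces; plain strings stand in for ts.SourceFile.

interface ShimFile {
  fileName: string;
  text: string;
}

// Produces a single shim for the whole program.
interface TopLevelShimGenerator {
  readonly shouldEmit: boolean; // transpile and emit as JS alongside user code?
  makeTopLevelShim(): ShimFile;
}

// Produces one shim per original file, named `<base>.<extensionPrefix>.ts`.
interface PerFileShimGenerator {
  readonly extensionPrefix: string;
  readonly shouldEmit: boolean;
  generateShimForFile(original: ShimFile, shimFileName: string): ShimFile;
}

// `/app/foo.ts` or `/app/foo.tsx` -> `/app/foo.<prefix>.ts`
function shimFileNameFor(originalFileName: string, prefix: string): string {
  return originalFileName.replace(/\.tsx?$/, `.${prefix}.ts`);
}

// Illustrative emittable per-file generator with placeholder contents.
const exampleFactoryGenerator: PerFileShimGenerator = {
  extensionPrefix: 'ngfactory',
  shouldEmit: true,
  generateShimForFile(original, shimFileName) {
    return {fileName: shimFileName, text: `// generated from ${original.fileName}\nexport {};\n`};
  },
};

// Illustrative non-emittable top-level generator (the file name is made up).
const exampleTopLevelGenerator: TopLevelShimGenerator = {
  shouldEmit: false,
  makeTopLevelShim() {
    return {fileName: '/app/__example_top_level__.ts', text: 'export {};\n'};
  },
};
```

In this model a top-level shim's file name would simply be added to the program's root files, while per-file shim names are derived from each original file, matching the split between `rootFiles` and `referencedFiles` described in the commit message.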

This API is used both by the shim generators in this package and by other compiler subsystems for their own types of shims.

Implementation

The shim package exposes two specific pieces of functionality related to the integration of shims into the creation of a ts.Program:

  • A ShimReferenceTagger which "tags" ts.SourceFiles prior to program creation, and creates links from each original file to all of the per-file shims which need to be created for that file.
  • A ShimAdapter which is used by an implementation of ts.CompilerHost to include shims in any program created via the host.

ShimAdapter

The shim adapter is responsible for recognizing when a path being loaded corresponds to a shim, and producing a ts.SourceFile for the shim if so.

Recognizing a shim filename involves two steps. First, the path itself must match the pattern for a particular PerFileShimGenerator's shims (for example, NgFactory shims end in .ngfactory.ts). From this filename, the "source" filename can be inferred (in fact, several candidate source filenames, since the source file might be .ts or .tsx). Second, one of those source files must actually exist: even if a path matches the pattern, it's only a valid shim if its source file is present.

Once a filename has been recognized, the ShimAdapter caches the generated shim source file and can quickly produce it on request.
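The two-step recognition can be sketched as follows. `MiniShimAdapter` is a hypothetical class for illustration only: it handles a single file-name pattern, takes a caller-supplied `fileExists` check and `generate` callback, and caches generated text by path, a simplification of what the real ShimAdapter does with ts.SourceFiles.

```typescript
// Hypothetical sketch of shim-path recognition with caching.
// Not the real ShimAdapter: it handles a single `.<prefix>.ts` pattern and
// works with plain strings instead of ts.SourceFile objects.

class MiniShimAdapter {
  private readonly cache = new Map<string, string>(); // shim path -> shim text
  private readonly suffix: string;
  private readonly fileExists: (path: string) => boolean;
  private readonly generate: (sourcePath: string) => string;

  constructor(
    prefix: string, // e.g. 'ngfactory'
    fileExists: (path: string) => boolean,
    generate: (sourcePath: string) => string,
  ) {
    this.suffix = `.${prefix}.ts`;
    this.fileExists = fileExists;
    this.generate = generate;
  }

  // Returns the shim text if `path` names a valid shim, otherwise null.
  maybeGenerate(path: string): string | null {
    const cached = this.cache.get(path);
    if (cached !== undefined) {
      return cached;
    }
    // Step 1: the path must match this generator's pattern.
    if (!path.endsWith(this.suffix)) {
      return null;
    }
    const base = path.slice(0, path.length - this.suffix.length);
    // Step 2: one of the inferred source files must actually exist.
    for (const candidate of [`${base}.ts`, `${base}.tsx`]) {
      if (this.fileExists(candidate)) {
        const text = this.generate(candidate);
        this.cache.set(path, text);
        return text;
      }
    }
    return null;
  }
}
```

Because recognition needs only the requested path and an existence check, this sketch behaves the same whether TypeScript reaches the shim through a reference on its source file or through a user's import, and regardless of which of the two files is loaded first.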

Shim loading in practice

As TS starts from the root files and walks imports and references, it discovers new files which are part of the program. It will discover shim files in two different ways:

  • As references on their source files (those added by ShimReferenceTagger).
  • As imports written by users.

This means there is no guarantee that a source file is loaded before its shim.

ShimReferenceTagger

During program creation, TypeScript enumerates the .ts files on disk (the original files) and includes them in the program. However, each original file may have several associated shim files which are not referenced by anything and do not exist on disk, but which still need to be included.

The mechanism used to do this is "reference tagging", which is performed by the ShimReferenceTagger.

ts.SourceFiles have a referencedFiles property, which contains paths extracted from any /// <reference path="..." /> comments within the file. If a ts.SourceFile with references is included in a program, so are its referenced files.

This mechanism is (ab)used by the ShimReferenceTagger to create references from each original file to its shims, causing them to be loaded as well.

Once the program has been created, the referencedFiles properties can be restored to their original values via the cleanup() operation. This is necessary as ts.SourceFiles may live on in various caches for much longer than the duration of a single compilation.
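A minimal sketch of reference tagging and cleanup follows. It uses a hypothetical `MiniSourceFile` stand-in rather than the real ts.SourceFile (whose referencedFiles entries are ts.FileReference objects carrying `fileName`, `pos` and `end`), and `MiniReferenceTagger` is an illustrative name, not this package's API.

```typescript
// Hypothetical sketch of reference tagging. MiniSourceFile stands in for
// ts.SourceFile; only the fields used here are modeled.

interface FileRef {
  fileName: string;
  pos: number;
  end: number;
}

interface MiniSourceFile {
  fileName: string;
  referencedFiles: readonly FileRef[];
}

// Private expando key holding the untouched referencedFiles array.
const ORIGINAL_REFS = Symbol('originalReferencedFiles');
type Taggable = MiniSourceFile & {[ORIGINAL_REFS]?: readonly FileRef[]};

class MiniReferenceTagger {
  private readonly suffixes: string[];
  private readonly tagged = new Set<Taggable>();

  constructor(shimPrefixes: string[]) {
    this.suffixes = shimPrefixes.map(p => `.${p}.ts`);
  }

  tag(sf: Taggable): void {
    const isShim = this.suffixes.some(s => sf.fileName.endsWith(s));
    if (this.tagged.has(sf) || isShim || sf.fileName.endsWith('.d.ts')) {
      return; // already tagged, a shim itself, or a declaration file
    }
    sf[ORIGINAL_REFS] = sf.referencedFiles;
    const base = sf.fileName.replace(/\.tsx?$/, '');
    const shimRefs = this.suffixes.map(s => ({fileName: base + s, pos: 0, end: 0}));
    // Assign a new array so the stashed original is never mutated.
    sf.referencedFiles = [...sf.referencedFiles, ...shimRefs];
    this.tagged.add(sf);
  }

  // Restores every tagged file's original referencedFiles array.
  cleanup(): void {
    this.tagged.forEach(sf => {
      sf.referencedFiles = sf[ORIGINAL_REFS] ?? sf.referencedFiles;
      delete sf[ORIGINAL_REFS];
    });
    this.tagged.clear();
  }
}
```

The sketch restores the original array object itself rather than a copy, so a ts.SourceFile that outlives the compilation in some cache looks exactly as it did before tagging.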

Expando

The shim system needs to keep track of various pieces of metadata for ts.SourceFiles:

  • Whether or not they're shims, and if so which generator created them.
  • If the file is not a shim, then the original referencedFiles for that file (so they can be restored later).

Instead of Maps keyed by ts.SourceFiles, which could lead to memory leaks, this information is patched directly onto the ts.SourceFile instances using an expando symbol property, NgExtension.
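The expando idea can be illustrated with a local `Symbol`. Here `NgExtension` is declared locally, and `ShimExtensionData`, `OriginalFileExtensionData`, `markAsShim`, `shimData` and `isShim` are hypothetical names for this sketch; the data shape is an assumption rather than the package's actual extension type.

```typescript
// Hypothetical sketch of the expando technique: extra data is stored on the
// object itself under a symbol key instead of in a side table.

const NgExtension = Symbol('NgExtension');

interface ShimExtensionData {
  isShim: true;
  generatorName: string;           // which generator created the shim
  originalFileName: string | null; // null for top-level shims
}

interface OriginalFileExtensionData {
  isShim: false;
  originalReferencedFiles: readonly string[]; // stashed for later restoration
}

type ExtensionData = ShimExtensionData | OriginalFileExtensionData;
type Expando = {[NgExtension]?: ExtensionData};

function markAsShim(sf: object, generatorName: string, originalFileName: string | null): void {
  (sf as Expando)[NgExtension] = {isShim: true, generatorName, originalFileName};
}

function shimData(sf: object): ShimExtensionData | null {
  const data = (sf as Expando)[NgExtension];
  return data !== undefined && data.isShim === true ? data : null;
}

function isShim(sf: object): boolean {
  return shimData(sf) !== null;
}
```

Because the data hangs off the object itself, it becomes unreachable together with the source file, and a symbol key cannot collide with TypeScript's own string-named properties. Symbol-keyed properties are also skipped by `Object.keys` and `JSON.stringify`.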

Usage

Factory shim generation

Generated factory files create a catch-22 in ngtsc. Their contents depend on static analysis of the current program, yet they're also importable from the current program. This importability gives rise to the requirement that the contents of the generated file must be known before program creation, so that imports of it are valid. However, until the program is created, the analysis to determine the contents of the generated file cannot take place.

ngc used to get away with this because the analysis phase did not depend on program creation but on the metadata collection / global analysis process.

ngtsc is forced to take a different approach. A lightweight analysis pipeline which does not rely on the ts.TypeChecker (and thus can run before the program is created) is used to estimate the contents of a generated file, in a way that allows the program to be created. A transformer then operates on this estimated file during emit and replaces the estimated contents with accurate information.

It is important that this estimate be an overestimate, as type-checking will always be run against the estimated file, and must succeed in every case where it would have succeeded with accurate info.
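To make the overestimate concrete, here is a deliberately crude sketch. `estimateFactoryShim` is a hypothetical function: it uses a regular expression in place of the lightweight syntactic analysis, assumes View Engine style `<Class>NgFactory` export names, and types every placeholder as `any`. None of this is claimed to match the shims ngtsc actually produces.

```typescript
// Hypothetical sketch of estimating a factory shim without a type checker.
// A regex stands in for real syntactic analysis.

function estimateFactoryShim(sourceText: string): string {
  // Without type information we cannot tell which exported classes are
  // NgModules or components, so every exported class gets a placeholder.
  const exportedClass = /export\s+(?:abstract\s+)?class\s+(\w+)/g;
  const lines: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = exportedClass.exec(sourceText)) !== null) {
    // `any` lets every user import and use of the symbol type-check; an
    // emit-time transform would later replace this with accurate contents.
    lines.push(`export const ${match[1]}NgFactory: any = null;`);
  }
  // Keep the shim a module even when no candidates were found.
  lines.push('export {};');
  return lines.join('\n') + '\n';
}
```

Erring toward too many exports is the safe direction here: an extra placeholder costs nothing at type-check time, whereas a missing one would turn a valid import into an error.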

Summary shim generation

Summary shims are simpler than factory shims: they can be generated from a ts.SourceFile without needing to be cleaned up later.

Other uses of shims

A few other systems in the compiler make use of shim generation as well.

  • entry_point generates a flat module index (as View Engine did) using a shim.
  • typecheck includes template type-checking code in the program using a shim generator.