Table

A Table entity organizes data in rows and columns and is defined in a Database Schema.

  • profileSampleType (string): Type of Profile Sample (percentage or rows). Must be one of: ["PERCENTAGE", "ROWS"]. Default: "PERCENTAGE".
  • samplingMethodType (string): Type of Sampling Method (BERNOULLI or SYSTEM). Must be one of: ["BERNOULLI", "SYSTEM"].
  • tableType (string): This schema defines the type used for describing different types of tables. Must be one of: ["Regular", "External", "Dynamic", "View", "SecureView", "MaterializedView", "Iceberg", "Local", "Partitioned", "Foreign", "Transient"].
  • dataType (string): This enum defines the type of data stored in a column. Must be one of: ["NUMBER", "TINYINT", "SMALLINT", "INT", "BIGINT", "BYTEINT", "BYTES", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC", "TIMESTAMP", "TIMESTAMPZ", "TIME", "DATE", "DATETIME", "INTERVAL", "STRING", "MEDIUMTEXT", "TEXT", "CHAR", "LONG", "VARCHAR", "BOOLEAN", "BINARY", "VARBINARY", "ARRAY", "BLOB", "LONGBLOB", "MEDIUMBLOB", "MAP", "STRUCT", "UNION", "SET", "GEOGRAPHY", "ENUM", "JSON", "UUID", "VARIANT", "GEOMETRY", "BYTEA", "AGGREGATEFUNCTION", "ERROR", "FIXED", "RECORD", "NULL", "SUPER", "HLLSKETCH", "PG_LSN", "PG_SNAPSHOT", "TSQUERY", "TXID_SNAPSHOT", "XML", "MACADDR", "TSVECTOR", "UNKNOWN", "CIDR", "INET", "CLOB", "ROWID", "LOWCARDINALITY", "YEAR", "POINT", "POLYGON", "TUPLE", "SPATIAL", "TABLE", "NTEXT", "IMAGE", "IPV4", "IPV6", "DATETIMERANGE", "HLL", "LARGEINT", "QUANTILE_STATE", "AGG_STATE", "BITMAP", "UINT", "BIT", "MONEY"].
  • constraint (string): This enum defines the type for column constraint. Must be one of: ["NULL", "NOT_NULL", "UNIQUE", "PRIMARY_KEY"]. Cannot contain additional properties. Default: null.
  • tableConstraint (object): This schema defines the type for a table constraint. Cannot contain additional properties.
    • constraintType (string): Must be one of: ["UNIQUE", "PRIMARY_KEY", "FOREIGN_KEY", "SORT_KEY", "DIST_KEY"].
    • columns (array): List of column names corresponding to the constraint.
      • Items (string)
    • referredColumns (array): List of referred columns for the constraint. Default: null.
    • relationshipType (string): Must be one of: ["ONE_TO_ONE", "ONE_TO_MANY", "MANY_TO_ONE", "MANY_TO_MANY"].
  • columnName (string): Local name (not fully qualified name) of the column. The columnName is "-" when the column is not named in a struct dataType. For example, BigQuery supports structs with unnamed fields.
  • partitionIntervalTypes (string): Type of partition interval. Must be one of: ["TIME-UNIT", "INTEGER-RANGE", "INGESTION-TIME", "COLUMN-VALUE", "INJECTED", "ENUM", "OTHER"].
  • tablePartition (object): This schema defines the partition column of a table and the format in which the partition is created. Cannot contain additional properties.
  • partitionColumnDetails (object): This schema defines the partition column of a table and the format in which the partition is created. Cannot contain additional properties.
    • columnName (string): Name of the column corresponding to the partition.
    • intervalType: Refer to #/definitions/partitionIntervalTypes.
    • interval (string): Partition interval, for example hourly, daily, monthly.
  • column (object): This schema defines the type for a column in a table. Cannot contain additional properties.
    • name: Refer to #/definitions/columnName.
    • displayName (string): Display Name that identifies this column name.
    • dataType: Data type of the column (int, date etc.). Refer to #/definitions/dataType.
    • arrayDataType: Data type of the array elements when dataType is array. For example, array<int> has dataType as array and arrayDataType as int. Refer to #/definitions/dataType.
    • dataLength (integer): Length of char, varchar, binary, varbinary dataTypes, else null. For example, varchar(20) has dataType as varchar and dataLength as 20.
    • precision (integer): The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits on both sides of the decimal point. Precision is applicable to Integer types, such as INT, SMALLINT, BIGINT, etc. It also applies to other Numeric types, such as NUMBER, DECIMAL, DOUBLE, FLOAT, etc.
    • scale (integer): The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. For Integer types, the scale is 0. It mainly applies to non-Integer Numeric types, such as NUMBER, DECIMAL, DOUBLE, FLOAT, etc.
    • dataTypeDisplay (string): Display name used for dataType. This is useful for complex types, such as array<int>, map<int,string>, struct<>, and union types.
    • description: Description of the column. Refer to ../../type/basic.json#/definitions/markdown.
    • fullyQualifiedName: Refer to ../../type/basic.json#/definitions/fullyQualifiedEntityName.
    • tags (array): Tags associated with the column. Default: [].
    • constraint: Column level constraint. Refer to #/definitions/constraint.
    • ordinalPosition (integer): Ordinal position of the column.
    • jsonSchema (string): JSON schema, set only if the dataType is JSON; otherwise null.
    • children (array): Child columns if dataType or arrayDataType is map, struct, or union; otherwise null. Default: null.
    • profile: Latest Data profile for a Column. Refer to #/definitions/columnProfile. Default: null.
    • customMetrics (array): List of Custom Metrics registered for a table. Default: null.
  • joinedWith (object): Fully qualified names of the fields/entities that this field/entity is joined with. Cannot contain additional properties.
  • columnJoins (object): This schema defines the type to capture how frequently a column is joined with columns in the other tables. Cannot contain additional properties.
  • tableJoins (object): This schema defines the type to capture information about how this table is joined with other tables and columns. Cannot contain additional properties.
  • tableData (object): This schema defines the type to capture rows of sample data for a table. Cannot contain additional properties.
    • columns (array): List of local column names (not fully qualified column names) of the table.
    • rows (array): Data for multiple rows of the table.
      • Items (array): Data for a single row of the table, in the same order as the columns field.
  • customMetricProfile (object): Profiling results of a Custom Metric. Cannot contain additional properties.
    • name (string): Custom metric name.
    • value (number): Profiling results for the metric.
  • columnProfile (object): This schema defines the type to capture the table's column profile. Cannot contain additional properties.
    • name (string, required): Column Name.
    • timestamp: Timestamp on which profile is taken. Refer to ../../type/basic.json#/definitions/timestamp.
    • valuesCount (number): Total count of the values in this column.
    • valuesPercentage (number): Percentage of values in this column with respect to row count.
    • validCount (number): Total count of valid values in this column.
    • duplicateCount (number): Number of rows that contain duplicates in a column.
    • nullCount (number): Number of null values in a column.
    • nullProportion (number): Proportion of null values in a column.
    • missingPercentage (number): Missing percentage, calculated as the percentage of validCount/valuesCount.
    • missingCount (number): Missing count, calculated as valuesCount - validCount.
    • uniqueCount (number): Number of unique values in a column.
    • uniqueProportion (number): Proportion of unique values in a column.
    • distinctCount (number): Number of distinct values in a column.
    • distinctProportion (number): Proportion of distinct values in a column.
    • min: Minimum value in a column.
    • max: Maximum value in a column.
    • minLength (number): Minimum string length in a column.
    • maxLength (number): Maximum string length in a column.
    • mean (number): Average value in a column.
    • sum (number): Sum of the values in a column.
    • stddev (number): Standard deviation of a column.
    • variance (number): Variance of a column.
    • median (number): Median of a column.
    • firstQuartile (number): First quartile of a column.
    • thirdQuartile (number): Third quartile of a column.
    • interQuartileRange (number): Interquartile range of a column.
    • nonParametricSkew (number): Non-parametric skew of a column.
    • histogram: Histogram of a column. Cannot contain additional properties.
      • boundaries (array): Boundaries of Histogram.
      • frequencies (array): Frequencies of Histogram.
    • customMetrics (array): Custom Metrics profile list bound to a column. Default: null.
  • dmlOperationType (string): This schema defines the type of DML operation. Must be one of: ["UPDATE", "INSERT", "DELETE"].
  • systemProfile (object): This schema defines the System Profile object holding profile data from system tables.
  • columnProfilerConfig (object): This schema defines the type for the table profile config of included columns.
    • columnName (string): Column Name of the table to be included.
    • metrics (array): Include only following metrics. Default: null.
      • Items (string)
  • partitionProfilerConfig (object): This schema defines the partition configuration used by profiler.
    • enablePartitioning (boolean): Whether to use partitioning. Default: false.
    • partitionColumnName (string): Name of the column to use for the partition.
    • partitionIntervalType: Refer to #/definitions/partitionIntervalTypes.
    • partitionInterval (integer): The interval to use for the partitioning.
    • partitionIntervalUnit (string): Unit used for the partition interval. Must be one of: ["YEAR", "MONTH", "DAY", "HOUR"].
    • partitionValues (array): List of values to use for the partition.
    • partitionIntegerRangeStart (integer): Start of the integer range for partitioning. Default: null.
    • partitionIntegerRangeEnd (integer): End of the integer range for partitioning. Default: null.
  • tableProfilerConfig (object): This schema defines the type for Table profile config.
    • profileSampleType: Refer to #/definitions/profileSampleType.
    • profileSample (number): Percentage of data or number of rows used to compute the profiler metrics and run data quality tests. Default: null.
    • samplingMethodType: Refer to #/definitions/samplingMethodType.
    • sampleDataCount (integer): Number of sample rows to ingest when 'Generate Sample Data' is enabled. Default: 50.
    • profileQuery (string): Users' raw SQL query to fetch sample data and profile the table. Default: null.
    • excludeColumns (array): Column names to exclude from profiling. Default: null.
      • Items (string)
    • includeColumns (array): Only run profiler on included columns with specific metrics. Default: null.
    • partitioning: Partitioning configuration. Refer to #/definitions/partitionProfilerConfig.
    • computeTableMetrics (boolean): Option to turn on/off table metric computation. If enabled, profiler will compute table level metrics. Default: true.
    • computeColumnMetrics (boolean): Option to turn on/off column metric computation. If enabled, profiler will compute column level metrics. Default: true.
  • tableProfile (object): This schema defines the type to capture the table's data profile. Cannot contain additional properties.
    • timestamp: Timestamp on which profile is taken. Refer to ../../type/basic.json#/definitions/timestamp.
    • profileSample (number): Percentage of data or number of rows on which to run the profiler and tests. Default: null.
    • profileSampleType: Refer to #/definitions/profileSampleType.
    • samplingMethodType: Refer to #/definitions/samplingMethodType.
    • columnCount (number): Number of columns in the table.
    • rowCount (number): Number of rows in the table. This is always computed on the whole table.
    • sizeInByte (number): Table size in bytes.
    • createDateTime (string, format: date-time): Table creation time.
    • customMetrics (array): Custom Metrics profile list bound to a table. Default: null.
  • modelType: Must be one of: ["DBT", "DDL"].
  • dataModel (object): This captures information about how the table is modeled. Currently only DBT and DDL models are supported. Cannot contain additional properties.
  • fileFormat (string): File format in case of file/datalake tables. Must be one of: ["csv", "tsv", "avro", "parquet", "json", "json.gz", "json.zip", "jsonl", "jsonl.gz", "jsonl.zip"].
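
The tableData definition stores column names and row values as two parallel arrays, with each row listed in the same order as the columns field. The following is a minimal sketch of pairing them by position. The sample payload and the helper name rows_as_records are hypothetical and are not part of the schema.

```python
# Hypothetical sample payload shaped like the tableData definition.
table_data = {
    "columns": ["id", "name"],
    "rows": [[1, "alice"], [2, "bob"]],
}


def rows_as_records(data: dict) -> list[dict]:
    """Pair each row's values with the column names by position."""
    columns = data["columns"]
    records = []
    for row in data["rows"]:
        # Each row is expected to follow the order of the columns list.
        if len(row) != len(columns):
            raise ValueError("row length does not match the columns list")
        records.append(dict(zip(columns, row)))
    return records


records = rows_as_records(table_data)
print(records)
```

Rejecting rows whose length differs from the columns list is a choice made for this sketch; the schema text itself only states the ordering.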
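
The precision and scale fields of the column definition can be illustrated with a small worked example. The sketch below derives both numbers from a Python Decimal, following the definitions given for those fields (precision counts the digits on both sides of the decimal point, scale counts the digits to the right of it). The function name precision_and_scale and the sample column are assumptions made for illustration.

```python
from decimal import Decimal


def precision_and_scale(value: Decimal) -> tuple[int, int]:
    """Return (precision, scale) for a finite decimal value."""
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity have no precision or scale")
    if exponent >= 0:
        # Whole number: no fractional digits, so the scale is 0.
        return len(digits) + exponent, 0
    scale = -exponent
    # Values such as 0.001 have fewer stored digits than fractional places.
    return max(len(digits), scale), scale


precision, scale = precision_and_scale(Decimal("123.45"))

# Hypothetical column entry using field names from the column definition.
column = {
    "name": "price",
    "dataType": "DECIMAL",
    "precision": precision,
    "scale": scale,
}
print(column)
```

For 123.45 this yields a precision of 5 and a scale of 2, and for an integer such as 42 the scale is 0, which matches the field descriptions.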
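
Several columnProfile fields are standard descriptive statistics. The sketch below shows one way they could be computed for a list of values. It is not the profiler's implementation. The following choices are assumptions, since the schema does not state them: "unique" means a value occurring exactly once, "distinct" means the number of different values, proportions other than nullProportion are taken over non-null values, the standard deviation is the population one, and the non-parametric skew is (mean - median) / stddev.

```python
import statistics
from collections import Counter


def profile_column(values: list) -> dict:
    """Compute a subset of columnProfile-style metrics (illustrative only)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # quantiles() with n=4 returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(non_null, n=4)
    mean = statistics.fmean(non_null)
    median = statistics.median(non_null)
    stddev = statistics.pstdev(non_null)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "nullCount": len(values) - len(non_null),
        "nullProportion": (len(values) - len(non_null)) / len(values),
        "uniqueCount": unique,
        "uniqueProportion": unique / len(non_null),
        "distinctCount": len(counts),
        "distinctProportion": len(counts) / len(non_null),
        "sum": sum(non_null),
        "mean": mean,
        "median": median,
        "stddev": stddev,
        "firstQuartile": q1,
        "thirdQuartile": q3,
        "interQuartileRange": q3 - q1,
        # Assumed definition: (mean - median) / stddev.
        "nonParametricSkew": (mean - median) / stddev if stddev else 0.0,
    }


profile = profile_column([1, 2, 2, 3, None, 10])
print(profile)
```

The sample input has one large value, so the mean sits above the median and the skew comes out positive.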
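
As an illustration of the tableProfilerConfig and partitionProfilerConfig definitions, the sketch below applies the documented defaults and checks the documented enum values. The helper check_profiler_config is hypothetical, and the rule that a PERCENTAGE sample must lie between 0 and 100 is an assumption that the schema text does not state.

```python
# Enum values and defaults copied from the definitions above.
PROFILE_SAMPLE_TYPES = {"PERCENTAGE", "ROWS"}
SAMPLING_METHOD_TYPES = {"BERNOULLI", "SYSTEM"}
PARTITION_INTERVAL_UNITS = {"YEAR", "MONTH", "DAY", "HOUR"}

DEFAULTS = {
    "profileSampleType": "PERCENTAGE",
    "sampleDataCount": 50,
    "computeTableMetrics": True,
    "computeColumnMetrics": True,
}


def check_profiler_config(config: dict) -> list[str]:
    """Return a list of problems found in a tableProfilerConfig-style dict."""
    merged = {**DEFAULTS, **config}
    errors = []
    sample_type = merged["profileSampleType"]
    if sample_type not in PROFILE_SAMPLE_TYPES:
        errors.append(f"unknown profileSampleType: {sample_type}")
    method = merged.get("samplingMethodType")
    if method is not None and method not in SAMPLING_METHOD_TYPES:
        errors.append(f"unknown samplingMethodType: {method}")
    sample = merged.get("profileSample")
    # Assumption: a percentage sample is expected to be within 0..100.
    if sample is not None and sample_type == "PERCENTAGE" and not 0 <= sample <= 100:
        errors.append(f"profileSample out of range for PERCENTAGE: {sample}")
    partitioning = merged.get("partitioning") or {}
    unit = partitioning.get("partitionIntervalUnit")
    if unit is not None and unit not in PARTITION_INTERVAL_UNITS:
        errors.append(f"unknown partitionIntervalUnit: {unit}")
    return errors


good = {
    "profileSample": 25,
    "samplingMethodType": "BERNOULLI",
    "excludeColumns": ["raw_payload"],
    "partitioning": {
        "enablePartitioning": True,
        "partitionColumnName": "created_at",
        "partitionIntervalType": "TIME-UNIT",
        "partitionInterval": 1,
        "partitionIntervalUnit": "DAY",
    },
}
bad = {"profileSampleType": "FRACTION", "partitioning": {"partitionIntervalUnit": "WEEK"}}

print(check_profiler_config(good))
print(check_profiler_config(bad))
```

The column names created_at and raw_payload are placeholders. A full JSON Schema validator would also enforce types and the "Cannot contain additional properties" rule, which this sketch leaves out.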

Documentation file automatically generated at 2025-01-15 09:05:25.266839+00:00.