base_assets module ¶

Base asset classes.

See vectorbtpro.utils.knowledge for the toy dataset.

asset_cache dict ¶

Asset cache.

AssetCacheManager class ¶

AssetCacheManager(
    persist_cache=None,
    cache_dir=None,
    cache_mkdir_kwargs=None,
    clear_cache=None,
    max_cache_count=None,
    save_cache_kwargs=None,
    load_cache_kwargs=None,
    template_context=None,
    **kwargs
)

Class for managing knowledge asset cache.

For defaults, see knowledge.

cache_dir class property ¶

Cache directory.

cleanup_cache_dir method ¶

AssetCacheManager.cleanup_cache_dir()

Keep only the most recent assets.
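
The pruning idea can be pictured with a small standalone sketch. It is not the library's implementation: `keep_most_recent`, the use of file modification times, and the `.pkl` file names are assumptions made purely for illustration.

```python
import os
import tempfile
from pathlib import Path


def keep_most_recent(cache_dir, max_count):
    """Hypothetical helper: delete all but the `max_count` newest files."""
    files = sorted(
        Path(cache_dir).iterdir(),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    for stale in files[max_count:]:
        stale.unlink()
    return sorted(p.name for p in Path(cache_dir).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    for age, name in enumerate(["newest.pkl", "middle.pkl", "oldest.pkl"]):
        path = Path(tmp) / name
        path.write_bytes(b"")
        # Set explicit modification times so the ordering is deterministic
        os.utime(path, (1_000_000 - age, 1_000_000 - age))
    remaining = keep_most_recent(tmp, max_count=2)

print(remaining)
```

Here the oldest file is removed and the two newer ones survive.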

generate_cache_key class method ¶

AssetCacheManager.generate_cache_key(
    **kwargs
)

Generate a cache key based on the current VBT version, settings, and keyword arguments.
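
One way to picture such a key is as a hash over everything that should invalidate the cache. The sketch below is a hypothetical stand-in, not the library's implementation: `make_cache_key`, its `version` and `settings` arguments, and the choice of SHA-256 over sorted JSON are all assumptions.

```python
import hashlib
import json


def make_cache_key(version, settings, **kwargs):
    """Hypothetical helper: hash version, settings, and kwargs into a hex key."""
    payload = {"version": version, "settings": settings, "kwargs": kwargs}
    # sort_keys makes the key independent of keyword order;
    # default=repr covers values that JSON cannot serialize directly
    encoded = json.dumps(payload, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


key_a = make_cache_key("1.0.0", {"engine": "json"}, release="latest", chunked=True)
key_b = make_cache_key("1.0.0", {"engine": "json"}, chunked=True, release="latest")
key_c = make_cache_key("1.0.1", {"engine": "json"}, release="latest", chunked=True)

print(key_a == key_b, key_a == key_c)
```

Reordering the keyword arguments leaves the key unchanged, while a different version string produces a different key.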

load_asset method ¶

AssetCacheManager.load_asset(
    cache_key
)

Load the knowledge asset under a cache key.

load_cache_kwargs class property ¶

Keyword arguments passed to load.

max_cache_count class property ¶

Maximum number of assets to be cached.

Keeps only the most recent assets.

persist_cache class property ¶

Whether to persist cache on disk.

save_asset method ¶

AssetCacheManager.save_asset(
    asset,
    cache_key
)

Save a knowledge asset under a cache key.

save_cache_kwargs class property ¶

Keyword arguments passed to save.

KnowledgeAsset class ¶

KnowledgeAsset(
    data=None,
    single_item=True,
    **kwargs
)

Class for working with a knowledge asset.

This class behaves like a mutable sequence.

For defaults, see knowledge.

Superclasses

Base
Cacheable
Chainable
Comparable
Configured
Contextable
HasSettings
Pickleable
Prettified
RankContextable
Rankable
collections.abc.Collection
collections.abc.Container
collections.abc.Iterable
collections.abc.MutableSequence
collections.abc.Reversible
collections.abc.Sequence
collections.abc.Sized

Subclasses

VBTAsset

append_item method ¶

KnowledgeAsset.append_item(
    d,
    inplace=False
)

Append a new data item.

Returns a new KnowledgeAsset instance if inplace is False.

apply method ¶

KnowledgeAsset.apply(
    func,
    *args,
    execute_kwargs=None,
    wrap=None,
    single_item=None,
    return_iterator=False,
    **kwargs
)

Apply a function to each data item.

Function can be either a callable, a tuple of function and its arguments, a Task instance, a subclass of AssetFunc or its prefix or full name. Moreover, function can be a list of the above. In such a case, BasicAssetPipeline will be used. If function is a valid expression, ComplexAssetPipeline will be used.

Uses execute for execution.

If wrap is True, returns a new KnowledgeAsset instance, otherwise raw output.

Usage

>>> asset.apply(["flatten", ("query", len)])
[5, 5, 5, 5, 6]

>>> asset.apply("query(flatten(d), len)")
[5, 5, 5, 5, 6]
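
As a rough mental model of passing a list of functions, each data item flows through the functions in order. The plain-Python sketch below assumes exactly that and nothing more; `run_pipeline` is a hypothetical helper and does not reproduce BasicAssetPipeline.

```python
def run_pipeline(data, funcs):
    """Hypothetical helper: feed every item through each function in order."""
    results = []
    for d in data:
        out = d
        for func in funcs:
            out = func(out)
        results.append(out)
    return results


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123},
]

# Step 1 extracts the nested list, step 2 sums it
sums = run_pipeline(data, [lambda d: d["d2"]["l"], sum])
print(sums)
```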

collect method ¶

KnowledgeAsset.collect(
    sort_keys=None,
    **kwargs
)

Collect values of each key in each data item.

combine class method ¶

KnowledgeAsset.combine(
    *objs,
    **kwargs
)

Combine multiple KnowledgeAsset instances into one.

Usage

>>> asset1 = asset[[0, 1]]
>>> asset2 = asset[[2, 3]]
>>> asset1.combine(asset2).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

data class property ¶

Data.

delete_items method ¶

KnowledgeAsset.delete_items(
    index,
    inplace=False
)

Delete one or more data items.

Returns a new KnowledgeAsset instance if inplace is False.

describe method ¶

KnowledgeAsset.describe(
    ignore_empty=None,
    describe_kwargs=None,
    wrap=False,
    **kwargs
)

Collect and describe each key in each data item.

describe_lengths class method ¶

KnowledgeAsset.describe_lengths(
    lengths,
    **describe_kwargs
)

Describe values representing lengths.

dump method ¶

KnowledgeAsset.dump(
    source=None,
    dump_engine=None,
    template_context=None,
    **kwargs
)

Dump data items.

Uses KnowledgeAsset.apply on DumpAssetFunc.

The following engines are supported:

"repr": Dumping with repr
"prettify": Dumping with prettify
"nestedtext": Dumping with NestedText (https://pypi.org/project/nestedtext/)
"yaml": Dumping with YAML
"toml": Dumping with TOML (https://pypi.org/project/toml/)
"json": Dumping with JSON

Use argument source to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d" while its fields are represented by their names.

Keyword arguments are passed to the respective engine.

Usage

>>> print(asset.dump(source="{i: d}", default_flow_style=True).join())
{0: {s: ABC, b: true, d2: {c: red, l: [1, 2]}}}
{1: {s: BCD, b: true, d2: {c: blue, l: [3, 4]}}}
{2: {s: CDE, b: false, d2: {c: green, l: [5, 6]}}}
{3: {s: DEF, b: false, d2: {c: yellow, l: [7, 8]}}}
{4: {s: EFG, b: false, d2: {c: black, l: [9, 10]}, xyz: 123}}
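
The shape of this operation (wrap each item with a source expression, dump it, then join the resulting strings) can be imitated with the standard library. This is only an analogy using the `json` module on a shortened toy dataset; it does not call the library's dump engines.

```python
import json

data = [
    {"s": "ABC", "b": True},
    {"s": "BCD", "b": True},
]

# Stand-in for source="{i: d}": wrap each item under its index before dumping
dumps = [json.dumps({i: d}) for i, d in enumerate(data)]

# Stand-in for .join(): one dumped item per line
text = "\n".join(dumps)
print(text)
```

Note that JSON turns the integer index into a string key, which is one reason the output differs between engines.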

dump_all method ¶

KnowledgeAsset.dump_all(
    source=None,
    dump_engine=None,
    template_context=None,
    **kwargs
)

Dump data list as a single data item.

See KnowledgeAsset.dump for arguments.

embed method ¶

KnowledgeAsset.embed(
    to_documents_kwargs=None,
    wrap_documents=None,
    **kwargs
)

Embed documents.

First, converts to TextDocument format using KnowledgeAsset.to_documents and **to_documents_kwargs. Then, uses embed_documents with **kwargs for actual embedding.

extend_items method ¶

KnowledgeAsset.extend_items(
    data,
    inplace=False
)

Extend by new data items.

Returns a new KnowledgeAsset instance if inplace is False.

filter method ¶

KnowledgeAsset.filter(
    *args,
    **kwargs
)

Call KnowledgeAsset.query and return a new KnowledgeAsset instance.

find method ¶

KnowledgeAsset.find(
    target,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    source=None,
    in_dumps=None,
    dump_kwargs=None,
    template_context=None,
    return_type=None,
    return_path=None,
    merge_matches=None,
    merge_fields=None,
    unique_matches=None,
    unique_fields=None,
    **kwargs
)

Find occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindAssetFunc.

Uses contains_in_obj (keyword arguments are passed here) to find any occurrences in each data item if return_type is "item" (returns the data item when matched), return_type is "field" (returns the field), or return_type is "bool" (returns True when matched). For all other return types, uses find_in_obj and find.

Target can be one or multiple data items. If there are multiple targets and find_all is True, the match function will return True only if all targets have been found.

Use argument path to specify what part of the data item should be searched. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item. If per_path is True, will consider targets to be provided per path.

Use argument source instead of path or in addition to path to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Set in_dumps to True to convert the entire data item to string and search in that string. Will use dump with **dump_kwargs.

Disable merge_matches and merge_fields to keep empty lists when searching for matches and fields respectively. Disable unique_matches and unique_fields to keep duplicate matches and fields respectively.

Usage

>>> asset.find("BC").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find("BC", return_type="bool").get()
[True, True, False, False, False]

>>> asset.find(vbt.Not("BC")).get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find("bc", ignore_case=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find("bl", path="d2.c").get()
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find(5, path="d2.l[0]").get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find(True, path="d2.l", source=lambda x: sum(x) >= 10).get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find(["A", "B", "C"]).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find(["A", "B", "C"], find_all=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}}]

>>> asset.find(r"[ABC]+", mode="regex").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find("yenlow", mode="fuzzy").get()
[{'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

>>> asset.find("yenlow", mode="fuzzy", return_type="match").get()
'yellow'

>>> asset.find("yenlow", mode="fuzzy", return_type="match", merge_matches=False).get()
[[], [], [], ['yellow'], []]

>>> asset.find("yenlow", mode="fuzzy", return_type="match", return_path=True).get()
[{}, {}, {}, {('d2', 'c'): ['yellow']}, {}]

>>> asset.find("xyz", in_dumps=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
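
The difference between matching any target and matching all targets (find_all) can be sketched in plain Python. `item_matches` is a hypothetical helper that only does case-sensitive substring search over string values; the real search supports more modes and options.

```python
def item_matches(d, targets, find_all=False):
    """Hypothetical matcher: substring search over the item's string values."""

    def strings(obj):
        # Walk nested dicts and lists, yielding every string value
        if isinstance(obj, dict):
            for v in obj.values():
                yield from strings(v)
        elif isinstance(obj, list):
            for v in obj:
                yield from strings(v)
        elif isinstance(obj, str):
            yield obj

    found = [any(t in s for s in strings(d)) for t in targets]
    return all(found) if find_all else any(found)


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "b": True, "d2": {"c": "blue", "l": [3, 4]}},
    {"s": "CDE", "b": False, "d2": {"c": "green", "l": [5, 6]}},
    {"s": "DEF", "b": False, "d2": {"c": "yellow", "l": [7, 8]}},
]

any_hits = [d["s"] for d in data if item_matches(d, ["A", "B", "C"])]
all_hits = [d["s"] for d in data if item_matches(d, ["A", "B", "C"], find_all=True)]
print(any_hits, all_hits)
```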

find_code method ¶

KnowledgeAsset.find_code(
    target=None,
    language=None,
    in_blocks=None,
    escape_target=True,
    escape_language=True,
    return_type='match',
    flags=0,
    **kwargs
)

Find code using KnowledgeAsset.find.

For defaults, see code in knowledge.

find_remove method ¶

KnowledgeAsset.find_remove(
    target,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Find and remove occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindRemoveAssetFunc.

Similar to KnowledgeAsset.find_replace.

find_remove_empty method ¶

KnowledgeAsset.find_remove_empty(
    **kwargs
)

Find and remove empty objects.

find_replace method ¶

KnowledgeAsset.find_replace(
    target,
    replacement=None,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Find and replace occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindReplaceAssetFunc.

Uses find_in_obj (keyword arguments are passed here) to find occurrences in each data item. Then, uses replace_in_obj to replace them.

Target can be one or multiple data items, either as a list or a dictionary. If there are multiple targets and find_all is True, the match function will return True only if all targets have been found.

Use argument path to specify what part of the data item should be searched. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item. If per_path is True, will consider targets and replacements to be provided per path.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.find_replace("BC", "XY").get()
[{'s': 'AXY', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'XYD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find_replace("BC", "XY", changed_only=True).get()
[{'s': 'AXY', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'XYD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find_replace(r"(D)E(F)", r"\1X\2", mode="regex", changed_only=True).get()
[{'s': 'DXF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

>>> asset.find_replace(True, False, changed_only=True).get()
[{'s': 'ABC', 'b': False, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': False, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find_replace(3, 30, path="d2.l", changed_only=True).get()
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [30, 4]}}]

>>> asset.find_replace({1: 10, 4: 40}, path="d2.l", changed_only=True).get()
>>> asset.find_replace({1: 10, 4: 40}, path=["d2.l[0]", "d2.l[1]"], changed_only=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [10, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 40]}}]

>>> asset.find_replace({1: 10, 4: 40}, find_all=True, changed_only=True).get()
[]

>>> asset.find_replace({1: 10, 2: 20}, find_all=True, changed_only=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [10, 20]}}]

>>> asset.find_replace("a", "X", path=["s", "d2.c"], ignore_case=True, changed_only=True).get()
[{'s': 'XBC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'blXck', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find_replace(123, 456, path="xyz", skip_missing=True, changed_only=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 456}]

flatten method ¶

KnowledgeAsset.flatten(
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Flatten data items or parts of them.

Uses KnowledgeAsset.apply on FlattenAssetFunc.

Use argument path to specify what part of the data item should be flattened. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Keyword arguments are passed to flatten_obj.

Usage

>>> asset.flatten().get()
[{'s': 'ABC',
  'b': True,
  ('d2', 'c'): 'red',
  ('d2', 'l', 0): 1,
  ('d2', 'l', 1): 2},
  ...
 {'s': 'EFG',
  'b': False,
  ('d2', 'c'): 'black',
  ('d2', 'l', 0): 9,
  ('d2', 'l', 1): 10,
  'xyz': 123}]
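
The tuple-keyed layout shown above can be reproduced with a short recursive function. This is a sketch of the output format only: `flatten_item` is a hypothetical helper, it does not call flatten_obj, and it simply drops empty containers.

```python
def flatten_item(d, parent=()):
    """Hypothetical flattener: nested dicts/lists become tuple-keyed entries."""
    out = {}
    items = d.items() if isinstance(d, dict) else enumerate(d)
    for key, value in items:
        path = parent + (key,)
        if isinstance(value, (dict, list)):
            out.update(flatten_item(value, path))
        else:
            # Top-level keys stay as they are; deeper ones become tuples
            out[path[0] if len(path) == 1 else path] = value
    return out


item = {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}}
flat = flatten_item(item)
print(flat)
```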

from_json_bytes class method ¶

KnowledgeAsset.from_json_bytes(
    bytes_,
    compression=None,
    decompress_kwargs=None,
    **kwargs
)

Build KnowledgeAsset from JSON bytes.

from_json_file class method ¶

KnowledgeAsset.from_json_file(
    path,
    compression=None,
    decompress_kwargs=None,
    **kwargs
)

Build KnowledgeAsset from a JSON file.

get method ¶

KnowledgeAsset.get(
    path=None,
    keep_path=None,
    skip_missing=None,
    source=None,
    template_context=None,
    **kwargs
)

Get data items or parts of them.

Uses KnowledgeAsset.apply on GetAssetFunc.

Use argument path to specify what part of the data item should be retrieved. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item.

Use argument source instead of path or in addition to path to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Usage

>>> asset.get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.get("d2.l[0]")
[1, 3, 5, 7, 9]

>>> asset.get("d2.l", source=lambda x: sum(x))
[3, 7, 11, 15, 19]

>>> asset.get("d2.l[0]", keep_path=True)
[{'d2': {'l': {0: 1}}},
 {'d2': {'l': {0: 3}}},
 {'d2': {'l': {0: 5}}},
 {'d2': {'l': {0: 7}}},
 {'d2': {'l': {0: 9}}}]

>>> asset.get(["d2.l[0]", "d2.l[1]"])
[{'d2': {'l': {0: 1, 1: 2}}},
 {'d2': {'l': {0: 3, 1: 4}}},
 {'d2': {'l': {0: 5, 1: 6}}},
 {'d2': {'l': {0: 7, 1: 8}}},
 {'d2': {'l': {0: 9, 1: 10}}}]

>>> asset.get("xyz", skip_missing=True)
[123]
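
Path strings such as "d2.l[0]" can be read as a sequence of dictionary keys and list indices. The sketch below shows one way to resolve them in plain Python; `parse_path`, `get_path`, and `get_all` are hypothetical helpers, and the regular expression covers only dotted names and integer indices.

```python
import re


def parse_path(path):
    """Hypothetical tokenizer: "d2.l[0]" -> ["d2", "l", 0]."""
    tokens = []
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        tokens.append(int(index) if index else name)
    return tokens


def get_path(d, path):
    """Follow each token into the nested structure."""
    for token in parse_path(path):
        d = d[token]
    return d


def get_all(data, path, skip_missing=False):
    """Resolve the path in every item, optionally skipping items without it."""
    out = []
    for d in data:
        try:
            out.append(get_path(d, path))
        except (KeyError, IndexError, TypeError):
            if not skip_missing:
                raise
    return out


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123},
]

firsts = get_all(data, "d2.l[0]")
xyz = get_all(data, "xyz", skip_missing=True)
print(firsts, xyz)
```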

get_items method ¶

KnowledgeAsset.get_items(
    index
)

Get one or more data items.

get_keys_and_groups class method ¶

KnowledgeAsset.get_keys_and_groups(
    by,
    uniform_groups=False
)

Get keys and groups.

groupby_reduce method ¶

KnowledgeAsset.groupby_reduce(
    func,
    *args,
    by=None,
    uniform_groups=None,
    get_kwargs=None,
    execute_kwargs=None,
    return_group_keys=False,
    **kwargs
)

Group data items by keys and reduce.

If by is provided, uses it as path in KnowledgeAsset.get, groups by unique values, and runs KnowledgeAsset.reduce on each group.

Set uniform_groups to True to only group unique values that are located adjacent to each other.

Variable arguments are passed to each call of KnowledgeAsset.reduce.
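
The effect of uniform_groups can be illustrated with the standard library, reading the description above as "group adjacent runs only". This is an interpretation sketched in plain Python; `group_items` is a hypothetical helper and the three-item dataset is made up for the example.

```python
from itertools import groupby


def group_items(data, key, uniform_groups=False):
    """Hypothetical helper: group globally, or only by adjacent runs."""
    if uniform_groups:
        # itertools.groupby starts a new group whenever the key changes
        return [(k, list(g)) for k, g in groupby(data, key=key)]
    groups = {}
    for d in data:
        groups.setdefault(key(d), []).append(d)
    return list(groups.items())


data = [{"b": True, "n": 1}, {"b": False, "n": 2}, {"b": True, "n": 3}]


def by_b(d):
    return d["b"]


# Reduce each group by summing the "n" field
global_sums = [(k, sum(d["n"] for d in g)) for k, g in group_items(data, by_b)]
adjacent_sums = [
    (k, sum(d["n"] for d in g))
    for k, g in group_items(data, by_b, uniform_groups=True)
]
print(global_sums, adjacent_sums)
```

With global grouping the two True items are reduced together; with adjacent grouping they stay separate because a False item sits between them.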

insert method ¶

KnowledgeAsset.insert(
    index,
    value
)

Insert value before index.

join method ¶

KnowledgeAsset.join(
    separator=None
)

Join the list of string data items.

merge class method ¶

KnowledgeAsset.merge(
    *objs,
    flatten_kwargs=None,
    **kwargs
)

Either merge multiple KnowledgeAsset instances into one if called as a class method or instance method with at least one additional object, or merge data items of a single instance if called as an instance method with no additional objects.

Usage

>>> asset1 = asset.select(["s"])
>>> asset2 = asset.select(["b", "d2"])
>>> asset1.merge(asset2).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

merge_dicts method ¶

KnowledgeAsset.merge_dicts(
    **kwargs
)

Merge (dict) data items into a single dict.

Final keyword arguments are passed to merge_dicts.

merge_lists method ¶

KnowledgeAsset.merge_lists(
    **kwargs
)

Merge (list) data items into a single list.

modify_data method ¶

KnowledgeAsset.modify_data(
    data
)

Modify data in place.

move method ¶

KnowledgeAsset.move(
    path,
    new_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Move data items or parts of them.

Uses KnowledgeAsset.apply on MoveAssetFunc.

Use argument path to specify what part of the data item should be moved. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Use argument new_path to specify the path that the part should be moved to. Multiple paths can be provided. If None, path must be a dictionary that maps each path to its new path.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.move("d2.l", "l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red'}, 'l': [1, 2]},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue'}, 'l': [3, 4]},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green'}, 'l': [5, 6]},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow'}, 'l': [7, 8]},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black'}, 'xyz': 123, 'l': [9, 10]}]

>>> asset.move({"d2.c": "c", "b": "d2.b"}).get()
>>> asset.move(["d2.c", "b"], ["c", "d2.b"]).get()
[{'s': 'ABC', 'd2': {'l': [1, 2], 'b': True}, 'c': 'red'},
 {'s': 'BCD', 'd2': {'l': [3, 4], 'b': True}, 'c': 'blue'},
 {'s': 'CDE', 'd2': {'l': [5, 6], 'b': False}, 'c': 'green'},
 {'s': 'DEF', 'd2': {'l': [7, 8], 'b': False}, 'c': 'yellow'},
 {'s': 'EFG', 'd2': {'l': [9, 10], 'b': False}, 'xyz': 123, 'c': 'black'}]

print method ¶

KnowledgeAsset.print(
    *args,
    **kwargs
)

Convert to a context and print.

Uses KnowledgeAsset.to_context.

print_sample method ¶

KnowledgeAsset.print_sample(
    k=None,
    seed=None,
    **kwargs
)

Print a random sample.

Keyword arguments are passed to KnowledgeAsset.print.

print_schema method ¶

KnowledgeAsset.print_schema(
    **kwargs
)

Print schema.

Keyword arguments are split between KnowledgeAsset.describe and dir_tree_from_paths.

Usage

>>> asset.print_schema()
/
├── s [5/5, str]
├── b [2/5, bool]
├── d2 [5/5, dict]
│   ├── c [5/5, str]
│   └── l
│       ├── 0 [5/5, int]
│       └── 1 [5/5, int]
└── xyz [1/5, int]

2 directories, 6 files

query method ¶

KnowledgeAsset.query(
    expression,
    query_engine=None,
    template_context=None,
    return_type=None,
    **kwargs
)

Query using an engine and return the queried data item(s).

The following engines are supported:

"jmespath": Evaluation with jmespath package
"jsonpath", "jsonpath-ng" or "jsonpath_ng": Evaluation with jsonpath-ng package
"jsonpath.ext", "jsonpath-ng.ext" or "jsonpath_ng.ext": Evaluation with extended jsonpath-ng package
None or "template": Evaluation of each data item as a template. The index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names. Uses KnowledgeAsset.apply on QueryAssetFunc.
"pandas": Same as above but variables being columns

If return_type is "item", returns the data item when matched. If return_type is "bool", returns True when matched.

Templates can also use the functions defined in search_config.

They work on single values and sequences alike.

Keyword arguments are passed to the respective search/parse/evaluation function.

Usage

>>> asset.query("d['s'] == 'ABC'")
>>> asset.query("x['s'] == 'ABC'")
>>> asset.query("s == 'ABC'")
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}}]

>>> asset.query("x['s'] == 'ABC'", return_type="bool")
[True, False, False, False, False]

>>> asset.query("find('BC', s)")
>>> asset.query(lambda s: "BC" in s)
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.query("[?contains(s, 'BC')].s", query_engine="jmespath")
['ABC', 'BCD']

>>> asset.query("[].d2.c", query_engine="jmespath")
['red', 'blue', 'green', 'yellow', 'black']

>>> asset.query("[?d2.c != `blue`].d2.l", query_engine="jmespath")
[[1, 2], [5, 6], [7, 8], [9, 10]]

>>> asset.query("$[*].d2.c", query_engine="jsonpath.ext")
['red', 'blue', 'green', 'yellow', 'black']

>>> asset.query("$[?(@.b == true)].s", query_engine="jsonpath.ext")
['ABC', 'BCD']

>>> asset.query("s[b]", query_engine="pandas")
['ABC', 'BCD']

rank method ¶

KnowledgeAsset.rank(
    query,
    to_documents_kwargs=None,
    wrap_documents=None,
    cache_documents=False,
    cache_key=None,
    asset_cache_manager=None,
    asset_cache_manager_kwargs=None,
    silence_warnings=False,
    **kwargs
)

Rank documents by their similarity to a query.

First, converts to TextDocument format using KnowledgeAsset.to_documents and **to_documents_kwargs. Then, uses rank_documents with **kwargs for actual ranking.

If cache_documents is True and cache_key is not None, will use an asset cache manager to store the generated text documents in a local and/or disk cache after conversion. Running the same method again will use the cached documents.

reduce method ¶

KnowledgeAsset.reduce(
    func,
    *args,
    initializer=None,
    by=None,
    template_context=None,
    show_progress=None,
    pbar_kwargs=None,
    wrap=None,
    return_iterator=False,
    **kwargs
)

Reduce data items.

Function can be a callable, a tuple of function and its arguments, a Task instance, a subclass of AssetFunc or its prefix or full name. It can also be an expression or a template. In this template, the index of the data item is represented by "i", the data items themselves are represented by "d1" and "d2" or "x1" and "x2".

If an initializer is provided, the first set of values will be d1=initializer and d2=self.data[0]. If not, it will be d1=self.data[0] and d2=self.data[1].

If by is provided, see KnowledgeAsset.groupby_reduce.

If wrap is True, returns a new KnowledgeAsset instance, otherwise raw output.

Usage

>>> asset.reduce(lambda d1, d2: vbt.merge_dicts(d1, d2))
>>> asset.reduce(vbt.merge_dicts)
>>> asset.reduce("{**d1, **d2}")
{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}

>>> asset.reduce("{**d1, **d2}", by="b")
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
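
The role of the initializer described above mirrors Python's built-in functools.reduce, so the pairing of d1 and d2 can be tried out without the library. The sketch assumes that correspondence; the `{"origin": "toy"}` initializer is an arbitrary value chosen for the example.

```python
from functools import reduce

data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123},
]


def merge_pair(d1, d2):
    # Later items overwrite earlier keys (shallow merge)
    return {**d1, **d2}


# Without an initializer, the first call receives data[0] and data[1]
merged = reduce(merge_pair, data)

# With an initializer, the first call receives the initializer and data[0]
seeded = reduce(merge_pair, data, {"origin": "toy"})
print(merged)
print(seeded)
```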

remove method ¶

KnowledgeAsset.remove(
    path,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Remove data items or parts of them.

If path is an integer, removes the entire data item at that index.

Uses KnowledgeAsset.apply on RemoveAssetFunc.

Use argument path to specify what part of the data item should be removed. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.remove("d2.l[0]").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [10]}, 'xyz': 123}]

>>> asset.remove("xyz", skip_missing=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

remove_empty method ¶

KnowledgeAsset.remove_empty(
    inplace=False
)

Remove empty data items.

rename method ¶

KnowledgeAsset.rename(
    path,
    new_token=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Rename data items or parts of them.

Uses KnowledgeAsset.apply on RenameAssetFunc.

Same as KnowledgeAsset.move but must specify new token instead of new path.

Usage

>>> asset.rename("d2.l", "x").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'x': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'x': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'x': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'x': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'x': [9, 10]}, 'xyz': 123}]

>>> asset.rename("xyz", "zyx", skip_missing=True, changed_only=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'zyx': 123}]

reorder method ¶

KnowledgeAsset.reorder(
    new_order,
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    template_context=None,
    **kwargs
)

Reorder data items or parts of them.

Uses KnowledgeAsset.apply on ReorderAssetFunc.

Can change order in dicts based on reorder_dict and sequences based on reorder_list.

Argument new_order can be a sequence of tokens. To not reorder a subset of keys, they can be replaced by an ellipsis (...). For example, ["a", ..., "z"] puts the token "a" at the start and the token "z" at the end while other tokens are left in the original order. If new_order is a string, it can be "asc"/"ascending" or "desc"/"descending". Other than that, it can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Use argument path to specify what part of the data item should be reordered. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.reorder(["xyz", ...], skip_missing=True).get()
>>> asset.reorder(lambda x: ["xyz", ...] if "xyz" in x else [...]).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'xyz': 123, 's': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

>>> asset.reorder("descending", path="d2.l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [2, 1]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [4, 3]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [6, 5]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [8, 7]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [10, 9]}, 'xyz': 123}]
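
The ellipsis convention for new_order can be sketched for dictionaries in a few lines. `reorder_keys` is a hypothetical helper written for this illustration; it does not call reorder_dict and assumes at most one ellipsis.

```python
def reorder_keys(d, new_order):
    """Hypothetical helper: `...` stands for all keys not named explicitly."""
    named = [k for k in new_order if k is not Ellipsis]
    # Keys that are not named keep their original relative order
    rest = [k for k in d if k not in named]
    ordered = []
    for k in new_order:
        ordered.extend(rest if k is Ellipsis else [k])
    return {k: d[k] for k in ordered}


item = {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123}

front = reorder_keys(item, ["xyz", ...])
ends = reorder_keys(item, ["b", ..., "s"])
print(list(front), list(ends))
```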

sample method ¶

KnowledgeAsset.sample(
    k=None,
    seed=None,
    wrap=True
)

Pick a random sample of data items.

select method ¶

KnowledgeAsset.select(
    *args,
    **kwargs
)

Call KnowledgeAsset.get and return a new KnowledgeAsset instance.

set method ¶

KnowledgeAsset.set(
    value,
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    template_context=None,
    **kwargs
)

Set data items or parts of them.

Uses KnowledgeAsset.apply on SetAssetFunc.

Argument value can be any value, function (will become a template), or a template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Use argument path to specify what part of the data item should be set. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.set(lambda d: sum(d["d2"]["l"])).get()
[3, 7, 11, 15, 19]

>>> asset.set(lambda d: sum(d["d2"]["l"]), path="d2.sum").get()
>>> asset.set(lambda x: sum(x["l"]), path="d2.sum").get()
>>> asset.set(lambda l: sum(l), path="d2.sum").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2], 'sum': 3}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4], 'sum': 7}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6], 'sum': 11}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8], 'sum': 15}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10], 'sum': 19}, 'xyz': 123}]

>>> asset.set(lambda l: sum(l), path="d2.l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': 3}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': 7}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': 11}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': 15}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': 19}, 'xyz': 123}]
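
The path-based assignment shown above can be approximated in plain Python. The sketch below is an assumption-laden stand-in, not the library's SetAssetFunc: set_at_path is a hypothetical helper that supports only dot-separated dictionary keys (no list indices) and always works on a deep copy, similar in spirit to make_copy=True.

```python
import copy


def set_at_path(item, path, value_func):
    """Store value_func(item) under a dot-separated path and return a modified copy.

    Hypothetical helper: handles nested dicts only, not list indices.
    """
    new_item = copy.deepcopy(item)  # the input item is never mutated
    *parents, last = path.split(".")
    target = new_item
    for key in parents:
        target = target[key]  # walk down to the dict that receives the new key
    target[last] = value_func(item)
    return new_item


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "b": True, "d2": {"c": "blue", "l": [3, 4]}},
]
result = [set_at_path(item, "d2.sum", lambda d: sum(d["d2"]["l"])) for item in data]
print(result[0]["d2"])  # {'c': 'red', 'l': [1, 2], 'sum': 3}
print(data[0]["d2"])    # {'c': 'red', 'l': [1, 2]}
```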

set_items method ¶

KnowledgeAsset.set_items(
    index,
    value,
    inplace=False
)

Set one or more data items.

Returns a new KnowledgeAsset instance if inplace is False.

shuffle method ¶

KnowledgeAsset.shuffle(
    seed=None,
    inplace=False
)

Shuffle data items.

single_item class property ¶

Whether this instance holds a single item.

sort method ¶

KnowledgeAsset.sort(
    *args,
    keys=None,
    ascending=True,
    inplace=False,
    **kwargs
)

Sort based on KnowledgeAsset.get called on *args and **kwargs.

Returns a new KnowledgeAsset instance if inplace is False.

Usage

>>> asset.sort("d2.c").get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]
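
For comparison, the same ordering can be reproduced without the library by extracting the value at the path and using it as the sort key. This is only a sketch: get_at_path is a hypothetical helper that understands dot-separated dictionary keys and nothing more.

```python
def get_at_path(item, path):
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    for key in path.split("."):
        item = item[key]
    return item


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "b": True, "d2": {"c": "blue", "l": [3, 4]}},
    {"s": "CDE", "b": False, "d2": {"c": "green", "l": [5, 6]}},
    {"s": "DEF", "b": False, "d2": {"c": "yellow", "l": [7, 8]}},
    {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123},
]

# sorted() returns a new list, comparable to inplace=False
sorted_data = sorted(data, key=lambda d: get_at_path(d, "d2.c"))
print([d["s"] for d in sorted_data])  # ['EFG', 'BCD', 'CDE', 'ABC', 'DEF']
```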

split_text method ¶

KnowledgeAsset.split_text(
    text_path=None,
    merge_chunks=None,
    **kwargs
)

Split text.

Uses KnowledgeAsset.apply on SplitTextAssetFunc.

Use argument text_path to specify a path to the content.

If merge_chunks is True, merges all chunks into a single list.

Uses split_text with **split_text_kwargs for text splitting.
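
To make the effect of merge_chunks concrete, here is a minimal sketch of the two result shapes. The fixed-size splitter split_fixed is hypothetical and far simpler than whatever splitter the library applies; only the difference between per-item chunk lists and one merged list is the point.

```python
from itertools import chain


def split_fixed(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size characters (hypothetical splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


texts = ["abcdefgh", "ijk"]

# Without merging: one list of chunks per data item
per_item = [split_fixed(text, 3) for text in texts]
print(per_item)  # [['abc', 'def', 'gh'], ['ijk']]

# With merging: all chunks end up in a single flat list
merged = list(chain.from_iterable(per_item))
print(merged)  # ['abc', 'def', 'gh', 'ijk']
```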

to_context method ¶

KnowledgeAsset.to_context(
    *args,
    dump_all=None,
    separator=None,
    **kwargs
)

Convert to a context.

If dump_all is True, calls KnowledgeAsset.dump_all with *args and **kwargs. Otherwise, calls KnowledgeAsset.dump.

Finally, calls KnowledgeAsset.join with separator.
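
The dump-then-join flow described above can be sketched in a few lines. JSON is assumed as the dump format purely for illustration, and to_context_sketch is a hypothetical helper; the library's dump format and default separator come from its settings and may differ.

```python
import json


def to_context_sketch(items, separator="\n\n"):
    """Dump each item to text, then join the pieces (hypothetical helper, JSON assumed)."""
    dumped = [json.dumps(item, sort_keys=True) for item in items]
    return separator.join(dumped)


items = [{"s": "ABC", "b": True}, {"s": "BCD", "b": False}]
context = to_context_sketch(items, separator="\n---\n")
print(context)
```

The result is a single string, which is the form a prompt or search context typically needs.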

to_documents method ¶

KnowledgeAsset.to_documents(
    **kwargs
)

Convert to documents of type TextDocument.

Document-related keyword arguments may contain templates. In such templates, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

unflatten method ¶

KnowledgeAsset.unflatten(
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Unflatten data items or parts of them.

Uses KnowledgeAsset.apply on UnflattenAssetFunc.

Use argument path to specify which part of the data item should be unflattened. For example, "x.y[0].z" navigates nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and the path is missing in a data item, that data item is skipped.

Set make_copy to True to leave the original data unmodified.

Set changed_only to True to keep only the data items that have been changed.

Keyword arguments are passed to unflatten_obj.

Usage

>>> asset.flatten().unflatten().get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
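
The round trip above relies on flattening being reversible. The sketch below shows one way such a round trip can work in plain Python. Both helpers are hypothetical, and the flattened key format (tuples of path parts) is an assumption of this sketch rather than a statement about unflatten_obj. It assumes a dict at the top level, drops empty containers, and would turn a dict whose keys are all integers into a list.

```python
def flatten_sketch(obj, prefix=()):
    """Flatten nested dicts/lists into {path_tuple: leaf_value} (assumed format)."""
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in pairs:
        flat.update(flatten_sketch(value, prefix + (key,)))
    return flat


def _lists_from_int_keys(node):
    """Convert dicts whose keys are all integers back into lists, recursively."""
    if not isinstance(node, dict):
        return node
    converted = {k: _lists_from_int_keys(v) for k, v in node.items()}
    if converted and all(isinstance(k, int) for k in converted):
        return [converted[i] for i in sorted(converted)]
    return converted


def unflatten_sketch(flat):
    """Rebuild the nested structure from {path_tuple: leaf_value}."""
    root = {}
    for path, value in flat.items():
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create intermediate levels on demand
        node[path[-1]] = value
    return _lists_from_int_keys(root)


item = {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}}
flat = flatten_sketch(item)
print(flat[("d2", "l", 1)])            # 2
print(unflatten_sketch(flat) == item)  # True
```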

unique method ¶

KnowledgeAsset.unique(
    *args,
    keep='first',
    inplace=False,
    **kwargs
)

De-duplicate based on KnowledgeAsset.get called on *args and **kwargs.

Returns a new KnowledgeAsset instance if inplace is False.

Usage

>>> asset.unique("b").get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

MetaKnowledgeAsset class ¶

MetaKnowledgeAsset(
    name,
    bases,
    attrs
)

Metaclass for KnowledgeAsset.

Superclasses

MetaConfigured
abc.ABCMeta
builtins.type
