base_assets module

Base asset classes.

See vectorbtpro.utils.knowledge for the toy dataset.


asset_cache dict

Asset cache.


AssetCacheManager class

AssetCacheManager(
    persist_cache=None,
    cache_dir=None,
    cache_mkdir_kwargs=None,
    clear_cache=None,
    max_cache_count=None,
    save_cache_kwargs=None,
    load_cache_kwargs=None,
    template_context=None,
    **kwargs
)

Class for managing knowledge asset cache.

For defaults, see knowledge.

Superclasses

Inherited members


cache_dir class property

Cache directory.


cleanup_cache_dir method

AssetCacheManager.cleanup_cache_dir()

Keep only the most recent assets.
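
A minimal sketch of what "keep only the most recent assets" could look like on disk. It assumes recency is judged by file modification time; the helper name `keep_most_recent` and the file names are hypothetical and not part of the library.

```python
import os
import tempfile
from pathlib import Path


def keep_most_recent(cache_dir, max_count):
    """Delete all but the `max_count` most recently modified files (hypothetical helper)."""
    files = [p for p in Path(cache_dir).iterdir() if p.is_file()]
    # Newest first, judged by modification time (an assumption of this sketch)
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for p in files[max_count:]:
        p.unlink()
        removed.append(p.name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    for tick, name in enumerate(["a.bin", "b.bin", "c.bin", "d.bin"]):
        path = Path(tmp) / name
        path.write_bytes(b"asset")
        # Give each file a distinct, increasing modification time
        os.utime(path, (1_000_000 + tick, 1_000_000 + tick))
    removed_names = keep_most_recent(tmp, max_count=2)
    remaining_names = sorted(p.name for p in Path(tmp).iterdir())
```

Here the two oldest files are deleted and the two newest survive.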


generate_cache_key class method

AssetCacheManager.generate_cache_key(
    **kwargs
)

Generate a cache key based on the current VBT version, settings, and keyword arguments.
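
One plausible way to derive such a key is to hash a canonical serialization of the inputs. The sketch below is an illustration only: the function `sketch_cache_key`, the SHA-256 choice, and the placeholder version strings are assumptions, not the library's actual scheme.

```python
import hashlib
import json


def sketch_cache_key(version, settings, **kwargs):
    """Derive a deterministic key from a version string, settings, and keyword arguments."""
    payload = {"version": version, "settings": settings, "kwargs": kwargs}
    # sort_keys makes the key independent of argument order;
    # default=repr is a fallback for values that JSON cannot encode
    encoded = json.dumps(payload, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


key1 = sketch_cache_key("1.0.0", {"chunk": 512}, path="docs", mode="fast")
key2 = sketch_cache_key("1.0.0", {"chunk": 512}, mode="fast", path="docs")
key3 = sketch_cache_key("1.0.1", {"chunk": 512}, path="docs", mode="fast")
```

With this design, reordering keyword arguments yields the same key, while a version change yields a different one, so stale cache entries are not reused after an upgrade.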


load_asset method

AssetCacheManager.load_asset(
    cache_key
)

Load the knowledge asset under a cache key.


load_cache_kwargs class property

Keyword arguments passed to load.


max_cache_count class property

Maximum number of assets to be cached.

Keeps only the most recent assets.


persist_cache class property

Whether to persist cache on disk.


save_asset method

AssetCacheManager.save_asset(
    asset,
    cache_key
)

Save a knowledge asset under a cache key.


save_cache_kwargs class property

Keyword arguments passed to save.


KnowledgeAsset class

KnowledgeAsset(
    data=None,
    single_item=True,
    **kwargs
)

Class for working with a knowledge asset.

This class behaves like a mutable sequence.

For defaults, see knowledge.

Superclasses

Inherited members

Subclasses


append_item method

KnowledgeAsset.append_item(
    d,
    inplace=False
)

Append a new data item.

Returns a new KnowledgeAsset instance if inplace is False.


apply method

KnowledgeAsset.apply(
    func,
    *args,
    execute_kwargs=None,
    wrap=None,
    single_item=None,
    return_iterator=False,
    **kwargs
)

Apply a function to each data item.

Function can be either a callable, a tuple of function and its arguments, a Task instance, a subclass of AssetFunc or its prefix or full name. Moreover, function can be a list of the above. In such a case, BasicAssetPipeline will be used. If function is a valid expression, ComplexAssetPipeline will be used.

Uses execute for execution.

If wrap is True, returns a new KnowledgeAsset instance, otherwise raw output.

Usage

>>> asset.apply(["flatten", ("query", len)])
[5, 5, 5, 5, 6]

>>> asset.apply("query(flatten(d), len)")
[5, 5, 5, 5, 6]
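
The list form can be pictured as a simple per-item pipeline. The following sketch is a simplified stand-in, not BasicAssetPipeline itself: `run_pipeline` is a hypothetical helper, and steps are plain callables or `(callable, *args)` tuples rather than AssetFunc names.

```python
def run_pipeline(data, steps):
    """Apply each step in order to every data item (hypothetical helper)."""
    results = []
    for d in data:
        out = d
        for step in steps:
            # A step is either a callable or a (callable, *extra_args) tuple
            if isinstance(step, tuple):
                func, args = step[0], step[1:]
                out = func(out, *args)
            else:
                out = step(out)
        results.append(out)
    return results


data = [
    {"s": "ABC", "d2": {"l": [1, 2]}},
    {"s": "BCD", "d2": {"l": [3, 4]}},
]

sums = run_pipeline(data, [lambda d: d["d2"]["l"], sum])
squared = run_pipeline(data, [lambda d: d["d2"]["l"], sum, (pow, 2)])
```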

collect method

KnowledgeAsset.collect(
    sort_keys=None,
    **kwargs
)

Collect values of each key in each data item.


combine class method

KnowledgeAsset.combine(
    *objs,
    **kwargs
)

Combine multiple KnowledgeAsset instances into one.

Usage

>>> asset1 = asset[[0, 1]]
>>> asset2 = asset[[2, 3]]
>>> asset1.combine(asset2).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

data class property

Data.


delete_items method

KnowledgeAsset.delete_items(
    index,
    inplace=False
)

Delete one or more data items.

Returns a new KnowledgeAsset instance if inplace is False.


describe method

KnowledgeAsset.describe(
    ignore_empty=None,
    describe_kwargs=None,
    wrap=False,
    **kwargs
)

Collect and describe each key in each data item.


describe_lengths class method

KnowledgeAsset.describe_lengths(
    lengths,
    **describe_kwargs
)

Describe values representing lengths.


dump method

KnowledgeAsset.dump(
    source=None,
    dump_engine=None,
    template_context=None,
    **kwargs
)

Dump data items.

Uses KnowledgeAsset.apply on DumpAssetFunc.

Following engines are supported:

Use argument source to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d" while its fields are represented by their names.

Keyword arguments are passed to the respective engine.

Usage

>>> print(asset.dump(source="{i: d}", default_flow_style=True).join())
{0: {s: ABC, b: true, d2: {c: red, l: [1, 2]}}}
{1: {s: BCD, b: true, d2: {c: blue, l: [3, 4]}}}
{2: {s: CDE, b: false, d2: {c: green, l: [5, 6]}}}
{3: {s: DEF, b: false, d2: {c: yellow, l: [7, 8]}}}
{4: {s: EFG, b: false, d2: {c: black, l: [9, 10]}, xyz: 123}}

dump_all method

KnowledgeAsset.dump_all(
    source=None,
    dump_engine=None,
    template_context=None,
    **kwargs
)

Dump data list as a single data item.

See KnowledgeAsset.dump for arguments.


embed method

KnowledgeAsset.embed(
    to_documents_kwargs=None,
    wrap_documents=None,
    **kwargs
)

Embed documents.

First, converts to TextDocument format using KnowledgeAsset.to_documents and **to_documents_kwargs. Then, uses embed_documents with **kwargs for the actual embedding.


extend_items method

KnowledgeAsset.extend_items(
    data,
    inplace=False
)

Extend by new data items.

Returns a new KnowledgeAsset instance if inplace is False.


filter method

KnowledgeAsset.filter(
    *args,
    **kwargs
)

Call KnowledgeAsset.query and return a new KnowledgeAsset instance.


find method

KnowledgeAsset.find(
    target,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    source=None,
    in_dumps=None,
    dump_kwargs=None,
    template_context=None,
    return_type=None,
    return_path=None,
    merge_matches=None,
    merge_fields=None,
    unique_matches=None,
    unique_fields=None,
    **kwargs
)

Find occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindAssetFunc.

Uses contains_in_obj (keyword arguments are passed here) to find any occurrences in each data item if return_type is "item" (returns the data item when matched), return_type is "field" (returns the field), or return_type is "bool" (returns True when matched). For all other return types, uses find_in_obj and find.

Target can be one or multiple data items. If there are multiple targets and find_all is True, the match function will return True only if all targets have been found.

Use argument path to specify what part of the data item should be searched. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item. If per_path is True, will consider targets to be provided per path.

Use argument source instead of path or in addition to path to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Set in_dumps to True to convert the entire data item to string and search in that string. Will use dump with **dump_kwargs.

Disable merge_matches and merge_fields to keep empty lists when searching for matches and fields respectively. Disable unique_matches and unique_fields to keep duplicate matches and fields respectively.

Usage

>>> asset.find("BC").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find("BC", return_type="bool").get()
[True, True, False, False, False]

>>> asset.find(vbt.Not("BC")).get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find("bc", ignore_case=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find("bl", path="d2.c").get()
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find(5, path="d2.l[0]").get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find(True, path="d2.l", source=lambda x: sum(x) >= 10).get()
[{'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find(["A", "B", "C"]).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find(["A", "B", "C"], find_all=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}}]

>>> asset.find(r"[ABC]+", mode="regex").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}}]

>>> asset.find("yenlow", mode="fuzzy").get()
[{'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

>>> asset.find("yenlow", mode="fuzzy", return_type="match").get()
'yellow'

>>> asset.find("yenlow", mode="fuzzy", return_type="match", merge_matches=False).get()
[[], [], [], ['yellow'], []]

>>> asset.find("yenlow", mode="fuzzy", return_type="match", return_path=True).get()
[{}, {}, {}, {('d2', 'c'): ['yellow']}, {}]

>>> asset.find("xyz", in_dumps=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
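
The difference between matching any target and matching all targets (find_all) can be sketched in plain Python. This is a simplified illustration that only searches string values by substring; `contains` and `find_items` are hypothetical helpers, not contains_in_obj or find_in_obj.

```python
def contains(obj, target):
    """Return True if `target` occurs as a substring in any string value of `obj`."""
    if isinstance(obj, dict):
        return any(contains(v, target) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return any(contains(v, target) for v in obj)
    return isinstance(obj, str) and target in obj


def find_items(data, targets, find_all=False):
    """Keep items matching any target, or every target if `find_all` is True."""
    combine = all if find_all else any
    return [d for d in data if combine(contains(d, t) for t in targets)]


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "b": True, "d2": {"c": "blue", "l": [3, 4]}},
    {"s": "CDE", "b": False, "d2": {"c": "green", "l": [5, 6]}},
    {"s": "DEF", "b": False, "d2": {"c": "yellow", "l": [7, 8]}},
]

any_match = [d["s"] for d in find_items(data, ["A", "B", "C"])]
all_match = [d["s"] for d in find_items(data, ["A", "B", "C"], find_all=True)]
```

Dictionary keys are deliberately not searched in this sketch.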

find_code method

KnowledgeAsset.find_code(
    target=None,
    language=None,
    in_blocks=None,
    escape_target=True,
    escape_language=True,
    return_type='match',
    flags=0,
    **kwargs
)

Find code using KnowledgeAsset.find.

For defaults, see code in knowledge.


find_remove method

KnowledgeAsset.find_remove(
    target,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Find and remove occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindRemoveAssetFunc.

Similar to KnowledgeAsset.find_replace.


find_remove_empty method

KnowledgeAsset.find_remove_empty(
    **kwargs
)

Find and remove empty objects.


find_replace method

KnowledgeAsset.find_replace(
    target,
    replacement=None,
    path=None,
    per_path=None,
    find_all=None,
    keep_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Find and replace occurrences and return a new KnowledgeAsset instance.

Uses KnowledgeAsset.apply on FindReplaceAssetFunc.

Uses find_in_obj (keyword arguments are passed here) to find occurrences in each data item. Then, uses replace_in_obj to replace them.

Target can be one or multiple data items, either as a list or a dictionary. If there are multiple targets and find_all is True, the match function will return True only if all targets have been found.

Use argument path to specify what part of the data item should be searched. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item. If per_path is True, will consider targets and replacements to be provided per path.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.find_replace("BC", "XY").get()
[{'s': 'AXY', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'XYD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find_replace("BC", "XY", changed_only=True).get()
[{'s': 'AXY', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'XYD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find_replace(r"(D)E(F)", r"\1X\2", mode="regex", changed_only=True).get()
[{'s': 'DXF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

>>> asset.find_replace(True, False, changed_only=True).get()
[{'s': 'ABC', 'b': False, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': False, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.find_replace(3, 30, path="d2.l", changed_only=True).get()
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [30, 4]}}]

>>> asset.find_replace({1: 10, 4: 40}, path="d2.l", changed_only=True).get()
>>> asset.find_replace({1: 10, 4: 40}, path=["d2.l[0]", "d2.l[1]"], changed_only=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [10, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 40]}}]

>>> asset.find_replace({1: 10, 4: 40}, find_all=True, changed_only=True).get()
[]

>>> asset.find_replace({1: 10, 2: 20}, find_all=True, changed_only=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [10, 20]}}]

>>> asset.find_replace("a", "X", path=["s", "d2.c"], ignore_case=True, changed_only=True).get()
[{'s': 'XBC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'blXck', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.find_replace(123, 456, path="xyz", skip_missing=True, changed_only=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 456}]
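
The interplay of replacement and changed_only can be sketched as follows. This is an illustration under simplifying assumptions (substring replacement for strings, exact same-type equality for everything else, always working on a copy); `replace_in` and `find_replace` here are hypothetical stand-ins, not find_in_obj or replace_in_obj.

```python
def replace_in(obj, target, replacement):
    """Return a copy of `obj` with occurrences of `target` replaced."""
    if isinstance(obj, dict):
        return {k: replace_in(v, target, replacement) for k, v in obj.items()}
    if isinstance(obj, list):
        return [replace_in(v, target, replacement) for v in obj]
    if isinstance(obj, str) and isinstance(target, str):
        return obj.replace(target, replacement)
    # Exact match for non-strings; the type check keeps True from matching 1
    if type(obj) is type(target) and obj == target:
        return replacement
    return obj


def find_replace(data, target, replacement, changed_only=False):
    """Replace in every item; optionally keep only items that actually changed."""
    out = []
    for d in data:
        new_d = replace_in(d, target, replacement)
        if not changed_only or new_d != d:
            out.append(new_d)
    return out


data = [
    {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "b": True, "d2": {"c": "blue", "l": [3, 4]}},
    {"s": "CDE", "b": False, "d2": {"c": "green", "l": [5, 6]}},
]

renamed = find_replace(data, "BC", "XY", changed_only=True)
flipped = find_replace(data, True, False, changed_only=True)
scaled = find_replace(data, 3, 30, changed_only=True)
```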

flatten method

KnowledgeAsset.flatten(
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Flatten data items or parts of them.

Uses KnowledgeAsset.apply on FlattenAssetFunc.

Use argument path to specify what part of the data item should be flattened. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Keyword arguments are passed to flatten_obj.

Usage

>>> asset.flatten().get()
[{'s': 'ABC',
  'b': True,
  ('d2', 'c'): 'red',
  ('d2', 'l', 0): 1,
  ('d2', 'l', 1): 2},
  ...
 {'s': 'EFG',
  'b': False,
  ('d2', 'c'): 'black',
  ('d2', 'l', 0): 9,
  ('d2', 'l', 1): 10,
  'xyz': 123}]
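
The shape of the flattened output can be reproduced with a short recursive function. This is a sketch that mirrors the output format shown above, not flatten_obj itself; the name `flatten` and its handling of edge cases (empty containers simply disappear) are assumptions.

```python
def flatten(obj, prefix=()):
    """Flatten nested dicts/lists into one dict keyed by path tuples."""
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = enumerate(obj)
    else:
        # Leaf value: single-token paths keep the bare key, as in the output above
        key = prefix[0] if len(prefix) == 1 else prefix
        return {key: obj}
    flat = {}
    for token, value in pairs:
        flat.update(flatten(value, prefix + (token,)))
    return flat


item = {"s": "EFG", "b": False, "d2": {"c": "black", "l": [9, 10]}, "xyz": 123}
flat_item = flatten(item)
```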

from_json_bytes class method

KnowledgeAsset.from_json_bytes(
    bytes_,
    compression=None,
    decompress_kwargs=None,
    **kwargs
)

Build KnowledgeAsset from JSON bytes.


from_json_file class method

KnowledgeAsset.from_json_file(
    path,
    compression=None,
    decompress_kwargs=None,
    **kwargs
)

Build KnowledgeAsset from a JSON file.


get method

KnowledgeAsset.get(
    path=None,
    keep_path=None,
    skip_missing=None,
    source=None,
    template_context=None,
    **kwargs
)

Get data items or parts of them.

Uses KnowledgeAsset.apply on GetAssetFunc.

Use argument path to specify what part of the data item should be retrieved. For example, "x.y[0].z" to navigate nested dictionaries/lists. If keep_path is True, the data item will be represented as a nested dictionary with path as keys. If multiple paths are provided, keep_path automatically becomes True, and they will be merged into one nested dictionary. If skip_missing is True and path is missing in the data item, will skip the data item.

Use argument source instead of path or in addition to path to also preprocess the source. It can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Usage

>>> asset.get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]

>>> asset.get("d2.l[0]")
[1, 3, 5, 7, 9]

>>> asset.get("d2.l", source=lambda x: sum(x))
[3, 7, 11, 15, 19]

>>> asset.get("d2.l[0]", keep_path=True)
[{'d2': {'l': {0: 1}}},
 {'d2': {'l': {0: 3}}},
 {'d2': {'l': {0: 5}}},
 {'d2': {'l': {0: 7}}},
 {'d2': {'l': {0: 9}}}]

>>> asset.get(["d2.l[0]", "d2.l[1]"])
[{'d2': {'l': {0: 1, 1: 2}}},
 {'d2': {'l': {0: 3, 1: 4}}},
 {'d2': {'l': {0: 5, 1: 6}}},
 {'d2': {'l': {0: 7, 1: 8}}},
 {'d2': {'l': {0: 9, 1: 10}}}]

>>> asset.get("xyz", skip_missing=True)
[123]
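
Path navigation of the "x.y[0].z" kind can be sketched with a small tokenizer and a loop. The helpers `parse_path`, `get_path`, and `get` below are hypothetical; in particular, raising KeyError when a path is missing and skip_missing is False is a choice of this sketch, not a statement about the library.

```python
import re

_MISSING = object()


def parse_path(path):
    """Split a path like "d2.l[0]" into tokens ["d2", "l", 0]."""
    tokens = []
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        tokens.append(name if name else int(index))
    return tokens


def get_path(item, path):
    """Follow the tokens into nested dicts/lists; return _MISSING if a token fails."""
    obj = item
    for token in parse_path(path):
        try:
            obj = obj[token]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return obj


def get(data, path, skip_missing=False):
    out = []
    for d in data:
        value = get_path(d, path)
        if value is _MISSING:
            if skip_missing:
                continue  # silently drop items that lack the path
            raise KeyError(path)
        out.append(value)
    return out


data = [
    {"s": "ABC", "d2": {"c": "red", "l": [1, 2]}},
    {"s": "BCD", "d2": {"c": "blue", "l": [3, 4]}},
    {"s": "EFG", "d2": {"c": "black", "l": [9, 10]}, "xyz": 123},
]

first_elements = get(data, "d2.l[0]")
only_xyz = get(data, "xyz", skip_missing=True)
```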

get_items method

KnowledgeAsset.get_items(
    index
)

Get one or more data items.


get_keys_and_groups class method

KnowledgeAsset.get_keys_and_groups(
    by,
    uniform_groups=False
)

Get keys and groups.


groupby_reduce method

KnowledgeAsset.groupby_reduce(
    func,
    *args,
    by=None,
    uniform_groups=None,
    get_kwargs=None,
    execute_kwargs=None,
    return_group_keys=False,
    **kwargs
)

Group data items by keys and reduce.

If by is provided, uses it as path in KnowledgeAsset.get, groups by unique values, and runs KnowledgeAsset.reduce on each group.

Set uniform_groups to True to only group unique values that are located adjacent to each other.

Variable arguments are passed to each call of KnowledgeAsset.reduce.
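
The effect of uniform_groups can be illustrated with the standard library. The sketch below reads "adjacent" as consecutive runs of equal keys, which is an interpretation; `group_items`, `groupby_reduce`, and `add` are hypothetical helpers, and `by` is a plain function here rather than a path.

```python
from functools import reduce
from itertools import groupby


def group_items(data, by, uniform_groups=False):
    """Return a list of (key, items) pairs."""
    if uniform_groups:
        # Only adjacent items with equal keys share a group
        return [(k, list(g)) for k, g in groupby(data, key=by)]
    groups = {}
    for d in data:
        groups.setdefault(by(d), []).append(d)
    return list(groups.items())


def groupby_reduce(data, func, by, uniform_groups=False):
    return [reduce(func, items) for _, items in group_items(data, by, uniform_groups)]


def add(d1, d2):
    return {"k": d1["k"], "n": d1["n"] + d2["n"]}


data = [
    {"k": "a", "n": 1},
    {"k": "a", "n": 2},
    {"k": "b", "n": 3},
    {"k": "a", "n": 4},
]

by_value = groupby_reduce(data, add, by=lambda d: d["k"])
by_run = groupby_reduce(data, add, by=lambda d: d["k"], uniform_groups=True)
```

In this reading, the trailing "a" item joins the first group only when grouping is by unique value; with adjacent-only grouping it forms a group of its own.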


insert method

KnowledgeAsset.insert(
    index,
    value
)

S.insert(index, value) -- insert value before index


join method

KnowledgeAsset.join(
    separator=None
)

Join the list of string data items.


merge class method

KnowledgeAsset.merge(
    *objs,
    flatten_kwargs=None,
    **kwargs
)

Either merge multiple KnowledgeAsset instances into one if called as a class method or instance method with at least one additional object, or merge data items of a single instance if called as an instance method with no additional objects.

Usage

>>> asset1 = asset.select(["s"])
>>> asset2 = asset.select(["b", "d2"])
>>> asset1.merge(asset2).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

merge_dicts method

KnowledgeAsset.merge_dicts(
    **kwargs
)

Merge (dict) data items into a single dict.

Final keyword arguments are passed to merge_dicts.


merge_lists method

KnowledgeAsset.merge_lists(
    **kwargs
)

Merge (list) data items into a single list.


modify_data method

KnowledgeAsset.modify_data(
    data
)

Modify data in place.


move method

KnowledgeAsset.move(
    path,
    new_path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Move data items or parts of them.

Uses KnowledgeAsset.apply on MoveAssetFunc.

Use argument path to specify what part of the data item should be moved. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Use argument new_path to specify the path that the part should be moved to. Multiple paths can be provided. If None, path must be a dictionary that maps each path to its new path.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.move("d2.l", "l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red'}, 'l': [1, 2]},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue'}, 'l': [3, 4]},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green'}, 'l': [5, 6]},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow'}, 'l': [7, 8]},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black'}, 'xyz': 123, 'l': [9, 10]}]

>>> asset.move({"d2.c": "c", "b": "d2.b"}).get()
>>> asset.move(["d2.c", "b"], ["c", "d2.b"]).get()
[{'s': 'ABC', 'd2': {'l': [1, 2], 'b': True}, 'c': 'red'},
 {'s': 'BCD', 'd2': {'l': [3, 4], 'b': True}, 'c': 'blue'},
 {'s': 'CDE', 'd2': {'l': [5, 6], 'b': False}, 'c': 'green'},
 {'s': 'DEF', 'd2': {'l': [7, 8], 'b': False}, 'c': 'yellow'},
 {'s': 'EFG', 'd2': {'l': [9, 10], 'b': False}, 'xyz': 123, 'c': 'black'}]
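
Conceptually, a move pops the value at the old path and sets it at the new one. The sketch below supports only dotted dictionary paths (no list indices) and always works on a deep copy; the function `move` is a hypothetical stand-in for illustration.

```python
import copy


def move(item, path, new_path):
    """Return a copy of `item` with the value at `path` moved to `new_path`."""
    item = copy.deepcopy(item)
    *parents, last = path.split(".")
    source = item
    for token in parents:
        source = source[token]
    value = source.pop(last)
    *new_parents, new_last = new_path.split(".")
    target = item
    for token in new_parents:
        # Create intermediate dictionaries on the way to the new location
        target = target.setdefault(token, {})
    target[new_last] = value
    return item


item = {"s": "ABC", "b": True, "d2": {"c": "red", "l": [1, 2]}}
moved_up = move(item, "d2.l", "l")
moved_down = move(item, "b", "d2.b")
```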

print method

KnowledgeAsset.print(
    *args,
    **kwargs
)

Convert to a context and print.

Uses KnowledgeAsset.to_context.


print_sample method

KnowledgeAsset.print_sample(
    k=None,
    seed=None,
    **kwargs
)

Print a random sample.

Keyword arguments are passed to KnowledgeAsset.print.


print_schema method

KnowledgeAsset.print_schema(
    **kwargs
)

Print schema.

Keyword arguments are split between KnowledgeAsset.describe and dir_tree_from_paths.

Usage

>>> asset.print_schema()
/
├── s [5/5, str]
├── b [2/5, bool]
├── d2 [5/5, dict]
│   ├── c [5/5, str]
│   └── l
│       ├── 0 [5/5, int]
│       └── 1 [5/5, int]
└── xyz [1/5, int]

2 directories, 6 files

query method

KnowledgeAsset.query(
    expression,
    query_engine=None,
    template_context=None,
    return_type=None,
    **kwargs
)

Query using an engine and return the queried data item(s).

The following engines are supported:

  • "jmespath": Evaluation with jmespath package
  • "jsonpath", "jsonpath-ng" or "jsonpath_ng": Evaluation with jsonpath-ng package
  • "jsonpath.ext", "jsonpath-ng.ext" or "jsonpath_ng.ext": Evaluation with extended jsonpath-ng package
  • None or "template": Evaluation of each data item as a template. The index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names. Uses KnowledgeAsset.apply on QueryAssetFunc.
  • "pandas": Same as above but variables being columns

If return_type is "item", returns the data item when matched. If return_type is "bool", returns True when matched.

Templates can also use the functions defined in search_config.

They work on single values and sequences alike.

Keyword arguments are passed to the respective search/parse/evaluation function.

Usage

>>> asset.query("d['s'] == 'ABC'")
>>> asset.query("x['s'] == 'ABC'")
>>> asset.query("s == 'ABC'")
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}}]

>>> asset.query("x['s'] == 'ABC'", return_type="bool")
[True, False, False, False, False]

>>> asset.query("find('BC', s)")
>>> asset.query(lambda s: "BC" in s)
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

>>> asset.query("[?contains(s, 'BC')].s", query_engine="jmespath")
['ABC', 'BCD']

>>> asset.query("[].d2.c", query_engine="jmespath")
['red', 'blue', 'green', 'yellow', 'black']

>>> asset.query("[?d2.c != `blue`].d2.l", query_engine="jmespath")
[[1, 2], [5, 6], [7, 8], [9, 10]]

>>> asset.query("$[*].d2.c", query_engine="jsonpath.ext")
['red', 'blue', 'green', 'yellow', 'black']

>>> asset.query("$[?(@.b == true)].s", query_engine="jsonpath.ext")
['ABC', 'BCD']

>>> asset.query("s[b]", query_engine="pandas")
['ABC', 'BCD']

rank method

KnowledgeAsset.rank(
    query,
    to_documents_kwargs=None,
    wrap_documents=None,
    cache_documents=False,
    cache_key=None,
    asset_cache_manager=None,
    asset_cache_manager_kwargs=None,
    silence_warnings=False,
    **kwargs
)

Rank documents by their similarity to a query.

First, converts to TextDocument format using KnowledgeAsset.to_documents and **to_documents_kwargs. Then, uses rank_documents with **kwargs for actual ranking.

If cache_documents is True and cache_key is not None, will use an asset cache manager to store the generated text documents in a local and/or disk cache after conversion. Running the same method again will use the cached documents.


reduce method

KnowledgeAsset.reduce(
    func,
    *args,
    initializer=None,
    by=None,
    template_context=None,
    show_progress=None,
    pbar_kwargs=None,
    wrap=None,
    return_iterator=False,
    **kwargs
)

Reduce data items.

Function can be a callable, a tuple of function and its arguments, a Task instance, a subclass of AssetFunc or its prefix or full name. It can also be an expression or a template. In this template, the index of the data item is represented by "i", the data items themselves are represented by "d1" and "d2" or "x1" and "x2".

If an initializer is provided, the first set of values will be d1=initializer and d2=self.data[0]. If not, it will be d1=self.data[0] and d2=self.data[1].

If by is provided, see KnowledgeAsset.groupby_reduce.

If wrap is True, returns a new KnowledgeAsset instance, otherwise raw output.

Usage

>>> asset.reduce(lambda d1, d2: vbt.merge_dicts(d1, d2))
>>> asset.reduce(vbt.merge_dicts)
>>> asset.reduce("{**d1, **d2}")
{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}

>>> asset.reduce("{**d1, **d2}", by="b")
[{'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
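
The pairing of d1 and d2 with and without an initializer follows the same convention as functools.reduce, which the sketch below uses directly. The sample data and the helper `merge` are made up for illustration.

```python
from functools import reduce

data = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]


def merge(d1, d2):
    # Later items win on key conflicts, like "{**d1, **d2}"
    return {**d1, **d2}


# Without an initializer, the first call receives data[0] and data[1]
without_init = reduce(merge, data)

# With an initializer, the first call receives the initializer and data[0]
with_init = reduce(merge, data, {"a": 0, "z": 26})
```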

remove method

KnowledgeAsset.remove(
    path,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Remove data items or parts of them.

If path is an integer, removes the entire data item at that index.

Uses KnowledgeAsset.apply on RemoveAssetFunc.

Use argument path to specify what part of the data item should be removed. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.remove("d2.l[0]").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [10]}, 'xyz': 123}]

>>> asset.remove("xyz", skip_missing=True).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

remove_empty method

KnowledgeAsset.remove_empty(
    inplace=False
)

Remove empty data items.


rename method

KnowledgeAsset.rename(
    path,
    new_token=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Rename data items or parts of them.

Uses KnowledgeAsset.apply on RenameAssetFunc.

Same as KnowledgeAsset.move but must specify new token instead of new path.

Usage

>>> asset.rename("d2.l", "x").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'x': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'x': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'x': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'x': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'x': [9, 10]}, 'xyz': 123}]

>>> asset.rename("xyz", "zyx", skip_missing=True, changed_only=True).get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'zyx': 123}]

reorder method

KnowledgeAsset.reorder(
    new_order,
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    template_context=None,
    **kwargs
)

Reorder data items or parts of them.

Uses KnowledgeAsset.apply on ReorderAssetFunc.

Can change order in dicts based on reorder_dict and sequences based on reorder_list.

Argument new_order can be a sequence of tokens. To not reorder a subset of keys, they can be replaced by an ellipsis (...). For example, ["a", ..., "z"] puts the token "a" at the start and the token "z" at the end while other tokens are left in the original order. If new_order is a string, it can be "asc"/"ascending" or "desc"/"descending". Other than that, it can be a string or function (will become a template), or any custom template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Use argument path to specify what part of the data item should be reordered. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.reorder(["xyz", ...], skip_missing=True).get()
>>> asset.reorder(lambda x: ["xyz", ...] if "xyz" in x else [...]).get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'xyz': 123, 's': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}}]

>>> asset.reorder("descending", path="d2.l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [2, 1]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [4, 3]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [6, 5]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [8, 7]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [10, 9]}, 'xyz': 123}]
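
The ellipsis convention for dictionaries can be sketched in a few lines. This is an illustration of the described behavior, not reorder_dict itself; `reorder_keys` is a hypothetical helper, and keys that are neither named nor covered by an ellipsis are dropped in this sketch.

```python
def reorder_keys(d, new_order):
    """Reorder dict keys; an Ellipsis stands for all keys not named explicitly."""
    named = [k for k in new_order if k is not Ellipsis]
    rest = [k for k in d if k not in named]
    ordered = []
    for k in new_order:
        if k is Ellipsis:
            ordered.extend(rest)  # untouched keys keep their original order
        else:
            ordered.append(k)
    return {k: d[k] for k in ordered}


item = {"s": "EFG", "b": False, "d2": {"c": "black"}, "xyz": 123}
xyz_first = list(reorder_keys(item, ["xyz", ...]))
s_last = list(reorder_keys(item, ["b", ..., "s"]))
```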

sample method

KnowledgeAsset.sample(
    k=None,
    seed=None,
    wrap=True
)

Pick a random sample of data items.


select method

KnowledgeAsset.select(
    *args,
    **kwargs
)

Call KnowledgeAsset.get and return a new KnowledgeAsset instance.


set method

KnowledgeAsset.set(
    value,
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    template_context=None,
    **kwargs
)

Set data items or parts of them.

Uses KnowledgeAsset.apply on SetAssetFunc.

Argument value can be any value, function (will become a template), or a template. In this template, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.

Use argument path to specify what part of the data item should be set. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Usage

>>> asset.set(lambda d: sum(d["d2"]["l"])).get()
[3, 7, 11, 15, 19]

>>> asset.set(lambda d: sum(d["d2"]["l"]), path="d2.sum").get()
>>> asset.set(lambda x: sum(x["l"]), path="d2.sum").get()
>>> asset.set(lambda l: sum(l), path="d2.sum").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2], 'sum': 3}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4], 'sum': 7}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6], 'sum': 11}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8], 'sum': 15}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10], 'sum': 19}, 'xyz': 123}]

>>> asset.set(lambda l: sum(l), path="d2.l").get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': 3}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': 7}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': 11}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': 15}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': 19}, 'xyz': 123}]

set_items method

KnowledgeAsset.set_items(
    index,
    value,
    inplace=False
)

Set one or more data items.

Returns a new KnowledgeAsset instance if inplace is False.


shuffle method

KnowledgeAsset.shuffle(
    seed=None,
    inplace=False
)

Shuffle data items.


single_item class property

Whether this instance holds a single item.


sort method

KnowledgeAsset.sort(
    *args,
    keys=None,
    ascending=True,
    inplace=False,
    **kwargs
)

Sort based on KnowledgeAsset.get called on *args and **kwargs.

Returns a new KnowledgeAsset instance if inplace is False.

Usage

>>> asset.sort("d2.c").get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}}]

split_text method

KnowledgeAsset.split_text(
    text_path=None,
    merge_chunks=None,
    **kwargs
)

Split text.

Uses KnowledgeAsset.apply on SplitTextAssetFunc.

Use argument text_path to specify a path to the content.

If merge_chunks is True, merges all chunks into a single list.

Uses split_text with **split_text_kwargs for text splitting.


to_context method

KnowledgeAsset.to_context(
    *args,
    dump_all=None,
    separator=None,
    **kwargs
)

Convert to a context.

If dump_all is True, calls KnowledgeAsset.dump_all with *args and **kwargs. Otherwise, calls KnowledgeAsset.dump.

Finally, calls KnowledgeAsset.join with separator.


to_documents method

KnowledgeAsset.to_documents(
    **kwargs
)

Convert to documents of type TextDocument.

Document-related keyword arguments may contain templates. In such templates, the index of the data item is represented by "i", the data item itself is represented by "d", the data item under the path is represented by "x" while its fields are represented by their names.


unflatten method

KnowledgeAsset.unflatten(
    path=None,
    skip_missing=None,
    make_copy=None,
    changed_only=None,
    **kwargs
)

Unflatten data items or parts of them.

Uses KnowledgeAsset.apply on UnflattenAssetFunc.

Use argument path to specify what part of the data item should be unflattened. For example, "x.y[0].z" to navigate nested dictionaries/lists. Multiple paths can be provided. If skip_missing is True and path is missing in the data item, will skip the data item.

Set make_copy to True to not modify original data.

Set changed_only to True to keep only the data items that have been changed.

Keyword arguments are passed to unflatten_obj.

Usage

>>> asset.flatten().unflatten().get()
[{'s': 'ABC', 'b': True, 'd2': {'c': 'red', 'l': [1, 2]}},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}},
 {'s': 'CDE', 'b': False, 'd2': {'c': 'green', 'l': [5, 6]}},
 {'s': 'DEF', 'b': False, 'd2': {'c': 'yellow', 'l': [7, 8]}},
 {'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123}]
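
The reverse direction can be sketched by rebuilding nested dictionaries from path tuples and then converting integer-keyed levels back into lists. This is not unflatten_obj: the helpers `unflatten` and `_listify` are hypothetical, and the rule "keys 0..n-1 become a list" is an assumption of the sketch.

```python
def unflatten(flat):
    """Rebuild nested dicts/lists from a dict keyed by path tuples."""
    root = {}
    for key, value in flat.items():
        path = key if isinstance(key, tuple) else (key,)
        node = root
        for token in path[:-1]:
            node = node.setdefault(token, {})
        node[path[-1]] = value
    return _listify(root)


def _listify(node):
    """Turn dicts keyed by 0..n-1 into lists (an assumption of this sketch)."""
    if not isinstance(node, dict):
        return node
    node = {k: _listify(v) for k, v in node.items()}
    keys_are_ints = bool(node) and all(isinstance(k, int) for k in node)
    if keys_are_ints and sorted(node) == list(range(len(node))):
        return [node[i] for i in range(len(node))]
    return node


flat_item = {
    "s": "ABC",
    "b": True,
    ("d2", "c"): "red",
    ("d2", "l", 0): 1,
    ("d2", "l", 1): 2,
}
nested_item = unflatten(flat_item)
```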

unique method

KnowledgeAsset.unique(
    *args,
    keep='first',
    inplace=False,
    **kwargs
)

De-duplicate based on KnowledgeAsset.get called on *args and **kwargs.

Returns a new KnowledgeAsset instance if inplace is False.

Usage

>>> asset.unique("b").get()
[{'s': 'EFG', 'b': False, 'd2': {'c': 'black', 'l': [9, 10]}, 'xyz': 123},
 {'s': 'BCD', 'b': True, 'd2': {'c': 'blue', 'l': [3, 4]}}]

MetaKnowledgeAsset class

MetaKnowledgeAsset(
    name,
    bases,
    attrs
)

Metaclass for KnowledgeAsset.

Superclasses