Data¶

Question

There are plenty of supported data sources for OHLC and indicator data. For the full list, see the custom module.

Listing¶

Many data classes have a class method to list all symbols that can be fetched. Usually, such as method starts with list_, for example, TVData.list_symbols, SQLData.list_tables, or CSVData.list_paths. In addition, most methods allow client-side filtering of symbols by a glob-style or regex-style pattern.

How to list symbols

all_symbols = vbt.BinanceData.list_symbols()  # (1)!
usdt_symbols = vbt.BinanceData.list_symbols("*USDT")  # (2)!
usdt_symbols = vbt.BinanceData.list_symbols(r"^.+USDT$", use_regex=True)

all_symbols = vbt.TVData.list_symbols()  # (3)!
nasdaq_symbols = vbt.TVData.list_symbols(exchange_pattern="NASDAQ")  # (4)!
btc_symbols = vbt.TVData.list_symbols(symbol_pattern="BTC*")  # (5)!
pl_symbols = vbt.TVData.list_symbols(market="poland")  # (6)!
usdt_symbols = vbt.TVData.list_symbols(fields=["currency"], filter_by=["USDT"])  # (7)!

def filter_by(market_cap_basic):
    if market_cap_basic is None:
        return False
    return market_cap_basic >= 1_000_000_000_000

trillion_symbols = vbt.TVData.list_symbols(  # (8)!
    fields=["market_cap_basic"], 
    filter_by=vbt.RepFunc(filter_by)
)

all_paths = vbt.FileData.list_paths()  # (9)!
csv_paths = vbt.CSVData.list_paths()  # (10)!
all_csv_paths = vbt.CSVData.list_paths("**/*.csv")  # (11)!
all_data_paths = vbt.HDFData.list_paths("data.h5")  # (12)!
all_paths = vbt.HDFData.list_paths()  # (13)!

all_schemas = vbt.SQLData.list_schemas(engine=engine)  # (14)!
all_tables = vbt.SQLData.list_tables(engine=engine)  # (15)!

List all Binance symbols
List Binance symbols ending with "USDT"
List all TradingView symbols
List TradingView symbols traded on NASDAQ
List TradingView symbols starting with "BTC"
List TradingView symbols traded in Poland
List TradingView symbols traded in USD currency
List TradingView symbols that have a market capitalization of 1 trillion or higher
List all files under the current directory
List CSV files under the current directory
List CSV files under the current directory and all subdirectories
List all keys in an HDF file "data.h5"
List all keys in all HDF files under the current directory
List all schemas in a SQL database
List all tables in a SQL database

Pulling¶

Each data class has the method fetch_symbol() for fetching a single symbol and returning raw data, usually in a form of a DataFrame. To return a data instance, the method pull() should be used, which takes one or multiple symbols, calls fetch_symbol() on each one, and aligns all DataFrames. For testing, use YFData, which is easy to use but poor in terms of quality. For production, use more reliable data sources, such as CCXTData for crypto and AlpacaData for stocks. For technical analysis based on the most recent data, use TVData (TradingView).

Hint

To see what arguments a data class like YFData accepts, use vbt.phelp(vbt.YFData.fetch_symbol).

How to fetch data

data = vbt.YFData.pull("AAPL")  # (1)!
data = vbt.YFData.pull(["AAPL", "MSFT"])  # (2)!
data = vbt.YFData.pull("AAPL", start="2020")  # (3)!
data = vbt.YFData.pull("AAPL", start="2020", end="2021")  # (4)!
data = vbt.YFData.pull("AAPL", start="1 month ago")  # (5)!
data = vbt.YFData.pull("AAPL", start="1 month ago", timeframe="hourly")  # (6)!
data = vbt.YFData.pull("AAPL", tz="UTC")  # (7)!
data = vbt.YFData.pull(symbols, execute_kwargs=dict(engine="threadpool"))  # (8)!

data = vbt.YFData.pull("AAPL", auto_adjust=False)  # (9)!
data = vbt.BinanceData.pull("BTCUSDT", klines_type="futures")  # (10)!
data = vbt.CCXTData.pull("BTCUSDT", exchange="binanceusdm")  # (11)!
data = vbt.BinanceData.pull("BTCUSDT", tld="us")  # (12)!
data = vbt.TVData.pull("CRYPTOCAP:TOTAL")  # (13)!

Pull all data of one symbol. Note that some data classes will return only a subset of data by default.
Pull all data of multiple symbols. They will be fetched in a sequential manner (one after another).
Pull data starting from 2020-01-01 (inclusive). Dates can be provided as strings, datetime objects, and pd.Timestamp objects. Dates are given the same timezone as the ticker unless the timezone is specified.
Pull data between 2020-01-01 (inclusive) and 2021-01-01 (exclusive)
Pull data starting 1 month ago. Both start and end arguments support human-readable strings.
Pull hourly data starting 1 month ago. Timeframes support human-readable strings.
Pull all data and convert it to the UTC timezone. Use this when multiple tickers have different timezones.
Pull multiple symbols in parallel. Note that many data providers have strict API rate limits that might lead to a ban if they are exceeded often.
Turn off auto-adjustment
Pull BTC/USDT futures data from Binance
Same as above but by using CCXT
Pass tld if you are using an exchange from the US, Japan or other TLD
Pull Crypto Total Market Cap

+

To provide different keyword arguments for different symbols, either pass an argument as symbol_dict or pass a dictionary with keyword arguments keyed by symbol as the first argument.

How to provide keyword arguments by symbol

data = vbt.TVData.pull(
    ["SPX", "NDX", "VIX"],
    exchange=vbt.symbol_dict({"SPX": "SP", "NDX": "NASDAQ", "VIX": "CBOE"})
)
data = vbt.TVData.pull({  # (1)!
    "SPX": dict(exchange="SP"),
    "NDX": dict(exchange="NASDAQ"),
    "VIX": dict(exchange="CBOE")
})
data = vbt.TVData.pull(["SP:SPX", "NASDAQ:NDX", "CBOE:VIX"])  # (2)!

Same as above
Same as above but now symbols will be prefixed by their exchange

+

If your data provider of choice takes credentials and you want to fetch multiple symbols, the client will be created for each symbol leading to multiple authentications and a slower execution. To avoid that, create the client in advance and then pass to the fetch() method.

Create a Data client in advance

client = vbt.TVData.resolve_client(username="YOUR_USERNAME", password="YOUR_PASSWORD")

Use the Data client

data = vbt.TVData.pull(["NASDAQ:AAPL", "NASDAQ:MSFT"], client=client)

# ______________________________________________________________

vbt.TVData.set_custom_settings(client=client)
data = vbt.TVData.pull(["NASDAQ:AAPL", "NASDAQ:MSFT"])

Persisting¶

Once fetched, the data can be saved in a variety of ways. The most common and recommended way is by pickling the data, which will save the entire object including the arguments used during fetching. Another ways include CSV files (Data.to_csv), HDF files (Data.to_hdf) and more, which will save only the data but not the accompanied metadata such as the timeframe.

How to save data

data.save()  # (1)!
data.save(compression="blosc")  # (2)!

data.to_csv("data", mkdir_kwargs=dict(mkdir=True))  # (3)!
data.to_csv("AAPL.csv")  # (4)!
data.to_csv("AAPL.tsv", sep="\t")  # (5)!
data.to_csv(vbt.symbol_dict(AAPL="AAPL.csv", MSFT="MSFT.csv"))  # (6)!
data.to_csv(vbt.RepEval("symbol + '.csv'"))  # (7)!

data.to_hdf("data")  # (8)!
data.to_hdf("data.h5")  # (9)!
data.to_hdf("data.h5", key=vbt.RepFunc(lambda symbol: symbol.replace(" ", "_")))  # (10)!
data.to_hdf("data.h5", key=vbt.RepFunc(lambda symbol: "stocks/" + symbol))  # (11)!
data.to_hdf(vbt.RepEval("symbol + '.h5'"), key="df")  # (12)!

data.to_parquet("data")  # (13)!
data.to_parquet(vbt.symbol_dict(
    AAPL="data/AAPL.parquet", 
    MSFT="data/MSFT.parquet"
))  # (14)!
data.to_parquet("data", partition_by="Y")  # (15)!
data.to_parquet(vbt.symbol_dict(
    AAPL="data/AAPL", 
    MSFT="data/MSFT"
), partition_by="Y")  # (16)!

data.to_sql(engine="sqlite:///data.db")  # (17)!
data.to_sql(engine="postgresql+psycopg2://postgres:admin@localhost:5432/data")  # (18)!
data.to_sql(engine=engine, schema="yahoo")  # (19)!
data.to_sql(engine=engine, table=vbt.symbol_dict(AAPL="AAPL", MSFT="MSFT"))  # (20)!
data.to_sql(engine=engine, if_exists="replace")  # (21)!
data.to_sql(engine=engine, attach_row_number=True)  # (22)!
data.to_sql(
    engine=engine, 
    attach_row_number=True, 
    row_number_column="RN",
    from_row_number=vbt.symbol_dict(AAPL=100, MSFT=200), 
    if_exists="append"
)  # (23)!

Serialize and save to a pickle file (recommended). The filename will be {class_name}.pickle, such as "YFData.pickle".
Serialize, apply a compression algorithm to reduce the size, and save to a pickle file
Save one CSV file per symbol to a "data" directory. If the directory doesn't exist, create one. Each filename will be {symbol}.csv, such as "AAPL.csv".
If there's only one symbol, save it to a comma-delimited file named "AAPL.csv"
Same but for a tab-delimited file
Specify the file path per symbol
Use a template to choose the file path based on the symbol
Save all symbols to a single HDF file in the directory "data". The filename will be {class_name}.h5, such as "YFData.pickle". Each symbol will be saved as a separate key, such as "AAPL".
Specify the path to the HDF file
Use a template to choose the key based on the symbol. Here, the space in each symbol is substituted by the underscore.
Use a template to save all symbols under the group "stocks" in the HDF file
Save each symbol to a separate HDF file with a key "df"
Save one Parquet file per symbol to a "data" directory. Each filename will be {symbol}.parquet, such as "AAPL.parquet".
Same as above but specify the path to each file
Partition each symbol by year start and save it to a separate sub-directory within the directory "data". Each sub-directory will be named after the symbol.
Same as above but specify the path to each sub-directory
Save each DataFrame as a table to a SQLite database
Same as above but to a PostgreSQL database
Specify a schema (if the database supports it!)
Specify each table name explicitly
Drop any table if it already exists
Attach a column with row numbers to each DataFrame to be able to query them later
Same as above but generate row numbers from a specific number depending on the symbol, label the column as "RN", and append to an already existing table

+

Once saved, the data can be loaded with the corresponding class method.

How to load data

data = vbt.YFData.load()  # (1)!

data = vbt.Data.from_csv("data")  # (2)!
data = vbt.Data.from_csv("data/*.csv")  # (3)!
data = vbt.Data.from_csv("data/*/**.csv")  # (4)!
data = vbt.Data.from_csv(symbols=["BTC-USD.csv", "ETH-USD.csv"])  # (5)!
data = vbt.Data.from_csv(features=["High.csv", "Low.csv"])  # (6)!
data = vbt.Data.from_csv("BTC-USD", paths="polygon_btc_1hour.csv")  # (7)!
data = vbt.Data.from_csv("AAPL.tsv", sep="\t")  # (8)!
data = vbt.Data.from_csv(["MSFT.csv", "AAPL.tsv"], sep=vbt.symbol_dict(MSFT=",", AAPL="\t"))  # (9)!
data = vbt.Data.from_csv("https://datahub.io/core/s-and-p-500/r/data.csv", match_paths=False)  # (10)!

data = vbt.Data.from_hdf("data")  # (11)!
data = vbt.Data.from_hdf("data.h5")  # (12)!
data = vbt.Data.from_hdf("data.h5/AAPL")  # (13)!
data = vbt.Data.from_hdf(["data.h5/AAPL", "data.h5/MSFT"])  # (14)!
data = vbt.Data.from_hdf(["AAPL", "MSFT"], paths="data.h5", match_paths=False)
data = vbt.Data.from_hdf("data.h5/stocks/*")  # (15)!

data = vbt.Data.from_parquet("data")  # (16)!
data = vbt.Data.from_parquet("AAPL.parquet")  # (17)!
data = vbt.Data.from_parquet("AAPL")  # (18)!

data = vbt.Data.from_sql(engine="sqlite:///data.db")  # (19)!
data = vbt.Data.from_sql("AAPL", engine=engine)  # (20)!
data = vbt.Data.from_sql("yahoo:AAPL", engine=engine)  # (21)!
data = vbt.Data.from_sql("AAPL", schema="yahoo", engine=engine)  # (22)!
data = vbt.Data.from_sql("AAPL", query="SELECT * FROM AAPL", engine=engine)  # (23)!

data = vbt.BinanceData.from_csv("BTCUSDT.csv", fetch_kwargs=dict(timeframe="hourly"))  # (24)!

Load from the pickle file named "YFData.pickle" and deserialize into a Python object (recommended)
Pull all CSV files in the directory named "data"
Same as above
Same as above but recursively
Pull two symbols into a symbol-oriented data instance
Pull two features into a feature-oriented data instance
Pull data from the file "polygon_btc_1hour.csv" and rename the symbol to "BTC-USD"
Pull one symbol that is stored in the tab-delimited file "AAPL.tsv"
Pull one symbol from the comma-delimited file "MSFT.csv" and another from the tab-delimited file "AAPL.tsv"
Pull CSV data from a URL
Pull all symbols in all HDF files in the directory named "data", recursively
Pull all symbols in the HDF file named "data.h5"
Pull the symbol "AAPL" from the HDF file named "data.h5"
Pull the symbols "AAPL" and "MSFT" from the HDF file named "data.h5"
Pull all symbols under the group "stocks" of the HDF file named "data.h5"
Pull all Parquet files and partitioned sub-directories in the directory named "data"
Pull one symbol "AAPL" from the Parquet file named "AAPL.parquet"
Pull one symbol "AAPL" from the partitioned Parquet directory named "AAPL"
Pull all symbols from a SQLite database stored locally in the file "data.db"
Pull one symbol "AAPL" by reading a table with the same name
Pull one symbol "AAPL" by reading a table with the same name from the schema "yahoo"
Same as above but schema won't become part of the symbol
Pull one symbol "AAPL" by executing an arbitrary SQL query
Pull one symbol from the file "BTCUSDT.csv" and wrap it with the BinanceData class to be able to update it later. To avoid specifying the timeframe while updating, provide it via fetch_kwargs.

Updating¶

Some data classes support fetching and appending new data to previously saved data by overriding the method Data.update_symbol, which scans the data for the latest timestamp and uses it as the start timestamp for fetching new data with Data.fetch_symbol. The method Data.update then does it for each symbol in the data instance. There's no need to provide the client, timeframe, or other arguments since they were captured during fetching and are reused automatically (unless they were lost by converting the data instance to Pandas, CSV, or HDF!).

Download 1-minute data and update it later

data = vbt.YFData.pull("AAPL", timeframe="1 minute")

# (1)!

data = data.update()  # (2)!

...wait a minute...
Returns a new instance

Download one year of data at a time

start = 2010
end = 2020
data = None
while start < end:
    if data is None:
        data = vbt.YFData.pull("AAPL", start=str(start), end=str(start + 1))
    else:
        data = data.update(end=str(start + 1))
    start += 1

Wrapping¶

Custom DataFrame can be wrapped into a data instance by using Data.from_data, which takes either a single DataFrame for one symbol, or a dict with more DataFrames keyed by their symbols.

Create a new Data instance

data = ohlc_df.vbt.ohlcv.to_data()  # (1)!
data = vbt.Data.from_data(ohlc_df)

data = close_df.vbt.to_data()  # (2)!
data = vbt.Data.from_data(close_df, columns_are_symbols=True)

data = close_df.vbt.to_data(invert_data=True)  # (3)!
data = vbt.Data.from_data(close_df, columns_are_symbols=True, invert_data=True)

data = vbt.Data.from_data(vbt.symbol_dict({"AAPL": aapl_ohlc_df, "MSFT": msft_ohlc_df}))  # (4)!
data = vbt.Data.from_data(vbt.feature_dict({"High": high_df, "Low": low_df}))  # (5)!

OHLC DataFrame
Close DataFrame where columns are symbols. Store in a feature-oriented format.
Close DataFrame where columns are symbols. Store in a symbol-oriented format.
Multiple feature DataFrames keyed by symbol
Multiple symbol DataFrames keyed by feature

Tip

You aren't required to use data instances, you can proceed with Pandas and even NumPy arrays as well since VBT converts every array-like object to a NumPy array anyway. But beware that the Pandas format is more suitable than the NumPy format because the former also contains datetime index and backtest configuration metadata such as symbols and parameter combinations in form of columns. Where data instances are essential are symbol alignment, stacking, resampling, and updating.

Extracting¶

Depending on the use case, there are multiple ways to extract the actual Pandas Series/DataFrame from an instance. To retrieve the original data with one DataFrame per symbol, query the data attribute. Such data contains OHLC and other features (of various data types too) concatenated together, which may be helpful in plotting. But note that VBT doesn't support this format: instead, you're encouraged to represent each feature as a separate DataFrame where columns are symbols. Such a feature can be queried as an attribute (data.close for close price, for example), or by using Data.get.

How to get symbol-oriented data

data_per_symbol = data.data  # (1)!
aapl_data = data_per_symbol["AAPL"]  # (2)!

sr_or_df = data.get("Close")  # (3)!
sr_or_df = data["Close"].get()
sr_or_df = data.close

sr_or_df = data.get(["Close"])  # (4)!
sr_or_df = data[["Close"]].get()

sr = data.get("Close", "AAPL")  # (5)!
sr = data["Close"].get(symbols="AAPL")
sr = data.select("AAPL").close

df = data.get("Close", ["AAPL"])  # (6)!
df = data["Close"].get(symbols=["AAPL"])
df = data.select(["AAPL"]).close

aapl_df = data.get(["Open", "Close"], "AAPL")  # (7)!
close_df = data.get("Close", ["AAPL", "MSFT"])  # (8)!
open_df, close_df = data.get(["Open", "Close"], ["AAPL", "MSFT"])  # (9)!

Get a dictionary with one (OHLC) Series/DataFrame per symbol ("AAPL", "MSFT", etc.)
Extract the (OHLC) Series/DataFrame for "AAPL"
Get the closing price as a Series (one symbol) or DataFrame (multiple symbols, one per column)
Get the closing price as a DataFrame regardless of the number of symbols
Get the closing price for "AAPL" as a Series
Get the closing price for "AAPL" as a DataFrame
Get the opening and closing price for "AAPL" as a DataFrame with two columns
Get the closing price for "AAPL" and "MSFT" as a DataFrame with two columns
Get the opening and closing price as a tuple of two DataFrames with the columns "AAPL" and "MSFT" each

+

If a data instance is feature-oriented, the behavior of features and symbols is reversed.

How to get feature-oriented data

data_per_feature = feat_data.data  # (1)!
close_data = data_per_feature["Close"]

sr_or_df = data.get("Close") 
sr_or_df = data.select("Close").get()
sr_or_df = data.close

sr = feat_data.get("Close", "AAPL")
sr = feat_data["AAPL"].get(features="Close")  # (2)!
sr = feat_data.select("Close").get(symbols="AAPL")  # (3)!

aapl_df = data.get(["Open", "Close"], "AAPL")
close_df = data.get("Close", ["AAPL", "MSFT"])
aapl_df, msft_df = data.get(["Open", "Close"], ["AAPL", "MSFT"])  # (4)!

In feature-oriented instances data dictionaries contain features as keys and symbols as columns, thus feature Series/DataFrames can be extracted easier
Indexing (such as []) is applied to columns, which are now symbols
Various methods (such as select) are applied to keys, which are now features
DataFrames are per symbol rather than per feature if a tuple of them is returned

Tip

To get the same behavior between symbol-oriented and feature-oriented instances, always use Data.get to extract the data.

Changing¶

There are four main operations to change features and symbols: adding, selecting, renaming, and removing. The first operation can be done on one feature or symbol at a time, while other operations can be done on a multiple of such. Usually, you won't need to specify whether you want to perform the operation on symbols or features as this will be determined automatically. Features and symbols are also case-insensitive. Also note that each operation doesn't change the original data instance but returns a new one.

How to add features and symbols

new_data = data.add_symbol("BTC-USD")  # (1)!
new_data = data.add_symbol("BTC-USD", fetch_kwargs=dict(start="2020"))  # (2)!
btc_df = vbt.YFData.pull("ETH-USD", start="2020").get()
new_data = data.add_symbol("BTC-USD", btc_df)  # (3)!

new_data = data.add_feature("SMA")  # (4)!
new_data = data.add_feature("SMA", run_kwargs=dict(timeperiod=20, hide_params=True))  # (5)!
sma_df = data.run("SMA", timeperiod=20, hide_params=True, unpack=True)
new_data = data.add_feature("SMA", sma_df)  # (6)!

new_data = data.add("BTC-USD", btc_df)  # (7)!
new_data = data.add("SMA", sma_df)  # (8)!

Pull "BTC-USD" and add it as a symbol
Pull "BTC-USD" from 2020 onwards and add it as a symbol
Add a custom DataFrame as a symbol
Run the SMA indicator and add it as a feature
Run the 20-period SMA indicator and add it as a feature
Add a custom DataFrame as a feature
If some columns can be found among the features of the data instance and not among the symbols, DataFrame will be automatically added as a symbol
If some columns can be found among the symbols of the data instance and not among the features, DataFrame will be automatically added as a feature

Note

Only one feature or symbol can be added at a time. To add another data instance, use merge instead.

How to select features and symbols

new_data = data.select_symbols("BTC-USD")  # (1)!
new_data = data.select_symbols(["BTC-USD", "ETH-USD"])  # (2)!

new_data = data.select_features("SMA")  # (3)!
new_data = data.select_features(["SMA", "EMA"])  # (4)!

new_data = data.select("BTC-USD")  # (5)!
new_data = data.select("SMA")  # (6)!
new_data = data.select("sma")  # (7)!

Select one symbol
Select multiple symbols
Select one feature
Select multiple features
If some keys can be found among the symbols of the data instance and not among the features, will select a symbol
If some keys can be found among the features of the data instance and not among the symbols, will select a feature
Case doesn't matter!

How to rename features and symbols

new_data = data.rename_symbols("BTC-USD", "BTCUSDT")  # (1)!
new_data = data.rename_symbols(["BTC-USD", "ETH-USD"], ["BTCUSDT", "ETHUSDT"])  # (2)!
new_data = data.rename_symbols({"BTC-USD": "BTCUSDT", "ETH-USD": "ETHUSDT"})

new_data = data.rename_features("Price", "Close")  # (3)!
new_data = data.rename_features(["Price", "MovAvg"], ["Close", "SMA"])  # (4)!
new_data = data.rename_features({"Price": "Close", "MovAvg": "SMA"})

new_data = data.rename("BTC-USD", "BTCUSDT")  # (5)!
new_data = data.rename("Price", "Close")  # (6)!
new_data = data.rename("price", "Close")  # (7)!

Rename the symbol "BTC-USD" to "BTCUSDT"
Rename the symbol "BTC-USD" to "BTCUSDT" and the symbol "ETH-USD" to "ETHUSDT"
Rename the feature "Price" to "Close"
Rename the feature "Price" to "Close" and the feature "MovAvg" to "SMA"
If some keys can be found among the symbols of the data instance and not among the features, will be renamed as a symbol
If some keys can be found among the features of the data instance and not among the symbols, will be renamed as a feature
The case of the source key doesn't matter, but the case of the target key does!

How to remove features and symbols

new_data = data.remove_symbols("BTC-USD")  # (1)!
new_data = data.remove_symbols(["BTC-USD", "ETH-USD"])  # (2)!

new_data = data.remove_features("SMA")  # (3)!
new_data = data.remove_features(["SMA", "EMA"])  # (4)!

new_data = data.remove("BTC-USD")  # (5)!
new_data = data.remove("SMA")  # (6)!
new_data = data.remove("sma")  # (7)!

Remove one symbol
Remove multiple symbols
Remove one feature
Remove multiple features
If some keys can be found among the symbols of the data instance and not among the features, will be removed as a symbol
If some keys can be found among the features of the data instance and not among the symbols, will be removed as a feature
Case doesn't matter!

+

Instances can be merged together along symbols, rows, and columns by using Data.merge.

Merge multiple different Data instances

data1 = vbt.YFData.pull("BTC-USD")
data2 = vbt.BinanceData.pull("BTCUSDT")
data3 = vbt.CCXTData.pull("BTC-USDT", exchange="kucoin")
data = vbt.Data.merge(data1, data2, data3, missing_columns="drop")

+

To apply a function to each DataFrame and return a new instance, the method Data.transform can be used. By default, it passes one single DataFrame where all individual DataFrames are concatenated along columns. This is useful for dropping missing values across all symbols. To transform the DataFrames individually, use per_symbol=True and/or per_feature=True. The only requirement is that the returned column names are identical across all features and symbols.

Drop rows with missing values

new_data = data.transform(lambda df: df.dropna(how="any"))  # (1)!
new_data = data.dropna()  # (2)!
new_data = data.dropna(how="all")  # (3)!

new_data = data.transform(your_func, per_feature=True)
new_data = data.transform(your_func, per_symbol=True)
new_data = data.transform(your_func, per_feature=True, per_symbol=True)  # (4)!
new_data = data.transform(your_func, per_feature=True, per_symbol=True, pass_frame=True)  # (5)!

Remove any row that has at least one missing value across all features and symbols
Same as above
Remove any row that has all values missing across all features and symbols
One column at a time is passed as a Series
One column at a time is passed as a DataFrame

+

If symbols have different timezones, the final timezone will become "UTC". This will make some symbols shifted in time; for example, one symbol with UTC+0200 and another with UTC+0400 will effectively double the common index and produce missing values half of the time. To align their indexes into a single one, use Data.realign, which is a special form of resampling that produces a single index where data is correctly ordered by time.

Realign multiple timezones

new_data = data.realign()  # (1)!

To not forward fill missing values, pass ffill=False

+

Operations that return a new data instance can be easily chained using the dot notation or the method pipe.

Chain multiple operations

data = (
    vbt.YFData.pull("BTC-USD")
    .add_symbol("ETH-USD")
    .rename({"btc-usd": "BTCUSDT", "eth-usd": "ETHUSDT"})
    .remove(["dividends", "stock splits"])
    .add_feature("SMA")
    .add_feature("EMA")
)

# ______________________________________________________________

data = (
    vbt.YFData
    .pipe("pull", "BTC-USD")  # (1)!
    .pipe("add_symbol", "ETH-USD")
    .pipe("rename", {"btc-usd": "BTCUSDT", "eth-usd": "ETHUSDT"})
    .pipe("remove", ["dividends", "stock splits"])
    .pipe("add_feature", "SMA")
    .pipe("add_feature", "EMA")
)

The method can be called on data classes and instances, and can take a string to call a method of the data class/instance, or any function that expects the instance as the first argument (for any other argument pass the function as a tuple where the second element is the argument position/name)