5.2. UCTB.preprocess package

5.2.1. UCTB.preprocess.GraphGenerator module

class UCTB.preprocess.GraphGenerator.GraphGenerator(data_loader, graph='Correlation', threshold_distance=1000, threshold_correlation=0, threshold_interaction=500, **kwargs)

Bases: object

This class is used to build graphs. The adjacency matrices and Laplacian matrices are stored in self.AM and self.LM.

Parameters
  • data_loader (NodeTrafficLoader) – data_loader object.

  • graph (str) – Types of graphs used in neural methods. The value should name one or more of { 'Correlation', 'Distance', 'Interaction', 'Line', 'Neighbor', 'Transfer' }, joined by '-' (for example, 'Correlation-Distance'), and the dataset must contain the data needed by each selected graph. Default: 'Correlation'

  • threshold_distance (float) – Used in building of distance graph. If the distance between two nodes, in meters, is smaller than threshold_distance, the corresponding position of the distance graph will be 1 and otherwise 0 (consistent with distance_adjacent, which links nodes closer than the threshold). Default: 1000

  • threshold_correlation (float) – Used in building of correlation graph. If the Pearson correlation coefficient is larger than threshold_correlation, the corresponding position of the correlation graph will be 1 and otherwise 0. Default: 0

  • threshold_interaction (float) – Used in building of interaction graph. If, in the latest 12 months, the number of interactions between two nodes is larger than threshold_interaction, the corresponding position of the interaction graph will be 1 and otherwise 0. Default: 500

AM

Adjacency matrices of graphs.

Type

array

LM

Laplacian matrices of graphs.

Type

array

static adjacent_to_laplacian(adjacent_matrix)

Convert adjacent_matrix into a Laplacian matrix.
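The conversion can be illustrated with a minimal sketch. This reference does not state which Laplacian variant the library computes, so the example assumes the symmetric normalized form L = I - D^(-1/2) A D^(-1/2); the function name normalized_laplacian is hypothetical and is not part of UCTB.

```python
import numpy as np

def normalized_laplacian(adjacent_matrix):
    """Symmetric normalized Laplacian (assumed variant, for illustration only)."""
    a = np.asarray(adjacent_matrix, dtype=float).copy()
    np.fill_diagonal(a, 0.0)                 # ignore self-loops
    degree = a.sum(axis=1)
    # Inverse square root of each degree; isolated nodes (degree 0) get 0.
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = degree[nonzero] ** -0.5
    # D^(-1/2) A D^(-1/2) computed by scaling rows and columns of A.
    scaled = inv_sqrt[:, None] * a * inv_sqrt[None, :]
    return np.eye(len(a)) - scaled

# A three-node path graph: 0 - 1 - 2
am = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
lm = normalized_laplacian(am)
```

The result is symmetric with ones on the diagonal; the library's own output may be scaled differently.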

static correlation_adjacent(traffic_data, threshold)

Calculate the correlation graph based on the Pearson correlation coefficient.

Parameters
  • traffic_data (ndarray) – numpy array with shape [sequence_length, num_node].

  • threshold (float) – A float in [-1, 1]. Nodes whose Pearson correlation coefficient is larger than this threshold will be linked together.
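As a rough sketch of the thresholding described above, the following uses numpy.corrcoef on data shaped [sequence_length, num_node]. The function name correlation_graph and the strict ">" comparison are assumptions taken from the wording of this page, not from the library source.

```python
import numpy as np

def correlation_graph(traffic_data, threshold):
    """Binarize the node-by-node Pearson correlation matrix (illustrative)."""
    data = np.asarray(traffic_data, dtype=float)
    # np.corrcoef treats rows as variables, so transpose to [num_node, sequence_length].
    corr = np.corrcoef(data.T)
    return (corr > threshold).astype(np.float32)

# Columns are nodes: node 1 rises with node 0, node 2 falls as node 0 rises.
traffic = np.array([[1, 2, 9],
                    [2, 4, 7],
                    [3, 6, 5],
                    [4, 8, 3]])
am = correlation_graph(traffic, threshold=0)
```

With a threshold of 0, nodes 0 and 1 are linked and node 2 is linked only to itself, because every node has correlation 1 with itself.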

distance_adjacent(lat_lng_list, threshold)

Calculate distance graph based on geographic distance.

Parameters
  • lat_lng_list (list) – A list of geographic locations. The format of each element in the list is [latitude, longitude].

  • threshold (float) – (meters) nodes with geographic distance smaller than this threshold will be linked together.

static haversine(lat1, lon1, lat2, lon2)

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
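The two steps above (great-circle distance, then thresholding) can be sketched as follows. The names haversine_m and distance_graph are hypothetical, the Earth radius of 6,371,000 m is an assumed constant, and returning meters is an assumption made so that the result can be compared with a threshold in meters.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

def distance_graph(lat_lng_list, threshold):
    """Link nodes whose distance is smaller than threshold (meters)."""
    n = len(lat_lng_list)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = haversine_m(*lat_lng_list[i], *lat_lng_list[j])
            graph[i][j] = 1.0 if d < threshold else 0.0
    return graph

# Two stations about 556 m apart and a third roughly 111 km away.
stations = [[40.0, -74.0], [40.005, -74.0], [41.0, -74.0]]
am = distance_graph(stations, threshold=1000)
```

Each node is linked to itself here because its distance to itself is 0; whether the library keeps or clears the diagonal is not stated in this reference.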

static interaction_adjacent(interaction_matrix, threshold)

Binarize interaction_matrix based on threshold.

Parameters
  • interaction_matrix (ndarray) – Matrix with shape [num_node, num_node], where each element represents the number of interactions between the corresponding nodes during a certain period, e.g. 6 months.

  • threshold (float or int) – nodes with number of interactions between them greater than this threshold will be linked together.
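A minimal sketch of this binarization, assuming a strict "greater than" comparison as worded above; the variable names and counts are illustrative.

```python
import numpy as np

# Hypothetical interaction counts between three nodes.
interaction_matrix = np.array([[0, 620, 30],
                               [620, 0, 501],
                               [30, 501, 0]])

# Entries above the threshold become 1, all others 0.
threshold = 500
am = (interaction_matrix > threshold).astype(np.float32)
```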

5.2.2. UCTB.preprocess.preprocessor module

class UCTB.preprocess.preprocessor.Normalizer(X)

Bases: object

This class normalizes and denormalizes data through its min_max_normal and min_max_denormal methods.

min_max_denormal(X)

Input X, return denormalized results. Return type: numpy.ndarray

min_max_normal(X)

Input X, return normalized results. Return type: numpy.ndarray
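Min-max scaling and its inverse can be sketched as below. The class MinMaxSketch is hypothetical; it assumes the minimum and maximum are taken over the whole array passed to the constructor and that values are scaled to [0, 1], which this reference does not confirm.

```python
import numpy as np

class MinMaxSketch:
    """Illustrative min-max scaler; not the library's Normalizer."""

    def __init__(self, X):
        # Assumed: global statistics over the whole array.
        self._min = np.min(X)
        self._max = np.max(X)

    def min_max_normal(self, X):
        return (X - self._min) / (self._max - self._min)

    def min_max_denormal(self, X):
        return X * (self._max - self._min) + self._min

X = np.array([[10.0, 20.0],
              [30.0, 50.0]])
scaler = MinMaxSketch(X)
X_norm = scaler.min_max_normal(X)
X_back = scaler.min_max_denormal(X_norm)
```

Denormalizing the normalized array recovers the original values, which is the property a model's predictions rely on when they are mapped back to real units.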

class UCTB.preprocess.preprocessor.ST_MoveSample(closeness_len, period_len, trend_len, target_length=1, daily_slots=24)

Bases: object

This class converts raw data into temporal features, including closeness, period and trend features.

Parameters
  • closeness_len (int) – The length of closeness data history. The closeness_len consecutive time slots immediately preceding the target will be used as closeness history.

  • period_len (int) – The length of period data history. The data of the same time slot in each of the preceding period_len days will be used as period history.

  • trend_len (int) – The length of trend data history. The data of the same time slot in each of the preceding trend_len weeks (every seven days) will be used as trend history.

  • target_length (int) – The number of steps to predict from one piece of history data. Has to be 1 for now. Default: 1

  • daily_slots (int) – The number of records per day, calculated as 24 * 60 / time_fitness. Default: 24

move_sample(data)

Input data to generate closeness, period, trend features and target vector y.

Parameters

data (ndarray) – Original temporal data.

Returns

closeness, period, trend and y matrices.

Type

numpy.ndarray
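The three feature types can be illustrated with a minimal sketch on a one-dimensional series. The function move_sample_sketch is hypothetical; the ordering of history (oldest first) and the output shapes are assumptions and may differ from what move_sample returns.

```python
import numpy as np

def move_sample_sketch(data, closeness_len, period_len, trend_len, daily_slots=24):
    """Build closeness, period and trend histories for each target slot (illustrative)."""
    data = np.asarray(data)
    # First index that has enough history for all three feature types.
    start = max(closeness_len, period_len * daily_slots, trend_len * 7 * daily_slots)
    closeness, period, trend, y = [], [], [], []
    for t in range(start, len(data)):
        # The closeness_len slots immediately before t.
        closeness.append(data[t - closeness_len:t])
        # The same slot on each of the previous period_len days.
        period.append([data[t - d * daily_slots] for d in range(period_len, 0, -1)])
        # The same slot in each of the previous trend_len weeks.
        trend.append([data[t - w * 7 * daily_slots] for w in range(trend_len, 0, -1)])
        y.append(data[t])
    return np.array(closeness), np.array(period), np.array(trend), np.array(y)

# 15 days of hourly data where each value equals its own time index.
series = np.arange(24 * 15)
c, p, tr, y = move_sample_sketch(series, closeness_len=3, period_len=2, trend_len=1)
```

Because each value equals its index, the first sample is easy to read: the target is slot 168, the closeness history is slots 165 to 167, the period history is slots 120 and 144, and the trend history is slot 0.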

class UCTB.preprocess.preprocessor.SplitData

Bases: object

This class splits data through its split_data and split_feed_dict methods.

static split_data(data, ratio_list)

Divide the data based on the given parameter ratio_list.

Parameters
  • data (ndarray) – Data to be split.

  • ratio_list (list) – Split ratio, the data will be split according to the ratio.

Returns

The elements in the returned list are the divided data, and the list has the same length as ratio_list.

Type

list
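A ratio-based split along the first axis can be sketched as follows. The function split_by_ratio is hypothetical; normalizing the ratios and rounding the cut points are assumptions about how the boundaries are chosen.

```python
import numpy as np

def split_by_ratio(data, ratio_list):
    """Split data along axis 0 into len(ratio_list) consecutive parts (illustrative)."""
    data = np.asarray(data)
    ratios = np.asarray(ratio_list, dtype=float)
    ratios = ratios / ratios.sum()           # tolerate ratios that do not sum to 1
    # Cumulative boundaries, e.g. [0.8, 0.1, 0.1] on 10 rows gives cuts at 8 and 9.
    cuts = np.round(np.cumsum(ratios)[:-1] * len(data)).astype(int)
    return np.split(data, cuts)

data = np.arange(10)
parts = split_by_ratio(data, [0.8, 0.1, 0.1])
part_lengths = [len(part) for part in parts]
```

The parts keep the original order, which matters for time series where the test set should follow the training set in time.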

static split_feed_dict(feed_dict, sequence_length, ratio_list)

Divide the value data in feed_dict based on the given parameter ratio_list.

Parameters
  • feed_dict (dict) – It is a dictionary composed of key-value pairs.

  • sequence_length (int) – If the length of value in feed_dict is equal to sequence_length, then this method divides the value according to the ratio without changing its key.

  • ratio_list (list) – Split ratio, the data will be split according to the ratio.

Returns

The elements in the returned list are the divided dictionaries, and the list has the same length as ratio_list.

Type

list
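The dictionary variant can be sketched in the same way. The function split_feed_dict_sketch is hypothetical. Values whose length equals sequence_length are split by ratio, as described above; copying all other values unchanged into every output dictionary is an assumption, since this reference does not say what happens to them.

```python
import numpy as np

def split_feed_dict_sketch(feed_dict, sequence_length, ratio_list):
    """Split sequence-length values of feed_dict by ratio (illustrative)."""
    ratios = np.asarray(ratio_list, dtype=float)
    ratios = ratios / ratios.sum()
    cuts = np.round(np.cumsum(ratios)[:-1] * sequence_length).astype(int)
    outputs = [dict() for _ in ratio_list]
    for key, value in feed_dict.items():
        if hasattr(value, "__len__") and len(value) == sequence_length:
            # Time-indexed value: divide it, keeping the same key in each part.
            for out, part in zip(outputs, np.split(np.asarray(value), cuts)):
                out[key] = part
        else:
            # Assumed behavior: values of other lengths are shared unchanged.
            for out in outputs:
                out[key] = value
    return outputs

feed = {"closeness": np.arange(10), "laplace_matrix": np.eye(3)}
train, val = split_feed_dict_sketch(feed, sequence_length=10, ratio_list=[0.8, 0.2])
```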

5.2.3. UCTB.preprocess.time_utils module

UCTB.preprocess.time_utils.is_valid_date(date_str)
Parameters

date_str (string) – e.g. 2019-01-01

Returns

True if date_str is a valid date, otherwise False.
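One common way to implement such a check is to attempt to parse the string. The sketch below assumes the 'YYYY-MM-DD' format suggested by the example above; the function name is_valid_date_sketch and the fmt parameter are hypothetical.

```python
from datetime import datetime

def is_valid_date_sketch(date_str, fmt="%Y-%m-%d"):
    """Return True if date_str parses as a calendar date in the given format."""
    try:
        datetime.strptime(date_str, fmt)
        return True
    except ValueError:
        # Raised for both malformed strings and impossible dates such as 30 February.
        return False

ok = is_valid_date_sketch("2019-01-01")
bad_day = is_valid_date_sketch("2019-02-30")
bad_text = is_valid_date_sketch("not-a-date")
```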

UCTB.preprocess.time_utils.is_work_day_america(date, city)
Parameters

date (string or datetime) – e.g. 2019-01-01

Returns

True if date is not a holiday in America, otherwise False.

UCTB.preprocess.time_utils.is_work_day_china(date, city)
Parameters

date (string or datetime) – e.g. 2019-01-01

Returns

True if date is not a holiday in China, otherwise False.