# Python API

## Data Structure API

> class lightgbm.Dataset(data, label=None, max_bin=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)

Bases: `object`

Dataset in LightGBM. Construct a Dataset.

* Parameters:
  * **data** (_string, numpy array or scipy.sparse_) – Data source of Dataset. If string, it represents the path to a txt file.
  * **label** (_list, numpy 1-D array or None, optional (default=None)_) – Label of the data.
  * **max_bin** (_int or None, optional (default=None)_) – Max number of discrete bins for features. If None, the default value from the parameters of the CLI version will be used.
  * **reference** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or None, optional (default=None)_) – If this is a Dataset for validation, the training data should be used as reference.
  * **weight** (_list, numpy 1-D array or None, optional (default=None)_) – Weight for each instance.
  * **group** (_list, numpy 1-D array or None, optional (default=None)_) – Group/query size for Dataset.
  * **init_score** (_list, numpy 1-D array or None, optional (default=None)_) – Init score for Dataset.
  * **silent** (_bool, optional (default=False)_) – Whether to print messages during construction.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **params** (_dict or None, optional (default=None)_) – Other parameters.
  * **free_raw_data** (_bool, optional (default=True)_) – If True, raw data is freed after constructing inner Dataset.
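For orientation, a minimal sketch of constructing a training Dataset and a validation Dataset that references it. The arrays and the `train.bin` file name are illustrative placeholders, not part of the API:

```
import numpy as np
import lightgbm as lgb

# Synthetic data purely for illustration.
X_train, y_train = np.random.rand(500, 10), np.random.randint(2, size=500)
X_valid, y_valid = np.random.rand(100, 10), np.random.randint(2, size=100)

train_data = lgb.Dataset(X_train, label=y_train)
# Validation data should use the training Dataset as reference
# so that both share the same bin mappings.
valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)
train_data.save_binary('train.bin')  # placeholder path for the binary dump
```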
> construct()

Lazy init.

* Returns:
  * **self** – Returns self.
* Return type:
  * [Dataset](#lightgbm.Dataset "lightgbm.Dataset")

> create_valid(data, label=None, weight=None, group=None, init_score=None, silent=False, params=None)

Create validation data aligned with the current Dataset.

* Parameters:
  * **data** (_string, numpy array or scipy.sparse_) – Data source of Dataset. If string, it represents the path to a txt file.
  * **label** (_list or numpy 1-D array, optional (default=None)_) – Label of the training data.
  * **weight** (_list, numpy 1-D array or None, optional (default=None)_) – Weight for each instance.
  * **group** (_list, numpy 1-D array or None, optional (default=None)_) – Group/query size for Dataset.
  * **init_score** (_list, numpy 1-D array or None, optional (default=None)_) – Init score for Dataset.
  * **silent** (_bool, optional (default=False)_) – Whether to print messages during construction.
  * **params** (_dict or None, optional (default=None)_) – Other parameters.
* Returns:
  * **self** – Returns self.
* Return type:
  * [Dataset](#lightgbm.Dataset "lightgbm.Dataset")

> get_field(field_name)

Get property from the Dataset.

* Parameters:
  * **field_name** (_string_) – The field name of the information.
* Returns:
  * **info** – A numpy array with information from the Dataset.
* Return type:
  * numpy array

> get_group()

Get the group of the Dataset.

* Returns:
  * **group** – Group size of each group.
* Return type:
  * numpy array

> get_init_score()

Get the initial score of the Dataset.

* Returns:
  * **init_score** – Init score of Booster.
* Return type:
  * numpy array

> get_label()

Get the label of the Dataset.

* Returns:
  * **label** – The label information from the Dataset.
* Return type:
  * numpy array

> get_ref_chain(ref_limit=100)

Get a chain of Dataset objects, starting with this Dataset (call it r), then going to r.reference (if it exists), then to r.reference.reference, and so on, until `ref_limit` is hit or a reference loop is detected.

* Parameters:
  * **ref_limit** (_int, optional (default=100)_) – The limit number of references.
* Returns:
  * **ref_chain** – Chain of references of the Datasets.
* Return type:
  * set of Dataset

> get_weight()

Get the weight of the Dataset.

* Returns:
  * **weight** – Weight for each data point from the Dataset.
* Return type:
  * numpy array

> num_data()

Get the number of rows in the Dataset.

* Returns:
  * **number_of_rows** – The number of rows in the Dataset.
* Return type:
  * int

> num_feature()

Get the number of columns (features) in the Dataset.

* Returns:
  * **number_of_columns** – The number of columns (features) in the Dataset.
* Return type:
  * int

> save_binary(filename)

Save Dataset to binary file.

* Parameters:
  * **filename** (_string_) – Name of the output file.

> set_categorical_feature(categorical_feature)

Set categorical features.

* Parameters:
  * **categorical_feature** (_list of int or strings_) – Names or indices of categorical features.

> set_feature_name(feature_name)

Set feature name.

* Parameters:
  * **feature_name** (_list of strings_) – Feature names.

> set_field(field_name, data)

Set property into the Dataset.

* Parameters:
  * **field_name** (_string_) – The field name of the information.
  * **data** (_list, numpy array or None_) – The array of data to be set.

> set_group(group)

Set group size of Dataset (used for ranking).

* Parameters:
  * **group** (_list, numpy array or None_) – Group size of each group.

> set_init_score(init_score)

Set init score of Booster to start from.

* Parameters:
  * **init_score** (_list, numpy array or None_) – Init score for Booster.

> set_label(label)

Set label of Dataset.

* Parameters:
  * **label** (_list, numpy array or None_) – The label information to be set into Dataset.

> set_reference(reference)

Set reference Dataset.

* Parameters:
  * **reference** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Reference that is used as a template to construct the current Dataset.

> set_weight(weight)

Set weight of each instance.

* Parameters:
  * **weight** (_list, numpy array or None_) – Weight to be set for each data point.

> subset(used_indices, params=None)

Get subset of current Dataset.

* Parameters:
  * **used_indices** (_list of int_) – Indices used to create the subset.
  * **params** (_dict or None, optional (default=None)_) – Other parameters.
* Returns:
  * **subset** – Subset of the current Dataset.
* Return type:
  * [Dataset](#lightgbm.Dataset "lightgbm.Dataset")

> class lightgbm.Booster(params=None, train_set=None, model_file=None, silent=False)

Bases: `object`

Booster in LightGBM. Initialize the Booster.

* Parameters:
  * **params** (_dict or None, optional (default=None)_) – Parameters for Booster.
  * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or None, optional (default=None)_) – Training dataset.
  * **model_file** (_string or None, optional (default=None)_) – Path to the model file.
  * **silent** (_bool, optional (default=False)_) – Whether to print messages during construction.

> add_valid(data, name)

Add validation data.

* Parameters:
  * **data** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Validation data.
  * **name** (_string_) – Name of validation data.

> attr(key)

Get attribute string from the Booster.

* Parameters:
  * **key** (_string_) – The name of the attribute.
* Returns:
  * **value** – The attribute value. Returns None if the attribute does not exist.
* Return type:
  * string or None

> current_iteration()

Get the index of the current iteration.

* Returns:
  * **cur_iter** – The index of the current iteration.
* Return type:
  * int

> dump_model(num_iteration=-1)

Dump Booster to JSON format.

* Parameters:
  * **num_iteration** (_int, optional (default=-1)_) – Index of the iteration that should be dumped. If <0, the best iteration (if it exists) is dumped.
* Returns:
  * **json_repr** – JSON format of Booster.
* Return type:
  * dict

> eval(data, name, feval=None)

Evaluate for data.

* Parameters:
  * **data** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data for evaluating.
  * **name** (_string_) – Name of the data.
  * **feval** (_callable or None, optional (default=None)_) – Custom evaluation function.
* Returns:
  * **result** – List with evaluation results.
* Return type:
  * list

> eval_train(feval=None)

Evaluate for training data.

* Parameters:
  * **feval** (_callable or None, optional (default=None)_) – Custom evaluation function.
* Returns:
  * **result** – List with evaluation results.
* Return type:
  * list

> eval_valid(feval=None)

Evaluate for validation data.

* Parameters:
  * **feval** (_callable or None, optional (default=None)_) – Custom evaluation function.
* Returns:
  * **result** – List with evaluation results.
* Return type:
  * list

> feature_importance(importance_type='split', iteration=-1)

Get feature importances.

* Parameters:
  * **importance_type** (_string, optional (default="split")_) – How the importance is calculated. If "split", result contains numbers of times the feature is used in a model. If "gain", result contains total gains of splits which use the feature.
* Returns:
  * **result** – Array with feature importances.
* Return type:
  * numpy array

> feature_name()

Get names of features.

* Returns:
  * **result** – List with names of features.
* Return type:
  * list

> free_dataset()

Free Booster's Datasets.

> free_network()

Free Network.

> get_leaf_output(tree_id, leaf_id)

Get the output of a leaf.

* Parameters:
  * **tree_id** (_int_) – The index of the tree.
  * **leaf_id** (_int_) – The index of the leaf in the tree.
* Returns:
  * **result** – The output of the leaf.
* Return type:
  * float

> num_feature()

Get number of features.

* Returns:
  * **num_feature** – The number of features.
* Return type:
  * int

> predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, pred_contrib=False, data_has_header=False, is_reshape=True, pred_parameter=None)

Make a prediction.

* Parameters:
  * **data** (_string, numpy array or scipy.sparse_) – Data source for prediction. If string, it represents the path to a txt file.
  * **num_iteration** (_int, optional (default=-1)_) – Iteration used for prediction. If <0, the best iteration (if it exists) is used for prediction.
  * **raw_score** (_bool, optional (default=False)_) – Whether to predict raw scores.
  * **pred_leaf** (_bool, optional (default=False)_) – Whether to predict leaf index.
  * **pred_contrib** (_bool, optional (default=False)_) – Whether to predict feature contributions.
  * **data_has_header** (_bool, optional (default=False)_) – Whether the data has a header. Used only if data is string.
  * **is_reshape** (_bool, optional (default=True)_) – If True, result is reshaped to [nrow, ncol].
  * **pred_parameter** (_dict or None, optional (default=None)_) – Other parameters for the prediction.
* Returns:
  * **result** – Prediction result.
* Return type:
  * numpy array

> reset_parameter(params)

Reset parameters of Booster.

* Parameters:
  * **params** (_dict_) – New parameters for Booster.

> rollback_one_iter()

Rollback one iteration.

> save_model(filename, num_iteration=-1)

Save Booster to file.

* Parameters:
  * **filename** (_string_) – Filename to save Booster.
  * **num_iteration** (_int, optional (default=-1)_) – Index of the iteration that should be saved. If <0, the best iteration (if it exists) is saved.

> set_attr(**kwargs)

Set the attribute of the Booster.

* Parameters:
  * **kwargs** – The attributes to set. Setting a value to None deletes an attribute.

> set_network(machines, local_listen_port=12400, listen_time_out=120, num_machines=1)

Set the network configuration.

* Parameters:
  * **machines** (_list, set or string_) – Names of machines.
  * **local_listen_port** (_int, optional (default=12400)_) – TCP listen port for local machines.
  * **listen_time_out** (_int, optional (default=120)_) – Socket time-out in minutes.
  * **num_machines** (_int, optional (default=1)_) – The number of machines for parallel learning application.

> set_train_data_name(name)

Set the name to the training Dataset.

* Parameters:
  * **name** (_string_) – Name for training Dataset.

> update(train_set=None, fobj=None)

Update for one iteration.

* Parameters:
  * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or None, optional (default=None)_) – Training data. If None, last training data is used.
  * **fobj** (_callable or None, optional (default=None)_) – Customized objective function. For multi-class task, the score is grouped by class_id first, then by row_id. If you want to get the i-th row score in the j-th class, access it as score[j * num_data + i], and you should group grad and hess in this way as well.
* Returns:
  * **is_finished** – Whether the update was successfully finished.
* Return type:
  * bool
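A minimal sketch of driving the Booster API directly: manual boosting via `update()`, saving, reloading, and predicting. The data, parameters, and the `model.txt` path are illustrative placeholders:

```
import numpy as np
import lightgbm as lgb

X, y = np.random.rand(200, 5), np.random.randint(2, size=200)
train_data = lgb.Dataset(X, label=y)

# Drive boosting manually, one iteration per update() call.
booster = lgb.Booster(params={'objective': 'binary'}, train_set=train_data)
for _ in range(10):
    booster.update()  # uses the last training data when train_set is None

booster.save_model('model.txt')                 # placeholder path
booster = lgb.Booster(model_file='model.txt')   # reload for prediction only
preds = booster.predict(X)                      # numpy array of predictions
```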
## Training API

> lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, keep_training_booster=False, callbacks=None)

Perform the training with given parameters.

* Parameters:
  * **params** (_dict_) – Parameters for training.
  * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data to be trained.
  * **num_boost_round** (_int, optional (default=100)_) – Number of boosting iterations.
  * **valid_sets** (_list of Datasets or None, optional (default=None)_) – List of data to be evaluated during training.
  * **valid_names** (_list of string or None, optional (default=None)_) – Names of `valid_sets`.
  * **fobj** (_callable or None, optional (default=None)_) – Customized objective function.
  * **feval** (_callable or None, optional (default=None)_) – Customized evaluation function. Note: should return (eval_name, eval_result, is_higher_better) or a list of such tuples.
  * **init_model** (_string or None, optional (default=None)_) – Filename of LightGBM model or Booster instance used to continue training.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. The model will train until the validation score stops improving. Requires at least one validation data and one metric. If there's more than one, will check all of them. If early stopping occurs, the model will add a `best_iteration` field.
  * **evals_result** (_dict or None, optional (default=None)_) – This dictionary is used to store all evaluation results of all the items in `valid_sets`. Example: with `valid_sets` = [valid_set, train_set], `valid_names` = ['eval', 'train'] and `params` = {'metric': 'logloss'}, it returns {'train': {'logloss': ['0.48253', '0.35953', …]}, 'eval': {'logloss': ['0.480385', '0.357756', …]}}.
  * **verbose_eval** (_bool or int, optional (default=True)_) – Requires at least one validation data. If True, the eval metric on the valid set is printed at each boosting stage. If int, the eval metric on the valid set is printed at every `verbose_eval` boosting stages. The last boosting stage or the boosting stage found by using `early_stopping_rounds` is also printed. Example: with `verbose_eval` = 4 and at least one item in `valid_sets`, an evaluation metric is printed every 4 (instead of 1) boosting stages.
  * **learning_rates** (_list, callable or None, optional (default=None)_) – List of learning rates for each boosting round, or a customized function that calculates `learning_rate` in terms of the current number of rounds (e.g. yields learning rate decay).
  * **keep_training_booster** (_bool, optional (default=False)_) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning. You can still use _InnerPredictor as `init_model` to continue training later.
  * **callbacks** (_list of callables or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
* Returns:
  * **booster** – The trained Booster model.
* Return type:
  * [Booster](#lightgbm.Booster "lightgbm.Booster")
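A minimal sketch of `lightgbm.train()` with early stopping and `evals_result` collection; the data and parameter values are illustrative placeholders:

```
import numpy as np
import lightgbm as lgb

X, y = np.random.rand(500, 10), np.random.randint(2, size=500)
train_data = lgb.Dataset(X[:400], label=y[:400])
valid_data = lgb.Dataset(X[400:], label=y[400:], reference=train_data)

evals_result = {}
booster = lgb.train({'objective': 'binary', 'metric': 'binary_logloss'},
                    train_data,
                    num_boost_round=100,
                    valid_sets=[valid_data],
                    valid_names=['eval'],
                    early_stopping_rounds=10,   # stop once 'eval' stops improving
                    evals_result=evals_result,  # per-iteration metric values land here
                    verbose_eval=10)            # print the metric every 10 rounds
print(booster.best_iteration)
```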
> lightgbm.cv(params, train_set, num_boost_round=10, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)

Perform the cross-validation with given parameters.

* Parameters:
  * **params** (_dict_) – Parameters for Booster.
  * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data to be trained on.
  * **num_boost_round** (_int, optional (default=10)_) – Number of boosting iterations.
  * **folds** (_a generator or iterator of (train_idx, test_idx) tuples or None, optional (default=None)_) – The train and test indices for each fold. This argument has highest priority over other data split arguments.
  * **nfold** (_int, optional (default=5)_) – Number of folds in CV.
  * **stratified** (_bool, optional (default=True)_) – Whether to perform stratified sampling.
  * **shuffle** (_bool, optional (default=True)_) – Whether to shuffle before splitting data.
  * **metrics** (_string, list of strings or None, optional (default=None)_) – Evaluation metrics to be monitored during CV. If not None, the metric in `params` will be overridden.
  * **fobj** (_callable or None, optional (default=None)_) – Custom objective function.
  * **feval** (_callable or None, optional (default=None)_) – Custom evaluation function.
  * **init_model** (_string or None, optional (default=None)_) – Filename of LightGBM model or Booster instance used to continue training.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. CV error needs to decrease at least every `early_stopping_rounds` round(s) to continue. The last entry in the evaluation history is the one from the best iteration.
  * **fpreproc** (_callable or None, optional (default=None)_) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
  * **verbose_eval** (_bool, int, or None, optional (default=None)_) – Whether to display the progress. If None, progress will be displayed when np.ndarray is returned. If True, progress will be displayed at every boosting stage. If int, progress will be displayed at every given `verbose_eval` boosting stage.
  * **show_stdv** (_bool, optional (default=True)_) – Whether to display the standard deviation in progress. Results are not affected by this parameter and always contain the std.
  * **seed** (_int, optional (default=0)_) – Seed used to generate the folds (passed to numpy.random.seed).
  * **callbacks** (_list of callables or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
* Returns:
  * **eval_hist** – Evaluation history. The dictionary has the following format: {'metric1-mean': [values], 'metric1-stdv': [values], 'metric2-mean': [values], 'metric2-stdv': [values], …}.
* Return type:
  * dict
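A minimal sketch of `lightgbm.cv()`; the data and parameter choices are illustrative placeholders:

```
import numpy as np
import lightgbm as lgb

X, y = np.random.rand(500, 10), np.random.randint(2, size=500)
train_data = lgb.Dataset(X, label=y)

eval_hist = lgb.cv({'objective': 'binary', 'metric': 'binary_logloss'},
                   train_data,
                   num_boost_round=50,
                   nfold=5,
                   stratified=True,
                   seed=0)
# Keys follow the '<metric>-mean' / '<metric>-stdv' pattern described above.
print(eval_hist['binary_logloss-mean'][-1], eval_hist['binary_logloss-stdv'][-1])
```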
## Scikit-learn API

> class lightgbm.LGBMModel(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)

Bases: `object`

Implementation of the scikit-learn API for LightGBM. Construct a gradient boosting model.

* Parameters:
  * **boosting_type** (_string, optional (default="gbdt")_) – 'gbdt', traditional Gradient Boosting Decision Tree. 'dart', Dropouts meet Multiple Additive Regression Trees. 'goss', Gradient-based One-Side Sampling. 'rf', Random Forest.
  * **num_leaves** (_int, optional (default=31)_) – Maximum tree leaves for base learners.
  * **max_depth** (_int, optional (default=-1)_) – Maximum tree depth for base learners, -1 means no limit.
  * **learning_rate** (_float, optional (default=0.1)_) – Boosting learning rate.
  * **n_estimators** (_int, optional (default=10)_) – Number of boosted trees to fit.
  * **max_bin** (_int, optional (default=255)_) – Number of bucketed bins for feature values.
  * **subsample_for_bin** (_int, optional (default=200000)_) – Number of samples for constructing bins.
  * **objective** (_string, callable or None, optional (default=None)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
  * **min_split_gain** (_float, optional (default=0.)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
  * **min_child_weight** (_float, optional (default=1e-3)_) – Minimum sum of instance weight (hessian) needed in a child (leaf).
  * **min_child_samples** (_int, optional (default=20)_) – Minimum number of data needed in a child (leaf).
  * **subsample** (_float, optional (default=1.)_) – Subsample ratio of the training instances.
  * **subsample_freq** (_int, optional (default=1)_) – Frequency of subsampling; <=0 means disabled.
  * **colsample_bytree** (_float, optional (default=1.)_) – Subsample ratio of columns when constructing each tree.
  * **reg_alpha** (_float, optional (default=0.)_) – L1 regularization term on weights.
  * **reg_lambda** (_float, optional (default=0.)_) – L2 regularization term on weights.
  * **random_state** (_int or None, optional (default=None)_) – Random number seed. Will use default seeds in C++ code if set to None.
  * **n_jobs** (_int, optional (default=-1)_) – Number of parallel threads.
  * **silent** (_bool, optional (default=True)_) – Whether to print messages while running boosting.
  * ****kwargs** (_other parameters_) – Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. Note: `**kwargs` is not supported in sklearn, so using it may cause unexpected issues.

> n_features_

_int_ – The number of features of fitted model.

> classes_

_array of shape = [n_classes]_ – The class label array (only for classification problem).

> n_classes_

_int_ – The number of classes (only for classification problem).

> best_score_

_dict or None_ – The best score of fitted model.
> best_iteration_

_int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified.

> objective_

_string or callable_ – The concrete objective used while fitting this model.

> booster_

_Booster_ – The underlying Booster of this model.

> evals_result_

_dict or None_ – The evaluation results if `early_stopping_rounds` has been specified.

> feature_importances_

_array of shape = [n_features]_ – The feature importances (the higher, the more important the feature).

Note: A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The predicted values.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **grad** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the gradient for each sample point.
* **hess** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the second derivative for each sample point.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i], and you should group grad and hess in this way as well.

> apply(X, num_iteration=0)

Return the predicted leaf of every tree for each sample.

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input features matrix.
  * **num_iteration** (_int, optional (default=0)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees).
* Returns:
  * **X_leaves** – The predicted leaf of every tree for each sample.
* Return type:
  * array-like of shape = [n_samples, n_trees]

> best_iteration_

Get the best iteration of fitted model.

> best_score_

Get the best score of fitted model.

> booster_

Get the underlying lightgbm Booster of this model.

> evals_result_

Get the evaluation results.

> feature_importances_

Get feature importances. Note: in the sklearn interface, feature importance used to be normalized to sum to 1; this is deprecated since 2.0.4, and the values are now the same as Booster.feature_importance().

> fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Build a gradient boosting model from the training set (X, y).

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input feature matrix.
  * **y** (_array-like of shape = [n_samples]_) – The target values (class labels in classification, real numbers in regression).
  * **sample_weight** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Weights of training data.
  * **init_score** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Init score of training data.
  * **group** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Group data of training data.
  * **eval_set** (_list or None, optional (default=None)_) – A list of (X, y) tuple pairs to use as validation sets for early stopping.
  * **eval_names** (_list of strings or None, optional (default=None)_) – Names of eval_set.
  * **eval_sample_weight** (_list of arrays or None, optional (default=None)_) – Weights of eval data.
  * **eval_init_score** (_list of arrays or None, optional (default=None)_) – Init score of eval data.
  * **eval_group** (_list of arrays or None, optional (default=None)_) – Group data of eval data.
  * **eval_metric** (_string, list of strings, callable or None, optional (default=None)_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training.
  * **verbose** (_bool, optional (default=True)_) – If True and an evaluation set is used, writes the evaluation progress.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **callbacks** (_list of callback functions or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
* Returns:
  * **self** – Returns self.
* Return type:
  * object

Note: a custom eval function expects a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`, returning (eval_name, eval_result, is_bigger_better) or a list of such tuples:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class)_) – The predicted values.
* **weight** (_array-like of shape = [n_samples]_) – The weight of samples.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **eval_name** (_str_) – The name of the evaluation.
* **eval_result** (_float_) – The eval result.
* **is_bigger_better** (_bool_) – Whether the eval result is better when bigger, e.g. AUC.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i].
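A minimal sketch of the custom-eval contract described above, using the `func(y_true, y_pred)` signature; the metric name and data are illustrative:

```
import numpy as np
from lightgbm import LGBMRegressor

# A custom metric must return (eval_name, eval_result, is_bigger_better).
def rmse(y_true, y_pred):
    return 'rmse', np.sqrt(np.mean((y_true - y_pred) ** 2)), False

X, y = np.random.rand(100, 5), np.random.rand(100)
model = LGBMRegressor(n_estimators=10).fit(X, y,
                                           eval_set=[(X, y)],
                                           eval_metric=rmse,  # callable passed here
                                           verbose=False)
```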
> n_features_

Get the number of features of fitted model.

> objective_

Get the concrete objective used while fitting this model.

> predict(X, raw_score=False, num_iteration=0)

Return the predicted value for each sample.

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input features matrix.
  * **raw_score** (_bool, optional (default=False)_) – Whether to predict raw scores.
  * **num_iteration** (_int, optional (default=0)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees).
* Returns:
  * **predicted_result** – The predicted values.
* Return type:
  * array-like of shape = [n_samples] or shape = [n_samples, n_classes]

> class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)

Bases: `lightgbm.sklearn.LGBMModel`, `object`

LightGBM classifier. Construct a gradient boosting model.

* Parameters:
  * **boosting_type** (_string, optional (default="gbdt")_) – 'gbdt', traditional Gradient Boosting Decision Tree. 'dart', Dropouts meet Multiple Additive Regression Trees. 'goss', Gradient-based One-Side Sampling. 'rf', Random Forest.
  * **num_leaves** (_int, optional (default=31)_) – Maximum tree leaves for base learners.
  * **max_depth** (_int, optional (default=-1)_) – Maximum tree depth for base learners, -1 means no limit.
  * **learning_rate** (_float, optional (default=0.1)_) – Boosting learning rate.
  * **n_estimators** (_int, optional (default=10)_) – Number of boosted trees to fit.
  * **max_bin** (_int, optional (default=255)_) – Number of bucketed bins for feature values.
  * **subsample_for_bin** (_int, optional (default=200000)_) – Number of samples for constructing bins.
  * **objective** (_string, callable or None, optional (default=None)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
  * **min_split_gain** (_float, optional (default=0.)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
  * **min_child_weight** (_float, optional (default=1e-3)_) – Minimum sum of instance weight (hessian) needed in a child (leaf).
  * **min_child_samples** (_int, optional (default=20)_) – Minimum number of data needed in a child (leaf).
  * **subsample** (_float, optional (default=1.)_) – Subsample ratio of the training instances.
  * **subsample_freq** (_int, optional (default=1)_) – Frequency of subsampling; <=0 means disabled.
  * **colsample_bytree** (_float, optional (default=1.)_) – Subsample ratio of columns when constructing each tree.
  * **reg_alpha** (_float, optional (default=0.)_) – L1 regularization term on weights.
  * **reg_lambda** (_float, optional (default=0.)_) – L2 regularization term on weights.
  * **random_state** (_int or None, optional (default=None)_) – Random number seed. Will use default seeds in C++ code if set to None.
  * **n_jobs** (_int, optional (default=-1)_) – Number of parallel threads.
  * **silent** (_bool, optional (default=True)_) – Whether to print messages while running boosting.
  * ****kwargs** (_other parameters_) – Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. Note: `**kwargs` is not supported in sklearn, so using it may cause unexpected issues.
> n_features_

_int_ – The number of features of fitted model.

> classes_

_array of shape = [n_classes]_ – The class label array (only for classification problem).

> n_classes_

_int_ – The number of classes (only for classification problem).

> best_score_

_dict or None_ – The best score of fitted model.

> best_iteration_

_int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified.

> objective_

_string or callable_ – The concrete objective used while fitting this model.

> booster_

_Booster_ – The underlying Booster of this model.

> evals_result_

_dict or None_ – The evaluation results if `early_stopping_rounds` has been specified.

> feature_importances_

_array of shape = [n_features]_ – The feature importances (the higher, the more important the feature).

Note: A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The predicted values.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **grad** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the gradient for each sample point.
* **hess** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the second derivative for each sample point.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i], and you should group grad and hess in this way as well.

> classes_

Get the class label array.

> fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='logloss', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Build a gradient boosting model from the training set (X, y).

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input feature matrix.
  * **y** (_array-like of shape = [n_samples]_) – The target values (class labels in classification, real numbers in regression).
  * **sample_weight** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Weights of training data.
  * **init_score** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Init score of training data.
  * **group** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Group data of training data.
  * **eval_set** (_list or None, optional (default=None)_) – A list of (X, y) tuple pairs to use as validation sets for early stopping.
  * **eval_names** (_list of strings or None, optional (default=None)_) – Names of eval_set.
  * **eval_sample_weight** (_list of arrays or None, optional (default=None)_) – Weights of eval data.
  * **eval_init_score** (_list of arrays or None, optional (default=None)_) – Init score of eval data.
  * **eval_group** (_list of arrays or None, optional (default=None)_) – Group data of eval data.
  * **eval_metric** (_string, list of strings, callable or None, optional (default="logloss")_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training.
  * **verbose** (_bool, optional (default=True)_) – If True and an evaluation set is used, writes the evaluation progress.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **callbacks** (_list of callback functions or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
* Returns:
  * **self** – Returns self.
* Return type:
  * object

Note: a custom eval function expects a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`, returning (eval_name, eval_result, is_bigger_better) or a list of such tuples:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class)_) – The predicted values.
* **weight** (_array-like of shape = [n_samples]_) – The weight of samples.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **eval_name** (_str_) – The name of the evaluation.
* **eval_result** (_float_) – The eval result.
* **is_bigger_better** (_bool_) – Whether the eval result is better when bigger, e.g. AUC.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i].

> n_classes_

Get the number of classes.

> predict_proba(X, raw_score=False, num_iteration=0)

Return the predicted probability for each class for each sample.

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input features matrix.
  * **raw_score** (_bool, optional (default=False)_) – Whether to predict raw scores.
  * **num_iteration** (_int, optional (default=0)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees).
* Returns:
  * **predicted_probability** – The predicted probability for each class for each sample.
* Return type:
  * array-like of shape = [n_samples, n_classes]
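A minimal sketch of the LGBMClassifier workflow, fitting with a validation set and then calling both predict methods; data and hyperparameters are illustrative placeholders:

```
import numpy as np
from lightgbm import LGBMClassifier

X, y = np.random.rand(500, 10), np.random.randint(2, size=500)

clf = LGBMClassifier(n_estimators=50, learning_rate=0.1)
clf.fit(X[:400], y[:400],
        eval_set=[(X[400:], y[400:])],
        eval_metric='logloss',
        early_stopping_rounds=5,
        verbose=False)
proba = clf.predict_proba(X[400:])  # shape = [n_samples, n_classes]
labels = clf.predict(X[400:])       # hard class labels
```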
> class lightgbm.LGBMRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)

Bases: `lightgbm.sklearn.LGBMModel`, `object`

LightGBM regressor. Construct a gradient boosting model.

* Parameters:
  * **boosting_type** (_string, optional (default="gbdt")_) – 'gbdt', traditional Gradient Boosting Decision Tree. 'dart', Dropouts meet Multiple Additive Regression Trees. 'goss', Gradient-based One-Side Sampling. 'rf', Random Forest.
  * **num_leaves** (_int, optional (default=31)_) – Maximum tree leaves for base learners.
  * **max_depth** (_int, optional (default=-1)_) – Maximum tree depth for base learners, -1 means no limit.
  * **learning_rate** (_float, optional (default=0.1)_) – Boosting learning rate.
  * **n_estimators** (_int, optional (default=10)_) – Number of boosted trees to fit.
  * **max_bin** (_int, optional (default=255)_) – Number of bucketed bins for feature values.
  * **subsample_for_bin** (_int, optional (default=200000)_) – Number of samples for constructing bins.
  * **objective** (_string, callable or None, optional (default=None)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
  * **min_split_gain** (_float, optional (default=0.)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
  * **min_child_weight** (_float, optional (default=1e-3)_) – Minimum sum of instance weight (hessian) needed in a child (leaf).
  * **min_child_samples** (_int, optional (default=20)_) – Minimum number of data needed in a child (leaf).
  * **subsample** (_float, optional (default=1.)_) – Subsample ratio of the training instances.
  * **subsample_freq** (_int, optional (default=1)_) – Frequency of subsampling; <=0 means disabled.
  * **colsample_bytree** (_float, optional (default=1.)_) – Subsample ratio of columns when constructing each tree.
  * **reg_alpha** (_float, optional (default=0.)_) – L1 regularization term on weights.
  * **reg_lambda** (_float, optional (default=0.)_) – L2 regularization term on weights.
  * **random_state** (_int or None, optional (default=None)_) – Random number seed. Will use default seeds in C++ code if set to None.
  * **n_jobs** (_int, optional (default=-1)_) – Number of parallel threads.
  * **silent** (_bool, optional (default=True)_) – Whether to print messages while running boosting.
  * ****kwargs** (_other parameters_) – Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. Note: `**kwargs` is not supported in sklearn, so using it may cause unexpected issues.

> n_features_

_int_ – The number of features of fitted model.

> classes_

_array of shape = [n_classes]_ – The class label array (only for classification problem).

> n_classes_

_int_ – The number of classes (only for classification problem).
> best_score_

_dict or None_ – The best score of fitted model.

> best_iteration_

_int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified.

> objective_

_string or callable_ – The concrete objective used while fitting this model.

> booster_

_Booster_ – The underlying Booster of this model.

> evals_result_

_dict or None_ – The evaluation results if `early_stopping_rounds` has been specified.

> feature_importances_

_array of shape = [n_features]_ – The feature importances (the higher, the more important the feature).

Note: A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The predicted values.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **grad** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the gradient for each sample point.
* **hess** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the second derivative for each sample point.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i], and you should group grad and hess in this way as well.

> fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='l2', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Build a gradient boosting model from the training set (X, y).

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input feature matrix.
  * **y** (_array-like of shape = [n_samples]_) – The target values (class labels in classification, real numbers in regression).
  * **sample_weight** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Weights of training data.
  * **init_score** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Init score of training data.
  * **group** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Group data of training data.
  * **eval_set** (_list or None, optional (default=None)_) – A list of (X, y) tuple pairs to use as validation sets for early stopping.
  * **eval_names** (_list of strings or None, optional (default=None)_) – Names of eval_set.
  * **eval_sample_weight** (_list of arrays or None, optional (default=None)_) – Weights of eval data.
  * **eval_init_score** (_list of arrays or None, optional (default=None)_) – Init score of eval data.
  * **eval_group** (_list of arrays or None, optional (default=None)_) – Group data of eval data.
  * **eval_metric** (_string, list of strings, callable or None, optional (default="l2")_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training.
  * **verbose** (_bool, optional (default=True)_) – If True and an evaluation set is used, writes the evaluation progress.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **callbacks** (_list of callback functions or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
* Returns:
  * **self** – Returns self.
* Return type:
  * object

Note: a custom eval function expects a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`, returning (eval_name, eval_result, is_bigger_better) or a list of such tuples:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class)_) – The predicted values.
* **weight** (_array-like of shape = [n_samples]_) – The weight of samples.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **eval_name** (_str_) – The name of the evaluation.
* **eval_result** (_float_) – The eval result.
* **is_bigger_better** (_bool_) – Whether the eval result is better when bigger, e.g. AUC.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i].
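A minimal sketch of the LGBMRegressor workflow with the default 'l2' eval metric; data and hyperparameters are illustrative placeholders:

```
import numpy as np
from lightgbm import LGBMRegressor

X, y = np.random.rand(500, 10), np.random.rand(500)

reg = LGBMRegressor(n_estimators=50)
reg.fit(X[:400], y[:400],
        eval_set=[(X[400:], y[400:])],
        eval_metric='l2',
        early_stopping_rounds=5,
        verbose=False)
print(reg.predict(X[400:])[:5])
print(reg.feature_importances_)  # one value per input feature
```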
> class lightgbm.LGBMRanker(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs)

Bases: `lightgbm.sklearn.LGBMModel`

LightGBM ranker. Construct a gradient boosting model.

* Parameters:
  * **boosting_type** (_string, optional (default="gbdt")_) – 'gbdt', traditional Gradient Boosting Decision Tree. 'dart', Dropouts meet Multiple Additive Regression Trees. 'goss', Gradient-based One-Side Sampling. 'rf', Random Forest.
  * **num_leaves** (_int, optional (default=31)_) – Maximum tree leaves for base learners.
  * **max_depth** (_int, optional (default=-1)_) – Maximum tree depth for base learners, -1 means no limit.
  * **learning_rate** (_float, optional (default=0.1)_) – Boosting learning rate.
  * **n_estimators** (_int, optional (default=10)_) – Number of boosted trees to fit.
  * **max_bin** (_int, optional (default=255)_) – Number of bucketed bins for feature values.
  * **subsample_for_bin** (_int, optional (default=200000)_) – Number of samples for constructing bins.
  * **objective** (_string, callable or None, optional (default=None)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
  * **min_split_gain** (_float, optional (default=0.)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
  * **min_child_weight** (_float, optional (default=1e-3)_) – Minimum sum of instance weight (hessian) needed in a child (leaf).
  * **min_child_samples** (_int, optional (default=20)_) – Minimum number of data needed in a child (leaf).
  * **subsample** (_float, optional (default=1.)_) – Subsample ratio of the training instances.
  * **subsample_freq** (_int, optional (default=1)_) – Frequency of subsampling; <=0 means disabled.
  * **colsample_bytree** (_float, optional (default=1.)_) – Subsample ratio of columns when constructing each tree.
  * **reg_alpha** (_float, optional (default=0.)_) – L1 regularization term on weights.
  * **reg_lambda** (_float, optional (default=0.)_) – L2 regularization term on weights.
  * **random_state** (_int or None, optional (default=None)_) – Random number seed. Will use default seeds in C++ code if set to None.
  * **n_jobs** (_int, optional (default=-1)_) – Number of parallel threads.
  * **silent** (_bool, optional (default=True)_) – Whether to print messages while running boosting.
  * ****kwargs** (_other parameters_) – Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. Note: `**kwargs` is not supported in sklearn, so using it may cause unexpected issues.

> n_features_

_int_ – The number of features of fitted model.

> classes_

_array of shape = [n_classes]_ – The class label array (only for classification problem).

> n_classes_

_int_ – The number of classes (only for classification problem).

> best_score_

_dict or None_ – The best score of fitted model.

> best_iteration_

_int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified.

> objective_

_string or callable_ – The concrete objective used while fitting this model.

> booster_

_Booster_ – The underlying Booster of this model.

> evals_result_

_dict or None_ – The evaluation results if `early_stopping_rounds` has been specified.

> feature_importances_

_array of shape = [n_features]_ – The feature importances (the higher, the more important the feature).

Note: A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`:

* **y_true** (_array-like of shape = [n_samples]_) – The target values.
* **y_pred** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The predicted values.
* **group** (_array-like_) – Group/query data, used for ranking task.
* **grad** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the gradient for each sample point.
* **hess** (_array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)_) – The value of the second derivative for each sample point.

For multi-class task, y_pred is grouped by class_id first, then by row_id. If you want to get the i-th row y_pred in the j-th class, access it as y_pred[j * num_data + i], and you should group grad and hess in this way as well.

> fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=[1], early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Build a gradient boosting model from the training set (X, y).

* Parameters:
  * **X** (_array-like or sparse matrix of shape = [n_samples, n_features]_) – Input feature matrix.
  * **y** (_array-like of shape = [n_samples]_) – The target values (class labels in classification, real numbers in regression).
  * **sample_weight** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Weights of training data.
  * **init_score** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Init score of training data.
  * **group** (_array-like of shape = [n_samples] or None, optional (default=None)_) – Group data of training data.
  * **eval_set** (_list or None, optional (default=None)_) – A list of (X, y) tuple pairs to use as validation sets for early stopping.
  * **eval_names** (_list of strings or None, optional (default=None)_) – Names of eval_set.
  * **eval_sample_weight** (_list of arrays or None, optional (default=None)_) – Weights of eval data.
  * **eval_init_score** (_list of arrays or None, optional (default=None)_) – Init score of eval data.
  * **eval_group** (_list of arrays or None, optional (default=None)_) – Group data of eval data.
  * **eval_metric** (_string, list of strings, callable or None, optional (default="ndcg")_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric; see the note below for more details.
  * **eval_at** (_list of int, optional (default=[1])_) – The evaluation positions of NDCG.
  * **early_stopping_rounds** (_int or None, optional (default=None)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training.
  * **verbose** (_bool, optional (default=True)_) – If True and an evaluation set is used, writes the evaluation progress.
  * **feature_name** (_list of strings or 'auto', optional (default="auto")_) – Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
  * **categorical_feature** (_list of strings or int, or 'auto', optional (default="auto")_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If 'auto' and data is pandas DataFrame, pandas categorical columns are used.
  * **callbacks** (_list of callback functions or None, optional (default=None)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
## Callbacks

> lightgbm.early_stopping(stopping_rounds, verbose=True)

Create a callback that activates early stopping.

Note

Activates early stopping. Requires at least one validation set and one metric. If there is more than one, all of them will be checked.

* Parameters:
  * **stopping_rounds** (_int_) – The number of rounds without improvement after which training is stopped.
  * **verbose** (_bool_, _optional_ (_default=True_)) – Whether to print a message with early stopping information.
* Returns:
  * **callback** – The callback that activates early stopping.
* Return type:
  * function

> lightgbm.print_evaluation(period=1, show_stdv=True)

Create a callback that prints the evaluation results.

* Parameters:
  * **period** (_int_, _optional_ (_default=1_)) – The period to print the evaluation results.
  * **show_stdv** (_bool_, _optional_ (_default=True_)) – Whether to show stdv (if provided).
* Returns:
  * **callback** – The callback that prints the evaluation results every `period` iteration(s).
* Return type:
  * function

> lightgbm.record_evaluation(eval_result)

Create a callback that records the evaluation history into `eval_result`.

* Parameters:
  * **eval_result** (_dict_) – A dictionary to store the evaluation results.
* Returns:
  * **callback** – The callback that records the evaluation history into the passed dictionary.
* Return type:
  * function

> lightgbm.reset_parameter(**kwargs)

Create a callback that resets the parameter after the first iteration.

Note

The initial parameter still takes effect on the first iteration.

* Parameters:
  * **kwargs** (_value should be list_ or _function_) – List of parameters for each boosting round, or a customized function that calculates the parameter from the current round number (e.g. yields learning rate decay). If list lst, parameter = lst[current_round]. If function func, parameter = func(current_round).
* Returns:
  * **callback** – The callback that resets the parameter after the first iteration.
* Return type:
  * function
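A minimal sketch tying these callbacks together with the core `lightgbm.train()` routine (not documented in this excerpt, but part of the same package) on synthetic data:

```python
import numpy as np
import lightgbm as lgb

# Synthetic data, purely for demonstration.
X, y = np.random.rand(200, 4), np.random.rand(200)
train_set = lgb.Dataset(X[:150], label=y[:150])
valid_set = lgb.Dataset(X[150:], label=y[150:], reference=train_set)

history = {}  # filled in place by record_evaluation
booster = lgb.train(
    {'objective': 'regression', 'learning_rate': 0.1},
    train_set,
    num_boost_round=30,
    valid_sets=[valid_set],
    callbacks=[
        lgb.record_evaluation(history),
        lgb.print_evaluation(period=10),
        # decay the learning rate by 1% per round, starting from 0.1
        lgb.reset_parameter(learning_rate=lambda cur_round: 0.1 * (0.99 ** cur_round)),
    ],
)
```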
## Plotting

> lightgbm.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, **kwargs)

Plot model’s feature importances.

* Parameters:
  * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") or [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance whose feature importance should be plotted.
  * **ax** (_matplotlib.axes.Axes_ or _None_, _optional_ (_default=None_)) – Target axes instance. If None, a new figure and axes will be created.
  * **height** (_float_, _optional_ (_default=0.2_)) – Bar height, passed to `ax.barh()`.
  * **xlim** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Tuple passed to `ax.set_xlim()`.
  * **ylim** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Tuple passed to `ax.set_ylim()`.
  * **title** (_string_ or _None_, _optional_ (_default="Feature importance"_)) – Axes title. If None, title is disabled.
  * **xlabel** (_string_ or _None_, _optional_ (_default="Feature importance"_)) – X-axis title label. If None, title is disabled.
  * **ylabel** (_string_ or _None_, _optional_ (_default="Features"_)) – Y-axis title label. If None, title is disabled.
  * **importance_type** (_string_, _optional_ (_default="split"_)) – How the importance is calculated. If “split”, result contains the number of times the feature is used in a model. If “gain”, result contains the total gains of splits which use the feature.
  * **max_num_features** (_int_ or _None_, _optional_ (_default=None_)) – Max number of top features displayed on the plot. If None or <1, all features will be displayed.
  * **ignore_zero** (_bool_, _optional_ (_default=True_)) – Whether to ignore features with zero importance.
  * **figsize** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Figure size.
  * **grid** (_bool_, _optional_ (_default=True_)) – Whether to add a grid to the axes.
  * ****kwargs** (_other parameters_) – Other parameters passed to `ax.barh()`.
* Returns:
  * **ax** – The plot with the model’s feature importances.
* Return type:
  * matplotlib.axes.Axes
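A minimal usage sketch (matplotlib is assumed to be installed; the quick `lgb.train()` call only produces something to plot):

```python
import numpy as np
import matplotlib.pyplot as plt
import lightgbm as lgb

# Train a throwaway model on synthetic data so there is something to plot.
X, y = np.random.rand(100, 4), np.random.rand(100)
booster = lgb.train({'objective': 'regression'}, lgb.Dataset(X, label=y), num_boost_round=10)

ax = lgb.plot_importance(booster, max_num_features=10, importance_type='gain')
plt.show()
```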
> lightgbm.plot_metric(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True)

Plot one metric during training.

* Parameters:
  * **booster** (_dict_ or [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Dictionary returned from `lightgbm.train()` or LGBMModel instance.
  * **metric** (_string_ or _None_, _optional_ (_default=None_)) – The metric name to plot. Only one metric is supported because different metrics have various scales. If None, the first metric is picked from the dictionary (according to hashcode).
  * **dataset_names** (_list of strings_ or _None_, _optional_ (_default=None_)) – List of the dataset names which are used to calculate the metric to plot. If None, all datasets are used.
  * **ax** (_matplotlib.axes.Axes_ or _None_, _optional_ (_default=None_)) – Target axes instance. If None, a new figure and axes will be created.
  * **xlim** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Tuple passed to `ax.set_xlim()`.
  * **ylim** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Tuple passed to `ax.set_ylim()`.
  * **title** (_string_ or _None_, _optional_ (_default="Metric during training"_)) – Axes title. If None, title is disabled.
  * **xlabel** (_string_ or _None_, _optional_ (_default="Iterations"_)) – X-axis title label. If None, title is disabled.
  * **ylabel** (_string_ or _None_, _optional_ (_default="auto"_)) – Y-axis title label. If ‘auto’, the metric name is used. If None, title is disabled.
  * **figsize** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Figure size.
  * **grid** (_bool_, _optional_ (_default=True_)) – Whether to add a grid to the axes.
* Returns:
  * **ax** – The plot with the metric’s history over training.
* Return type:
  * matplotlib.axes.Axes

> lightgbm.plot_tree(booster, ax=None, tree_index=0, figsize=None, graph_attr=None, node_attr=None, edge_attr=None, show_info=None)

Plot a specified tree.

* Parameters:
  * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") or [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance to be plotted.
  * **ax** (_matplotlib.axes.Axes_ or _None_, _optional_ (_default=None_)) – Target axes instance. If None, a new figure and axes will be created.
  * **tree_index** (_int_, _optional_ (_default=0_)) – The index of a target tree to plot.
  * **figsize** (_tuple of 2 elements_ or _None_, _optional_ (_default=None_)) – Figure size.
  * **graph_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for the graph.
  * **node_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for all nodes.
  * **edge_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for all edges.
  * **show_info** (_list_ or _None_, _optional_ (_default=None_)) – What information should be shown on nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’.
* Returns:
  * **ax** – The plot with a single tree.
* Return type:
  * matplotlib.axes.Axes
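A sketch of plotting a recorded metric, reusing the `history` dictionary filled by `record_evaluation` in the callbacks example above (the variable names and the ‘l2’ metric key are assumptions carried over from that example):

```python
import matplotlib.pyplot as plt
import lightgbm as lgb

# `history` was populated by lgb.record_evaluation() in the callbacks example;
# with objective 'regression', the default recorded metric is 'l2'.
ax = lgb.plot_metric(history, metric='l2')
plt.show()

# plot_tree draws a single tree from the same booster; it requires
# the graphviz package to be installed.
ax = lgb.plot_tree(booster, tree_index=0, show_info=['split_gain'])
plt.show()
```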
> lightgbm.create_tree_digraph(booster, tree_index=0, show_info=None, name=None, comment=None, filename=None, directory=None, format=None, engine=None, encoding=None, graph_attr=None, node_attr=None, edge_attr=None, body=None, strict=False)

Create a digraph representation of a specified tree.

Note

For more information please visit [http://graphviz.readthedocs.io/en/stable/api.html#digraph](http://graphviz.readthedocs.io/en/stable/api.html#digraph).

* Parameters:
  * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") or [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance.
  * **tree_index** (_int_, _optional_ (_default=0_)) – The index of a target tree to convert.
  * **show_info** (_list_ or _None_, _optional_ (_default=None_)) – What information should be shown on nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’.
  * **name** (_string_ or _None_, _optional_ (_default=None_)) – Graph name used in the source code.
  * **comment** (_string_ or _None_, _optional_ (_default=None_)) – Comment added to the first line of the source.
  * **filename** (_string_ or _None_, _optional_ (_default=None_)) – Filename for saving the source. If None, `name` + ‘.gv’ is used.
  * **directory** (_string_ or _None_, _optional_ (_default=None_)) – (Sub)directory for source saving and rendering.
  * **format** (_string_ or _None_, _optional_ (_default=None_)) – Rendering output format (‘pdf’, ‘png’, …).
  * **engine** (_string_ or _None_, _optional_ (_default=None_)) – Layout command used (‘dot’, ‘neato’, …).
  * **encoding** (_string_ or _None_, _optional_ (_default=None_)) – Encoding for saving the source.
  * **graph_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for the graph.
  * **node_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for all nodes.
  * **edge_attr** (_dict_ or _None_, _optional_ (_default=None_)) – Mapping of (attribute, value) pairs set for all edges.
  * **body** (_list of strings_ or _None_, _optional_ (_default=None_)) – Lines to add to the graph body.
  * **strict** (_bool_, _optional_ (_default=False_)) – Whether rendering should merge multi-edges.
* Returns:
  * **graph** – The digraph representation of the specified tree.
* Return type:
  * graphviz.Digraph
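Finally, a sketch of exporting a tree with Graphviz (assuming a trained `booster` as in the earlier examples, the graphviz Python package and binaries installed, and an illustrative output filename):

```python
import lightgbm as lgb

# `booster` is assumed to be a trained Booster, e.g. from lgb.train() above.
graph = lgb.create_tree_digraph(
    booster,
    tree_index=0,
    show_info=['split_gain', 'internal_count', 'leaf_count'],
    format='png',
)
graph.render(filename='tree0')  # writes the Graphviz source plus 'tree0.png'
```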