AST for multi-label models.
The sequence of all labels encountered in training. It is important that this is a sequence with the same order as the labels in the training set, because some algorithms may require indices based on the training data.
a string representing a function that will be used to extract labels.
the underlying model that will be produced by a
This is the capture group containing the content when the regex has been padded with the pad function.
VW flags that automatically result in an error.
Find all of the regex matches and extract the first capture group from the match.
VW params passed to the updatedVwParams function.
with at least one capture group (this is unchecked).
Iterator of the matches' first capture group.
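As a rough illustration of the behavior described above, the following sketch extracts the first capture group of every match with scala.util.matching.Regex. The names CaptureGroupSketch and firstCaptureGroups are hypothetical and not part of Aloha. Like the documented function, the sketch does not check that the regex actually has a capture group.

```scala
import scala.util.matching.Regex

object CaptureGroupSketch {
  // Hypothetical helper (not Aloha's implementation): lazily yields the
  // first capture group of each match of `r` in `s`.
  // If `r` has no capture group, group(1) throws when the iterator is consumed.
  def firstCaptureGroups(s: String, r: Regex): Iterator[String] =
    r.findAllMatchIn(s).map(m => m.group(1))
}
```

For example, CaptureGroupSketch.firstCaptureGroups("-qab -qcd", "-q(\\S+)".r).toList yields List("ab", "cd").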
Get the set of interactions (encoded as Strings). String length represents the interaction arity.
VW params passed to the updatedVwParams function.
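The encoding described above (one String per interaction, with the String length equal to the arity) can be sketched as follows. This is a minimal illustration and not Aloha's implementation: InteractionSketch and interactions are hypothetical names, and the sketch assumes the parameters are whitespace-separated and that interactions appear only via -q, --quadratic, --cubic, or --interactions.

```scala
object InteractionSketch {
  // Hypothetical helper: collects the namespace strings attached to
  // interaction flags. Assumes whitespace-separated VW parameters.
  def interactions(vwParams: String): Set[String] = {
    def loop(tokens: List[String], acc: Set[String]): Set[String] = tokens match {
      // Flag followed by a separate option token, e.g. "--cubic abc".
      case ("-q" | "--quadratic" | "--cubic" | "--interactions") :: ns :: rest =>
        loop(rest, acc + ns)
      // Short form with the option attached, e.g. "-qab".
      case t :: rest if t.startsWith("-q") && !t.startsWith("--") && t.length > 2 =>
        loop(rest, acc + t.drop(2))
      case _ :: rest => loop(rest, acc)
      case Nil       => acc
    }
    loop(vwParams.trim.split("\\s+").toList, Set.empty)
  }
}
```

With the input "--csoaa_ldf mc -qab --cubic abc --interactions abcd", the sketch returns Set("ab", "abc", "abcd"), whose lengths 2, 3, and 4 are the arities of the interactions.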
Create a JSON representation of an Aloha model.
NOTE: Because of the inclusion of the unrestricted labelsOfInterest parameter, the JSON produced by this function is not guaranteed to result in a valid Aloha model. This is because no semantics are required by this function and so the labelsOfInterest function specification cannot be validated.
the type of label or class.
a location of a dataset specification.
a location of a VW binary model file.
a model ID.
The sequence of all labels encountered in the training set used to produce the binaryVwModel. It is extremely important that this sequence has the same order as the sequence of labels used in the dataset creation process. Otherwise, the VW model might associate scores with an incorrect label.
A model may be trained on a superset of the labels for which predictions can be made. If the labels at prediction time differ (or should be extracted from the input to the model), this function can provide that capability.
whether the underlying binary VW model should remain as a separate file and be referenced by the Aloha model specification (true) or the binary model content should be embedded directly into the model (false). Keep in mind Aloha models must be smaller than 2 GB because they are decoded to Strings and Strings are indexed by 32-bit integers (which have a max value of 2^31 - 1).
the number of missing features to tolerate before emitting a prediction failure.
a JSON object.
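The warning about label order can be made concrete with a small sketch. It assumes, for illustration only, that scores come back as a positional sequence aligned with the training-time label sequence; LabelOrderSketch and labelScores are hypothetical names and not part of Aloha.

```scala
object LabelOrderSketch {
  // Hypothetical helper: pairs each score with the label at the same index.
  // The pairing is only meaningful if `labels` is in training order.
  def labelScores[K](labels: IndexedSeq[K], scores: IndexedSeq[Double]): Map[K, Double] =
    labels.zip(scores).toMap
}
```

If the training order was Vector("cat", "dog", "fish") and the scores are Vector(0.1, 0.7, 0.2), then "dog" is correctly paired with 0.7. Passing the labels as Vector("dog", "cat", "fish") instead silently pairs "dog" with 0.1, which is the kind of mismatch the parameter description warns about.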
Remove flags (and options) for the flags listed in FlagsToRemove.
VW params passed to the updatedVwParams function.
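A minimal sketch of removing flags together with their options is shown below. It is not Aloha's implementation: RemoveFlagsSketch, removeFlags, and the flagArity parameter are hypothetical, and the flags used in the usage example are ordinary VW flags chosen for illustration, not the actual contents of FlagsToRemove.

```scala
object RemoveFlagsSketch {
  // Hypothetical helper: `flagArity` maps each flag to remove to the number
  // of option tokens that follow it. Assumes whitespace-separated parameters.
  def removeFlags(vwParams: String, flagArity: Map[String, Int]): String = {
    def loop(tokens: List[String], kept: Vector[String]): Vector[String] = tokens match {
      // Drop the flag and the option tokens that belong to it.
      case t :: rest if flagArity.contains(t) => loop(rest.drop(flagArity(t)), kept)
      case t :: rest                          => loop(rest, kept :+ t)
      case Nil                                => kept
    }
    loop(vwParams.trim.split("\\s+").toList, Vector.empty).mkString(" ")
  }
}
```

For example, removeFlags("--csoaa_ldf mc --quiet -d data.txt", Map("--quiet" -> 0, "-d" -> 1)) returns "--csoaa_ldf mc".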
VW will actually update / replace files if files appear as options to flags. To overcome this, an attempt is made to detect flags referencing files and, if found, replace the files with temp files. These files should be deleted before exiting the main program.
the parameters after the updates.
the flag
a tuple2 of the final string to try with VW for validation along with the mapping from flag to file that was used.
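The temp-file substitution described above might look roughly like the following sketch. It is an illustration under stated assumptions, not Aloha's code: TempFileSketch, withTempFiles, and FileFlags are hypothetical names, FileFlags is only an illustrative subset of VW flags that take a file option, and parameters are assumed to be whitespace-separated.

```scala
import java.io.File

object TempFileSketch {
  // Assumption: an illustrative subset of VW flags whose option is a file.
  val FileFlags: Set[String] = Set("-f", "--final_regressor", "--cache_file")

  // Returns the rewritten parameter string and the flag-to-temp-file mapping.
  def withTempFiles(vwParams: String): (String, Map[String, File]) = {
    def loop(
        tokens: List[String],
        out: Vector[String],
        files: Map[String, File]
    ): (String, Map[String, File]) = tokens match {
      case flag :: _ :: rest if FileFlags.contains(flag) =>
        val tmp = File.createTempFile("vw_", ".tmp")
        tmp.deleteOnExit() // fallback only; callers should still delete explicitly
        loop(rest, out :+ flag :+ tmp.getCanonicalPath, files + (flag -> tmp))
      case t :: rest => loop(rest, out :+ t, files)
      case Nil       => (out.mkString(" "), files)
    }
    loop(vwParams.trim.split("\\s+").toList, Vector.empty, Map.empty)
  }
}
```

A caller would validate the returned string with VW and then delete every file in the returned map, for example with files.values.foreach(_.delete()).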
Adds VW parameters to make the parameters work as an Aloha multilabel model.
The algorithm works as follows:

1. Ensure that the csoaa_ldf or wap_ldf reduction is specified in the supplied VW parameter list (with the appropriate option for the flag).
2. Ensure that no unrecoverable flags appear in the parameter list. See UnrecoverableFlagSet for flags whose appearance is considered "unrecoverable".
3. Ensure that the ignore and interaction flags (--ignore, --ignore_linear, -q, --quadratic, --cubic) do not refer to namespaces not supplied in the namespaceNames parameter.
4. Determine the label namespaces. See com.eharmony.aloha.dataset.vw.multilabel.VwMultilabelRowCreator.determineLabelNamespaces.
5. Remove the flags and options found in FlagsToRemove.
6. Add the --noconstant and --csoaa_rank flags. --noconstant is added because per-label intercepts will be included and take the place of a single intercept. --csoaa_rank is added to make the VWLearner a VWActionScoresLearner.
7. If a namespace in namespaceNames appears as an option to VW's ignore_linear flag, do not create a quadratic interaction between that namespace and the label namespace.
8. For each interaction term (-q, --quadratic, --cubic, --interactions), replace it with an interaction term also interacted with the label namespace. This increases the arity of the interaction by 1.

Examples:

    import com.eharmony.aloha.models.vw.jni.multilabel.VwMultilabelModel.updatedVwParams

    // This is a basic example. 'y' and 'Y' in the output are label
    // namespaces. Notice all namespaces are quadratically interacted
    // with the label namespace.
    val uvw1 = updatedVwParams(
      "--csoaa_ldf mc",
      Set("a", "b", "c")
    )
    // Right("--csoaa_ldf mc --noconstant --csoaa_rank --ignore y " +
    //       "--ignore_linear abc -qYa -qYb -qYc")

    // Here since 'a' is in 'ignore_linear', no '-qYa' term appears
    // in the output.
    val uvw2 = updatedVwParams(
      "--csoaa_ldf mc --ignore_linear a -qbc",
      Set("a", "b", "c")
    )
    // Right("--csoaa_ldf mc --noconstant --csoaa_rank --ignore y " +
    //       "--ignore_linear abc -qYb -qYc --cubic Ybc")

    // 'a' is in 'ignore', so no terms with 'a' are emitted. 'b' is
    // in 'ignore_linear' so it does not occur in any quadratic
    // interactions in the output, but can appear in interaction
    // terms of higher arity like the cubic interaction.
    val uvw3 = updatedVwParams(
      "--csoaa_ldf mc --ignore a --ignore_linear b -qbc --cubic abc",
      Set("a", "b", "c")
    )
    // Right("--csoaa_ldf mc --noconstant --csoaa_rank --ignore ay " +
    //       "--ignore_linear bc -qYc --cubic Ybc")
    import com.eharmony.aloha.models.vw.jni.multilabel.VwMultilabelModel.updatedVwParams
    import com.eharmony.aloha.models.vw.jni.multilabel.{
      NotCsoaaOrWap,
      NamespaceError
    }

    assert( updatedVwParams("", Set()) == Left(NotCsoaaOrWap("")) )

    assert(
      updatedVwParams("--wap_ldf m -qaa", Set()) ==
        Left(NamespaceError("--wap_ldf m -qaa", Set(), Map("quadratic" -> Set('a'))))
    )

    assert(
      updatedVwParams(
        "--wap_ldf m --ignore_linear b --ignore a -qbb -qbd " +
          "--cubic bcd --interactions dde --interactions abcde",
        Set()
      ) ==
        Left(
          NamespaceError(
            "--wap_ldf m --ignore_linear b --ignore a -qbb -qbd --cubic bcd " +
              "--interactions dde --interactions abcde",
            Set(),
            Map(
              "ignore"        -> Set('a'),
              "ignore_linear" -> Set('b'),
              "quadratic"     -> Set('b', 'd'),
              "cubic"         -> Set('b', 'c', 'd', 'e'),
              "interactions"  -> Set('a', 'b', 'c', 'd', 'e')
            )
          )
        )
    )
current VW parameters passed to the VW JNI
it is assumed that namespaceNames is a superset of all of the namespaces referred to by any flags found in vwParams.
the number of unique labels in the training set. This is used to calculate the appropriate VW ring_size parameter.
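The way interaction terms are combined with the label namespace can be sketched as below. This is a simplified model of the documented behavior, not Aloha's implementation: LabelInteractionSketch and liftInteractions are hypothetical names, namespaces are represented by single characters, and only the set of resulting interaction strings is computed (not the full parameter string). The rule that interactions mentioning an ignored namespace are dropped is inferred from the documented updatedVwParams examples.

```scala
object LabelInteractionSketch {
  // Hypothetical helper returning the interaction terms after the label
  // namespace `labelNs` has been added to each of them.
  def liftInteractions(
      namespaces: Set[Char],     // feature namespaces
      ignored: Set[Char],        // options of --ignore
      ignoredLinear: Set[Char],  // options of --ignore_linear
      interactions: Set[String], // existing interactions, e.g. "bc"
      labelNs: Char
  ): Set[String] = {
    // Namespaces that keep a linear term get a quadratic with the label namespace.
    val quadratics = (namespaces -- ignored -- ignoredLinear).map(ns => s"$labelNs$ns")
    // Existing interactions gain the label namespace, raising their arity by 1.
    // Interactions that mention an ignored namespace are dropped.
    val lifted = interactions
      .filter(i => i.forall(c => !ignored(c)))
      .map(i => s"$labelNs$i")
    quadratics ++ lifted
  }
}
```

With namespaces a, b, c, --ignore a, --ignore_linear b, and interactions "bc" and "abc", the sketch yields Set("Yc", "Ybc") for label namespace 'Y', which matches the interaction terms -qYc and --cubic Ybc in the third updatedVwParams example.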
Created by ryan.deak on 9/29/17.