Logit Model

In the mixed Logit model, the disturbance term consists of a part that follows any distribution specified by a researcher and a part that follows an iid extreme value distribution.

From: Microbehavioral Econometric Methods, 2016

Rating and Scoring Techniques

Stefan Trueck, Svetlozar T. Rachev, in Rating Based Modeling of Credit Risk, 2009

2.4.1 Logit Models

Logistic regression analysis has also been used, particularly to investigate the relationship between a binary or ordinal response probability and explanatory variables. For bankruptcy prediction the binary response probability is usually the default probability, while a large number of explanatory variables can be used. The method fits linear logistic regression models for binary or ordinal response data by maximum likelihood (Hosmer and Lemeshow, 1989). One of the first applications of logit analysis in the context of financial distress can be found in Ohlson (1980), followed, e.g., by Zavgren (1985), to give only a few references. A good treatment of different logistic models, estimation problems, and applications can also be found in Greene (1993) or Maddala (1983). Similar to discriminant analysis, this technique weights the independent variables and assigns a Y score, in the form of a failure probability (PD), to each company in a sample.

Let $y_i$ denote the response of company $i$ with respect to the outcome of the explanatory variables $x_{1i}, \ldots, x_{ki}$. For example, let $Y = 1$ denote the default of the firm and $Y = 0$ its survival. Then, using logistic regression, the PD for a company is denoted by

(2.4) $P(Y = 1 \mid x_1, \ldots, x_k) = f(x_1, \ldots, x_k)$

The function f denotes the logistic distribution function such that we get

(2.5) $P(Y = 1 \mid x_1, \ldots, x_k) = \dfrac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}.$

Obviously, the logistic distribution function transforms the regression into the interval (0, 1). Further defining the logit as

(2.6) $\operatorname{logit}(x) = \ln\left(\dfrac{x}{1-x}\right),$

the model can be rewritten as

(2.7) $\operatorname{logit}\big(P(Y = 1 \mid x_1, \ldots, x_k)\big) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$

with real constants $\beta_0, \beta_1, \ldots, \beta_n$. As mentioned above, the logit model can be estimated via maximum likelihood using numerical methods. The advantage of the approach is that it does not assume multivariate normality and equal covariance matrices as, e.g., discriminant analysis does (Press and Wilson, 1978). In addition, logistic regression is well suited to problems in which the predictor variables are binary or have multiple categorical levels, or even in which there are multiple independent variables. For further reading on logit models, we refer to Maddala (1983) and Greene (1993).
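To make the estimation step concrete, here is a minimal sketch of fitting such a logit model for default probabilities by maximum likelihood on simulated data; the predictor names, coefficient values, and the use of statsmodels are illustrative assumptions, not part of the original chapter.

```python
# Hypothetical illustration: estimating a default-probability (PD) logit model
# by maximum likelihood. The data and predictor names are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
leverage = rng.normal(0.5, 0.2, n)         # hypothetical balance-sheet ratio
profitability = rng.normal(0.05, 0.1, n)   # hypothetical earnings ratio

# Assumed "true" coefficients, used only to simulate defaults
linpred = -2.0 + 3.0 * leverage - 4.0 * profitability
pd_true = 1.0 / (1.0 + np.exp(-linpred))
default = rng.binomial(1, pd_true)         # Y = 1: default, Y = 0: survival

X = sm.add_constant(np.column_stack([leverage, profitability]))
model = sm.Logit(default, X).fit(disp=False)   # maximum likelihood estimation
print(model.params)                             # beta_0, beta_1, beta_2
print(model.predict(X)[:5])                     # fitted PDs, all in (0, 1)
```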

URL: https://www.sciencedirect.com/science/article/pii/B9780123736833000038

What Are Neural Networks?

Paul D. McNelis, in Neural Networks in Finance, 2005

2.7.5 Neural Network Models for Discrete Choice

Logistic regression is a special case of neural network regression for binary choice, since logistic regression represents a neural network with one hidden neuron. The following adapted form of the feedforward network may be used for a discrete binary choice model, predicting probability $p_i$ for a network with $k^*$ input characteristics and $j^*$ neurons:

(2.83) $n_{j,i} = \omega_{j,0} + \displaystyle\sum_{k=1}^{k^*} \omega_{j,k}\, x_{k,i}$

Note that the probability $\tilde{p}_i$ is a weighted average of the logsigmoid neurons $N_{j,i}$, which are bounded between 0 and 1. Since the final probability is a weighted average of these neurons, it is also bounded in this way. As in logistic regression, the coefficients are obtained by maximizing the product of the likelihood functions implied by the preceding specification (or, equivalently, the sum of the log-likelihood functions).

The partial derivatives of the neural network discrete choice models are given by the following expression:

$\dfrac{\partial p_i}{\partial x_{i,k}} = \displaystyle\sum_{j=1}^{j^*} \gamma_j\, N_{j,i}\,(1 - N_{j,i})\, \omega_{j,k}$
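As a rough illustration of the network described above, the following sketch evaluates the logsigmoid neurons, combines them into a probability, and computes the partial derivatives with the formula above; the weights are random placeholders, and restricting the $\gamma$ weights to a proper weighted average (nonnegative, summing to one) is an assumption made here so that the probability stays in (0, 1).

```python
# Minimal sketch of the discrete-choice feedforward network described above:
# logsigmoid neurons N_{j,i} combined into a probability p_i, plus the
# analytic partial derivatives dp_i/dx_{i,k}. All weights are placeholders.
import numpy as np

rng = np.random.default_rng(1)
k_star, j_star = 3, 4          # number of inputs and hidden neurons
x = rng.normal(size=k_star)    # one observation's input characteristics

omega0 = rng.normal(size=j_star)            # omega_{j,0}
omega = rng.normal(size=(j_star, k_star))   # omega_{j,k}
gamma = np.full(j_star, 1.0 / j_star)       # assumed equal weights summing to 1

n = omega0 + omega @ x                      # eq. (2.83): n_{j,i}
N = 1.0 / (1.0 + np.exp(-n))                # logsigmoid neurons, each in (0, 1)
p = gamma @ N                               # probability as a weighted average

# dp/dx_k = sum_j gamma_j * N_j * (1 - N_j) * omega_{j,k}
dp_dx = (gamma * N * (1.0 - N)) @ omega
print(p, dp_dx)
```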

URL: https://www.sciencedirect.com/science/article/pii/B9780124859678500026

Handbook of the Economics of Art and Culture

Kenneth G. Willis, in Handbook of the Economics of Art and Culture, 2014

7.4.2 Nested Logit Model

The NL model is an extension of the CL or MNL model. It was devised to avoid the IIA assumption by allowing different correlations across nests (Davidson et al., 2009); thus, for example, a choice between the opera and a visit to an archaeological site will be less correlated than a choice between which archaeological site to visit. The NL model places these choices in different nests (opera versus archaeology); the correlations imposed are thus similar within nests, whereas for alternatives in different nests the unobserved components are uncorrelated and indeed independent.

NL models assume a generalized extreme-value distribution for the error term $\varepsilon_{ij}$, where the distribution of $\varepsilon_{ij}$ is correlated across alternatives in the same nest, with the IIA property retained within nests but not between nests. Thus, NL models assume that errors are homoscedastic, that the correlation amongst alternatives is the same in all nests with equal scale factors, and that the correlation is zero between alternatives in different nests (Swait, 2007). The NL model also assumes independence in the error structure across choices made by the same respondent.

The general structure of a NL model is shown in Fig. 7.1. In the NL model individuals are not necessarily assumed to make decisions sequentially following this decision tree. However, there is evidence for this decision structure from behavioral observation; consumers often appear to decide whether to stick with the status quo position or seek a change (Samuelson and Zeckhauser, 1988) and if they decide to change, they then consider the various other goods available. Thus, respondents might be assumed to consider whether they are satisfied with their current consumption bundle of cultural heritage and, if not, then to consider what other bundles of cultural heritage they might wish to consume.

Figure 7.1. Status quo and two experimentally generated alternatives.

In the NL model, the scale factors or inclusive values (IVs) are related to the different level nodes. The IV is a measure of the attractiveness of a nest and corresponds to the expected value individual i obtains from the alternatives within nest k. If the NL is the correct model specification, the IV parameters have values between 0 (perfect correlation) and 1 (no correlation, i.e., no similarity in the stochastic component of utility within each nest). When the IV parameter equals 1, the NL model is equivalent to the CL model (Train, 2003; Swait, 2007).
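A minimal numerical sketch of these nested logit probabilities, with invented utilities, nests, and IV parameters, may help fix ideas:

```python
# Illustrative sketch of nested logit choice probabilities with inclusive
# values (IVs). The nests, utilities, and IV (scale) parameters are invented.
import numpy as np

# Two nests: "status quo" (one alternative) and "change" (two alternatives)
V = {"status_quo": [1.0], "change": [0.6, 0.9]}   # representative utilities
lam = {"status_quo": 1.0, "change": 0.5}          # IV parameters in (0, 1]

# Inclusive value of each nest: IV_k = ln(sum_j exp(V_j / lambda_k))
IV = {k: np.log(np.sum(np.exp(np.array(v) / lam[k]))) for k, v in V.items()}

# Probability of choosing nest k, then alternative j within nest k
denom = sum(np.exp(lam[k] * IV[k]) for k in V)
for k, v in V.items():
    P_nest = np.exp(lam[k] * IV[k]) / denom
    P_within = np.exp(np.array(v) / lam[k]) / np.exp(IV[k])
    print(k, P_nest * P_within)   # unconditional choice probabilities

# With all IV parameters equal to 1 these probabilities collapse to the
# conditional (multinomial) logit model, as noted in the text.
```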

URL: https://www.sciencedirect.com/science/article/pii/B9780444537768000076

Data Mining

John A. Bunge, Dean H. Judson, in Encyclopedia of Social Measurement, 2005

Logistic Regression

Logistic regression is a well-known procedure that can be used for classification. It is a variant of multiple regression in which the response is binary rather than quantitative. In the simplest version, the feature variables are taken to be nonrandom. The response, which is the class, is a binary random variable that takes on the value 1 (for the class of interest) with some probability p, and the value 0 with probability 1 − p. The "success probability" p is a function of the values of the feature variables; specifically, the logarithm of the odds, or "log odds," log[p/(1 − p)], is a linear function of the predictor variables. To use logistic regression for classification, a cutoff value is set, typically 0.5; a case is assigned to class 1 if its estimated or fitted success probability is greater than (or equal to) the cutoff, and it is assigned to class 0 if the estimated probability is less than the cutoff. Because of the nature of the functions involved, this is equivalent to a linear classification boundary, although it is not (necessarily) the same as would be derived from linear discriminant analysis.

Like standard multiple regression, logistic regression carries hypothesis tests for the significance of each variable, along with other tests, estimates, and goodness-of-fit assessments. In the classification setting, the variable significance tests can be used for feature selection: modern computational implementations incorporate several variants of stepwise (iterative) variable selection. Because of the conceptual analogy with ordinary multiple regression and the ease of automated variable selection, logistic classification is probably the most frequently used data mining procedure. Another advantage is that it produces a probability of success, given the values of the feature variables, rather than just a predicted class, which enables sorting the observations by probability of success and setting an arbitrary cutoff for classification, not necessarily 0.5. But wherever the cutoff is set, logistic classification basically entails a linear classification boundary, and this imposes a limit on the potential efficacy of the classifier. Some flexibility can be achieved by introducing transformations (e.g., polynomials) and interactions among the feature variables.
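As a hedged illustration of the cutoff rule described above, the sketch below fits a logistic classifier to synthetic two-feature data and assigns classes at a 0.5 cutoff; the data, library choice, and threshold are assumptions for illustration only.

```python
# Sketch of logistic regression used as a classifier with a 0.5 cutoff,
# on synthetic two-feature data (all values are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X0 = rng.normal(loc=-1.0, size=(100, 2))   # class 0 feature vectors
X1 = rng.normal(loc=+1.0, size=(100, 2))   # class 1 feature vectors
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
p_hat = clf.predict_proba(X)[:, 1]          # fitted success probabilities
y_hat = (p_hat >= 0.5).astype(int)          # classify by the cutoff (here 0.5)
print((y_hat == y).mean())                  # training accuracy

# Because log[p/(1-p)] is linear in the features, the implied decision
# boundary p_hat = 0.5 is a straight line in feature space.
```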

URL: https://www.sciencedirect.com/science/article/pii/B0123693985001596

Machine Learning and Short Positions in Stock Trading Strategies

David E. Allen, ... Abhay K. Singh, in Handbook of Short Selling, 2012

32.2.2 Logistic Regression

Logistic or logit models are commonly used when modeling a binary classification problem. Logit models take the general form

$P(Y_i = 1 \mid X_i) = F(X_i \beta)$

where the dependent variable Y takes a binomial form (in the present case −1, 1), P is the probability that Y = {−1, 1}, and β is the vector of regression coefficients. X represents the independent or predictor variables and F(·) is the cumulative distribution function of the logistic distribution. We use Weka to implement the logistic regression model, with the dependent variable set to −1, 1.
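The chapter estimates the model in Weka; purely as an illustration of the functional form, the sketch below evaluates F(Xβ) with the logistic CDF for placeholder coefficients and maps the predictions back to the −1/1 coding.

```python
# Sketch: evaluating P(Y = 1 | X) = F(X beta) with the logistic CDF when the
# dependent variable is coded -1/1, as in the text. The beta values below are
# placeholders, not estimates from the chapter.
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

beta = np.array([0.2, -1.1, 0.7])           # hypothetical coefficients
X = np.array([[1.0, 0.5, -0.3],             # rows: observations (with constant)
              [1.0, -1.2, 0.8]])
p = logistic_cdf(X @ beta)                  # P(Y = 1 | X)
y_pred = np.where(p >= 0.5, 1, -1)          # map back to the -1/1 coding
print(p, y_pred)
```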

URL: https://www.sciencedirect.com/science/article/pii/B9780123877246000325

Multivariate Analysis: Discrete Variables (Overview)

A. Agresti, in International Encyclopedia of the Social & Behavioral Sciences, 2001

3 Loglinear Models

Logistic regression resembles ordinary regression in distinguishing between a response variable Y and a set of predictors {x k }. Loglinear models, by contrast, treat all variables symmetrically and are relevant for analyses analogous to correlation analyses, studying the association structure among a set of categorical response variables.

For multidimensional contingency tables, a variety of models are available, varying in terms of the complexity of the association structure. For three variables, models include ones for which (a) the variables are mutually independent, (b) two of the variables are associated but are jointly independent of the third, (c) two of the variables are conditionally independent, given the third variable, but may both be associated with the third, (d) each pair of variables is associated, but the association between each pair is homogeneous at each level of the third variable, and (e) each pair of variables is associated and the strength of association between each pair may vary according to the level of the third variable (Bishop et al. 1975).

Relationships describing probabilities in contingency tables are naturally multiplicative, so ordinary regression-type models occur after taking the logarithm, which is the reason for the term loglinear. To illustrate, in two-way contingency tables independence between row variable X and column variable Y is equivalent to the condition whereby the probability of classification P(X=i, Y=j) in the cell in row i and in column j depends only on the marginal probabilities P(X=i) and P(Y=j),

(5) $P(X=i,\, Y=j) = P(X=i)\,P(Y=j)$

for all i and j. For a sample of size n, the expected count $\mu_{ij} = nP(X=i,\, Y=j)$ in that cell therefore satisfies

(6) $\log \mu_{ij} = \log[nP(X=i)P(Y=j)] = \log(n) + \log[P(X=i)] + \log[P(Y=j)].$

This has the loglinear form

(7) $\log \mu_{ij} = \alpha + \beta_i + \gamma_j$

for which the right-hand side resembles a simple linear model: two-way analysis of variance (ANOVA) without interaction. Alternatively, this has regression form by replacing $\{\beta_i\}$ by dummy variables for the rows times parameters representing effects of classification in those rows, and replacing $\{\gamma_j\}$ by dummy variables for the columns times parameters representing effects of classification in those columns.
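To make equations (5)–(7) concrete, the short sketch below computes the expected counts under independence for a hypothetical 2 × 3 table of counts and verifies the additive loglinear decomposition numerically; the counts are invented.

```python
# Numerical check of the independence loglinear model (eqs. (5)-(7)) on a
# hypothetical 2 x 3 contingency table of counts.
import numpy as np

counts = np.array([[30.0, 50.0, 20.0],
                   [15.0, 45.0, 40.0]])
n = counts.sum()
p_row = counts.sum(axis=1) / n              # P(X = i)
p_col = counts.sum(axis=0) / n              # P(Y = j)

mu = n * np.outer(p_row, p_col)             # expected counts under independence

# log mu_ij = log(n) + log P(X = i) + log P(Y = j)   -- eq. (6)
lhs = np.log(mu)
rhs = np.log(n) + np.log(p_row)[:, None] + np.log(p_col)[None, :]
print(np.allclose(lhs, rhs))                # True: the additive form (7) holds
```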

Similarly, loglinear model formulas for more complex models such as those allowing associations resemble ANOVA models except for predicting the logarithm of each cell expected frequency rather than the expected frequency itself. Dummy variables represent levels of the qualitative responses, and their interaction terms represent associations. The associations are described by odds ratios. Logistic regression models with qualitative predictors are equivalent to certain loglinear models, having identical estimates of odds ratios and identical goodness-of-fit statistics. For ordinal variables, specialized loglinear models assign ordered scores to the categories and have parameters describing trends in associations.

For further details about loglinear models, see Exploratory Data Analysis: Multivariate Approaches (Nonparametric Regression); also Agresti (1996, Chap. 6), Bishop et al. (1975), and Fienberg (1980).

URL: https://www.sciencedirect.com/science/article/pii/B0080430767004745

Embedded Predictor Selection for Default Risk Calculation: A Southeast Asian Industry Study

Wolfgang Karl Härdle, Dedy Dwi Prastyo, in Handbook of Asian Finance: Financial Markets and Sovereign Wealth Funds, 2014

7.5 Conclusion

The regularized logit model is able to simultaneously estimate parameters and select default predictors with very high prediction accuracy, particularly for Indonesian, Singaporean, and Thai industries. For the same level of accuracy, the number of default predictors selected by the Lasso for the Indonesia and Singapore data is significantly smaller than the number selected under the elastic-net penalty. Almost all predictors selected by the Lasso are also selected by the elastic-net. The relevant default predictors vary across countries, which is in line with related studies concluding that default prediction analysis is sample specific.
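As a loose illustration of the lasso versus elastic-net comparison (not a reproduction of the chapter's estimation), the sketch below fits both penalized logit models to synthetic data and reports which predictors survive; the penalty strengths, mixing parameter, and data are arbitrary assumptions.

```python
# Illustrative sketch of lasso vs. elastic-net penalized logit for predictor
# selection on synthetic data. Zero coefficients correspond to predictors
# dropped by the penalty; all settings here are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))              # 20 candidate default predictors
beta_true = np.zeros(20)
beta_true[:4] = [2.0, -1.5, 1.0, -1.0]      # only 4 truly relevant predictors
p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p)                      # simulated default indicator

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.2, max_iter=5000).fit(X, y)

print("lasso keeps:", np.flatnonzero(lasso.coef_[0]))
print("elastic-net keeps:", np.flatnonzero(enet.coef_[0]))
```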

URL: https://www.sciencedirect.com/science/article/pii/B978012800982600007X

Probability Models

Robert L. Kissell, in Algorithmic Trading Methods (Second Edition), 2021

Solving Probability Output Models

In situations where we have probability outcomes with 0 ≤ y ≤ 1, we can use the logit model and solve for the model parameters using logistic regression analysis. In this case, the logistic regression model is a linearization of the logit probability model, and the parameters are solved via OLS techniques. This eases the calculations, and it is much easier and more direct to interpret the statistics of a linear model than those of a probability or nonlinear model.

For example, suppose that the probability of filling a limit order is known. In this case, we can now determine a statistically significant set of explanatory factors x and the corresponding model parameters.

The logistic regression probability model is solved via the following steps:

(1) Start with the Logit Model with parameter z:

$f(z) = \dfrac{1}{1 + e^{-z}}$

(2) Set the Logit Model equal to the probability p:

$\dfrac{1}{1 + e^{-z}} = p$

(3) Calculate (1 − p):

$1 - \dfrac{1}{1 + e^{-z}} = \dfrac{e^{-z}}{1 + e^{-z}} = (1 - p)$

(4) Calculate the Wins Ratio by dividing p by (1 − p):

$\dfrac{1/(1 + e^{-z})}{e^{-z}/(1 + e^{-z})} = \dfrac{p}{1 - p}$

It is important to note that the expression $\frac{p}{1-p}$ is known as the wins ratio or the odds ratio in statistics. This expression gives the ratio of wins to losses. An important aspect of the wins ratio, as we show below, is that it is always positive.

(5) This expression can be reduced to:

$e^{z} = \dfrac{p}{1 - p}$

(6) We can further reduce this expression by taking the natural log of both sides, thus giving:

$\ln(e^{z}) = \ln\left(\dfrac{p}{1 - p}\right)$

(7) Which yields:

$z = \ln\left(\dfrac{p}{1 - p}\right)$

(8) If z is a linear function of k independent variables, i.e., $z = b_0 + b_1 x_1 + \cdots + b_k x_k$, then our expression yields:

$b_0 + b_1 x_1 + \cdots + b_k x_k = \ln\left(\dfrac{p}{1 - p}\right)$

This transformed Logit Model reduces to the linear logistic regression model and it can now be solved using OLS regression techniques. For example, if y is the natural log of the wins ratio and x is a set of explanatory factor variables, we can calculate the corresponding model parameters by solving our standard linear regression model:

$\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k$

where,

$y = \ln\left(\dfrac{p}{1 - p}\right)$

Therefore, in situations where the probability of occurrence is known or can be estimated we are able to solve for the parameters of our probability model using linear regression and OLS techniques.
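The following sketch walks through this procedure on simulated data: the "known" probabilities are converted to the log wins ratio and the parameters are recovered by OLS; the factor values and true coefficients are invented for illustration.

```python
# Sketch of the procedure above: transform known probabilities p into the log
# wins ratio y = ln(p / (1 - p)) and fit the linear model by OLS. The
# explanatory variables and "known" probabilities are simulated.
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # constant + x1, x2
b_true = np.array([-0.5, 1.2, -0.8])

p = 1.0 / (1.0 + np.exp(-(X @ b_true)))     # "known" fill probabilities
y = np.log(p / (1.0 - p))                    # log wins ratio (the logit)

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate of b0, b1, b2
print(b_hat)                                     # recovers b_true (no noise here)
```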

URL: https://www.sciencedirect.com/science/article/pii/B9780128156308000077

Needs and perspectives of multilingual information professionals: findings of an empirical study

Ximo Granell, in Multilingual Information Management, 2015

8.3.2 Predicting CAT tools adoption in relation to ICT through a logistic regression model and chi-square tests

With regard to the relationship between the adoption of CAT tools and the adoption of other ICT, the findings of the survey showed that CAT tool adopters were using a broader range of ICT and had more experience with general ICT than those who had not adopted CAT tools. This idea was reinforced by the results obtained from the logistic regression model used to analyse the relationship between the adoption of the range of ICT and the adoption of CAT tools. Stand-alone terminology management systems, both in terms of uptake and experience with them, were the type of ICT which showed stronger links with the adoption of CAT tools. This made sense as most CAT tools include terminology management functions bundled in them, so translators who are familiar with these translation-specific tools are more likely to be familiar with CAT tools as well.

The logistic regression model used to analyse the relationship between the adoption of ICT and the adoption of CAT tools revealed that, with an overall accuracy of 89.2%, adoption of CAT tools was mostly determined by the usage of, and experience with, terminology management tools, the usage of, and experience with, graphics applications, and the usage of spreadsheets, with the usage of terminology management tools being the most influential of these variables. Looking at the CAT tool usage variable, of the 391 translators, 94 (24%) could be classified as CAT tool users, while 238 (61%) could be classified as non-users, and the remaining 59 (15%) constituted missing values. Multiple logistic regression analysis was undertaken using this dichotomous CAT user variable (ignoring the missing values) as the dependent variable, and the variables on the usage of and degree of experience with the rest of the software applications as the independent variables. A total of 279 of the 391 cases were used to estimate the model. One hundred and twelve cases were not included because they contained missing data for one or more of the variables.

The first step of the logistic regression analysis included the following variables in the model: word processing usage and experience, spreadsheet usage and experience, database usage and experience, computer-based accounting usage and experience, desktop publishing usage and experience, web publishing usage and experience, graphics usage and experience, information retrieval tool usage and experience, groupware usage and experience, project and workflow management usage and experience, terminology management usage and experience, machine translation usage and experience, and localisation usage and experience. Once the variables were entered, backward elimination was used to remove those which were not significantly related to CAT tool adoption. Table 8.3 presents the statistics of the logistic regression prediction model.

Table 8.3. CAT tool vs. ICT adoption prediction (logistic regression model)

Variables in the Equation (Step 20)

Variable | B | S.E. | Wald | df | Sig. | Exp(B)
Spreadsheet usage | 1.935 | .768 | 6.345 | 1 | .012 | .966
Graphics experience | -1.028 | .356 | 8.313 | 1 | .004 | -.298
Graphics usage | 1.883 | .714 | 6.965 | 1 | .008 | 2.717
Terminology Mgment experience | .724 | .316 | 5.266 | 1 | .022 | 1.935
Terminology Mgment usage | 2.886 | .702 | 16.876 | 1 | .000 | 17.919
Localisation experience | 1.235 | .699 | 3.123 | 1 | .077 | 3.438
Constant | -5.108 | 1.182 | 18.666 | 1 | .000 | .006

Variable(s) entered on step 1: Word processing usage and experience, Spreadsheet usage and experience, Database usage and experience, Computer-based accounting usage and experience, Desktop publishing usage and experience, Web publishing usage and experience, Graphics usage and experience, Information retrieval tools usage and experience, Groupware usage and experience, Project and workflow management usage and experience, Terminology management usage and experience, Machine translation usage and experience, Localisation usage and experience.

Classification Table for CAT tool usage (cut value = .300)

Observed | Predicted: No | Predicted: Yes | Percentage Correct
CAT user: No | 197 | 15 | 92.9
CAT user: Yes | 15 | 52 | 77.6
Overall Percentage |  |  | 89.2
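A simplified sketch of this kind of backward-elimination logistic regression is given below; the synthetic data, variable names, and 0.05 removal threshold are placeholders rather than the study's actual inputs.

```python
# Simplified sketch of backward elimination in a logistic regression, in the
# spirit of the analysis above. Data, variable names, and the 0.05 removal
# threshold are placeholders, not the study's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
df = pd.DataFrame({
    "terminology_usage": rng.integers(0, 2, n),
    "spreadsheet_usage": rng.integers(0, 2, n),
    "graphics_experience": rng.integers(1, 5, n),
    "word_processing_usage": rng.integers(0, 2, n),
})
logit_p = -2.0 + 2.5 * df["terminology_usage"] + 0.8 * df["spreadsheet_usage"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))   # CAT tool adoption (0/1)

cols = list(df.columns)
while cols:
    X = sm.add_constant(df[cols].astype(float))
    fit = sm.Logit(y, X).fit(disp=False)
    pvals = fit.pvalues.drop("const")
    if pvals.max() <= 0.05:                 # stop when all predictors significant
        break
    cols.remove(pvals.idxmax())             # drop the least significant predictor

print(fit.summary())
```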

These results were then compared with those obtained through individual chi-square tests conducted for each of the ICT in turn (c.f. Table 8.4), and both analyses presented similar results, thus supporting the prediction made by the logistic regression model. Terminology management systems, both in terms of adoption and also in terms of experience with them, were the type of ICT which showed strongest links with the adoption of CAT tools.

Table 8.4. CAT users and use of other ICT

Type of ICT Chi-Square Significance
Communication activity
FTP (File Transfer Protocol) 24.171 0.000
Information search and retrieval activity
Terminology management systems 167.665 0.000
Document production activity
Word processing software 2.005 0.157
Graphical / presentation software 18.539 0.000
Desktop Publishing software 14.896 0.000
Business management activity
Spreadsheet software 21.389 0.000
Database software 2.644 0.104
Accounting / bookkeeping software 1.582 0.208
Project management software 0.222 0.638
Marketing and work procurement activity
Web publishing software 8.813 0.003
Translation creation activity
Machine translation systems 11.846 0.001
Localisation software 21.984 0.000

The relationship between the degree of experience with the ICT for each activity in the translator's workflow and the adoption of CAT tools was also investigated through the use of chi-square tests (cf. Table 8.5). Overall, experience with ICT for the communication, information search and retrieval, business management, marketing and work procurement, and translation creation activities was found to be significantly related to the adoption of CAT tools (p values ≤ 0.05 in bold). Only experience with ICT for the document production activity did not present a strong link with the adoption of CAT tools. This might be due to the fact that almost all translators were using word processing software for document production. In particular, those translators not using CAT tools (i.e. undertaking their translations in a more "traditional" way) would also be mostly using word processing software.

Table 8.5. CAT users and familiarity with other ICT

Type of ICT Chi-Square Significance Scale mean
Communication activity
Email 5.291 0.071 3.85
FTP (File Transfer Protocol) 28.762 0.000 2.36
Discussion mailing lists 14.796 0.002 2.27
Online discussion groups 15.099 0.002 2.23
Activity average mean 2.68
Information search and retrieval activity
Online search engines 17.707 0.001 3.61
Online dictionaries / glossaries 17.810 0.000 3.26
Text corpora / document archives 2.902 0.407 2.87
Online terminology databanks 26.573 0.000 2.80
Online encyclopedias 9.455 0.024 2.50
Academic journals 1.764 0.623 2.39
Electronic databases 8.859 0.031 2.25
Electronic libraries 6.952 0.073 2.16
Terminology management systems 126.313 0.000 1.89
Activity average mean 2.64
Document production activity
Word processing software 1.860 0.395 3.88
Graphical / presentation software 8.440 0.038 2.01
Desktop publishing software 7.268 0.064 1.98
Activity average mean 2.62
Business management activity
Spreadsheet software 10.718 0.013 3.08
Database software 13.923 0.003 2.14
Accounting / bookkeeping software 5.474 0.140 1.58
Project management software 14.625 0.002 1.18
Activity average mean 1.99
Marketing and work procurement activity
Online translation marketplaces 22.999 0.000 2.12
Web publishing software 9.724 0.021 1.54
Activity average mean 1.83
Translation creation activity
Online machine translation services 7.477 0.058 1.49
Machine translation systems 15.114 0.002 1.35
Localisation software 36.129 0.000 1.14
Activity average mean 1.53

These findings were further considered through the comparison of the mean values of adopters and non-adopters of CAT tools in relation to their degree of experience with ICT for each activity in the translator's workflow. It seemed that CAT tool adopters had more experience with ICT for other activities in their workflow. This implied that, generally, those translators who had more experience with, and were more confident with, general purpose ICT were more likely to adopt CAT tools. This conclusion was also supported by the findings of the specific predictors of CAT tool adoption (see logistic regression analysis).

The main differences between the groups of adopters and non-adopters of CAT tools were observed in the experience with ICT that showed a more significant relationship with CAT tool adoption according to the chi-square tests conducted; for example, with terminology management systems, online translation marketplaces or online terminology databanks. Again, the relationship of CAT tool adoption with specialist purpose ICT reinforced the idea that freelance translators were more likely to embrace CAT tools once they had become familiar with general purpose ICT first, and then with other specialised ICT.

URL: https://www.sciencedirect.com/science/article/pii/B9781843347712000081

Methods Based on Random Utility Theory

Luigi dell'Olio, ... Rocio de Oña, in Public Transportation Quality of Service, 2018

7.4.2.2 Logit Model

The Logit model follows the same approach as the Probit model, the difference being that in this case the error term $\varepsilon_{ik}$ inherent in the model is assumed to be distributed according to the logistic density function.

This modification in the model is addressed in a very simple way by the Nlogit software by adding the command "Logit" at the end of the model. This tells the program that the density function to be used is the logistic function.

As can be seen, the characteristics of the model are very similar to those of the Probit model. The variables continue to satisfy the correct sign criterion (see Section 7.4.1) and the significance level is also adequate, with the exceptions of the ticket price and the service being provided. The same reasons as in the case of the Probit model explain their presence in the final model. However, the values of the parameters change considerably, as do the values of the constant and the limit parameters.

As the values of the parameters are greater, the constant and the limit parameters also take higher values, and in this way the model maintains a consistency similar to that of the Probit model. The most important variable continues to be comfort (COM), and it exceeds the other variables by roughly the same proportion as in the Probit model.

The nonlinearity continues and the difference is greater as the evaluation becomes more positive.

Compared to the Probit model, and considering that the variables affecting the model are the same, as are the degrees of freedom, the fit of the Logit model shows better indicator values: a log likelihood of −494.93661 compared with −497.06439 for the Probit model, and an AIC/N of 1.365 compared with 1.371. The count R² also improves, with a value of 0.726.

In this particular case, the Logit model has a better fit; however, this improvement is minimal and so the two models could be considered to be equally valid.
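The chapter's ordered Logit and Probit models are estimated in Nlogit; as a simplified stand-in, the sketch below compares binary logit and probit fits on the same synthetic data using the log likelihood and AIC/N, mirroring the comparison made above.

```python
# Simplified illustration of comparing logit and probit fits on the same data
# (the chapter's models are ordered models estimated in Nlogit; this binary
# example on synthetic data only mirrors the log-likelihood / AIC comparison).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))     # e.g., comfort and price scores
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ [0.3, 1.0, -0.5]))))

logit_fit = sm.Logit(y, X).fit(disp=False)
probit_fit = sm.Probit(y, X).fit(disp=False)

for name, fit in [("logit", logit_fit), ("probit", probit_fit)]:
    print(name, "log-likelihood:", round(fit.llf, 3),
          "AIC/N:", round(fit.aic / n, 3))
```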

URL: https://www.sciencedirect.com/science/article/pii/B9780081020807000070