Origin Curve Graph Fitting

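My major is chemistry, but because I try to write about topics that will help as many readers as possible, I keep ending up writing about computer-related things. This time the topic is curve fitting with Excel. Excel can already add linear, exponential, logarithmic, polynomial, and other trendlines to a chart, together with their equations, and it also has functions such as TREND and FORECAST. These built-in features are useful for fitting simple data and are enough in most cases, but sometimes you need to fit a function whose form Excel does not provide. Consider, for example, the following exponential function.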

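y = A*exp(-B/(x-C)) : an exponential decay curve in 1/(x-C) that approaches y = 0 as x approaches C from above and approaches y = A as x goes to infinity.
This is the form of the VTF (Vogel-Fulcher-Tammann) equation, which describes the temperature dependence of the viscosity of glass (for viscosity itself the exponent is positive, +B/(x-C), so that the viscosity diverges as the temperature approaches C).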

Or how about this one?

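y = A*(1-exp(-B*(x-C))) : an exponential curve that is 0 at x = C and converges to y = A as x goes to infinity.
When clogging occurs, the mass of powder passing through a filter or screen follows this function.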
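As a quick numeric check of the limits described for the two models above, they can be written as plain Python functions. The parameter values (A = 10, B = 2, C = 1) are arbitrary illustrations, not values from this post.

```python
import math

def vtf(x, a, b, c):
    """y = A*exp(-B/(x - C)); only defined for x > C."""
    return a * math.exp(-b / (x - c))

def saturation(x, a, b, c):
    """y = A*(1 - exp(-B*(x - C)))."""
    return a * (1.0 - math.exp(-b * (x - c)))

A, B, C = 10.0, 2.0, 1.0  # arbitrary illustrative parameters

# VTF form: y -> 0 just above x = C, y -> A for large x
print(vtf(C + 1e-3, A, B, C))     # exp(-2000) underflows to 0.0
print(vtf(1e6, A, B, C))          # close to 10
# Saturation form: y = 0 at x = C, y -> A for large x
print(saturation(C, A, B, C))     # 0.0
print(saturation(50.0, A, B, C))  # close to 10
```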

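Curves like these cannot be fitted with Excel's built-in trendlines. And if you cannot fit a curve, then even with experimental x, y data in hand you cannot mathematically predict y at a particular point or time. Of course, you could use advanced math software such as Origin or MATLAB instead of Excel. But in nearly 15 years of graduate school and research-institute life, I never once met a boss who would simply buy that kind of software for me. Rather than drafting a request and going through the purchasing process, it is faster to wrestle good old Excel into doing the job. Can Excel really fit such complicated functions? Yes, it can.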

First, let's get the basic concept straight. Data obtained from experiments usually consist of pairs (xi, yi), as shown below. Plotting them as a scatter chart roughly reveals the shape of the curve we want to fit.

To fit the data, we first have to model the form of the curve, y = f(x), that we want to fit. A glance at the chart above shows that the data decrease exponentially. So can't we just use Excel's trendline? Let's try.

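Did it fit well? No. In the experimental data, y converges to a value near 5, but Excel's exponential trendline assumes that y converges to 0, so it cannot produce a suitable fitted curve. So what form does the actual curve have? Probably something like this: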

y = A * exp(-B * x) + C
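Why couldn't the built-in trendline reproduce the plateau? Excel's exponential trendline has the form y = c*exp(b*x), and as far as I know it is computed like LOGEST, by fitting a straight line to ln y versus x. The sketch below imitates that on noise-free data generated from y = 21*exp(-0.5*x) + 4 (the true parameters quoted later in this post) at x = 0 to 10; the x values are an assumption. With b < 0, c*exp(b*x) can only decay toward 0, so far from the data its prediction collapses to almost 0 instead of leveling off at 4.

```python
import math

def log_linear_exp_fit(xs, ys):
    """Fit y = c*exp(b*x) by ordinary least squares on ln(y); requires every y > 0."""
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    b = sxl / sxx                 # slope of ln(y) against x
    c = math.exp(ml - b * mx)     # back-transformed intercept
    return c, b

# Noise-free data from y = A*exp(-B*x) + C with A=21, B=0.5, C=4 (x grid assumed)
xs = [float(x) for x in range(11)]
ys = [21 * math.exp(-0.5 * x) + 4 for x in xs]

c, b = log_linear_exp_fit(xs, ys)
print(b < 0)                  # True: the trendline decays
print(c * math.exp(b * 100))  # far from the data: near 0, not the true plateau of 4
```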

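A model like this consists of a basic curve shape (here, an exponential function) and several parameters (A, B, C). Curve fitting is the process of determining the values of the parameters (A, B, C) that make the curve match the given experimental data (xi, yi) as closely as possible. Let's start by plugging arbitrary values of A, B, and C into the model to generate a fitted curve.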

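The Y_fitting column holds model values calculated from the A, B, and C values above it. Much closer to the experimental values, isn't it? By adjusting A, B, and C by hand you can get a curve quite close to the experimental one. But what we want is not a "quite" close curve; we want the "closest" curve. To get it, we have to quantify how close the experimental values and the model values are.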

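A good fit means that the differences between the experimental values (yi) and the model values (y_model) are small. Each difference can be quantified as (yi - y_model)^2, and curve fitting means finding the parameter values that minimize the sum of these squared errors over all data points. You could instead minimize the sum of absolute differences, but squares are easier to handle mathematically, so the least-squares method has traditionally been used.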
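Written out for the model y = A*exp(-B*x) + C with n data points (xi, yi), the quantity being minimized and the fitted parameters are:

```latex
S(A,B,C) = \sum_{i=1}^{n} \bigl( y_i - f(x_i; A, B, C) \bigr)^2,
\qquad f(x; A, B, C) = A\,e^{-B x} + C,
\qquad (\hat{A}, \hat{B}, \hat{C}) = \operatorname*{arg\,min}_{A,B,C} S(A,B,C)
```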

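Excel has a function called SUMXMY2 that can be used for exactly this purpose. In the English version of Excel its description reads: "Sums the squares of the differences in two corresponding range or arrays." For the Korean version of Excel, please look up the description yourself. Using this function, I added an item called SumErrSqr at the bottom of the table. If the experimental and model values are in rows 10 through 20 of columns B and C, the cell formula is =SUMXMY2(B10:B20,C10:C20).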
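For readers working outside Excel, the same SumErrSqr quantity is a one-line sum. The sketch below mirrors =SUMXMY2(B10:B20,C10:C20): the first list plays the role of column B (measured y) and the second of column C (the Y_fitting values). The helper name sumxmy2 only echoes the Excel function, and the three data points are made up for illustration.

```python
import math

def sumxmy2(array_x, array_y):
    """Sum of squared differences of corresponding values, like Excel's SUMXMY2."""
    if len(array_x) != len(array_y):
        raise ValueError("arrays must have the same length")
    return sum((x - y) ** 2 for x, y in zip(array_x, array_y))

def model_column(xs, a, b, c):
    """The Y_fitting column: y = A*exp(-B*x) + C evaluated at each x."""
    return [a * math.exp(-b * x) + c for x in xs]

xs = [0.0, 1.0, 2.0]
y_measured = [25.0, 17.0, 11.5]              # made-up "column B"
y_model = model_column(xs, 21.0, 0.5, 4.0)   # "column C" for trial A, B, C
print(sumxmy2(y_measured, y_model))          # SumErrSqr for these trial values
print(sumxmy2([1, 2, 3], [1, 1, 1]))         # 0 + 1 + 4 = 5
```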

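Now all we have to do is find the values of A, B, and C that minimize SumErrSqr. Excel has a feature that solves exactly this kind of problem, called Solver (not to be confused with Goal Seek, which is a different feature). Solver is an add-in, so it does not normally appear in the menus. Open the Tools menu and you will see an item called Add-Ins.... Selecting it displays a list of available add-ins; check Solver and click OK to install it. Depending on how Excel was originally installed, you may or may not be asked for the installation CD. Once installation is complete, a Solver... item appears in the Tools menu. Selecting it opens the dialog box shown below. (These menu paths are from older versions of Excel; in current versions, Solver is enabled under File > Options > Add-ins and then appears on the Data tab.)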

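In the Set Target Cell box, select the cell that holds the SumErrSqr value; in the example, that is C21. For the value to find, choose Min. Finally, in By Changing Cells, select the cells that hold the A, B, and C values; in the example, C5:C7. Then click Solve, and Solver finds new values of A, B, and C in an instant. When the results dialog appears, choose to keep the Solver solution, and you will see the updated A, B, and C values and the updated chart.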
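Solver does this with its own general-purpose optimizer, which this post does not describe. As a rough stand-in outside Excel, here is a minimal pure-Python sketch that minimizes the same SumErrSqr for y = A*exp(-B*x) + C using a different technique: for a fixed B the model is linear in A and C, so those two are solved in closed form, and only B is searched (a coarse grid, then golden-section refinement). The search range for B (0.001 to 5, i.e. assuming B > 0) and the x values 0 to 10 are illustrative assumptions, not taken from the workbook.

```python
import math

def model(x, a, b, c):
    """Model from the post: y = A*exp(-B*x) + C."""
    return a * math.exp(-b * x) + c

def best_a_c(xs, ys, b):
    """For fixed B, y = A*u + C with u = exp(-B*x) is linear in A and C,
    so ordinary least squares gives A and C in closed form."""
    us = [math.exp(-b * x) for x in xs]
    n = len(xs)
    mu, my = sum(us) / n, sum(ys) / n
    suu = sum((u - mu) ** 2 for u in us)
    suy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    a = suy / suu
    return a, my - a * mu

def sse_for_b(xs, ys, b):
    """SumErrSqr after choosing the best A and C for this B."""
    a, c = best_a_c(xs, ys, b)
    return sum((y - model(x, a, b, c)) ** 2 for x, y in zip(xs, ys))

def fit_exp_offset(xs, ys, b_lo=1e-3, b_hi=5.0, grid=200, tol=1e-10):
    """Minimize SumErrSqr over (A, B, C); assumes the best B lies in [b_lo, b_hi]."""
    # 1) Coarse grid over B to bracket the best value.
    step = (b_hi - b_lo) / grid
    bs = [b_lo + i * step for i in range(grid + 1)]
    k = min(range(len(bs)), key=lambda i: sse_for_b(xs, ys, bs[i]))
    lo, hi = bs[max(k - 1, 0)], bs[min(k + 1, grid)]
    # 2) Golden-section search inside the bracket.
    g = (math.sqrt(5) - 1) / 2
    m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = sse_for_b(xs, ys, m1), sse_for_b(xs, ys, m2)
    while hi - lo > tol:
        if f1 < f2:                       # minimum lies in [lo, m2]
            hi, m2, f2 = m2, m1, f1
            m1 = hi - g * (hi - lo)
            f1 = sse_for_b(xs, ys, m1)
        else:                             # minimum lies in [m1, hi]
            lo, m1, f1 = m1, m2, f2
            m2 = lo + g * (hi - lo)
            f2 = sse_for_b(xs, ys, m2)
    b = (lo + hi) / 2
    a, c = best_a_c(xs, ys, b)
    return a, b, c, sse_for_b(xs, ys, b)

# Noise-free data from A=21, B=0.5, C=4 on an assumed grid x = 0..10
xs = [float(x) for x in range(11)]
ys = [model(x, 21.0, 0.5, 4.0) for x in xs]
a, b, c, sse = fit_exp_offset(xs, ys)
print(a, b, c, sse)
```

Splitting off the linear parameters this way means only a one-dimensional search is needed, which keeps the sketch short; for models without that structure a general optimizer (like Solver) is the usual route.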

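You really do get a curve very close to the experimental values, don't you? The sum of squared errors dropped from 43.6 to 0.61. The experimental values in this example are not real data; they are virtual data made by taking the model function with A = 21, B = 0.5, C = 4.0 and deliberately adding random errors in the range +/- 0.5. You can see that the fit recovers values very close to the true ones. If you do not add the random errors, Solver finds the exact A, B, and C values (to within its numerical tolerance). In the same way, you can fit model functions far more complex than this one.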
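The workbook's macro is not reproduced here; as a hypothetical Python equivalent, the sketch below builds the same kind of virtual data (true A = 21, B = 0.5, C = 4.0 plus noise) on an assumed grid x = 0 to 10, i.e. 11 points like rows 10 to 20, and it assumes the +/- 0.5 errors are uniformly distributed. It shows two things from the paragraph above: without noise, the true parameters give a SumErrSqr of exactly zero, so the minimum sits exactly at them; with noise, the SumErrSqr at the true parameters is just the sum of the squared noise values.

```python
import math
import random

def model(x, a, b, c):
    """Model from the post: y = A*exp(-B*x) + C."""
    return a * math.exp(-b * x) + c

def sum_err_sqr(ys, fitted):
    """Same quantity as =SUMXMY2(measured, fitted)."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))

TRUE_A, TRUE_B, TRUE_C = 21.0, 0.5, 4.0
xs = [float(x) for x in range(11)]   # assumed x values, 11 points like rows 10-20
clean = [model(x, TRUE_A, TRUE_B, TRUE_C) for x in xs]

rng = random.Random(0)               # fixed seed so the run is repeatable
noisy = [y + rng.uniform(-0.5, 0.5) for y in clean]  # assumed uniform noise in +/-0.5

at_truth_clean = sum_err_sqr(clean, clean)   # exactly 0.0 without noise
at_truth_noisy = sum_err_sqr(noisy, clean)   # sum of the squared noise values
print(at_truth_clean, at_truth_noisy)
# Rough expectations for uniform noise (variance 1/12): n/12 at the true
# parameters, roughly (n - 3)/12 after fitting three parameters.
print(len(xs) / 12, (len(xs) - 3) / 12)
```

Because the minimized SumErrSqr can only be at or below its value at the true parameters (and is usually strictly lower), the fitted parameters shift slightly away from the true ones when noise is present. Under these assumptions a best-fit value of roughly (11 - 3)/12, about 0.67, is typical, so the 0.61 quoted above is in a plausible range.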

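The file below contains the example above. I included a macro that generates a new set of virtual experimental data; it has nothing to do with the fitting itself, so please ignore it. To do the fit, add the Solver add-in as described above and run it from the menu.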

Fitting_Example.xls

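As you have seen, Excel used well can carry out fairly sophisticated numerical-analysis techniques. Even for differential equations whose analytical solutions are hard to obtain, you can get approximate solutions with charts, Goal Seek, Solver, and similar features. Add VBA on top of that, and you can handle almost any practical engineering problem. If there are functions or menus you have never used, open the help and take a look.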

Curve and Surface Fitting

Curve fitting is one of the most powerful and most widely used analysis tools in Origin. Curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the goal of defining a "best fit" model of the relationship.

Origin provides tools for linear, polynomial, and nonlinear curve fitting along with validation and goodness-of-fit tests. You can summarize and present your results with customized fitting reports. There are many time-saving options such as a copy-and-paste-operation feature which allows you to "paste" a just-completed fitting operation to another curve or data column. Curve fitting operations can also be part of an Analysis Template™, allowing you to perform batch fitting operations on any number of data files or data columns.

Linear, Polynomial, and Multiple Regression

Linear and polynomial regressions in Origin use the weighted least-squares method to fit a linear model function or a polynomial model function to data, respectively.
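As a sketch of what a weighted least-squares straight-line fit computes (not Origin's code), the closed-form solution below minimizes the sum of w_i*(y_i - (a + b*x_i))^2. A common choice of weights is w_i = 1/sigma_i^2, where sigma_i is the Y error of point i; the data and error values here are made up.

```python
def weighted_linear_fit(xs, ys, ws):
    """Return (intercept, slope) minimizing sum(w * (y - (intercept + slope*x))**2)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted means
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up data with a Y error (sigma) per point; weights are 1/sigma^2,
# so the more precise points pull the line harder.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.1, 0.2, 0.1, 0.3]
ws = [1 / s ** 2 for s in sigmas]
intercept, slope = weighted_linear_fit(xs, ys, ws)
print(intercept, slope)
```

With all weights equal this reduces to the ordinary least-squares line.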

Linear Fit

Masking outliers during linear fit
Linear fit with fixed intercept or slope
Ellipse Plot for graphical examination of linearity
Linear Fit with X Error PRO
Apparent Fit
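For the fixed-intercept and fixed-slope options listed above, the least-squares solution for the remaining parameter has a simple closed form. The unweighted sketch below assumes ordinary least squares; the function names and data are illustrative.

```python
def slope_with_fixed_intercept(xs, ys, b0):
    """Minimize sum((y - (b0 + m*x))**2) over m: m = sum(x*(y - b0)) / sum(x*x)."""
    return sum(x * (y - b0) for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def intercept_with_fixed_slope(xs, ys, m0):
    """Minimize sum((y - (b + m0*x))**2) over b: b = mean(y - m0*x)."""
    return sum(y - m0 * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(slope_with_fixed_intercept(xs, ys, 0.0))  # line forced through the origin
print(intercept_with_fixed_slope(xs, ys, 2.0))  # slope held at 2
```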

Polynomial Fit

The Polynomial Fit tool in Origin can fit data with polynomials of up to 9th order. Fixed intercept and apparent fit are also supported.
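A polynomial fit is still a linear least-squares problem, because the model is linear in its coefficients. A minimal sketch, not Origin's implementation: build the normal equations (X^T X)c = X^T y for the columns 1, x, ..., x^d and solve them by Gaussian elimination. Normal equations become ill-conditioned at high orders, so a QR-based solver is the more robust choice in practice; this version is only meant for low orders and small data sets.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations: a[j][k] = sum(x**(j+k)), rhs[j] = sum(y * x**j)
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
            rhs[r] -= f * rhs[col]
    # Back substitution
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = rhs[r] - sum(a[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = s / a[r][r]
    return coef

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]
print(polyfit(xs, ys, 2))   # approximately [1.0, 2.0, -0.5]
```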

Multiple Linear Regression

Multiple Linear Regression fits a linear model with multiple independent variables.

A unique feature of Origin's Multiple Linear Regression is Partial Leverage Plots, useful in studying the relationship between the dependent variable and a given independent variable.
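One common way to build such a plot for predictor x1 in a two-predictor model is the added-variable (partial regression) construction: regress y on the other predictor x2, regress x1 on x2, and plot the two sets of residuals against each other. Whether Origin builds its Partial Leverage Plots exactly this way is an assumption here. By the Frisch-Waugh-Lovell theorem, the least-squares slope of that residual plot equals x1's coefficient in the full multiple regression. The sketch uses only simple linear regressions and made-up data.

```python
def simple_fit(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Residuals of ys after a simple linear fit on xs."""
    b0, b1 = simple_fit(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def added_variable_points(x1, x2, y):
    """Points of a partial regression plot for x1, controlling for x2."""
    return residuals(x2, x1), residuals(x2, y)

# Made-up data: y = 1 + 2*x1 + 3*x2 plus small perturbations
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
noise = [0.1, -0.2, 0.05, 0.1, -0.05, 0.0]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

rx, ry = added_variable_points(x1, x2, y)
_, partial_slope = simple_fit(rx, ry)
print(partial_slope)   # equals x1's coefficient in the full regression of y on x1, x2
```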

Graph displaying raw data, linear fit line, and 95% confidence and prediction bands.

Linear Fit with X Error minimizes the sum of squared errors in both the X and Y directions, which is more realistic for experimental data, where errors exist in both X and Y.
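When X and Y carry errors of similar size, one simple estimator is orthogonal regression (Deming regression with equal error variances), which minimizes the sum of squared perpendicular distances from the points to the line and has a closed-form slope. This illustrates the idea only; Origin's Linear Fit with X Error may use a different algorithm (for example, one that uses per-point X and Y errors), so don't treat this as a reproduction of it. The data are made up.

```python
import math

def orthogonal_line_fit(xs, ys):
    """Line y = a + b*x minimizing the sum of squared perpendicular distances
    (Deming regression with equal X and Y error variances)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    if sxy == 0:
        raise ValueError("slope undefined: no linear association (sxy == 0)")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.8, 5.1, 7.2, 8.9]
intercept, slope = orthogonal_line_fit(xs, ys)
print(intercept, slope)
```

Unlike ordinary least squares, this fit treats X and Y symmetrically: fitting x against y gives exactly the reciprocal slope.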

Polynomial fitting can be performed with polynomials of up to 9th order. Fixing the intercept is supported, and an apparent fit can be performed on nonlinear axis scales.

Nonlinear Curve Fitting

Origin's NLFit tool is powerful, flexible and easy to use. The NLFit tool includes more than 170 built-in fitting functions, selected from a wide range of categories and disciplines. Each built-in function includes automatic parameter initialization code that adjusts initial parameter values to your dataset(s), prior to fitting.

Can't find a suitable fitting function in the built-in function library? No problem. You can easily define a custom fitting function using our Fitting Function Builder.

“Not only does Origin handle the most demanding curve fitting tasks with ease, it also has a built in C compiler that allows me to customize complex functions - a feature that has been crucial to my research. Origin is an indispensable tool to my grad students, whose PhD work hinges on being able to code our functions in C. To top it off, Originlab has a knowledgeable and responsive technical support staff, second to none. I wholeheartedly recommend Origin.”

Mark Kuzyk, Ph.D. - Regents Professor of Physics and Astronomy, Washington State University

With just a few clicks, you can perform curve fitting and obtain "best-fit" parameter values. Origin provides over 170 built-in fitting functions.

Fitting Multiple Datasets

Do you have multiple datasets that you would like to fit simultaneously? With Origin, you can fit each dataset separately and output the results in separate reports or in a consolidated report. Alternatively, you can perform a global fit with shared parameters, or a concatenated fit, which combines replicate data into a single dataset before fitting.

Global fit with shared parameters
Concatenate fit for replicate data
Independent fit for multiple curves
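To make "global fit with shared parameters" concrete, here is the simplest case, assumed for illustration: several straight-line datasets fitted together with one shared slope and a separate intercept per dataset. Minimizing the total squared error over all datasets gives a closed form: the slope pools the within-dataset sums of squares, and each intercept is that dataset's mean y minus the shared slope times its mean x. Origin's global fit handles general nonlinear models; this sketch only shows what sharing a parameter means.

```python
def shared_slope_fit(datasets):
    """datasets: list of (xs, ys). Returns (shared_slope, [intercept per dataset])
    minimizing the total squared error with one common slope."""
    means = []
    sxx = sxy = 0.0
    for xs, ys in datasets:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means.append((mx, my))
        sxx += sum((x - mx) ** 2 for x in xs)                   # pooled within-set Sxx
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # pooled within-set Sxy
    slope = sxy / sxx
    return slope, [my - slope * mx for mx, my in means]

# Two made-up datasets that share a slope near 2 but have different offsets
data = [
    ([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.0, 7.1]),
    ([1.0, 2.0, 3.0, 4.0], [7.0, 9.1, 10.9, 13.0]),
]
slope, intercepts = shared_slope_fit(data)
print(slope, intercepts)
```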

The image on the left displays a global fit where the width parameter has been shared. The image on the right shows replicate data fitted by internally combining all data into one concatenated dataset.

Implicit Fitting PRO

Do you need to fit an implicit function to your data? Origin's NLFit tool supports implicit fitting using the Orthogonal Distance Regression (ODR) algorithm, including fitting with X and/or Y error data.

Implicit Fitting uses the Orthogonal Distance Regression algorithm to find optimal values for the fit parameters. Errors or weights are supported for both X and Y data.

Fitting Control

Need to fine-tune your curve-fitting analysis? With Origin, you have full control over the curve-fitting process:

Fix parameter values
Least-squares fit with Y weights (e.g., errors as weights)
Use parameter bounds and/or linear constraints

Advanced Fitting Options

In addition to the basic fitting options, you also have access to extended options for more advanced fitting. Note that some options are available only in OriginPro:

Fit with integrals
Fit with replicas
Multivariate regression
Fit with convolution
Orthogonal Distance Regression with X and/or Y weight PRO

Surface Fitting PRO

Origin's NLFit tool provides an intuitive interface for fitting your XYZ or matrix data to a surface model. With this tool, you can locate one or multiple peaks in your surface data and fit them with built-in or user-defined surface fitting functions.

Surface fitting can be performed on data from XYZ columns or from a matrix. Over 20 built-in surface fitting functions are provided. You can also add your own function.

Compare Models and Datasets PRO

Having trouble deciding which function works best with your data? Want to evaluate which data better fits a particular model? OriginPro's fit comparison tools make it easy for you to compare models or compare data:

Fit and rank all functions in a category PRO
Compare two fitting models to one dataset PRO
Compare two datasets with one fitting model PRO

The Rank Models tool lets you fit multiple functions to a dataset, and then reports the best fitting model. Results are ranked by Akaike and Bayesian Information Criterion scores.
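For least-squares fits with normally distributed errors, the information criteria are commonly computed, up to an additive constant that cancels when models are compared on the same data, as AIC = n*ln(RSS/n) + 2k and BIC = n*ln(RSS/n) + k*ln(n), where n is the number of points, RSS the residual sum of squares, and k the number of fitted parameters; lower is better. Origin's exact formulas (for example, a small-sample correction or how k is counted) may differ, so treat this ranking sketch and its hypothetical RSS values as an illustration only.

```python
import math

def aic(rss, n, k):
    """Akaike Information Criterion for a least-squares fit (additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian Information Criterion for a least-squares fit (additive constant dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

def rank_models(results, n, criterion=aic):
    """results: {name: (rss, k)}. Returns names from best (lowest score) to worst."""
    return sorted(results, key=lambda name: criterion(results[name][0], n, results[name][1]))

# Hypothetical results of fitting three models to the same 20 points
results = {
    "linear": (12.0, 2),
    "exp_offset": (0.90, 3),
    "cubic": (0.80, 4),
}
print(rank_models(results, n=20))                 # AIC: ['cubic', 'exp_offset', 'linear']
print(rank_models(results, n=20, criterion=bic))  # BIC: ['exp_offset', 'cubic', 'linear']
```

BIC's k*ln(n) penalty grows with the number of points, so with 20 points it punishes the extra cubic parameter more than AIC does, which is why the two criteria disagree here.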

Time-Saving Fitting Options

Take advantage of Origin's many time-saving features including an intuitive set of fitting Gadgets, shortcut menu commands for commonly used fitting operations, and several modes for handling of repetitive tasks:

Quick Fit Gadget
Sigmoidal Fit and Gadget
Exponential Fit
Copy & paste fitting operations
Batch Processing with Analysis Templates

The Quick Fit gadget lets you perform regression on a subset of the data selected graphically using a Region of Interest (ROI) control. This image shows linear regression performed on two separate segments of the data. The fit results have been added as labels to the graph for the two segments.
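Outside Origin, fitting only the points inside a selected X range amounts to filtering the data before an ordinary fit. A minimal sketch with hypothetical helper names and made-up data whose slope changes at x = 5:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_in_range(xs, ys, x_min, x_max):
    """Fit a line using only the points with x_min <= x <= x_max (a 1-D 'ROI')."""
    pts = [(x, y) for x, y in zip(xs, ys) if x_min <= x <= x_max]
    if len(pts) < 2:
        raise ValueError("need at least two points in the selected range")
    sel_x, sel_y = zip(*pts)
    return linear_fit(list(sel_x), list(sel_y))

xs = [float(x) for x in range(11)]
ys = [x if x <= 5 else 5 + 3 * (x - 5) for x in xs]  # slope 1, then slope 3
print(fit_in_range(xs, ys, 0, 5))    # first segment: y = x
print(fit_in_range(xs, ys, 5, 10))   # second segment: y = 3x - 10
```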