What is ChatGPT?
ChatGPT is an artificial intelligence chatbot developed by OpenAI that allows us to have human-like conversations and generate images based on text descriptions. It is one of the greatest leaps in natural language processing.
Integrating the OpenAI API in a Ruby application:
We can bring ChatGPT's features into a Ruby application and make it more engaging for users by integrating the OpenAI API. For this, we are using the ruby-openai gem, which gives us access to the various OpenAI models so we can pick the right one for each use case.
Install gem:
Add the ruby-openai gem to the Gemfile.
gem "ruby-openai"
Then run bundle install to install the gem.
Get access key:
We have to generate an access key to get responses back. Visit the API keys page and create a new secret key.
Copy the secret key and assign it to the OPENAI_ACCESS_TOKEN environment variable.
export OPENAI_ACCESS_TOKEN="xxxxxxxxxxxxxxx"
Configure Ruby OpenAI:
If the account is tied to an organization, also set the OPENAI_ORGANIZATION_ID environment variable. We can find the organization ID on the Settings page.
OpenAI.configure do |config|
config.access_token = ENV.fetch("OPENAI_ACCESS_TOKEN")
config.organization_id = ENV.fetch("OPENAI_ORGANIZATION_ID") # Optional.
end
Then to create a client,
client = OpenAI::Client.new
Choosing the right model:
Before diving into the models, we have to understand what a token is.
This is an explanation from an OpenAI article,
Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words.
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
We can consider approx. 4 characters as a token.
The OpenAI API has various models in each generation, and they can be used for different use cases,
1) GPT-4
The GPT-4 model is great at solving complex problems with high accuracy and is much more capable than the previous models, although for most basic tasks there is no significant difference between GPT-4 and GPT-3.5.
- gpt-4 model can handle complex tasks and is optimized for chat. It supports a max of 8,192 tokens and has training data up to Sep 2021.
- gpt-4-32k model has the same capabilities as gpt-4 but supports a max of 32,768 tokens, with training data up to Sep 2021.
2) GPT-3.5
GPT-3.5 models can understand and generate natural language or code. gpt-3.5-turbo is optimized for chat but also works well for traditional tasks.
- gpt-3.5-turbo model is the most capable GPT-3.5 model. It is optimized for chat, supports a max of 4,096 tokens, and has training data up to Sep 2021.
- text-davinci-003 model can do any language task with better quality, longer output, and more consistent instruction-following. It supports a max of 4,096 tokens and has training data up to Jun 2021.
- text-davinci-002 model has similar capabilities to text-davinci-003 but was trained with supervised fine-tuning. It supports a max of 4,096 tokens and has training data up to Jun 2021.
- code-davinci-002 model is optimized for code-completion tasks. It supports a max of 8,001 tokens and has training data up to Jun 2021.
3) GPT-3
GPT-3 models can understand and generate natural language. These models were superseded by the more powerful GPT-3.5 generation models. All the models have max token support of 2,049 and training data up to Oct 2019.
- davinci model is the most capable GPT-3 model and can do any task with higher quality than the other models.
- curie model is very capable, but faster and lower cost compared to davinci.
- babbage model is capable of straightforward tasks, very fast, and lower cost.
- ada model is capable of very simple tasks, is the fastest of the GPT-3 models, and has the lowest cost.
4) DALL-E
The DALL-E model can generate and edit images from a description in natural language.
5) Whisper
The Whisper model can convert audio into text. It can perform multilingual speech recognition, speech translation, and language identification.
6) Embedding
Embeddings are numerical representations of text that can be used to measure the relatedness between two pieces of text. These models are useful for search, clustering, recommendations, anomaly detection, and classification tasks.
7) Moderation
A fine-tuned model that can detect whether text may be sensitive or unsafe. It checks whether the passed content complies with OpenAI's usage policies.
Chat:
We are using the gpt-3.5-turbo model, as gpt-4 had only limited access at the time of writing this post.
In a request, we have to pass two required parameters, model and messages. Inside the messages parameter, we should pass the role and content parameter values.
In the temperature (optional) parameter, we can pass a value between 0 and 2. A higher temperature results in more unpredictable and diverse responses, while a lower temperature results in more predictable and conservative responses.
The OpenAI API supports three roles,
system - The system instruction helps set the behavior of the assistant (the OpenAI response); it is the high-level instruction given for the conversation.
user - Instruction passed by the end user.
assistant - The assistant messages help store prior responses.
As a response, the Ruby OpenAI client returns an object with the following fields,
id - Chat ID.
object - Name of the API endpoint that returned the response.
created - Timestamp of when the response was created.
model - Model used to generate the response.
usage - Number of tokens passed in and generated.
choices - Message generated by the model and the status of the result.
In the examples below, we set the role as user and pass the message or question in the content parameter.
Example 1: Solving Problems:
In the below example, we have asked the OpenAI API to calculate the time taken for a spaceship to reach the Sun from the Earth, which returns a step-by-step calculation.
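A minimal sketch of such a request with the ruby-openai client (the exact prompt wording here is an assumption):
response = client.chat(
  parameters: {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "user", content: "How long would a spaceship travelling at 40,000 km/h take to reach the Sun from the Earth? Show the steps." }
    ],
    temperature: 0.7
  }
)
# The generated answer lives in the first choice's message content.
puts response.dig("choices", 0, "message", "content")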
Example 2: Technical Questions:
In the below example, we have asked OpenAI to explain the use of <></> in React.
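The request follows the same shape; only the content changes (again, the prompt wording is an assumption):
response = client.chat(
  parameters: {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Explain the use of <></> in React." }]
  }
)
puts response.dig("choices", 0, "message", "content")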
Example 3: Back and forth conversation:
In the below example, we pass the prior conversation history along with the new question to have a more interactive and dynamic conversation.
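A hedged sketch of how that history can be passed; the prompts are made up for illustration, and the earlier reply is included with the assistant role so the model has context:
messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Suggest a name for a coffee shop." },
  { role: "assistant", content: "How about 'The Daily Grind'?" },
  { role: "user", content: "Can you suggest something more playful?" }
]

response = client.chat(
  parameters: { model: "gpt-3.5-turbo", messages: messages }
)
puts response.dig("choices", 0, "message", "content")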
Stream the response:
To make the application more engaging for users, we can stream the response in chunks.
For this, we'll have to pass a stream parameter along with the role and content to stream the result.
In the stream parameter, we can pass a proc that prints the response chunks as they are generated. With this, we can set up a ChatGPT-like messaging stream in a Rails app by following this guide.
In the below example, we have asked the OpenAI API to explain color theory. The result is a detailed explanation, and instead of waiting for the complete response we can stream chunks to the user and improve the user experience.
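A minimal sketch, assuming a gem version that supports streaming via a proc:
client.chat(
  parameters: {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Explain color theory." }],
    stream: proc do |chunk, _bytesize|
      # Each chunk carries a small piece of the generated message.
      print chunk.dig("choices", 0, "delta", "content")
    end
  }
)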
Complete text:
We will be using the GPT-3.5 text-davinci-003 model to complete the text.
We have to pass the content in the prompt parameter, which the model uses to complete the text. We can also pass the maximum number of tokens to be generated while completing the text.
Example 1: Social media description:
In the below example, we have asked OpenAI to complete a social media description.
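A minimal sketch (the prompt and max_tokens value are assumptions):
response = client.completions(
  parameters: {
    model: "text-davinci-003",
    prompt: "Complete this social media description: Freshly baked sourdough every morning at our",
    max_tokens: 60
  }
)
puts response.dig("choices", 0, "text")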
Example 2: Ask OpenAI API to complete the code:
In the below example, we have asked the OpenAI API to complete a simple Ruby addition snippet.
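A minimal sketch (the snippet passed in the prompt is an assumption):
response = client.completions(
  parameters: {
    model: "text-davinci-003",
    prompt: "# Complete this Ruby method\ndef add(a, b)\n",
    max_tokens: 30
  }
)
puts response.dig("choices", 0, "text")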
Edit text:
We will be using the text-davinci-edit-001 model to edit the text.
We have to pass the content in the input parameter and a description of the task in the instruction parameter.
Example 1: Translate code to a different programming language:
In the below example, we have instructed the OpenAI API to translate a code snippet to C. In the result, it returns the entire C program, and the interesting part is that I didn't even mention the programming language of the code snippet that I passed.
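A minimal sketch, using a small Ruby snippet as the input (the snippet itself is an assumption); the edits endpoint was still available in the gem at the time of writing:
response = client.edits(
  parameters: {
    model: "text-davinci-edit-001",
    input: "def add(a, b)\n  a + b\nend",
    instruction: "Translate this code to C"
  }
)
puts response.dig("choices", 0, "text")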
Example 2: Find and replace and formatting:
In the below example, we have passed an input and instructed the OpenAI API to replace a word and capitalize each word in the sentence.
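A minimal sketch (the sentence and the replacement are assumptions):
response = client.edits(
  parameters: {
    model: "text-davinci-edit-001",
    input: "the quick brown fox jumps over the lazy dog",
    instruction: "Replace 'fox' with 'cat' and capitalize each word"
  }
)
puts response.dig("choices", 0, "text")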
Moderate text:
We can moderate text with the OpenAI API; it checks whether the passed content complies with OpenAI's usage policies.
There are seven categories, and the OpenAI API generates a score for each of them. The scores are between 0 and 1, and a higher value denotes higher confidence (ref).
In the below example, we will be using the text-moderation-stable model and, as a result, the score of the hate category will be returned.
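A minimal sketch (the input text is an assumption):
response = client.moderations(
  parameters: {
    model: "text-moderation-stable",
    input: "I hate rainy Mondays."
  }
)
# Confidence score for the "hate" category, between 0 and 1.
puts response.dig("results", 0, "category_scores", "hate")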
Image generator:
Using the DALL-E model, we can generate an image or art by describing it in natural language.
In the prompt parameter, we describe the image that needs to be generated, and in the size parameter we can pass the resolution. An image can be generated in 256x256, 512x512, or 1024x1024; if the size parameter is not passed, 1024x1024 is used as the default.
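A minimal sketch (the prompt is an assumption):
response = client.images.generate(
  parameters: {
    prompt: "A watercolor painting of a lighthouse at sunset",
    size: "512x512"
  }
)
# URL of the generated image.
puts response.dig("data", 0, "url")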
This is the image generated,
Edit image:
We can also edit images, but for that we have to mask the image with a transparent section. The masked section can then be altered based on the description.
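A minimal sketch, assuming the original image and a masked copy are saved locally (the file names and prompt are placeholders):
response = client.images.edit(
  parameters: {
    image: "tree.png",          # original image
    mask: "tree_mask.png",      # copy of the image with a transparent section to edit
    prompt: "Add a treehouse with a ladder to the tree"
  }
)
puts response.dig("data", 0, "url")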
This is the tree image that’s used for testing,
The image generated,
Transcribe:
We can use the whisper-1 model to transcribe the audio.
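A minimal sketch, assuming a local audio file (the path is a placeholder); depending on the gem version, the call is exposed as client.audio.transcribe or client.transcribe:
response = client.audio.transcribe(
  parameters: {
    model: "whisper-1",
    file: File.open("path/to/audio.mp3", "rb")
  }
)
# The transcribed text.
puts response["text"]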
Conclusion:
Integrating the OpenAI API opens up endless possibilities for improving the user experience and making the site more engaging. It provides versatile language models that can simplify most traditional tasks and also solve complex problems. We can use it as a chatbot, to translate or transcribe audio, write or debug code, generate or edit images, and much more.